ubelt.util_cache module¶
-
class
ubelt.util_cache.
Cacher
(fname, cfgstr=None, dpath=None, appname='ubelt', ext='.pkl', meta=None, verbose=None, enabled=True, log=None, protocol=2)[source]¶ Bases:
object
Cacher designed to be quickly integrated into existing scripts.
Parameters: - fname (str) – A file name. This is the prefix that will be used by the cache. It will alwasys be used as-is.
- cfgstr (str) – indicates the state. Either this string or a hash of this string will be used to identify the cache. A cfgstr should always be reasonably readable, thus it is good practice to hash extremely detailed cfgstrs to a reasonable readable level. Use meta to store make original details persist.
- dpath (str) – Specifies where to save the cache. If unspecified, Cacher defaults to an application resource dir as given by appname.
- appname (str) – application name (default = ‘ubelt’) Specifies a folder in the application resource directory where to cache the data if dpath is not specified.
- ext (str) – extension (default = ‘.pkl’)
- meta (object) – cfgstr metadata that is also saved with the cfgstr. This data is not used in the hash, but if useful to send in if the cfgstr itself contains hashes.
- verbose (int) – level of verbosity. Can be 1, 2 or 3. (default=1)
- enabled (bool) – if set to False, then the load and save methods will do nothing. (default = True)
- log (func) – overloads the print function. Useful for sending output to loggers (e.g. logging.info, tqdm.tqdm.write, …)
- protocol (int) – protocol version used by pickle. If python 2 compatibility is not required, then it is better to use protocol 4. (default=2)
- CommandLine:
- python -m ubelt.util_cache Cacher
Example
>>> import ubelt as ub >>> cfgstr = 'repr-of-params-that-uniquely-determine-the-process' >>> # Create a cacher and try loading the data >>> cacher = ub.Cacher('test_process', cfgstr) >>> cacher.clear() >>> data = cacher.tryload() >>> if data is None: >>> # Put expensive functions in if block when cacher misses >>> myvar1 = 'result of expensive process' >>> myvar2 = 'another result' >>> # Tell the cacher to write at the end of the if block >>> # It is idomatic to put results in a tuple named data >>> data = myvar1, myvar2 >>> cacher.save(data) >>> # Last part of the Cacher pattern is to unpack the data tuple >>> myvar1, myvar2 = data
Example
>>> # The previous example can be shorted if only a single value >>> from ubelt.util_cache import Cacher >>> cfgstr = 'repr-of-params-that-uniquely-determine-the-process' >>> # Create a cacher and try loading the data >>> cacher = Cacher('test_process', cfgstr) >>> myvar = cacher.tryload() >>> if myvar is None: >>> myvar = ('result of expensive process', 'another result') >>> cacher.save(myvar) >>> assert cacher.exists(), 'should now exist'
-
VERBOSE
= 1¶
-
get_fpath
(cfgstr=None)[source]¶ Reports the filepath that the cacher will use. It will attempt to use ‘{fname}_{cfgstr}{ext}’ unless that is too long. Then cfgstr will be hashed.
Example
>>> from ubelt.util_cache import Cacher >>> import pytest >>> with pytest.warns(UserWarning): >>> cacher = Cacher('test_cacher1') >>> cacher.get_fpath() >>> self = Cacher('test_cacher2', cfgstr='cfg1') >>> self.get_fpath() >>> self = Cacher('test_cacher3', cfgstr='cfg1' * 32) >>> self.get_fpath()
-
existing_versions
()[source]¶ Returns data with different cfgstr values that were previously computed with this cacher.
Example
>>> from ubelt.util_cache import Cacher >>> # Ensure that some data exists >>> known_fnames = set() >>> cacher = Cacher('versioned_data', cfgstr='1') >>> cacher.ensure(lambda: 'data1') >>> known_fnames.add(cacher.get_fpath()) >>> cacher = Cacher('versioned_data', cfgstr='2') >>> cacher.ensure(lambda: 'data2') >>> known_fnames.add(cacher.get_fpath()) >>> # List previously computed configs for this type >>> from os.path import basename >>> cacher = Cacher('versioned_data', cfgstr='2') >>> exist_fpaths = set(cacher.existing_versions()) >>> exist_fnames = list(map(basename, exist_fpaths)) >>> print(exist_fnames) >>> assert exist_fpaths == known_fnames
[‘versioned_data_1.pkl’, ‘versioned_data_2.pkl’]
-
tryload
(cfgstr=None, on_error='raise')[source]¶ Like load, but returns None if the load fails due to a cache miss.
Parameters: on_error (str) – how to handle non-io errors errors. Either raise, which re-raises the exception, or clear which clears the cache and returns None.
-
load
(cfgstr=None)[source]¶ Example
>>> from ubelt.util_cache import * # NOQA >>> # Setting the cacher as enabled=False turns it off >>> cacher = Cacher('test_disabled_load', '', enabled=True) >>> cacher.save('data') >>> assert cacher.load() == 'data' >>> cacher.enabled = False >>> assert cacher.tryload() is None
-
save
(data, cfgstr=None)[source]¶ Writes data to path specified by self.fpath(cfgstr).
Metadata containing information about the cache will also be appended to an adjacent file with the .meta suffix.
Example
>>> from ubelt.util_cache import * # NOQA >>> # Normal functioning >>> cfgstr = 'long-cfg' * 32 >>> cacher = Cacher('test_enabled_save', cfgstr) >>> cacher.save('data') >>> assert exists(cacher.get_fpath()), 'should be enabeled' >>> assert exists(cacher.get_fpath() + '.meta'), 'missing metadata' >>> # Setting the cacher as enabled=False turns it off >>> cacher2 = Cacher('test_disabled_save', 'params', enabled=False) >>> cacher2.save('data') >>> assert not exists(cacher2.get_fpath()), 'should be disabled'
-
ensure
(func, *args, **kwargs)[source]¶ Wraps around a function. A cfgstr must be stored in the base cacher.
Parameters: - func (callable) – function that will compute data on cache miss
- *args – passed to func
- **kwargs – passed to func
Example
>>> from ubelt.util_cache import * # NOQA >>> def func(): >>> return 'expensive result' >>> fname = 'test_cacher_ensure' >>> cfgstr = 'func params' >>> cacher = Cacher(fname, cfgstr) >>> cacher.clear() >>> data1 = cacher.ensure(func) >>> data2 = cacher.ensure(func) >>> assert data1 == 'expensive result' >>> assert data1 == data2 >>> cacher.clear()
Example
>>> from ubelt.util_cache import * # NOQA >>> @Cacher(fname, cfgstr).ensure >>> def func(): >>> return 'expensive result'