ubelt.util_cache module

class ubelt.util_cache.Cacher(fname, cfgstr=None, dpath=None, appname='ubelt', ext='.pkl', meta=None, verbose=None, enabled=True, log=None, protocol=2)[source]

Bases: object

Cacher designed to be quickly integrated into existing scripts.

Parameters:
  • fname (str) – A file name. This is the prefix that will be used by the cache. It will alwasys be used as-is.
  • cfgstr (str) – indicates the state. Either this string or a hash of this string will be used to identify the cache. A cfgstr should always be reasonably readable, thus it is good practice to hash extremely detailed cfgstrs to a reasonable readable level. Use meta to store make original details persist.
  • dpath (str) – Specifies where to save the cache. If unspecified, Cacher defaults to an application resource dir as given by appname.
  • appname (str) – application name (default = ‘ubelt’) Specifies a folder in the application resource directory where to cache the data if dpath is not specified.
  • ext (str) – extension (default = ‘.pkl’)
  • meta (object) – cfgstr metadata that is also saved with the cfgstr. This data is not used in the hash, but if useful to send in if the cfgstr itself contains hashes.
  • verbose (int) – level of verbosity. Can be 1, 2 or 3. (default=1)
  • enabled (bool) – if set to False, then the load and save methods will do nothing. (default = True)
  • log (func) – overloads the print function. Useful for sending output to loggers (e.g. logging.info, tqdm.tqdm.write, …)
  • protocol (int) – protocol version used by pickle. If python 2 compatibility is not required, then it is better to use protocol 4. (default=2)
CommandLine:
python -m ubelt.util_cache Cacher

Example

>>> import ubelt as ub
>>> cfgstr = 'repr-of-params-that-uniquely-determine-the-process'
>>> # Create a cacher and try loading the data
>>> cacher = ub.Cacher('test_process', cfgstr)
>>> cacher.clear()
>>> data = cacher.tryload()
>>> if data is None:
>>>     # Put expensive functions in if block when cacher misses
>>>     myvar1 = 'result of expensive process'
>>>     myvar2 = 'another result'
>>>     # Tell the cacher to write at the end of the if block
>>>     # It is idomatic to put results in a tuple named data
>>>     data = myvar1, myvar2
>>>     cacher.save(data)
>>> # Last part of the Cacher pattern is to unpack the data tuple
>>> myvar1, myvar2 = data

Example

>>> # The previous example can be shorted if only a single value
>>> from ubelt.util_cache import Cacher
>>> cfgstr = 'repr-of-params-that-uniquely-determine-the-process'
>>> # Create a cacher and try loading the data
>>> cacher = Cacher('test_process', cfgstr)
>>> myvar = cacher.tryload()
>>> if myvar is None:
>>>     myvar = ('result of expensive process', 'another result')
>>>     cacher.save(myvar)
>>> assert cacher.exists(), 'should now exist'
VERBOSE = 1
get_fpath(cfgstr=None)[source]

Reports the filepath that the cacher will use. It will attempt to use ‘{fname}_{cfgstr}{ext}’ unless that is too long. Then cfgstr will be hashed.

Example

>>> from ubelt.util_cache import Cacher
>>> import pytest
>>> with pytest.warns(UserWarning):
>>>     cacher = Cacher('test_cacher1')
>>>     cacher.get_fpath()
>>> self = Cacher('test_cacher2', cfgstr='cfg1')
>>> self.get_fpath()
>>> self = Cacher('test_cacher3', cfgstr='cfg1' * 32)
>>> self.get_fpath()
exists(cfgstr=None)[source]

Check to see if the cache exists

existing_versions()[source]

Returns data with different cfgstr values that were previously computed with this cacher.

Example

>>> from ubelt.util_cache import Cacher
>>> # Ensure that some data exists
>>> known_fnames = set()
>>> cacher = Cacher('versioned_data', cfgstr='1')
>>> cacher.ensure(lambda: 'data1')
>>> known_fnames.add(cacher.get_fpath())
>>> cacher = Cacher('versioned_data', cfgstr='2')
>>> cacher.ensure(lambda: 'data2')
>>> known_fnames.add(cacher.get_fpath())
>>> # List previously computed configs for this type
>>> from os.path import basename
>>> cacher = Cacher('versioned_data', cfgstr='2')
>>> exist_fpaths = set(cacher.existing_versions())
>>> exist_fnames = list(map(basename, exist_fpaths))
>>> print(exist_fnames)
>>> assert exist_fpaths == known_fnames

[‘versioned_data_1.pkl’, ‘versioned_data_2.pkl’]

clear(cfgstr=None)[source]

Removes the saved cache and metadata from disk

tryload(cfgstr=None, on_error='raise')[source]

Like load, but returns None if the load fails due to a cache miss.

Parameters:on_error (str) – how to handle non-io errors errors. Either raise, which re-raises the exception, or clear which clears the cache and returns None.
load(cfgstr=None)[source]

Example

>>> from ubelt.util_cache import *  # NOQA
>>> # Setting the cacher as enabled=False turns it off
>>> cacher = Cacher('test_disabled_load', '', enabled=True)
>>> cacher.save('data')
>>> assert cacher.load() == 'data'
>>> cacher.enabled = False
>>> assert cacher.tryload() is None
save(data, cfgstr=None)[source]

Writes data to path specified by self.fpath(cfgstr).

Metadata containing information about the cache will also be appended to an adjacent file with the .meta suffix.

Example

>>> from ubelt.util_cache import *  # NOQA
>>> # Normal functioning
>>> cfgstr = 'long-cfg' * 32
>>> cacher = Cacher('test_enabled_save', cfgstr)
>>> cacher.save('data')
>>> assert exists(cacher.get_fpath()), 'should be enabeled'
>>> assert exists(cacher.get_fpath() + '.meta'), 'missing metadata'
>>> # Setting the cacher as enabled=False turns it off
>>> cacher2 = Cacher('test_disabled_save', 'params', enabled=False)
>>> cacher2.save('data')
>>> assert not exists(cacher2.get_fpath()), 'should be disabled'
ensure(func, *args, **kwargs)[source]

Wraps around a function. A cfgstr must be stored in the base cacher.

Parameters:
  • func (callable) – function that will compute data on cache miss
  • *args – passed to func
  • **kwargs – passed to func

Example

>>> from ubelt.util_cache import *  # NOQA
>>> def func():
>>>     return 'expensive result'
>>> fname = 'test_cacher_ensure'
>>> cfgstr = 'func params'
>>> cacher = Cacher(fname, cfgstr)
>>> cacher.clear()
>>> data1 = cacher.ensure(func)
>>> data2 = cacher.ensure(func)
>>> assert data1 == 'expensive result'
>>> assert data1 == data2
>>> cacher.clear()

Example

>>> from ubelt.util_cache import *  # NOQA
>>> @Cacher(fname, cfgstr).ensure
>>> def func():
>>>     return 'expensive result'