ubelt.util_hash module

Wrappers around hashlib functions to generate hash signatures for common data.

The hashes should be deterministic across platforms.

Note

The exact hashes generated for data objects and files may change in the future. When this happens, the HASH_VERSION attribute will be incremented.

ubelt.util_hash.hash_data(data, hasher=NoParam, hashlen=NoParam, base=NoParam)

Get a hash that depends on the state of the data.

Parameters:

data (object) – any sort of loosely organized data
hasher (HASH) – hash algorithm from hashlib, defaults to sha512.
hashlen (int) – maximum number of symbols in the returned hash. If not specified, all are returned.
base (list) – list of symbols or shorthand key. Defaults to base 26

Returns:	text - hash string
Return type:	str
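
The hashlen and base parameters can be illustrated with a small sketch. It re-encodes a hashlib hex digest in an arbitrary alphabet and truncates the result. The helper name encode_digest and its arguments are hypothetical and not part of ubelt, and the exact symbols ubelt produces may differ from what this sketch returns.

```python
import hashlib
import string


def encode_digest(digest_hex, alphabet=string.ascii_lowercase, hashlen=None):
    """Re-encode a hex digest using the symbols in ``alphabet``.

    Hypothetical helper for illustration only; not ubelt's implementation.
    """
    value = int(digest_hex, 16)
    base = len(alphabet)
    if value == 0:
        text = alphabet[0]
    else:
        symbols = []
        # Repeated division yields the digits from least to most significant.
        while value > 0:
            value, rem = divmod(value, base)
            symbols.append(alphabet[rem])
        text = ''.join(reversed(symbols))
    # hashlen is a maximum: keep at most that many leading symbols.
    return text if hashlen is None else text[:hashlen]


digest_hex = hashlib.sha512(b'foobar').hexdigest()
short_text = encode_digest(digest_hex, hashlen=8)
```

With the default 26-letter alphabet the output contains only lowercase letters, which matches the style of the example outputs shown in this page.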

Example

>>> print(hash_data([1, 2, (3, '4')], hashlen=8, hasher='sha512'))
iugjngof

frqkjbsq
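
One way a hash can depend on the state of nested data is to feed each item into a hashlib object together with a type tag and delimiters. The sketch below is an assumption about the general technique, not ubelt's actual serialization. The function update_hasher is hypothetical and supports only a few builtin types.

```python
import hashlib


def update_hasher(hasher, data):
    """Feed nested builtin data into a hashlib object (illustrative sketch)."""
    if isinstance(data, bytes):
        # A length prefix keeps adjacent items from running together.
        hasher.update(b'BYTES:' + str(len(data)).encode() + b':' + data)
    elif isinstance(data, str):
        encoded = data.encode('utf8')
        hasher.update(b'STR:' + str(len(encoded)).encode() + b':' + encoded)
    elif isinstance(data, bool):
        # bool is checked before int because bool is a subclass of int.
        hasher.update(b'BOOL:' + (b'1' if data else b'0') + b';')
    elif isinstance(data, int):
        hasher.update(b'INT:' + str(data).encode() + b';')
    elif isinstance(data, (list, tuple)):
        # Distinct tags make a list and a tuple with equal items hash differently.
        hasher.update(b'LIST[' if isinstance(data, list) else b'TUPLE[')
        for item in data:
            update_hasher(hasher, item)
        hasher.update(b']')
    else:
        raise TypeError('unsupported type: {!r}'.format(type(data)))


def sketch_hash_data(data, hasher_name='sha512'):
    hasher = hashlib.new(hasher_name)
    update_hasher(hasher, data)
    return hasher.hexdigest()
```

Because the bytes fed to the hasher depend only on the values and their types, the digest is the same on every platform for the supported types.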

ubelt.util_hash.hash_file(fpath, blocksize=65536, stride=1, hasher=NoParam, hashlen=NoParam, base=NoParam)

Hashes the data in a file on disk.

Parameters:

fpath (str) – file path string
blocksize (int) – number of bytes read at a time. Defaults to 2 ** 16 (65536). Affects the speed of reading the file.
stride (int) – strides > 1 skip some of the data when hashing. This is faster but less accurate, and it makes the hash dependent on blocksize.
hasher (HASH) – hash algorithm from hashlib, defaults to sha512.
hashlen (int) – maximum number of symbols in the returned hash. If not specified, all are returned.
base (list) – list of symbols or shorthand key. Defaults to base 26

Notes

For better hashes, keep stride = 1.
For faster hashes, set stride > 1.
blocksize matters when stride > 1.
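
The interaction between stride and blocksize can be sketched with hashlib alone. This is a minimal illustration of strided block hashing, assuming that a stride of n means "hash one block, then skip n - 1 blocks". The function name strided_file_digest is hypothetical, and ubelt's own implementation may differ in detail.

```python
import hashlib


def strided_file_digest(fpath, blocksize=65536, stride=1, hasher_name='sha512'):
    """Hash every ``stride``-th block of a file (illustrative sketch)."""
    hasher = hashlib.new(hasher_name)
    with open(fpath, 'rb') as file:
        buf = file.read(blocksize)
        while len(buf) > 0:
            hasher.update(buf)
            if stride > 1:
                # Skip the next (stride - 1) blocks, relative to the
                # current position (whence=1).
                file.seek(blocksize * (stride - 1), 1)
            buf = file.read(blocksize)
    return hasher.hexdigest()
```

With stride = 1 every byte is hashed, so the digest does not depend on blocksize. With stride > 1 the set of bytes that reach the hasher changes with blocksize, so the same file can produce different digests.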

References

http://stackoverflow.com/questions/3431825/md5-checksum-of-a-file
http://stackoverflow.com/questions/5001893/when-to-use-sha-1-vs-sha-2

Example

>>> import ubelt as ub
>>> from os.path import join
>>> fpath = join(ub.ensure_app_cache_dir('ubelt'), 'tmp.txt')
>>> ub.writeto(fpath, 'foobar')
>>> print(ub.hash_file(fpath, hasher='sha512', hashlen=8))
vkiodmcj