ubelt.util_download module
Helpers for downloading data
ubelt.util_download.download(url, fpath=None, hash_prefix=None, hasher='sha512', chunksize=8192, verbose=1)
Downloads a url to a fpath.
Parameters: - url (str) – The url to download.
- fpath (PathLike | io.BytesIO) – The path to download to. Defaults to basename of url and ubelt’s application cache. If this is an io.BytesIO object, the data is written directly to that object (note that this prevents the use of temporary files).
- hash_prefix (None or str) – If specified, download will retry / error if the file hash does not match this value. Defaults to None.
- hasher (str or Hasher) – If hash_prefix is specified, this indicates the hashing algorithm to apply to the file. Defaults to sha512.
- chunksize (int) – Download chunksize. Defaults to 2 ** 13.
- verbose (int) – Verbosity level 0 or 1. Defaults to 1.
Returns: fpath - file path string
Return type: PathLike
Raises: - URLError - if there is a problem downloading the url
- RuntimeError - if the hash does not match the hash_prefix
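How chunksize, hasher, and hash_prefix interact can be illustrated with a minimal standard-library sketch. This is not ubelt's implementation: copy_with_hash_check is a hypothetical helper that streams from any file-like object rather than from a url, and it assumes the hash is compared as a hexadecimal prefix.

```python
import hashlib
import io


def copy_with_hash_check(src, dst, hash_prefix=None, hasher='sha512',
                         chunksize=8192):
    """Hypothetical helper: copy src to dst in chunks and verify a hash prefix."""
    digest = hashlib.new(hasher)
    while True:
        chunk = src.read(chunksize)
        if not chunk:
            break
        dst.write(chunk)
        # Hash incrementally so the data never has to be held in memory twice
        digest.update(chunk)
    got = digest.hexdigest()
    if hash_prefix is not None and not got.startswith(hash_prefix):
        # Mirrors the documented behavior of raising RuntimeError on mismatch
        raise RuntimeError(
            'hash mismatch: expected prefix {!r}, got {!r}'.format(
                hash_prefix, got[:len(hash_prefix)]))
    return got


# Usage: stream in-memory bytes and check them against a known sha1 prefix
payload = b'example payload' * 100
known_prefix = hashlib.sha1(payload).hexdigest()[:10]
sink = io.BytesIO()
copy_with_hash_check(io.BytesIO(payload), sink, hash_prefix=known_prefix,
                     hasher='sha1', chunksize=64)
```

Because only a prefix is compared, a shorter hash_prefix is easier to write down but gives a weaker integrity check than the full digest.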
Notes
Original code taken from pytorch in torch/utils/model_zoo.py and slightly modified.
References
http://blog.moleculea.com/2012/10/04/urlretrieve-progres-indicator/
http://stackoverflow.com/questions/15644964/python-progress-bar-and-downloads
http://stackoverflow.com/questions/16694907/how-to-download-large-file-in-python-with-requests-py
CommandLine
python -m xdoctest ubelt.util_download download:1
Example
>>> # xdoctest: +REQUIRES(--network)
>>> from os.path import basename
>>> from ubelt.util_download import *  # NOQA
>>> url = 'http://i.imgur.com/rqwaDag.png'
>>> fpath = download(url)
>>> print(basename(fpath))
rqwaDag.png
Example
>>> # xdoctest: +REQUIRES(--network)
>>> import ubelt as ub
>>> import io
>>> url = 'http://i.imgur.com/rqwaDag.png'
>>> file = io.BytesIO()
>>> fpath = ub.download(url, file)
>>> file.seek(0)
>>> data = file.read()
>>> assert ub.hash_data(data, hasher='sha1').startswith('f79ea24571')
Example
>>> # xdoctest: +REQUIRES(--network)
>>> import ubelt as ub
>>> url = 'http://i.imgur.com/rqwaDag.png'
>>> fpath = ub.download(url, hasher='sha1', hash_prefix='f79ea24571da6ddd2ba12e3d57b515249ecb8a35')
Downloading url='http://i.imgur.com/rqwaDag.png' to fpath=...rqwaDag.png
...
...1233/1233... rate=... Hz, eta=..., total=..., wall=...
Example
>>> # xdoctest: +REQUIRES(--network)
>>> # test download from girder
>>> import pytest
>>> import ubelt as ub
>>> url = 'https://data.kitware.com/api/v1/item/5b4039308d777f2e6225994c/download'
>>> ub.download(url, hasher='sha512', hash_prefix='c98a46cb31205cf')
>>> with pytest.raises(RuntimeError):
>>>     ub.download(url, hasher='sha512', hash_prefix='BAD_HASH')
ubelt.util_download.grabdata(url, fpath=None, dpath=None, fname=None, redo=False, verbose=1, appname=None, hash_prefix=None, hasher='sha512', **download_kw)
Downloads a file, caches it, and returns its local path.
Parameters: - url (str) – url to the file to download
- fpath (PathLike) – The full path to download the file to. If unspecified, the arguments dpath and fname are used to determine this.
- dpath (PathLike) – where to download the file. If unspecified appname is used to determine this. Mutually exclusive with fpath.
- fname (str) – What to name the downloaded file. Defaults to the url basename. Mutually exclusive with fpath.
- redo (bool) – if True forces redownload of the file (default = False)
- verbose (bool) – verbosity flag (default = True)
- appname (str) – set dpath to ub.get_app_cache_dir(appname). Mutually exclusive with dpath and fpath.
- hash_prefix (None or str) – If specified, grabdata verifies that this matches the hash of the file, and then saves the hash in an adjacent file to certify that the download was successful. Defaults to None.
- hasher (str or Hasher) – If hash_prefix is specified, this indicates the hashing algorithm to apply to the file. Defaults to sha512.
- **download_kw – additional kwargs to pass to ub.download
Returns: fpath - file path string
Return type: PathLike
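The way fpath, dpath, and fname combine can be sketched as follows. This is an illustration under stated assumptions, not ubelt's code: resolve_fpath is a hypothetical helper, default_dpath stands in for the application cache directory, and raising ValueError for conflicting arguments is an assumption about how mutual exclusivity might be enforced.

```python
import os
import posixpath
from urllib.parse import urlparse


def resolve_fpath(url, fpath=None, dpath=None, fname=None, default_dpath='.'):
    """Hypothetical helper: choose the local path for a downloaded file."""
    if fpath is not None:
        if dpath is not None or fname is not None:
            # Assumption: conflicting arguments are reported as an error
            raise ValueError('fpath is mutually exclusive with dpath and fname')
        return fpath
    if fname is None:
        # Default the file name to the last component of the url path
        fname = posixpath.basename(urlparse(url).path)
    if dpath is None:
        # Stand-in for the application cache directory
        dpath = default_dpath
    return os.path.join(dpath, fname)


# Usage: the file name falls back to the url basename
example_path = resolve_fpath('http://i.imgur.com/rqwaDag.png', dpath='cache')
```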
Example
>>> # xdoctest: +REQUIRES(--network)
>>> from os.path import basename
>>> import ubelt as ub
>>> url = 'http://i.imgur.com/rqwaDag.png'
>>> fpath = ub.grabdata(url, fname='mario.png')
>>> result = basename(fpath)
>>> print(result)
mario.png
Example
>>> # xdoctest: +REQUIRES(--network)
>>> import ubelt as ub
>>> fname = 'foo.bar'
>>> url = 'http://i.imgur.com/rqwaDag.png'
>>> prefix1 = '944389a39dfb8fa9'
>>> fpath = ub.grabdata(url, fname=fname, hash_prefix=prefix1)
>>> stamp_fpath = fpath + '.hash'
>>> assert open(stamp_fpath, 'r').read() == prefix1
>>> # Check that the download doesn't happen again
>>> fpath = ub.grabdata(url, fname=fname, hash_prefix=prefix1)
>>> # todo: check file timestamps have not changed
>>> #
>>> # Check redo works with hash
>>> fpath = ub.grabdata(url, fname=fname, hash_prefix=prefix1, redo=True)
>>> # todo: check file timestamps have changed
>>> #
>>> # Check that a redownload occurs when the stamp is changed
>>> open(stamp_fpath, 'w').write('corrupt-stamp')
>>> fpath = ub.grabdata(url, fname=fname, hash_prefix=prefix1)
>>> assert open(stamp_fpath, 'r').read() == prefix1
>>> #
>>> # Check that a redownload occurs when the stamp is removed
>>> ub.delete(stamp_fpath)
>>> open(fpath, 'w').write('corrupt-data')
>>> assert not ub.hash_file(fpath, base='hex', hasher='sha512').startswith(prefix1)
>>> fpath = ub.grabdata(url, fname=fname, hash_prefix=prefix1)
>>> assert ub.hash_file(fpath, base='hex', hasher='sha512').startswith(prefix1)
>>> #
>>> # Check that requesting new data causes redownload
>>> url2 = 'https://data.kitware.com/api/v1/item/5b4039308d777f2e6225994c/download'
>>> prefix2 = 'c98a46cb31205cf'
>>> fpath = ub.grabdata(url2, fname=fname, hash_prefix=prefix2)
>>> assert open(stamp_fpath, 'r').read() == prefix2
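The redownload decisions exercised in the example above can be summarized in a small sketch. It is a simplified reading of that example, not grabdata's actual logic: needs_download is a hypothetical helper, and it only assumes what the example shows, namely that the stamp lives at fpath + '.hash' and contains the hash prefix.

```python
import os


def needs_download(fpath, hash_prefix=None, redo=False):
    """Hypothetical helper: decide whether a cached file must be fetched again."""
    if redo or not os.path.exists(fpath):
        # Forced refresh, or nothing cached yet
        return True
    if hash_prefix is None:
        # No integrity requirement, so an existing file is accepted as-is
        return False
    stamp_fpath = fpath + '.hash'
    if not os.path.exists(stamp_fpath):
        # A missing stamp means the file was never certified
        return True
    with open(stamp_fpath, 'r') as file:
        # A stale or corrupt stamp also triggers a redownload
        return file.read() != hash_prefix
```

Checking the small stamp file instead of rehashing the data keeps repeated calls cheap, at the cost of not noticing a file that was modified after its stamp was written.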