ubelt.util_download module

Helpers for downloading data

ubelt.util_download.download(url, fpath=None, hash_prefix=None, hasher='sha512', chunksize=8192, verbose=1)[source]

downloads a url to a fpath.

Parameters:
  • url (str) – The url to download.
  • fpath (PathLike | io.BytesIOtringIO) – The path to download to. Defaults to basename of url and ubelt’s application cache. If this is a io.BytesIO object then information is directly written to this object (note this prevents the use of temporary files).
  • hash_prefix (None or str) – If specified, download will retry / error if the file hash does not match this value. Defaults to None.
  • hasher (str or Hasher) – If hash_prefix is specified, this indicates the hashing algorithm to apply to the file. Defaults to sha512.
  • chunksize (int) – Download chunksize. Defaults to 2 ** 13.
  • verbose (int) – Verbosity level 0 or 1. Defaults to 1.
Returns:

fpath - file path string

Return type:

PathLike

Raises:
  • URLError - if there is problem downloading the url
  • RuntimeError - if the hash does not match the hash_prefix

Notes

Original code taken from pytorch in torch/utils/model_zoo.py and slightly modified.

References

http://blog.moleculea.com/2012/10/04/urlretrieve-progres-indicator/ http://stackoverflow.com/questions/15644964/python-progress-bar-and-downloads http://stackoverflow.com/questions/16694907/how-to-download-large-file-in-python-with-requests-py

CommandLine:
python -m xdoctest ubelt.util_download download:1

Example

>>> # xdoctest: +REQUIRES(--network)
>>> from ubelt.util_download import *  # NOQA
>>> url = 'http://i.imgur.com/rqwaDag.png'
>>> fpath = download(url)
>>> print(basename(fpath))
rqwaDag.png

Example

>>> # xdoctest: +REQUIRES(--network)
>>> import ubelt as ub
>>> import io
>>> url = 'http://i.imgur.com/rqwaDag.png'
>>> file = io.BytesIO()
>>> fpath = download(url, file)
>>> file.seek(0)
>>> data = file.read()
>>> assert ub.hash_data(data, hasher='sha1').startswith('f79ea24571')

Example

>>> # xdoctest: +REQUIRES(--network)
>>> url = 'http://i.imgur.com/rqwaDag.png'
>>> fpath = download(url, hasher='sha1', hash_prefix='f79ea24571da6ddd2ba12e3d57b515249ecb8a35')
Downloading url='http://i.imgur.com/rqwaDag.png' to fpath=...rqwaDag.png
...
...1233/1233... rate=... Hz, eta=..., total=..., wall=...

Example

>>> # xdoctest: +REQUIRES(--network)
>>> # test download from girder
>>> import pytest
>>> import ubelt as ub
>>> url = 'https://data.kitware.com/api/v1/item/5b4039308d777f2e6225994c/download'
>>> ub.download(url, hasher='sha512', hash_prefix='c98a46cb31205cf')
>>> with pytest.raises(RuntimeError):
>>>     ub.download(url, hasher='sha512', hash_prefix='BAD_HASH')
ubelt.util_download.grabdata(url, fpath=None, dpath=None, fname=None, redo=False, verbose=1, appname=None, hash_prefix=None, hasher='sha512', **download_kw)[source]

Downloads a file, caches it, and returns its local path.

Parameters:
  • url (str) – url to the file to download
  • fpath (PathLike) – The full path to download the file to. If unspecified, the arguments dpath and fname are used to determine this.
  • dpath (PathLike) – where to download the file. If unspecified appname is used to determine this. Mutually exclusive with fpath.
  • fname (str) – What to name the downloaded file. Defaults to the url basename. Mutually exclusive with fpath.
  • redo (bool) – if True forces redownload of the file (default = False)
  • verbose (bool) – verbosity flag (default = True)
  • appname (str) – set dpath to ub.get_app_cache_dir(appname). Mutually exclusive with dpath and fpath.
  • hash_prefix (None or str) – If specified, grabdata verifies that this matches the hash of the file, and then saves the hash in a adjacent file to certify that the download was successful. Defaults to None.
  • hasher (str or Hasher) – If hash_prefix is specified, this indicates the hashing algorithm to apply to the file. Defaults to sha512.
  • **download_kw – additional kwargs to pass to ub.download
Returns:

fpath - file path string

Return type:

PathLike

Example

>>> # xdoctest: +REQUIRES(--network)
>>> import ubelt as ub
>>> url = 'http://i.imgur.com/rqwaDag.png'
>>> fpath = ub.grabdata(url, fname='mario.png')
>>> result = basename(fpath)
>>> print(result)
mario.png

Example

>>> # xdoctest: +REQUIRES(--network)
>>> import ubelt as ub
>>> fname = 'foo.bar'
>>> url = 'http://i.imgur.com/rqwaDag.png'
>>> prefix1 = '944389a39dfb8fa9'
>>> fpath = ub.grabdata(url, fname=fname, hash_prefix=prefix1)
>>> stamp_fpath = fpath + '.hash'
>>> assert open(stamp_fpath, 'r').read() == prefix1
>>> # Check that the download doesn't happen again
>>> fpath = ub.grabdata(url, fname=fname, hash_prefix=prefix1)
>>> # todo: check file timestamps have not changed
>>> #
>>> # Check redo works with hash
>>> fpath = ub.grabdata(url, fname=fname, hash_prefix=prefix1, redo=True)
>>> # todo: check file timestamps have changed
>>> #
>>> # Check that a redownload occurs when the stamp is changed
>>> open(stamp_fpath, 'w').write('corrupt-stamp')
>>> fpath = ub.grabdata(url, fname=fname, hash_prefix=prefix1)
>>> assert open(stamp_fpath, 'r').read() == prefix1
>>> #
>>> # Check that a redownload occurs when the stamp is removed
>>> ub.delete(stamp_fpath)
>>> open(fpath, 'w').write('corrupt-data')
>>> assert not ub.hash_file(fpath, base='hex', hasher='sha512').startswith(prefix1)
>>> fpath = ub.grabdata(url, fname=fname, hash_prefix=prefix1)
>>> assert ub.hash_file(fpath, base='hex', hasher='sha512').startswith(prefix1)
>>> #
>>> # Check that requesting new data causes redownload
>>> url2 = 'https://data.kitware.com/api/v1/item/5b4039308d777f2e6225994c/download'
>>> prefix2 = 'c98a46cb31205cf'
>>> fpath = ub.grabdata(url2, fname=fname, hash_prefix=prefix2)
>>> assert open(stamp_fpath, 'r').read() == prefix2