ubelt.util_zip module

Abstractions for working with zipfiles and archives

This may be renamed to util_archive in the future.

The ubelt.split_archive() works with paths that reference a file inside of an archive (e.g. a zipfile). It splits it into two parts, the full path to the archive and then the path to the file inside of the archive. By convention these are separated with either a pathsep or a colon.

The ubelt.zopen() works to open a file that lives inside of an archive without the user needing to worry about extracting it first. When possible it will read it directly from the archive, but in some cases it may extract it to a temporary directory first.

class ubelt.util_zip.zopen(fpath, mode='r', seekable=False, ext='.zip')[source]

Bases: ubelt.util_mixins.NiceRepr

An abstraction of the normal open() function that can also handle reading data directly inside of zipfiles.

This is a file-object like interface [FileObj] — i.e. it supports the read and write methods to an underlying resource.

Can open a file normally or open a file within a zip file (readonly). Tries to read from memory only, but will extract to a tempfile if necessary.

Just treat the zipfile like a directory, e.g. /path/to/myzip.zip/compressed/path.txt OR? e.g. /path/to/myzip.zip:compressed/path.txt

References

FileObj

https://docs.python.org/3/glossary.html#term-file-object

Todo

  • [ ] Fast way to open a base zipfile, query what is inside, and

    then choose a file to further zopen (and passing along the same open zipfile reference maybe?).

  • [ ] Write mode in some restricted setting?

Parameters
  • fpath (str | PathLike) – path to a file, or a special path that denotes both a path to a zipfile and a path to a archived file inside of the zipfile.

  • mode (str) – Currently only “r” - readonly mode is supported

  • seekable (bool) – If True, attempts to force “seekability” of the underlying file-object, for compressed files this will first extract the file to a temporary location on disk. If False, any underlying compressed file will be opened directly which may result in the object being non-seekable.

  • ext (str) – The extension of the zipfile. Modify this is a non-standard extension is used (e.g. for torch packages).

Example

>>> from ubelt.util_zip import *  # NOQA
>>> import pickle
>>> import ubelt as ub
>>> dpath = ub.ensure_app_cache_dir('ubelt/tests/util_zip')
>>> dpath = ub.Path(dpath)
>>> data_fpath = dpath / 'test.pkl'
>>> data = {'demo': 'data'}
>>> with open(str(data_fpath), 'wb') as file:
>>>     pickle.dump(data, file)
>>> # Write data
>>> import zipfile
>>> zip_fpath = dpath / 'test_zip.archive'
>>> stl_w_zfile = zipfile.ZipFile(os.fspath(zip_fpath), mode='w')
>>> stl_w_zfile.write(os.fspath(data_fpath), os.fspath(data_fpath.relative_to(dpath)))
>>> stl_w_zfile.close()
>>> stl_r_zfile = zipfile.ZipFile(os.fspath(zip_fpath), mode='r')
>>> stl_r_zfile.namelist()
>>> stl_r_zfile.close()
>>> # Test zopen
>>> self = zopen(zip_fpath / 'test.pkl', mode='rb', ext='.archive')
>>> print(self._split_archive())
>>> print(self.namelist())
>>> self.close()
>>> self = zopen(zip_fpath / 'test.pkl', mode='rb', ext='.archive')
>>> recon1 = pickle.loads(self.read())
>>> self.close()
>>> self = zopen(zip_fpath / 'test.pkl', mode='rb', ext='.archive')
>>> recon2 = pickle.load(self)
>>> self.close()
>>> assert recon1 == recon2
>>> assert recon1 is not recon2

Example

>>> # Test we can load json data from a zipfile
>>> from ubelt.util_zip import *  # NOQA
>>> import ubelt as ub
>>> import json
>>> import zipfile
>>> dpath = ub.ensure_app_cache_dir('ubelt/tests/util_zip')
>>> infopath = join(dpath, 'info.json')
>>> ub.writeto(infopath, '{"x": "1"}')
>>> zippath = join(dpath, 'infozip.zip')
>>> internal = 'folder/info.json'
>>> with zipfile.ZipFile(zippath, 'w') as myzip:
>>>     myzip.write(infopath, internal)
>>> fpath = zippath + '/' + internal
>>> # Test context manager
>>> with zopen(fpath, 'r') as self:
>>>     info2 = json.load(self)
>>>     assert info2['x'] == '1'
>>> # Test outside of context manager
>>> self = zopen(fpath, 'r')
>>> print(self._split_archive())
>>> info2 = json.load(self)
>>> assert info2['x'] == '1'
>>> # Test nice repr (with zfile)
>>> print('self = {!r}'.format(self))
>>> self.close()

Example

>>> # Coverage tests --- move to unit-test
>>> from ubelt.util_zip import *  # NOQA
>>> import ubelt as ub
>>> import json
>>> import zipfile
>>> dpath = ub.ensure_app_cache_dir('ubelt/tests/util_zip')
>>> textpath = join(dpath, 'seekable_test.txt')
>>> text = chr(10).join(['line{}'.format(i) for i in range(10)])
>>> ub.writeto(textpath, text)
>>> zippath = join(dpath, 'seekable_test.zip')
>>> internal = 'folder/seekable_test.txt'
>>> with zipfile.ZipFile(zippath, 'w') as myzip:
>>>     myzip.write(textpath, internal)
>>> ub.delete(textpath)
>>> fpath = zippath + '/' + internal
>>> # Test seekable
>>> self_seekable = zopen(fpath, 'r', seekable=True)
>>> assert self_seekable.seekable()
>>> self_seekable.seek(8)
>>> assert self_seekable.readline() == 'ne1' + chr(10)
>>> assert self_seekable.readline() == 'line2' + chr(10)
>>> self_seekable.seek(8)
>>> assert self_seekable.readline() == 'ne1' + chr(10)
>>> assert self_seekable.readline() == 'line2' + chr(10)
>>> # Test non-seekable?
>>> # Sometimes non-seekable files are still seekable
>>> maybe_seekable = zopen(fpath, 'r', seekable=False)
>>> if maybe_seekable.seekable():
>>>     maybe_seekable.seek(8)
>>>     assert maybe_seekable.readline() == 'ne1' + chr(10)
>>>     assert maybe_seekable.readline() == 'line2' + chr(10)
>>>     maybe_seekable.seek(8)
>>>     assert maybe_seekable.readline() == 'ne1' + chr(10)
>>>     assert maybe_seekable.readline() == 'line2' + chr(10)

Example

>>> # More coverage tests --- move to unit-test
>>> from ubelt.util_zip import *  # NOQA
>>> import ubelt as ub
>>> import pytest
>>> dpath = ub.ensure_app_cache_dir('ubelt/tests/util_zip')
>>> with pytest.raises(OSError):
>>>     self = zopen('', 'r')
>>> # Test open non-zip exsting file
>>> existing_fpath = join(dpath, 'exists.json')
>>> ub.writeto(existing_fpath, '{"x": "1"}')
>>> self = zopen(existing_fpath, 'r')
>>> assert self.read() == '{"x": "1"}'
>>> # Test dir
>>> dir(self)
>>> # Test nice
>>> print(self)
>>> print('self = {!r}'.format(self))
>>> self.close()
>>> # Test open non-zip non-existing file
>>> nonexisting_fpath = join(dpath, 'does-not-exist.txt')
>>> ub.delete(nonexisting_fpath)
>>> with pytest.raises(OSError):
>>>     self = zopen(nonexisting_fpath, 'r')
>>> with pytest.raises(NotImplementedError):
>>>     self = zopen(nonexisting_fpath, 'w')
>>> # Test nice-repr
>>> self = zopen(existing_fpath, 'r')
>>> print('self = {!r}'.format(self))
>>> # pathological
>>> self = zopen(existing_fpath, 'r')
>>> self._handle = None
>>> dir(self)
property zfile

Access the underlying archive file

namelist()[source]

Lists the contents of this zipfile

ubelt.util_zip.split_archive(fpath, ext='.zip')[source]

If fpath specifies a file inside a zipfile, it breaks it into two parts the path to the zipfile and the internal path in the zipfile.

Example

>>> split_archive('/a/b/foo.txt')
>>> split_archive('/a/b/foo.zip/bar.txt')
>>> split_archive('/a/b/foo.zip/baz/biz.zip/bar.py')
>>> split_archive('archive.zip')
>>> import ubelt as ub
>>> split_archive(ub.Path('/a/b/foo.zip/baz/biz.zip/bar.py'))
>>> split_archive('/a/b/foo.zip/baz.pt/bar.zip/bar.zip', '.pt')

Todo

Fix got/want for win32

(None, None) (‘/a/b/foo.zip’, ‘bar.txt’) (‘/a/b/foo.zip/baz/biz.zip’, ‘bar.py’) (‘archive.zip’, None) (‘/a/b/foo.zip/baz/biz.zip’, ‘bar.py’) (‘/a/b/foo.zip/baz.pt’, ‘bar.zip/bar.zip’)