ubelt.util_zip module

class ubelt.util_zip.zopen(fpath: str | PathLike, mode: str = 'r', seekable: bool = False, ext: str = '.zip')[source]

Bases: NiceRepr

An abstraction of the normal open() function that can also handle reading data directly inside of zipfiles.

This is a file-object like interface [FileObj] — i.e. it supports the read and write methods to an underlying resource.

Can open a file normally or open a file within a zip file (readonly). Tries to read from memory only, but will extract to a tempfile if necessary.

Just treat the zipfile like a directory, e.g. /path/to/myzip.zip/compressed/path.txt OR? e.g. /path/to/myzip.zip:compressed/path.txt

References

Todo

  • [ ] Fast way to open a base zipfile, query what is inside, and

    then choose a file to further zopen (and passing along the same open zipfile reference maybe?).

  • [ ] Write mode in some restricted setting?

Variables:

name (str | PathLike) – path to a file or reference to an item in a zipfile.

Example

>>> from ubelt.util_zip import *  # NOQA
>>> import pickle
>>> import ubelt as ub
>>> dpath = ub.Path.appdir('ubelt/tests/util_zip').ensuredir()
>>> dpath = ub.Path(dpath)
>>> data_fpath = dpath / 'test.pkl'
>>> data = {'demo': 'data'}
>>> with open(str(data_fpath), 'wb') as file:
>>>     pickle.dump(data, file)
>>> # Write data
>>> import zipfile
>>> zip_fpath = dpath / 'test_zip.archive'
>>> stl_w_zfile = zipfile.ZipFile(os.fspath(zip_fpath), mode='w')
>>> stl_w_zfile.write(os.fspath(data_fpath), os.fspath(data_fpath.relative_to(dpath)))
>>> stl_w_zfile.close()
>>> stl_r_zfile = zipfile.ZipFile(os.fspath(zip_fpath), mode='r')
>>> stl_r_zfile.namelist()
>>> stl_r_zfile.close()
>>> # Test zopen
>>> self = zopen(zip_fpath / 'test.pkl', mode='rb', ext='.archive')
>>> print(self._split_archive())
>>> print(self.namelist())
>>> self.close()
>>> self = zopen(zip_fpath / 'test.pkl', mode='rb', ext='.archive')
>>> recon1 = pickle.loads(self.read())
>>> self.close()
>>> self = zopen(zip_fpath / 'test.pkl', mode='rb', ext='.archive')
>>> recon2 = pickle.load(self)
>>> self.close()
>>> assert recon1 == recon2
>>> assert recon1 is not recon2

Example

>>> # Test we can load json data from a zipfile
>>> from ubelt.util_zip import *  # NOQA
>>> import ubelt as ub
>>> import json
>>> import zipfile
>>> dpath = ub.Path.appdir('ubelt/tests/util_zip').ensuredir()
>>> infopath = join(dpath, 'info.json')
>>> ub.writeto(infopath, '{"x": "1"}')
>>> zippath = join(dpath, 'infozip.zip')
>>> internal = 'folder/info.json'
>>> with zipfile.ZipFile(zippath, 'w') as myzip:
>>>     myzip.write(infopath, internal)
>>> fpath = zippath + '/' + internal
>>> # Test context manager
>>> with zopen(fpath, 'r') as self:
>>>     info2 = json.load(self)
>>>     assert info2['x'] == '1'
>>> # Test outside of context manager
>>> self = zopen(fpath, 'r')
>>> print(self._split_archive())
>>> info2 = json.load(self)
>>> assert info2['x'] == '1'
>>> # Test nice repr (with zfile)
>>> print('self = {!r}'.format(self))
>>> self.close()

Example

>>> # Coverage tests --- move to unit-test
>>> from ubelt.util_zip import *  # NOQA
>>> import ubelt as ub
>>> import json
>>> import zipfile
>>> dpath = ub.Path.appdir('ubelt/tests/util_zip').ensuredir()
>>> textpath = join(dpath, 'seekable_test.txt')
>>> text = chr(10).join(['line{}'.format(i) for i in range(10)])
>>> ub.writeto(textpath, text)
>>> zippath = join(dpath, 'seekable_test.zip')
>>> internal = 'folder/seekable_test.txt'
>>> with zipfile.ZipFile(zippath, 'w') as myzip:
>>>     myzip.write(textpath, internal)
>>> ub.delete(textpath)
>>> fpath = zippath + '/' + internal
>>> # Test seekable
>>> self_seekable = zopen(fpath, 'r', seekable=True)
>>> assert self_seekable.seekable()
>>> self_seekable.seek(8)
>>> assert self_seekable.readline() == 'ne1' + chr(10)
>>> assert self_seekable.readline() == 'line2' + chr(10)
>>> self_seekable.seek(8)
>>> assert self_seekable.readline() == 'ne1' + chr(10)
>>> assert self_seekable.readline() == 'line2' + chr(10)
>>> # Test non-seekable?
>>> # Sometimes non-seekable files are still seekable
>>> maybe_seekable = zopen(fpath, 'r', seekable=False)
>>> if maybe_seekable.seekable():
>>>     maybe_seekable.seek(8)
>>>     assert maybe_seekable.readline() == 'ne1' + chr(10)
>>>     assert maybe_seekable.readline() == 'line2' + chr(10)
>>>     maybe_seekable.seek(8)
>>>     assert maybe_seekable.readline() == 'ne1' + chr(10)
>>>     assert maybe_seekable.readline() == 'line2' + chr(10)

Example

>>> # More coverage tests --- move to unit-test
>>> from ubelt.util_zip import *  # NOQA
>>> import ubelt as ub
>>> import pytest
>>> dpath = ub.Path.appdir('ubelt/tests/util_zip').ensuredir()
>>> with pytest.raises(OSError):
>>>     self = zopen('', 'r')
>>> # Test open non-zip existing file
>>> existing_fpath = join(dpath, 'exists.json')
>>> ub.writeto(existing_fpath, '{"x": "1"}')
>>> self = zopen(existing_fpath, 'r')
>>> assert self.read() == '{"x": "1"}'
>>> # Test dir
>>> dir(self)
>>> # Test nice
>>> print(self)
>>> print('self = {!r}'.format(self))
>>> self.close()
>>> # Test open non-zip non-existing file
>>> nonexisting_fpath = join(dpath, 'does-not-exist.txt')
>>> ub.delete(nonexisting_fpath)
>>> with pytest.raises(OSError):
>>>     self = zopen(nonexisting_fpath, 'r')
>>> with pytest.raises(NotImplementedError):
>>>     self = zopen(nonexisting_fpath, 'w')
>>> # Test nice-repr
>>> self = zopen(existing_fpath, 'r')
>>> print('self = {!r}'.format(self))
>>> # pathological
>>> self = zopen(existing_fpath, 'r')
>>> self._handle = None
>>> dir(self)
Parameters:
  • fpath (str | PathLike) – path to a file, or a special path that denotes both a path to a zipfile and a path to a archived file inside of the zipfile.

  • mode (str) – Currently only “r” - readonly mode is supported

  • seekable (bool) – If True, attempts to force “seekability” of the underlying file-object, for compressed files this will first extract the file to a temporary location on disk. If False, any underlying compressed file will be opened directly which may result in the object being non-seekable.

  • ext (str) – The extension of the zipfile. Modify this is a non-standard extension is used (e.g. for torch packages).

fpath: str | PathLike
ext: str
name: str | PathLike
mode: str
_zfpath: str | None
_temp_dpath: str | None
_zfile_read: Any | None
_handle: Any | None
property zfile: Any

Access the underlying archive file

namelist() list[str][source]

Lists the contents of this zipfile

_cleanup() None[source]
_split_archive() tuple[str | None, str | None][source]
_open() None[source]

This logic sets the “_handle” to the appropriate backend object such that zopen can behave like a standard IO object.

In read-only mode:
  • If fpath is a normal file, _handle is the standard open object

  • If fpath is a seekable zipfile, _handle is an IOWrapper pointing

    to the internal data

  • If fpath is a non-seekable zipfile, the data is extracted behind

    the scenes and a standard open object to the extracted file is given.

In write mode:
  • NotImpelemented

ubelt.util_zip.split_archive(fpath: str | PathLike, ext: str = '.zip') tuple[str | None, str | None][source]

If fpath specifies a file inside a zipfile, it breaks it into two parts the path to the zipfile and the internal path in the zipfile.

Parameters:
  • fpath (str | PathLike) – path that specifies a path inside of an archive

  • ext (str) – archive extension

Returns:

Tuple[str, str | None]

Example

>>> split_archive('/a/b/foo.txt')
>>> split_archive('/a/b/foo.zip/bar.txt')
>>> split_archive('/a/b/foo.zip/baz/biz.zip/bar.py')
>>> split_archive('archive.zip')
>>> import ubelt as ub
>>> split_archive(ub.Path('/a/b/foo.zip/baz/biz.zip/bar.py'))
>>> split_archive('/a/b/foo.zip/baz.pt/bar.zip/bar.zip', '.pt')

Todo

Fix got/want for win32

(None, None) (‘/a/b/foo.zip’, ‘bar.txt’) (‘/a/b/foo.zip/baz/biz.zip’, ‘bar.py’) (‘archive.zip’, None) (‘/a/b/foo.zip/baz/biz.zip’, ‘bar.py’) (‘/a/b/foo.zip/baz.pt’, ‘bar.zip/bar.zip’)