ubelt.util_zip module
- class ubelt.util_zip.zopen(fpath: str | PathLike, mode: str = 'r', seekable: bool = False, ext: str = '.zip')
Bases: NiceRepr
An abstraction of the normal open() function that can also handle reading data directly inside of zipfiles.
This is a file-object-like interface [FileObj], i.e. it supports the read and write methods to an underlying resource.
Can open a file normally or open a file within a zip file (readonly). Tries to read from memory only, but will extract to a tempfile if necessary.
Treat the zipfile like a directory, e.g. /path/to/myzip.zip/compressed/path.txt. A colon-separated form, e.g. /path/to/myzip.zip:compressed/path.txt, is a possible alternative.
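The idea of reading a member straight out of an archive, without extracting it to disk, can be sketched with the standard library alone. This is only an illustration of the concept, not zopen's implementation: read_text_member and the demo file names are hypothetical, and the sketch assumes the member holds UTF-8 text.

```python
import io
import os
import tempfile
import zipfile


def read_text_member(zip_path, internal_path, encoding="utf-8"):
    """Hypothetical helper: read a text member of a zip without extracting it."""
    with zipfile.ZipFile(zip_path, mode="r") as archive:
        # ZipFile.open returns a binary file-like object for the member.
        with archive.open(internal_path, mode="r") as raw:
            # Wrap the binary stream so that reads return str instead of bytes.
            with io.TextIOWrapper(raw, encoding=encoding) as text:
                return text.read()


# Build a small archive to demonstrate (paths are illustrative only).
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "myzip.zip")
with zipfile.ZipFile(demo_zip, mode="w") as archive:
    archive.writestr("compressed/path.txt", "hello from inside the zip")

content = read_text_member(demo_zip, "compressed/path.txt")
print(content)
```

With zopen the archive path and the internal path are instead combined into a single path string, so the caller does not handle the ZipFile object directly.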
References
Todo
- [ ] Fast way to open a base zipfile, query what is inside, and then choose a file to further zopen (possibly passing along the same open zipfile reference).
- [ ] Write mode in some restricted setting?
- Variables:
name (str | PathLike) – path to a file or reference to an item in a zipfile.
Example
>>> from ubelt.util_zip import *  # NOQA
>>> import os
>>> import pickle
>>> import ubelt as ub
>>> dpath = ub.Path.appdir('ubelt/tests/util_zip').ensuredir()
>>> dpath = ub.Path(dpath)
>>> data_fpath = dpath / 'test.pkl'
>>> data = {'demo': 'data'}
>>> with open(str(data_fpath), 'wb') as file:
>>>     pickle.dump(data, file)
>>> # Write data
>>> import zipfile
>>> zip_fpath = dpath / 'test_zip.archive'
>>> stl_w_zfile = zipfile.ZipFile(os.fspath(zip_fpath), mode='w')
>>> stl_w_zfile.write(os.fspath(data_fpath), os.fspath(data_fpath.relative_to(dpath)))
>>> stl_w_zfile.close()
>>> stl_r_zfile = zipfile.ZipFile(os.fspath(zip_fpath), mode='r')
>>> stl_r_zfile.namelist()
>>> stl_r_zfile.close()
>>> # Test zopen
>>> self = zopen(zip_fpath / 'test.pkl', mode='rb', ext='.archive')
>>> print(self._split_archive())
>>> print(self.namelist())
>>> self.close()
>>> self = zopen(zip_fpath / 'test.pkl', mode='rb', ext='.archive')
>>> recon1 = pickle.loads(self.read())
>>> self.close()
>>> self = zopen(zip_fpath / 'test.pkl', mode='rb', ext='.archive')
>>> recon2 = pickle.load(self)
>>> self.close()
>>> assert recon1 == recon2
>>> assert recon1 is not recon2
Example
>>> # Test we can load json data from a zipfile
>>> from ubelt.util_zip import *  # NOQA
>>> from os.path import join
>>> import ubelt as ub
>>> import json
>>> import zipfile
>>> dpath = ub.Path.appdir('ubelt/tests/util_zip').ensuredir()
>>> infopath = join(dpath, 'info.json')
>>> ub.writeto(infopath, '{"x": "1"}')
>>> zippath = join(dpath, 'infozip.zip')
>>> internal = 'folder/info.json'
>>> with zipfile.ZipFile(zippath, 'w') as myzip:
>>>     myzip.write(infopath, internal)
>>> fpath = zippath + '/' + internal
>>> # Test context manager
>>> with zopen(fpath, 'r') as self:
>>>     info2 = json.load(self)
>>>     assert info2['x'] == '1'
>>> # Test outside of context manager
>>> self = zopen(fpath, 'r')
>>> print(self._split_archive())
>>> info2 = json.load(self)
>>> assert info2['x'] == '1'
>>> # Test nice repr (with zfile)
>>> print('self = {!r}'.format(self))
>>> self.close()
Example
>>> # Coverage tests --- move to unit-test
>>> from ubelt.util_zip import *  # NOQA
>>> from os.path import join
>>> import ubelt as ub
>>> import json
>>> import zipfile
>>> dpath = ub.Path.appdir('ubelt/tests/util_zip').ensuredir()
>>> textpath = join(dpath, 'seekable_test.txt')
>>> text = chr(10).join(['line{}'.format(i) for i in range(10)])
>>> ub.writeto(textpath, text)
>>> zippath = join(dpath, 'seekable_test.zip')
>>> internal = 'folder/seekable_test.txt'
>>> with zipfile.ZipFile(zippath, 'w') as myzip:
>>>     myzip.write(textpath, internal)
>>> ub.delete(textpath)
>>> fpath = zippath + '/' + internal
>>> # Test seekable
>>> self_seekable = zopen(fpath, 'r', seekable=True)
>>> assert self_seekable.seekable()
>>> self_seekable.seek(8)
>>> assert self_seekable.readline() == 'ne1' + chr(10)
>>> assert self_seekable.readline() == 'line2' + chr(10)
>>> self_seekable.seek(8)
>>> assert self_seekable.readline() == 'ne1' + chr(10)
>>> assert self_seekable.readline() == 'line2' + chr(10)
>>> # Test non-seekable?
>>> # Sometimes non-seekable files are still seekable
>>> maybe_seekable = zopen(fpath, 'r', seekable=False)
>>> if maybe_seekable.seekable():
>>>     maybe_seekable.seek(8)
>>>     assert maybe_seekable.readline() == 'ne1' + chr(10)
>>>     assert maybe_seekable.readline() == 'line2' + chr(10)
>>>     maybe_seekable.seek(8)
>>>     assert maybe_seekable.readline() == 'ne1' + chr(10)
>>>     assert maybe_seekable.readline() == 'line2' + chr(10)
Example
>>> # More coverage tests --- move to unit-test
>>> from ubelt.util_zip import *  # NOQA
>>> from os.path import join
>>> import ubelt as ub
>>> import pytest
>>> dpath = ub.Path.appdir('ubelt/tests/util_zip').ensuredir()
>>> with pytest.raises(OSError):
>>>     self = zopen('', 'r')
>>> # Test open non-zip existing file
>>> existing_fpath = join(dpath, 'exists.json')
>>> ub.writeto(existing_fpath, '{"x": "1"}')
>>> self = zopen(existing_fpath, 'r')
>>> assert self.read() == '{"x": "1"}'
>>> # Test dir
>>> dir(self)
>>> # Test nice
>>> print(self)
>>> print('self = {!r}'.format(self))
>>> self.close()
>>> # Test open non-zip non-existing file
>>> nonexisting_fpath = join(dpath, 'does-not-exist.txt')
>>> ub.delete(nonexisting_fpath)
>>> with pytest.raises(OSError):
>>>     self = zopen(nonexisting_fpath, 'r')
>>> with pytest.raises(NotImplementedError):
>>>     self = zopen(nonexisting_fpath, 'w')
>>> # Test nice-repr
>>> self = zopen(existing_fpath, 'r')
>>> print('self = {!r}'.format(self))
>>> # pathological
>>> self = zopen(existing_fpath, 'r')
>>> self._handle = None
>>> dir(self)
- Parameters:
fpath (str | PathLike) – path to a file, or a special path that denotes both a path to a zipfile and a path to an archived file inside of the zipfile.
mode (str) – Currently only read-only mode (“r”) is supported.
seekable (bool) – If True, attempts to force “seekability” of the underlying file-object. For compressed files this will first extract the file to a temporary location on disk. If False, any underlying compressed file will be opened directly, which may result in the object being non-seekable.
ext (str) – The extension of the zipfile. Modify this if a non-standard extension is used (e.g. for torch packages).
- _open() → None
This logic sets the “_handle” to the appropriate backend object such that zopen can behave like a standard IO object.
- In read-only mode:
  - If fpath is a normal file, _handle is the standard open object.
  - If fpath is a seekable zipfile, _handle is an IOWrapper pointing to the internal data.
  - If fpath is a non-seekable zipfile, the data is extracted behind the scenes and a standard open object to the extracted file is given.
- In write mode:
  - Not implemented (raises NotImplementedError).
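The choice of backend handle can be sketched with the standard library. This is a rough illustration under stated assumptions, not the actual body of _open: open_for_read is a hypothetical helper, it takes the archive path and the internal path as separate arguments, and it follows the description of the seekable parameter (extract to a temporary location when seekability is requested, otherwise read the member in place).

```python
import contextlib
import io
import os
import tempfile
import zipfile


@contextlib.contextmanager
def open_for_read(fpath, internal=None, seekable=False):
    """Hypothetical helper (not ubelt API): yield a text handle for reading."""
    with contextlib.ExitStack() as stack:
        if internal is None:
            # Plain file: the standard open object is the handle.
            handle = stack.enter_context(open(fpath, "r", encoding="utf-8"))
        elif seekable:
            # Extract the member to a temporary directory, then open the copy.
            tmpdir = stack.enter_context(tempfile.TemporaryDirectory())
            with zipfile.ZipFile(fpath, "r") as archive:
                extracted = archive.extract(internal, path=tmpdir)
            handle = stack.enter_context(open(extracted, "r", encoding="utf-8"))
        else:
            # Read the member in place through a text wrapper.
            archive = stack.enter_context(zipfile.ZipFile(fpath, "r"))
            raw = archive.open(internal, "r")
            handle = stack.enter_context(io.TextIOWrapper(raw, encoding="utf-8"))
        # Everything registered on the stack is closed when the caller exits.
        yield handle


# Demo archive with the same ten-line layout as the seekable example above.
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "seekable_demo.zip")
text = "\n".join("line{}".format(i) for i in range(10))
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("folder/demo.txt", text)

with open_for_read(demo_zip, "folder/demo.txt", seekable=True) as handle:
    handle.seek(8)
    seeked_line = handle.readline()

with open_for_read(demo_zip, "folder/demo.txt", seekable=False) as handle:
    first_line = handle.readline()
```

Using a context manager keeps the cleanup of the temporary directory and the archive handle in one place; zopen itself exposes close() and also works as a context manager, as the examples above show.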
- ubelt.util_zip.split_archive(fpath: str | PathLike, ext: str = '.zip') → tuple[str | None, str | None]
If fpath specifies a file inside a zipfile, this breaks it into two parts: the path to the zipfile and the internal path in the zipfile.
- Parameters:
fpath (str | PathLike) – path that specifies a path inside of an archive
ext (str) – archive extension
- Returns:
Tuple[str | None, str | None]
Example
>>> split_archive('/a/b/foo.txt')
>>> split_archive('/a/b/foo.zip/bar.txt')
>>> split_archive('/a/b/foo.zip/baz/biz.zip/bar.py')
>>> split_archive('archive.zip')
>>> import ubelt as ub
>>> split_archive(ub.Path('/a/b/foo.zip/baz/biz.zip/bar.py'))
>>> split_archive('/a/b/foo.zip/baz.pt/bar.zip/bar.zip', '.pt')
Todo
Fix got/want for win32. The example above should produce:
(None, None)
('/a/b/foo.zip', 'bar.txt')
('/a/b/foo.zip/baz/biz.zip', 'bar.py')
('archive.zip', None)
('/a/b/foo.zip/baz/biz.zip', 'bar.py')
('/a/b/foo.zip/baz.pt', 'bar.zip/bar.zip')
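The splitting rule implied by the example can be sketched as follows. This is a minimal stand-in written for illustration, not the library's implementation: split_at_archive is a hypothetical name, and the sketch assumes the split happens at the last place where the extension is directly followed by a path separator.

```python
import os
import re


def split_at_archive(fpath, ext=".zip"):
    """Hypothetical stand-in: split a path at its last '<ext><separator>' boundary."""
    fpath = os.fspath(fpath)
    # Find every place where the extension is directly followed by a separator.
    boundary = re.compile(re.escape(ext) + r"[/\\]", flags=re.IGNORECASE)
    matches = list(boundary.finditer(fpath))
    if matches:
        last = matches[-1]
        # Keep the extension with the archive path and drop the separator.
        return fpath[: last.end() - 1], fpath[last.end():]
    if fpath.lower().endswith(ext.lower()):
        # The path is an archive with no internal component.
        return fpath, None
    # No archive component was found.
    return None, None


results = [
    split_at_archive("/a/b/foo.txt"),
    split_at_archive("/a/b/foo.zip/bar.txt"),
    split_at_archive("/a/b/foo.zip/baz/biz.zip/bar.py"),
    split_at_archive("archive.zip"),
    split_at_archive("/a/b/foo.zip/baz.pt/bar.zip/bar.zip", ".pt"),
]
for archive_path, internal_path in results:
    print((archive_path, internal_path))
```

Splitting at the last boundary is what lets a nested path such as foo.zip/baz/biz.zip/bar.py resolve to the innermost archive, and passing a different ext changes which component counts as the archive.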