ziptools - augment Python's zipfile with extra tools
Version: 1.1, Mar-2020 - permissions, auto-globs, stats, 2.X
License: provided freely, but with no warranties of any kind
Author: © M. Lutz (learning-python.com) 2017-2020
Web page: https://learning-python.com/ziptools.html
Run with: Python 3.X or 2.X, on Windows, Mac OS X, Linux, and others
Install: Unzip this package's code, no third-party libraries are used
(This is an abbreviated and plain-text version of _README.html)
Summary:
This package wraps Python's zipfile module for common use cases, and
extends it with extra features, including support for adding folders,
modtime and permissions propagation, symlinks on Unix and Windows,
cruft-file handling, and long Windows paths. ziptools comes with:
+ Command-line scripts for general use (the zip-*.py files here)
+ Library tools for use in programs (module ziptools/ziptools.py)
Both foster flexible and portable content management using zipfiles.
This file is ziptools' first-level user guide; source-code files here
provide additional and lower-level details. If you have used ziptools
in the past, see VERSIONS at the end of this file for recent upgrades.
===============================================================================
QUICKSTART
===============================================================================
In the following:
- "py" is your installed Python's name (e.g., "py -3", "python3")
- "mycontent" and "myunzipdir" may be relative or absolute pathnames
- "myarchive.zip" may be located at a relative or absolute path
- "$Z" is the path to the unzipped "ziptools/" folder on your device
- "." is the current folder in create source or extract destination
Command lines (console, script, IDE):
$ py $Z/zip-create.py myarchive.zip mycontent -skipcruft
# store all of folder mycontent in new zipfile myarchive.zip
$ py $Z/zip-extract.py myarchive.zip myunzipdir
# extract the contents of myarchive.zip to folder myunzipdir
$ py $Z/zip-extract.py myarchive.zip myunzipdir -permissions
# ditto, but also propagate Unix-style permissions to all items
$ py $Z/zip-create.py ../myarchive.zip * -skipcruft
# store folder contents as top-level items, on Unix or Windows
$ py $Z/zip-create.py mysourcecode.zip page.js gui.pyw *.py *.c
# store individual files as top-level items, on Unix or Windows
$ py $Z/zip-list.py myarchive.zip
# list the contents of zipfile myarchive.zip
$ py $Z/zip-create.py
# interactive mode: input parameters at prompts (see ahead)
General formats ([]=optional):
[python] zip-create.py [zipfile source [source...] [-skipcruft] [-atlinks]]
[python] zip-extract.py [zipfile [unzipto] [-nofixlinks] [-permissions]]
Program library (Python code):
$ export PYTHONPATH=$Z:$PYTHONPATH
$ py
>>> import ziptools, glob
>>> cruftdflt = ziptools.cruft_skip_keep
>>> ziptools.createzipfile('myarchive.zip', ['mycontent'], cruftpatts=cruftdflt)
>>> ziptools.extractzipfile('myarchive.zip', pathto='myunzipdir')
>>> ziptools.extractzipfile('myarchive.zip', pathto='myunzipdir', permissions=True)
>>> ziptools.createzipfile('../myarchive.zip', glob.glob('*'), cruftpatts=cruftdflt)
Related tools ($C=code-install folder):
$ py $C/mergeall/mergeall.py mycontent myunzipdir/mycontent -report -skipcruft
$ py $C/mergeall/diffall.py mycontent myunzipdir/mycontent -skipcruft
# verify results, per https://learning-python.com/mergeall.html
===============================================================================
OVERVIEW
===============================================================================
Python's standard zipfile module does great low-level work, but this
package adds both much-needed features and higher-level access points,
and documents some largely undocumented dark corners of Python's zipfile
along the way. Among its features, ziptools:
+ Adds entire folder trees to zipfiles automatically
+ Propagates original modtimes for files, folders, and links
+ Can either include or skip system "cruft" files on request
+ Supports symlinks to files and folders on Unix and Windows
+ Supports long pathnames on Windows beyond its normal limits
+ Propagates file-access permissions for all items on request
Although Python's shutil module has simple zipfile wrappers that add
folders (make_archive()) and extract all items (unpack_archive()), they
don't do anything about modtimes, permissions, symlinks, system cruft,
and long Windows paths supported here. With ziptools:
Folders
are added to zipfiles as a whole automatically with extra code,
a sorely missed feature of Python's standard module. Folders
are also automatically extracted from zipfiles in full, and no
user steps are required to enable folder zips and unzips.
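As a rough illustration of the extra code that folder support implies, the
following sketch adds a whole folder tree using only Python's standard os and
zipfile modules. It is not ziptools' own implementation: the add_folder name
is hypothetical, the code is written for Python 3, and symlinks get no special
treatment here.

```python
import os
import tempfile
import zipfile

def add_folder(zf, folder):
    """Add folder and everything below it to open ZipFile zf (hypothetical helper)."""
    parent = os.path.dirname(os.path.abspath(folder))
    for dirpath, dirnames, filenames in os.walk(folder):
        dirnames.sort()                          # repeatable traversal order
        # Writing the folder itself records an entry even when it is empty
        zf.write(dirpath, os.path.relpath(dirpath, parent))
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            zf.write(path, os.path.relpath(path, parent))

# Demo: zip a small tree that includes an empty folder
work = tempfile.mkdtemp()
tree = os.path.join(work, 'tree')
os.makedirs(os.path.join(tree, 'sub'))
os.makedirs(os.path.join(tree, 'empty'))
with open(os.path.join(tree, 'sub', 'a.txt'), 'w') as f:
    f.write('text')
with zipfile.ZipFile(os.path.join(work, 'tree.zip'), 'w') as zf:
    add_folder(zf, tree)
    names = zf.namelist()
```

Items are recorded relative to the folder's parent, so the folder's own name
becomes the top-level item in the archive.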
Modtimes
for all items (files, folders, and symlinks) are propagated to
and from zipfiles, another glaring omission in Python's standard
module. This is crucial when unzipped results are used with tools
that rely on file timestamps (e.g., Mergeall incremental backups).
No user actions are necessary to enable modtime propagation.
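To show what modtime propagation involves, here is a minimal sketch that uses
only the standard library. Python's zipfile records each item's modtime when
writing (in ZipInfo.date_time) but does not reset it when extracting, so the
sketch reapplies it after the unzip. The extract_with_modtimes name is
hypothetical, the code assumes Python 3, and ziptools' own logic may differ.

```python
import os
import tempfile
import time
import zipfile

def extract_with_modtimes(zippath, destdir):
    """Extract all items, then reset each item's modtime from its zip entry."""
    with zipfile.ZipFile(zippath) as zf:
        zf.extractall(destdir)
        for info in zf.infolist():
            path = os.path.join(destdir, info.filename)
            # date_time is a local-time tuple; -1 lets mktime decide on DST
            stamp = time.mktime(info.date_time + (0, 0, -1))
            os.utime(path, (stamp, stamp))

# Demo: give a file a known modtime, zip it, unzip it, and compare
work = tempfile.mkdtemp()
source = os.path.join(work, 'note.txt')
with open(source, 'w') as f:
    f.write('hello')
original = time.mktime((2020, 3, 28, 12, 0, 0, 0, 0, -1))
os.utime(source, (original, original))
archive = os.path.join(work, 'note.zip')
with zipfile.ZipFile(archive, 'w') as zf:
    zf.write(source, 'note.txt')
extract_with_modtimes(archive, os.path.join(work, 'unzipped'))
restored = os.path.getmtime(os.path.join(work, 'unzipped', 'note.txt'))
```

The sketch assumes that stored names map directly to extracted paths, which
holds for ordinary relative names like the one in the demo.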
Cruft-file skipping
can be used to avoid adding platform-specific metadata files to
cross-platform zipfile archives. This avoids propagating the
chronic ".DS_Store" Finder files on Macs, "Desktop.ini" files on
Windows, and other unwanted ".*" hidden files on Unix, and works
for files, folders, and symlinks.
Cruft is identified with either custom skip and keep filename
patterns, or a provided general-purpose default (used automatically
by the command-line scripts). To omit cruft items in zips, use
the "-skipcruft" command-line switch and corresponding function
argument; see zip-create.py and ziptools.py for more details.
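The skip and keep patterns can be pictured with a short sketch based on the
standard fnmatch module. The pattern lists are copied from the interactive
transcript later in this guide. The is_cruft name, and the rule that a keep
match overrides a skip match, are assumptions made for illustration; see
ziptools.py for the actual logic.

```python
import fnmatch

# Patterns as printed by zip-create.py in this guide's interactive example
cruft_patterns = {
    'skip': ['.*', '[dD]esktop.ini', 'Thumbs.db', '~*', '$*', '*.py[co]'],
    'keep': ['.htaccess'],
}

def is_cruft(name, patts=cruft_patterns):
    """Assumed rule: cruft matches a skip pattern and no keep pattern."""
    def matches(key):
        return any(fnmatch.fnmatchcase(name, patt) for patt in patts[key])
    return matches('skip') and not matches('keep')
```

Under this rule ".DS_Store" and "module.pyc" are skipped, while ".htaccess"
survives because it also matches a keep pattern.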
Symlinks (symbolic links)
to both files and folders are supported on both Unix and Windows.
By default, links are copied verbatim to and from zipfiles,
but clients may elect to copy referenced items instead with the
"-atlinks" switch and argument. Highlights of symlink support:
- When links are copied verbatim, they are by default also made
portable between Unix and Windows by automatically adjusting link
paths for the hosting platform's path separators: simply zip and
unzip to transport symlinks across platforms. The "-nofixlinks"
switch suppresses this adjustment if required.
- When links are followed to copy items referenced with "-atlinks",
recursive links are detected and copied verbatim to avoid loops,
on platforms that support inode-like identifiers. Recursion
detection works on Unix in all Pythons, and on Windows in 3.2 and later.
See zip-create.py and ziptools.py for more on "-atlinks", and
zip-extract.py and ziptools.py for more on "-nofixlinks".
Note that to create symlinks on Windows, you may need to obtain
extra permissions (e.g., by right-clicking Command Prompt and
selecting "Run as administrator"). For more details, see section
"Symlinks—Copied, not Followed" in Mergeall's User Guide.
Nit: due to known limitations, Windows symlinks do not retain their
original modtimes on unzips (and permissions are moot: see ahead).
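For readers curious how a link can live in a zipfile at all, one common
convention stores it as an entry whose Unix mode bits mark it as a symlink
and whose data is the link's path. The sketch below follows that convention
and also shows a separator adjustment in the spirit of the link-path fix
described above. All three function names are hypothetical, and whether
ziptools uses exactly this scheme is an assumption here; Python 3 only.

```python
import io
import stat
import zipfile

def write_link_entry(zf, arcname, target, perms=0o777):
    """Record a symlink as an entry whose data is the link path."""
    info = zipfile.ZipInfo(arcname)
    info.create_system = 3                              # 3 = Unix in the zip format
    info.external_attr = (stat.S_IFLNK | perms) << 16   # file type + permission bits
    zf.writestr(info, target)

def is_link_entry(info):
    """True if the entry's Unix mode bits mark it as a symlink."""
    return stat.S_ISLNK(info.external_attr >> 16)

def localize_link(target, sep):
    """Rewrite either style of path separator in a link path to sep."""
    return target.replace('/', sep).replace('\\', sep)

# Demo: round trip through an in-memory zipfile
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    write_link_entry(zf, 'dev/latest', '../releases/v1')
with zipfile.ZipFile(buffer) as zf:
    entry = zf.infolist()[0]
    stored_target = zf.read(entry).decode('utf8')
```

On extract, a tool would pass the stored path (localized or not) to
os.symlink instead of writing it out as file content.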
Long pathnames on Windows
are allowed to exceed the normal 260/248-character length limit on
that platform, by automatically prefixing paths with '\\?\' as needed
when they are passed to the underlying Python zipfile module's tools.
No user action is required for this fix. On all versions of Windows,
it supports files and folders at long Windows paths both when adding
to and extracting from zip archives. Among other things, this is
useful for unzipping and rezipping long-path items zipped on Unix.
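The prefixing idea can be sketched in a few lines. This is an illustration
only: the fix_long_path name and its 248-character threshold are assumptions
loosely based on the limits quoted above, the path is assumed to be absolute
and to use backslashes already, and the UNC form follows the usual Windows
convention rather than anything confirmed about ziptools.

```python
def fix_long_path(abspath, limit=248):
    """Add the Windows long-path prefix to an absolute path if it is long."""
    prefix = '\\\\?\\'                          # the four characters \\?\
    if len(abspath) < limit or abspath.startswith(prefix):
        return abspath                          # short, or already prefixed
    if abspath.startswith('\\\\'):              # UNC path: \\server\share\...
        return prefix + 'UNC\\' + abspath[2:]
    return prefix + abspath
```

A real tool would apply this on Windows only, after making the path absolute.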
Permissions [1.1]
for all items (files, folders, and symlinks) can be propagated to and
from zipfiles. Permissions are always propagated _to_ zipfiles on
creates (zips), including permissions of symlinks in 1.1. Permissions
are optionally propagated _from_ zipfiles on extracts (unzips) in 1.1,
when explicitly requested by using the new "-permissions" command-line
argument, or its function-call equivalent.
Due to interoperability issues, the new extracts option should generally
be used only when unzipping from zipfiles known to have originated on
Unix, and when unzipping back to Unix. Most use cases that require
permissions to survive trips to/from zips probably satisfy this rule.
Propagation is harmless where Unix-style permissions are not supported.
Scope limitation: not all filesystems support Unix-style permissions.
On exFAT, for instance, permission updates silently change nothing,
and the extracts option has no effect. It's okay to copy a zipfile
to/from an exFAT drive as a whole, but don't _unzip_ on exFAT if you
care about retaining Unix permissions.
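As background for the permissions option, zipfiles made on Unix usually carry
each item's mode bits in the upper 16 bits of ZipInfo.external_attr, and
Python's own zipfile extract does not reapply them. The sketch below reads
those bits and applies them with os.chmod. The function names are
hypothetical, and this is not presented as ziptools' implementation.

```python
import io
import os
import stat
import tempfile
import zipfile

def entry_permissions(info):
    """Return the Unix permission bits saved in a zip entry, or 0 if none."""
    return stat.S_IMODE(info.external_attr >> 16)

def extract_with_permissions(zf, info, destdir):
    """Extract one entry, then reapply its saved permissions if it has any."""
    path = zf.extract(info, destdir)
    perms = entry_permissions(info)
    if perms:
        os.chmod(path, perms)      # limited effect where Unix modes are unsupported
    return path

# Demo: store an entry with mode 754 in memory, then extract it
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    script = zipfile.ZipInfo('run.sh')
    script.external_attr = (stat.S_IFREG | 0o754) << 16
    zf.writestr(script, '#!/bin/sh\n')
with zipfile.ZipFile(buffer) as zf:
    saved = zf.infolist()[0]
    outpath = extract_with_permissions(zf, saved, tempfile.mkdtemp())
```

Entries with no saved mode bits are left with whatever permissions the
extract gave them.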
Beyond its features, this package also provides free command-line zip
and unzip programs that work portably on Windows, Mac OS X, Linux, and
more; runs all its code on either Python 3.X or 2.X; and comes with
complete, open-source, and changeable Python source code.
See zip-create.py and zip-extract.py for more details omitted here,
and ziptools/ziptools.py for lower-level implementation details.
===============================================================================
USAGE
===============================================================================
ziptools/ziptools.py is the main utility module, and the zip-*.py console
scripts wrap it for command-line use: creation, extraction, and listing.
All code in this package works under both Python 3.X and 2.X, and on both
Unix and Windows; see PORTABILITY ahead for more interoperability details.
The test-case folders here:
- selftest/
- cmdtest/
- moretests/
all give example usage and runs, and each script and module in this package
includes in-depth documentation strings with details omitted here for space.
See also the folder here:
- docetc/1.1-upgrades/
for demos of using the new features in version 1.1, and Mergeall's folder
test/test-symlinks/ for similar symlink support and tests.
In general, items added to zip archives are recorded with the relative or
absolute paths given, less any leading drive, UNC, and relative-path syntax.
Items are later unzipped to these paths relative to an unzip target folder.
See the create and extract scripts' docstrings for more path usage details.
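The path rule above can be approximated as follows. This sketch is one
reading of "less any leading drive, UNC, and relative-path syntax" and not
ziptools' actual code; the stored_name name and the pathmod parameter are
hypothetical, with pathmod letting either platform's rules be tried anywhere.

```python
import ntpath
import posixpath

def stored_name(path, pathmod=posixpath):
    """Approximate the name an item is recorded under in the zipfile."""
    drive, rest = pathmod.splitdrive(path)      # drops 'C:' or '\\server\share'
    parts = pathmod.normpath(rest).replace(pathmod.sep, '/').split('/')
    while parts and parts[0] in ('', '.', '..'):
        parts.pop(0)                            # drop leading '/', './', and '../'
    return '/'.join(parts)
```

For example, "../../dev/src" would be recorded as "dev/src", and unzipping to
a target folder would then recreate "dev/src" inside that folder.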
Quick examples by usage mode:
PROGRAM MODE-------------------------------------------------------------------
See ziptools/ziptools.py for more on program usage.
import ziptools
ziptools.createzipfile(zipto, sources)
ziptools.extractzipfile(zipfrom, unzipto)
ziptools.createzipfile('test-1-2.zip', ['test1', 'test2'])
ziptools.extractzipfile('test-1-2.zip', '.')
import os
from ziptools.zipcruft import cruft_skip_keep
ziptools.createzipfile('website.zip', ['website'], cruftpatts=cruft_skip_keep)
ziptools.extractzipfile('website.zip', os.path.expanduser('~/public_html'), permissions=True)
ziptools.createzipfile('devtree.zip', ['dev'])
ziptools.extractzipfile('devtree.zip', '.', permissions=True)
ziptools.extractzipfile('nonportable_devtree.zip', '.', nofixlinks=True)
ziptools.createzipfile('filledintree.zip', ['skeleton'], atlinks=True)
from glob import glob
ziptools.createzipfile('allsourcecode.zip', glob('*.py') + glob('*.c'))
COMMAND-LINE MODE--------------------------------------------------------------
See zip-create.py and zip-extract.py for more on command-line usage.
# Test folders
c:\...\ziptools> zip-create.py cmdtest\ziptest.zip selftest\test1 selftest\test2
c:\...\ziptools> zip-list.py cmdtest\ziptest.zip
c:\...\ziptools> zip-extract.py cmdtest\ziptest.zip cmdtest\unzipped
# Websites
...local$ python3 $Z/zip-create.py ~/website.zip . -skipcruft
...remote$ python2 $Z/zip-extract.py ~/website.zip public_html -permissions
# Distributions
...devdir$ python3 $Z/zip-create.py program.zip programdir -skipcruft
...usedir$ python3 $Z/zip-extract.py program.zip .
# Development
...dir1$ python $Z/zip-create.py devtree.zip dev -skipcruft
...dir2$ python $Z/zip-extract.py devtree.zip . -permissions
# Special cases: populating from links, retaining link separators
...dir1$ python $Z/zip-create.py devtree.zip dev -skipcruft -atlinks
...dir2$ python $Z/zip-extract.py devtree.zip . -nofixlinks
# Individual items
...here$ python $Z/zip-create.py allcode.zip a.py b.py c.py d.py
...here$ python $Z/zip-create.py allcode.zip a.py b.py folder -skipcruft
# Shell pattern expansion: supported on all platforms in [1.1]
...here$ python $Z/zip-create.py allcode.zip *.py
...here$ python $Z/zip-create.py allcode.zip *.py test[12].txt doc?.html
# Use items in a folder as top-level items, not nested in their folder
--cd source dir
...src$ python $Z/zip-create.py ../allcode.zip * -skipcruft
--cd dest dir, copy allcode.zip to .
...dst$ python $Z/zip-extract.py allcode.zip . -permissions
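The shell-pattern examples above depend on wildcard expansion, which Unix
shells do before a script runs but the Windows command prompt does not. A
script can fill that gap itself with the standard glob module, as in this
sketch. The expand_sources name is hypothetical, and how zip-create.py
performs its own expansion is not shown here.

```python
import glob
import os
import tempfile

def expand_sources(args):
    """Expand wildcard patterns in args, keeping unmatched args as given."""
    expanded = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        expanded.extend(matches or [arg])
    return expanded

# Demo: two .py files match the pattern; the unmatched name passes through
work = tempfile.mkdtemp()
for name in ('a.py', 'b.py', 'notes.txt'):
    open(os.path.join(work, name), 'w').close()
sources = expand_sources([os.path.join(work, '*.py'),
                          os.path.join(work, 'missing.c')])
```

Expanding arguments that a Unix shell has already expanded is harmless,
because plain filenames simply match themselves.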
INTERACTIVE MODE---------------------------------------------------------------
In the following, substitute "\" for all "/" when working on Windows.
Extract an existing zipfile to ".", the current directory
.../test-symlinks$ $Z/zip-extract.py
Zip file to extract? save-test1-test2.zip
Folder to extract in (use . for here) ? .
Do not localize symlinks (y=yes)?
Retain access permissions (y=yes)?
About to UNZIP
save-test1-test2.zip,
to .,
localizing any links,
not retaining permissions
Confirm with 'y'? y
Clean target folder first (yes=y)? n
Unzipping from save-test1-test2.zip to .
Extracted test1/
=> test1
...etc...
Create a new zipfile in a folder, from items in a folder
/Code/ziptools$ zip-create.py
Zip file to create? cmdtest/ziptest
Items to zip (comma separated)? selftest/test1, selftest/test2
Skip cruft items (y=yes)? y
Follow links to targets (y=yes)? n
About to ZIP
['selftest/test1', 'selftest/test2'],
to cmdtest/ziptest.zip,
skipping cruft,
not following links
Confirm with 'y'? y
Zipping ['selftest/test1', 'selftest/test2'] to cmdtest/ziptest.zip
Cruft patterns: {'skip': ['.*', '[dD]esktop.ini', 'Thumbs.db', '~*', '$*', '*.py[co]'], 'keep': ['.htaccess']}
Adding folder selftest/test1
--Skipped cruft file selftest/test1/.DS_Store
...etc...
Extract the zipfile just created, to another folder
/Code/ziptools$ py3 zip-extract.py
Zip file to extract? cmdtest/ziptest
Folder to extract in (use . for here) ? cmdtest/target
Do not localize symlinks (y=yes)?
Retain access permissions (y=yes)? y
About to UNZIP
cmdtest/ziptest.zip,
to cmdtest/target,
localizing any links,
retaining permissions
Confirm with 'y'? y
Clean target folder first (yes=y)? y
Removing cmdtest/target/selftest
Unzipping from cmdtest/ziptest.zip to cmdtest/target
Extracted selftest/test1/
=> cmdtest/target/selftest/test1
...etc...
List the created zipfile's contents
/Code/ziptools> zip-list.py
Zipfile to list? cmdtest/ziptest.zip
File Name Modified Size
selftest/test1/ 2016-10-02 09:01:58 0
selftest/test1/d1/ 2016-09-30 16:41:12 0
selftest/test1/d1/fa1.txt 2014-02-07 16:38:58 0
selftest/test1/d3/ 2016-10-02 09:05:02 0
selftest/test1/d3/.htaccess 2015-03-31 16:55:44 271
...etc...
Extract using absolute paths, on Unix and Windows
/...$ py3 /Code/ziptools/zip-extract.py
Zip file to extract? /Users/blue/Desktop/website.zip
Folder to extract in (use . for here) ? /Users/blue/Desktop/temp/website
Do not localize symlinks (y=yes)? n
Retain access permissions (y=yes)? n
About to UNZIP
/Users/blue/Desktop/website.zip,
to /Users/blue/Desktop/temp/website,
localizing any links,
not retaining permissions
Confirm with 'y'? y
...etc...
c:\...> py -3 C:\Code\ziptools\zip-extract.py
Zip file to extract? C:\Users\me\Desktop\website.zip
Folder to extract in (use . for here) ? C:\Users\me\Desktop\temp\website
Do not localize symlinks (y=yes)? n
Retain access permissions (y=yes)? n
About to UNZIP
C:\Users\me\Desktop\website.zip,
to C:\Users\me\Desktop\temp\website,
localizing any links,
not retaining permissions
Confirm with 'y'? y
...etc...
===============================================================================
PORTABILITY
===============================================================================
ziptools runs well on both Python 3.X and 2.X, and on both the Windows
and Unix platforms. Its zipfiles can be unzipped in most other zip
tools, and it can unzip most other tools' zipfiles. As an example,
zips created by 2.X are correctly unzipped by 3.X - and vice versa.
That said, interoperability is rarely perfect, and a few footnotes
apply, all of which reflect fixed Python and/or platform constraints:
Platforms
ziptools works equally well on Unix and Windows, but a handful of
well-known platform idiosyncrasies can impact zip results:
On Unix
ziptools works well with no notable caveats. Still, it inherits
a legacy Windows constraint: by spec, zip archives mimic the
"local time" modtime scheme of Windows FAT, instead of Unix UTC.
This can skew times across timezone changes, and yield different
time results from different unzip tools across DST phase changes.
Search for "DST" in ziptools/ziptools.py for more on this topic;
while ziptools 1.1 does not fully lift the built-in modtime limits
of zipfiles, zipfiles are both practical and useful nonetheless.
On Windows
ziptools works well (and even has Windows-specific support for
long paths and command-line argument globbing), but some utility
is limited (e.g., Unix-style permissions propagation), and some
features may require extra steps (e.g., symlinks require admin
permissions). Windows symlinks to files and folders work under
3.X (only) but do not propagate modtimes or permissions, and link
cycles are detected on Windows only by Python 3.2 or later.
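The local-time issue noted under "On Unix" above is easy to see with a small
calculation. A zip entry stores only a wall-clock tuple with no timezone, so
the moment it denotes depends on the UTC offset the reader assumes. The
epoch_in_zone name and the sample offsets are illustrative only, and the code
needs Python 3.

```python
from datetime import datetime, timedelta, timezone

def epoch_in_zone(date_time, utc_offset_hours):
    """Interpret a zip-style (Y, M, D, h, m, s) tuple at a fixed UTC offset."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return datetime(*date_time, tzinfo=zone).timestamp()

stored = (2020, 3, 28, 12, 0, 0)        # what a zip entry records: no zone
standard = epoch_in_zone(stored, -5)    # reader assumes UTC-5
daylight = epoch_in_zone(stored, -4)    # reader assumes UTC-4
skew_seconds = standard - daylight      # same entry, read one hour apart
```

The one-hour difference is the kind of skew that a DST change or a move
between timezones can introduce between a zip and a later unzip.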
Android is essentially Unix with proprietary access constraints: you
must zip to and unzip from folders accessible to your Python app. The
rules behind this morph frequently, and are too complex to cover here;
see learning-python.com/mergeall-android-scripts/_README.html#toc9
Pythons
ziptools works almost identically on Python 3.X and 2.X, and the two
Pythons perform equally well in almost all regards - including support
for non-ASCII filenames as of 1.1. Still, Python 3.X holds a slight
advantage for archives containing symlinks on Unix, and Python 3.X is
required for archives with symlinks on Windows.
If you do not zip symlinks, Python 2.X is as good a choice as 3.X.
Being coded in Python, though, ziptools' symlinks story is convoluted
and constrained by Python's own uneven support across platforms today:
On Unix
Python 3.X can read and write symlinks, and can propagate their
permissions and modtimes. Python 2.X can read and write symlinks,
and can propagate their permissions but not their modtimes.
On Windows
Python 3.X can read and write symlinks, but is unable to propagate
either their modtimes or their permissions. Python 2.X cannot
read or write symlinks, and hence does not support them at all.
And as noted above, only Python 3.2 or later detects link cycles.
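Link-cycle detection of the kind mentioned above can be built on inode-like
identifiers. The sketch below walks a tree while following links, and treats
a folder as a loop when its (device, inode) pair already appears among the
folders above it. The find_link_loops name is hypothetical, and the actual
ziptools logic may differ.

```python
import os
import tempfile

def find_link_loops(folder, ancestors=()):
    """Return paths under folder whose links lead back to a folder above them."""
    info = os.stat(folder)                   # follows links to the real folder
    ident = (info.st_dev, info.st_ino)       # inode-like identity of that folder
    if ident in ancestors:
        return [folder]                      # already on the way down: a loop
    loops = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isdir(path):              # true for links to folders too
            loops += find_link_loops(path, ancestors + (ident,))
    return loops

# Demo: top/sub/back links up to top (needs symlink rights on Windows)
top = os.path.join(tempfile.mkdtemp(), 'top')
os.makedirs(os.path.join(top, 'sub'))
os.symlink(top, os.path.join(top, 'sub', 'back'))
loops = find_link_loops(top)
```

Only folders on the current path down are checked, so two separate links to
the same folder are not mistaken for a loop.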
See also
ziptools/ziptools.py
has more info on Python's symlink constraints, including a
version/platform support table; see "PYTHON SYMLINKS SUPPORT".
docetc/1.1-upgrades/py-2.X-3.X-zipoff.txt
demos ziptools' results under Python 2.X on Unix, and compares them
to Python 3.X (spoiler: they're identical, sans symlink modtimes).
docetc/1.1-upgrades/py-2.X-fixes.txt
chronicles ziptools 1.1's fixes for non-ASCII filenames under Python
2.X. Prior to 1.1, zipfiles made under Python 2.X could yield munged
non-ASCII filenames in some unzip contexts on both Unix and Windows.
1.1 solved this by forcing filenames to Unicode in creates, to invoke
encoding in 2.X's zipfile that's more interoperable (e.g., with 3.X).
===============================================================================
VERSIONS
===============================================================================
Search for "[N.M]" in code files to see changes applied by recent releases.
Future plans
Version 1.2 will likely include an upgrade that makes zipfile modtimes
immune to changes in timezones or DST, by storing UTC timestamps in
extra fields; search for "DST" in ziptools/ziptools.py for details.
[1.1], March 28, 2020
See folder docetc/1.1-upgrades/ for demos of these 1.1 enhancements:
Permissions
Extracts propagate Unix-style permissions for all items on request
Symlinks
Creates save per-link permissions for symlinks, instead of a constant
Auto-globs
zip-create.py expands source-filename wildcard patterns on all platforms
Interface
The create and extract scripts have better error detection and reporting
Statistics
Extracts and creates both return item counts, printed by zip-*.py scripts
exFAT fix
Extracts work around an obscure Mac OS exFAT bug: force folder modtimes
Symlinks fix
Extracts fix an obscure abort: allow extracting unnested symlinks to "."
Python 2.X fixes
2.X zips non-ASCII filenames interoperably: see py-2.X-fixes.txt above
[1.0], June 12, 2017
The initial version, developed and released in parallel with the Mergeall
3.0 package, but now available and developed separately. See folders
selftest/, cmdtest/, and moretests/ for demos of 1.0 (and later) usage.