ziptools - augment Python's zipfile with extra tools

Version:  1.1, Mar-2020 - permissions, auto-globs, stats, 2.X
License:  provided freely, but with no warranties of any kind
Author:   © M. Lutz (learning-python.com) 2017-2020
Web page: https://learning-python.com/ziptools.html
Run with: Python 3.X or 2.X, on Windows, Mac OS X, Linux, and others
Install:  Unzip this package's code, no third-party libraries are used

(This is an abbreviated and plain-text version of _README.html)

Summary:

This package wraps Python's zipfile module for common use cases, and 
extends it with extra features, including support for adding folders, 
modtime and permissions propagation, symlinks on Unix and Windows, 
cruft-file handling, and long Windows paths.  ziptools comes with:

   + Command-line scripts for general use (the zip-*.py files here)
   + Library tools for use in programs (module ziptools/ziptools.py)

Both foster flexible and portable content management using zipfiles.

This file is ziptools' first-level user guide; source-code files here 
provide additional and lower-level details.  If you have used ziptools
in the past, see VERSIONS at the end of this file for recent upgrades.


===============================================================================
QUICKSTART
===============================================================================

In the following:
   - "py" is your installed Python's name (e.g., "py -3", "python3")
   - "mycontent" and "myunzipdir" may be relative or absolute pathnames
   - "myarchive.zip" may be located at a relative or absolute path
   - "$Z" is the path to the unzipped "ziptools/" folder on your device
   - "." is the current folder, as a create source or an extract destination

Command lines (console, script, IDE):

   $ py $Z/zip-create.py myarchive.zip mycontent -skipcruft
         # store all of folder mycontent in new zipfile myarchive.zip

   $ py $Z/zip-extract.py myarchive.zip myunzipdir
         # extract the contents of myarchive.zip to folder myunzipdir

   $ py $Z/zip-extract.py myarchive.zip myunzipdir -permissions
         # ditto, but also propagate Unix-style permissions to all

   $ py $Z/zip-create.py ../myarchive.zip * -skipcruft
         # store folder contents as top-level items, on Unix or Windows 

   $ py $Z/zip-create.py mysourcecode.zip page.js gui.pyw *.py *.c
         # store individual files as top-level items, on Unix or Windows 

   $ py $Z/zip-list.py myarchive.zip
         # list the contents of zipfile myarchive.zip

   $ py $Z/zip-create.py
         # interactive mode: input parameters at prompts (see ahead)

General formats ([]=optional):

   [python] zip-create.py [zipfile source [source...] [-skipcruft] [-atlinks]]
   [python] zip-extract.py [zipfile [unzipto] [-nofixlinks] [-permissions]]

Program library (Python code):

   $ export PYTHONPATH=$Z:$PYTHONPATH
   $ py
   >>> import ziptools, glob
   >>> cruftdflt = ziptools.cruft_skip_keep

   >>> ziptools.createzipfile('myarchive.zip', ['mycontent'], cruftpatts=cruftdflt)

   >>> ziptools.extractzipfile('myarchive.zip', pathto='myunzipdir')

   >>> ziptools.extractzipfile('myarchive.zip', pathto='myunzipdir', permissions=True)

   >>> ziptools.createzipfile('../myarchive.zip', glob.glob('*'), cruftpatts=cruftdflt)

Related tools ($C=code-install folder):

   $ py $C/mergeall/mergeall.py mycontent myunzipdir/mycontent -report -skipcruft 
   $ py $C/mergeall/diffall.py  mycontent myunzipdir/mycontent -skipcruft
         # verify results, per https://learning-python.com/mergeall.html


===============================================================================
OVERVIEW
===============================================================================

Python's standard zipfile module does great low-level work, but this 
package adds both much-needed features and higher-level access points, 
and documents some largely undocumented dark corners of Python's zipfile
along the way.  Among its features, ziptools:

   + Adds entire folder trees to zipfiles automatically 
   + Propagates original modtimes for files, folders, and links
   + Can either include or skip system "cruft" files on request
   + Supports symlinks to files and folders on Unix and Windows
   + Supports long pathnames on Windows beyond its normal limits 
   + Propagates file-access permissions for all items on request 

Although Python's shutil module has simple zipfile wrappers that add 
folders (make_archive()) and extract all items (unpack_archive()), they
don't do anything about modtimes, permissions, symlinks, system cruft,
and long Windows paths supported here.  With ziptools:

Folders
   are added to zipfiles as a whole automatically with extra code,
   a sorely missed feature of Python's standard module.  Folders
   are also automatically extracted from zipfiles in full, and no
   user steps are required to enable folder zips and unzips.
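
   To show what "extra code" means here, the following is a minimal
   sketch of adding a folder tree with only the standard library.  It is
   not ziptools' own code: the function name add_folder and the choice to
   record paths relative to the folder's parent are assumptions made for
   this illustration (Python 3.X).

```python
import os
import tempfile
import zipfile

def add_folder(zf, folder):
    """Add folder and everything below it to the open ZipFile zf (sketch)."""
    parent = os.path.dirname(os.path.abspath(folder))
    for dirpath, dirnames, filenames in os.walk(folder):
        # Record the folder itself, so empty folders survive too
        zf.write(dirpath, os.path.relpath(dirpath, parent))
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            zf.write(path, os.path.relpath(path, parent))

# Demo on a throwaway tree: mycontent/sub/a.txt
root = tempfile.mkdtemp()
top = os.path.join(root, 'mycontent')
os.makedirs(os.path.join(top, 'sub'))
with open(os.path.join(top, 'sub', 'a.txt'), 'w') as f:
    f.write('hello')

zippath = os.path.join(root, 'myarchive.zip')
with zipfile.ZipFile(zippath, 'w') as zf:
    add_folder(zf, top)
with zipfile.ZipFile(zippath) as zf:
    names = sorted(zf.namelist())
```

   This sketch ignores modtimes, symlinks, cruft, and long paths, which
   are the parts ziptools adds on top of a plain tree walk.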

Modtimes
   for all items (files, folders, and symlinks) are propagated to 
   and from zipfiles, another glaring omission in Python's standard
   module.  This is crucial when unzipped results are used with tools
   that rely on file timestamps (e.g., Mergeall incremental backups).
   No user actions are necessary to enable modtime propagation.
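
   As a rough illustration of modtime propagation on extracts, a zip
   entry's stored time can be copied to the unzipped item with os.utime.
   This is a sketch under assumptions, not ziptools' implementation; the
   helper names are hypothetical (Python 3.X).

```python
import os
import tempfile
import time
import zipfile

def zip_datetime_to_epoch(date_time):
    """Convert a ZipInfo.date_time 6-tuple (local time) to epoch seconds."""
    # Trailing -1 asks the platform to decide whether DST applies
    return time.mktime(date_time + (0, 0, -1))

def restore_modtime(path, zinfo):
    """Set path's modtime (and access time) from a zip entry's stored time."""
    stamp = zip_datetime_to_epoch(zinfo.date_time)
    os.utime(path, (stamp, stamp))

# Demo: stamp a new empty file with a time taken from a zip entry
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'fa1.txt')
open(demo_path, 'w').close()
entry = zipfile.ZipInfo('fa1.txt', date_time=(2016, 10, 2, 9, 1, 58))
restore_modtime(demo_path, entry)
restored = time.localtime(os.path.getmtime(demo_path))[:6]
```

   For folders, a step like this generally has to run after the folder's
   contents are written, because adding items to a folder updates the
   folder's own modtime.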

Cruft-file skipping
   can be used to avoid adding platform-specific metadata files to 
   cross-platform zipfile archives.  This avoids propagating the 
   chronic ".DS_Store" Finder files on Macs, "Desktop.ini" files on 
   Windows, and other unwanted ".*" hidden files on Unix, and works
   for files, folders, and symlinks.

   Cruft is identified with either custom skip and keep filename 
   patterns, or a provided general-purpose default (used automatically
   by the command-line scripts).  To omit cruft items in zips, use 
   the "-skipcruft" command-line switch and corresponding function 
   argument; see zip-create.py and ziptools.py for more details.
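
   Skip and keep patterns of this kind can be applied with the standard
   fnmatch module, as sketched below.  The pattern lists are copied from
   the "Cruft patterns" line in this guide's interactive-mode transcript;
   the function name is_cruft and the rule that a keep match overrides a
   skip match are assumptions of this sketch, not confirmed ziptools
   behavior.

```python
import fnmatch

# Copied from the "Cruft patterns" output shown in INTERACTIVE MODE
cruft_patterns = {
    'skip': ['.*', '[dD]esktop.ini', 'Thumbs.db', '~*', '$*', '*.py[co]'],
    'keep': ['.htaccess'],
}

def is_cruft(filename, patts=cruft_patterns):
    """True if filename matches a skip pattern and no keep pattern."""
    if any(fnmatch.fnmatchcase(filename, p) for p in patts['keep']):
        return False            # assumed: keep patterns win over skips
    return any(fnmatch.fnmatchcase(filename, p) for p in patts['skip'])

# Demo: filter a list of base filenames
candidates = ['.DS_Store', 'Desktop.ini', '.htaccess', 'index.html', 'mod.pyc']
kept = [name for name in candidates if not is_cruft(name)]
```

   Patterns here are matched against base filenames only, not full paths.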

Symlinks (symbolic links)
   to both files and folders are supported on both Unix and Windows.
   By default, links are copied verbatim to and from zipfiles,
   but clients may elect to copy referenced items instead with the 
   "-atlinks" switch and argument.  Highlights of symlink support:

   - When links are copied verbatim, they are by default also made 
     portable between Unix and Windows by automatically adjusting link
     paths for the hosting platform's path separators: simply zip and
     unzip to transport symlinks across platforms.  The "-nofixlinks" 
     switch suppresses this adjustment if required.

   - When links are followed to copy items referenced with "-atlinks",
     recursive links are detected and copied verbatim to avoid loops, 
     on platforms that support inode-like identifiers.  Recursion
     detection works in all Pythons on Unix, and in Python 3.2 and
     later on Windows.

   See zip-create.py and ziptools.py for more on "-atlinks", and
   zip-extract.py and ziptools.py for more on "-nofixlinks".  

   Note that to create symlinks on Windows, you may need to obtain
   extra permissions (e.g., by right-clicking Command Prompt and 
   selecting "Run as administrator").  For more details, see section  
   "Symlinks—Copied, not Followed" in Mergeall's User Guide.
   Nit: due to known limitations, Windows symlinks do not retain their
   original modtimes on unzips (and permissions are moot: see ahead).
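
   The separator adjustment described above for verbatim links can be
   pictured as a simple string rewrite of the link's stored target path.
   The sketch below is an assumption-laden illustration (the function
   name is hypothetical), not the code ziptools runs.

```python
import os

def localize_link_path(linkpath, sep=os.sep):
    """Rewrite a stored symlink target to use the given path separator."""
    # Simplification: treats every '/' and '\' as a separator, though
    # a backslash can be a legal filename character on Unix
    return linkpath.replace('/', sep).replace('\\', sep)

# Demo: a link zipped on Windows and unzipped on Unix, and the reverse
to_unix = localize_link_path('..\\docs\\readme.txt', sep='/')
to_windows = localize_link_path('../docs/readme.txt', sep='\\')
```

   Skipping this rewrite corresponds to what "-nofixlinks" requests.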

Long pathnames on Windows
   are allowed to exceed the normal 260/248-character length limit on 
   that platform, by automatically prefixing paths with '\\?\' as needed
   when they are passed to the underlying Python zipfile module's tools.  

   No user action is required for this fix.  On all versions of Windows,
   it supports files and folders at long Windows paths both when adding 
   to and extracting from zip archives.  Among other things, this is  
   useful for unzipping and rezipping long-path items zipped on Unix.
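
   The prefixing idea can be sketched as a pure string function.  The
   function name, the 248-character threshold, and the requirement that
   the caller pass an absolute backslash-separated path are assumptions
   of this sketch; ziptools' own logic may differ.  The UNC form used
   here is the one Windows documents for \\server\share paths.

```python
LONG_PREFIX = '\\\\?\\'           # the four characters \\?\
UNC_PREFIX = '\\\\?\\UNC\\'       # \\?\UNC\ for \\server\share paths

def add_long_path_prefix(abspath, limit=248):
    """Prefix an absolute Windows path when it reaches the length limit."""
    if abspath.startswith(LONG_PREFIX) or len(abspath) < limit:
        return abspath                         # already prefixed, or short
    if abspath.startswith('\\\\'):             # UNC: drop its leading \\
        return UNC_PREFIX + abspath[2:]
    return LONG_PREFIX + abspath

# Demo: a short path, a long drive path, and a long UNC path
short_path = 'C:\\Users\\me\\file.txt'
long_path = 'C:\\' + '\\'.join(['folder' * 10] * 6)
unc_path = '\\\\server\\share\\' + 'x' * 300

fixed_short = add_long_path_prefix(short_path)
fixed_long = add_long_path_prefix(long_path)
fixed_unc = add_long_path_prefix(unc_path)
```

   The function is safe to apply twice, since paths that already carry
   the prefix are returned unchanged.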

Permissions [1.1]
   for all items (files, folders, and symlinks) can be propagated to and
   from zipfiles.  Permissions are always propagated _to_ zipfiles on 
   creates (zips), including permissions of symlinks in 1.1.  Permissions
   are optionally propagated _from_ zipfiles on extracts (unzips) in 1.1,
   when explicitly requested by using the new "-permissions" command-line 
   argument, or its function-call equivalent.

   Due to interoperability issues, the new extracts option should generally
   be used only when unzipping from zipfiles known to have originated on 
   Unix, and when unzipping back to Unix.  Most use cases that require 
   permissions to survive trips to/from zips probably satisfy this rule.
   Propagation is harmless where Unix-style permissions are not supported. 

   Scope limitation: not all filesystems support Unix-style permissions.  
   On exFAT, for instance, permission updates silently change nothing, 
   and the extracts option has no effect.  It's okay to copy a zipfile 
   to/from an exFAT drive as a whole, but don't _unzip_ on exFAT if you 
   care about retaining Unix permissions.
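
   In zipfiles made on Unix, permissions conventionally live in the high
   16 bits of each entry's external-attributes field, which Python's
   zipfile exposes as ZipInfo.external_attr.  The sketch below stores
   and reads back a mode this way; the helper names are hypothetical,
   and this is an illustration rather than ziptools' code (Python 3.X).

```python
import io
import stat
import zipfile

def set_unix_mode(zinfo, mode):
    """Store a Unix st_mode value in a zip entry's external attributes."""
    zinfo.create_system = 3                      # 3 means Unix in the zip format
    zinfo.external_attr = (mode & 0xFFFF) << 16

def get_unix_mode(zinfo):
    """Return just the permission bits stored in a zip entry."""
    return stat.S_IMODE(zinfo.external_attr >> 16)

# Demo: round-trip an executable script's mode through an in-memory zip
buffer = io.BytesIO()
entry = zipfile.ZipInfo('run.sh', date_time=(2020, 3, 28, 12, 0, 0))
set_unix_mode(entry, stat.S_IFREG | 0o755)
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr(entry, '#!/bin/sh\n')
with zipfile.ZipFile(buffer) as zf:
    mode_back = get_unix_mode(zf.getinfo('run.sh'))
```

   On an extract, a mode read this way could be applied to the unzipped
   item with os.chmod, which is the step that has no effect on
   filesystems such as exFAT.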

Beyond its features, this package also provides free command-line zip 
and unzip programs that work portably on Windows, Mac OS X, Linux, and 
more; runs all its code on either Python 3.X or 2.X; and comes with 
complete, open-source, and changeable Python source code.

See zip-create.py and zip-extract.py for more details omitted here, 
and ziptools/ziptools.py for lower-level implementation details.


===============================================================================
USAGE
===============================================================================

ziptools/ziptools.py is the main utility module, and the zip-*.py console
scripts wrap it for command-line use: creation, extraction, and listing.

All code in this package works under both Python 3.X and 2.X, and on both
Unix and Windows; see PORTABILITY ahead for more interoperability details.

The test-case folders here:

   - selftest/
   - cmdtest/
   - moretests/ 

all give example usage and runs, and each script and module in this package
includes in-depth documentation strings with details omitted here for space.
See also the folder here:

   - docetc/1.1-upgrades/ 

for demos of using the new features in version 1.1, and Mergeall's folder
test/test-symlinks/ for similar symlink support and tests.

In general, items added to zip archives are recorded with the relative or 
absolute paths given, less any leading drive, UNC, and relative-path syntax.  
Items are later unzipped to these paths relative to an unzip target folder.  
See the create and extract scripts' docstrings for more path usage details.
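
The path rule above can be approximated with a few lines of standard
library code.  This is a sketch of the general idea only: the function
name archive_name is hypothetical, backslashes are treated as separators
on every platform for simplicity, and ziptools' actual rules are in the
scripts' docstrings.

```python
import ntpath

def archive_name(path):
    """Approximate the name recorded in a zipfile for a source path."""
    # splitdrive drops 'C:' and '\\server\share' prefixes on any platform
    drive, rest = ntpath.splitdrive(path)
    parts = rest.replace('\\', '/').split('/')
    # Drop leading root, '.', and '..' components
    while parts and parts[0] in ('', '.', '..'):
        parts.pop(0)
    return '/'.join(parts)

# Demo: drive, relative, absolute Unix, and UNC source paths
from_drive = archive_name('C:\\Code\\x.py')
from_relative = archive_name('../../dev/a.py')
from_absolute = archive_name('/Users/me/site/index.html')
from_unc = archive_name('\\\\server\\share\\dir\\f.txt')
```

An item recorded as "dev/a.py" and extracted to folder "target" would
then land at "target/dev/a.py".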

Quick examples by usage mode:


PROGRAM MODE-------------------------------------------------------------------

See ziptools/ziptools.py for more on program usage.

   import ziptools
   ziptools.createzipfile(zipto, sources)
   ziptools.extractzipfile(zipfrom, unzipto)

   ziptools.createzipfile('test-1-2.zip', ['test1', 'test2'])
   ziptools.extractzipfile('test-1-2.zip', '.')

   from ziptools.zipcruft import cruft_skip_keep
   ziptools.createzipfile('website.zip', ['website'], cruftpatts=cruft_skip_keep)
   ziptools.extractzipfile('website.zip', '~/public_html', permissions=True)

   ziptools.createzipfile('devtree.zip', ['dev'])
   ziptools.extractzipfile('devtree.zip', '.', permissions=True)

   ziptools.extractzipfile('nonportable_devtree.zip', '.', nofixlinks=True)
   ziptools.createzipfile('filledintree.zip', ['skeleton'], atlinks=True)

   from glob import glob
   ziptools.createzipfile('allsourcecode.zip', glob('*.py') + glob('*.c'))


COMMAND-LINE MODE--------------------------------------------------------------

See zip-create.py and zip-extract.py for more on command-line usage.

   # Test folders
   c:\...\ziptools> zip-create.py cmdtest\ziptest.zip selftest\test1 selftest\test2
   c:\...\ziptools> zip-list.py cmdtest\ziptest.zip
   c:\...\ziptools> zip-extract.py cmdtest\ziptest.zip cmdtest\unzipped

   # Websites
   ...local$  python3 $Z/zip-create.py ~/website.zip . -skipcruft 
   ...remote$ python2 $Z/zip-extract.py ~/website.zip public_html -permissions

   # Distributions
   ...devdir$ python3 $Z/zip-create.py program.zip programdir -skipcruft
   ...usedir$ python3 $Z/zip-extract.py program.zip .

   # Development
   ...dir1$ python $Z/zip-create.py devtree.zip dev -skipcruft
   ...dir2$ python $Z/zip-extract.py devtree.zip . -permissions

   # Special cases: populating from links, retaining link separators
   ...dir1$ python $Z/zip-create.py devtree.zip dev -skipcruft -atlinks
   ...dir2$ python $Z/zip-extract.py devtree.zip . -nofixlinks

   # Individual items
   ...here$ python $Z/zip-create.py allcode.zip a.py b.py c.py d.py 
   ...here$ python $Z/zip-create.py allcode.zip a.py b.py folder -skipcruft

   # Shell pattern expansion: supported on all platforms in [1.1]
   ...here$ python $Z/zip-create.py allcode.zip *.py 
   ...here$ python $Z/zip-create.py allcode.zip *.py test[12].txt doc?.html 

   # Use items in a folder as top-level items, not nested in their folder
   --cd source dir
   ...src$ python $Z/zip-create.py ../allcode.zip * -skipcruft 
   --cd dest dir, copy allcode.zip to .
   ...dst$ python $Z/zip-extract.py allcode.zip . -permissions
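
Some shells, such as the Windows Command Prompt, pass wildcard patterns
to programs unexpanded, so a script must expand them itself to behave
the same everywhere.  The following is a minimal sketch of that step
using the standard glob module; the function name expand_sources and the
choice to keep non-matching arguments unchanged are assumptions, not a
description of zip-create.py's code.

```python
import glob
import os
import tempfile

def expand_sources(args):
    """Expand wildcard patterns in args; keep non-matching args as given."""
    expanded = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        expanded.extend(matches if matches else [arg])
    return expanded

# Demo: a throwaway folder with a few files
folder = tempfile.mkdtemp()
for name in ['a.py', 'b.py', 'test1.txt', 'test2.txt', 'test3.txt', 'doc1.html']:
    open(os.path.join(folder, name), 'w').close()

patterns = [os.path.join(folder, p) for p in ['*.py', 'test[12].txt', 'doc?.html']]
found = [os.path.basename(path) for path in expand_sources(patterns)]
literal = expand_sources(['no-such-file.xyz'])
```

Where the shell has already expanded patterns, the arguments are plain
filenames, and a pass like this leaves them as they are.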


INTERACTIVE MODE---------------------------------------------------------------

In the following, substitute "\" for all "/" when working on Windows.

Extract an existing zipfile to ".", the current directory

   .../test-symlinks$ $Z/zip-extract.py
   Zip file to extract? save-test1-test2.zip
   Folder to extract in (use . for here) ? .
   Do not localize symlinks (y=yes)? 
   Retain access permissions (y=yes)? 
   About to UNZIP
         save-test1-test2.zip,
         to .,
         localizing any links,
         not retaining permissions
   Confirm with 'y'? y
   Clean target folder first (yes=y)? n
   Unzipping from save-test1-test2.zip to .
   Extracted test1/
                => test1
   ...etc...

Create a new zipfile in a folder, from items in a folder

   /Code/ziptools$ zip-create.py
   Zip file to create? cmdtest/ziptest
   Items to zip (comma separated)? selftest/test1, selftest/test2                   
   Skip cruft items (y=yes)? y
   Follow links to targets (y=yes)? n
   About to ZIP
         ['selftest/test1', 'selftest/test2'],
         to cmdtest/ziptest.zip,
         skipping cruft,
         not following links
   Confirm with 'y'? y
   Zipping ['selftest/test1', 'selftest/test2'] to cmdtest/ziptest.zip
   Cruft patterns: {'skip': ['.*', '[dD]esktop.ini', 'Thumbs.db', '~*', '$*', '*.py[co]'], 'keep': ['.htaccess']}
   Adding folder selftest/test1
   --Skipped cruft file selftest/test1/.DS_Store
   ...etc...

Extract the zipfile just created, to another folder

   /Code/ziptools$ py3 zip-extract.py 
   Zip file to extract? cmdtest/ziptest
   Folder to extract in (use . for here) ? cmdtest/target
   Do not localize symlinks (y=yes)? 
   Retain access permissions (y=yes)? y
   About to UNZIP
         cmdtest/ziptest.zip,
         to cmdtest/target,
         localizing any links,
         retaining permissions
   Confirm with 'y'? y
   Clean target folder first (yes=y)? y
   Removing cmdtest/target/selftest
   Unzipping from cmdtest/ziptest.zip to cmdtest/target
   Extracted selftest/test1/
                => cmdtest/target/selftest/test1
   ...etc...

List the created zipfile's contents

   /Code/ziptools> zip-list.py
   Zipfile to list? cmdtest/ziptest.zip
   File Name                                             Modified             Size
   selftest/test1/                                2016-10-02 09:01:58            0
   selftest/test1/d1/                             2016-09-30 16:41:12            0
   selftest/test1/d1/fa1.txt                      2014-02-07 16:38:58            0
   selftest/test1/d3/                             2016-10-02 09:05:02            0
   selftest/test1/d3/.htaccess                    2015-03-31 16:55:44          271
   ...etc...

Extract using absolute paths, on Unix and Windows

   /...$ py3 /Code/ziptools/zip-extract.py 
   Zip file to extract? /Users/blue/Desktop/website.zip
   Folder to extract in (use . for here) ? /Users/blue/Desktop/temp/website
   Do not localize symlinks (y=yes)? n
   Retain access permissions (y=yes)? n
   About to UNZIP
         /Users/blue/Desktop/website.zip,
         to /Users/blue/Desktop/temp/website,
         localizing any links,
         not retaining permissions
   Confirm with 'y'? y
   ...etc...

   c:\...> py -3 C:\Code\ziptools\zip-extract.py 
   Zip file to extract? C:\Users\me\Desktop\website.zip
   Folder to extract in (use . for here) ? C:\Users\me\Desktop\temp\website
   Do not localize symlinks (y=yes)? n
   Retain access permissions (y=yes)? n
   About to UNZIP
         C:\Users\me\Desktop\website.zip,
         to C:\Users\me\Desktop\temp\website,
         localizing any links,
         not retaining permissions
   Confirm with 'y'? y
   ...etc...


===============================================================================
PORTABILITY
===============================================================================

ziptools runs well on both Python 3.X and 2.X, and on both the Windows 
and Unix platforms.  Its zipfiles can be unzipped in most other zip 
tools, and it can unzip most other tools' zipfiles.  As an example, 
zips created by 2.X are correctly unzipped by 3.X - and vice versa.
  
That said, interoperability is rarely perfect, and a few footnotes 
apply, all of which reflect fixed Python and/or platform constraints:

Platforms
   ziptools works equally well on Unix and Windows, but a handful of
   well-known platform idiosyncrasies can impact zip results:

   On Unix
      ziptools works well with no notable caveats.  Still, it inherits 
      a legacy Windows constraint: by spec, zip archives mimic the 
      "local time" modtime scheme of Windows FAT, instead of Unix UTC.
      This can skew times across timezone changes, and yield different 
      time results from different unzip tools across DST phase changes.  
      Search for "DST" in ziptools/ziptools.py for more on this topic;
      while ziptools 1.1 does not fully lift the built-in modtime limits
      of zipfiles, zipfiles are both practical and useful nonetheless.

   On Windows
      ziptools works well (and even has Windows-specific support for 
      long paths and command-line argument globbing), but some utility
      is limited (e.g., Unix-style permissions propagation), and some 
      features may require extra steps (e.g., symlinks require admin 
      permissions).  Windows symlinks to files and folders work under 
      3.X (only) but do not propagate modtimes or permissions, and link
      cycles are detected on Windows only by Python 3.2 or later.

   Android is essentially Unix with proprietary access constraints: you 
   must zip to and unzip from folders accessible to your Python app.  The
   rules behind this morph frequently, and are too complex to cover here; 
   see learning-python.com/mergeall-android-scripts/_README.html#toc9

Pythons
   ziptools works almost identically on Python 3.X and 2.X, and the two 
   Pythons perform equally well in almost all regards - including support
   for non-ASCII filenames as of 1.1.  Still, Python 3.X holds a slight 
   advantage for archives containing symlinks on Unix, and Python 3.X is 
   fully required for archives with symlinks on Windows.

   If you do not zip symlinks, Python 2.X is as good a choice as 3.X.
   Being coded in Python, though, ziptools' symlinks story is convoluted 
   and constrained by Python's own uneven support across platforms today:

   On Unix
      Python 3.X can read and write symlinks, and can propagate their 
      permissions and modtimes.  Python 2.X can read and write symlinks, 
      and can propagate their permissions but not their modtimes. 

   On Windows
      Python 3.X can read and write symlinks, but is unable to propagate 
      either their modtimes or their permissions.  Python 2.X cannot 
      read or write symlinks, and hence does not support them at all.
      And as noted above, only Python 3.2 or later detects link cycles.
  
See also
   ziptools/ziptools.py 
      has more info on Python's symlink constraints, including a 
      version/platform support table; see "PYTHON SYMLINKS SUPPORT".

   docetc/1.1-upgrades/py-2.X-3.X-zipoff.txt 
      demos ziptools' results under Python 2.X on Unix, and compares them 
      to Python 3.X (spoiler: they're identical, sans symlink modtimes).

   docetc/1.1-upgrades/py-2.X-fixes.txt
      chronicles ziptools 1.1's fixes for non-ASCII filenames under Python
      2.X.  Prior to 1.1, zipfiles made under Python 2.X could yield munged
      non-ASCII filenames in some unzip contexts on both Unix and Windows. 
      1.1 solved this by forcing filenames to Unicode in creates, to invoke 
      encoding in 2.X's zipfile that's more interoperable (e.g., with 3.X).


===============================================================================
VERSIONS
===============================================================================

Search for "[N.M]" in code files to see changes applied by recent releases.

Future plans
   Version 1.2 will likely include an upgrade that makes zipfile modtimes 
   immune to changes in timezones or DST, by storing UTC timestamps in
   extra fields; search for "DST" in ziptools/ziptools.py for details. 

[1.1], March 28, 2020
   See folder docetc/1.1-upgrades/ for demos of these 1.1 enhancements:

   Permissions
      Extracts propagate Unix-style permissions for all items on request
   Symlinks
      Creates save per-link permissions for symlinks, instead of a constant 
   Auto-globs
      zip-create.py expands source-filename wildcard patterns on all platforms
   Interface
      The create and extract scripts have better error detection and reporting
   Statistics
      Extracts and creates both return item counts, printed by zip-*.py scripts 
   exFAT fix
      Extracts work around an obscure Mac OS exFAT bug: force folder modtimes
   Symlinks fix
      Extracts fix an obscure abort: allow extracting unnested symlinks to "."
   Python 2.X fixes
      2.X zips non-ASCII filenames interoperably: see py-2.X-fixes.txt above

[1.0], June 12, 2017
   The initial version, developed and released in parallel with the Mergeall 
   3.0 package, but now available and developed separately.  See folders 
   selftest/, cmdtest/, and moretests/ for demos of 1.0 (and later) usage.