File: ziptools/ziptools/zip-create.py

#!/usr/bin/python
"""
==============================================================================
zip-create.py - a ziptools command-line client for zipping zipfiles.
See ziptools' ./_README.html for license, attribution, and other logistics.

Create (zip) a zip file, with:

   <python> zip-create.py 
       [zipfile source [source...] [-skipcruft] [-atlinks] [-zip@path] [-nocompress]]

Where:  
   "zipfile" is the pathname of the new zipfile to be created (a ".zip"
   is appended to the end of this if missing)

   each "source" (one or more) is the relative or absolute pathname of a
   file, link, or folder to be added to the zipfile

   "-skipcruft", if used, avoids adding hidden or platform-specific
   items to the zipfile (else, nothing is skipped, as described ahead)

   "-atlinks", if used, adds items that any symlinks refer to instead
   of the symlinks themselves (else, links are always added verbatim)

   "-zip@path", if used, gives an alternate path to be used as the root
   of all items in the zip (and later unzips); path "." means unnested [1.2]

   "-nocompress", if used, disables the standard compression used for 
   zipped items; use for faster zips and unzips of large content sets [1.3]
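
   In terms of Python's standard zipfile module, "-nocompress" presumably
   means storing items with ZIP_STORED instead of ZIP_DEFLATED. That is an
   assumption about the mapping, not ziptools' own code. The following
   minimal sketch shows the trade-off, using in-memory zips and a
   hypothetical helper named make_zip():

```python
import io
import zipfile

def make_zip(data, nocompress):
    # Hypothetical helper; assumed mapping: nocompress => ZIP_STORED
    method = zipfile.ZIP_STORED if nocompress else zipfile.ZIP_DEFLATED
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', compression=method) as zf:
        zf.writestr('item.txt', data)
    return buf.getvalue()

data = b'spam' * 10000                       # 40,000 highly compressible bytes
stored = make_zip(data, nocompress=True)     # no compression work, no savings
deflated = make_zip(data, nocompress=False)  # compressed, much smaller here
print(len(stored) > len(deflated))           # True
```

   Stored items bypass the compressor on both zip and unzip. That is
   presumably the source of the speedup, and little size is lost for
   content that is already compressed (media, archives).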

Arguments are input at the console if not listed on the command line.
The script's output lists all items added to the zipfile; when using 
"-zip@path", it also lists zip paths that differ from source paths
on a second output line that begins with a tabbed "=>" sequence.
A "*" in a source expands into multiple sources on all platforms. 
Pressing Control-C at any prompt in interactive mode cancels the run before it starts.
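
The position-independent option handling described above can be restated
as a pure function over an argument list, with the script name excluded.
It mirrors this script's command-line code, but the name parse_args() and
its return shape are illustrative only. It also raises errors rather than
printing them:

```python
import os

def parse_args(argv):
    # Illustrative only: options may appear anywhere among the arguments
    args = list(argv)
    flags = {}
    for flag in ('-skipcruft', '-atlinks', '-nocompress'):
        flags[flag] = flag in args
        if flags[flag]:
            args.remove(flag)
    zipargs = [arg for arg in args if arg.startswith('-zip@')]
    if len(zipargs) > 1:
        raise ValueError('Only one -zip@ allowed')
    zipat = None
    if zipargs:
        args.remove(zipargs[0])
        zipat = zipargs[0][len('-zip@'):]    # '' is treated the same as '.'
    if len(args) < 2:
        raise ValueError('Too few arguments')
    zipto, sources = args[0].rstrip(os.sep), args[1:]
    if zipto[-4:].lower() != '.zip':
        zipto += '.zip'                      # append ".zip" if missing
    return zipto, sources, flags, zipat

print(parse_args(['newzip', 'dir', '-skipcruft', '-zip@.']))
```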

<python> is your platform's optional Python identifier string.  It may be 
"python", "python3", or an alias on Unix; and "python", "py -3", or "py" 
on Windows.  It can also be omitted on Windows (to use a default), and on 
Unix if this script has been given executable permission (e.g., after a
"chmod +x").  Some frozen app/executable packages may also omit <python>;
see your docs.

Examples:
   python zip-create.py                                  # input args
   python zip-create.py tests.zip test1 test2 test3      # zip 3 dirs
   python zip-create.py -skipcruft upload.zip webdir     # skip cruft
   python zip-create.py newzip dir -skipcruft -atlinks   # follow links
   python zip-create.py allcode.zip *.py test?.txt       # wildcards
   python zip-create.py ../allcode.zip * -skipcruft      # items in dir 
   python zip-create.py allcode.zip folder/* -zip@.      # remove dir nesting
   python zip-create.py folder.zip folder -nocompress    # uncompressed/fast

ABOUT CRUFT SKIPPING:
   The optional "-skipcruft" argument can appear anywhere if used.  When
   used, it prevents normally-hidden system metadata files and folders
   from being included in the generated zipfile.  Cruft defaults to all
   items whose names start with a "." (the Unix convention), plus a handful
   of others as defined in the pattern lists imported from module file
   zipcruft.py; customize these lists here or in the module as desired.

   Most end-user zips should pass "-skipcruft" to enable cruft skipping.
   This functionality is especially useful on a Mac, to avoid common files
   like ".DS_Store" and "._somename" in zips used to distribute software
   or upload websites.  If "-skipcruft" is _not_ used, every file and
   folder named in a 'source' is included in the zipfile.  For more
   background on cruft, see the usage pointers in Mergeall's documentation,
   at learning-python.com/mergeall/UserGuide.html.
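
   Here is a minimal sketch of name-based cruft tests using fnmatch. It
   assumes a "skip" pattern list plus a "keep" list of exceptions, as the
   imported name cruft_skip_keep suggests. The lists below are
   hypothetical stand-ins, not zipcruft.py's actual defaults:

```python
import fnmatch

# Hypothetical pattern lists; the real defaults live in zipcruft.py
SKIP = ['.*', 'Thumbs.db', 'Desktop.ini']
KEEP = ['.htaccess']

def is_cruft(name, skip=SKIP, keep=KEEP):
    # Cruft = basename matches a skip pattern and no keep pattern
    def matches(patts):
        return any(fnmatch.fnmatchcase(name, patt) for patt in patts)
    return matches(skip) and not matches(keep)

names = ['.DS_Store', '._somename', 'index.html', '.htaccess', 'Thumbs.db']
print([n for n in names if not is_cruft(n)])   # ['index.html', '.htaccess']
```

   Using fnmatchcase() keeps matching case-sensitive on every platform,
   which makes the sketch's results independent of the host OS.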
  
   Note that cruft skipping is implemented in this create script and the
   ziptools function it uses, but not in the extract script or function.
   This is by design: the create/extract tools work together as a pair.
   To remove cruft after unzipping a file created by other tools, see
   Mergeall's nuke-cruft-files.py script.

ABOUT LINKS AND OTHER FILE TYPES:
   By default, the ziptools package zips and unzips symbolic links to both
   files and dirs themselves, not the items they refer to; use "-atlinks"
   (which also can appear anywhere) at creation time here to zip and unzip
   items that links refer to instead.  This package also always skips FIFOs
   and other exotica.  See ziptools.py for more details.
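
   A common way to zip a symlink itself with Python's zipfile has two
   parts. First, store the link's target path as the item's data. Second,
   mark the item as a link in the Unix mode bits held in the high 16 bits
   of external_attr. This is a sketch of that technique, not necessarily
   ziptools' exact code. The helper name add_link_verbatim() is
   hypothetical, and in real use the target would come from os.readlink():

```python
import io
import stat
import zipfile

def add_link_verbatim(zf, arcname, target):
    # Store the link itself: data = target path, mode bits = symlink
    info = zipfile.ZipInfo(arcname)
    info.create_system = 3                      # 3 = Unix, so mode bits apply
    info.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(info, target)

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    add_link_verbatim(zf, 'docs/latest', 'v1.2/docs')   # e.g., os.readlink()

with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo('docs/latest')
    is_link = stat.S_ISLNK(info.external_attr >> 16)
    target = zf.read(info)
print(is_link, target)                          # True b'v1.2/docs'
```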

ABOUT SOURCE PATHS:
   Path separators in created zipfiles always use Unix '/', even on Windows.
   This is in accordance with the zip standard, and ensures interoperability.

   This script allows source items to be named by either relative or absolute
   pathnames, and generally stores items in the zip file with the paths given.
   When extracted, items are stored at their recreated paths relative to an
   unzip target folder (see zip-extract.py for the extract side of this story).
  
   In more detail, this script does nothing itself about any absolute paths
   (e.g., "/dir"), relative path up-references (e.g., "..\dir"), or drive
   and UNC network names on Windows (e.g., "C:\", "\\server") on creates.
   The Python zipfile module used here (and ziptools' symlink adder that 
   parrots it) strips any leading slashes and removes both drive and network
   names and embedded ".." on archive writes, but other oddities, including 
   leading "..", will be retained in the created zip file's item names.
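
   You can observe the name handling described above directly with
   zipfile.ZipInfo.from_file(), which write() uses to build item names in
   Python 3.6 and later. The following quick check needs only a temporary
   file to stat:

```python
import os
import tempfile
import zipfile

results = {}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'f.txt')
    with open(path, 'w') as f:
        f.write('data')
    for arcname in ['/dir/f.txt', 'a/../b/f.txt', '../up/f.txt']:
        results[arcname] = zipfile.ZipInfo.from_file(path, arcname).filename

for given, stored in results.items():
    print(repr(given), '=>', repr(stored))
# '/dir/f.txt' => 'dir/f.txt'       (leading slash stripped)
# 'a/../b/f.txt' => 'b/f.txt'       (embedded '..' collapsed)
# '../up/f.txt' => '../up/f.txt'    (leading '..' retained)
```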

   Some zip tools may have issues with this (e.g., WinZip chokes on ".."),
   but the companion script "zip-extract.py" here will always remove all
   of these special-case syntaxes, including leading "..", to make item 
   extract paths relative to (and hence stored in) the unzip destination 
   folder, regardless of their origin.  See that script for more details.

   Still, if you're going to use this script's output in other zip tools,
   for best results run it from the folder containing the items you wish
   to zip (or its parent), avoiding ".."-rooted paths:

      c:\> cd YOUR-STUFF
      c:\YOUR-STUFF> py -3 scriptpath\zip-create.py thezip x y z

   The zipfile module's write() also allows an extra 'arcname' argument
   to give an archive (and hence extract) pathname for an item that differs
   from its filename, but it's not exposed for end-users here (it is used
   by ziptools, but only internally to distinguish local-file from archive
   paths as part of the support for '\\?'-prefixed long paths on Windows).
   [See the update ahead: "-zip@path" now supports a zip-wide 'arcname'.]
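
   For reference, the standard-library call looks like the example below.
   The item is read from its local path but recorded (and later extracted)
   under the archive name given by arcname. The file and folder names are
   made up:

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'local', 'f.txt')
    os.makedirs(os.path.dirname(src))
    with open(src, 'w') as f:
        f.write('data')
    zippath = os.path.join(tmp, 'demo.zip')
    with zipfile.ZipFile(zippath, 'w') as zf:
        zf.write(src, arcname='archive/name.txt')   # recorded under arcname
    with zipfile.ZipFile(zippath) as zf:
        names = zf.namelist()
print(names)                                        # ['archive/name.txt']
```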

   Python's os.path.commonpath() (available in 3.5 and later only) or similar
   code might be used to remove common path prefixes as an option if all items
   are known to be in the same path, but it is not employed here - the full
   paths listed on the command line are stored in the zipfile and will be
   recreated in later extracts relative to an extract target dir.

   For example, a file named as 'a/b/c/f.txt' is zipped and unzipped to
   an extract target folder E as 'E/a/b/c/f.txt', even if all other items
   zipped are in 'a', 'a/b', or 'a/b/c'.  Hence, if you wish to minimize
   common path prefixes in the zip, cd to a common folder of zip sources
   before running this script, if warranted in a given use case.
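
   For comparison, the optional prefix removal that this script does not
   perform might be sketched with os.path.commonpath() and relpath(). The
   paths are illustrative. Note that commonpath() raises ValueError if
   absolute and relative paths are mixed:

```python
import os

paths = ['a/b/c/f.txt', 'a/b/g.txt', 'a/b/c/d/h.txt']
prefix = os.path.commonpath(paths)            # 'a/b' on Unix
trimmed = [os.path.relpath(p, prefix) for p in paths]
print(prefix, trimmed)                        # a/b ['c/f.txt', 'g.txt', 'c/d/h.txt']
```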

   UPDATE [1.2] - ALTERNATE ZIP PATHS:
      ziptools version 1.2 adds a "-zip@path" command-line option (and its
      corresponding function-call argument), which replaces the given path 
      in all zipped items with an alternate path.  This can be used to change,
      expand, shorten, or fully remove the paths of zipped items, and hence
      the paths at which they are unzipped.  A "-zip@.", for instance, makes
      items top-level and unnested, and renders pre-zip cd commands largely 
      optional.  For details and examples, see: "_README.html#altpaths12".
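
      Below is a rough model of the path replacement. It assumes that the
      alternate path replaces each source's directory part (everything
      but its last component) and keeps the rest. alt_arcname() is a
      hypothetical helper, and "_README.html#altpaths12" remains the
      authority on exact behavior:

```python
import posixpath

def alt_arcname(source, itempath, zipat):
    # Hypothetical: swap the directory part of 'source' for 'zipat' in
    # 'itempath', an item at or below 'source' ('/' separators, as in zips)
    head = posixpath.dirname(source.rstrip('/'))
    if head:
        if not itempath.startswith(head + '/'):
            raise ValueError('item not under source')
        tail = itempath[len(head) + 1:]
    else:
        tail = itempath
    if zipat in ('', '.'):                 # '' is treated the same as '.'
        return tail
    return posixpath.join(zipat, tail)

print(alt_arcname('folder/a', 'folder/a/x.txt', '.'))           # a/x.txt
print(alt_arcname('folder/b.txt', 'folder/b.txt', 'new/root'))  # new/root/b.txt
print(alt_arcname('a.txt', 'a.txt', 'pre'))                     # pre/a.txt
```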

MORE ABOUT SOURCES: WILDCARDS AND DOT
   Also note that source arguments can include any number of folders, 
   files, or both.  Any "*"s in sources are expanded by Unix shells before
   this script runs, and may expand to either file or folder names.  If 
   you list just simple files as sources and no folders (with or without
   any Unix "*" expansions), no folder nesting occurs in the created 
   zipfile or its extraction (the zipfile will be all top-level files). 
   If you list folders, they will be recreated in the extract.  See 
   test-simple-files/ in moretests/ for an example of file-only zips.

   Example: you can include the entire contents of a folder as unnested
   top-level items in the zip, by running a zip with a "*" source after
   a cd into the subject folder, and using a zipfile target path outside 
   the folder being zipped (a zip that tries to include itself may get stuck):

      cd dir; $TOOLS/zip-create.py ../allhere.zip * -skipcruft

   This avoids folder nesting on extracts for all items in the folder:
   the zipfile can be extracted directly in its files' destination,
   and items need not be moved or copied after the extract.

   By contrast, a source "dir/*" or "dir" will instead record items as
   nested in the zip, and extract the items within their "dir" folder. 
   This is better for multiple folders that may have same-named items, 
   and may be safer (an accidental unzip won't trash files in ".").

   Special case: using "." (the current working directory) as a source 
   argument zips all items in the "." folder as top-level, unnested items.
   This is an implementation artifact, and is roughly the same as "*",
   except that "." will zip ".xxxx" hidden files and "*" (globs) won't.
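
   The "." versus "*" difference comes from the glob module: its "*" does
   not match names that start with ".". A directory listing, which is
   roughly what a "." source walks, does include them. Here is a quick
   demonstration in a throwaway folder:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for name in ['.hidden', 'a.txt', 'b.py']:
        open(os.path.join(tmp, name), 'w').close()
    pattern = os.path.join(glob.escape(tmp), '*')   # escape the folder part
    star = sorted(os.path.basename(p) for p in glob.glob(pattern))
    dot = sorted(os.listdir(tmp))
print(star)   # ['a.txt', 'b.py']
print(dot)    # ['.hidden', 'a.txt', 'b.py']
```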

   UPDATE [1.2] - WILDCARDS AND ALTERNATE ZIP PATHS:
      Items zipped from a folder can also be made unnested in a zipfile 
      with the 1.2 alternate-zip-path extension described above, and 
      without a preliminary cd command.  The equivalent to the above:

      $TOOLS/zip-create.py allhere.zip dir/* -skipcruft -zip@.

   UPDATE [1.1] - WILDCARDS ON WINDOWS: 
      As an accommodation to Windows usage, this script now automatically 
      expands (a.k.a. 'globs') any "*" wildcards in sources, if not expanded
      by the shell.  It also matches any remaining "?" single-character and 
      "[]" range operators in sources.  This means you can use "*" and the
      others in a Windows DOS shell, and elsewhere, to expand into matching 
      file and folder names.  

      Although primarily meant for Windows users who don't want to use their
      Linux subsystem, this also works in interactive mode, and for quoted
      operators in Unix shells, applying the Python "glob" module uniformly
      on all platforms to expand source patterns.  This may also be useful
      in IDEs that support command lines but don't pass them through shells.

      For more on allowed patterns, see: 
          https://docs.python.org/3.5/library/glob.html.  
      The glob is case sensitive only on OSs that are too (i.e., Unix).
      Note that auto-globs are performed only for command lines here; calls 
      to ziptools.createzipfile() must run sources through glob.glob() manually.
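
      A minimal sketch of that manual step follows. It mirrors the checks
      this script applies to command-line sources. expand_sources() is a
      hypothetical helper, and the commented createzipfile() call simply
      reuses the keyword arguments this script passes:

```python
import glob
import os
import tempfile

def expand_sources(patterns):
    # Reject missing non-pattern names, then expand *, ?, and [] via glob
    expanded = []
    for patt in patterns:
        if not any(c in patt for c in '*?[') and not os.path.exists(patt):
            raise FileNotFoundError(patt)
        expanded.extend(glob.glob(patt))
    return expanded

with tempfile.TemporaryDirectory() as tmp:
    for name in ['x1.py', 'x2.py', 'notes.txt']:
        open(os.path.join(tmp, name), 'w').close()
    found = expand_sources([os.path.join(glob.escape(tmp), '*.py'),
                            os.path.join(tmp, 'notes.txt')])
    found = sorted(os.path.basename(p) for p in found)
print(found)   # ['notes.txt', 'x1.py', 'x2.py']

# Then hand the expanded list to the API, e.g.:
# ziptools.createzipfile('allcode.zip', expand_sources(['*.py']),
#                        cruftpatts={}, atlinks=False, zipat=None, nocompress=False)
```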

      [Former update: you may also use "*" expansions on Windows by running
      ziptools' scripts from the bash shell in the Windows Subsystem for 
      Linux that's now part of Windows 10; see the web for pointers.]

      [Former caveat: this could support "*" expansion on Windows too,
      by running source arguments through glob.glob(), though Windows can 
      run Unix-like shells (e.g., via cygwin).  If required, write a simple
      launcher script that runs this script with os.system(), and send it 
      the ' '.join() for glob.glob() or os.listdir() run on sources.]

ABOUT LARGE FILES
   ziptools always uses the ZIP64 option of Python's zipfile module to 
   support files larger than zip's former size limits, both for zips and 
   unzips (i.e., creates and extracts).  Unfortunately, some Unix "unzip" 
   command-line programs may fail or refuse to extract zipfiles created 
   here that are larger than 2 (or 4) G.  Both the zip-extract.py script 
   here and Finder clicks on Mac OS handle such files correctly, and other 
   third-party unzippers may as well.  If none of these are an option you 
   may need to split your zip into halves/parts, but this is a last resort;
   if you can find or install any recent Python 2.X or 3.X on the unzip host, 
   it will generally suffice to run ziptools' zip-extract.py for large files.
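
   If you are unsure whether a zipfile falls in the size range where some
   unzip programs falter, a check like the one below may help. It is a
   sketch: the helper is hypothetical, and it uses a conservative 2G
   threshold taken from the note above:

```python
import os
import tempfile
import zipfile

LIMIT_2G = 2**31   # conservative: some unzip programs fail at 2G or 4G

def large_zip_risk(zippath):
    # True if the zipfile itself, or any item in it, reaches the threshold
    if os.path.getsize(zippath) >= LIMIT_2G:
        return True
    with zipfile.ZipFile(zippath) as zf:
        return any(info.file_size >= LIMIT_2G for info in zf.infolist())

with tempfile.TemporaryDirectory() as tmp:
    zippath = os.path.join(tmp, 'small.zip')
    # allowZip64=True (the default in Python 3.4+) permits ZIP64 extensions
    with zipfile.ZipFile(zippath, 'w', allowZip64=True) as zf:
        zf.writestr('tiny.txt', 'hello')
    risk = large_zip_risk(zippath)
print(risk)        # False
```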

ABOUT PERMISSIONS
   Permissions are applied only on extracts (when requested), not on creates
   here; creates always store permissions in the zip, even for symlinks [1.1].
   See the README and extract script for more on permissions propagation.
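
   On Unix-like hosts, the stored permissions live in the high 16 bits of
   each item's external_attr, and you can inspect them with the stat
   module. This sketch uses plain zipfile.ZipFile.write(), which records a
   regular file's mode. It does not show ziptools' symlink handling:

```python
import os
import stat
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'script.sh')
    with open(path, 'w') as f:
        f.write('echo hi')
    os.chmod(path, 0o750)                       # Unix-like hosts only
    zippath = os.path.join(tmp, 'perms.zip')
    with zipfile.ZipFile(zippath, 'w') as zf:
        zf.write(path, arcname='script.sh')     # mode saved in external_attr
    with zipfile.ZipFile(zippath) as zf:
        mode = zf.getinfo('script.sh').external_attr >> 16
print(oct(stat.S_IMODE(mode)), stat.filemode(mode))   # 0o750 -rwxr-x---
```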

See zip-extract.py for usage details on the zip-extraction companion script.
See ziptools/ziptools.py's docstring for more on this script's utility.

Coding notes, [1.1] auto-glob:
   - The addition expands any unexpanded * and ? wildcards and [] ranges:
     sources ['*.py', 'FILELINK?', 'plain', 'FILELINK[12]', 'FILE*INK?']
     expand the same as they do unquoted in a Unix shell, on any platform
     (though glob.glob filters out nonexistent names and omits any '.*').  
   - A "t = []; list(map(t.extend, globs))" would do the same as code below.
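
   Here is a small check of the equivalence noted above, using made-up
   glob results. itertools.chain appears only as a further option; this
   script does not use it:

```python
import operator
from functools import reduce
from itertools import chain

globs = [['a.py', 'b.py'], ['FILELINK1'], ['plain']]   # sample glob results

via_reduce = reduce(operator.add, globs)      # as in the code below
t = []
list(map(t.extend, globs))                    # the alternative noted above
via_chain = list(chain.from_iterable(globs))  # a further option

print(via_reduce == t == via_chain)           # True
```

   Without an initial value, reduce() raises TypeError on an empty list.
   The code below is safe because its outer list has one entry per source,
   and there is always at least one source at that point.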
==============================================================================
""" 

from __future__ import print_function         # py 2.X, currently optional here
import ziptools, sys, os                      # get ziptools/ package here

from __version__ import showVersion           # [1.3] display version number
showVersion()

# portability 
RunningOnPython2 = sys.version.startswith('2')
RunningOnWindows = sys.platform.startswith('win')

if RunningOnPython2:
    input = raw_input                         # py 2.X compatibility

import glob, operator                         # [1.1] autoglobs
if not RunningOnPython2:                      # reduce import required in py 3.X
    from functools import reduce              # import ok but spurious in py 2.X

# defaults: customize as desired
from ziptools import cruft_skip_keep 

# avoid Windows Unicode printing errors by munging [1.2] 
from ziptools import print        

usage = 'Usage: ' \
    '<python> zip-create.py ' \
        '[zipfile source [source...] [-skipcruft] [-atlinks] [-zip@path] [-nocompress]]'

interactive = False

# It makes no sense to try to keep the Windows console open on exit unless 
# interactive: command-line args imply that this is not an icon-click run.
# ['PROMPT' not in os.environ] loosely IDs icon click, but is overkill here.

def error_exit(message):
    print(message + ', run cancelled.')
    print(usage)
    if interactive and RunningOnWindows:
         input('Press Enter to close.')       # clicked on Windows: stay up
    sys.exit(1)

def okay_exit(message):
    print(message + '.')
    if interactive and RunningOnWindows:
         input('Press Enter to close.')       # ditto: stay open on Win
    sys.exit(0)                               # or os.isatty(sys.std{in,out})

def reply(prompt=''):
    if prompt: prompt += ' '
    try:
        return input(prompt)                  # exit gracefully on control+c [1.3] 
    except KeyboardInterrupt:
        okay_exit('\nRun aborted by control-c') 

# command-line mode
if len(sys.argv) >= 3:                        # 3 = script zipto source...

    skipcruft = {}
    if '-skipcruft' in sys.argv:              # anywhere in argv
        skipcruft = cruft_skip_keep
        sys.argv.remove('-skipcruft')

    atlinks = False
    if '-atlinks' in sys.argv:                # anywhere in argv
        atlinks = True
        sys.argv.remove('-atlinks')

    nocompress = False
    if '-nocompress' in sys.argv:             # anywhere in argv
        nocompress = True
        sys.argv.remove('-nocompress')

    # zip-at path [1.2]
    zipat = None
    zipix = [ix for (ix, val) in enumerate(sys.argv) if val.startswith('-zip@')]
    if len(zipix) > 1:
        error_exit('Only one -zip@ allowed')
    elif zipix:
        ziparg = sys.argv.pop(zipix[0])
        zipat = ziparg.split('@', 1)[1]       # '' okay (same as '.'); keeps '@'s in path
        
    if len(sys.argv) < 3:
        error_exit('Too few arguments')

    # [1.3] rstrip zipto, else makes dir/.zip (sources okay)
    zipto, sources = sys.argv[1], sys.argv[2:]
    zipto = zipto.rstrip(os.sep)
    zipto += '' if zipto[-4:].lower() == '.zip' else '.zip'
    
# some args, but not enough [1.1]
elif len(sys.argv) > 1:
    error_exit('Too few arguments')

# interactive mode (e.g., some IDEs)
else: 
    interactive = True
    zipto = reply('Zip file to create?').strip() or '_default'    # [1.1] +strip, dflt
    zipto = zipto.rstrip(os.sep)                                  # [1.3] else dir/.zip
    zipto += '' if zipto[-4:].lower() == '.zip' else '.zip'
    
    sources = reply('Items to zip (comma separated)?')
    sources = [source.strip() for source in sources.split(',')]

    skipcruft = reply('Skip cruft items (y=yes)?').lower() == 'y'
    skipcruft = cruft_skip_keep if skipcruft else {}

    atlinks = reply('Follow links to targets (y=yes)?').lower() == 'y'

    zipat = reply('Alternate zip path (.=unnested, enter=none)?') or None     # [1.2]

    nocompress = reply('Disable item compression (y=yes)?').lower() == 'y'    # [1.3]

    # use print() to avoid Unicode aborts in input() [1.2]
    print("About to ZIP\n"
          "\t%s,\n"
          "\tto %s,\n"
          "\t%s cruft,\n"
          "\t%sfollowing links,\n"
          "\tzip@ path %s,\n"
          "\t%scompressing items\n"
          "Confirm with 'y'? "
          % (sources, zipto,
             'skipping' if skipcruft else 'keeping',
             '' if atlinks else 'not ',
             '(unused)' if zipat is None else repr(zipat),
             'not ' if nocompress else ''),
          end='')

    verify = reply()
    if verify.lower() != 'y':
        okay_exit('Run cancelled')

# catch user errors asap [1.1]
for source in sources:
    if not any(c in source for c in '*?[') and not os.path.exists(source):
        error_exit('Source file "%s" does not exist' % source)

# auto-glob: expand unexpanded *, ?, [] (see coding notes above) [1.1]
sources = reduce(operator.add, [glob.glob(source) for source in sources])

# after globbing, patterns that matched nothing have dropped out
if not sources:
    error_exit('No existing source files provided')

# os.remove(zipto) not required: zipfile opens it in 'wb' mode

# the zip bit
stats = ziptools.createzipfile(
            zipto, sources, 
            cruftpatts=skipcruft, atlinks=atlinks, zipat=zipat, nocompress=nocompress)

okay_exit('Create finished: ' + str(stats))   # [1.1]



©M.Lutz