File: mergeall-products/unzipped/test/ziptools/zip-extract.py

#!/usr/bin/python
"""
=============================================================================
zip-extract.py - a ziptools command-line client for unzipping zipfiles.
See ziptools' ./_README.html for license, attribution, and other logistics.

Extract (unzip) a zip file, with:

   <python> zip-extract.py [zipfile [unzipto] [-nofixlinks] [-permissions] [-nomangle]]

Where:
   "zipfile" is the pathname of an existing zipfile (a ".zip" is appended to
   the end of this if missing)

   "unzipto" is the pathname of a possibly existing folder where all unzipped
   items will be stored (the default is ".", the current working directory)

   "-nofixlinks", if used, prevents symbolic-link path separators from being
   adjusted for the local platform (else they are, to make links portable)

   "-permissions", if used, causes file-access permissions to be propagated to
   extracted items; use this when unzipping files that originated on Unix [1.1]

   "-nomangle", if used, prevents automatic replacement of nonportable filename
   characters with "_" when extracts fail with unmangled names [1.3]

Arguments are input at console prompts if not listed on the command line.
For each item, the script's output lists both zipfile (from) and extracted
(to) name, the latter after a "=>" on a new line.  Exception: as of Aug-2021,
items whose from and to pathnames are the same are displayed as a single 
line to reduce output volume; this is common when extracting to the current 
directory (".").  All other cases still display two output lines as before.
Control-c anywhere in interactive mode terminates a run not yet started.

<python> is your platform's optional Python identifier string.  It may be 
"python", "python3", or an alias on Unix; and "python", "py -3", or "py" 
on Windows.  It can also be omitted on Windows (to use a default), and on 
Unix given executable permission for this script (e.g., post "chmod +x").
Some frozen app/executable packages may also omit <python>; see your docs.

The "unzipto" folder is created automatically if needed, but is cleaned
of its contents before the extract only if using interactive-prompts 
mode here and cleaning is confirmed.  Neither the base extract function 
nor non-interactive mode here does any such cleaning.  Remove the unzipto
folder's contents manually if needed before running this script.

Caution: cleaning may not make sense for ".", the current working dir.
This case is verified with prompts in interactive mode only, but that 
is the only context in which auto-cleaning occurs.

Examples:
   python zip-extract.py                             # input args
   python zip-extract.py tests.zip                   # unzip to '.'
   python zip-extract.py download.zip dirpath        # unzip to other dir
   python zip-extract.py dev.zip  . -nofixlinks      # don't adjust links
   python zip-extract.py pkg.zip dirto -permissions  # propagate permissions 
   python zip-extract.py pkg.zip dirto -nomangle     # don't try to fix names 

ABOUT LINKS AND OTHER FILE TYPES:
   For symbolic links to both files and dirs, the ziptools package either
   zips links themselves (by default), or the items they refer to (upon
   request); this extract simply recreates whatever was added to the zip.
   FIFOs and other exotica are never zipped or unzipped.
 
   To make links more portable, path separators in link paths are automatically
   adjusted for the hosting platform by default (e.g., '/' becomes '\' on
   Windows); use "-nofixlinks" (which can appear anywhere on the command line)
   to suppress this if you are unzipping on one platform for use on another.
   See ziptools.py's main docstring for more details.

ABOUT TARGET PATHS:
   For extracts, the Python zipfile module underlying this script discards
   any special syntax in the archive's item names, including leading slashes,
   Windows drive and UNC network names, and ".." up-references.  The ziptools
   symlink adder parrots the same behavior.

   Hence, paths that were either absolute, rooted in a drive or network, or
   parent-relative at zip time become relative to (and are created in) the
   "unzipto" path here.  Items zipped as "dir0", "/dir1", "C:\dir2", and
   "..\dir3" are extracted to "dir0", "dir1", "dir2", and "dir3" in "unzipto".

   Technically, zipfile's write() removes leading slashes, drive and
   network names, and embedded ".." (they won't be in the zipfile), and its
   extract() used here removes everything special, including leading "..".  
   Other zip tools may store anything in a zipfile, and may or may not be as 
   forgiving about leading "..", but the zip-create and zip-extract scripts 
   here are meant to work as a team.
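
   The extract-side behavior described above can be checked with a minimal
   sketch that uses only Python 3's standard zipfile module, not ziptools
   itself.  The helper name extracted_paths and the item names are made up
   for illustration; Windows drive and UNC names are left out because
   their handling depends on the hosting platform.

```python
import os
import tempfile
import zipfile

def extracted_paths(stored_names):
    # Build a throwaway zipfile whose item names are stored as given,
    # extract it to a temporary folder, and list what was created there.
    with tempfile.TemporaryDirectory() as workdir:
        zippath = os.path.join(workdir, 'demo.zip')
        unzipto = os.path.join(workdir, 'unzipto')
        with zipfile.ZipFile(zippath, 'w') as zf:
            for name in stored_names:
                zf.writestr(name, 'data')        # name is not sanitized here
        with zipfile.ZipFile(zippath) as zf:
            zf.extractall(unzipto)               # special syntax dropped here
        found = []
        for folder, subdirs, files in os.walk(unzipto):
            for filename in files:
                fullpath = os.path.join(folder, filename)
                relpath = os.path.relpath(fullpath, unzipto)
                found.append(relpath.replace(os.sep, '/'))
        return sorted(found)

print(extracted_paths(['dir0/a.txt', '/dir1/b.txt', '../dir3/c.txt']))
```

   In this sketch, all three items should land under the temporary
   "unzipto" folder, in "dir0", "dir1", and "dir3" respectively.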

   Note that all top-level items in the zipfile are extracted as top-level
   items in the "unzipto" folder.  A zipfile that contains just files will
   not create nested folders in "unzipto"; a zipfile with folders will.
   Caution: top-level items may silently overwrite items in "unzipto", 
   even in "."; unzip to temporary folders to avoid unwanted overwrites. 
   See also the 1.2 "-zip@path" create option for collapsing zip paths.

   Also note that ziptools assumes that path separators in zipfiles use 
   Unix '/' in accordance with the zip standard, and uses '/' in its own 
   creates (zips) on Windows.  Tools which instead use '\' on Windows are  
   buggy and should be avoided; a '\' is a valid filename character on
   Unix, and hence cannot be interpreted as a separator interoperably.

ABOUT LARGE FILES:
   ziptools always uses the ZIP64 option of Python's zipfile module to 
   support files larger than zip's former size limits, both for zipping and 
   unzipping.  Unix "unzip" may not.  See zip-create.py for more details.

ABOUT PERMISSIONS:
   (Former caveat: extracts here did not preserve Unix permissions due to 
   a Python zipfile bug; see extractzipfile() in ziptools/ziptools.py.)
  
   UPDATE: as of [1.1], extracts (unzips) now do propagate Unix permissions
   for files, folders, and symlinks, but only if this is requested with the 
   new "-permissions" argument.  This should generally be used only on Unix
   zipfiles; see ziptools/ziptools.py's extractzipfile() for more details.

   Notes: some examples from prior releases may not show this new option;
   permissions are requested on extracts only, not creations (which always
   save permissions); and this option has no effect on filesystems that do 
   not support Unix-style permissions (including exFAT: you can copy a 
   zipfile to/from exFAT, but unzip on a different drive for permissions).
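
   As background for the permissions option, the following sketch shows
   where Unix permission bits normally live in a zipfile entry, using only
   the standard zipfile module.  It is not ziptools' own code; the helper
   name stored_permissions and the item name are hypothetical, and see
   extractzipfile() for what ziptools actually does.

```python
import io
import stat
import zipfile

def stored_permissions(zipbytes, name):
    # Return the Unix permission bits recorded for one item in a zipfile.
    with zipfile.ZipFile(io.BytesIO(zipbytes)) as zf:
        info = zf.getinfo(name)
        unixmode = info.external_attr >> 16      # high 16 bits: Unix st_mode
        return stat.S_IMODE(unixmode)            # keep the permission bits only

# Build a tiny in-memory zipfile with one item marked rwxr-xr-x (0o755).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    item = zipfile.ZipInfo('script.sh')
    item.external_attr = (stat.S_IFREG | 0o755) << 16
    zf.writestr(item, 'echo hello')

perms = stored_permissions(buffer.getvalue(), 'script.sh')
print(oct(perms))

# An extract that propagates permissions could then apply these bits with
# os.chmod(extractedpath, perms) on filesystems that support them.
```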

ABOUT MODTIMES:
   (Former caveat: extracts here deferred to Python libraries to adjust 
   modtimes of zipped items for the local DST phase, which may or may not
   have agreed with other tools, and did not address timezone changes.)

   UPDATE: as of [1.2], ziptools now stores UTC timestamps for item
   modtimes in zip extra fields, and uses them instead of zip's "local 
   time" on extracts.  This means that modtimes of zipfiles zipped and 
   unzipped by ziptools are immune to changes in both DST and timezone.  
   For more details, see the README's "_README.html#utctimestamps12".
   The former local-time scheme is still used for zipfiles without UTC. 
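
   Why UTC timestamps are immune to DST and timezone changes can be seen
   in a small sketch with the standard time and calendar modules.  It does
   not show ziptools' extra-field format; the function names and the
   sample timestamp are chosen only for illustration.

```python
import calendar
import time

def utc_stamp_to_tuple(stamp):
    # Zone-independent: a given stamp always maps to the same UTC tuple.
    return tuple(time.gmtime(stamp)[:6])

def local_tuple_to_stamp(date_time):
    # Zone-dependent: the same tuple maps to different stamps on hosts
    # with different timezone or DST settings (zip's local-time scheme).
    return time.mktime(tuple(date_time) + (0, 0, -1))

stamp = 1600000000                               # seconds since the epoch, UTC
utc_tuple = utc_stamp_to_tuple(stamp)
roundtrip = calendar.timegm(utc_tuple)           # always recovers the stamp

print(utc_tuple, roundtrip)
print(local_tuple_to_stamp(utc_tuple))           # varies with the host's zone
```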

ABOUT FILENAME MANGLES:
   Filenames containing "|", "?", ":", and others are nonportable, and
   cannot be saved on some filesystems by unzips.  As of 1.3, this script
   by default automatically mangles (sanitizes) names that fail on Windows
   only, by replacing all nonportable filename characters with "_" and 
   trying to extract again.
   
   This name mangling allows saves, but has a rare potential to overwrite 
   other files and break later syncs.  To make this transparent, ziptools 
   reports mangles and their tallies in run output.  To avoid mangles in 
   full, pass "-nomangle" and run the included fix-nonportable-filenames.py
   to analyze and fix nonportable names manually before content transfers. 
   With "-nomangle", unmangled filenames with characters illegal on the 
   unzip target will fail to extract and be skipped with a message.
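
   The try, mangle, and retry idea above can be sketched as follows.  This
   is not ziptools' implementation: the character set, the function names,
   and the fake save function are all assumptions made for illustration.

```python
# Hypothetical set of characters treated as nonportable in this sketch.
NONPORTABLE = '<>:|?*'

def mangle_name(filename):
    # Replace every nonportable character with an underscore.
    return ''.join('_' if ch in NONPORTABLE else ch for ch in filename)

def save_with_fallback(save, filename, nomangle=False):
    # Try the original name first; on failure, retry with a mangled name
    # unless mangling is disabled, in which case the item is skipped.
    try:
        save(filename)
        return filename
    except OSError:
        if nomangle:
            return None
        mangled = mangle_name(filename)
        save(mangled)
        return mangled

def fake_save(filename):
    # Stand-in for a filesystem that rejects nonportable characters.
    if any(ch in NONPORTABLE for ch in filename):
        raise OSError('cannot save ' + filename)

print(save_with_fallback(fake_save, 'notes|draft?.txt'))
print(save_with_fallback(fake_save, 'notes|draft?.txt', nomangle=True))
```

   Note that two different names can mangle to the same result, which is
   the overwrite risk described above.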

   Shared storage in some versions of Android has filename constraints 
   similar to Windows, but no auto-mangling is performed on this platform 
   due to an Android 11 bug; run the fixer script before unzipping to 
   shared storage.  See also _README.html#nomangle for more details.

CAVEAT - PORTABILITY: 
   Unzipped symlinks work on Windows, but don't retain modtimes on
   that platform (they are stamped with the unzip time instead), due to 
   known/fixed limitations.  See the _README.html's symlinks coverage.

   UPDATE: as of [1.1], there's more thorough coverage of portability 
   issues like this in the README's "_README.html#Portability".

See zip-create.py for usage details on the zip-creation companion script.
See ziptools/ziptools.py's docstring for more on this script's utility.
Coding note: the "Do not localize" negative logic is too late to change...
=============================================================================
"""

from __future__ import print_function         # py 2.X, currently optional here
import ziptools, sys, os

from __version__ import showVersion           # [1.3] display version number
showVersion()

# portability 
RunningOnPython2 = sys.version.startswith('2')
RunningOnWindows = sys.platform.startswith('win')

if RunningOnPython2:
    input = raw_input                         # py 2.X compatibility

# avoid Windows Unicode printing errors by munging [1.2] 
from ziptools import print        

usage = 'Usage: ' \
    '<python> zip-extract.py [zipfile [unzipto] [-nofixlinks] [-permissions] [-nomangle]]'

interactive = False

# see zip-create.py note about Windows icon clicks and exits

def error_exit(message):
    print(message + ', run cancelled.')
    print(usage)
    if interactive and RunningOnWindows:
        input('Press Enter to close.')        # clicked on Windows: stay up
    sys.exit(1)

def okay_exit(message):
    print(message + '.')
    if interactive and RunningOnWindows:
        input('Press Enter to close.')        # ditto: stay open on Win
    sys.exit(0)                               # or os.isatty(sys.std{in,out})

def reply(prompt=''):
    if prompt: prompt += ' '
    try:
        return input(prompt)                  # exit gracefully on control+c [1.3] 
    except KeyboardInterrupt:
        okay_exit('\nRun aborted by control-c') 

# command-line mode
if len(sys.argv) >= 2:                        # 2 = script zipfile...
    nofixlinks = permissions = nomangle = False

    if '-nofixlinks' in sys.argv:             # anywhere in argv
        nofixlinks = True
        sys.argv.remove('-nofixlinks')

    if '-permissions' in sys.argv:            # anywhere in argv
        permissions = True
        sys.argv.remove('-permissions')

    if '-nomangle' in sys.argv:               # anywhere in argv
        nomangle = True
        sys.argv.remove('-nomangle')

    if len(sys.argv) not in [2, 3]:
        error_exit('Too few or too many arguments')

    zipfrom = sys.argv[1]
    zipfrom += '' if zipfrom[-4:].lower() == '.zip' else '.zip'
    unzipto = '.' if len(sys.argv) == 2 else sys.argv[2]

    if unzipto.startswith('-'):
        error_exit('Too few or too many arguments')

# interactive mode (e.g., some IDEs)
else:
    interactive = True
    zipfrom     = reply('Zip file to extract?').strip() or '_default'   # [1.1] strip input, use default
    zipfrom    += '' if zipfrom[-4:].lower() == '.zip' else '.zip'
    unzipto     = reply('Folder to extract in (use . for here) ?').strip() or '.'
    nofixlinks  = reply('Do not localize symlinks (y=yes)?').lower() == 'y'
    permissions = reply('Retain access permissions (y=yes)?').lower() == 'y'
    nomangle    = reply('Do not mangle filenames (y=yes)?').lower() == 'y'

    # use print() to avoid Unicode aborts in input() [1.2]
    print("About to UNZIP\n"
                   "\t%s,\n"
                   "\tto %s,\n"
                   "\t%slocalizing any links,\n"
                   "\t%sretaining permissions,\n"
                   "\t%smangling filenames\n"
          "Confirm with 'y'? "
   
               % (zipfrom, unzipto, 
                      'not ' if nofixlinks else '',
                      'not ' if not permissions else '',
                      'not ' if nomangle else ''), 
          end='')

    verify = reply()
    if verify.lower() != 'y':
        okay_exit('Run cancelled')

# catch user errors asap [1.1]
if not os.path.exists(zipfrom):
    error_exit('Zipfile "%s" does not exist' % zipfrom) 

if not os.path.exists(unzipto):
    # no need to create here: zipfile.extract() does os.makedirs(unzipto)
    pass
else:
    # in interactive mode, offer to clean target folder (ziptools.py doesn't);
    # removing only the items to be written would require a zipfile scan: not done;
    if (interactive and
        reply('Clean target folder first (yes=y)?').lower() == 'y'):
        # okay, but really?
        if (os.path.abspath(unzipto) == os.path.abspath(os.getcwd()) and
            reply('Target = "." cwd - really clean (yes=y)?').lower() != 'y'):
            # a very bad thing to do silently!
            pass
        else:
            # proceed with cleaning
            for item in os.listdir(unzipto):
                itempath = os.path.join(unzipto, item)
                if os.path.isfile(itempath) or os.path.islink(itempath):
                    os.remove(ziptools.FWP(itempath))
                elif os.path.isdir(itempath):
                    ziptools.tryrmtree(itempath)

# the zip bit
stats = ziptools.extractzipfile(
            zipfrom, unzipto, 
            nofixlinks=nofixlinks, permissions=permissions, nomangle=nomangle)

okay_exit('Extract finished: ' + str(stats))    # [1.1]


