File: ziptools/ziptools/zip-extract.py
#!/usr/bin/python
r"""dec24: raw string, else py3.12+ flags \? and similar invalid escapes with SyntaxWarnings (future errors)
=============================================================================
zip-extract.py - a ziptools command-line client for unzipping zipfiles.
See ziptools' ./_README.html for license, attribution, and other logistics.
Extract (unzip) a zip file, with:
<python> zip-extract.py [zipfile [unzipto] [-nofixlinks] [-permissions] [-nomangle]]
Where:
"zipfile" is the pathname of an existing zipfile (a ".zip" is appended to
the end of this if missing)
"unzipto" is the pathname of a possibly existing folder where all unzipped
items will be stored (the default is ".", the current working directory)
"-nofixlinks", if used, prevents symbolic-link path separators from being
adjusted for the local platform (else they are, to make links portable)
"-permissions", if used, causes file-access permissions to be propagated to
extracted items; use this when unzipping files that originated on Unix [1.1]
"-nomangle", if used, prevents automatic replacement of nonportable filename
characters with "_" when extracts fail with unmangled names [1.3]
Arguments are input at console prompts if not listed on the command line.
For each item, the script's output lists both the zipfile (from) and the extracted
(to) names, the latter after a "=>" on a new line. Exception: as of Aug-2021,
items whose from and to pathnames are the same are displayed as a single
line to reduce output volume; this is common when extracting to the current
directory ("."). All other cases still display two output lines as before.
Control-c anywhere in interactive mode terminates a run not yet started.
<python> is your platform's optional Python identifier string. It may be
"python", "python3", or an alias on Unix; and "python", "py -3", or "py"
on Windows. It can also be omitted on Windows (to use a default), and on
Unix if this script has been given executable permission (e.g., after "chmod +x").
Some frozen app/executable packages may also omit <python>; see your docs.
The "unzipto" folder is created automatically if needed, but is cleaned
of its contents before the extract only if using interactive-prompts
mode here and cleaning is confirmed. Neither the base extract function
nor non-interactive mode here does any such cleaning. Remove the unzipto
folder's contents manually if needed before running this script.
Caution: cleaning may not make sense for ".", the current working dir.
This case is verified with prompts in interactive mode only, but that
is the only context in which auto-cleaning occurs.
Examples:
python zip-extract.py # input args
python zip-extract.py tests.zip # unzip to '.'
python zip-extract.py download.zip dirpath # unzip to other dir
python zip-extract.py dev.zip . -nofixlinks # don't adjust links
python zip-extract.py pkg.zip dirto -permissions # propagate permissions
python zip-extract.py pkg.zip dirto -nomangle # don't try to fix names
ABOUT LINKS AND OTHER FILE TYPES:
For symbolic links to both files and dirs, the ziptools package either
zips links themselves (by default), or the items they refer to (upon
request); this extract simply recreates whatever was added to the zip.
FIFOs and other exotica are never zipped or unzipped.
To make links more portable, path separators in link paths are automatically
adjusted for the hosting platform by default (e.g., '/' becomes '\' on
Windows); use "-nofixlinks" (which can appear anywhere on the command line)
to suppress this if you are unzipping on one platform for use on another.
See ziptools.py's main docstring for more details.
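Without "-nofixlinks", the adjustment amounts to rewriting the separators in each symlink's stored target path. A minimal sketch of the idea follows; localize_link() is a hypothetical helper, not ziptools' API, and it assumes any '\' in a zipped link path is a Windows separator rather than a Unix filename character (see ziptools.py for the real logic).

```python
import os

def localize_link(link_target, sep=os.sep):
    # Hypothetical helper: rewrite both '/' and '\' in a symlink's target
    # path to the given separator (the hosting platform's by default).
    # Assumes '\' in a zipped link path is a separator, not a name character.
    return link_target.replace('\\', sep).replace('/', sep)

print(localize_link('../docs/readme.txt', sep='\\'))    # ..\docs\readme.txt
print(localize_link('..\\docs\\readme.txt', sep='/'))   # ../docs/readme.txt
```

Passing sep explicitly makes the result the same on any host; a real extract would simply use the default.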
ABOUT TARGET PATHS:
For extracts, the Python zipfile module underlying this script discards
any special syntax in the archive's item names, including leading slashes,
Windows drive and UNC network names, and ".." up-references. The ziptools
symlink adder parrots the same behavior.
Hence, paths that were either absolute, rooted in a drive or network, or
parent-relative at zip time become relative to (and are created in) the
"unzipto" path here. Items zipped as "dir0", "/dir1", "C:\dir2", and
"..\dir3" are extracted to "dir0", "dir1", "dir2", and "dir3" in "unzipto".
Technically, zipfile's write() removes leading slashes, drive and
network names, and embedded ".." (they won't be in the zipfile), and its
extract() used here removes everything special, including leading "..".
Other zip tools may store anything in a zipfile, and may or may not be as
forgiving about leading "..", but the zip-create and zip-extract scripts
here are meant to work as a team.
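To see the stripping in action with Python's standard zipfile module alone (not ziptools), the sketch below stores item names verbatim with writestr(), which does not sanitize them, and then extracts them with extract(). The "C:\dir2" case is omitted because '\' is an ordinary filename character on Unix hosts; the demo folder and file names are illustrative.

```python
import os
import tempfile
import zipfile

def demo_extract_paths():
    # Zip items whose names are absolute or parent-relative, extract them,
    # and return where they landed, relative to the target folder
    with tempfile.TemporaryDirectory() as tmp:
        zippath = os.path.join(tmp, 'demo.zip')
        unzipto = os.path.join(tmp, 'unzipto')

        # writestr() stores these names as given (no slash/".." stripping)
        with zipfile.ZipFile(zippath, 'w') as zf:
            zf.writestr('dir0/a.txt', 'zero')
            zf.writestr('/dir1/b.txt', 'one')        # leading slash
            zf.writestr('../dir3/c.txt', 'three')    # parent up-reference

        # extract() drops empty, '.', and '..' components and drive names,
        # so every item is created inside unzipto
        created = []
        with zipfile.ZipFile(zippath) as zf:
            for info in zf.infolist():
                path = zf.extract(info, unzipto)
                rel = os.path.relpath(path, unzipto)
                created.append(rel.replace(os.sep, '/'))
        return sorted(created)

print(demo_extract_paths())   # ['dir0/a.txt', 'dir1/b.txt', 'dir3/c.txt']
```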
Note that all top-level items in the zipfile are extracted as top-level
items in the "unzipto" folder. A zipfile that contains just files will
not create nested folders in "unzipto"; a zipfile with folders will.
Caution: top-level items may silently overwrite items in "unzipto",
even in "."; unzip to temporary folders to avoid unwanted overwrites.
See also the 1.2 "-zip@path" create option for collapsing zip paths.
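One way to spot such overwrites before they happen is to compare a zipfile's top-level names against the target folder. The sketch below uses only Python's zipfile module; would_overwrite() is a hypothetical pre-check, not part of ziptools or this script, and it assumes item names use '/' separators and ignores drive prefixes.

```python
import io
import os
import tempfile
import zipfile

def top_level_names(zipsource):
    # Names that extraction would create directly inside "unzipto".
    # Loosely mirrors zipfile's extract(): empty, '.', and '..' path
    # components are skipped; Windows drive names are not handled here.
    tops = set()
    with zipfile.ZipFile(zipsource) as zf:
        for name in zf.namelist():
            parts = [p for p in name.split('/') if p not in ('', '.', '..')]
            if parts:
                tops.add(parts[0])
    return tops

def would_overwrite(zipsource, unzipto):
    # Hypothetical pre-check: top-level items that already exist in unzipto
    if not os.path.isdir(unzipto):
        return []
    return sorted(name for name in top_level_names(zipsource)
                  if os.path.lexists(os.path.join(unzipto, name)))

def demo_overwrite_check():
    # Build a small in-memory zipfile and a folder that already holds
    # two of its top-level items, then report the collisions
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr('a.txt', 'new a')
        zf.writestr('sub/b.txt', 'new b')
        zf.writestr('/c.txt', 'new c')
    with tempfile.TemporaryDirectory() as unzipto:
        with open(os.path.join(unzipto, 'a.txt'), 'w') as f:
            f.write('old a')
        os.mkdir(os.path.join(unzipto, 'sub'))
        buf.seek(0)
        return would_overwrite(buf, unzipto)

print(demo_overwrite_check())   # ['a.txt', 'sub']
```

Note that a colliding top-level folder such as "sub" is merged into, not replaced; only same-named files inside it are overwritten.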
Also note that ziptools assumes that path separators in zipfiles use
Unix '/' in accordance with the zip standard, and uses '/' in its own
creates (zips) on Windows. Tools which instead use '\' on Windows are
buggy and should be avoided; a '\' is a valid filename character on
Unix, and hence cannot be interpreted as a separator interoperably.
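The ambiguity is easy to demonstrate with pathlib, which applies each platform's separator rules to the same string. The zip_item_parts() helper is hypothetical and simply splits on '/' as the zip standard prescribes.

```python
from pathlib import PurePosixPath, PureWindowsPath

def zip_item_parts(item_name):
    # Per the zip standard, '/' is the only separator in item names,
    # so split on '/' alone; any '\' stays part of a name
    return [part for part in item_name.split('/') if part]

# The same string means different things under Unix and Windows path rules
print(PurePosixPath('dir\\file.txt').parts)     # ('dir\\file.txt',)
print(PureWindowsPath('dir\\file.txt').parts)   # ('dir', 'file.txt')
print(zip_item_parts('dir/sub/file.txt'))       # ['dir', 'sub', 'file.txt']
```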
ABOUT LARGE FILES:
ziptools always uses the ZIP64 option of Python's zipfile module to
support files larger than zip's former size limits, both for zipping and
unzipping. Unix "unzip" may not. See zip-create.py for more details.
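For reference, these are the relevant switches in Python's zipfile module; this is a sketch of the standard-library options, not ziptools' own code. allowZip64 has defaulted to True since Python 3.4, and force_zip64 on ZipFile.open() (Python 3.6+) covers streamed items whose final size is unknown.

```python
import io
import zipfile

def write_zip64_demo():
    # allowZip64=True (the default since Python 3.4) lets zipfile write
    # ZIP64 records when sizes or counts exceed the classic zip limits
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        # When streaming data of unknown size, force_zip64=True reserves
        # ZIP64 header space up front in case the item exceeds 2 GiB
        with zf.open('stream.bin', mode='w', force_zip64=True) as dest:
            for _ in range(4):
                dest.write(b'x' * 1024)
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return zf.getinfo('stream.bin').file_size

print(write_zip64_demo())   # 4096
```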
ABOUT PERMISSIONS:
(Former caveat: extracts here did not preserve Unix permissions due to
a Python zipfile bug; see extractzipfile() in ziptools/ziptools.py.)
UPDATE: as of [1.1], extracts (unzips) now do propagate Unix permissions
for files, folders, and symlinks, but only if this is requested with the
new "-permissions" argument. This should generally be used only on Unix
zipfiles; see ziptools/ziptools.py's extractzipfile() for more details.
Notes: some examples from prior releases may not show this new option;
permissions are requested on extracts only, not creations (which always
save permissions); and this option has no effect on filesystems that do
not support Unix-style permissions (including exFAT: you can copy a
zipfile to/from exFAT, but unzip on a different drive for permissions).
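Python's zipfile does not restore Unix permissions on extract, but they are saved in the upper 16 bits of each item's external_attr. A rough sketch of reapplying them follows, assuming items created on Unix (create_system == 3) and ignoring symlinks, which ziptools treats specially; extract_with_permissions() is hypothetical and is not ziptools' extractzipfile().

```python
import io
import os
import tempfile
import zipfile

def extract_with_permissions(zipsource, unzipto):
    # Hypothetical sketch: extract everything, then reapply the Unix mode
    # bits stored in the upper 16 bits of each item's external_attr.
    # Python's zipfile does not do this itself. Symlinks are not handled.
    pending = []
    with zipfile.ZipFile(zipsource) as zf:
        for info in zf.infolist():
            path = zf.extract(info, unzipto)
            mode = (info.external_attr >> 16) & 0o7777
            if mode and info.create_system == 3:   # 3 = made on Unix
                pending.append((path, mode))
    # Apply modes only after every item is written, so a read-only folder
    # cannot block writes into it; reversed order usually handles items
    # before the folders that contain them
    for path, mode in reversed(pending):
        os.chmod(path, mode)

def demo_permissions():
    # Zip one executable script, extract it, and return its mode bits
    info = zipfile.ZipInfo('run.sh')
    info.create_system = 3                    # mark as created on Unix
    info.external_attr = 0o100755 << 16       # regular file, rwxr-xr-x
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr(info, '#!/bin/sh\necho hi\n')
    with tempfile.TemporaryDirectory() as unzipto:
        extract_with_permissions(buf, unzipto)
        return os.stat(os.path.join(unzipto, 'run.sh')).st_mode & 0o777

if os.name == 'posix':
    print(oct(demo_permissions()))   # 0o755
```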
ABOUT MODTIMES:
(Former caveat: extracts here deferred to Python libraries to adjust
modtimes of zipped items for the local DST phase, which may or may not
have agreed with other tools, and did not address timezone changes.)
UPDATE: as of [1.2], ziptools now stores UTC timestamps for item
modtimes in zip extra fields, and uses them instead of zip's "local
time" on extracts. This means that modtimes of zipfiles zipped and
unzipped by ziptools are immune to changes in both DST and timezone.
For more details, see the README's "_README.html#utctimestamps12".
The former local-time scheme is still used for zipfiles without UTC.
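The README documents the exact fields ziptools writes. As a hedged illustration only, the sketch below assumes the common Info-ZIP "extended timestamp" extra field (ID 0x5455), whose flags byte is followed by a signed 32-bit UTC modtime, and returns None when it is absent so a caller can fall back to zip's local time; ziptools' own field layout may differ.

```python
import io
import struct
import zipfile

UT_FIELD = 0x5455   # Info-ZIP "extended timestamp" extra field ID (assumed)

def extra_fields(extra):
    # Split a zip extra-field blob into {header_id: payload}; each record
    # is a little-endian 2-byte ID and 2-byte size, then the payload
    fields, pos = {}, 0
    while pos + 4 <= len(extra):
        hid, size = struct.unpack('<HH', extra[pos:pos + 4])
        fields[hid] = extra[pos + 4:pos + 4 + size]
        pos += 4 + size
    return fields

def utc_modtime(info):
    # Return the UTC modtime (seconds since the epoch) from a "UT" field
    # whose flags byte has bit 0 set, else None so the caller can fall
    # back to the local-time info.date_time
    data = extra_fields(info.extra).get(UT_FIELD)
    if data and len(data) >= 5 and data[0] & 0x01:
        return struct.unpack('<i', data[1:5])[0]
    return None

def demo_utc():
    # date_time is zip's local-time stamp; the UT field carries UTC
    info = zipfile.ZipInfo('f.txt', date_time=(2023, 11, 14, 22, 13, 20))
    info.extra = struct.pack('<HHBi', UT_FIELD, 5, 0x01, 1700000000)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr(info, 'data')
    with zipfile.ZipFile(buf) as zf:
        return utc_modtime(zf.getinfo('f.txt'))

print(demo_utc())   # 1700000000
```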
ABOUT FILENAME MANGLES:
Filenames containing "|", "?", ":", and others are nonportable, and
cannot be saved on some filesystems by unzips. As of 1.3, this script
by default automatically mangles (sanitizes) names that fail on Windows
only, by replacing all nonportable filename characters with "_" and
trying to extract again.
This name mangling allows saves, but has a rare potential to overwrite
other files and break later syncs. To make this transparent, ziptools
reports mangles and their tallies in run output. To avoid mangles
entirely, pass "-nomangle" and run the included fix-nonportable-filenames.py
to analyze and fix nonportable names manually before content transfers.
With "-nomangle", unmangled filenames with characters illegal on the
unzip target will fail to extract and be skipped with a message.
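A minimal sketch of the try-then-mangle flow described above. The character set, mangle(), and save_item() are assumptions for illustration (ziptools' own list, platform checks, and reporting differ); the original name is tried first, and the mangled one only when the filesystem rejects it and "-nomangle" is not in effect.

```python
import os
import re
import tempfile

# Assumed set: characters Windows rejects in filenames, plus ASCII
# control characters (ziptools' own list may differ)
NONPORTABLE = re.compile(r'[<>:"|?*\x00-\x1f]')

def mangle(item_name):
    # Replace each nonportable character with '_' in every '/'-separated part
    return '/'.join(NONPORTABLE.sub('_', part) for part in item_name.split('/'))

def save_item(unzipto, item_name, data, nomangle=False):
    # Hypothetical save step: try the zipped name first, then the mangled
    # name if the filesystem rejects it (unless nomangle). Returns the name
    # actually saved, or None if the item was skipped.
    candidates = [item_name]
    if not nomangle and mangle(item_name) != item_name:
        candidates.append(mangle(item_name))
    for name in candidates:
        path = os.path.join(unzipto, *name.split('/'))
        try:
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, 'wb') as f:
                f.write(data)
            return name
        except OSError as exc:
            last_error = exc
    print('Skipped %r: %s' % (item_name, last_error))
    return None

def demo_save():
    with tempfile.TemporaryDirectory() as tmp:
        return save_item(tmp, 'docs/notes.txt', b'text')

print(mangle('what?|why:.txt'))   # what__why_.txt
print(demo_save())                # docs/notes.txt
```

Because several names can mangle to the same result ("a?b" and "a|b" both become "a_b"), the overwrite risk noted above is inherent to this approach.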
Shared storage in some versions of Android has filename constraints
similar to Windows, but no auto-mangling is performed on this platform
due to an Android 11 bug; run the fixer script before unzipping to
shared storage. See also _README.html#nomangle for more details.
CAVEAT - PORTABILITY:
Unzipped symlinks work on Windows, but don't retain modtimes on
that platform (they are stamped with the unzip time instead), due to
known/fixed limitations. See the _README.html's symlinks coverage.
UPDATE: as of [1.1], there's more thorough coverage of portability
issues like this in the README's "_README.html#Portability".
See zip-create.py for usage details on the zip-creation companion script.
See ziptools/ziptools.py's docstring for more on this script's utility.
Coding note: the "Do not localize" negative logic is too late to change...
=============================================================================
"""
from __future__ import print_function # py 2.X, currently optional here
import ziptools, sys, os
from __version__ import showVersion # [1.3] display version number
showVersion()
# portability
RunningOnPython2 = sys.version.startswith('2')
RunningOnWindows = sys.platform.startswith('win')
if RunningOnPython2:
    input = raw_input                            # py 2.X compatibility
# avoid Windows Unicode printing errors by munging [1.2]
from ziptools import print
usage = 'Usage: ' \
'<python> zip-extract.py [zipfile [unzipto] [-nofixlinks] [-permissions] [-nomangle]]'
interactive = False
# see zip-create.py note about Windows icon clicks and exits
def error_exit(message):
    print(message + ', run cancelled.')
    print(usage)
    if interactive and RunningOnWindows:
        input('Press enter to close.')           # clicked on Windows: stay up
    sys.exit(1)

def okay_exit(message):
    print(message + '.')
    if interactive and RunningOnWindows:
        input('Press Enter to close.')           # ditto: stay open on Win
    sys.exit(0)                                  # or os.isatty(sys.std{in,out})

def reply(prompt=''):
    if prompt: prompt += ' '
    try:
        return input(prompt)                     # exit gracefully on control+c [1.3]
    except KeyboardInterrupt:
        okay_exit('\nRun aborted by control-c')
# command-line mode
if len(sys.argv) >= 2:                           # 2 = script zipfile...
    nofixlinks = permissions = nomangle = False
    if '-nofixlinks' in sys.argv:                # anywhere in argv
        nofixlinks = True
        sys.argv.remove('-nofixlinks')
    if '-permissions' in sys.argv:               # anywhere in argv
        permissions = True
        sys.argv.remove('-permissions')
    if '-nomangle' in sys.argv:                  # anywhere in argv
        nomangle = True
        sys.argv.remove('-nomangle')
    if len(sys.argv) not in [2, 3]:
        error_exit('Too few or too many arguments')
    zipfrom = sys.argv[1]
    zipfrom += '' if zipfrom[-4:].lower() == '.zip' else '.zip'
    unzipto = '.' if len(sys.argv) == 2 else sys.argv[2]
    if unzipto.startswith('-'):
        error_exit('Too few or too many arguments')
# interactive mode (e.g., some IDEs)
else:
    interactive = True
    zipfrom = reply('Zip file to extract?').strip() or '_default'     # [1.1] +stp/dft
    zipfrom += '' if zipfrom[-4:].lower() == '.zip' else '.zip'
    unzipto = reply('Folder to extract in (use . for here)?').strip() or '.'
    nofixlinks = reply('Do not localize symlinks (y=yes)?').lower() == 'y'
    permissions = reply('Retain access permissions (y=yes)?').lower() == 'y'
    nomangle = reply('Do not mangle filenames (y=yes)?').lower() == 'y'
    # use print() to avoid Unicode aborts in input() [1.2]
    print("About to UNZIP\n"
          "\t%s,\n"
          "\tto %s,\n"
          "\t%slocalizing any links,\n"
          "\t%sretaining permissions,\n"
          "\t%smangling filenames\n"
          "Confirm with 'y'? "
          % (zipfrom, unzipto,
             'not ' if nofixlinks else '',
             'not ' if not permissions else '',
             'not ' if nomangle else ''),
          end='')
    verify = reply()
    if verify.lower() != 'y':
        okay_exit('Run cancelled')
# catch user errors asap [1.1]
if not os.path.exists(zipfrom):
    error_exit('Zipfile "%s" does not exist' % zipfrom)
if not os.path.exists(unzipto):
    # no need to create here: zipfile.extract() does os.makedirs(unzipto)
    pass
else:
    # in interactive mode, offer to clean target folder (ziptools.py doesn't);
    # removing only items to be written requires scanning the zipfile: pass;
    if (interactive and
        reply('Clean target folder first (yes=y)?').lower() == 'y'):
        # okay, but really? (abspath also catches "./", "sub/..", etc.)
        if (os.path.abspath(unzipto) == os.getcwd() and
            reply('Target = "." cwd - really clean (yes=y)?').lower() != 'y'):
            # a very bad thing to do silently!
            pass
        else:
            # proceed with cleaning
            for item in os.listdir(unzipto):
                itempath = os.path.join(unzipto, item)
                if os.path.isfile(itempath) or os.path.islink(itempath):
                    os.remove(ziptools.FWP(itempath))
                elif os.path.isdir(itempath):
                    ziptools.tryrmtree(itempath)
# the zip bit
stats = ziptools.extractzipfile(
zipfrom, unzipto,
nofixlinks=nofixlinks, permissions=permissions, nomangle=nomangle)
okay_exit('Extract finished: ' + str(stats)) # [1.1]