#!/usr/bin/env python3 # -*- coding: utf8 -*- """ ================================================================================ ziptools.py - the main library module of the ziptools system. See ziptools' ../_README.html for license, attribution, and other logistics. Tools to create and extract zipfiles containing a set of files, folders, and symbolic links. All functions here are callable, but the main top-level entry points are these two (see ahead for more on their arguments): createzipfile(zipname, [addnames], storedirs=True, cruftpatts={}, atlinks=False, trace=print, zipat=None, nocompress=False) extractzipfile(zipname, pathto='.', nofixlinks=False, trace=print, permissions=False, nomangle=False) Pass "trace=lambda *p, **k: None" to silence most messages from these calls. See also scripts zip-create.py and zip-extract.py for command-line clients, and zipcruft.cruft_skip_keep for a default "cruftpatts" cruft-file definition. All of these have additional documentation omitted here. This ziptools package mostly extends Python's zipfile module with top-level convenience tools that add some important missing features: * For folders, adds the folder's entire tree to the zipfile automatically * For zipfile creates, filters out cruft (hidden metadata) files on request * For zipfile extracts, retains original modtimes for files, folders, links * For symlinks, adds/recreates the link itself to/from zipfiles, by default * For Windows, supports long pathnames by lifting the normal length limit * For zipfile extracts, optionally retains access permissions for all items * For all items, adds UTC-timestamp modtimes immune to DST and timezone Docs which span creates and extracts (see these functions for more): CRUFT HANDLING: This script sidesteps other tools' issues with ".*" cruft files (metadata that is normally hidden): by default, they are not silently/implicitly omitted in zips here for completeness, but can be omitted by passing a filename-patterns definition structure to the optional "cruftpatts" argument. See zipcruft.py for pattern defaults to import and pass, and zipfile-create.py for more background. Most end-user zips should skip cruft files (see Mergeall: cruft can be a major issue on Mac OS in data to be transferred elsewhere). WINDOWS LONG PATHS: This program, like Mergeall, supports pathnames that are not restricted to the usual 260/248-character length limit on all versions of Windows. To lift the limit, pathnames are automatically prefixed with '\\?\' as needed, both when adding to and extracting from archives. This allows archives created on platforms without such limits to be unzipped and rezipped on Windows machines. The \\?\-prefix magic is internal, and hidden from the user as much as possible. It also makes paths absolute, but relative paths are still fully supported: absolute paths are not propagated to the archive when creating, and impact only output messages on Windows when extracting (where we strip the prefix and try to map back to relative as needed to keep the messages coherent). SYMLINKS SUPPORT: This package also supports adding symlinks (symbolic links) to and extracting them from zip archives, on both Unix and Windows with Python 3.X, but only on Unix with Python 2.X. Windows requires admin permissions and NTFS filesystem destinations to create symlinks from a zip file; Unix does not. The underlying Python zipfile module doesn't support symlinks directly today, short of employing the very low-level magic used in ziptools_symlinks.py here, and there is an open bug report to improve this: https://bugs.python.org/issue18595 https://mail.python.org/pipermail/python-list/2005-June/322179.html https://duckduckgo.com/?q=python+zipfile+symlink Symlinks customize messages with "~" characters in creation and "(Link)" prefixes in extraction, because they are a special-enough case to call out in logs, and may require special permission and handling to unzip and use on Windows. For example, link creation and extraction messages are as follows: Adding link ~folder test1/dirlink # create message (Link) Extracted test1/dirlink # extract message By default, zipfile creation zips links themselves verbatim, not the items they refer to. Pass True to the "atlinks" function argument to instead follow links and zip the items they refer to. Unzipping restores whatever was zipped. When links are copied verbatim, extracts adjust the text of a link's path to use the hosting platform's separators - '\' for Windows and '/ for Unix. This provides some degree of link portability between Unix and Windows, but is switchable with "nofixlinks" because it may not be desirable in all contexts (e.g., when unzipping to a drive to be used elsewhere). Symlinks will still be nonportable if they contain other platform-specific syntax, such as Windows drive letters or UNC paths, or use absolute references to extra-archive items. When "atlinks" is used to follow links and copy items they refer to, recursive links are detected on platforms and Pythons that support stat objects' st_ino (a.k.a. inode) unique directory identifiers. This includes all Unix contexts, and Windows as of Python 3.2 (other contexts fails on path or memory errors). Recursive links are copied themselves, verbatim, to avoid loops and errors. Besides symlinks, FIFOs and other exotic items are always skipped and ignored; items with system state can't be transferred in a zipfile (or Mergeall mirror). See also the top-level README for more on symlinks. Due to known limitations, symlinks on Windows function correctly but do not retain original modtimes. PYTHON SYMLINKS SUPPORT: The following table summarizes Python's current symlink support, which wholly determines that of ziptools. In it, IO means basic os.{readlink, symlink} read/write calls, lchmod is the os module's permissions writer, and S_F_S stands for the os.supports_follow_symlinks registration table of tools that can work on symlinks instead of their referents. utime is required for modtimes, and chmod or lchmod for permissions, and there is no lutime: Platform Python IO lchmod S_F_S S_F_S content Windows 2.X no no no n/a Windows 3.X yes no yes os.stat only Unix 2.X yes yes no n/a Unix 3.X yes yes yes os.{utime, chmod,...} Per this table, Unix gets the best coverage, though its 2.X cannot propagate symlink modtimes because it has no symlink utime, and must use lchmod instead of chmod. Windows support is spottier: 2.X has none, and 3.X cannot update modtimes or permissions. Worse, os.path.islink() simply returns False for symlinks on Windows+Python 2.X, which means symlinks cannot be detected, and are followed on zips (creates); use 3.X if you care. Nits: Python 3.X has S_F_S only in 3.3 and later, and all of this is may change (and hopefully will...). PERMISSIONS SUPPORT: As of ziptools version [1.1], extracts (unzips) now propagate Unix permissions for files, folders, and symlinks. Due to interoperability concerns, however, this option only works if it is explicitly selected, and should generally be used only for Unix from+to. See the extract function below for more details. FAILURES POLICY: Per the user guide, ziptools' general policy is to report but ignore failures of permissions, modtimes, and symlinks (because they are metadata), but end the run for failures of files and folders (because they are core data). ================================================================================ """ from __future__ import print_function # py 2.X compatibility import os, sys, shutil from zipfile import ZipFile # stdlib base support from zipfile import ZIP_DEFLATED, ZIP_STORED # compressed, or not # nested package so same if used as main script or package in py3.X # default cruft-file patterns, import here for importers ({}=don't skip) from .zipcruft import cruft_skip_keep, isCruft # [1.1] isCruft moved # major workaround to support links: split this narly code off to a module... from .zipsymlinks import addSymlink, isSymlink, extractSymlink # also major: fix any too-long Windows paths on archive adds and extracts from .ziplongpaths import FWP, UFWP # UTC timestamp zip/unzip [1.2] from .zipmodtimeutc import addModtimeUTC, getModtimeUTCorLocal # interoperability nits [1.1] RunningOnPython2 = sys.version.startswith('2') RunningOnMacOS = sys.platform.startswith('darwin') RunningOnWindows = sys.platform.startswith('win') #=============================================================================== _builtinprint = print def print(*pargs, **kargs): r""" ----------------------------------------------------------------------- [1.1] Avoid print() exceptions on Python 2.X and Windows. This code redefines print for the rest of this module only, but the custom print becomes the default for "trace" arguments here; you can import this and pass it back in, but shouldn't need to ("trace" is mostly for silencing). This code addresses two potentials for aborts (print() exceptions): 1) On Python 2.X, prints of non-ASCII Unicode to a pipe on Unix can throw a to-ASCII encoding exception, even though the same print to the console works fine. Work around by manually encoding unicode arguments to UTF-8 when printed. This was escalated by forcing unicode filenames for zip interoperability, but the exceptions also happened if 2.X unzipped a 3.X zip and received decoded unicode for non-ASCII names. 2) On Windows, printing Unicode is perilous in general. The user may have set PYTHONIOENCODING to UTF-8, but this is too much to require; the Windows default CP437 cannot be assumed because it may not apply on the host; and printing unsupported characters triggers an encoding exception in any case. To avoid exceptions, replace all non-ASCII characters for message display only, and specialize for 3.X / 2.X str (e.g., 'Li[\u241][\u241]ux.png' / 'Li[\xC3][\xB1][\xC3][\xB1]ux.png'). This could use ascii() in 3.X (only), but opted to match 2.X displays. mungestr2X doesn't work on 3.X bytes, but this doesn't have to care. [1.2] This is now also imported and used by interactive-mode prints in main scripts, to avoid similar (though unlikely) non-ASCII print aborts. This applies only to prints on Windows in this context: input() text is a str in 2.X, and so need not be down-converted from unicode. Creating the abort required pasting a Unicode filename into scripts. ----------------------------------------------------------------------- """ usersetting = os.environ.get('PYTHONIOENCODING') # not currently used # 3.X str: decoded codepoints mungestr3X = \ lambda s: u''.join(c if ord(c) <= 127 else ('[\\u%d]' % ord(c)) for c in s) # 2.X str: encoded bytes mungestr2X = \ lambda s: b''.join(c if ord(c) <= 127 else ('[\\x%X]' % ord(c)) for c in s) if RunningOnPython2: pargs = [parg.encode('UTF-8') if type(parg) is unicode else parg for parg in pargs] if RunningOnWindows: pargs = [mungestr2X(parg) if type(parg) is str else parg for parg in pargs] elif RunningOnWindows: pargs = [mungestr3X(parg) if type(parg) is str else parg for parg in pargs] try: _builtinprint(*pargs, **kargs) except UnicodeEncodeError: print('--Cannot print filename: message skipped') # we tried; punt! #=============================================================================== def tryrmtree(folder, trace=print): """ ----------------------------------------------------------------------- Utility: remove a folder by pathname if needed before unzipping to it. Optionally run by zip-extract.py in interactive mode, but not by the base extractzipfile() function here: manually clean targets as needed. Python's shutil.rmtree() can sometimes fail on Windows with a "directory not empty" error, even though the dir _is_ empty when inspected after the error, and running again usually fixes the problem (deletes the folder successfully). Bizarre, yes? See the rmtreeworkaround() onerror handler in Mergeall's backup.py for explanations and fixes. rmtree() can also fail on read-only files, but this is likely intended by users. Update: rmtree() can also fail if macOS auto-deletes an AppleDouble file first, and Windows fails because its deletes may not be atomic. This matters only in interactive zip-extract.py and self-test.py here. ----------------------------------------------------------------------- """ if os.path.exists(FWP(folder)): trace('Removing', folder) try: if os.path.islink(FWP(folder)): os.remove(FWP(folder)) else: shutil.rmtree(FWP(folder, force=True)) # recurs: always \\?\ except Exception as why: print('shutil.rmtree (or os.remove) failed:', why) input('Try running again, and press Enter to exit.') sys.exit(1) #=============================================================================== def isRecursiveLink(dirpath): """ ----------------------------------------------------------------------- Use inodes to identify each part of path leading to a link, on platforms that support inodes. All Unix/Posix do, though Windows Python doesn't until till 3.2 - if absent, allow other error to occur (there are not many more options here; on all Windows, os.path.realpath() is just os.path.abspath()). This is linearly slow in the length of paths to dir links, but links are exceedingly rare, "atlinks" use in ziptools may be rarer, and recursive links are arguably-invalid data. Recursion may be better than os.walk when path history is required, though this incurs overheads only if needed as is. dirpath does not have a \\?\ Windows long-path prefix here; FWP adds one and also calls abspath() redundantly - but only on Windows, and we need abspath() on other platforms too. ----------------------------------------------------------------------- """ trace = lambda *args: None # or print to watch # called iff atlinks: following links if (not os.path.islink(FWP(dirpath)) or # dir item not a link? os.stat(os.getcwd()).st_ino == 0): # platform has no inodes? return False # moot, or hope for best else: # collect inode ids for each path extension except last inodes = [] path = [] parts = dirpath.split(os.sep)[:-1] # all but link at end while parts: trace(path, parts) path += [parts[0]] # add next path part parts = parts[1:] # expand, fetch inode thisext = os.sep.join(path) thispath = os.path.abspath(thisext) inodes.append(os.stat(FWP(thispath)).st_ino) # recursive if points to item with same inode as any item in path linkpath = os.path.abspath(dirpath) trace(inodes, os.stat(FWP(linkpath)).st_ino) return os.stat(FWP(linkpath)).st_ino in inodes #=============================================================================== def isRecursiveLink0(dirpath, visited): """ ----------------------------------------------------------------------- ABANDONED, UNUSED: realpath() cannot be used portably, because it is just abspath() on Windows Python (but why?). Trap recursive links to own parent dir, but allow multiple non-recursive link visits. The logic here is as follows: If we've reached a link that leads to a path we've already reached from a link AND we formerly reached that path from a link located at a path that is a prefix of the new link's path, then the new link must be recursive. No, really... Catches link at visit #2, but avoids overhead for non-links. ----------------------------------------------------------------------- """ # called iff atlinks: following links if not os.path.islink(dirpath): # skip non-links return False # don't note path else: # check links history realpath = os.path.realpath(dirpath) # dereference, abs #print('\t', dirpath, '\n\t', realpath, sep='') if (realpath in visited and any(dirpath.startswith(prior) for prior in visited[realpath])): return True else: # record this link's visit visited[realpath] = visited.get(realpath, []) # add first or next visited[realpath].append(dirpath) return False #=============================================================================== class CreateStats: """ ----------------------------------------------------------------------- Helper for recursive create (zip) stats counters [1.1]. May also pass same mutable instance instead of using +=. ----------------------------------------------------------------------- """ attrs = 'files', 'folders', 'symlinks', 'unknowns', 'crufts' def __init__(self): for attr in self.attrs: setattr(self, attr, 0) # or exec() strs def __iadd__(self, other): # += all attrs in place for attr in self.attrs: setattr(self, attr, getattr(self, attr) + getattr(other, attr)) return self def __repr__(self, format='%s=%%d'): display = ', '.join(format % attr for attr in self.attrs) return display % tuple(getattr(self, attr) for attr in self.attrs) class ExtractStats(CreateStats): """ ----------------------------------------------------------------------- Extract (unzip) stats: unknowns unlikely, no crufts or recursion. [1.3] Add mangled and skipped, but don't display unless nonzero. ----------------------------------------------------------------------- """ attrs = ['files', 'folders', 'symlinks', 'unknowns', 'mangled', 'skipped'] def __repr__(self, format='%s=%%d'): """ Don't show mangled or skipped if 0: rare and too much info """ self.attrs = ExtractStats.attrs[:] # .copy(), but work in py 2.X for attr in ['mangled', 'skipped']: if getattr(self, attr) == 0: self.attrs.remove(attr) return CreateStats.__repr__(self, format) def _testCreateStats(): x = CreateStats() print(x) # files=0, folders=0, symlinks=0, unknowns=0, crufts=0 x.files += 1; x.folders += 2; x.symlinks += 3 print(x) # files=1, folders=2, symlinks=3, unknowns=0, crufts=0 y = CreateStats() y.folders += 10; y.unknowns += 20 x += y print(x) # files=1, folders=12, symlinks=3, unknowns=20, crufts=0 #=============================================================================== def addEntireDir(thisdirpath, # pathname of directory to add (rel or abs) zipfile, # open zipfile.Zipfile object to add to stats, # counters instance, same at all levels [1.1] thiszipatpath, # modified pathname if zipat/zip@ used [1.2] storedirs=True, # record dirs explicitly in zipfile? cruftpatts={}, # cruft files skip/keep, or {}=do not skip atlinks=False, # zip items referenced instead of links? trace=print): # trace message router (or lambda *p, **k: None) """ ----------------------------------------------------------------------- Add the full folder at thisdirpath to zipfile by adding all its parts. Python's zipfile module has extractall(), but nothing like an addall() (apart from simple command-line use). The top-level createzipfile() kicks off the recursion here, and docs more of this function's utility. ADDING DIRS: Dirs (a.k.a. folders) don't always need to be written to the zipfile themselves, because extracts add all of a file's dirs if needed (with os.makedirs(), in Python's zipfile module and the local zipsymlinks module). Really, zipfiles don't have folders per se - just individual items with pathnames and metadata. However, dirs MUST be added to the zipfile themselves to either: 1) Retain folders that are empty in the original. 2) Retain the original modtimes of folders (see extract below). When added directly, the zipfile records folders as zero-length items with a trailing "/", and recreates the folder on extracts as needed. Disable folder writes with "storedirs" if this proves incompatible with other tools (but it works fine with WinZip). Note that the os.walk()'s files list is really all non-dirs (which may include non-file items that should likely be excluded on some platforms), and non-link subdirs are always reached by the walker. Dir links are returned in subdir list, but not followed by default. [Update: per ahead[*], os.walk() was later replaced here with an explicit-recursion coding, which visits directories more directly.] SYMLINKS: If atlinks=True, this copies items links reference, not links themselves, and steps into subdirs referenced by links; else, it copies links and doesn't follow them. For links to dirs, os.walk() yields the name of the link (not the dir it references), and this is the name under which the linked subdir is stored in the zip if atlinks (hence, dirs can be present in multiple tree locations). For example, if link 'python' references dir 'python3', the latter is stored under the former name. [Update: the non-os.walk() recoding per ahead[*] behaves this same way.] This also traps recursive link paths to avoid running into memory errors or path limits, by using stat object st_ino unique identifiers to discern loops from valid dir repeats, where inode ids are supported. For more on recursive links detection, see isRecursiveLink() above. For more details on links in os.walk(), see docetc/symlinks/demo*.txt. WINDOWS LONG PATHS: On Windows, very long paths are supported by prefixing all file-tool call paths with '\\?' and making them absolute, and passing these on to zipfile and zipsymlinks APIs for use in file-tool calls. Names without \\?\ or absolute mapping are passed for use in the archive itself; this is required to support relative paths in the archive itself -- if not passed, archive names are created from filenames by running filenames though os.path.splitdrive() which drops the \\?\, but this does not translate from absolute back to relative (when users pass relative). [*]THIS ALSO required replacing the former os.walk() coding with explicit (manual) recursion. os.walk() required the root to have a just-in-case FWP() prefix to support arbitrary depth; which made os.walk() yield dirs that were always \\?\-prefixed and absolute; which in turn made all paths absolute in the zip archive. Supporting relative zip paths AND long-paths requires either explicit recursion (used here) or an os.walk() coding with abs->rel mapping (which is possible, but may be preclusive: see the message display code in the extract ahead). Nit: the explicit-recursion recoding changes the order in which items are visited and added - it's now alphabetical per level on Mac OS HFS, instead of files-then-dirs (roughly). This order is different but completely arbitrary: it impacts the order of messages output, but not the content or utility of the archive zipfile generated. For the prior os.walk() variant, see ../docetc/longpaths/prior-code. Also nit: in the explicit-recursion recoding, links that are invalid (do not point to an existing file or dir) are now an explicit case here. Specifically, links to both nonexistent items and non-file/dir items are added to the zipfile, despite their rareness, and even if "-atlinks" follow-links mode is used and the referent cannot be added. This is done in part because Mergeall and cpall propagate such links too, but also because programs should never silently drop content for you: invalid links may have valid uses, and may refer to items present on another machine. The former os.walk()-based version added such links just because that call returns dirs and non-dirs, and invalid links appear in the latter. Also also nit: more clearly a win, the new coding reports full paths to cruft items; it's difficult to identify drops from basenames alone. See folder _algorithms here for alternative codings for this function. ----------------------------------------------------------------------- """ # # handle this dir # if storedirs and thisdirpath != '.': # add folders too stats.folders += 1 trace2('Adding folder', thisdirpath, thiszipatpath, trace) zipfile.write(filename=FWP(thisdirpath), # fwp for file tools arcname=thiszipatpath) # not \\?\ + abs, -zip@? addModtimeUTC(zipfile, FWP(thisdirpath)) # UTC modtimes [1.2] # # handle items here # for itemname in os.listdir(FWP(thisdirpath)): # list (fixed windows) path itempath = os.path.join(thisdirpath, itemname) # extend real provided path zipatpath = os.path.join(thiszipatpath, itemname) # possibly munged path [1.2] # # handle subdirs (and links to them) # if os.path.isdir(FWP(itempath)): if isCruft(itemname, cruftpatts): # match name, not path # skip cruft dirs stats.crufts += 1 trace('--Skipped cruft dir', itempath) elif atlinks: # following links: follow? + add if isRecursiveLink(itempath): # links to a parent: copy dir link instead stats.symlinks += 1 trace('Recursive link copied', itempath) addSymlink(FWP(itempath), zipatpath, zipfile, trace) else: # recur into dir or link addEntireDir(itempath, zipfile, stats, zipatpath, storedirs, cruftpatts, atlinks, trace) else: # not following links if os.path.islink(FWP(itempath)): # copy dir link stats.symlinks += 1 trace2('Adding link ~folder', itempath, zipatpath, trace) addSymlink(FWP(itempath), zipatpath, zipfile, trace) else: # recur into dir addEntireDir(itempath, zipfile, stats, zipatpath, storedirs, cruftpatts, atlinks, trace) # # handle files (and links to them) # elif os.path.isfile(FWP(itempath)): if isCruft(itemname, cruftpatts): # skip cruft files stats.crufts += 1 trace('--Skipped cruft file', itempath) elif atlinks: # following links: follow? + add stats.files += 1 trace2('Adding file ', itempath, zipatpath, trace) zipfile.write(filename=FWP(itempath), # fwp for file tools arcname=zipatpath) # not \\?\ + abs, -zip@? addModtimeUTC(zipfile, FWP(itempath)) # UTC modtimes [1.2] else: # not following links if os.path.islink(FWP(itempath)): # copy file link stats.symlinks += 1 trace2('Adding link ~file', itempath, zipatpath, trace) addSymlink(FWP(itempath), zipatpath, zipfile, trace) else: # add simple file stats.files += 1 trace2('Adding file ', itempath, zipatpath, trace) zipfile.write(filename=FWP(itempath), # fwp for file tools arcname=zipatpath) # name in archive, -zip@? addModtimeUTC(zipfile, FWP(itempath)) # UTC modtimes [1.2] # # handle non-file/dir links (to nonexistents or oddities) # elif os.path.islink(FWP(itempath)): if isCruft(itemname, cruftpatts): # skip cruft non-file/dir links stats.crufts += 1 trace('--Skipped cruft link', itempath) else: # copy link to other: atlinks or not stats.symlinks += 1 trace2('Adding link ~unknown', itempath, zipatpath, trace) addSymlink(FWP(itempath), zipatpath, zipfile, trace) # # handle oddities (not links to them) # else: # ignore cruft: not adding this stats.unknowns += 1 trace('--Skipped unknown type:', itempath) # skip fifos, etc. # goto next item in this folder #=============================================================================== def zipatmunge(sourcepath, zipat): """ ----------------------------------------------------------------------- [1.2] If zipat is not None, replace the entire dir path in sourcepath with the zipat path string. This implements the "-zip@path" switch and function argument added in 1.2 to allow zipped paths (and hence later unzip paths) to be shortened, expanded, or dropped altogether. Subtle things: - This is called for sources at the top level of a create only; the tree-walk recursion extends the munged path at each level - sourceroot may be empty or '.' for source items in the CWD - zipat may be '.' for no nesting, and may be empty (same as '.') - The long-path prefix for Windows is not part of sourcepath here - For border cases, this relies on the fact that os.path.split('z') returns ('', 'z'), and os.path.join('', 'z') returns 'z'. ----------------------------------------------------------------------- """ if zipat is None: return sourcepath # zip@ not used, or zipat not passed assert isinstance(zipat, str) zipat = zipat.rstrip(os.sep) # drops trailing slash sourceroot, sourceitem = os.path.split(sourcepath) # ditto, but implicit if sourceroot == '': # source has no path: return os.path.join(zipat, sourcepath) # concat zipat, if any elif zipat in ['.', '']: # zipat is '.' or '': return sourceitem # rm root path, if any else: # else replace root path return sourcepath.replace(sourceroot, zipat, 1) # but just at the front #=============================================================================== def trace2(message, filepath, zipatpath, trace): """ ----------------------------------------------------------------------- [1.2] Used by creates. Now that zipat allows zip paths to vary from original file paths, show the zip path in output to clarify what was truly zipped. This generally makes a post-zip list unnecessary to see the create's results in the zipfile. The extra line is not printed if the before/after paths are the same (which matches pre-1.2 output); and it apes the '=>' format used for extracts (which still always show a second line, because target path is more crucial to disclose, even if it == zip path [but see trace3() ahead: extracts now collapse same-path output lines too, for space]). This also parrots most of the zipfile module's (and zipsymlinks.py's) transforms to zipatpath, to avoid spurious diffs (e.g., './x' != 'x'), and make the extra line's zip path match that of post-zip listings. Avoids pretest '\' => '/' on Windows to minimize mismatches/lines, but the goal is a bit gray: true zip paths, or just flag -zip@ diffs? ----------------------------------------------------------------------- """ # mimic what zipfile will do arcname = os.path.splitdrive(zipatpath)[1] arcname = os.path.normpath(arcname) arcname = arcname.lstrip(os.sep + (os.altsep or '')) # but not this: filepath still has '\' on Windows! # arcname = arcname.replace(os.sep, "/") trace(message, filepath) if arcname != filepath: trace('\t\t=> %s' % arcname) # sans leading '/\', '.', most '..', '\', 'c:' #=============================================================================== def createzipfile(zipname, # pathname of new zipfile to create addnames, # sequence of pathnames of items to add storedirs=True, # record dirs explicitly in zipfile? cruftpatts={}, # cruft files skip/keep, or {}=do not skip atlinks=False, # zip items referenced instead of links? trace=print, # trace message router (or lambda *p, **k: None) zipat=None, # alternate root zip path for all items [1.2] nocompress=False): # store uncompressed in zipfile for speed [1.3] """ ----------------------------------------------------------------------- Make a zipfile at path "zipname" and add to it all folders and files in "addnames". Its relative or absolute pathnames are propagated to the zipfile, to be used as path suffix when extracting to a target dir. See extractzipfile(), ../zip-create.py, and ../zip-extract.py for more docs on the use of relative and absolute pathnames for zip sources. Pass "trace=lambda *args: None" for silent operation. See function addEntireDir() above for details on "storedirs" (its default is normally desired), and ahead here for "cruftpatts" and "atlinks" (their defaults include all cruft files and folders in the zip, and copy links instead of the items they reference, respectively). This always uses ZIP_DEFLATED, the "usual" zip compression scheme, and the only one supported in Python 2.X (ZIP_STORED is uncompressed). Python's base zipfile module used here supports Unicode filenames automatically (encoded per UTF8). Python's base zipfile module also ensures that path separators in the zipfile always use Unix '/'. UPDATE: 1.3 now also allows compression to be turned off for zip speed, using ZIP_STORED if nocompress=True here (or -nocompress in zip-create.py). Most useful for very large archives: in a 208G use case, a zip takes 1.5 hours with compression, and just 22 minutes without (208G vs 195G zipfile). [1.1] This now returns a CreateStats with #files/folders/symlinks/unknowns and a repr for display when used in shell scripts (see class above). [1.1] ziptools run on Python 2.X forces filenames to unicode, so 2.X's zipfile module stores non-ASCII filenames in zips more portably; this avoids munged names in unzips run on 3.X and other tools. For details, see doctetc/1.1-upgrades/py-2.X-fixes.txt. WILDCARDS [1.1] Unlike the "../zip-create.py" command-line script, this function does not auto-glob addnames with unexpanded "*" (and other) operators. Use Python's glob.glob() to expand names as needed before calling here, and see the script and top-level README for pointers (it's a one-liner). CRUFT: By default, all files and folders are added to the zip. This is by design, because this code was written as a workaround for WinZip's silent file omissions. As an option, though, this function will instead skip normally-hidden cruft files and folders (e.g., ".*") much like Mergeall, so they are not added to zips used to upload websites or otherwise distribute or transfer programs and data. To enable cruft skipping, pass to cruftpatts a dictionary of this form: {'skip': ['pattern', ...], 'keep': ['pattern', ...]} to define fnmatch filename patterns for both items to be skipped, and items to be kept despite matching a skip pattern (e.g., ".htaccess"). If no dictionary is passed, all items are added to the zip; if either list is empty, it fails to match any file. See zipcruft.py for more details, and customizable presets to import and pass to cruftpatts (the default is available as "cruft_skip_keep" from this module too). SYMLINKS: Also by default, if symbolic links are present, they are added to the zip themselves - not the items they reference. Pass atlinks=True to instead follow links and zip the items they reference. This also traps recursive links if atlinks=True, where inodes are supported; see isRecursiveLink() above for more details. As of version [1.1], creates now also properly set per-link permission bits in zipfiles, for extracts. LARGE FILES: allowZip64=True supports files of size > 2G with ZIP64 extensions, that are supported unevenly in other tools, but work fine with the create and extract tools here. It's True by default in Python 3.4+ only; a False would prohibit large files altogether, which avoids "unzip" issues but precludes use in supporting tools. Per testing, some Unix "unzip"s fail with large files made here, but both the extract here and Mac's Finder-click unzips handle them well. Split zips into smaller parts iff large files fail in your tools, and you cannot find or install a recent Python 2.X or 3.X to run ziptools. Example publish-halves.py in learning-python.com/genhtml has pointers. ----------------------------------------------------------------------- """ trace('Zipping', addnames, 'to', zipname) if cruftpatts: trace('Cruft patterns:', cruftpatts) stats = CreateStats() # counts [1.1] # # handle top-level items # compress = ZIP_STORED if nocompress else ZIP_DEFLATED zipfile = ZipFile(zipname, mode='w', compression=compress, allowZip64=True) for addname in addnames: # force Unicode in Python 2.X so non-ASCII interoperable [1.1] if RunningOnPython2: try: addname = addname.decode(encoding='UTF-8') # same as unicode() except: trace('**Cannot decode "%s": skipped' % addname) continue # change zipped paths for top-level sources if -zip@/zipat [1.2] zipatpath = zipatmunge(addname, zipat) if (addname not in ['.', '..'] and isCruft(os.path.basename(addname), cruftpatts)): stats.crufts += 1 trace('--Skipped cruft item', addname) elif os.path.islink(FWP(addname)) and not atlinks: stats.symlinks += 1 trace2('Adding link ~item', addname, zipatpath, trace) addSymlink(FWP(addname), zipatpath, zipfile, trace) elif os.path.isfile(FWP(addname)): stats.files += 1 trace2('Adding file ', addname, zipatpath, trace) zipfile.write(filename=FWP(addname), arcname=zipatpath) addModtimeUTC(zipfile, FWP(addname)) # UTC modtimes [1.2] elif os.path.isdir(FWP(addname)): addEntireDir(addname, zipfile, stats, zipatpath, storedirs, cruftpatts, atlinks, trace) else: # fifo, etc. stats.unknowns += 1 trace('--Skipped unknown type:', addname) zipfile.close() return stats # [1.1] printed at shell #=============================================================================== def showpath(pathto, pathtoWasRelative): """ ----------------------------------------------------------------------- Extract helper: for message-display only, and on Windows only, try to undo the \\?\ prefix and to-absolute mapping for paths. This may or may not be exactly what was given, but is better than always showing an absolute path in messages, and avoiding the just-in-case FWP() described in extract() would require an extensive extract() rewrite. This used to be a nested function; it probably shouldn't have been. [1.3] Python's os.path.relpath() raises an exception on Windows if pathto is on a different drive than an optional second argument which defaults to the current working directory (e.g., pathto on D: but the console in C:). This can arise only when pathto was relative (e.g., for "D:folder" but not for "D:\folder"). Since this is rare and used only for message display (and Python doesn't support a CWD per drive on Windows), skip the exc and show the full path. ----------------------------------------------------------------------- """ if RunningOnWindows: pathto = UFWP(pathto) # strip \\?\ if pathtoWasRelative: try: pathto = os.path.relpath(pathto) # relative to '.' except: pass # abondon ship [1.3] return pathto #=============================================================================== def trace3(zippath, unzippath, trace): """ ----------------------------------------------------------------------- [1.3] Used by extracts. To reduce output volume, ziptools 1.3 (September 2021) now prints just one output line for items whose zipfile and unzip-device pathnames are the same. This happens whenever a zipped item is extracted directly to "." (the CWD), instead of another target given in the command or function call. This mimics the single/double-line output for creates in trace2(). os.path.normpath() might suffice for Windows, but seems overkill; os.path.normcase() might help too, but this is a slippery slope, and this is just a cosmetic issue in output lines. Could path separators in zippath (the zipfile) be '\' on Windows? Yes, but it would reflect a bug in the zip tool used there: path separators in zipfiles must be Unix '/' per the zip standard, and '\' is a valid filename character on Unix. The code here works either way, and a worst-case miss just means 2 output lines. ziptools creates always use '/' on Windows; other zips should too. ----------------------------------------------------------------------- """ # folders: drop trailing slash in zipfile to compare zippathX = zippath.rstrip('/\\') # or r'\/' # Windows: match Unix slashes in zipfile to compare if RunningOnWindows and '/' in zippath: unzippathX = unzippath.replace('\\', '/') else: unzippathX = unzippath if zippathX == unzippathX: # new lite format trace('Extracted %s' % zippath) else: # original format trace('Extracted %s\n\t\t=> %s' % (zippath, unzippath)) #=============================================================================== # Disable zipfile's auto-mangle on Windows. There is no good way to # customize its private _sanitize method, so do this bad way instead. # A class with __getitem__ that raises LookupError is likely slower. ZipFile._windows_illegal_name_trans_table = {None: None} def trymangle(zipinfo, pathto, nomangle=False, trace=print): """ ----------------------------------------------------------------------- [1.3] On extract errors, see if nonportable filename characters could be to blame, and if so try again with all of these replaced with "_" (e.g., 'a|b?c.ext' becomes 'a_b_c.ext') to appease unzip filesystems. This is attempted only on Windows. Android 11 shared storage has a bug which precludes mangling; Linux writes to FAT32 and exFAT drives are allowed to fail with skips to avoid perpetual diffs; and macOS silently munges to/from Unicode privates on FAT32 and exFAT drives. In most cases, fix-nonportable-filenames.py is the better option. Return False if mangling has been disabled or there are no characters to mangle, and True if mangling has been applied to any parts of the unzip pathname in the zipfile. The zipfile.filename must be changed in place here, because zipfile.extract() pulls it from there only; this requires passing the original name in some contexts (symlinks). Mangled names are reported in both run output and tallies. Items that still fail after this attempt are now reported as well and skipped so the rest of the archive is made available. As a bonus, the new coding now mangles symlink names as needed too. Rationale ========= ziptools disables the underlying zipfile module's filename mangling (see _windows_illegal_name_trans_table above), and performs it here instead on failures. The rationale for this is described in the user guide, available both in this module's package and online: - ../_README.htm;#nomangle - learning-python.com/ziptools/ziptools/_README.html#nomangle In short, Python's zipfile module always silently mangles all items' names this way on Windows, despite its potential to overwrite files and break syncs back to the source. The primary purpose for catching and handling mangling here is to make this both explicit and optional: mangles are reported and tallied, and -nomangle turns them off in full. This update also attempted to support mangling for Android shared storage that emulates FAT32, but passed due to a bug in Android 11: folders with nonportable characters are created but accept no files, mangling the full path here would leave doppelgänger folders, and removing or renaming nonportable paths may drop used content. ziptools also provides a fixer script, fix-nonportable-filenames.py, which can be run prior to transferring content to analyze or mangle names, and avoid interoperability issues completely. This is the recommended alternative for Android shared; here, mangling must be limited to Windows, to avoid generating bogus folders on Android. Illegal characters ================== In all cases, over-aggressive mangling just means that '_' replacements will be applied; under-aggressive will result in item skips and reports. Path separators '/' and '\' are subtle, implicit, and special cases: - On Windows, any '/' recorded in the zipfile path are changed to '\' by a prestep, and any '\' are then consumed by a pathname split; hence, both are treated as path separators, and neither will be mangled. This works whether '\' are from the prestep or recorded in the zipfile, and the prestep is performed in zipfile too so this can do no different. - On Unix, '/' is consumed by a pathname split and hence won't be mangled. '\' is legal on Unix and not treated as a path separator; it would be mangled here, but won't cause failures on Unix that trigger this code. - On Android shared storage, '/' is consumed by a pathname split because it's Unix (really, not Windows), but '\' may fail in FUSE drivers. Due to a bug in Android 11, no mangling is performed on Android, so any remaining '\' will trigger failures and skips (of files, not folders). Due to Android's convolution, it's recommended to run the provided filename fixer script before unzipping to its shared storage. More on Backslashes ==================== When a filename contains a backslash on Unix, code in both ziptools and the underlying Python zipfile module it uses will treat the '\' as a path separator on Windows, and create a subdirectory on extracts. This is a flaw in Python's module (https://bugs.python.org/issue36534), so we can do no better here - Python's module will make subdirectories for names that don't need to be mangled and hence never reach ziptools' code, and ziptools mangling on failures must be consistent with this. For reference, here is the current code link to Python's extract code: https://github.com/python/cpython/ blob/99495b8afffdc62145598516dbdf99e64b6249bd/Lib/zipfile.py#L1651 Although this might be improved by mangling '\' before all '/' are replaced with '\' for Windows, this would break invalid zipfile entries which use '\' as a separator, and must await a fix in Python itself in any event. Running the fixer script is the best work-around for this today, and avoids other issues in Python's mangling; back-sync and file-overwrite problems are inherent in any name-mangling scheme. Coding notes ============ This is coded as a brute-force, last-ditch effort to extract a failed item, and is followed by a simple skip if this fails too. The skip is better than unzip termination as before (the rest of the archive can be extracted), though it can leave incomplete data; users should check the tallies at the end of run output for skips, and inspect run messages. Mangling is attempted only after unmangled-name failure, because the underlying Python zipfile module's code is not easily customized, it's difficult to get and interpret a path's filesystem to mangle up front, and the exceptions may vary. For more details, see the Python demos: - docetc/illegal-filenames-demo-1.3/py-windows-ntfs-illegal.txt - docetc/illegal-filenames-demo-1.3/py-android11-shared-app-illegal.txt In brief, filename rules vary slightly between Windows and Android; Android shared storage limits names but its app-specific storage does not; Windows limits folder names but Android shared storage does not; and the exceptions raised on errors differ between the two platforms. Subtle: unlike fix-nonportable-filenames.py, this cannot add a numeric suffix to make mangled names unique: unzips allow existing folders' content to be overwritten, and we cannot tell here if a collision is a mangling accident, or a user-intended overwrite. Yes, yuck. Caveat: like the UTC timestamps extension, this code is tightly bound to the current coding of Python's zipfiles; changes may break this. ----------------------------------------------------------------------- """ if not RunningOnWindows: # only mangle on Windows: Android shared storage botches folders return False elif nomangle: # no changes if disabled in ziptools command or call return False else: # split and consume path separators zippath0 = zipinfo.filename # unzip path recorded in zipfile zippath1 = zippath0.replace('/', os.path.sep) # unix+android no-op, windows /=>\ zippath2 = os.path.splitdrive(zippath1)[1] # drop a c: on windows else :=>_ zipparts = zippath2.split(os.path.sep) # unix+android on /, windows on \ # illegal chars nonportables = ' \x00 / \\ | < > ? * : " ' # for filesystems, not platforms nonportables = nonportables.replace(' ', '') # drop space used for readability if not any(c in part for part in zipparts for c in nonportables): # none found: mangling won't help return False else: # mangle the entire path replacements = {ord(c): '_' for c in nonportables} mangledparts = [part.translate(replacements) for part in zipparts] # join with zip / even on windows: trailing / means dir in zipfile mangledpath = '/'.join(mangledparts) # replace in zipfile structure: required by zipfile.extract() zipinfo.filename = mangledpath message = '--Name mangled:\n from... %s\n to..... %s' trace(message % (zippath0, mangledpath)) return True #=============================================================================== def extractzipfile(zipname, # pathname of zipfile to extract from pathto='.', # pathname of folder to extract to nofixlinks=False, # do not translate symlink separators? trace=print, # trace router (or lambda *p, **k: None) permissions=False, # propagate saved permisssions? [1.1] nomangle=False): # don't mod bad filename chars to '_' on errors? """ ----------------------------------------------------------------------- Unzip an entire zipfile at zipname to "pathto", which is created if it doesn't exist. Items from the archive are stored under "pathto", using whatever subpaths with which they are recorded in the archive. Note that compression is passed for writing, but is auto-detected for reading here. Pass "trace=lambda *p, **k: None" for silent operation. This function does no cruft-file skipping, as it is assumed to operate in tandem with the zip creation tools here; see Mergeall's script nuke-cruft-files.py to remove cruft in other tools' zips if needed. [1.1] This now returns an ExtractStats with #files/folders/symlinks/unknowns and a repr for display when used in shell scripts (see class above). [1.3] The returned stats object now also has #magled/skipped, though these are not printed by its repr unless they are nonzero; they're rare. MODTIMES: At least through the latest 3.X, Python's zipfile library module does record original files' modification times in the zipfiles it creates, but does NOT retain files' original modification time when extracting: their modification times are all set to unzip time. This is clearly a defect, which will hopefully be addressed soon (a similar issue for permissions has been posted - see ahead). The workaround here manually propagates the files' original mod times in the zip as a post-extract step. It's more code than an extractall(pathto), but this version works, and allows extracted files to be listed individually in the script's output, See this file's main docstring for details on symlinks support here; links and their paths are made portable between Unix and Windows by translating their path separators to the hosting platform's scheme, but "nofixlinks" can be used to suppress path separator replacement. UPDATE as of [1.2], errors while writing modtimes are ignored with a message in output (the modtime isn't updated, but the extract proceeds). The only context in which this is known to happen is on Android's 2016 Nougat and earlier. Modtime-update failures are silent elsewhere. It's unlikely that we'll try to update a symlink's modtime on pre-Oreo Android (symlink creates fail and make a stub file), but it's handled. chmod also raises an error before Oreo, but it was already caught and ignored with a message (and can be avoided by not using "-permissions"). FOLDER MODTIMES: Py docs suggest that os.utime() doesn't work for folders' modtime on Windows, but it does. Still, a simple extract would change all non-empty folders' modtimes to the unzip time, just by virtue of writing files into those folders. This isn't an issue for Mergeall: only files compare by modtime, and dirs are just structural. The issue is avoided here, though, by resetting folder modtimes to their original values in the zipfile AFTER all files have been written. The net effect: assuming the zip records folders as individual items (see create above), this preserves original modtimes for BOTH files and folders across zips, unlike many other zip tools. Cut-and-paste, drag-and-drop, and xcopy can also change folder modtimes on Windows, so be sure to zip folders that have not been copied this way if you wish to test this script's folder modtime retention. ABOUT SAVEPATH: The written-to "savepath" returned by zipfile.extract() may not be just os.path.join(pathto, filename). extract() also removes any leading slashes, Windows drive and UNC network names, and ".." up-references in "filename" before appending it to "pathto", to ensure that the item is stored relative to "pathto" regardless of any absolute, drive- or server-rooted, or parent-relative names in the zipfile's items. zipfile.write() drops all but "..", which zipfile.extract() discards. The local extractSymlink() behaves like zipfile.extract() in this regard. WINDOWS LONG PATHS: To support long pathnames on Windows, always prefixes the pathto target dir with '\\?\' on Windows (only), so that all file-tool calls in zipfile and zipsymlinks just work for too-long paths -- the length of paths joined to archive names is unknown here. This internal transform is hidden from users in messages, by dropping the prefix and mapping pathto back to relative if was not given as absolute initially (see showpath()). LARGE FILES: allowZip64=True uses ZIP64 extensions which support very large files. Such files are supported unevenly in other tools, but work with the create and extract tools here. It's True by default in Python 3.4+ only, and seems unused when unzipping (ZIP64 fields are used). See createzipfile() for more. PERMISSIONS: UPDATE: as of [1.1], ziptools now manually propagates permissions on extracts (unzips) for files, folders, and links, but only if this is explicitly enabled via command/function arguments. This option should generally be used only when unzipping from zipfiles known to have originated on Unix, and when unzipping back to Unix. Most use cases that require permissions to survive trips to/from zips probably will satisfy this guideline. More permissions notes: - This option is enabled on unzips only. Zips have saved permissions since 1.0, though zipsymlinks.py required a change in 1.1 to set per-link permission bits (instead of using a constant). - Not all filesystems support Unix-style permissions. On exFAT, for instance, os.chmod() silently fails to change anything, and this upgrade has no effect. It's okay to copy a zipfile to/from exFAT, but don't unzip on exFAT if you care about keeping permissions. - The new permissions argument is last, perhaps inconsistently, for 1.0 compatibility. Use keyword arguments to avoid future flux. - The related Mergeall system gets permissions propagation "for free" because it uses shutil.copystat() to copy metadata from one file to another. That's not an option here, because "from" is a zip entry. See Mergeall's code at https://learning-python.com/mergeall.html. [Former caveat's notes: Python's zipfile module preserves Unix-style permissions on creates (zips) but not extracts (unzips). This is a known Python bug; see: https://bugs.python.org/issue15795. ziptools 1.0 didn't try to work around this one because it's subtle (e.g., Unix permissions cannot be applied if the zip host was not Unix, but the host-type code may not be set reliably or correctly in all zips). Preserving executable permissions on items extracted from zipfiles may also be security risk, but that's not much of an excuse: it's fine for zips that you create.] MAC OS EXFAT BUG FIX: There is a bizarre but real bug on Mac OS (discovered on El Capitan) that requires utime() to be run *twice* to set modtimes on exFAT drive symlinks. Details omitted here for space: see Mergeall's cpall.py script for background (https://learning-python.com/programs). In short, modtime is usually updated by the second call, not first: >>> p = os.readlink('tlink') >>> t = os.lstat('tlink').st_mtime >>> os.symlink(p, '/Volumes/EXT256-1/tlink2') >>> os.utime('/Volumes/EXT256-1/tlink2', (t, t), follow_symlinks=False) >>> os.utime('/Volumes/EXT256-1/tlink2', (t, t), follow_symlinks=False) This incurs a rarely-run and harmless extra call for non-exFAT drives, but suffices to properly propagate modtimes to exFAT on Mac OS. UPDATE [1.1]: permissions aren't impacted by this bug (os.chmod() is a no-op on exFAT, per above), but it also impacts exFAT _folder_ modtimes on Mac OS (only files work properly on exFAT). Fixed in 1.1 to uname() twice for folders on Mac OS too. There are a handful of ways to check for exFAT explicitly on Unix (e.g., lsvfs, mount, and df -T exfat), but all require brittle output parsing, and none are portable to Windows; accept the trivial uname()*2 speed hit on Mac OS instead. MODTIMES - DST AND TIMEZONE: UPDATE: as of [1.2], ziptools now stores UTC timestamps for item modtimes in zip extra fields, and uses them instead of zip's "local time" on extracts. This means that modtimes of zipfiles zipped and unzipped by ziptools are immune to changes in both DST and timezone. For more details, see the README's "_README.html#utctimestamps12", and see module zipmodtimeutc.py for most of the implementation. The former local-time scheme is still used for zipfiles without UTC. [Former caveat's notes: Extracts here do nothing about adjusting modtimes of zipped items for the local timezone and DST policy, except to pass -1 to Python's time.mktime()'s DST flag to defer to libraries. The net effect may or may not agree with another unzip tool, and does not address timezone changes. For more background, see this related note: https://learning-python.com/post-release-updates.html#unziptimes.] ILLEGAL FILENAME CHARACTERS UPDATE: as of [1.3], filenames having illegal characters for the unzip platform are now mangled to use '_' replacements here explicitly, and reported in both the run's output and its final tallies line. Name mangling is attempted whenever an item fails. It is applied only on Windows, and for all filesystems there; Windows disables illegal characters across the platform, and Android 11 shared storage has a folders bug which precludes mangling there (see trymangle()). As a bonus, the new coding mangles symlink names too if needed, and continues with the unzip if any item fails post mangle. Because mangling has a rare potential for data loss, it can also be disabled with nomangle, and a utility script is provided as an manual alternative; see trymangle() for more details. [Former caveat's notes: For most filenames, Python's underlying zipfile module munges illegal characters to "_" when they are extracted on Windows, but a symlink with illegal Windows filename characters will be skipped with a message in the output here (an unlikely case, given Windows' limited symlink support - see zipsymlinks.py).] ERRONEOUS PATH SEPARATORS ziptools' creates (zips) always use Unix '/' for path separators in zipfiles on all platforms, per the zip standard. While it's not impossible that some Windows tools may record path separators as'\', there is no way to recognize these as special here, because '\' is also a valid filename character on Unix. Hence on Unix, any Windows '\' are taken as part of a filename, and won't generate folders. If these arise, see the web for fixes, and consider using other Windows zip tools that aren't an affront to standards and interoperability. TBD: FLAGS ET AL. Should ziptools also propagate file flags on Unix? It already does symlinks, UTC modtimes, and UNIX permissions explicitly. Unix flags have sketchy symlink support across Pythons (e.g., see os.lstat(), os.lchflags()); may or may not be easily smuggled in zips; and no use case for them is known to ziptools developers. There are also the worm-cans of extended attributes and macOS resource forks; pass. ----------------------------------------------------------------------- """ trace('Unzipping from', zipname, 'to', pathto) dirmodtimes = [] stats = ExtractStats() # always prefix with \\?\ on Windows: joined-path lengths are unknown; # hence, on Windows 'savepath' result is also \\?\-prefixed and absolute; pathtoWasRelative = not os.path.isabs(pathto) # user gave relative? pathto = FWP(pathto, force=True) # add \\?\, make abs # # extract all items in zip # zipfile = ZipFile(zipname, mode='r', allowZip64=True) for zipinfo in zipfile.infolist(): # for all items in zip origname = zipinfo.filename # before trymangle mods # # extract one item # try: if isSymlink(zipinfo): # read/save link path: stubs on non-mangle failures trace('(Link)', end=' ') try: savepath = extractSymlink( zipinfo, pathto, zipfile, nofixlinks, trace) except: # retry with mangled name? [1.3] if trymangle(zipinfo, pathto, nomangle, trace): savepath = extractSymlink( zipinfo, pathto, zipfile, nofixlinks, trace, origname) stats.mangled += 1 else: raise # reraise else: # create file or dir: skip on all failures try: savepath = zipfile.extract(zipinfo, pathto) except: # retry with mangled name? [1.3] if trymangle(zipinfo, pathto, nomangle, trace): savepath = zipfile.extract(zipinfo, pathto) stats.mangled += 1 else: raise # reraise except Exception as E: # continue with rest on any item failure post mangle retry [1.3] stats.skipped += 1 trace('**SKIP - item failed and skipped:', zipinfo.filename) trace('Python exception: %s, %s' % (E.__class__.__name__, E)) continue # next zipinfo in for loop # show both from+to paths iff they differ filename = zipinfo.filename # item's path in zip showname = showpath(savepath, pathtoWasRelative) # undo fwp on windows trace3(filename, showname, trace) # show 1 or 2 lines [1.3] # # propagate permissions from/to Unix for all, iff enabled [1.1] # if permissions: try: # create saved perms perms = zipinfo.external_attr >> 16 # to lower 16 bits if perms != 0: if os.path.islink(savepath): # mod link itself, where supported # not on Windows, Py3.2 and earlier # Mac OS bug moot: no-op on exFAT if (hasattr(os, 'supports_follow_symlinks') and os.chmod in os.supports_follow_symlinks): os.chmod(savepath, perms, follow_symlinks=False) # Unix Py 2.X and 3.2- have lchmod, but not f_s elif hasattr(os, 'lchmod'): os.lchmod(savepath, perms) else: # mod file or dir, where supported (exFAT=no-op) os.chmod(savepath, perms) except: trace('--Error setting permissions') # e.g., pre-Oreo Android # # propagate modtime to files, links (and dirs on some platforms) # zipinfo.filename = origname # lookup premangle [1.3] datetime = getModtimeUTCorLocal(zipinfo, zipfile) # UTC if present [1.2] if os.path.islink(savepath): # reset modtime of link itself where supported # but not on Windows or Py3.2-: keep now time # and call _twice_ on Mac for exFAT drives bug stats.symlinks += 1 if (hasattr(os, 'supports_follow_symlinks') and # iff utime does links os.utime in os.supports_follow_symlinks): try: os.utime(savepath, (datetime, datetime), follow_symlinks=False) except: trace('--Error setting link modtime') # pre-Oreo Android [1.2] else: # go again for Mac OS exFAT bug if RunningOnMacOS: os.utime(savepath, (datetime, datetime), follow_symlinks=False) elif os.path.isfile(savepath): # reset (non-link) file modtime now # no Mac OS exFAT bug stats.files += 1 try: os.utime(savepath, (datetime, datetime)) # dest time = src time except: trace('--Error setting file modtime') # pre-Oreo Android [1.2] elif os.path.isdir(savepath): # defer (non-link) dir till after add files stats.folders += 1 dirmodtimes.append((savepath, datetime)) # where supported else: # bad type in zipfile stats.unknowns += 1 assert False, 'Unknown type extracted' # should never happen # # reset (non-link) dir modtimes now, post file adds # for (savepath, datetime) in dirmodtimes: try: os.utime(savepath, (datetime, datetime)) # reset dir mtime now except: # pre-Oreo Android [1.2] trace('--Error settting folder modtime') else: # but ok on Windows/Unix # go again for Mac OS exFAT bug [1.1] if RunningOnMacOS: os.utime(savepath, (datetime, datetime)) zipfile.close() return stats # to be printed at shell [1.1] #=============================================================================== # see ../selftest.py for former __main__ code cut here for new pkg structure