File: mergeall-products/unzipped/test/ziptools/docetc/longpaths/prior-code/ziptools-withoutfwp.py
#!/usr/bin/python """ ================================================================================ ziptools.py (part of the mergeall system [3.0]) [Python 3.X or 2.X] Author: M. Lutz (learning-python.com), copyright March, 2017 License: Provided freely but with no warranties of any kind. Tools to create and extract zipfiles containing a set of files, folders, and symbolic links. All functions here are callable, but the main top-level entry points are these two (see ahead for more on their arguments): createzipfile(zipname, [addnames], storedirs=True, cruftpatts={}, atlinks=False, trace=print) extractzipfile(zipname, pathto='.', nofixlinks=False, trace=print) See also scripts zip-create.py and zip-extract.py for command-line clients, and zipcruft.cruft_skip_keep for a default "cruftpatts" cruft-file definition. This mostly extends Python's zipfile module with top-level convenience tools that add some important missing features: * For folders, adds the folder's entire tree to the zipfile automatically * For zipfile creation, filters out cruft (hidden metadata) files on request * For zipfile extracts, retains original modtimes for files, folders, links * For symlinks, adds/recreates the link itself to/from zipfiles, by default CRUFT HANDLING: This script sidesteps other tools' issues with ".*" cruft files (metadata that is normally hidden): by default, they are not silently/implicitly omitted in zips here for completeness, but can be omitted by passing a filename-patterns definition structure to the optional "cruftpatts" argument. See zipcruft.py for pattern defaults to import and pass, and zipfile-create.py for more background. Most end-user zips should skip cruft files (see mergeall: cruft can be a major issue on Mac OS X in data to be transferred elsewhere). SYMLINKS SUPPORT: This package also supports adding symlinks (symbolic links) to and extracting them from zip archives, on both Unix and Windows with Python 3.X, but only on Unix with Python 2.X. Windows requires admin permissions and NTFS filesystem destinations to create symlinks from a zip file; Unix does not. The underlying Python zipfile module doesn't support symlinks directly today, short of employing the very low-level magic used in ziptools_symlinks.py here, and there is an open bug report to improve this: https://bugs.python.org/issue18595 https://mail.python.org/pipermail/python-list/2005-June/322179.html https://duckduckgo.com/?q=python+zipfile+symlink Symlinks customize messages with "~" characters in creation and "(Link)" prefixes in extraction, because they are a special-enough case to call out in logs, and may require special permission and handling to unzip and use on Windows. For example, link creation and extraction messages are as follows: Adding link ~folder test1/dirlink # create message (Link) Extracted test1/dirlink # extract message By default, zipfile creation zips links themselves verbatim, not the items they refer to. Pass True to the "atlinks" function argument to instead follow links and zip the items they refer to. Unzipping restores whatever was zipped. When links are copied verbatim, extracts adjust the text of a link's path to use the hosting platform's separators - '\' for Windows and '/ for Unix. This provides some degree of link portability between Unix and Windows, but is switchable with "nofixlinks" because it may not be desirable in all contexts (e.g., when unzipping to a drive to be used elsewhere). Symlinks will still be nonportable if they contain other platform-specific syntax, such as Windows drive letters or UNC paths, or use asbolute references to extra-archive items. When "atlinks" is used to follow links and copy itens they refer to, recursive links are detected on platforms and Pythons that support stat objects' st_ino (a.k.a. indode) unique directory identifiers. This includes all Unix contexts, and Windows as of Python 3.2 (other contexts fails on path or memory errors). Recursive links are copied themselves, verbatim, to avoid loops and errors. Besides symlinks, FIFOs and other exotic items are always skipped and ignored. ================================================================================ """ from __future__ import print_function # py 2.X import os, sys, time, shutil from zipfile import ZipFile, ZIP_DEFLATED # stdlib base support from fnmatch import fnmatchcase # non-case-mapping version # default cruft-file patterns, import here for importers try: from zipcruft import cruft_skip_keep except ImportError: from .zipcruft import cruft_skip_keep # if pkg used elsewhere in py3.X # a major workaround: split this narly code off to a module... try: from zipsymlinks import addSymlink, isSymlink, extractSymlink except ImportError: from .zipsymlinks import addSymlink, isSymlink, extractSymlink # ditto #=============================================================================== def tryrmtree(folder, trace=print): """ Utility: remove a folder by pathname if needed before unzipping to it. Python's shutil.rmtree() can sometimes fail on Windows with a "directory not empty" error, even though the dir _is_ empty when inspected after the error, and running again usually fixes the problem (deletes the folder successfully). Bizarre, yes? See the rmtreeworkaround() onerror handler in mergeall's backup.py for explanations and fixes. rmtree() can also fail on read-only files, but this is likely intended by users. """ if os.path.exists(folder): trace('Removing', folder) try: if os.path.islink(folder): os.remove(folder) else: shutil.rmtree(folder) except Exception as why: print('shutil.rmtree (or os.remove) failed:', why) input('Try running again, and press Enter to exit.') sys.exit(1) #=============================================================================== def isCruft(filename, cruftpatts): """ Identify cruft by matching a file or folder basename "filename", to the patterns in dict "cruftpatts", using the fnmatch stdib module. Returns True if filename is a cruft item, which means it matches any pattern on "skip" list, and does not match any pattern on "keep" list, either of which can be empty to produce False results from any(). No files are cruft if the entire patterns dict is empty (the default). See createzipfile() ahead for more on the "cruftpatts" dictionary. """ return (cruftpatts and any(fnmatchcase(filename, patt) for patt in cruftpatts['skip']) and not any(fnmatchcase(filename, patt) for patt in cruftpatts['keep'])) #=============================================================================== def isRecursiveLink(dirpath): """ Use inodes to identify each part of path leading to a link, on platforms that support inodes. All Unix/Posix do, though Windows Python doesn't until till 3.2 - if absent, allow other error to occur (there are not many more options here; on all Windows, os.path.realpath() is just os.path.abspath()). This is linearly slow in the length of paths to dir links, but links are exceedingly rare, "atlinks" use in ziptools may be rarer, and recursive links are arguably invalid data. Recursion may be better than os.walk when path history is required, though this incurs overheads only if needed as is. """ trace = lambda * args: None # or print to watch # called iff atlinks: following links if (not os.path.islink(dirpath) or # dir item not a link? os.stat(os.getcwd()).st_ino == 0): # platform has no inodes? return False # moot, or hope for best else: # collect inode ids for each path extension except last inodes = [] path = [] parts = dirpath.split(os.sep)[:-1] # all but link at end while parts: trace(path, parts) path += [parts[0]] # add next path part parts = parts[1:] # expand, fetch inode thisext = os.sep.join(path) thispath = os.path.abspath(thisext) inodes.append(os.stat(thispath).st_ino) # recursive if points to item with same inode as any item in path linkpath = os.path.abspath(dirpath) trace(inodes, os.stat(linkpath).st_ino) return os.stat(linkpath).st_ino in inodes def isRecursiveLink0(dirpath, visited): """ ABANDONED, UNUSED: realpath() cannot be used portably, because it is just abspath() on Windows Python (but why?). Trap recursive links to own parent dir, but allow multiple non-recursive link visits. The logic here is as follows: If we've reached a link that leads to a path we've already reached from a link AND we formerly reached that path from a link located at a path that is a prefix of the new link's path, then the new link must be recursive. No, really... Catches link at visit #2, but avoids overhead for non-links. """ # called iff atlinks: following links if not os.path.islink(dirpath): # skip non-links return False # don't note path else: # check links history realpath = os.path.realpath(dirpath) # dereference, abs #print('\t', dirpath, '\n\t', realpath, sep='') if (realpath in visited and any(dirpath.startswith(prior) for prior in visited[realpath])): return True else: # record this link's visit visited[realpath] = visited.get(realpath, []) # add first or next visited[realpath].append(dirpath) return False #=============================================================================== def addEntireDir(rootdirname, # pathname of directory to add zipfile, # open zipfile.Zipfile object to add to storedirs=True, # record dirs explicitly in zipfile? cruftpatts={}, # cruft files skip/keep, or {}=do not skip atlinks=False, # zip items referenced instead of links? trace=print): # trace message router (or lambda *x: None) """ Add the full folder at rootdirname to zipfile by adding all its parts. Python's zipfile module has extractall(), but nothing like an addall(). Note that the walker's files list is really all non-dirs (which may include non-file items that should likely be excluded on some platforms), and non-link subdirs are always reached by the walker. Dir links are returned in subdir list, but not followed by default. Dirs (a.k.a. folders) don't always need to be written to the zipfile themselves, because extracts add all of a file's dirs if needed (with os.makedirs(), in Python's zipfile module). Really, zipfiles don't have folders per se - just individual items with pathnames and metadata. However, dirs MUST be added to the zipfile themselves to either: 1) Retain folders that are empty in the original. 2) Retain the original modtimes of folders (see extract below). When added directly, the zipfile records folders as zero-length items with a trailing "/", and recreates the folder on extracts as needed. Disable folder writes with "storedirs" if this proves incompatible with other tools (but it works fine with WinZip). If atlinks=True, copies items links reference not links themselves, and steps into subdirs referenced by links; else, copies links and doesn't folow them. For links to dirs, os.walk yields the name of the link (not the dir it references), and this is the name under which the linked subdir is stored in the zip (hence, dirs can be present in multiple tree locations). For example, if link 'python' reference dir 'python3', the latter is stored under the former name. Also traps recursive link paths to avoid running into memory errors or path limits, by using stat object st_ino unique identifiers to discern loops from valid dir repeats. For more details on links in os.walk(), see docetc/symlinks/demo*.txt """ # walker follows dir links iff atlinks treewalker = os.walk(rootdirname, followlinks=atlinks) for (dirhere, subdirshere, fileshere) in treewalker: # handle this dir if storedirs and dirhere != '.': trace('Adding folder', dirhere) zipfile.write(dirhere) # add folders too # handle subdirs here for subname in subdirshere.copy(): if isCruft(subname, cruftpatts): # skip cruft dirs trace('--Skipped cruft dir', subname) subdirshere.remove(subname) # prune the walk else: dirpath = os.path.join(dirhere, subname) if atlinks and isRecursiveLink(dirpath): # link to parent? trace('Recursive link copied', dirpath) addSymlink(dirpath, zipfile) # copy link instead subdirshere.remove(subname) # prune the walk elif os.path.islink(dirpath) and not atlinks: # walk won't follow trace('Adding link ~folder', dirpath) # but add link path addSymlink(dirpath, zipfile) else: # non-link dir or following links pass # follow the link # handle non-dirs here for filename in fileshere: if isCruft(filename, cruftpatts): # skip cruft files trace('--Skipped cruft file', filename) else: filepath = os.path.join(dirhere, filename) if os.path.islink(filepath) and not atlinks: # add link paths trace('Adding link ~file', filepath) # else follow links addSymlink(filepath, zipfile) elif os.path.isfile(filepath): # add files/paths trace('Adding file ', filepath) zipfile.write(filepath) else: # fifo, etc. # skip oddities trace('--Skipped unknown type:', filepath) #=============================================================================== def createzipfile(zipname, # pathname of new zipfile to create addnames, # sequence of pathnames of items to add storedirs=True, # record dirs explicitly in zipfile? cruftpatts={}, # cruft files skip/keep, or {}=do not skip atlinks=False, # zip items referenced instead of links? trace=print): # trace message router (or lambda *x: None) """ Make a zipfile at path "zipname" and add to it all folders and files in "addnames". Pass "trace=(lambda *args: None)" for silent operation. See function addEntireDir() above for details on "storedirs" (its default is normally desired), and ahead here for "cruftpatts" (its default means all cruft files and folders are included in the zip). This always uses ZIP_DEFLATED, the usual zip compression scheme (ZIP_STORED is uncompressed). Python's base zipfile module used here supports Unicode filenames automatically (encoded per UTF8). By default, all files and folders are added to the zip. This is by design, because this code was written as a workaround for WinZip's silent file omissions. As an option, though, this function will instead skip normally-hidden cruft files and folders (e.g., ".*") much like mergeall, so they are not added to zips used to upload websites or otherwise distribute or transfer programs and data. To enable cruft skipping, pass to cruftpatts a dictionary of this form: {'skip': ['pattern', ...], 'keep': ['pattern', ...]} to define fnmatch filename patterns for both items to be skipped, and items to be kept despite matching a skip pattern (e.g., ".htaccess"). If no dictionary is passed, all items are added to the zip; if either list is empty, it fails to match any file. See zipcruft.py for more details, and customizable presets to import and pass to cruftpatts. Also by default, if symbolic links are present, they are added to the zip themselves - not the items they reference. Pass atlinks=True to instead follow links and zip the items they reference. This also traps recursive links if atlinks=True, where inodes are supported. """ trace('Zipping', addnames, 'to', zipname) if cruftpatts: trace('Cruft patterns:', cruftpatts) zipfile = ZipFile(zipname, mode='w', compression=ZIP_DEFLATED) for addname in addnames: if (addname not in ['.', '..'] and isCruft(os.path.basename(addname), cruftpatts)): print('--Skipped cruft item', addname) elif os.path.islink(addname) and not atlinks: trace('Adding link ~item', filepath) addSymlink(addname, zipfile) elif os.path.isfile(addname): trace('Adding file ', addname) zipfile.write(addname) elif os.path.isdir(addname): addEntireDir(addname, zipfile, storedirs, cruftpatts, atlinks, trace) else: # fifo, etc. trace('--Skipped unknown type:', addname) zipfile.close() #=============================================================================== def extractzipfile(zipname, # pathname of zipfile to extract from pathto='.', # pathname of folder to extract to nofixlinks=False, # do not translate symlink separators? trace=print): # trace router (or lambda *x: None) """ Unzip an entire zipfile at zipname to pathto, which is created if it doesn't exist. Note that compression is passed for writing, but is auto-detected for reading here. Pass "trace=(lambda *args: None)" for silent operation. This does no cruft-file skipping, as it is assumed to operate in tandem with the zip creation tools here; see mergeall's nuke-cruft-files.py to remove cruft in other tools' zips. At least through 3.5, Python's zipfile library module does record the original files' modification times in zipfiles it creates, but does NOT retain files' original modification time when extracting: their modification times are set to unzip time. This is clearly a bug, which will hopefully be addressed soon (a similar issue for permissions is posted). The workaround here manually propagates the files' original mod times in the zip as a post-extract step. It's more code than an extractall(pathto), but this version works, and allows extracted files to be listed individually. See this file's main dosctring for details on symlink support here; links and their paths are made portable between Unix and Windows by translating their path separators to the hosting platform's scheme. but "nofixlinks can be used to suppress path separator replacement. SUBTLETY: Py docs suggest that os.utime() doesn't work for folders' modtime on Windows, but it does. Still, a simple extract would change all non-empty folders' modtimes to the unzip time, just by virtue of writing files into those folders. This isn't an issue for mergeall: only files compare by modtime, and dirs are just structural. The issue is avoided here, though, by resetting folder modtimes to their original values in the zipfile AFTER all files have been written. The net effect: assuming the zip records folders as individual items (see create above), this preserves original modtimes for BOTH files and folders across zips, unlike some other zip tools. Cut-and-paste, drag-and-drop, and xcopy can also change folder modtimes on Windows, so be sure to zip folders that have not been copied this way if you wish to test this script's folder modtime retention. ALSO SUBTLE: the written-to "pathname" returned by zipfile.extract() may not be just os.path.join(pathto, filename). extract() also removes any leading slashes, Windows drive and UNC network names, and ".." up-references in "filename" before appending it to "pathto", to ensure that the item is stored relative to "pathto" regardless of any absolute, drive- or server-rooted, or parent-relative names in the zipfile's items. zipfile.write() drops all but "..", which zipfile.extract() discards. The local extractSymlink() behaves like zipfile.extract() in this regard. """ trace('Unzipping from', zipname, 'to', pathto) dirtimes = [] zipfile = ZipFile(zipname, mode='r') for zipinfo in zipfile.infolist(): # all items in zip if isSymlink(zipinfo): # read/save link path trace('(Link)', end=' ') pathname = extractSymlink(zipinfo, pathto, zipfile, nofixlinks) else: # create file or dir pathname = zipfile.extract(zipinfo, pathto) filename = zipinfo.filename # item's path in zip trace('Extracted %s\n\t\t=> %s' % (filename, pathname)) # propagate mod time to files, links (and dirs on some platforms) origtime = zipinfo.date_time # zip's 6-tuple datetime = time.mktime(origtime + (0, 0, -1)) # 9-tuple=>float if os.path.islink(pathname): # reset mtime of link itself where supported # but not on Windows or Py3.2-: keep now time if (hasattr(os, 'supports_follow_symlinks') and os.utime in os.supports_follow_symlinks): os.utime(pathname, (datetime, datetime), follow_symlinks=False) elif os.path.isfile(pathname): # reset (non-link) file mtime now os.utime(pathname, (datetime, datetime)) # dest time = src time elif os.path.isdir(pathname): # defer (non-link) dir till after add files dirtimes.append((pathname, datetime)) # where supported else: assert False, 'Unknown type extracted' # should never happen # reset (non-link) dir modtimes now, post file adds for (pathname, datetime) in dirtimes: try: os.utime(pathname, (datetime, datetime)) # reset dir mtime now except: trace('Error settting directory times') # ok on Windows/Unix zipfile.close() #=============================================================================== if __name__ == '__main__': """ Self-test, run in script's folder (and edit me: your context may vary). Makes a zip file, unzips it, and compares results to original data. See zip-create.py, zip-extract.py, zip-list.py for command-line clients. """ # default cruft-file patterns from zipcruft import cruft_skip_keep # or a custom def, or {}=no skip def announce(*args): print('\n\n****', *args, '****\n') #---------------------------------------------------------------- # configure test run parameters #---------------------------------------------------------------- # map test to test subdir names skipcruft = len(sys.argv) > 1 # any cmdline arg? platform = sys.platform # win32, darwin, or linux cruftsubdir = 'skipcruft' if skipcruft else 'withcruft' platsubdir = dict(win32='Windows', darwin='MacOSX', linux='Linux')[platform] # make+use folder here to create and extract a zipfile testsubdir = os.path.join('selftest', platsubdir, cruftsubdir) if not os.path.exists(testsubdir): # selftest\Windows\withcruft os.makedirs(testsubdir) # selftest/MacOSX/skipcruft zipto = os.path.join(testsubdir, 'ziptest.zip') # plus the zip file target # use test data dirs in '..' parent [**EDIT ME**] origin = '..' folders = ['test1', 'test2'] # i.e., [../test1, ../test2] sources = [(origin + os.sep + folder) for folder in folders] #---------------------------------------------------------------- # zip original source dirs to subdir file #---------------------------------------------------------------- announce('CREATING') if not skipcruft: # any cmdline arg? use cruft patts createzipfile(zipto, sources) # else keep cruft: use {} default else: createzipfile(zipto, sources, cruftpatts=cruft_skip_keep) #---------------------------------------------------------------- # unzip subdir file to subdir dirs, cleaning first if needed #---------------------------------------------------------------- announce('EXTRACTING') for folder in folders: tryrmtree(os.path.join(testsubdir, folder)) # clean extract targets extractzipfile(zipto, testsubdir) # extract in testsubdir #---------------------------------------------------------------- # compare zipped+unzipped subdir dirs to original source dirs #---------------------------------------------------------------- # use mergeall's diff and merge for validation [EDIT ME] diffallpath = os.path.join('..', '..', 'diffall.py') mergeallpath = os.path.join('..', '..', 'mergeall.py') for folder in folders: announce('COMPARING MODTIMES:', folder) pipe = os.popen('%s %s %s %s -report' % (sys.executable, mergeallpath, os.path.join(origin, folder), os.path.join(testsubdir, folder))) for line in pipe: print(line, end='') for folder in folders: announce('COMPARING CONTENT:', folder) pipe = os.popen('%s %s %s %s' % (sys.executable, diffallpath, os.path.join(origin, folder), os.path.join(testsubdir, folder))) for line in pipe: print(line, end='') if sys.platform.startswith('win'): if sys.version[0] == '2': input = raw_input input('Press Enter to exit.') # stay up if clicked