File: mergeall-products/unzipped/test/ziptools/docetc/1.1-upgrades/py-2.X-fixes.txt
This file describes the improvements made for running ziptools on Python 2.X in ziptools version 1.1. In short, non-ASCII filenames no longer trigger exceptions when printed to pipes, and they work portably in other tools when added to zipfiles by 2.X.

In more detail (and mostly for developers), three 2.X-specific changes were made in ziptools 1.1:


1) Use os.lchmod() for symlink permissions on 2.X and Unix

1.1's new permission support requires running a chmod() call on symlinks. In Python 2.X, os.chmod() always follows symlinks on Unix (its follow_symlinks argument is not available until 3.3), but os.lchmod() changes the link itself: use it for permissions propagation in 2.X.


2) Fix trace-message exceptions on 2.X for non-ASCII unicode filenames

In 1.0, non-ASCII filenames in zipfiles created by other tools (including Python 3.X) could trigger an exception when printed to a pipe by Python 2.X. This didn't occur when messages were printed to the console, but it would happen for pipes whenever an extract yielded a decoded unicode object for a non-ASCII filename, because 2.X has no stream encoding for a pipe and falls back on ASCII. This became common with the next fix, because creates on 2.X now make all non-ASCII filenames unicode. To fix this, printed string items are forcibly encoded to UTF-8 when printed by 2.X.

Separately, non-ASCII prints can generally fail in both Pythons on Windows; this was fixed by munging characters or bytes.


3) Fix munged filenames in 3.X (and other) unzips of 2.X zips

Summary: when run by Python 2.X, ziptools creates (zips) now process filenames as Unicode, so that zipfiles made by 2.X are more platform agnostic. Otherwise, non-ASCII filenames could be munged when unzipped either across platforms or by other tools, including Python 3.X.

Details: when run on Python 2.X, creates now force filenames to Unicode by passing a unicode object instead of a str to os.listdir(). This in turn makes the 2.X zipfile module encode filenames more interoperably: non-ASCII names are stored as UTF-8 with the 0x800 flag bit set, which is what other unzip tools expect.
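The effect of the Details point can be observed with a minimal sketch that uses only the standard zipfile module. The helper names zip_folder and utf8_flag_map are hypothetical and are not part of ziptools. The sketch assumes Python 3 or 2.7; under 2.X, `folder` must be a unicode object (for example `u'somedir'`) so that os.listdir() returns unicode names, which is the point of the 1.1 change.

```python
import io
import os
import zipfile

UTF8_FLAG = 0x800  # ZIP general-purpose flag bit 11: "filename is UTF-8"


def zip_folder(folder):
    """Zip the regular files directly inside `folder` into an in-memory
    archive and return its bytes (hypothetical helper, not ziptools' API).

    `folder` should be text (unicode in 2.X), so os.listdir() returns text
    names; zipfile then stores each name as ASCII when possible, else as
    UTF-8 with UTF8_FLAG set.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        for name in sorted(os.listdir(folder)):   # text in => text out
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                zf.write(path, arcname=name)
    # ZipFile leaves a caller-supplied file object open after closing
    return buf.getvalue()


def utf8_flag_map(zip_bytes):
    """Map each stored filename to whether its UTF-8 flag bit is set."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return dict((info.filename, bool(info.flag_bits & UTF8_FLAG))
                    for info in zf.infolist())
```

For a folder holding `plain.txt` and `café.txt`, utf8_flag_map(zip_folder(folder)) should report False for the ASCII name and True for the non-ASCII one. On macOS the non-ASCII name may come back in decomposed (NFD) form, since that is how the filesystem lists it.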
Formerly, ziptools on 2.X stored filenames as bytes pre-encoded per the underlying platform's default, without the UTF-8 flag. This worked if 2.X unzipped its own zips on the same platform, but the names could fail to decode properly on 3.X as well as on incompatible platforms. When ziptools 1.0 was run on 3.X, 2.X zips yielded munged non-ASCII filenames; with the 1.1 fix, they do not.

This fix also made it necessary to avoid a 2.X encoding exception for the now-Unicode non-ASCII filenames in trace messages printed during creates to pipes on Unix. The exceptions could also happen before the fix when 2.X extracted 3.X zips, because the 2.X zipfile module returns unicode for non-ASCII (UTF-8-flagged) names.

For reference, below is the (reformatted) relevant code from the 2.X and 3.X zipfile modules. ziptools' former 2.X str scheme generally worked if 2.X both zipped and unzipped on the same (or a compatible) platform, but the new unicode scheme both yields a more platform-neutral encoding and satisfies the expectations of 3.X and other unzip tools:

2.X (str text is encoded bytes):

   Create: unicode|bytes=>bytes

      if isinstance(self.filename, unicode):
          try:
              return self.filename.encode('ascii'), self.flag_bits
          except UnicodeEncodeError:
              return self.filename.encode('utf-8'), self.flag_bits | 0x800
      else:
          return self.filename, self.flag_bits        <== str: platform-encoded

   Extract: bytes=>unicode|bytes

      if self.flag_bits & 0x800:
          return self.filename.decode('utf-8')        # ok for ascii or utf8
      else:
          return self.filename                        <== str: ok IFF compatible!

3.X (str text is decoded Unicode):

   Create: unicode=>bytes

      try:
          return self.filename.encode('ascii'), self.flag_bits
      except UnicodeEncodeError:
          return self.filename.encode('utf-8'), self.flag_bits | 0x800

   Extract: bytes=>unicode

      if flags & 0x800:
          filename = filename.decode('utf-8')         # ok if flag & (ascii | utf8)
      else:
          filename = filename.decode('cp437')         <== MUNGED if UTF-8 from 2.X!
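To see why the last line above munges names from 1.0-era 2.X zips, the create and extract paths can be modeled as plain functions. This is a simplified sketch, not the zipfile modules' actual code. The function names are hypothetical, the base flag bits are simplified to 0, and the platform default encoding for the old 2.X path is assumed to be UTF-8, as is typical on Linux and macOS.

```python
UTF8_FLAG = 0x800  # ZIP general-purpose flag bit 11: "filename is UTF-8"


def encode_name_old_py2(name, platform_encoding='utf-8'):
    """Hypothetical model of ziptools 1.0 on 2.X: names from os.listdir(str)
    were already bytes in the platform default (assumed UTF-8 here), and
    zipfile stored them as-is with no UTF-8 flag."""
    return name.encode(platform_encoding), 0


def encode_name_unicode(name):
    """Model of the unicode Create path (2.X with unicode names, all of 3.X):
    ASCII if possible, else UTF-8 plus the flag bit."""
    try:
        return name.encode('ascii'), 0
    except UnicodeEncodeError:
        return name.encode('utf-8'), UTF8_FLAG


def decode_name_py3(raw, flag_bits):
    """Model of the 3.X Extract path: flagged names are UTF-8, all others
    are read with the historical ZIP default, cp437."""
    if flag_bits & UTF8_FLAG:
        return raw.decode('utf-8')
    return raw.decode('cp437')   # cp437 maps every byte, so this never fails
```

With `name = u'caf\u00e9.txt'`, decode_name_py3(*encode_name_unicode(name)) returns the original name, while decode_name_py3(*encode_name_old_py2(name)) does not: the two UTF-8 bytes of `é` come back as two unrelated cp437 characters. Pure-ASCII names decode identically either way, which is why the problem only surfaced for non-ASCII filenames.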