File: mergeall-products/unzipped/test/ziptools/docetc/1.1-upgrades/py-2.X-fixes.txt
This file describes the improvements made for running ziptools
on Python 2.X in ziptools version 1.1. In short, non-ASCII
filenames no longer trigger exceptions when printed to pipes,
and work portably in other tools when added to zipfiles by 2.X.
In more detail (and mostly for developers), three 2.X-specific
changes were made in ziptools 1.1:

1) Use os.lchmod() for symlink permissions on 2.X and Unix
1.1's new permission support requires running a chmod() call
on symlinks. In Python 2.X, os.chmod() doesn't handle symlinks
on Unix (the follow_symlinks argument is not available), but
os.lchmod() does: use it for permissions propagation in 2.X.
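As a minimal sketch of this idea (not ziptools' actual code), the
hypothetical helper below applies a mode to a symlink itself rather
than to its target. It assumes os.lchmod() where the platform provides
it, falls back on 3.X's follow_symlinks argument where supported, and
otherwise reports that nothing was changed:

```python
import os

def set_link_permissions(linkpath, mode):
    """
    Try to apply mode to the symlink itself, never to its target.
    Return True if a call was made, False if no call is usable here.
    Hypothetical helper: the name and return convention are assumptions.
    """
    try:
        if hasattr(os, 'lchmod'):
            # present in 2.X and 3.X, but only on some platforms
            os.lchmod(linkpath, mode)
            return True
        if os.chmod in getattr(os, 'supports_follow_symlinks', ()):
            # 3.3+ only, and only where the platform supports it
            os.chmod(linkpath, mode, follow_symlinks=False)
            return True
    except (NotImplementedError, OSError):
        # best effort: some platforms expose the call but reject links
        pass
    return False
```

On platforms where neither call is usable for links, the helper returns
False and leaves the link target's permissions untouched.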

2) Fix trace-message exceptions on 2.X for non-ASCII unicode filenames
In 1.0, non-ASCII filenames in zipfiles created by other tools
(including Python 3.X) could trigger an exception when printed
to a pipe by Python 2.X. This didn't occur when messages were
printed to the console, but would happen for pipes whenever an
extract yielded a decoded unicode object for a non-ASCII filename.
This became common with the next fix, because all non-ASCII
filenames are now unicode in 2.X creates. To fix, printed
str items are forcibly encoded to UTF-8 when printed by 2.X.
Separately, non-ASCII prints can generally fail in both Pythons
on Windows; this was fixed by munging characters or bytes.
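The 2.X print fix can be sketched as follows. This is an illustration
rather than ziptools' own code: the trace_text name is hypothetical,
and the sketch assumes that only unicode objects need encoding on 2.X,
while 3.X str and already-encoded 2.X str pass through unchanged:

```python
import sys

def trace_text(item, encoding='utf-8'):
    """
    Return item in a form that 2.X can print to a pipe without
    raising UnicodeEncodeError (hypothetical helper name).
    """
    if sys.version_info[0] == 2 and isinstance(item, unicode):  # 2.X only
        return item.encode(encoding)      # unicode => UTF-8 encoded str
    return item                           # 3.X str, or 2.X str: as is
```

A trace message would then be printed with a call of the form
print('Adding', trace_text(name)). The Windows munging mentioned
above is a separate step and is not covered by this sketch.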

3) Fix munged filenames in 3.X (and other) unzips of 2.X zips
Summary: when run by Python 2.X, ziptools creates (zips) now process
filenames as Unicode, so that zipfiles made by 2.X are more platform
agnostic. Otherwise, non-ASCII filenames could be munged when unzipped
either across platforms or by other tools, including Python 3.X.
Details: when run on Python 2.X, creates now force filenames to Unicode
by passing a unicode object instead of a str to os.listdir(). This in
turn makes the 2.X zipfile module encode filenames more interoperably,
using Unicode formats and flags expected by other unzip tools.
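A minimal sketch of the listdir() technique, for illustration only: the
list_unicode_names name is hypothetical, and decoding the folder path
per sys.getfilesystemencoding() is an assumption, not necessarily what
ziptools does. On 2.X, a unicode argument makes os.listdir() return
unicode names; on 3.X, a str argument already does:

```python
import os, sys

def list_unicode_names(folder):
    """
    List folder so that names come back as Unicode text on 2.X and 3.X.
    Hypothetical helper; the decoding choice below is an assumption.
    """
    if sys.version_info[0] == 2 and isinstance(folder, str):
        # 2.X: str is bytes; decode so listdir() returns unicode names
        folder = folder.decode(sys.getfilesystemencoding() or 'utf-8')
    return os.listdir(folder)
```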
Formerly, ziptools on 2.X stored filenames as bytes pre-encoded per the
underlying platform's default. This worked if 2.X unzipped its own zips
on the same platform, but could fail to decode properly on 3.X as well
as incompatible platforms. When ziptools 1.0 was run on 3.X, 2.X zips
yielded munged non-ASCII filenames; with the 1.1 fix, they do not.
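The munging described above can be reproduced without any zipfile. In
this sketch, the sample filename and the UTF-8 platform encoding are
assumptions chosen for illustration; cp437 is the fallback that 3.X's
zipfile applies to names that lack the UTF-8 flag:

```python
name = u'sp\xc4m.txt'              # non-ASCII filename (assumed sample)
stored = name.encode('utf-8')      # bytes pre-encoded by 2.X (UTF-8 platform assumed)
munged = stored.decode('cp437')    # 3.X decoding when the 0x800 flag is absent
intact = stored.decode('utf-8')    # decoding selected when the 0x800 flag is set

print(munged == name)              # False: the name no longer matches
print(intact == name)              # True: the name survives the round trip
```

The two-byte UTF-8 sequence for the non-ASCII character decodes to two
unrelated cp437 characters, so the munged name is one character longer
than the original.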
This fix also made it necessary to avoid a 2.X encoding exception for the
now-Unicode non-ASCII filenames in trace messages printed during creates
to pipes on Unix. The exceptions could also happen before the fix when
2.X extracted 3.X zips, because unicode was returned for non-ASCII filenames.
For reference, below is the (reformatted) relevant code from the 2.X and
3.X zipfile modules. ziptools' former 2.X str scheme generally worked if
2.X both zipped and unzipped on the same (or a compatible) platform, but
the new unicode scheme both yields a more platform-neutral encoding and
satisfies the expectation of 3.X and other unzip tools:

2.X (str text is encoded bytes):

    Create: unicode|bytes=>bytes
        if isinstance(self.filename, unicode):
            try:
                return self.filename.encode('ascii'), self.flag_bits
            except UnicodeEncodeError:
                return self.filename.encode('utf-8'), self.flag_bits | 0x800
        else:
            return self.filename, self.flag_bits        <== str: platform-encoded

    Extract: bytes=>unicode|bytes
        if self.flag_bits & 0x800:
            return self.filename.decode('utf-8')        # ok for ascii or utf8
        else:
            return self.filename                        <== str: ok IFF compatible!

3.X (str text is decoded Unicode):

    Create: unicode=>bytes
        try:
            return self.filename.encode('ascii'), self.flag_bits
        except UnicodeEncodeError:
            return self.filename.encode('utf-8'), self.flag_bits | 0x800

    Extract: bytes=>unicode
        if flags & 0x800:
            filename = filename.decode('utf-8')         # ok if flag & (ascii | utf8)
        else:
            filename = filename.decode('cp437')         <== MUNGED if UTF-8 from 2.X!
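
To check the flag behavior shown above on 3.X, the following sketch
writes two members to an in-memory zipfile and inspects their flag
bits. It assumes Python 3.X and uses hypothetical member names; only
the non-ASCII name should carry the 0x800 (UTF-8) flag:

```python
import io, zipfile

UTF8_FLAG = 0x800                      # same bit as tested in the code above

buffer = io.BytesIO()                  # in-memory stand-in for a zipfile on disk
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('spam.txt', b'ascii name')
    zf.writestr(u'sp\xc4m.txt', b'non-ASCII name')

with zipfile.ZipFile(buffer, 'r') as zf:
    # map each member's decoded name to whether its UTF-8 flag is set
    flags = {info.filename: bool(info.flag_bits & UTF8_FLAG)
             for info in zf.infolist()}
```

Here flags maps the ASCII name to False and the non-ASCII name to True,
which is the combination that lets other unzip tools decode the name
correctly.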