File: mergeall-products/unzipped/test/ziptools/docetc/1.1-upgrades/py-2.X-fixes.txt

This file describes the improvements made for running ziptools 
on Python 2.X in ziptools version 1.1.  In short, non-ASCII 
filenames no longer trigger exceptions when printed to pipes,
and work portably in other tools when added to zipfiles by 2.X.

In more detail (and mostly for developers), three 2.X-specific 
changes were made in ziptools 1.1:

1) Use os.lchmod() for symlink permissions on 2.X and Unix

   1.1's new permission support requires running a chmod() call 
   on symlinks.  In Python 2.X, os.chmod() doesn't handle symlinks 
   on Unix (the follow_symlinks argument is not available), but 
   os.lchmod() does: use it for permissions propagation in 2.X.
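
   As a rough illustration of the idea (not ziptools' actual code), the
   sketch below tries os.lchmod() first and falls back on the 3.X-only
   follow_symlinks argument.  The set_link_mode name is hypothetical, and
   the helper assumes that failing quietly is acceptable where a platform
   cannot change a link's own permissions:

```python
import os

def set_link_mode(linkpath, mode):
    """
    Hypothetical helper: try to set permission bits on a symlink itself,
    not on the file it refers to.  Returns True if a call succeeded.
    """
    try:
        if hasattr(os, 'lchmod'):
            # Present in 2.X and 3.X, but only on some Unix platforms
            os.lchmod(linkpath, mode)
            return True
        if os.chmod in getattr(os, 'supports_follow_symlinks', ()):
            # 3.3 and later only: the argument that 2.X's os.chmod() lacks
            os.chmod(linkpath, mode, follow_symlinks=False)
            return True
    except (OSError, NotImplementedError):
        pass    # this platform cannot change a link's own permissions
    return False
```

   The hasattr() test matters because os.lchmod() is not defined on every
   Unix, so calling it unconditionally could raise AttributeError.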

2) Fix trace-message exceptions on 2.X for non-ASCII unicode filenames

   In 1.0, non-ASCII filenames in zipfiles created by other tools
   (including Python 3.X) could trigger an exception when printed
   to a pipe by Python 2.X.  This didn't occur when messages were
   printed to the console, but would happen for pipes whenever an 
   extract yielded a decoded unicode object for a non-ASCII filename.

   This became common with the next fix, because all non-ASCII
   filenames are now unicode objects during creates on 2.X.  To fix,
   printed str items are forcibly encoded to UTF-8 when printed by 2.X.
   Separately, non-ASCII prints can generally fail in both Pythons
   on Windows; this was fixed by munging characters or bytes.
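
   A minimal sketch of the UTF-8 encoding step follows.  The to_printable
   and trace names are assumptions made for illustration and need not match
   ziptools' own code; the Windows munging mentioned above is not shown:

```python
import sys

def to_printable(item):
    """
    Hypothetical helper: on 2.X, encode unicode objects to UTF-8 bytes so
    that printing to a pipe does not fall back on the ASCII default.
    """
    if sys.version_info[0] == 2 and isinstance(item, unicode):  # 2.X only
        return item.encode('utf-8')
    return item     # 3.X str and all other objects pass through unchanged

def trace(*items):
    """Write items as one space-separated line, print-style."""
    line = ' '.join(str(to_printable(item)) for item in items)
    sys.stdout.write(line + '\n')
```

   On 3.X the unicode name is never evaluated, because the version test
   fails first and short-circuits the condition.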

3) Fix munged filenames in 3.X (and other) unzips of 2.X zips

   Summary: when run by Python 2.X, ziptools creates (zips) now process 
   filenames as Unicode, so that zipfiles made by 2.X are more platform 
   agnostic.  Otherwise, non-ASCII filenames could be munged when unzipped
   either on other platforms or by other tools, including Python 3.X.

   Details: when run on Python 2.X, creates now force filenames to Unicode
   by passing a unicode object instead of a str to os.listdir().  This in
   turn makes the 2.X zipfile module encode filenames more interoperably,
   using Unicode formats and flags expected by other unzip tools. 
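
   The os.listdir() behavior this relies on can be sketched as follows.
   The function names are hypothetical, and decoding a 2.X str path with
   the filesystem encoding is an assumption of this sketch, not a
   documented ziptools choice:

```python
import os
import sys

def unicode_path(path):
    """
    Hypothetical helper: on 2.X, decode a str (bytes) path to unicode so
    that os.listdir() returns unicode names; a no-op on 3.X.
    """
    if sys.version_info[0] == 2 and isinstance(path, str):
        # Assumption: the path is encoded per the filesystem encoding
        return path.decode(sys.getfilesystemencoding() or 'utf-8')
    return path

def list_names(folder):
    """List a folder's entries as Unicode text on both 2.X and 3.X."""
    return os.listdir(unicode_path(folder))
```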

   Formerly, ziptools on 2.X stored filenames as bytes pre-encoded per the
   underlying platform's default.  This worked if 2.X unzipped its own zips
   on the same platform, but could fail to decode properly on 3.X as well
   as incompatible platforms.  When ziptools 1.0 was run on 3.X, 2.X zips 
   yielded munged non-ASCII filenames; with the 1.1 fix, they do not.
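
   The munging itself is easy to reproduce without a zipfile.  In this
   small sketch, the sample filename is made up, and the UTF-8 bytes stand
   in for a name that 2.X stored as a platform-encoded str with no Unicode
   flag set:

```python
# -*- coding: utf-8 -*-
name = u'caf\u00e9.txt'              # sample non-ASCII filename (made up)

# What 1.0 on 2.X could store on a UTF-8 platform: raw encoded bytes
stored = name.encode('utf-8')

# What a 3.X unzip does with the bytes when the 0x800 flag is absent
munged = stored.decode('cp437')

# What the same bytes yield when the flag says they are UTF-8
correct = stored.decode('utf-8')
```

   Here, correct equals the original name but munged does not: the two
   UTF-8 bytes for the accented character become two unrelated cp437
   characters.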

   This fix also made it necessary to avoid a 2.X encoding exception when
   trace messages containing the now-Unicode non-ASCII filenames are printed
   to pipes on Unix during creates.  The same exception could also occur
   before the fix when 2.X extracted 3.X zips, because the zipfile module
   returned Unicode for non-ASCII filenames.

   For reference, below is the (reformatted) relevant code from the 2.X and 
   3.X zipfile modules.  ziptools' former 2.X str scheme generally worked if 
   2.X both zipped and unzipped on the same (or a compatible) platform, but 
   the new unicode scheme both yields a more platform-neutral encoding and 
   satisfies the expectation of 3.X and other unzip tools:

   2.X (str text is encoded bytes):

      Create: unicode|bytes=>bytes
         if isinstance(self.filename, unicode):
             try:
                 return self.filename.encode('ascii'), self.flag_bits
             except UnicodeEncodeError:
                 return self.filename.encode('utf-8'), self.flag_bits | 0x800
         else:
             return self.filename, self.flag_bits     # <== str: platform-encoded

      Extract: bytes=>unicode|bytes
         if self.flag_bits & 0x800:
             return self.filename.decode('utf-8')     # ok for ascii or utf8
         else:
             return self.filename                     # <== str: ok IFF compatible!

   3.X (str text is decoded Unicode):

      Create: unicode=>bytes
         try:
             return self.filename.encode('ascii'), self.flag_bits
         except UnicodeEncodeError:
             return self.filename.encode('utf-8'), self.flag_bits | 0x800

      Extract: bytes=>unicode
         if flags & 0x800:
             filename = filename.decode('utf-8')       # ok: flag set, name is ascii or utf8
         else:
             filename = filename.decode('cp437')       # <== MUNGED if UTF-8 from 2.X!
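
   To see the 0x800 flag in action, the sketch below round-trips a name
   through an in-memory zipfile and reports whether the flag was set.  The
   name_and_flag function and the sample names are hypothetical; only
   public zipfile calls are used:

```python
import io
import zipfile

def name_and_flag(filename):
    """
    Hypothetical helper: store one entry under filename (Unicode text),
    reread the archive, and return (name read back, UTF-8 flag set?).
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr(filename, b'data')
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        info = zf.infolist()[0]
        return info.filename, bool(info.flag_bits & 0x800)

ascii_result = name_and_flag(u'spam.txt')
unicode_result = name_and_flag(u'caf\u00e9.txt')
```

   An ASCII-only name is stored without the flag, while a non-ASCII name
   given as Unicode text is stored as UTF-8 with the flag set, which is
   what lets 3.X and other unzip tools decode it correctly.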



©M.Lutz