File: mergeall-products/unzipped/test/ziptools/ziptools/zipmodtimeutc.py

"""
================================================================================
zipmodtimeutc.py - zip/unzip UTC modtime timestamps via zip extra field [1.2].
See ziptools' ../_README.html for license, attribution, and other logistics.

ziptools 1.2 makes zipfile modtimes of files, folders, and symlinks immune 
to changes in both timezone and DST, by storing UTC timestamps in one of the 
"extra fields" defined by the zipfile standard.  

In particular, the extended-timestamp extra field (code 0x5455) introduced by 
the Info-ZIP project is ideal for this: ziptools adds one of these extra fields
to each items' central directory header on zips, and fetches them from the same 
location on unzips.  When present for any item, this field is used for propagated
modtime instead of the main MS-DOS "local time," and will simply be ignored by 
tools that don't support it.

This is a full fix to zip's local-time issues: UTC timestamps are relative to 
a fixed point, and thus both timezone and DST agnostic.  Local time is used for
display only, as it should be, and not for file metadata (in zips or elsewhere).
Given zips' lack of timezone info, UTC is the only way to make times accurate.

The prior scheme in ziptools 1.1 used the zip local time, and deferred to 
Python's library calls time.localtime() and time.mktime() to both translate 
UTC time to and from local time, and handle DST changes.  Unfortunately, that
scheme's results could vary from those of other zip tools on DST changes, and 
did nothing about timezone changes.  The new UTC timestamp extra-field scheme
resolves both DST and timezone modtime issues with a single fix.

FIELD LAYOUT:

Tip: 'zipinfo -V zipfile.zip' displays central-directory contents.
The layout of the extra field per spec, all little-endian byte order:

  Value        Size     Description
  -----        ----     -----------
  0x5455       Short    tag for this extra block type ("UT")
  TSize        Short    total data size for this block
  Flags        Byte     info bits (refers to local header, not this)
  (ModTime)    Long     time of last modification (UTC/GMT)

Where TSize designates modtime central-directory presence, 
and Flags describes the local entry correspondence as follows:

  bit 0       if set, modification time is present
  bit 1       if set, access time is present
  bit 2       if set, creation time is present
  bits 3-7    reserved for additional timestamps; not set

LIBRARY DEPENDENCE:

The lack of direct support for extra fields and arguably walled-in coding 
structure of Python's zipfile module renders the code here subtle.  In 
fact, the reliance here on infolist()[-1] may qualify as a hack, and the
use of getinfo() seems a bit magical.  These are public and documented
APIs, and avoiding them would require massive rewrites here.  Still, the 
coupling is tight, and this code may grow module-version dependent in time.

As is, the code here has been verified to work for zipfile in Python 2.7, 
3.5, and 3.7 through 3.9, but open-source code can morph arbitrarily; 
forking a Python standard library is a nonstarter; and a frozen executable 
isn't yet an option here (ziptools is a programmer's library too).  If a 
future zipfile mod breaks this code, the best fix is to use an older Python 
and/or zipfile.

OTHER CAVEATS: 

1) Scope: the new UTC scheme won't help for zipfiles created by tools that 
don't record the extra field; in these cases, ziptools falls back on the 
original 1.1 local-time scheme.  If other zip tools add UTC modtimes in
the central directory's 0x5455 fields, ziptools will make use of them.

2) Field use: it is unclear whether the 0o5455 field should appear in a 
local file header, central directory header, or both.  It's stored in 
the central directory only here, and seems to pass in other tools.
Storing in the local header too may require manual ZipInfo builds.

3) Other fields: besides 0x5455, others extra fields may contain extended 
timestamps too, but ziptools doesn't process these because it doesn't add them.
ziptools also doesn't do anything about creation or access time in its 0x5455
fields, because they are generally too variable across platforms (and can't 
show up in the central directory's headers in any event)  Such support could
be added if there's any user interest; at present, it lacks use cases.

4) Subclassing: this could have been coded as a ZipFile subclass, of course 
(e.g., extending the close() method would save a few manual calls).  This 
wasn't pursued, because the symlinks support is already coded as functions, 
and it was a goal to make this as independent of zipfile's API as possible; 
it's changed in the past, and is prone to change again.
================================================================================
"""

from __future__ import print_function         # run on python 2.x too
import os, time, struct

# show ops or not (this file only)
#trace = lambda *args: print('='* 4, *args)
trace = lambda *args: None

UTCExtraCode  = 0x5455     # extended timestamp ('UT'), introduced by Info-ZIP project
UTCExtraFlags = 0b0000     # no local-header extra fields: just in central directory

AllExtraHdrFmt = '<HH'     # code + length: 2 unsigned 2-byte shorts, little endian
AllExtraHdrLen = 4

UTCExtraDataFmt = '<Bl'    # flags + timestamp: unsigned byte + signed 4-byte long, little
UTCExtraDataLen = 5


#===============================================================================



def addModtimeUTC(zipfile, filepath=None, utcmodtime=None):
    """
    --------------------------------------------------------------------------
    On Zips: add an extra field for the item just written, with an extended 
    timestamp value passed to utcmodtime or read from filepath - filesystem
    UTC time of the original item.  The field added to a zipinfo here is 
    later written to the item's central-directory entry on zipfile.close().

    Called just after a zip write, and assumes the item written was appended 
    to infolist (else need to build ZipInfos for files and folders manually).
    os.path.getmtime() is the same as os.stat().st_mtime, but symlinks must
    pass in a link's own time garnered from its os.lstat().st_mtime.
    filepath already has the Windows long-path prefix on that platform.
    --------------------------------------------------------------------------
    """
    assert not filepath or not utcmodtime

    zipinfo = zipfile.infolist()[-1]                         # the item just written
    utcmodtime = utcmodtime or os.path.getmtime(filepath)    # passed (symlinks) or not

    extrabytes =  struct.pack(AllExtraHdrFmt, UTCExtraCode, UTCExtraDataLen)
    extrabytes += struct.pack(UTCExtraDataFmt, UTCExtraFlags, int(utcmodtime))

    trace('Added UTC timestamp:', repr(extrabytes))
    zipinfo.extra += extrabytes    # to be written on zipfile.close()



#===============================================================================



def getModtimeUTCorLocal(zipinfo, zipfile):
    """
    --------------------------------------------------------------------------
    On unzips: return the UTC modtime timestamp for the item represented by 
    zipinfo in zipfile - from either the extra UTC timestamp field, or zip's
    local time.  If present, UTC timestamps are in the extra fields of each 
    item's central-directory entry.  The extra fields are read (not parsed)
    by zipfile's __init__(), and tabled by filename (getinfo() is a dict []).
    --------------------------------------------------------------------------
    """

    localextra = zipinfo.extra                                # from file local header
    centralextra = zipfile.getinfo(zipinfo.filename).extra    # from central directory
    extrabytes = centralextra                                 # choose wisely?...

    utctime = None
    try:
        # 
        # use UTC timestamp extra field if present, instead of local time;
        # parse through extra-field bytes till timestamp found or no more;
        #
        offset = 0
        while offset < len(extrabytes):
            hdrbytes = extrabytes[offset : offset + AllExtraHdrLen]
            offset += AllExtraHdrLen
            code, length = struct.unpack(AllExtraHdrFmt, hdrbytes)
            if code != UTCExtraCode:
                offset += length
            else:
                databytes = extrabytes[offset : offset + UTCExtraDataLen]
                flags, utctime = struct.unpack(UTCExtraDataFmt, databytes)
                trace('Got UTC timestamp:', utctime)
                break
        else:
            trace('No UTC timestamp found: used local')

    except Exception as why:
        #
        # bad extra-field formatting (e.g., null byte at end)
        # use local, continue unzipping rest of the archive
        #
        trace('Error parsing extra fields: used local')
        trace('Python exception:', why)

    if utctime is None:
        #
        # not found or error: fall back on pre 1.2 scheme: use main 
        # zip local time, and defer to time.mktime() to convert to UTC 
        # as possible (adjusts for DST maybe, but never for timezones);
        #
        localtime = zipinfo.date_time                      # zip's 6-tuple
        utctime   = time.mktime(localtime + (0, 0, -1))    # 9-tuple=>float

    return utctime    # to be propogated to unzipped item



[Home page] Books Code Blog Python Author Train Find ©M.Lutz