File: mergeall-products/unzipped/fixlongpaths.py

r"""
================================================================================
fixlongpaths.py:
   Work around Windows path-length limits (part of the mergeall system [3.0]).
   Main export here: fixLongWindowsPath(pathname, force=False), a.k.a. FWP.
   
On Windows, work around the normal pathname length limit -- 260 characters
for files and 248 when making dirs, each less 1 for a NULL byte -- by prefixing
paths with '\\?\' before they are sent to system tools.  This fix is used by
mergeall, diffall, cpall, and ziptools, to support long pathnames on all
versions of Windows, without requiring an optional Windows 10 extension that
must be enabled and does not apply automatically to all programs.

The fix is a direct add for local-drive paths:
   'C:\folder...' => '\\?\C:\folder...'
   
But requires a minor reformatting step for network-drive paths:
   '\\server\folder...' => '\\?\UNC\server\folder...'

Either way, this prefix lifts the pathname length limit to 32k characters,
with 255 characters generally allowed per path component.
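The two rewrite rules above can be sketched as a pair of pure string functions. This is only an illustration, not the module's own FWP/UFWP code (which appears later in this file). The names add_prefix() and strip_prefix() are hypothetical, and the input is assumed to be already absolute with '\' separators. Unlike FWP, add_prefix() also leaves an already-extended path alone; that guard is an addition made for the sketch.

```python
EXTENDED = '\\\\?\\'          # Python literal for the 4 characters \\?\

def add_prefix(abspath):
    # Hypothetical helper: assumes abspath is absolute with '\' separators.
    if abspath.startswith(EXTENDED):
        return abspath                                # already extended: no-op
    if abspath.startswith('\\\\'):
        return EXTENDED + 'UNC' + abspath[1:]         # \\server\x => \\?\UNC\server\x
    return EXTENDED + abspath                         # C:\x => \\?\C:\x

def strip_prefix(path):
    # Hypothetical helper: the inverse of add_prefix().
    if path.startswith(EXTENDED + 'UNC\\'):
        return '\\' + path[len(EXTENDED + 'UNC'):]    # \\?\UNC\server\x => \\server\x
    if path.startswith(EXTENDED):
        return path[len(EXTENDED):]                   # \\?\C:\x => C:\x
    return path

# Round trips for both drive styles
for original in ['C:\\folder\\file.txt', '\\\\server\\share\\file.txt']:
    extended = add_prefix(original)
    print(extended, '=>', strip_prefix(extended))
```

Because these functions only manipulate strings, the sketch runs on any platform and prints:

   \\?\C:\folder\file.txt => C:\folder\file.txt
   \\?\UNC\server\share\file.txt => \\server\share\file.txt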

Pathnames are also run through abspath() to ensure they are absolute here
as needed for path prefixing.  This call also changes any '/' to '\' which
is required in this scheme: unlike the normal Windows file API, the API
invoked by '\\?\' does not do this automatically.

Python's abspath() incurs a speed hit that isabs() + .replace('/', '\\') might
reduce, but it is optimized to call a Windows C function for much of its work,
and is run only for rare too-long pathnames (but see the update ahead).  It's
required for long paths: \\?\ paths do not interpret any relative path syntax
(e.g., '.' or '..').  Unix '/' separators tend to crop up most often on Windows
in GUIs.
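To see why a plain '/'-to-'\' swap is not enough, the sketch below uses Python's ntpath module, which applies Windows path rules on any platform. ntpath.normpath() performs the same kind of normalization that abspath() does (abspath() additionally resolves relative paths against the CWD). The sample path is made up for illustration.

```python
import ntpath

raw = 'C:/Users/mark/./docs/../notes.txt'       # made-up GUI-style path

# Separator swap only: '.' and '..' survive, and a \\?\ path would
# pass them to the filesystem literally instead of resolving them.
swapped = raw.replace('/', '\\')

# Full normalization: separators fixed AND relative components resolved.
normalized = ntpath.normpath(raw)

print(swapped)      # C:\Users\mark\.\docs\..\notes.txt
print(normalized)   # C:\Users\mark\notes.txt
```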

Subtly, the Windows API actually imposes two different path-length limits:

   - 260 characters, for files and almost all other calls
   - 248 characters, when creating folders (260 less 12 for an 8.3 filename)

Both of these size limits include 1 byte for NULL, 3 for a drive name (e.g.,
'C:\'), and a variable number for a UNC server/share prefix.  Hence, the limit
on content after the drive name is really 256 for local drives, and the total
string-length limits for files and dir creation are, respectively, *259* and
*247* in Python before passing to C libs.
This module allows you to pick the limit by context when possible, but defaults
to the more-inclusive 247 (any prefixing overkill is likely harmless here).
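The arithmetic above, restated as a small sketch. The names here are illustrative only: the module's real constants are FileLimit and DirLimit, defined in the code below. needs_prefix() is a hypothetical helper that, like the module, defaults to the smaller folder-creation limit.

```python
MAX_PATH  = 260                   # Windows API limit, including the NULL
NULL_BYTE = 1
DRIVE     = len('C:\\')           # 3 characters for a local drive name
SHORT_83  = 12                    # room reserved for an 8.3 name on mkdir

file_limit = MAX_PATH - NULL_BYTE           # 259: longest Python str for files
dir_limit  = file_limit - SHORT_83          # 247: longest Python str for mkdir
content    = MAX_PATH - NULL_BYTE - DRIVE   # 256: room after 'C:\' on local drives

def needs_prefix(abspath, creating_dir=True):
    # Default to the smaller dir limit: over-prefixing is assumed harmless.
    limit = dir_limit if creating_dir else file_limit
    return len(abspath) > limit

print(file_limit, dir_limit, content)       # 259 247 256
```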

For Python code that verifies the limits on paths (both abs and rel), see:
    docetc/miscnotes/Windows-path-limits

For the official (though cursory) story on Windows path-limit madness, see:
   https://msdn.microsoft.com/en-us/library/aa365247.aspx#maxpath.

For a demo of the issue and fix in action apart from the code here, see:
   docetc/miscnotes/demo-3.0-long-windows-paths-fix.txt.

For prior art, see the .Net team's (somewhat dated) blog on the subject:
   https://duckduckgo.com/?q=long+paths+in+.net+bcl

See also this file's self-test code and its expected output below.
TBD: the main function here may be called multiple times on a given
file; would caching prior results for pathnames help or hurt performance?

--------------------------------------------------------------------------------

MORE BACKGROUND:

Long paths generate open() exceptions and other failures on Windows without
this fix.  These are relatively rare, but not ignorable: despite rationales
that this is mainly a developer issue, long paths have grown more common
in both personal archives and saved web pages (titles become filenames!).

This is a Windows-only fix, and addresses limits in the Windows API itself,
not in filesystems (e.g., long paths and devices that fail on Windows work on
Mac OS X).  All filesystems in common use, including exFAT, FAT, and NTFS,
support file pathnames up to 32K with 255-character components:

   https://msdn.microsoft.com/en-us/library/
        windows/desktop/ee681827(v=vs.85).aspx#limits

Also note that Windows 10 removes the 260-character path limit, but for
compatibility reasons this is provided only as an option that must be
explicitly enabled via registry settings or Group Policy selections, and
may not apply to some existing installed programs.  It's also fully
irrelevant to some 1G machines running older versions of Windows:

   https://msdn.microsoft.com/en-us/library/
          windows/desktop/aa365247(v=vs.85).aspx#maxpath

Python 3.6's What's New oddly lists this as a 3.6 "fix."  In truth, this is
available for any program on Windows 10, but doesn't work for any program on
any prior Windows, including Python 3.6.  The '\\?\' fix works everywhere.
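For comparison, the Windows 10 opt-in mentioned above is controlled by the LongPathsEnabled DWORD value under HKLM\SYSTEM\CurrentControlSet\Control\FileSystem. Below is a hedged Python 3 sketch that checks it with the standard winreg module; the function name long_paths_enabled() is hypothetical. Even a value of 1 helps only programs whose manifests declare them long-path aware, which is why the '\\?\' prefix remains the portable fix.

```python
import sys

def long_paths_enabled():
    # Hypothetical helper: None off Windows, else True/False for the opt-in.
    if not sys.platform.startswith('win'):
        return None
    import winreg                                  # Windows-only stdlib module
    key_name = r'SYSTEM\CurrentControlSet\Control\FileSystem'
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_name) as key:
            value, _ = winreg.QueryValueEx(key, 'LongPathsEnabled')
    except OSError:                                # key or value missing
        return False
    return value == 1

print(long_paths_enabled())
```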

Other platforms are non-issues.  On Mac OS X, the total path limit is likely
1024 with 255 for each filename component, and on Linux these limits can be
4096 and 255, though details can vary per both filesystem and system version.
In any event, these platforms' limits are large enough to be ignored here.

--------------------------------------------------------------------------------

SCOPE (former tbd, now resolved):

Should long paths also be prefixed in contexts other than just open()?
If so, we'd need to factor out path-extension logic and use it nearly
everywhere, which may be prohibitive.  However, files on too-long paths can be
listed without changing their pathnames, so this seems moot.

UPDATE: no, it's not moot -- os.lstat() fails and os.path.isfile() simply
returns False for files at long paths, unless these paths are also prefixed
(see docetc/miscnotes/demo-3.0-long-windows-paths-fix.txt).  Other Python
tools, including os.makedirs(), shutil.rmtree(), and os.walk() are also prone
to fail on unprefixed long Windows paths (unfortunately, pydev discussed but
rejected addressing this in the stdlib: http://bugs.python.org/issue18199).

Hence, the fix here was generalized from its original OPEN() into a broader
path-prefix tool, now used for all file-related calls in mergeall, diffall,
cpall, and ziptools; OPEN() is now just an example use case.  This incurs some
overhead and clutters code, but -- like symlink and cruft handling -- it is
required in tools like mergeall that process arbitrary data archives portably.
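For recursive tools like os.walk(), the length of the deepest path isn't known in advance. So the root is prefixed unconditionally, as FWP(path, force=True) does, and every path the walker builds from it inherits the prefix. The minimal sketch below uses hypothetical helper names and re-implements the prefix rules inline so it stands alone. Off Windows it reduces to a plain os.walk().

```python
import os
import sys

EXTENDED = '\\\\?\\'                    # the 4 characters \\?\

def force_prefix(path):
    # Hypothetical helper, like FWP(path, force=True): always extend on Windows.
    if not sys.platform.startswith('win'):
        return path                     # Mac, Linux: nothing to do
    path = os.path.abspath(path)        # \\?\ needs absolute, '\'-separated paths
    if path.startswith(EXTENDED):
        return path
    if path.startswith('\\\\'):
        return EXTENDED + 'UNC' + path[1:]
    return EXTENDED + path

def unprefix(path):
    # Hypothetical helper, like UFWP(path): restore the usual form for display.
    if path.startswith(EXTENDED + 'UNC\\'):
        return '\\' + path[len(EXTENDED + 'UNC'):]
    if path.startswith(EXTENDED):
        return path[len(EXTENDED):]
    return path

def walk_long(root):
    # Prefix the root once; os.walk() joins every nested path onto it,
    # so deep subfolders inherit the prefix whatever their final length.
    for dirpath, dirnames, filenames in os.walk(force_prefix(root)):
        yield unprefix(dirpath), dirnames, filenames
```

On Windows, the yielded folder paths are absolute even for a relative root, because the prefix requires absolute paths and unprefix() does not undo that mapping.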

--------------------------------------------------------------------------------

UPDATE - USE ABSOLUTE PATHS TO CHECK LENGTH LIMITS:

For use with tools such as Python's os.path.isdir(), the length of the path
must apparently be that of its absolute+normalized version, _not_ its relative
form.  Per the best theory, Windows' GetFileAttributes, which os.path.isdir()
calls to avoid an os.stat(), seems to expand paths to absolute form by the
time they are used.  This is stated nowhere in the official MSDN docs above,
and even seems to be contradicted by them.  In Python:

   >>> relpath = r'extracted2\xfer-symlinks-winpaths\winpaths\llllllllllllllllll
   lllllllllllllllllllllllllllllll ...etc...'
   >>> len(relpath)                         # relative to cwd='.'
   195
   >>> os.path.isdir(relpath)               <== this should not fail @195 chars!
   False
   >>> os.path.isdir('\\\\?\\' + relpath)   # ok: \\?\ doesn't work on rel paths
   False

   >>> abspath = os.path.abspath(relpath)   # make abs and normalize
   >>> len(abspath)                           
   278
   >>> os.path.isdir(abspath)               # ok: 278-char abs form should fail
   False
   >>> os.path.isdir('\\\\?\\' + abspath)   **need to check abs form, not rel**
   True

   >>> abspath
   'C:\\MY-STUFF\\Code\\mergeall\\test\\test-symlinks\\windows-tests\\test-longp
   aths-symlinks\\extracted2\\ ...and the rest of the rel path above...'

Nor is this limited to just os.path.isdir() and its C lib call; it reflects
a more general nuance regarding relative paths in either Python or Windows:

   # these should not fail @195 chars!
   >>> os.listdir(relpath)
   FileNotFoundError: [WinError 3] The system cannot find the path specified: ...
   >>> os.lstat(relpath)
   FileNotFoundError: [WinError 3] The system cannot find the path specified: ...

   # these should fail @278 chars
   >>> os.listdir(abspath)
   FileNotFoundError: [WinError 3] The system cannot find the path specified: ...
   >>> os.lstat(abspath)
   FileNotFoundError: [WinError 3] The system cannot find the path specified: ...

   # but prefixing works
   >>> os.listdir('\\\\?\\' + abspath)
   ['gggggggggggggggggggggggggggggggggggggggggggggggggg']
   >>> os.lstat('\\\\?\\' + abspath)
   os.stat_result(st_mode=16895, st_ino=39687971716609265, ... )

For additional proof, see the Python test script and its results file in the
following folder; as demonstrated there, the limit is on _characters_, not bytes
(any Unicode encoding is irrelevant), and the limits for relative paths are
reduced by _exactly_ the length of the CWD's prefix (i.e., abs paths are used):

    docetc/miscnotes/Windows-path-limits

Hence, we now always expand paths to absolute form prior to length tests here.
This adds some performance cost on Windows, though as stated, os.path.abspath()
in Python is optimized to run a Windows API C function for much of its work.
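Here is a minimal sketch of the absolute-length check, assuming the theory above holds. effective_length() and over_limit() are hypothetical names, and DIR_LIMIT simply repeats the 247 computed earlier.

```python
import os

DIR_LIMIT = 260 - 1 - 12                # 247, same value as the module's DirLimit

def effective_length(pathname):
    # Measure the absolute, normalized form: per the theory above, that is
    # the string Windows checks, even when the caller passed a relative path.
    return len(os.path.abspath(pathname))

def over_limit(pathname, limit=DIR_LIMIT):
    return effective_length(pathname) > limit

rel = 'x' * 195                         # short on its own...
print(len(rel), effective_length(rel))  # ...but longer once the CWD is prepended
```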

The other options are to prefix _every_ Windows path here just in case, or use
DOS 8.3 abbreviated names via GetShortPathName, available through the win32api
add-on or the ctypes module.  Both seem overkill: universal prefixing would
add overhead _beyond_ abspath() for every Windows folder visited, and 8.3
pathnames likely have issues all their own.  For the latter, try:

   https://duckduckgo.com/?q=python+GetShortPathName
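As an illustration of the rejected 8.3 alternative, the sketch below calls the Win32 GetShortPathNameW function through ctypes. short_path_name() is a hypothetical helper. The path must already exist, and if 8.3 name generation is disabled on a volume, Windows returns the long name unchanged; both are among the issues mentioned above. Microsoft's documentation also indicates that this wide-character call itself needs the '\\?\' prefix for paths beyond MAX_PATH.

```python
import sys

def short_path_name(pathname):
    # Hypothetical helper: returns pathname unchanged off Windows.
    if not sys.platform.startswith('win'):
        return pathname
    import ctypes
    from ctypes import wintypes                     # imported lazily: only needed on Windows
    get_short = ctypes.windll.kernel32.GetShortPathNameW
    get_short.argtypes = [wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.DWORD]
    get_short.restype = wintypes.DWORD
    needed = get_short(pathname, None, 0)           # buffer size incl. the NULL
    if needed == 0:
        raise ctypes.WinError()                     # e.g., path does not exist
    buffer = ctypes.create_unicode_buffer(needed)
    if get_short(pathname, buffer, needed) == 0:
        raise ctypes.WinError()
    return buffer.value
```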

Also note that the implementation here assumes Python uses only Windows API
Unicode tools that support the long-path prefix; not all do, and Python may
use MAX_PATH limits internally in some cases (e.g., bytes).  A review of the
Python C code invoked here, however, suggests that Python does not expand paths
to be absolute itself, leaving the Windows API as the more likely suspect.

Finally note that the absolute-path constraint arrived at here is a _theory_
derived from empirical data.  It is supported by neither official documentation
(which is incomplete and contradictory) nor source-code analysis (which is
impossible in closed-source systems).  Like all theories, this one will remain
in force while it matches observation, but no longer.  That is, it may be dead
wrong.  Alas, Windows development is science more often than it should be! 
==============================================================================
"""

from __future__ import print_function   # py 2.x
import sys, os

TracePaths = False             # show expanded paths? (here or arg)
TraceOpen  = False             # show paths for open only?

FileLimit  = 260 - 1           # 259 in Py, including 3 for drive, N for UNC
DirLimit   = FileLimit - 12    # 247 in Py, after reserving 12 for 8.3 name



def fixLongWindowsPath(pathname, force=False, limit=DirLimit, trace=TracePaths):
    """
    ------------------------------------------------------------------
    [3.0] Fix too-long paths on Windows (only) by prefixing as
    needed to invoke APIs that support extended-length paths.
    See this file's main docstring for more details on the fix.
    Call this before other Python file-path tools as required.
    Returns pathname either unaltered or with required expansion.
    
    Pass force=True to prefix on Windows regardless of length.
    This may be required to prefix pathnames on a just-in-case
    basis, for APIs like shutil.rmtree() and os.walk() that 
    recur into subfolders of unknown depth, and for libs that
    otherwise expand paths to unknown lengths (e.g., zipfile).

    This is given a "FWP" shorter synonym ahead for convenience:
    depending on your code it may wind up appearing at _every_
    stdlib file-tool call, but is a quick no-op where unneeded.
    ------------------------------------------------------------------
    """
    if not sys.platform.startswith('win'):
        # Mac, Linux, etc.: no worries
        return pathname
    else:
        abspathname = os.path.abspath(pathname)       # use abs len (see above)
        if len(abspathname) <= limit and not force:   # rel path len is moot
            # Windows path within limits: ok
            return pathname
        else:
            # Windows path too long: fix it
            pathname = abspathname                    # to absolute, and / => \
            extralenprefix = '\\\\?\\'                # i.e., \\?\ (or r'\\?'+'\\')
            if not pathname.startswith('\\\\'):       # i.e., \\   (or r'\\')
                # local drives: C:\
                pathname = extralenprefix + pathname  # C:\dir => \\?\C:\dir
            else:
                # network drives: \\...               # \\dev  => \\?\UNC\dev
                pathname = extralenprefix + 'UNC' + pathname[1:]
            if trace: print('Extended path =>', pathname[:60])
            return pathname



def unfixLongWindowsPath(pathname):
    """
    ------------------------------------------------------------------
    For contexts that require a just-in-case preemptive '\\?\'
    prefix (e.g., os.walk(), shutil.rmtree()), strip the prefix
    to restore the original pathname (mostly) when it is needed.
    
    May be required to get the normal folder name when using
    os.walk(); os.path.splitdrive() strips '\\?\' but also 'C:'.
    
    Note that this does NOT undo relative->absolute mapping:
    see os.path.isabs() and os.path.relpath() where required.
    ------------------------------------------------------------------
    """
    if not pathname.startswith('\\\\?\\'):      # never will on Mac, Linux
        return pathname                         # may or may not on Windows
    else:
        if pathname.startswith('\\\\?\\UNC'):
            return '\\' + pathname[7:]          # network: drop \\?\UNC, add \
        else:
            return pathname[4:]                 # local: drop \\?\ only



#---------------------------------------------------------------------
# Shorter synonyms for coding convenience
# FWP stands for "Fix Windows Paths" (officially)
#---------------------------------------------------------------------

FWP      = fixLongWindowsPath      # generic: use most-inclusive limit (dirs)
FWP_dir  = FWP                     # or force dir-path or file/other limits
FWP_file = lambda *pargs, **kargs: FWP(*pargs, limit=FileLimit, **kargs)
UFWP     = unfixLongWindowsPath



def OPEN(pathname, *pargs, **kargs):   # 3.X kwonly args not an option for trace
    """
    ------------------------------------------------------------------
    [3.0] Extend built-in open() to support long pathnames on Windows.
    To leverage, import and use this instead of the built-in open().
    Importing "as open" works but probably hides the magic too much.
    This did more before fixLongWindowsPath() was pulled out for other
    cases; it's now mostly an example use case for the function above.
    ------------------------------------------------------------------
    """
    pathname = fixLongWindowsPath(pathname)
    if TraceOpen: print('opening', pathname[:70])
    return open(pathname, *pargs, **kargs)



################################################################################
# self-test only follows
################################################################################

if __name__ == '__main__':
    """
    ------------------------------------------------------------------
    Self-test: edit pathnames as needed (and run on Windows)
    ------------------------------------------------------------------
    """
    import shutil      # used here only
    TraceOpen = True
    
    #
    # test1
    #
    print('-'*20, 'test 1 - normal short paths: local drive')
    path = r'C:\Users\mark\uni2.py'
    print('len=%d' % len(path), path[:60])
    f = open(path, 'r')
    print('bltin read worked:', f.readline().rstrip())
    f = OPEN(path, 'r')
    print('FWP read worked:  ', f.readline().rstrip())

    #
    # test2
    #
    print()
    print('-'*20, 'test 2 - normal short paths: network drive')
    path = r'\\readyshare.routerlogin.net\USB_Storage\.Trashes'
    print('len=%d' % len(path), path[:60])
    f = open(path, 'r')
    print('bltin read worked:', f.readline().rstrip())
    f = OPEN(path, 'r')
    print('FWP read worked:  ', f.readline().rstrip())

    #
    # test3
    #
    print()
    print('-'*20, 'test 3 - normal short paths: arguments')
    path = r'C:\Users\mark\temp.txt'
    print('len=%d' % len(path), path[:60])
    f = OPEN(path, mode='w')
    f.write('new spam\n')
    f.close()
    f = OPEN(path)
    print('FWP write+read worked:', f.readline().rstrip())

    
    #
    # test4
    #
    print()
    print('-'*20, 'test 4 - rare long paths: local drive')
    path = (r'C:\Users\mark'
                         r'\01234567890123456789012345678901234567890123456789'
                         r'\01234567890123456789012345678901234567890123456789'
                         r'\01234567890123456789012345678901234567890123456789'
                         r'\01234567890123456789012345678901234567890123456789'
                         r'\01234567890123456789012345678')

    # delete and recreate test tree dirs
    shutil.rmtree(FWP(path, force=True), ignore_errors=True)   # always prefix: depth; ok if absent
    os.makedirs(FWP(path), exist_ok=True)      # no need to force: length known
    print('FWP folder delete+create worked')

    path = path + '/' + 'EGGS01234567890123456789.txt'
    print('len=%d' % len(path), path[:60])
    
    try:
        f = open(path, 'w')
    except Exception as E:
        print('bltin write failed as expected\n ', str(E)[:75])

    f = OPEN(path, 'w')
    f.write('local spam')
    f.close()
    f = OPEN(path)
    print('FWP write+read worked:', f.read().rstrip())

    try:
        f = open(path)
    except Exception as E:
        print('bltin read failed as expected\n ', str(E)[:75])

    # listdir (like all system calls!) needs FWP too
    print('FWP listdir:', os.listdir(FWP(os.path.dirname(path))))

    
    #
    # test5
    #
    print()
    print('-'*20, 'test 5 - rare long paths: network drive')
    path = (r'\\readyshare.routerlogin.net\USB_Storage\TransferMark'
                         r'\01234567890123456789012345678901234567890123456789'
                         r'\01234567890123456789012345678901234567890123456789'
                         r'\01234567890123456789012345678901234567890123456789'
                         r'\0123456789012345678901234567890123456789')

    # delete and recreate test tree dirs
    shutil.rmtree(FWP(path, force=True), ignore_errors=True)   # always prefix: depth; ok if absent
    os.makedirs(FWP(path), exist_ok=True)      # no need to force: length known
    print('FWP folder delete+create worked')

    path = path + '/' + 'EGGS01234567890123456789.txt'
    print('len=%d' % len(path), path[:60])
    
    try:
        f = open(path, 'w')
    except Exception as E:
        print('bltin write failed as expected\n ', str(E)[:75])

    f = OPEN(path, 'w')
    f.write('network spam')
    f.close()
    print('FWP write+read worked:', OPEN(path).readline())

    try:
        f = open(path)
    except Exception as E:
        print('bltin read failed as expected\n ', str(E)[:75])

    # listdir (like all system calls!) needs FWP too
    print('FWP listdir:', os.listdir(FWP(os.path.dirname(path))))



#---------------------------------------------------------------------
# Expected output when TraceOpen=True (your drives will vary):
#---------------------------------------------------------------------
r"""
-------------------- test 1 - normal short paths: local drive
len=21 C:\Users\mark\uni2.py
bltin read worked: from tkinter import *  # 3.4, 3.5, 2.7 with Tkinter
opening C:\Users\mark\uni2.py
FWP read worked:   from tkinter import *  # 3.4, 3.5, 2.7 with Tkinter

-------------------- test 2 - normal short paths: network drive
len=49 \\readyshare.routerlogin.net\USB_Storage\.Trashes
bltin read worked: Suppress Mac trash retention for this drive.
opening \\readyshare.routerlogin.net\USB_Storage\.Trashes
FWP read worked:   Suppress Mac trash retention for this drive.

-------------------- test 3 - normal short paths: arguments
len=22 C:\Users\mark\temp.txt
opening C:\Users\mark\temp.txt
opening C:\Users\mark\temp.txt
FWP write+read worked: new spam

-------------------- test 4 - rare long paths: local drive
FWP folder delete+create worked
len=276 C:\Users\mark\0123456789012345678901234567890123456789012345
bltin write failed as expected
  [Errno 2] No such file or directory: 'C:\\Users\\mark\\01234567890123456789
opening \\?\C:\Users\mark\01234567890123456789012345678901234567890123456789\0
opening \\?\C:\Users\mark\01234567890123456789012345678901234567890123456789\0
FWP write+read worked: local spam
bltin read failed as expected
  [Errno 2] No such file or directory: 'C:\\Users\\mark\\01234567890123456789
FWP listdir: ['EGGS01234567890123456789.txt']

-------------------- test 5 - rare long paths: network drive
FWP folder delete+create worked
len=276 \\readyshare.routerlogin.net\USB_Storage\TransferMark\012345
bltin write failed as expected
  [Errno 2] No such file or directory: '\\\\readyshare.routerlogin.net\\USB_S
opening \\?\UNC\readyshare.routerlogin.net\USB_Storage\TransferMark\0123456789
opening \\?\UNC\readyshare.routerlogin.net\USB_Storage\TransferMark\0123456789
FWP write+read worked: network spam
bltin read failed as expected
  [Errno 2] No such file or directory: '\\\\readyshare.routerlogin.net\\USB_S
FWP listdir: ['EGGS01234567890123456789.txt']
"""


