File: mergeall-products/unzipped/fix-nonportable-filenames.py
#!/usr/bin/env python3
"""
=======================================================================================
fix-nonportable-filenames.py: Replace all nonportable characters with "_" in all file,
folder, and symlink names in an entire folder tree, or list nonportable items only.

Version:  a Mergeall and ziptools utility, Sep-26-2021
License:  provided freely, but with no warranties of any kind
Author:   © M. Lutz (https://learning-python.com), 2021
Runs on:  any Python 3.X, and any host platform
Status:   available in all Mergeall packages, as well as ziptools

IMPORTANT: It is strongly recommended that this script be run before propagating
content from Unix to platforms and drives which limit filename characters. This
includes transfers to Windows, some Androids' shared storage, and FAT32, exFAT, and
BDR drives. Else, nonportable filenames may fail and be skipped, both in Mergeall and
other tools.

Worse, some tools' automatic handling causes subtle problems: backslashes in Unix
filenames may generate unintended folders on Windows; and filenames automatically
mangled on Windows and drives to enable saves may both trigger file overwrites, and
fail to match their originals on the source in later syncs unless source files are
also mangled the same way.

To avoid all such issues, run this script to make filenames portable before transfers
to Windows and other limited contexts. This satisfies Mergeall's filename assumptions,
and sidesteps issues inherent in the auto-mangling of names performed by the embedded
ziptools when its -nomangle is not used.

USAGE:
    python3 fix-nonportable-filenames.py folderrootpath [listonlyany]

SUMMARY: This script replaces any of [\x00 / \ | < > ? * : "] with [_] in all file,
folder, and symlink names in a folder tree. Use it to report or fix nonportable
filenames for interoperability, before transferring content from Unix (e.g., macOS
and Linux) to platforms and filesystems with character constraints.
This includes transfers to Windows, but also some Androids' shared storage (e.g., at
/sdcard), as well as FAT32, exFAT, and BDR drives.

UPDATE: This script was originally coded to address a macOS auto-mangle issue (see
ahead), but it is broadly useful before transferring content from Unix to more
restrictive platforms or drives, and is recommended by both ziptools and Mergeall.
ziptools mangles have overwrite potential and are not applied in all use cases, and
Mergeall assumes that FROM and TO names have both been adjusted as needed to match;
running this script satisfies both apps' goals.

CAUTION: When run with one argument, this script may rename files in a folder tree
and has no automatic rollback of its changes. Read this docstring before use, and
always test with a second argument for list-only mode before renames. Because
cross-copy name collisions cannot be ruled out (see the next section), also run
Mergeall in its "-report" report-only mode initially when propagating changes to
contexts with filename restrictions.


USAGE DETAILS
-------------

This script replaces nonportable characters with a single '_' in all file, folder,
and symlink names in an entire folder tree. Run it with one or two command-line
arguments: pass the folder tree's pathname as the first argument; pass an optional
second argument (of any value) to list rename candidates but not rename them. This
script also verifies the run with a prompt to and input from standard streams.

All characters in the _nonportables_ string below are considered nonportable and are
replaced. On Unix, only NULL ('\x00') and '/' are invalid, but this varies per
filesystem and OS; _nonportables_ accommodates the filename rules of Windows and
various filesystems, including FAT32 and exFAT, and some Androids' shared storage.

This script also avoids generating duplicate filenames in the subject content copy
detected during its run.
It does so by appending duplicate ID numbers if needed to make changed names unique
within their folder. For example, 'a|b' and 'a:b' both map to 'a_b' which would cause
overwrites or failures unless made unique; this is resolved here by saving them as
'a_b' and 'a_b__2', respectively.

The ID number prevents overwrites in this script's run, but it's not impossible that
names changed here may be coincidentally the same as others in another content copy.
This is astronomically unlikely, but run tools in report-only mode to be sure before
syncing content between different copies. Also run this script's list-only mode first
to preview its intentions; some nonportable characters may be deliberate.


USAGE EXAMPLES
--------------

==List-only mode==

    /Code/mergeall$ python3 fix-nonportable-filenames.py ~/testfolder -
    FINDING the following without making changes:
    ['\x00', '/', '\\', '|', '<', '>', '?', '*', ':', '"']
    Continue (y or n)? y

    /Users/me/testfolder/Subfolder/file?name|here.txt =>
    /Users/me/testfolder/Subfolder/file_name_here.txt

    /Users/me/testfolder/Subfolder/duptest/a<b|c =>
    /Users/me/testfolder/Subfolder/duptest/a_b_c__2

    ****Duplicate to be resolved by filename

    Visited 12 files and 3 folders
    Total nonportable names found but unchanged: 2

==Replacements mode==

    ~/MY-STUFF/Websites$ python3 $C/mergeall/fix-nonportable-filenames.py UNION
    REPLACING the following with "_" in all names:
    ['\x00', '/', '\\', '|', '<', '>', '?', '*', ':', '"']
    Continue (y or n)? y

    UNION/android-tkinter/etc/query-[Tkinter-discuss] tkinter on android?.html =>
    UNION/android-tkinter/etc/query-[Tkinter-discuss] tkinter on android_.html

    UNION/site-mobile-screenshots/8-ios-4"-safari.PNG =>
    UNION/site-mobile-screenshots/8-ios-4_-safari.PNG

    ...etc...
    Visited 11477 files and 1275 folders
    Total nonportable names found and changed: 11


THE macOS MUNGE
---------------

This script's original motivation was a macOS issue: nonportable filename characters
are silently mapped to and from Unicode private codes on FAT32 and exFAT drives by
macOS, but won't match names on macOS if served from another platform, thereby
breaking back syncs.

Because this script is now used more broadly, this original rationale's description
has been trimmed here; see its online coverage at:

    learning-python.com/post-release-updates.html#nonportablefilenames

The original macOS munge coverage trimmed here is also available in Mergeall at:

    ./docetc/MoreDocs/fix-nonportable-filenames-orig.txt

Similar name mangling occurs when writing to BDR optical drives, and Linux raises
errors and refuses to copy nonportable filenames to Windows-filesystem drives (e.g.,
exFAT) in both file explorers and command lines. Run this script before content
copies to avoid all such issues.


OTHER MANGLERS
--------------

The related ziptools program (learning-python.com/ziptools.html) also auto mangles
names that fail on unzips, but only on Windows. Names listed for removal by Mergeall's
deltas.py are also auto mangled on Windows if they fail in unmangled form when
applied. Both use cases have rare data-loss (overwrite) risks that can be avoided by
running this script before transferring content to platforms with filename
constraints. See ziptools' related coverage here:

    learning-python.com/ziptools/ziptools/_README.html#nomangle

Mergeall itself does not mangle names of files copied from FROM to TO by syncs, only
names deleted from deltas.py __added__.txt lists. This policy avoids out-of-sync and
data-loss potentials. Instead, Mergeall assumes that both FROM and TO content names
have been mangled as needed, whether by this script, transfers to an external drive,
copies in file explorers, or unzipping with tools like ziptools.
See Mergeall's brief related coverage here:

    learning-python.com/mergeall-products/unzipped/UserGuide.html#filenameportability
=======================================================================================
"""

# CODE

import sys, os
help = 'Usage: python3 fix-nonportable-filenames.py folderrootpath [listonlyany]'

nonportables = ' \x00 / \\ | < > ? * : " '.replace(' ', '')     # tbd: % \' + [ ] (^=fat?)
replacements = {ord(c): ord('_') for c in nonportables}

# get args
try:
    root = sys.argv[1]
    assert os.path.isdir(root)
    listonly = len(sys.argv) > 2
except:
    print(help)
    sys.exit(1)

# verify run
display = [str(c) for c in nonportables]
if listonly:
    print('FINDING the following without making changes: %s' % display)
else:
    print('REPLACING the following with "_" in all names: %s' % display)
if input('Continue (y or n)? ').lower() not in ['y', 'yes']:
    print('Run aborted.')
    sys.exit(1)
print()

# walk folder tree
numrenames = numfiles = numdirs = 0
for (thisdir, subshere, fileshere) in os.walk(root):       # for all folders in tree
    numdirs  += len(subshere)                              # subs/fileshere include links
    numfiles += len(fileshere)

    for name in subshere + fileshere:                      # for all subfolders and files
        if any(c in name for c in nonportables):

            # replace illegals
            newname = name.translate(replacements)         # re.sub would work here too
            newpath = os.path.join(thisdir, newname)       # topdown: mods parents first
            oldpath = os.path.join(thisdir, name)

            # avoid duplicates: 'a|b' and 'a:b' both map to 'a_b'
            numdup = 1
            newbase, newext = os.path.splitext(newpath)
            trypath = newpath
            while os.path.lexists(trypath):                # lexists: count broken symlinks too
                numdup += 1
                trypath = newbase + '__' + str(numdup) + newext    # __dup# before .ext
            newpath = trypath

            print('', oldpath, '=>\n', newpath, end='\n\n')
            if numdup > 1:
                # this won't always print if listonly: no writes gen dups yet
                when = 'to be' if listonly else 'was'
                print('', '****Duplicate %s resolved by filename\n' % when)

            # rename file or dir
            numrenames += 1
            if not listonly:
                os.rename(oldpath, newpath)

            # tell the walker about the new name for the next step
            if name in subshere and not listonly:
                # okay to change in-place: "+" made a copy
                subshere.remove(name)
                # use the final basename: newpath may carry a __N duplicate suffix
                subshere.append(os.path.basename(newpath))

action = 'but unchanged' if listonly else 'and changed'
print('Visited %d files and %d folders' % (numfiles, numdirs))
print('Total nonportable names found %s: %d' % (action, numrenames))