mergeall — Revisions History

Last updated: October 16, 2022

Introduction

This document, f/k/a Readme.html and aimed primarily at developers, describes changes made in each released version of mergeall, and provides additional context along the way. This document also includes overviews from its original role as a first-level README; as these are now dated and mostly redundant with other resources, they have been moved to the end as optional reading. For up-to-date usage fundamentals, see instead the User Guide. For additional project and usage background details, see also the original (and also somewhat dated) Whitepaper.

Please note: Apart from its coverage of the latest releases, this document is no longer actively maintained; its style was largely frozen years ago, and much of its material is now project history only. All of the older screenshots and logs referenced in this document have also been removed to minimize mergeall package size, and their links were deleted during docs reorganization. See the newer screenshots collection for up-to-date GUI images, and please excuse the shortage of off-page context here.

Contents

  1. Version History
  2. (Defunct) Usage Summary
  3. (Defunct) System Summary
  4. (Defunct) See-Also Links

 

Version History

This section—the majority of this document—lists both changes and usage notes, grouped by release version. To find a version's specific changes in the source code, search for its release number in the source code files (e.g., search for "3.0" in the ".py" and ".pyw" Python source files to view version 3.0 code changes).

Contents here:

Version 3.3    Oct-2022   Unicode normalization for filename matching        Changes
Version 3.2    Oct-2021   Add deltas.py to save changes separately           Changes
Version 3.1    Dec-2017   Folder modtimes, Linux exe flushes, scripts "-u"   Changes
Version 3.0    Jun-2017   GUI redesign, Mac OS X port, cruft, app/exe, etc.  Changes
Version 2.4    Aug-31-16  Quiet logging mode, screenshot thumbs              Changes
Version 2.3    Apr-24-16  Patch add-file encoding, 2.X makedirs              Changes, Usage Notes
Version 2.2    Sep-25-15  Faster execution with os.scandir()                 Changes
Version 2.1    Mar-31-15  Automatic restores from backups                    Changes
Version 2.0    Mar-18-15  Backups for changes, smarter GUI, etc.             Changes, Usage Notes
Version 1.7.1  Oct-28-14  Error message fix, usage note update               Changes, Usage Notes
Version 1.7    Oct-10-14  Extend report, minor fixes/upgrades                Changes, Usage Notes
Version 1.6    Jul-27-14  Python 2.X Unicode fix, verify quit                Changes
Version 1.5    May-5-14   Linux port patch + notes                           Changes, Usage Notes
Version 1.4    Mar-27-14  GUI threading, multiple updates + notes            Changes, Usage Notes
Version 1.3    Feb-27-14  FAT 2-second modtime range fix                     Changes
Version 1.2    Feb-15-14  Launchers Unicode fix                              Changes
Version 1.1    Feb-10-14  Add GUI+Console launchers                          Changes
Version 1.0    Feb-1-14   Initial release



Version 3.3, Oct-2022

Summary: add Unicode normalization for filename matching, rebuild packages

Version 3.3 was rereleased on October 16, 2022 in all its download packages—source code, macOS app, Windows exe, and Linux executable—with improved path-normalization logic for Unicode variants. The new coding handles more path syntax, but is used only on platforms that do not auto-normalize paths (e.g., Windows, Linux, and Android app-private). For details, see function matchUnicodePathnames() in fixunicodedups.py, as well as the new path-normalization demo. This release picks up all prior 3.3 changes, including source-only mods of September.

Version 3.3 was rereleased on March 14, 2022, with minor changes to the GUI and summary report, as well as rebuilds of all app and executable packages to make them current with the 3.3 source-code package (all now include all 3.3 and 3.2 changes). Because the GUI is optional and no core Mergeall utility was changed, this repackaging is considered a 3.3 point release. Note that while all download packages now run the latest 3.3, the source-code is still recommended if others won't work in your use case.

Major 3.3 Changes

Version 3.3, published initially on December 28, 2021, adds just one main feature, which was nevertheless crucial enough to warrant a new release. Namely, 3.3 now performs Unicode normalization on filenames before comparing them, to avoid potential skew. This is a subtle issue that cropped up for Mergeall's developer just once in 8 years of use, but may be more common and perilous for content with many non-ASCII filenames maintained across multiple platforms.

In short, the Unicode standard allows the same text to be represented with different code-point strings in its decoded form. The string 'Liñux.png', for example, can have two equivalent but unequal values in memory: the two mean the same thing semantically, but will not match per Python's "==" or "in" tests or similar. This guarantees interoperability problems, and impacts text-processing code broadly. You can read the theory behind this here.
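As a minimal sketch of the issue, the following uses Python's standard unicodedata module to build both forms of the example string and compare them. The variable names are illustrative only, and Python 3.X is assumed.

```python
import unicodedata

# Two spellings of the same visible text (names here are illustrative only)
composed = 'Li\u00f1ux.png'       # one code point for n-with-tilde (NFC form)
decomposed = 'Lin\u0303ux.png'    # plain 'n' plus a combining tilde (NFD form)

# The raw strings differ in both length and content, so == fails
print(composed == decomposed)
print(len(composed), len(decomposed))

# Converting both to a common form (NFC here) makes them comparable
same = (unicodedata.normalize('NFC', composed) ==
        unicodedata.normalize('NFC', decomposed))
print(same)
```

Either NFC or NFD works as the common form, as long as both sides of a comparison use the same one.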

In Mergeall specifically, when such variant strings appear as filenames, they must be normalized (i.e., converted to a common form) for comparisons. Else, matches may be missed, resulting in content skew. While some such skew may be automatically repaired by syncs, missing a matching folder name can trigger pointless folder copies. In worse cases, this can yield duplicate and out-of-sync data, especially when applying delta sets. Mergeall 3.3 avoids all such perils, by normalizing filenames for comparison, using unnormalized names for file access, and doing so globally. Mergeall 3.3 also normalizes full paths saved in 3.2's deltas sets to match variants in destination trees, on platforms where this matters.
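The policy above (normalized names for comparison, unnormalized names for file access) might be sketched as follows. This is not the code in fixunicodedups.py: the function name match_names() and its return format are hypothetical, chosen only to show the idea of keying on normalized names while keeping each tree's original spelling.

```python
import unicodedata

def match_names(names_from, names_to, form='NFC'):
    """
    Hypothetical helper: pair up names that are equal after Unicode
    normalization, but return each side's original (unnormalized)
    spelling, since that is what file-access calls must use.
    """
    def norm(name):
        return unicodedata.normalize(form, name)

    # Map normalized key -> original name as it appears in the TO listing
    to_by_key = {norm(name): name for name in names_to}

    common, only_from = [], []
    for name in names_from:
        key = norm(name)
        if key in to_by_key:
            common.append((name, to_by_key[key]))   # (FROM name, TO name)
        else:
            only_from.append(name)
    return common, only_from
```

For example, a FROM listing holding the composed form of 'Liñux.png' and a TO listing holding its decomposed form would be reported as a common pair, rather than as a unique FROM item that triggers a pointless copy.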

For more details and examples, please see the docstring at the top of the new source file fixunicodedups.py. Most new code appears in that file, but numerous smaller changes were required elsewhere; as usual, search for "[3.3]" in the code for full fidelity. This change affects the behavior of the scripts mergeall.py, diffall.py, and deltas.py specifically, but some modules were also modified. In addition, the -quiet switch now suppresses normalization messages in all three scripts. For formal tests of this change, see also the 3.3 tests/demos folder.

Additional 3.3 Notes


Version 3.2, Oct-2021

Summary: add and support deltas.py for saving changes separately

Major 3.2 Changes

Version 3.2, released on October 28, 2021, adds a new major script, deltas.py, which implements an alternative run mode. This script detects changes in FROM as usual, but then saves them to a separate folder, instead of applying them to TO immediately. The saved-changes folder can serve multiple roles. For one, it can be archived as an incremental set after burning a full copy. For another, because it's formatted the same as Mergeall backups, it can be applied to TO later and on demand with the -restore mode of mergeall.py.

See the new script's top-of-file docstring for full details; browse its demo folder for examples; and explore the Android Deltas Sync use case here, here, and here. The last of these employs components of Mergeall as a small software stack. Supporting the new deltas script also required a handful of changes to Mergeall's base code; search the code for "[3.2]" to find related mods.

Additional 3.2 Notes

While the new deltas mode is this version's primary change, 3.2 also:

This release is limited and provisional: its changes are currently available only in Mergeall's source-code package. If and when they are propagated to all other packages (apps and executables), this will qualify as a final version 3.2. Please check back here for future developments (and see the 3.3 update).

Potential mods for a final 3.2 include a dark mode for its GUI. A final 3.2 may also integrate changes required to run Mergeall's GUI on Android (shipped here and described here), but these patches require using a specific commercial app which comes with both tkinter glitches and freemium advertising, and may be best managed out of band.


Version 3.1, Dec-2017

Summary: folder modtimes, Linux exe flushes, scripts "-u"

Changes

This minor-enhancements version was released in all packages—source, app, and executables—and is a recommended upgrade for all prior-version users. There were no changes to user configurations or the Mergeall GUI in this release, and version 3.0's screenshots still reflect the state of the system in 3.1. Version 3.1 includes the following enhancements (tagged with "[3.1]" in the code):

  1. Mergeall and cpall—all forms: propagate folder modtimes to copies

    Both the mergeall.py and cpall.py scripts now propagate source-folder modification times to new destination-folder copies on platforms that support this, just as they formerly did for simple files and symlinks. This change was implemented for all package formats (source, app, and executable) of these two programs, and is naturally inherited by Mergeall's launcher GUI. It has been seen to work well on Mac OS, Windows, and Linux, though Mac OS required a coding workaround to handle exFAT drives properly, and not all combinations of platforms, filesystems, and drivers may support folder modtime updates.

    Folder (a.k.a. directory) modtimes are less significant than others and were formerly ignored, because they are not used by mergeall's incremental-updates logic and do not influence its results, and may be changed whenever any contained item is changed. The latter of these factors can render folder modtimes nearly useless. On Mac OS, for example, a folder's modtime changes whenever Finder adds a hidden ".DS_Store" file to it; hence, simply viewing a folder is enough to lose its original modtime! Still, folder updates history may be useful enough in some contexts to preserve where possible, especially on systems that do not differentiate folders and files in listings.

    Implementation details: because mergeall uses cpall's copytree() to copy folders in all contexts, folder modtime propagation required just a post-processing step in copytree(), run at the end of each call to this function (including any recursive-level calls). This satisfies the requirement that modtimes be copied after a tree is fully processed, else new folder copy times may be updated automatically when their nested content is copied. This also avoids having to queue modtimes to be copied later, as done in the related ziptools program.
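    The post-processing idea can be sketched as below. This is an illustration only, not cpall's actual copytree(): the function name is hypothetical, and symlinks, error handling, and the exFAT workaround mentioned above are all omitted.

```python
import os
import shutil

def copytree_with_modtimes(src, dst):
    """
    Hypothetical sketch: copy a folder tree, then copy the source
    folder's times to the new folder as the last step of each call.
    """
    os.mkdir(dst)
    for name in os.listdir(src):
        src_path = os.path.join(src, name)
        dst_path = os.path.join(dst, name)
        if os.path.isdir(src_path):
            copytree_with_modtimes(src_path, dst_path)   # recursive level
        else:
            shutil.copy2(src_path, dst_path)             # content plus file times

    # Post-processing: run only after all nested items have been copied,
    # because adding items to dst resets its modtime automatically
    info = os.stat(src)
    os.utime(dst, (info.st_atime, info.st_mtime))
```

    Because the times are set at the end of each call, nested folders are finalized before their parents, and no queue of pending modtimes is needed.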

  2. Mergeall—Linux executable: flush output to avoid GUI pauses

    The mergeall.py script now forcibly flushes its output lines in the Linux executable, so they appear immediately in the GUI. This was formerly implemented for the Windows executable (along with Unicode translations not needed or used on Linux). It is not required and was not implemented for the Mac app or the source-code version on Linux and elsewhere, as Python's "-u" flag is forced by the GUI in both contexts. For source code, Python's "-u" flag or PYTHONUNBUFFERED setting can be used to disable buffering selectively.

    Implementation details: output flushing changes print() to a custom version, instead of always using print()'s flush=True (which is available only in Python 3.3+) or PyInstaller "spec" files (which complicate builds and offer less control). A stream-proxy class would work too, but the custom print() was already coded for Windows. The GUI also had already been setting PYTHONUNBUFFERED before the spawn, with no effect (which seems a PyInstaller issue). See subprocproxy.py in PyEdit for related context and notes.
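    A custom flushing print() along these lines might look like the following. This is a Python 3.X sketch with a hypothetical name, not the function actually used in mergeall.py.

```python
import builtins
import sys

def flushed_print(*args, **kwargs):
    """
    Hypothetical stand-in for print(): write the line as usual, then
    flush the target stream so the text is delivered immediately.
    """
    stream = kwargs.get('file')
    if stream is None:
        stream = sys.stdout
    builtins.print(*args, **kwargs)
    stream.flush()

# A script could then rebind the name in contexts that need it, e.g.:
#     print = flushed_print
```

    Rebinding print in one place leaves the rest of a script's code unchanged, which is presumably part of the appeal over per-call flush arguments.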

  3. Diffall and cpall—Mac apps: support unbuffered output with "-u"

    The diffall.py script grew a "-u" command-line argument to make its output unbuffered. This is useful for watching diffall's output with a Unix "tail" in the Mac app or Linux executable, where Python's own "-u" flag cannot be used, and its PYTHONUNBUFFERED environment equivalent may go unnoticed. It's irrelevant on Windows, and is not required when using source-code (use Python's "-u" instead). For symmetry, a "-u" switch was added to cpall.py for use when tailing its Mac app too. For more usage-level details, see this post.

    Implementation details: Unbuffered stdout may now be standard for frozen Mac apps per a recent py2app change, but the two Mergeall scripts here use a stream-proxy class for platform- and version-neutral control. See subprocproxy.py in PyEdit for related context and notes. Note that a "-u" flag was not added to the mergeall.py script, because its stdout is flushed forcibly in all contexts when spawned by the GUI launcher (mergeall.py's primary role), and can be flushed optionally using Python's own "-u" when the script is run as source code. This may be less than orthogonal, but no use case has arisen to justify a redesign.
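    A stream-proxy class of this general kind can be sketched as follows. The class and function names are hypothetical, and the real code in diffall.py and cpall.py may differ in detail.

```python
import sys

class UnbufferedStream:
    """
    Hypothetical proxy: forward writes to a real stream and flush
    after each one, so output is never held in a buffer.
    """
    def __init__(self, stream):
        self.stream = stream

    def write(self, text):
        result = self.stream.write(text)
        self.stream.flush()
        return result

    def __getattr__(self, name):
        # Delegate everything else (flush, fileno, encoding, ...)
        return getattr(self.stream, name)

def use_unbuffered_stdout(argv):
    """Install the proxy if a script-level "-u" switch is present."""
    if '-u' in argv:
        sys.stdout = UnbufferedStream(sys.stdout)
        return True
    return False
```

    A script would call use_unbuffered_stdout(sys.argv) once at startup; all later print() calls then pass through the proxy without further changes.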

  4. Mac app package—retain original resource-file modtimes

    The Mac app's build script was redesigned to copy all extra resource items manually, in order to preserve their original modification times. These times are especially crucial in Mergeall's test folder where modtimes influence test results, and py2app's copy policy for its automatic "--resources" option did not propagate modtimes correctly. Resource modtimes were already correct in the source-code package (built by ziptools), as well as for files in the Windows and Linux executable packages (built by PyInstaller, but using manual resource copies that were tweaked to use shutil's copy2() for top-level file times).

  5. Etcetera—utility Python 2.X support, terminology skew

    The fix-fat-dst-modtimes.py utility script works on Python 2.X again (it formerly used a keyword argument in os.utime() unsupported in 2.X). Even so, this script's status has declined over time, because most users are better off addressing DST rollovers by formatting external drives as exFAT. All other scripts were reverified to run under 2.X.

    There has also been some attempt to more consistently use "Mergeall" to denote the system at large, and "mergeall" to refer to just the mergeall.py script. Given the volume of documentation this project has spawned over its 4-year run, though, this convention might not be adopted universally for quite some time...


Version 3.0, Jun-2017

Summary:

Changes

Per the preceding summary list, this was a major release, initially started as a port to Mac OS X, and expanded with new features over many months of development. The following list describes 3.0's most prominent changes, but is not complete. For a more exhaustive look at this version's changes, search mergeall's source files for string "[3.0]".

  1. {all} Mac OS X port

    mergeall, its GUI and console launchers, and its accompanying programs including diffall and cpall, have all been ported to run on Mac OS X, in addition to their prior support of Windows and Linux.

    The underlying mergeall.py script worked on the Mac largely unchanged, as it was coded to be portable, and formerly ported to Unix-like Linux. However, it required changes to avoid using Python 3.5+'s os.scandir() on Macs only, and eventually replaced the os.scandir() variant altogether with a recoding that uses saved os.lstat() results. Before it was dropped, the os.scandir() call ran quicker than os.listdir() on Windows and Linux, but 3 times slower on Mac OS X as used by mergeall (see 2.2's notes).

    In addition, the Mac OS X port necessitated numerous changes to the GUI launcher:


    Beyond GUI and scandir() impacts, the Mac port motivated broader functional changes, most notably the new cruft-file skipping modes and removal scripts (see ahead) and symlinks support which grew to include both Unix and Windows (also ahead). On the upside, mergeall and its GUI are now fully usable on the Mac, and merges both run quickly and seem to sidestep some Unicode-filename issues described in this document that now appear confined to Windows.

    For more details on Mac OS port changes, search for "darwin"—the Mac's platform name in Python—in the system's source files.

  2. {launchers} GUI redesign: disable versus erase, better labels and text, etc.

    As most users are likely to launch mergeall with its GUI, some work was devoted to improving its ergonomics and utility. For example, the GUI launcher now disables and enables widgets as they fall in and out of relevance, instead of erasing and redrawing them. This was initially motivated by the Mac OS X port—where a redraw can trigger a visible flash—but proved subjectively less chaotic on other platforms too.

    The mergeall GUI was also polished in other ways, including bold section headers; more descriptive selection labels and dialogs text; new toggles to suppress comparison messages and logfile popups; and less-dense dialog text layout that yields better readability on Mac and Linux. See the screenshots page and folder for examples of the GUI and its dialogs in action.

  3. {launchers} Suppressible comparison-phase messages

    Also motivated by the Mac OS X port, but useful elsewhere: the GUI grew a new toggle to suppress per-folder comparison-phase messages in the GUI only. These messages serve as status indicators if enabled and still appear in the saved mergeall log file even if suppressed. However, suppressing them in the GUI avoids some clutter, and, more critically, avoids delays for results if the GUI scrolls messages more slowly than the underlying mergeall process generates them. When not suppressed, the GUI's scrolling may continue to run after the mergeall process has already finished, artificially inflating the merge's apparent runtime.

    Although text scrolling may add a trivial handful of seconds on Windows and Linux, it adds an especially long delay on Mac OS X. On Macs, the currently recommended install's Tk 8.5 Text widget scrolls text messages some 30 times slower than mergeall prints them. In one test, mergeall may finish in 2 seconds on a Mac, but the GUI's scrolling of its output can run for one minute before the final results are displayed. Because of this, the new suppression toggle is enabled by default on Macs, but disabled on Windows and Linux where the GUI largely keeps up with mergeall.

    This is user-switchable because other platforms may benefit from disabling messages on slower machines, and the Mac speed issue may be addressed in future Tks (if it's not already fixed in Tk 8.6—to be tested). The new toggle is also dynamic: it can be enabled and disabled any time during a mergeall run to turn comparison messages on and off.

  4. {launchers, mergeall} New configurations: text area, editor popups, cruft, and more

    The GUI now supports a much wider variety of user-configurable options, defined and customizable in the top-level mergeall_configs.py. Among these, users can now tailor the colors, font, and initial sizes of the scrollable message text area; can specify a default for the log-file saves folder that overrides the per-platform Desktop path; and can enable or disable the automatic text editor popup after mergeall runs for viewing a saved logfile (this later became an initial value for the popup's toggle added to the GUI). Cruft filename patterns are also defined in this file to support user customization, though they are mostly of interest to advanced users (see ahead). The maximum-backups-retentions setting is still present as before.

  5. {launchers} Linux app icon

    mergeall's GUI launcher now sets its windows' app-bar icons on Linux platforms to a custom image. Windows sets window icons as before, but Mac OS X does not currently set custom icons as these seem outside the scope of source-code based programs on that platform (update: the Mac app distribution added later fully supports all Mac icon contexts, and seems required for icons on this platform).

  6. {all} Skipping cruft files: handling platform-specific metadata

    Given mergeall's new portability to Windows, Linux, and Mac OS X, support has been added for explicit handling of platform-specific metadata files (a.k.a. "cruft"). This is especially important on Mac OS X, which adds numerous hidden files to content that have no purpose outside a Mac and may be undesirable in cross-platform archive copies. To this end, mergeall 3.0 provides two new tools:
    For more on the new cruft-skipping tools, see the new User Guide's coverage as well as its cross-platform pointers; the cruft filename patterns and examples in mergeall_configs.py; and the background notes in nuke_cruft_files.py. Note to Mac users: mergeall itself copies just data forks (normal file content), not resource forks, and does not merge resource forks back to data forks if they are present; see dot_clean to address the latter, and the User Guide for more background.

  7. {mergeall} Support for Windows and Unix symlinks

    New in this release, mergeall supports propagating symbolic links on both Windows and Unix (Mac OS X and Linux), subject to platform and portability constraints enumerated in the User Guide. When present, symlinks are always copied, not followed, to avoid duplicating data. For a tool that also supports link following, see the ziptools system.

  8. {mergeall} Support long pathnames on Windows

    The new module fixlongpaths.py provides tools that support very-long pathnames on Windows. It does so by mutating too-long pathnames to use a "\\?\" prefix ("\\?\UNC\" for network paths), which automatically enables extended-path Windows API tools (these tools are no-ops on Unix). mergeall, diffall, and cpall all use these tools for every pathname passed to system calls, as well as those passed to recursive tree walkers. The net effect lifts the normal 260-character pathname limit to 32k characters on this platform.

    Long pathnames typically crop up in saved webpage folders; they formerly generated error messages and failed to update in mergeall, but can now be processed normally. See the new module for more details, and search for FWP (uppercase) in mergeall's source files for the new module's clients; ziptools uses these tools as well.
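    The prefixing technique can be sketched as below. This is not the code in fixlongpaths.py: the function name, the force argument, and the length threshold are assumptions made for illustration, and the input is assumed to be an absolute Windows path already.

```python
import ntpath

MAX_PATH = 260   # classic Windows limit, used here as an illustrative threshold

def to_extended_path(path, force=False):
    """
    Hypothetical helper: add the Windows extended-length prefix to an
    absolute path that is too long (or always, if force is true).
    """
    if path.startswith('\\\\?\\'):
        return path                          # already prefixed
    if not force and len(path) < MAX_PATH:
        return path                          # short enough: leave as is
    path = ntpath.normpath(path)             # prefixed paths need backslashes
    if path.startswith('\\\\'):
        return '\\\\?\\UNC\\' + path[2:]     # network path: \\server\share\...
    return '\\\\?\\' + path                  # drive path: C:\...
```

    The ntpath module is used rather than os.path so that the string logic behaves the same on any platform; a real tool would also make relative paths absolute first, and would apply the change only when running on Windows.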

  9. {mergeall, diffall} Code and algorithm optimizations

    Some work was done in this release to optimize the code in the mergeall and diffall programs. Specifically, repeated scans of listing results were eliminated, and os.path.join() calls were replaced with possibly simpler direct os.sep concatenations (the former change also improved diffall reports, by reporting missed files before subdirectories).

    In the end, most optimization attempts were fruitless, as the time spent in either system calls or file I/O far overshadowed the speed of mergeall programs' code. One exception: on Windows, the time required to compare two very large archive copies fell from 19 to 14 seconds on Pythons 3.4 and older (which use os.listdir()). However, there was no impact on mergeall's 7.2-second runtime on Pythons 3.5+ (which use an os.scandir() variant that fully accounts for its faster speed), or on diffall (which spends nearly all of its time reading files byte-for-byte).

    Also in this category: the comparison phase in mergeall was recoded to use saved os.lstat() results, which made it as fast as its former os.scandir() variant on Windows; the os.scandir() branch was subsequently dropped. For more details, see comparetrees() in mergeall.py, and the main docstring in diffall.py. For timing results, see this folder.
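    The saved-results idea can be illustrated with the following sketch, which makes one os.lstat() call per name and reuses the result for every later test. It is not the code in comparetrees(); the function name and return format are hypothetical.

```python
import os
import stat

def classify_names(dirpath):
    """
    Hypothetical sketch: call os.lstat() once per name, and keep the
    result for both type tests and later modtime/size comparisons.
    """
    files, dirs, links = {}, {}, {}
    for name in os.listdir(dirpath):
        info = os.lstat(dirpath + os.sep + name)   # direct os.sep concatenation
        if stat.S_ISLNK(info.st_mode):
            links[name] = info                     # link itself, not its target
        elif stat.S_ISDIR(info.st_mode):
            dirs[name] = info
        elif stat.S_ISREG(info.st_mode):
            files[name] = info
    return files, dirs, links
```

    With results saved this way, a comparison can read st_mtime and st_size from the stored objects instead of issuing separate os.path.isfile(), os.path.getmtime(), and os.path.getsize() calls for each name.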

  10. {launchers} Sanitize non-BMP Unicode characters in scrolled mergeall text

    Tk 8.6 and earlier, used by the tkinter Python module underlying mergeall's GUI, cannot display Unicode characters whose codepoints fall outside the BMP (UCS2) range of U+0000..U+FFFF. This includes newer "emoji" characters; when such non-BMP characters are used in filenames, they formerly killed the GUI with an uncaught exception when the GUI attempted to insert them in the scrolled text area.

    To work around this, mergeall now replaces all non-BMP characters in displayed text with the standard Unicode replacement character, U+FFFD, which Tk displays as a highlighted question mark diamond. This workaround was coded to assume that Tk 8.7—to be supported in a future but unknown Python release—will lift the BMP restriction, per a developer forums post. For details, see fixTkBMP() in the GUI launcher.
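    A replacement of this sort might be coded as follows. This is a sketch under the assumption of Python 3.3 or later (where strings hold full code points), and is not necessarily how fixTkBMP() is written; the names used are hypothetical.

```python
import re

# Any code point above the BMP's U+FFFF upper bound
NON_BMP = re.compile('[\U00010000-\U0010FFFF]')

def replace_non_bmp(text):
    """Swap each non-BMP character for the Unicode replacement character."""
    return NON_BMP.sub('\uFFFD', text)
```

    BMP text, including accented characters, passes through unchanged; only characters such as emoji are replaced before the text reaches the Tk widget.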

  11. {cpall, mergeall} Ignore spurious Mac exceptions from shutil.copystat()

    The cpall.copyfile() function used by mergeall now suppresses and ignores EINVAL (a.k.a. error number 22, "Invalid argument") if it is raised by Python's shutil.copystat(). On Mac OS X, shutil.copystat() can fail this way due to an error raised by Mac libraries when trying to copy extended attributes with chflags() from a file on a Mac filesystem drive (e.g., HFS+) to a file on a non-Mac filesystem drive (e.g., FAT32 or exFAT).

    This error occurs after all content and times have been copied, so it's safe to ignore in this context. It also occurs at the shell on "cp -p", so it's likely a Mac issue. This cropped up in mergeall for all files saved with Mac's TextEdit, which adds an extended attribute for Unicode encoding type, but can also occur in other contexts such as files marked as quarantined. For more details, see the main docstring in cpall.py, and the shell session log mac-chflags-error22.txt.
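    The suppression can be sketched as below. The function name is hypothetical, and cpall.copyfile() may structure this differently; the point is only that EINVAL is ignored while every other error still propagates.

```python
import errno
import shutil

def copystat_ignoring_einval(src, dst):
    """
    Hypothetical sketch: copy file metadata, ignoring only the
    EINVAL (error 22) failure described above.
    """
    try:
        shutil.copystat(src, dst)
    except OSError as why:
        if why.errno != errno.EINVAL:
            raise                    # anything else is a real error
```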

  12. {docs, packaging} New user guide, new folder structure

    A completely new user guide was developed: UserGuide.html, shipped in the package's top-level folder. This new user guide is designed to be more user-focused, and provides a less technically heavy overview of the system and its GUI. It largely subsumes the former documentation, which was more implementation- and project-focused, and arguably less approachable for end users. Nevertheless, the original documents are still shipped in folder docs/MoreDocs for now:
    In addition, the original top-level launcher-config folder was demoted to a docetc subfolder due to its declining relevance; the somewhat dated Lessons-Learned.html was kept for its implementation notes; and a new Tools folder ships with line-end conversion and color-chooser utility scripts.

  13. {screenshots} New screenshots and examples (older items dropped)

    New screenshots were taken for this release on all three of its supported platforms, and new example session logs were compiled, including logs from all three platforms formatted as HTML for readability. In light of the new screenshots and logs, to reduce the size of the program's distribution package all prior screenshots were dropped from the package, and their links in docs were scrubbed.

  14. {packaging} New "frozen" distributions: Mac app, Windows and Linux executables

    In addition to its original source-code distribution, mergeall is now available in Mac app, Windows executable, and Linux executable forms. The new forms run on just one platform, but do not require a Python install. For more details on these new packages, see the README file, and the mergeall downloads page.

  15. {assorted} And so on

    Version 3.0 incorporates additional enhancements, including:


    And more; again, search for "[3.0]" in the system's source-code files for all changes.


Version 2.4, Aug-31-16

Summary: quiet log messages mode, screenshot thumbnail pages

Changes

This is a minor enhancement release, adding two user-visible functionality upgrades. For all code changes applied in this release, search for "[2.4]" in its recently modified source files.

  1. {mergeall, launchers} Add quiet log messages mode

    Both the main script and the GUI and console launchers now allow users to suppress per-file backup messages in the generated output. These are informational and may be of interest to new users, but are arguably superfluous once the system's operation is clear, because files being replaced or removed are already displayed. In large merges, the extra lines decrease report readability. To support the new quiet mode:


    When quiet mode is selected—by command-line, GUI, or console—the system still generates one message indicating that backups mode is enabled and giving the backups folder path, but it does not print a backups message for every file replaced or removed. Users may still inspect the backups folder to see results.

  2. {screenshots} Add thumbnails pages for screenshots folders

    To make the screenshots collection easier to browse, thumbnail image index pages were added to the screenshots root folder, as well as its subfolders. See the new root index page, and click on its subfolder links. The subfolders display their own thumbs pages automatically on a server; click their "index.html" files manually if viewing offline in a file explorer. These pages are courtesy of the Python-coded thumbspage program.


Version 2.3, Apr-24-16

Summary: patches for __added__.txt Unicode encoding and Python 2.X os.makedirs() calls; filename dashes usage note

Changes

This is a minor patch release, to address two issues of minimal impact. No screenshots were retaken for this release, and documentation changes pertain to this release's changes only. For all code changes applied in this release, search for "[2.3]" in source file backup.py.

  1. {mergeall} Use explicit UTF8 (by default) for __added__.txt encoding

    In both Python 3.X and 2.X, use an explicit UTF8 Unicode encoding, instead of the platform default encoding, for writing and reading the __added__.txt files created in backups mode for use in 2.1 emergency restores. These files reside in per-run __bkp__ subfolders, and are used for backing out prior archive additions. The new preset UTF8 encoding should suffice for most use cases, but can be changed in code if required; see backup.py's ADDENC setting.

    This is a minor change unlikely to impact most users (if any at all), as both unencodable filenames and emergency restores are very rare. Without it, a new file whose name could not be encoded per the local Unicode default would be added to the TO archive normally, but also generate an error message in the mergeall log, and not be removed from the archive automatically by a future emergency restore.

    This change is also expected to be largely backward-compatible: because ASCII is a subset of UTF8, this should not have any major impact for most users' __added__.txt files written before this change was applied.
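    Explicit-encoding file access of this kind can be sketched as follows. Only the ADDENC name and the __added__.txt filename come from the text above; the function names and the one-path-per-line file format are assumptions made for illustration.

```python
import io
import os

ADDENC = 'utf8'   # named after backup.py's setting; the value is per the text above

def write_added(folder, paths, filename='__added__.txt'):
    """Hypothetical writer: one path per line (format assumed here)."""
    # io.open() accepts an encoding argument in both Python 2.X and 3.X
    with io.open(os.path.join(folder, filename), 'w', encoding=ADDENC) as f:
        for path in paths:
            f.write(path + u'\n')

def read_added(folder, filename='__added__.txt'):
    """Hypothetical reader: decode with the same explicit encoding."""
    with io.open(os.path.join(folder, filename), 'r', encoding=ADDENC) as f:
        return [line.rstrip(u'\n') for line in f]
```

    Using the same explicit encoding on both sides means a file written on one platform can be read on another, regardless of either platform's default.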

  2. {mergeall} Use code portable to Python 2.X for os.makedirs() calls

    Python's os.makedirs(), used in backup-mode runs, supports an exist_ok switch in 3.X only that suppresses an exception if the path already exists. To support backup-mode use on 2.X, specialize all makedirs calls on 2.X to emulate the 3.X exist_ok behavior without passing the 3.X-only argument. This patch applies to 2.X users only, but is crucial for such users. Without it, nearly all backup-mode mergeall runs will fail on 2.X with exceptions.

    Note that use on Python 2.X is now generally discouraged, as 3.X has better support for Unicode; 3.5+ allows for much faster execution since mergeall version 2.2; and mergeall's development "staff" has limited resources for 2.X testing. As a random compatibility example, filenames with odd characters may still be skipped by mergeall in 2.X only, because that Python's os module fails to classify them as either file or directory on Windows (unlike 3.X).

    In retrospect, supporting both Python lines in a system-level tool like mergeall has proven to be a substantial effort, and probably prohibitive in this project's context. Library differences can impact code more than language differences, and are often more complex to accommodate. While mergeall largely works the same on 2.X, and 2.X usage is not deprecated, please run mergeall on 3.X if at all possible.
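    One portable way to emulate the 3.X behavior is sketched below. The function name is hypothetical, and this is not necessarily the code used in backup.py.

```python
import errno
import os

def makedirs_exist_ok(path):
    """
    Hypothetical 2.X/3.X-neutral stand-in for
    os.makedirs(path, exist_ok=True).
    """
    try:
        os.makedirs(path)
    except OSError as why:
        # Ignore "already exists" only when the path really is a folder
        if why.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

    The os.path.isdir() test matters: if the path exists but is a simple file, the exception is still raised, as it would be with exist_ok in 3.X.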

Usage Notes

  1. More Windows FAT32 filename character mangling: emdash versus ASCII dash

    This note describes a very rare mergeall usage issue, not a mergeall bug or change. An erroneous translation of dashes in filenames was recently observed on a FAT32 device, which seems related to the accent-morphing issue described earlier (ahead) for mergeall versions 1.7.1 and 1.7. To date this has been seen only on one USB flashdrive and Windows 7, but potentially applies to any FAT32 drive.

    Specifically, the content-based diffall script reported a spurious file difference not noted by the timestamp-based mergeall. This happened on a FAT32 device containing two files of differing content, whose names differed only in one character position which was an ASCII dash ("-") in one and a Unicode emdash ("—") in the other. For example, with paths and some output omitted for space:
    c:\test> dir /B "d:\xxxxxx*"
    xxxxxx - xxxxxx.htm
    xxxxxx — xxxxxx.htm
    
    c:\test> dir "d:\xxxxxx*"
    04/03/2016  09:46 AM            50,444 xxxxxx - xxxxxx.htm
    04/15/2016  11:30 AM            50,573 xxxxxx — xxxxxx.htm
    
    When both such files are present on a FAT32 drive, the Windows operating system may return the wrong file's content for a given filename, because it internally maps the emdash to an ASCII dash. This in turn causes diffall to register a false file difference.

    Because this occurs in the filesystem level of the operating system, it may not be addressable in Python code—filename dashes passed correctly by a Python script are mishandled after they are received by an open() call. In fact, this issue extends beyond Python: the two files in question also incorrectly report a difference in a Windows/DOS "fc" command line despite having identical content.

    For instance, in the following command-line session, the same issue crops up when comparing same-named files on an SSD (NTFS filesystem) and USB flashdrive (FAT32 filesystem) having names with an embedded emdash. Curiously, comparisons fail only after similarly named files with an ASCII-dash have been accessed once; prior to that, the emdash files compare the same correctly, suggesting that caching may be a factor:
    # After either a fresh insert or removal+reinsert of a FAT32 USB flashdrive on d:
    
    c:\test> fc "c:\xxxxxx — xxxxxx.htm" "d:\xxxxxx — xxxxxx.htm"
    Comparing files C:\xxxxxx — xxxxxx.htm and D:\XXXXXX — XXXXXX.HTM
    FC: no differences encountered
    
    c:\test> fc "c:\xxxxxx - xxxxxx.htm" "d:\xxxxxx - xxxxxx.htm"
    Comparing files C:\xxxxxx - xxxxxx.htm and D:\xxxxxx - xxxxxx.HTM
    FC: no differences encountered
    
    c:\test> fc "c:\xxxxxx — xxxxxx.htm" "d:\xxxxxx — xxxxxx.htm"
    Comparing files C:\xxxxxx — xxxxxx.htm and D:\XXXXXX — XXXXXX.HTM
    ***** C:\xxxxxx — xxxxxx.htm
                    <meta name="bitly-verification" content="3xx1017cyy1d"/>
                    <title>xxxxxx ΓÇö xxxxxx
    
    ***** D:\XXXXXX — XXXXXX.HTM
                    <meta name="bitly-verification" content="3xx1017cyy1d"/>
                    <title>xxxxxx - xxxxxx                                       # <= ASCII-dash content
    *****
    # ....Plus many more diffs....
    
    This issue wasn't addressed in mergeall, because it may be impossible to fix at the Python level, and seems rare in the extreme—it has been witnessed only once in two years of frequent mergeall usage; may be limited to a subset of devices used on Windows; and can occur only for folders containing files with names identical apart from alternative dash characters in the same positions.

    Should this recur anyhow, the suggested workaround is to either ignore the diffall differences, or simply adjust your filenames. Formatting USB drives with NTFS may help, but this may also impact drive performance, and is to be determined.

    For more hints on the convoluted—and even tortuous—underlying operating-system issue, see this forum thread, or this Microsoft page. I'd report this as a bug to Microsoft, but a Windows fix for this seems as likely as ski-lift tickets in Hades (no, really).


Version 2.2, Sep-25-15

Summary: faster execution with os.scandir() using Python 3.5+ or PyPI package install

This version was repackaged three times after its initial release:

On Jan-27-16 with minor code and doc changes:
Correct the script name in diffall.py's usage message; add total runtime in diffall.py's report; and add documentation notes about common role, cross-platform restores, and diffall purpose.

On Nov-10-15 with doc changes only:
New font, header, and toolbar styling; minor content tweaks; and updated URLs for book site relocation.

On Oct-2-15 with minor doc changes only:
Document new folder dialog on Windows.

Changes

Update for version 3.0: The scandir() optimization described below ran comparisons 5X-10X faster on Windows and 2X faster on Linux, but proved to run 3X slower on Mac OS X, as used by mergeall. Consequently, mergeall 3.0 used this call on Windows and Linux, but not on Mac OS X. A later recoding to use saved os.lstat() results eventually made the non-scandir() variant as fast on Windows and Linux, and made the scandir() optimization obsolete. For more details, see comments in the comparison-phase code of mergeall.py.

Version 2.2 speeds up tree comparisons radically by using the new os.scandir() call, which is standard in Python 3.5 and later, and available separately as a PyPI package for other Pythons, including 2.7. In tests on Windows, the mergeall tree comparison phase runs 5 to 10 times quicker when the 2.2 optimization is used, depending on devices and trees. For larger trees, this can shave dozens of seconds off total runtime, and more on slower machines. If the scandir() call is not present in the os module or a separate install, mergeall falls back on the original os.listdir() scheme to support older Pythons (though a scandir() is now recommended for performance).
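The fallback described above can be sketched as follows. This is an illustration under assumptions, not the code in mergeall.py: the function name list_entries is hypothetical, and the real comparison phase collects more than names and directory flags.

```python
import os

try:
    scandir = os.scandir                 # standard in Python 3.5+
except AttributeError:
    try:
        from scandir import scandir      # separately installed PyPI package
    except ImportError:
        scandir = None                   # fall back on os.listdir() below

def list_entries(dirpath):
    """Return sorted (name, is_directory) pairs for the items in dirpath."""
    if scandir is not None:
        # Entry objects can often answer is_dir() without an extra stat call
        return sorted((entry.name, entry.is_dir()) for entry in scandir(dirpath))
    names = os.listdir(dirpath)
    return sorted((name, os.path.isdir(os.path.join(dirpath, name)))
                  for name in names)
```

Both branches return the same result; only the number of system calls made per directory entry differs.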

mergeall's resolution phase was not optimized, because it is bound by file write times, and visits only differences. Because the optimized tree comparison phase always scans two trees exhaustively, however, it can dominate mergeall runtimes, especially when there are relatively few changes in large trees. This change impacts only the mergeall.py script, whose output was augmented with an initial line indicating use of the new optimization, plus lines giving runtime for each of its phases.

Other: As no changes were made to the GUI apart from a new version number, most prior screenshots were not retaken for this release. One new screenshot was taken on Windows 10 as described in the list below, and a new folder-browse dialog screenshot was taken for its new and improved native format on Windows as of Python 3.5. The new folder dialog reflects a change in Python 3.5 (really, in the latest version of the Tk 8.6 library it includes), not in mergeall code; see this overview and the Tk changes note for more details. Documentation was also revamped for this release as usual (and restyled for the Nov-10 repackaging).

For more details, see:


Version 2.1, Mar-31-15

Summary: automatic restores (a.k.a. rollbacks) from automatic backups

April Updates

After its March release, this version was repackaged—most recently on Apr-29-15—with only very minor changes to its documentation files and retaken screenshots for its Ultrabook, Windows tablet, and Linux use case. As these changes did not impact any functionality, a new version number was not warranted.

March Release

Version 2.1 was an afterthought to 2.0. By using and extending 2.0's automatic change backups, 2.1 supports complete and automatic rollback of an immediately preceding run's changes, including additions, as a failsafe for catastrophic or emergency scenarios.

Changes

  1. {mergeall + docs} Automatic restores from automatic backups

    Added support for complete rollback of a prior run's changes, by extending the 2.0 "-backup" option and adding a new "-restore" option in mergeall.py to allow changes to be undone by merging from a __bkp__ folder's date/time subfolder to its archive's root. These changes are invoked in consecutive mergeall runs:

    1. Synchronize run: The existing "-backup" option saves replaced and removed items in the TO folder's __bkp__ as before, but was extended to also list items added to the TO tree in a new __added__.txt file at the top of a __bkp__ date/time subfolder.

    2. Restore run: The new "-restore" option runs a normal merge from backup to root (in automatic or selective updates mode), but:

      • Does not delete unique items in the TO tree. In restores, the TO tree is the archive root and FROM is the backup; items present in the archive but not the backup were unchanged in the prior synchronization run.

      • As a pre-merge step, removes items from the TO tree that are listed in a __bkp__ subfolder's __added__.txt (if this file is present). This is pre-merge because order matters for renames on Windows. The __added__.txt file itself is copied to TO by the merge as well, but manually removed.

    Hence, when mergeall is run from a command line with "-restore" to merge from a prior run's backup subfolder to its archive root, the net effect is a complete rollback of all changes made in a prior run: replacements and removals are restored, and additions are removed.
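    The pre-merge removal step can be sketched roughly as follows, for Python 3.X. This is not the code in backup.py: the function name remove_listed_additions is hypothetical, and the sketch assumes that __added__.txt holds one UTF8-encoded path per line, relative to the archive root.

```python
import os
import shutil

def remove_listed_additions(backup_dir, archive_root):
    """Delete items named in backup_dir's __added__.txt from archive_root."""
    listing = os.path.join(backup_dir, '__added__.txt')
    if not os.path.exists(listing):
        return []                      # nothing was added in the prior run
    removed = []
    with open(listing, encoding='utf8') as listfile:
        for line in listfile:
            relpath = line.rstrip('\n')
            if not relpath:
                continue
            target = os.path.join(archive_root, relpath)
            if os.path.isdir(target) and not os.path.islink(target):
                shutil.rmtree(target)  # an added folder: remove its whole tree
            elif os.path.lexists(target):
                os.remove(target)      # an added file (or link)
            else:
                continue               # already gone: skip silently
            removed.append(relpath)
    return removed
```

    Running this before the merge proper leaves only replacements and removals for the merge itself to restore.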

    Restores require "-backup" to be used in the prior run, and are primarily intended to be used to restore all of an immediately preceding run's changes in catastrophic scenarios (e.g., transposing FROM and TO folders). They will not fully reset the TO tree if any changes were made to it since the backup was created (and in this event may erase more recent changes), and older backups will be out of synch with the current tree unless applied serially.

    For general restore operation, see this backups folder. For implementation details, see mergeall.py's changes marked with "[2.1]" and backup.py. For complete usage details, see Whitepaper.html. Automatic restore is available in command-line mode only; because no changes were made to the GUI, no GUI screenshots were retaken for this release. Logfile content is also unchanged in this release apart from a minor section reordering (per item #3 ahead).

    Usage update (defunct): because added items are recorded using the path syntax of the platform on which the prior mergeall ran, restores with additions should generally be run on the same platform as the prior merge. On platforms with incompatible path syntax, additions won't terminate a restore operation, but they will trigger error messages and won't be backed out.

    Usage update update: as of mergeall 3.0, the prior note's constraint has been lifted, by converting __added__.txt path separators from '/' to '\' on Windows, and from '\' to '/' on Unix. This makes these paths portable, such that backups saved on Windows can now be rolled back on Unix, and vice versa. For details, see the "CAVEAT" and "UPDATE" in function removeprioradds() of source file backup.py.
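    The separator conversion amounts to something like the following sketch. The function name localize_separators is hypothetical, and the actual logic in removeprioradds() may differ in its details:

```python
import os

def localize_separators(path, sep=os.sep):
    """Rewrite both '/' and '\\' in path as sep (this platform's by default)."""
    return path.replace('\\', sep).replace('/', sep)
```

    For example, localize_separators('docs\\notes.txt', '/') yields 'docs/notes.txt'. Note that a blanket replace like this would also alter any backslash that is a genuine part of a Unix filename.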

  2. {utilities} New rollback.py convenience script for restores

    As part of the restore enhancement, also added a convenience script, rollback.py. Given just an archive's root path (on the command line or interactively), this script automatically builds and runs an automatic-updates restore-mode mergeall command line, by globbing and sorting to find the archive's latest backups folder. This script also routes prints and prompts to stderr, so that mergeall stdout output (only) can be captured to a file via a ">" shell redirect, and can be run by command line or filename/icon clicks. See its example session.
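    The glob-and-sort lookup and the stderr routing might look like the sketch below. It is not rollback.py's code: the names latest_backup and note are hypothetical, and it assumes that the date/time-stamped subfolder names sort in chronological order.

```python
import glob
import os
import sys

def latest_backup(archive_root):
    """Return the newest run subfolder in archive_root's __bkp__, or None."""
    pattern = os.path.join(archive_root, '__bkp__', '*')
    subdirs = sorted(p for p in glob.glob(pattern) if os.path.isdir(p))
    return subdirs[-1] if subdirs else None

def note(message):
    """Print to stderr, so that stdout carries only the merge's own output."""
    sys.stderr.write(message + '\n')
```

    With prompts and notes on stderr, a shell redirect of stdout to a file captures just the mergeall report.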

  3. {mergeall} Reorder categories in differences report for consistency

    Minor and cosmetic, but in mergeall's differences report, order the categories to match the order in which their updates are applied (and later reported), as well as the order of totals printed in the summary report. This makes the report more consistent, but also reflects the fact that update order can matter on some platforms (on case-insensitive Windows, deletes must always precede adds for mixed-case renames; see mergeall.py's mergetrees() docstring for details). This complicates logfile comparisons to prior versions, but is a user-visible item.

  4. {mergeall} Import maximum-number-backups setting from a user-configurations module

    Fetch the limit on number of backup folders per archive copy from the new mergeall_configs.py module, which can be more easily changed by users than a hard-coded literal in the program's code. After this limit is reached, backups are pruned by age. Frequent mergers may want a higher number than the default (10), and users with typically large backup folders may want a lower setting. Errors in this module simply make mergeall fall back on the default (it has just one setting today).
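    A fallback of this sort can be sketched as below. The module name comes from this note, but the setting name MAXBACKUPS and the function load_max_backups are assumptions made for illustration only:

```python
import importlib

DEFAULT_MAX_BACKUPS = 10   # the default cited in this note

def load_max_backups(module_name='mergeall_configs', setting='MAXBACKUPS'):
    """Fetch the backups limit from a user module, else use the default."""
    try:
        module = importlib.import_module(module_name)
        return int(getattr(module, setting))
    except Exception:
        # Missing module, syntax error, or missing/bad setting: use default
        return DEFAULT_MAX_BACKUPS
```

    Catching all exceptions here is deliberate: a typo in a user-edited file should not stop a merge.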

  5. {docs} Rewrote Whitepaper material to clarify intended usage

    In the main usage overview doc (now Whitepaper), updated the usage modes section substantially to better describe ways to use the system; some of this was formerly tentative by design, but practice has solidified its concepts. Also added a new comparison to Windows explorer folder merges (which really just combine, not synchronize).


Version 2.0, Mar-18-15

Summary: automatic backup of changed items, more intelligent GUI, help, counts, DST, etc.

This version's development spanned two and a half weeks. It was initially focused on the new auto-backup for changes option, but spawned additional enhancements, and warranted a new major version number.

Changes

  1. {mergeall + launchers} Automatic backup of changed items

    When enabled in the launchers or mergeall command lines, this option makes backup copies of all files and directories in the TO directory that will be destructively replaced or deleted in-place during a mergeall run. These items' prior versions in the TO tree are saved in the automatically created __bkp__ folder at the top of the TO archive, with their full directory paths, and segregated by run in a date/time-stamped subfolder. Backup folders are not synchronized across trees, but are automatically pruned by age when their number exceeds a limit.
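    In outline, the save-then-prune scheme works like the following sketch. It is not backup.py's implementation: the function names are hypothetical, the stamp is passed in rather than generated, and pruning assumes that stamp names sort oldest-first.

```python
import os
import shutil

def backup_item(to_root, relpath, stamp):
    """Copy to_root/relpath into to_root/__bkp__/stamp, keeping its path."""
    source = os.path.join(to_root, relpath)
    saved = os.path.join(to_root, '__bkp__', stamp, relpath)
    parent = os.path.dirname(saved)
    if not os.path.isdir(parent):
        os.makedirs(parent)
    if os.path.isdir(source):
        shutil.copytree(source, saved)
    else:
        shutil.copy2(source, saved)      # copy2 also preserves modtimes
    return saved

def prune_backups(to_root, limit):
    """Remove the oldest run subfolders beyond limit; return their names."""
    bkp = os.path.join(to_root, '__bkp__')
    runs = sorted(name for name in os.listdir(bkp)
                  if os.path.isdir(os.path.join(bkp, name)))
    excess = runs[:max(0, len(runs) - limit)]
    for name in excess:
        shutil.rmtree(os.path.join(bkp, name))
    return excess
```

    A run would call backup_item for each item about to be replaced or deleted, and prune_backups once per run.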

    This option makes mergeall generally safer, as unwanted or failed changes can be later undone by restoring backup copies from any of the latest mergeall run backups in the __bkp__ of any archive copy. This change's new "-backup" mergeall command line argument was also integrated into both the GUI and console launchers. Automatic backups defaults to on (enabled) in the GUI launcher, because it should normally be used for data safety unless space becomes a concern.

    Backup folders can be changed by users arbitrarily; their per-run subfolders may appear as diffall.py differences that generally can be ignored. When used, __bkp__ folders can also serve as a record of runs with changes against a tree, and an alternative to the logfile for inspecting changes, though only replacements and deletions are recorded; new additions are never backed up, as they would be just redundant copies (though version 2.1 later extended the backups option to also list additions in a backup folder's __added__.txt).

    Also note that, despite its name, this new backups option simply saves prior versions of files and folders on changes, and is just a nested operation within a general archive backup performed by a mergeall run. For more complete details, see the docstrings in the new file backup.py, which hosts the backup system's implementation, as well as the summary in the version 2.0 update of Whitepaper.html.

  2. {Launcher GUI} More intelligent and dynamic GUI

    The GUI launcher was changed so as to show only configuration items relevant to run modes selected: the logfile folder chooser appears only if logging is toggled on, and the new backups toggle frame appears only if automatic updates are selected (-backup applies only to -auto mode in the GUI, as it has no interactive/selective update mode). Both hidden components retain their state while hidden in the GUI. Also made the mode selections text more descriptive: changed from "Report only" and "Automatic updates" to "Report differences only" and "Automatically resolve differences in TO" (this is a GUI, after all).

  3. {Launcher GUI} Help button and popup

    Added a "Help" button that spawns the main mergeall user guide document in a web browser (in the spirit of the frigcal calendar GUI). Just a convenience, but useful nonetheless.

  4. {mergeall} Summary report: number files/directories compared/changed, diffs found

    Added counters for both the comparison and resolution phases, displayed in the log at each phase's end. For comparison: files and folders checked. For resolution: (replaced, deleted, created) for both files and folders. Later added counts of the number of differences found in each of the 4 categories, from differences data.

  5. {Launcher GUI} Workaround for last line covered on repack GO button

    mergeall now issues a final 'finished\n\n' message, which prevents the last output line being covered when the GUI's GO button is unhidden after resizes (a minor annoyance, that required a scroll). The extra blank line is now covered, which is easier and less distracting than auto-scrolling.

  6. {mergeall} Try to recover from rmdir Windows deletion failures in shutil.rmtree

    On Windows, retry shutil.rmtree's os.rmdir directory removal calls that fail, via a temporary wait-loop callback on errors. Apparently, Windows deletes may sometimes not be finalized immediately—they are left still pending after the delete call returns (perhaps due to other activities, such as indexing or anti-virus software). This is lethal to rmtree, as directories cannot be removed until after all their contents are removed.

    This seems rare; indeed, it's been observed on just one machine after a year of usage, and may warrant further research. However, its symptoms were witnessed on failures during the new backup folder pruning, and are also prone to occur during mergeall's normal deletion of unique TO folders. To trigger the delete error recovery logic, open a file in a folder to be deleted.

    Note that this recovery logic applies only to os.rmdir calls in shutil.rmtree directory removals, not to deletions of simple files in the TO folder with os.remove. File deletes could be retried too, but there seems little point; such failures are very rare, they're likely to be caused by unrecoverable permission errors, and they just leave an extra file in TO. Temporary in-use lock failures will be cleaned up by the next mergeall run. Scan your logfiles' resolution phase messages (or the scrolled text in the GUI) to see if any updates may have failed.

    See backup.py for additional details, links to related threads on the web, and the workaround's error callback. Python's shutil.rmtree may address this shortcoming in the future, though failing changes may be a broader Windows issue (os.rename, not used here, also seems suspect). All such failures are mostly harmless here, as they simply cancel a single update and continue, leaving a difference for the next mergeall run to resolve.
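    The wait-loop idea in this item can be sketched as follows. This is an illustration rather than the callback in backup.py; the function names, retry count, and delay are all assumptions:

```python
import os
import shutil
import time

def retry_call(func, path, tries=5, delay=0.2):
    """Call func(path) up to tries times; return True once it succeeds."""
    for _ in range(tries):
        try:
            func(path)
            return True
        except OSError:
            time.sleep(delay)      # give pending deletes time to finalize
    return False

def retry_rmdir_on_error(func, path, excinfo):
    """shutil.rmtree onerror callback: retry failed os.rmdir calls only."""
    if func is os.rmdir and retry_call(func, path):
        return                     # recovered: rmtree continues normally
    raise excinfo[1]               # other failures propagate to the caller
```

    It would be installed with a call of the form shutil.rmtree(folder, onerror=retry_rmdir_on_error). Python 3.12 and later prefer rmtree's newer onexc argument, which passes the exception object itself instead of an exc_info tuple.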

  7. {mergeall, Launchers} More error checking for command-line arguments and files

    Expanded error checking for command-line arguments passed to mergeall, in both command-line and launcher modes. Bad from/to file paths formerly showed full Python exception text in all three usage modes, but no longer do:

  8. {Launcher GUI} Use Desktop for logfiles by default on Windows

    Set the initial value of the logfile path to the user's Desktop folder, on Windows machines where this works and exists (on all others, use the former "select..." message). This is just an initial suggested default for convenience, and can be changed freely in the GUI. It's intended to discourage use of a flashdrive for both an archive source and logfile target (which slows progress), but could prove too user-friendly to retain.

  9. {diffall} Add recently-changed-comparisons-only option, new stats

    Not part of mergeall itself per se, but in the accompanying diffall.py script borrowed from the book PP4E, added a "-recent [days]" command-line option which limits file comparisons to files modified within the last N days in either tree (N defaults to 90 if not given; use 365 for a full year). This is a heuristic, designed to allow quick verifications for recent mergeall changes only. It assumes that recent changes in a large archive are typically limited to a small subset of its files.

    By default, diffall does a full byte-for-byte compare of every file in two trees, and should be run occasionally to verify integrity of entire archive copies. While complete, this script can take a long time for large archives (1 hour or more for the 72G use case, with a USB stick and micro SD card). The "-recent" option allows for quicker verifications of just items changed recently, and hence subject to recent mergeall updates. This option is for command-line use only; mergeall's "-verify" still does exhaustive compares.
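    The recency test behind this option reduces to a modtime check on both copies of a file. A minimal sketch follows; the function name changed_recently is hypothetical, and diffall.py's own code may be organized differently:

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60

def changed_recently(path1, path2, days=90, now=None):
    """True if either file was modified within the last days days."""
    if now is None:
        now = time.time()
    cutoff = now - days * SECONDS_PER_DAY
    return (os.path.getmtime(path1) >= cutoff or
            os.path.getmtime(path2) >= cutoff)
```

    File pairs for which this returns False would be counted as skipped rather than compared byte-for-byte.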

    Like mergeall, diffall also grew new simple counter stats, reported at run end; its output ends with an extra line of this form: "Dirs checked 52, Files checked: 8, Files skipped: 1528".

  10. {diffall, cpall} Call file.close explicitly for use outside CPython

    Changed the related diffall.py and cpall.py scripts/modules borrowed from PP4E to call file.close explicitly for use outside CPython (e.g., PyPy), rather than relying on the auto-close-on-collection behavior of file objects in CPython. diffall.py is run by mergeall for "-verify", and manually for archive integrity checks; cpall.py is imported and used by mergeall for its core file and tree copying.
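    As a generic illustration of the pattern (not the actual diffall.py or cpall.py code; both function names here are hypothetical), an explicit close can be guaranteed with try/finally:

```python
def read_file_bytes(path):
    """Read a whole file, closing it explicitly instead of on collection."""
    infile = open(path, 'rb')
    try:
        return infile.read()
    finally:
        infile.close()     # runs on both normal exit and exceptions

def same_content(path1, path2):
    """Byte-for-byte comparison of two files."""
    return read_file_bytes(path1) == read_file_bytes(path2)
```

    A "with open(...) as infile:" statement has the same effect, and works on all Pythons from 2.6 on.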

  11. {mergeall, cpall} Dropped the cpall.copyfile shutil.copystat hack

    Got rid of a blatantly evil case of monkey-patching in mergeall.py, by changing cpall.copyfile in-place to call copystat as a default option. The original code went to great lengths to avoid changing cpall, but was far too dark to document further here; see mergeall.py (if you must).

  12. {utilities} New script to work around DST modtime skew on FAT drives

    Added a new script, fix-fat-dst-modtimes.py, as one option for addressing the 1-hour modtime skew of FAT drives on Windows that occurs at Daylight Savings Time rollovers. Simply run this from a command line after each DST rollover; it adds or subtracts an hour from the modtime of each file in a FAT archive copy, to keep them in synch with an NTFS copy, per mergeall's timestamp+size comparisons. For more on this issue, see the version 1.4 release note below; it's also mentioned in Lessons Learned and Usage Overview. See the script itself for usage pointers. 3.0 update: you can generally avoid this script by formatting external drives with exFAT.
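    The core of such a fix is a tree walk that shifts each file's modtime by an hour. The sketch below illustrates the idea only; the function name shift_modtimes is hypothetical, and the real script's interface and checks are documented in the script itself:

```python
import os

def shift_modtimes(root, hours):
    """Add hours (may be negative) to the modtime of every file under root."""
    delta = hours * 3600
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            # Keep the access time as is; move only the modification time
            os.utime(path, (info.st_atime, info.st_mtime + delta))
            count += 1
    return count
```

    For example, shift_modtimes(root, 1) moves files forward an hour, and shift_modtimes(root, -1) moves them back.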

  13. {Docs, examples} Relative links, README to HTML, miscellaneous changes

    Assorted non-functional changes:

Usage Notes

  1. More on Windows FAT daylight savings time rollover issue: 2 copies

    It has been pointed out that this issue, documented in version 1.4 notes below, can also be addressed by keeping two FAT device archive copies: one to be used when DST is active, and one when it is not. This way, DST rollover won't require a full archive rewrite on the currently used copy, and you'll also automatically keep a longer-term backup copy. Keeping two such copies on the same device is equivalent to keeping the copies on separate devices, provided your archives are small enough, and your device is FAT enough (yes, pun intended).

  2. File permission-related failures: fix and rerun

    Mergeall's updates can fail for files whose permissions preclude changes. This includes files marked as:


    These failures don't stop a merge; they report as errors in the logfile and are simply skipped, leaving the difference for the next run. To avoid these failures, though, make sure that the files are not read-only or hidden, by right clicking to their Properties, and unclicking these modes (you may need to enable viewing of hidden files in order to see them in file explorer).

    Mergeall itself does not change permissions, as your files are your property; read-only mode, for instance, may be set deliberately to avoid overwrites. In-use errors (and skips) can't be avoided by mergeall in general; be sure that you don't have a file open in the TO archive when mergeall is run, or rerun again to pick up changes for files previously in use.


Version 1.7.1, Oct-28-14

Summary: minor error text fix, and updated usage note here.

Repackaged Oct-31-14 and Nov-8-14 with only minor doc changes here and in HTML files.

Changes

  1. {mergeall} Minor error message text format patch in mergeall.py

    The "message" argument in mergeall's file error() message text was not being displayed. Also prefixed error text produced by cpall.copyfile() with "**", so the format of errors reported during its recursive tree copies matches that of mergeall's own top-level file error messages (they're now both "**Error...").

Usage Notes

  1. Update on 1.7's Windows Unicode filenames issue: accents

    Update: though details have now been lost to time, it's not impossible that this issue reflects, or at least is related to, the Unicode normalization issue addressed in 2022's version 3.3 above.

    Update: for a possibly related example of this issue observed later, see also release 2.3's usage note above.

    This note augments a 1.7 usage note below. On further exploration, this appears to be yet another FAT32 filesystem issue, and dependent on order of directory copies. The issue occurs only when both:

    1. Copying to FAT32 filesystems, of the sort used by default on USB flash drives.
    2. Copying the non-accented name first, followed by the accented name that is otherwise equivalent.

    When both conditions are met, both Windows file explorer and the mergeall Python script issue an error for trying to create a folder that already exists.

    For instance, Windows' file explorer issues the following error message text in a popup and offers to merge folders, even though the only folder in the destination is the unaccented "Rodriguez":
    This destination already contains a folder named 'Rodríguez'
    
    Python—and hence mergeall—issues a Windows 183 exception; mergeall skips the single folder copy and continues, per the messages in its run log:
    copied new FROM dir, C:/.../test-Rodriguez\Rodriguez
    **Error copying FROM dir: skipped C:/.../test-Rodriguez\Rodríguez
     [WinError 183] Cannot create a file when that file already exists: 'D:/rodriguez\\Rodríguez'
    copied new FROM file, C:/.../test-Rodriguez\findings.txt
    
    Hence, this is the same FAT32-related error, and seems independent of Python. Conversely, the issue does not occur when either:

    1. Copying to NTFS filesystem devices (e.g., to the C: drive) via drag-and-drop, cut-and-paste, or otherwise.
    2. Copying the accented name first, followed by the non-accented name (or when a multi-folder copy is lucky enough to be ordered this way).

    In either case, both folders are created, and no error occurs. If you do manage to copy both folders to a FAT32 device, though, trying to delete both later either issues an error or leaves one unremoved. This behavior seems a bug, given that FAT32 on USB drives supports non-ASCII file and folder names in most other contexts. It may, however, reflect a fundamental limitation in the older FAT filesystem used by default for most USB and SD flashcard devices.

    There may be a procedural workaround for this issue that requires an additional and manual step (e.g., code page settings?), but an automatic resolution may be beyond the scope of a Python script if the issue is inherent in either the FAT32 implementation, or Python's own choice of filesystem API calls. In any event, it seems rare enough to warrant a pass here. The workaround for now is to either:


    Watch for "**Error" in your run logs to see if/when this occurs. The following links provide background on this issue, but search on "fat32 unicode filenames" for other pointers:


Version 1.7, Oct-10-14

Summary: minor GUI fixes, mergeall report update, doc updates, usage caveat note.

Changes

  1. {Launcher GUI} Minor fix for Python 2.X only: showerror import

    Add an import of Tkinter's showerror when using Python 2.X; else this dialog never appears if a bad logfile name is used. The import was present for 3.X, but not 2.X, and was required only by a rare context never tested under 2.X.

  2. {Launcher GUI} Minor fix: catch log open() exceptions

    Catch PermissionError (etc.) on logfile open and report error in popup; else fails silently on Windows, as ".pyw" has no console for exception text. This can occur if you select "C:\Program Files" for the log dir on Windows. Formerly, only the existence of the logfile's folder was verified.

  3. {mergeall} Add disposition note lines to differences report

    Add message lines for each difference category, reminding users how its items will be resolved automatically if -auto is used, or if updates are selected in the GUI: "These items will be replaced", "These items will be permanently deleted", and so on.

  4. {Docs} Assorted minor doc updates, and USB 3.0 speed correction

    Assorted minor updates to the HTML files in the docs subfolder, plus one minor correction added in Lessons-Learned.html: its USB 3.0-versus-wifi speed figures were off by a factor of 8 due to bytes/bits rating differences (USB is actually 8X faster than previously stated). Also added new version 1.7 screenshots in examples/, taken on Windows 7.

Usage Notes

  1. Windows: differing folder names may be the same sans Unicode

    Update: see further details on this issue in 1.7.1's usage note above.

    A bizarre and very rare use case can trigger run-log error messages that require manually copying a directory after mergeall finishes. The observed behavior: on Windows 7, if there are two different directories named:


    then the two are treated as having the same name, and you cannot copy both to the same folder. This is true for Windows drag-and-drop copies (which issue an error), so it appears that Windows itself effectively drops the accent, making the two the same for core file operations.

    Mergeall reports an error for trying to create a folder that already exists, when copying the second of the two. In this likely very rare event, the simplest workaround is to manually copy the folder whose automatic copy failed and displayed an error in the log. This is not Python 3.X/2.X-specific.

    It may be possible that using bytes (instead of str) for folder names in mergeall's os.listdir() calls would obviate this issue, but Windows' own drag-and-drop failures suggest that it might be a deeper issue in Windows itself, and the issue's rarity and large impact on existing code make further exploration unwarranted. This would also apply to Python 3.X only, because 2.X has no true bytes object. A Windows 7 (US) console doesn't even print this character properly, though IDLE does, and your console might (setting the Windows codepage via a "chcp 65001" helps on mine; see Page 755 in Learning Python, 5th Edition (LP5E) for details, and test with the following script):
    #!python3
    # -*- coding: latin-1 -*-
    s = 'í'
    print(s)
    print(s.encode('latin-1'))
    


Version 1.6, Jul-27-14

Summary: Python 2.X Unicode issue workaround, verify quit, misc GUI/doc/package updates.

Repackaged Aug-05-14 with minor doc-only updates.

Changes

  1. {Launcher GUI} Python 2.X Unicode issue workaround (3.X recommendation)

    Wrapped a stream line decode in an exception handler, to prevent its potential failure on Python 2.X from killing the GUI for some non-ASCII characters in filenames. This is a process-boundary issue that impacts only the GUI display (not the logfile, or the underlying mergeall process), and reflects a 2.X/3.X incompatibility, despite the launcher's automatic propagation of PYTHONIOENCODING. See the 1.6 change note in launch-mergeall-GUI.pyw for details (search on "1.6"). This fix was also applied to the console launcher, for stream lines decoded for console display.
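    In spirit, the wrapped decode looks like the sketch below. The function name decode_line and the exact fallback text are assumptions for illustration; see the launcher's code for the real version:

```python
def decode_line(raw, encoding='utf8'):
    """Decode one line of subprocess output without letting failures escape."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Show the still-encoded bytes instead of killing the display loop
        return '(UNDECODABLE LINE): ' + repr(raw)
```

    The display keeps running either way; only the affected line's appearance changes.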

    Note that this patch applies only to the GUI and console launchers' displays. Its worst-case impact is that some non-ASCII filenames may be displayed with "(UNDECODABLE LINE):" prefixes and still-encoded names in the GUI or console launcher displays under Python 2.X only. This normally happens for just a handful of filenames, if any, and filenames display correctly in both the logfiles created by the launchers, and the main mergeall.py script itself, which processes files with non-ASCII names properly. Nevertheless, this is significant enough to recommend use of Python 3.X for users with archives having many non-ASCII filenames.

    Also note that PYTHONIOENCODING must still be set manually in your system shell when running script mergeall.py directly from a command line, if it may ever process and thus print non-ASCII filenames, especially in 3.X. This manual setting isn't required for the GUI launcher, as it automatically sets and propagates this to its mergeall.py subprocess, and does not route text to a console (only to a GUI and logfile). However, this setting may be required for both mergeall.py and the console launcher, as both print filenames to the console.
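
    As an illustration of the workaround described in this item, the following is a minimal sketch only, not the launcher's actual code (see launch-mergeall-GUI.pyw for that). The function name decode_line and the UTF8 default are assumptions made for this example; only the "(UNDECODABLE LINE):" prefix comes from the notes above.

```python
def decode_line(raw, encoding='utf8'):
    """Decode one subprocess output line for display, without ever raising."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Show a marked, still-encoded form instead of letting the
        # exception end the display loop
        return '(UNDECODABLE LINE): ' + repr(raw)

print(decode_line(b'caf\xc3\xa9.txt'))    # valid UTF8 bytes: decoded normally
print(decode_line(b'caf\xe9.txt'))        # not valid UTF8: marked and shown as bytes
```

    The trade-off is the one noted above: a name that fails to decode is still shown, but in its encoded form.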

  2. {Launcher GUI} Verify main window quit

    Added a simple quit verify dialog. Caveat: this avoids accidental exits, but no longer shuts down the GUI immediately if there are queued lines to be displayed; a sys.exit() might exit quicker, but could result in GUI error messages in the console.

  3. {Misc} Sync 2 doc files, fix launch-mergeall-GUI.pyw eolns, display version#

    Synchronized 2 MoreDocs/ HTML files with current versions on book website (Lessons-Learned, and Whitepaper which is now called mergeall.html on the website). Also added version number in GUI launcher title (and console launcher startup), and fixed file launch-mergeall-GUI.pyw to have Windows eolns (a.k.a. end-of-lines, endlines); as it was, this file inconsistently had Unix line breaks, which show as a single line in some text editors like Notepad (though not PyEdit or IDLE); origin unknown, but likely harmless. None of the changes in this category impacted program execution.


Version 1.5, May-5-14

Summary: Linux compatibility—patch and usage notes.

This system was initially developed and used on Windows (7 and 8). Testing on Linux (Fedora 20/Gnome 3) has so far yielded one minor patch, and two usage notes for Linux users.

Note that the patch applied allows mergeall to work on Linux for archives containing basic files and directories—that is, for normal user data and media. More exotic Linux file types (e.g., links and FIFOs) remain untested, and may or may not require additional changes; modify as desired.

Changes

  1. {Linux Patch} subprocess.Popen() shell argument

    In both GUI and console launchers, changed the call to Python's subprocess.Popen() to pass shell=False on Linux, and other Unix-like platforms, only. Else, when passing a command-line sequence (not a single string), this call always spawns just an interactive Python session—as though the full command run were "python", the first item in the command-line sequence. However, on Windows, shell=True is still required if filename associations are to be employed. This seems counter to the portability goals of subprocess (and is largely undocumented), but the fix is very minor.

    With this patch, mergeall's GUI and main script work well for basic file types on Linux in testing thus far; see the Linux screenshots from versions 1.5 and 2.1 (defunct), and 3.0.
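
    The platform test behind this patch can be sketched as follows. This is an assumption-laden illustration rather than the launchers' code: the spawn() helper is hypothetical, and the child command is a stand-in for the real mergeall.py command line.

```python
import subprocess
import sys

def spawn(cmdargs):
    """Start a command given as a sequence; use a shell on Windows only."""
    # shell=True is kept on Windows (for filename associations); on Linux
    # and other Unix-like platforms, shell=True with a sequence would run
    # just its first item
    use_shell = sys.platform.startswith('win')
    return subprocess.Popen(cmdargs, shell=use_shell,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

proc = spawn([sys.executable, '-c', 'print("spam")'])   # stand-in command
output = proc.communicate()[0]
print(output)
```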

Usage Notes

  1. Linux: #! lines

    On Linux, you may want to change some of the "#!" first lines in this system's script files to name the specific version of Python for which you have [tT]kinter GUI support installed, if you wish to run the scripts directly as executables. For instance, a change from "#!/usr/bin/python" to "#!/usr/bin/python3" in launch-mergeall-GUI.pyw was required for my Python 3.X install, but was not changed as such in the released code, as this script also works on Python 2.X systems and other platforms. Change as needed for your installs and links, or use full "python2 ..." or "python3 ..." command lines to launch the top-level script.

  2. Linux: Windows/Linux timestamps DST skew

    Also on Linux, it appears that there is another file timestamp DST rollover issue that makes some files' mod times off by an hour when synchronizing between Windows and Linux trees. Specifically, a Windows NTFS volume (e.g., your mounted C:) may report some mod times skewed by 1 hour from Linux times; this appears to happen for files saved in the past while DST was active. Naturally, this can generate spurious differences in timestamp-based synchronization tools like mergeall.

    This is a TBD, but seems related or similar to the Windows NTFS/FAT skew reported in release 1.4 notes below (see its item #1). No fix was coded and no ideal workaround is yet known; but synching once with auto-update on suffices to remove the timestamp differences, albeit at the expense of some extra one-time copies. As a demo, the new Linux desktop screenshot in ./examples/Screenshots shows mergeall runs on Linux performing and verifying a Windows/Linux timestamp synch. Note that this is an issue only when comparing trees _between_ Windows and Linux, not for compares of trees that reside on the same platform.


Version 1.4, Mar-10-14 to Mar-27-14

Summary: Multiple updates—behavior (earlier dates) and docs (later dates).

This version's development spanned 2 weeks. It yielded numerous changes and notes reflecting real world usage and testing.

Changes

  1. {GUI Launcher} Enhanced to thread subprocess stream reads

    Read the spawned mergeall subprocess's stdout/stderr lines in a spawned parallel thread, that posts lines to a queue polled by timer events in the main GUI thread. This structure is more complex, but prevents the GUI from being blocked and unresponsive while waiting for a next line from the subprocess—not a bug and normally not a concern, but it could become apparent if mergeall was busy copying large trees.
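
    The thread-plus-queue structure described here can be sketched roughly as below, for Python 3.X only. All names are hypothetical, the child command is a stand-in for mergeall.py, and a sleep loop stands in for the GUI's timer events so that the example runs without a display.

```python
import queue
import subprocess
import sys
import threading
import time

def start_reader(proc):
    """Read proc's stdout lines in a parallel thread; post them to a queue."""
    lines = queue.Queue()
    def reader():
        for line in iter(proc.stdout.readline, b''):
            lines.put(line)
        lines.put(None)                      # sentinel: end of stream
    threading.Thread(target=reader, daemon=True).start()
    return lines

def drain(lines):
    """Fetch all queued lines without blocking (call from a timer event)."""
    batch = []
    while True:
        try:
            batch.append(lines.get(block=False))
        except queue.Empty:
            return batch

proc = subprocess.Popen([sys.executable, '-c', 'print(1); print(2)'],
                        stdout=subprocess.PIPE)
lines = start_reader(proc)
received = []
while None not in received:
    received.extend(drain(lines))            # the main thread never blocks here
    time.sleep(0.05)                         # stand-in for a GUI timer callback
print(received)
```

    In a real GUI, drain() would be called from a rescheduled timer callback, so only the reader thread ever waits on the subprocess.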

  2. {Launchers} Fix for Unicode stream encoding (binary mode + manual decode)

    Redo on 1.2 issue: forcing the mergeall subprocess to use the default Unicode encoding in the locale module sufficed to make it agree with Popen's text-mode stream reader (which always uses the locale setting), but still failed on encoding errors on Windows for some Unicode filenames as they were printed in mergeall—before they ever reached the Popen reader. Fixed by forcing subproc to use the broader UTF8 for its prints via PYTHONIOENCODING, and reading stdout lines from Popen in binary mode with manual post-read UTF8 decoding. See the 1.4 change notes in launch-mergeall-GUI.pyw for more details.
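
    In outline, the fix pairs an environment setting in the child with a manual decode in the parent. The following is a sketch under assumptions, not the launcher code itself: the child here is a one-line stand-in that prints a non-ASCII filename, where the launchers spawn mergeall.py.

```python
import os
import subprocess
import sys

# Force the child's prints to UTF8, regardless of console or locale defaults
env = dict(os.environ, PYTHONIOENCODING='utf8')

child_code = 'print("\\u00edndice.txt")'     # stand-in: prints a non-ASCII name
proc = subprocess.Popen([sys.executable, '-c', child_code],
                        env=env, stdout=subprocess.PIPE)   # binary-mode pipe

raw = proc.stdout.readline()                 # bytes: no automatic decoding
proc.wait()
text = raw.decode('utf8')                    # manual post-read decode
print(repr(raw), '=>', repr(text))
```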

  3. {Launchers} Fix for Python 2.X logfile incompatibility (binary mode files)

    Prior launcher versions failed in Python 2.X when logfiles were enabled, because they opened logfiles in text mode using 3.X's open() with encoding, and didn't account for 2.X's different open(). Temporarily changed to use open=codecs.open in 2.X, then changed to write logs in binary mode with new binary stream data to sidestep the issue altogether. 2.X's codecs.open() does not expand \n to \r\n on Windows when writing decoded Unicode, though the next item made this a moot point.

  4. {Launchers} Handle Python 2.X -u unbuffered flag in mergeall spawn command-line

    This Python switch makes streams unbuffered, but oddly also makes line-ends \r\n in 3.X but \n in 2.X, which leads to single-line logfiles in Windows if not special-cased. Temporarily dropped for 2.X compatibility, so all line-ends are \r\n when written to files on Windows. Later reinstated: without the Python '-u' unbuffered flag, mergeall output may not appear for 10 or more seconds on some machines and slower devices due to internal buffering. Because this flag also makes line-breaks differ between Python 2.X and 3.X, though, the launchers must also special-case logfile writes to map all linebreaks to the platform's version. See 1.4 change notes in launch-mergeall-GUI.pyw for more details.
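
    One way to map line breaks to the platform's version for binary-mode logfile writes is sketched below. The helper name fix_eoln is hypothetical, and the launchers' own special-casing may differ in detail.

```python
import os

def fix_eoln(raw_line, linesep=os.linesep.encode('ascii')):
    """Replace a binary line's CRLF or LF ending with the platform's own."""
    return raw_line.rstrip(b'\r\n') + linesep

# Lines read in binary mode may end either way, depending on the child Python
for raw in (b'first line\r\n', b'second line\n'):
    print(repr(fix_eoln(raw)))
```

    Each result can then be written to a logfile opened in binary mode, so that no further line-end translation occurs.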

  5. {Docs} Added Lessons-Learned.html post implementation notes

    This write-up summarizes trade-offs and issues, and discusses decoupled versus single process architectures.

Usage Notes

  1. FAT drives munge file modtimes at DST rollover if auto-adjust

    Update for version 3.0: A new overview of this issue and a new list of fixes now appears in the User Guide added in release 3.0. Importantly, users on Windows and Mac OS X are now advised to format their external drives using the exFAT filesystem, which avoids this issue altogether; Linux exFAT support is somewhat emerging, but the fixer script and other options below still apply.

    On Windows, FAT/FAT32 file systems (e.g., many USB sticks) have an issue with daylight saving time (DST): they adjust file modtimes for localtime, making them all appear to be off by one hour when DST begins, versus the true UTC time of NTFS and exFAT. This is a well-known Windows issue, and seems to occur only if your Windows system is set to auto-adjust when DST begins, but it can make every file register as a difference in mergeall if only one of the drives uses FAT.

    No solution was coded in mergeall itself here, but there are a variety of procedural ways to deal with this, from arguably simplest to most complex:

    1. Allow mergeall, and other timestamp-based backup or synchronization tools, to rewrite your archive in full twice a year.

    2. Clear your Windows auto-dst-adjust setting, and manually change your time/clock when needed (see below).

    3. Use two FAT device archive copies (e.g., on one or two USB sticks)—one during DST and one otherwise; this has the advantage of keeping a long-term backup copy (more on this in 2.0 usage notes).

    4. Write a script to add or subtract 1 hour on all file modtimes, and run on FAT drive archives at DST rollovers; use os.walk, os.path.getmtime, and os.utime. Done => see this 2.0 note for a script to run.

    5. Use NTFS instead of FAT on your drives (e.g., a shell command such as "convert D: /FS:NTFS" can do the job), if this makes sense on your device; it may degrade performance on some.

    6. Resort to using lower-level C/C++ Windows libraries if they offer a solution not available in Python directly (this requires recoding, and possibly C++ skills if no Python API exists).

    The first of these is the default if you take no action. The second—clearing your auto-dst-adjust setting—is easy but manual: see Control Panel => Date and Time => Timezone, or click your toolbar date/time to clear your DST setting (be sure to "OK" out of all your Control Panel dialogs). The third and fourth require some minimal action at DST rollovers; the new 2.0 script makes the fourth a simple command-line run, but the third ensures a long-term backup. See Lessons-Learned.html for more on this issue, including relevant links on the web.
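
    For the fourth option in the list above, the basic idea can be sketched with the three tools it names. This is an illustration only, not the script shipped with release 2.0: the function name is hypothetical, and this version adjusts files only, not folders.

```python
import os

def shift_modtimes(root, hours=1):
    """Add hours (may be negative) to the modtime of every file under root."""
    delta = hours * 3600
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            atime = os.path.getatime(path)           # keep access time as is
            mtime = os.path.getmtime(path)
            os.utime(path, (atime, mtime + delta))   # (access, modified)
            count += 1
    return count
```

    A call with hours=1 or hours=-1 on a FAT drive's archive root at each DST rollover would then realign its modtimes. Try any such script on a test copy first.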

  2. Some programs may change file content but not modtime or size

    Excel on Windows (among others?) can occasionally change a few bytes in a file's content trivially without updating the file's modification time or size. This registers as a difference in the bytewise diffall.py but not in the timestamp/size-based mergeall.py (and is officially considered to be cheating here). Such modifications appear to reflect changes to unimportant metadata only; thus far seem limited to older Excel files opened but not saved; and can generally be ignored. Copy over the impacted files manually, if you don't wish to see the diffall difference.

  3. Some filesystems limit maximum filename path length

    Update for version 3.0: pathname limits were eventually addressed and lifted on Windows by automatically adding "\\?\" pathname-prefix strings universally on that platform (only) to invoke enhanced APIs; see 3.0 above.

    For large/deep trees, you may run up against file path length limits. These won't terminate mergeall or the GUI, but will manifest as error messages in mergeall output that will continue to appear in later runs until addressed. This often is the result of directory renames or moves, and is a filesystem issue, not a program error—you also may not be able to do much with files in such long paths in a file explorer, until you shorten the path by renaming files or folders, moving items closer to the drive's root, or deleting parent directories. That is the suggested policy and workaround for mergeall as well. See Lessons-Learned.html for more on this issue.
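
    For context on the 3.0 update noted above, prefixing a path for Windows' extended-length APIs might look like the sketch below. It is not mergeall's code: the helper name and its on_windows parameter are assumptions for illustration, and it expects an absolute path that already uses backslashes.

```python
import sys

PREFIX = '\\\\?\\'      # four characters: two backslashes, '?', one backslash

def long_path(path, on_windows=sys.platform.startswith('win')):
    """Prefix an absolute, backslash-style path to lift Windows length limits."""
    if not on_windows or path.startswith(PREFIX):
        return path
    if path.startswith('\\\\'):
        # network (UNC) paths use the prefix, then UNC, then the server name
        return PREFIX + 'UNC\\' + path[2:]
    return PREFIX + path

print(long_path('C:\\folder\\file.txt', on_windows=True))
```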

  4. Routing logfiles to a USB drive being scanned slows merges

    Perhaps a given to some readers, but mergeall scans USB flash drives quicker if you route the logfile (if one is requested) to a different device than the USB drive being scanned—to your Desktop, for example—and copy it over to the USB drive later if desired. Writing a logfile on the same USB drive being scanned can slow down the scan by a factor of 3 or 4 in tests run, due to the read/write combination.

  5. Naming devices and network drives in Windows pathnames

    Perhaps also obvious to some, but on Windows, pathnames denote connected drives by device letter, and name shared network drives by volume syntax. Examples:
    C:\folder\...          --for a folder on your main drive (normally)
    D:\folder\...          --for a folder on your USB flashdrive (or other letter)
    \\Computer\folder\...  --for a shared folder on a computer in your network
    
    Such path formats are passed to the main script as arguments, but are automatic when selecting folders in the GUI. Other platforms use different naming schemes (e.g., /dev/..., /mnt/...); see your system docs.

    Also on this topic: drives shared on a Windows home network seem to be very slow (often 35-50 times slower than recent USB drives) and tedious to set up, but your mileage may vary. Private clouds may or may not be faster, but seem likely to be bound by similar constraints imposed by network transmission speed in general (and public clouds are loaded with tradeoffs: see the last section of Whitepaper.html). See also USB flashdrive and Internet speed comparisons in Lessons-Learned.html.


Version 1.3, Feb-27-14

Summary: mergeall.py fix for FAT 2-second modtime resolution (range test).

Changes

Allowed for FAT32 file system's 2-second resolution/granularity of file modification times, by replacing equality with a +/- 2 second range test; else copies on the more accurate NTFS file system (and others) may register a mismatch despite having identical content. Update: see Lessons-Learned.html for more on this issue.
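
The range test amounts to something like the following sketch. The function and parameter names are hypothetical, and mergeall.py's actual comparison may be coded differently.

```python
def same_modtime(time1, time2, allowance=2):
    """True if two modtimes (in seconds) differ by no more than allowance."""
    return abs(time1 - time2) <= allowance

print(same_modtime(1393500001, 1393500002))   # within FAT32 granularity
print(same_modtime(1393500000, 1393503600))   # an hour apart: a real difference
```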


Version 1.2, Feb-15-14

Summary: Launchers fix for Unicode stream encoding (match subproc to Popen).

Changes

Fixed encoding disagreement between mergeall subprocess streams and launcher's Popen text mode auto-decoding, by using PYTHONIOENCODING and locale module setting used by Popen; else aborts on Unicode exception in stdlib when reading subproc's stdout lines for non-ASCII filename in report. This was later revisited in version 1.4.


Version 1.1, Feb-10-14

Summary: New GUI+console launchers.

Changes

Added console and GUI launchers, run atop the main mergeall.py script. Console launcher supports interactive/selective mode, but GUI does not.


Version 1.0, Feb-1-14

Summary: Initial release, command-line mergeall.py only.

 

(Defunct) Usage Summary

Please note: the following is now largely subsumed by the new User Guide added in version 3.0, but is retained for any extra details or context it may provide. For current information in using the source, app, and executable formats of mergeall, see the User Guide's pointers, as well as the main README file. This section covers the original and still-available source-code format.

Download mergeall's zipfile here, and unpack on your computer. Its unpacked content may be viewed either locally or online.

This source-code version of the system requires just Python 3.X or 2.X. Python 3.X is preferred for Unicode filenames, per version 1.6's release notes. Python 3.5 or later (or a separate scandir() PyPI install) was recommended for speed on Windows and Linux, per version 2.2's release notes, but no longer as of 3.0.

The system is known to run on Windows 7, Windows 8.1, Windows 10, and Linux, and, as of version 3.0, Mac OS X. Most usage to date has been on Windows for archives of normal files and folders, though Mac and Linux have seen more action lately; as of 3.0, Mac has emerged as both a major source of enhancements and the platform of choice.

This program may be launched in 3 ways, from simplest to advanced:

  1. Run launch-mergeall-GUI.pyw to run mergeall.py easily from a desktop GUI.
  2. Run launch-mergeall-Console.py to run mergeall.py with interactive inputs.
  3. Run mergeall.py directly via manual command lines in a console window.

For more screenshots of modes 1 and 2, see Whitepaper.html. For script usage details in mode 3, see mergeall.py's topmost docstring, and the example sessions here.

This system began as a command-line-only tool with this file as its sole documentation in plain text format, and was later extended with HTML documents. Largely due to this legacy, you can find documentation for it in multiple places and forms; see the See-Also Links section below.

 

(Defunct) System Summary

Please note: the following has grown redundant with the new User Guide and original and similarly dated Whitepaper, but is retained as an alternative (if now largely historical) overview.

mergeall.py, the main script, synchronizes a destination tree to be the same as a source tree, by copying only differing and unique items in the source to the destination, and pruning unique items in the destination. This process is applied to both files and folders in the trees.

For speed, file differences are detected by checking only modification times and sizes (with an optional limited content test), and all updates are made in-place in the destination and limited to changed items only. As of version 2.0, prior versions of changed items can be saved to a backup folder automatically; as of 2.1, backups may be restored automatically.

This can be useful for both quick backups of changes made in large trees, as well as one-way synchronization of multiple tree copies. In the former role, a single run suffices to backup changed items. In the latter role, multiple runs work to broadcast changes to multiple copies—backup changes to an external device (e.g., USB flashdrive, backup drive, or network drive), and propagate them from there to one or more destination devices. In selective/interactive mode, this system may also be used as a more peer-level synchronization tool.

In the target use case (currently 73G of space, 45K files, and 2.6K directories), total runtime fell from the 2 to 3 hours required for a full copy and compare to just 1 minute for a typical mergeall run with moderate changes on devices tested. Running twice to leverage an intermediary device normally takes 5 minutes or less.

The main script is command-line and console based, and runs in report-only, automatic-update, and selective/interactive modes. The launcher scripts simplify common usage modes by inputting settings in a shell console or a [tT]kinter GUI, and spawning the main script automatically. The GUI launcher scrolls the main script's output in its main window, and saves the output to a logfile on request.

All scripts in this system run on both Python 3.X and 2.X (and mergeall.py works around a 2.X library issue regarding modtime digits). To date, this system has been tested and used on Windows, Linux, and Mac OS X, on Python 3.5, 3.4, 3.3, and 2.7; other Pythons are likely supported, but await formal testing.

This is an extension to similar tools in the book Programming Python, 4th Edition (PP4E), from which the cpall.py and diffall.py here were borrowed and reused. See code docstrings for open issues (TBDs) and shortcomings (CAVEATs), and the first two items in the next major section's table for additional context.

 

(Defunct) See-Also Links

Please note: the following has grown out of date (and will be dropped in a future release); please pardon the cruft!

See also the following major items in this system's zipfile:

User Guide.html The first-level user's guide, with common usage mode options, GUI documentation, general pointers, and more (read this first)
Whitepaper.html The older and original usage guide, with additional background on features and roles, comparisons to cloud-based storage, and more
Lessons-Learned.html Early implementation notes, including implementation issues, device notes, and process architecture alternatives
mergeall.py's docstring Full details on the merge/sync process itself (or run help("mergeall") at the interactive Python prompt)
new examples folder mergeall.py script example usage logs, as well as other examples updated for version 3.0
[Defunct] old examples folder mergeall.py script example usage logs, as well as launcher example logs and GUI screenshots, and Python demos of known issues
test directory Test-case subdirectories to experiment with (don't risk changing your own until you're familiar with this system)
launch-mergeall-GUI.pyw A script that inputs settings in a [tT]kinter GUI and runs any number of implied mergeall -report or -auto command lines
launch-mergeall-Console.py A script that inputs settings in a console interactively and runs one implied mergeall command line, in -report, -auto, or selective mode
readme-Windows-shortcuts.txt Hints on making clickable desktop icons to launch this (and other) scripts on Windows (this scheme is largely subsumed by the two launch-* scripts, coded later in the project)
manual-commands-cheat.txt Example command lines used to manually invoke mergeall (this system) and diffall (byte-for-byte verification compares, when desired)
launcher-configs directory With "mergeall-desktop-icon.ico", a Windows icon usable for shortcuts to the launcher GUI dragged out onto your desktop (right-click to Properties)
backup.py The 2.0 auto-backups for changes extension, with implementation and usage details
diffall.py, cpall.py Utility modules and scripts reused and extended for this project, from the book PP4E



© M. Lutz