This document, f/k/a Readme.html and aimed primarily at developers,
describes changes made in each released version of mergeall,
and provides additional context along the way. This document
also includes overviews from its original role as a first-level README; as
these are now dated and mostly redundant with other resources, they have been moved
to the end as optional reading.
For up-to-date usage fundamentals, see instead the
User Guide.
For additional project and usage background details, see also the original (and also
somewhat dated) Whitepaper.
Please note:
Apart from its coverage of the latest releases,
this document is no longer actively maintained; its style was largely frozen
years ago, and much of its material is now project history only.
All of the older screenshots and logs referenced in this document have also
been removed to minimize
mergeall package size, and their links were deleted during docs reorganization.
See the newer screenshots collection for up-to-date
GUI images, and please excuse the shortage of off-page context here.
This section—the majority of this document—lists both changes and usage notes,
grouped by release version.
To find a version's specific changes in the source code, search for its
release number in the source code files (e.g., search for "3.0" in the ".py" and ".pyw"
Python source files to view version 3.0 code changes).
Summary: add Unicode normalization for filename matching, rebuild packages
Version 3.3 was rereleased on October 16, 2022 in all its download
packages—source code,
macOS app, Windows exe, and Linux executable—with improved path-normalization
logic for Unicode variants. The new coding handles more path
syntax, but is used only on platforms that do not
auto-normalize paths (e.g., Windows, Linux, and Android app-private).
For details, see function matchUnicodePathnames() in
fixunicodedups.py,
as well as the new path-normalization
demo.
This release picks up all prior 3.3 changes, including
source-only mods of September.
Version 3.3 was rereleased on March 14, 2022, with
minor changes to the
GUI
and summary
report,
as well as rebuilds of all app and executable
packages
to make them current with the 3.3 source-code package
(all now include all 3.3 and 3.2 changes).
Because the GUI is optional and no core Mergeall utility was changed,
this repackaging is considered a 3.3 point release.
Note that while all download packages now run the latest 3.3, the source-code
package is still recommended if the others won't work in your use case.
Version 3.3, published initially on December 28, 2021,
adds just one main feature, which was nevertheless crucial enough
to warrant a new release. Namely, 3.3 now performs Unicode normalization
on filenames before comparing them, to avoid potential skew. This is a subtle
issue that cropped up for Mergeall's developer just once in 8 years of use,
but may be more common and perilous for content with many non-ASCII filenames
maintained across multiple platforms.
In short, the Unicode standard allows the same text to be represented
with different code-point strings in its decoded form. The string
'Liñux.png', for example, can have two equivalent but unequal
values in memory: the two mean the same thing semantically, but will not
match per Python's == comparisons, in membership tests, or similar.
This guarantees interoperability problems, and impacts text-processing code broadly.
You can read the theory behind this
here.
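As a minimal illustration of the variants described above, the following sketch uses Python's standard unicodedata module. The variable names are hypothetical and are not taken from Mergeall's code.

```python
import unicodedata

composed = 'Li\u00f1ux.png'      # NFC form: ñ as one code point (U+00F1)
decomposed = 'Lin\u0303ux.png'   # NFD form: n followed by combining tilde (U+0303)

# The two strings render identically but differ in memory
print(composed == decomposed)            # False
print(len(composed), len(decomposed))    # 9 10

# Normalizing both to a common form makes them compare equal
same = (unicodedata.normalize('NFC', composed) ==
        unicodedata.normalize('NFC', decomposed))
print(same)                              # True
```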
In Mergeall specifically, when such variant strings appear as filenames,
they must be normalized (i.e., converted to a common form) for comparisons.
Else, matches may be missed, resulting in content skew. While some such
skew may be automatically repaired by syncs, missing a matching folder name
can trigger pointless folder copies. In worse cases, this can yield
duplicate and out-of-sync data, especially when applying delta sets.
Mergeall 3.3 avoids all such perils, by
normalizing
filenames for comparison, using unnormalized names for file access, and doing so globally.
Mergeall 3.3 also normalizes full paths saved in 3.2's deltas
sets to match variants in destination trees,
on platforms where this matters.
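The idea of comparing normalized names while keeping the original names for file access might be sketched as follows. This is an illustration only, not the code in fixunicodedups.py; the function names are hypothetical, and the sketch ignores the case where two names in one folder normalize to the same key.

```python
import unicodedata

def normalized_index(names, form='NFC'):
    """Map each normalized name to the original name used for file access."""
    return {unicodedata.normalize(form, name): name for name in names}

def match_names(names_from, names_to):
    """Return (common, unique_from, unique_to); all results hold original names."""
    index_from = normalized_index(names_from)
    index_to = normalized_index(names_to)
    common = [(index_from[key], index_to[key])
              for key in index_from if key in index_to]
    unique_from = [index_from[key] for key in index_from if key not in index_to]
    unique_to = [index_to[key] for key in index_to if key not in index_from]
    return common, unique_from, unique_to

names_from = ['Li\u00f1ux.png', 'notes.txt']     # composed ñ
names_to = ['Lin\u0303ux.png', 'old.txt']        # decomposed ñ
common, unique_from, unique_to = match_names(names_from, names_to)
```

Here the two spellings of the image's name land in common as a pair of original names, so each side can still be opened by the name its own filesystem reports.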
For more details and examples, please see the docstring at the top of the
new source file fixunicodedups.py.
Most new code appears in that file, but numerous smaller changes were required
elsewhere; as usual, search for "[3.3]" in the code for
full fidelity. This change affects the behavior of the scripts
mergeall.py,
diffall.py, and
deltas.py specifically, but some
modules were also modified.
In addition, the -quiet switch now suppresses normalization messages
in all three scripts.
For formal tests of this change, see also the 3.3 tests/demos
folder.
Additional 3.3 Notes
Python version:
partly due to this version's Unicode normalization, Mergeall now
strongly recommends Python 3.X for content with non-ASCII filenames.
Python 2.X still works in general, but may have problems in some border cases
(see normalizeUnicode() in the code).
This applies to the source-code package only; app and executable packages
use Python 3.X automatically, but source code requires
a separate Python install.
Run logs:
version 3.3 shortens console output when
mergeall.py is used in -restore mode, by omitting
messages for unique TO items skipped. This applies to both rollbacks
and deltas-set applies. In both use cases, these skips are fully irrelevant
(they reflect unchanged items), and inflated logs needlessly.
In related changes, 3.3 deltas-set creation now emits a log message for
each item noted in __added__.txt (not just a total), and reports items removed
via this file later as "listed" not "added" (to apply to both rollbacks
and deltas).
Docs and galleries:
the oldest development history in the docstring at the top of
mergeall.py has now been moved to a
separate file:
mergeall.py-devdocs.txt;
while this info is useful for studying the system, the docstring grew
long enough over the last eight years to qualify as an impediment to
scrolling to the code.
As usual, this version also picks up the latest thumbspage changes
(e.g., swipes) for its screenshot galleries.
Android 11 bug:
version 3.3 also uncovered a bug
in Android 11 which may impact some Mergeall users. In brief, Android 11
shared storage sometimes fails to write files whose names use the composed
(e.g., NFC) Unicode format. This bug may be temporary, and its scope is
unclear—it's been seen only on one Samsung device, and seems to require
a triggering context. Because it's also disjoint from
3.3's normalization, a work-around script has been posted in the Android
Deltas Sync package, which is impacted more: see the
script
for more details.
Version status:
note that 3.3, unlike 3.2, is not provisional. All changes in
both 3.3 and 3.2 are now officially adopted, and present in all packages
as of the Feb-2022 builds. Source-code remains a fallback option
if app or executable packages are not usable, and may host later changes
first in the future. Users of the Android Deltas Sync
package
should especially upgrade to 3.3, as the only known case of Unicode-variant
error cropped up in that system's context.
Minor GUI mods:
per the Mar-2022 update above, 3.3 also
made minor changes to Mergeall's GUI, to accommodate the broader scope of the
-quiet command-line flag of the
mergeall.py script spawned by
the GUI. See the new gallery
for full details; in short, the flag now applies
to all run modes, not just backups. These changes invalidate former screenshots
trivially, but better reflect program logic.
Minor summary-report mods:
also per the Mar-2022 update,
version 3.3 made three minor changes to the summary report which
appears at the end of
mergeall.py's console output:
the any-errors indicator line now appears again, after being inadvertently
omitted in prior releases; the Differences uniqueto
counter now shows as "n/a" for -restore deltas and rollback runs, because
it is pointless and confusing in this context; and the final label is now "Saved"
instead of "Changed" for deltas.py compare+save runs,
because TO is not changed in this context. These changes invalidate prior examples,
but the skew is minor, and the effect is clearer.
Later minor updates:
Mergeall's 3.3 source-code package (only) was reuploaded on September 28, 2022,
with an added .nomedia file
to prevent Android galleries from assimilating this package's screenshots;
a new utility
for comparing files that
diffall.py flags as differing;
a mod to the GUI's Help
button that opens the online version of the user guide instead of the
local copy (which links to items absent in frozen packages);
a handful of minor doc edits;
and three similarly minor changes to the nested ziptools
package.
No core functionality was changed in Mergeall, and only trivial usage-message
formatting was altered in ziptools. All these mods were later incorporated into
the Oct-2022 all-packages rerelease noted above.
Version 3.2, released on October 28, 2021,
adds a new major script, deltas.py,
which implements an alternative run mode. This script detects changes in
FROM as usual, but then saves them to a separate folder, instead of applying
them to TO immediately. The saved-changes folder can serve multiple roles.
For one, it can be archived as an incremental set after burning a full copy.
For another, because it's formatted the same as Mergeall backups, it can be
applied to TO later and on demand with the -restore mode of
mergeall.py.
See the new script's top-of-file docstring for full details;
browse its demo folder for examples;
and explore the Android Deltas Sync use case
here,
here, and
here.
The last of these employs components of Mergeall as a small software stack.
Supporting the new deltas script also required a handful of changes to
Mergeall's base code; search it for "[3.2]"
to find related mods.
Additional 3.2 Notes
While the new deltas mode is this version's primary change, 3.2 also:
Adds a
utility script that changes nonportable
filename characters for Windows and drives
Publishes its automated build
script and
log, for full open-source transparency
Comes with the latest version of
ziptools,
still embedded in the test folder as a legacy
Makes a few old docs more reader and mobile friendly, including this and
this
Includes snapshot
copies of Android
helper scripts and patched GUI code, also
here and
here
Tallies symlinks separately and displays tallies consistently, in all tools'
summary reports
Displays the Mergeall version number at start of run in all major
scripts for clarity
Fixes an obscure glitch for trailing folder slashes + unnested backups;
see backup.noteaddition
Fixes a more obscure problem for symlink modtimes in Python 2.X;
see cpall.copyinfo
Fixes an even more obscure issue for symlinks burned badly to BDR by macOS;
see the
demo
This release is limited and provisional: its changes are currently
available only in Mergeall's source-code
package. If and when they are
propagated to all other packages (apps and executables), this will qualify
as a final version 3.2. Please check back here for future developments
(and see the 3.3 update).
Potential mods for a final 3.2 include a dark mode for its GUI.
A final 3.2 may also integrate changes required to run
Mergeall's GUI on Android (shipped
here and described
here),
but these patches require using a specific commercial app which comes with
both tkinter
glitches and freemium
advertising,
and may be best managed out of band.
This minor-enhancements version was released in all packages—source, app, and executables—and
is a recommended upgrade for all prior-version users.
There were no changes to user configurations or the Mergeall GUI in this release, and version 3.0's
screenshots still reflect the state of the system in 3.1.
Version 3.1 includes the following enhancements (tagged with "[3.1]" in the code):
Mergeall and cpall—all forms: propagate folder modtimes to copies
Both the mergeall.py and cpall.py
scripts now propagate source-folder modification times to new destination-folder
copies on platforms that support this, just as they formerly did for simple files
and symlinks. This change was implemented for all package formats (source, app, and executable)
of these two programs, and is naturally inherited by Mergeall's launcher GUI.
It has been seen to work well on Mac OS, Windows, and Linux, though Mac OS required
a coding workaround to handle exFAT drives properly, and not all combinations of
platforms, filesystems, and drivers may support folder modtime updates.
Folder (a.k.a. directory) modtimes are less significant than others and were formerly ignored,
because they are not used by mergeall's incremental-updates logic and do not influence its results,
and may be changed whenever any contained item is changed. The latter of these factors can render
folder modtimes nearly useless. On Mac OS, for example, a folder's modtime changes whenever
Finder adds a hidden ".DS_Store" file to it; hence,
simply viewing a folder is enough to lose its original modtime! Still, folder updates
history may be useful enough in some contexts to preserve where possible, especially on systems
that do not differentiate folders and files in listings.
Implementation details:
because mergeall uses cpall's copytree() to copy folders in all contexts,
folder modtime propagation required just a post-processing step in
copytree(), run at the end of each call to this function (including any
recursive-level calls). This satisfies the requirement that modtimes be copied
after a tree is fully processed, else new folder copy times may be updated
automatically when their nested content is copied. This also avoids having to
queue modtimes to be copied later, as done in the related
ziptools program.
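The post-processing step described above can be sketched roughly as follows. This is a simplified stand-in and not cpall's actual copytree(); the function name is hypothetical, and symlinks and special files are skipped for brevity.

```python
import os
import shutil

def copytree_sketch(dir_from, dir_to):
    """Copy a folder tree, then propagate the folder's own modtime last."""
    os.makedirs(dir_to, exist_ok=True)
    for name in os.listdir(dir_from):
        path_from = os.path.join(dir_from, name)
        path_to = os.path.join(dir_to, name)
        if os.path.islink(path_from):
            continue                                  # symlinks skipped in this sketch
        elif os.path.isdir(path_from):
            copytree_sketch(path_from, path_to)       # recursion sets nested folder times
        elif os.path.isfile(path_from):
            shutil.copy2(path_from, path_to)          # copies content and file modtime

    # Post-processing step: adding items above updated dir_to's modtime,
    # so copy the source folder's times only after all content is in place
    try:
        stat_from = os.stat(dir_from)
        os.utime(dir_to, (stat_from.st_atime, stat_from.st_mtime))
    except OSError:
        pass    # not all platforms, filesystems, and drivers support this
```

Because the time update runs at the end of every call, including recursive ones, no queue of pending modtimes is needed.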
Mergeall—Linux executable: flush output to avoid GUI pauses
The mergeall.py script now forcibly flushes its output lines
in the Linux executable, so they appear immediately in the GUI. This was formerly implemented
for the Windows executable (along with Unicode translations not needed or used on Linux).
It is not required and was not implemented for the Mac app or the source-code version on Linux
and elsewhere, as Python's "-u" flag is forced by the GUI in both contexts. For source code,
Python's "-u" flag or PYTHONUNBUFFERED setting can be used to disable buffering selectively.
Implementation details:
output flushing changes print() to a custom version, instead of always using print()'s flush=True
(which is available only in Python 3.3+) or PyInstaller "spec" files (which complicate builds and offer
less control). A stream-proxy class would work too, but the custom print() was already coded for
Windows. The GUI also had already been setting PYTHONUNBUFFERED before the spawn, with no effect
(which seems a PyInstaller issue).
See subprocproxy.py in PyEdit
for related context and notes.
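A custom print() of the sort described above might look like the following. This is a Python 3 sketch under assumed names, not the code used in mergeall.py, which must also run on Python 2.X.

```python
import builtins
import sys

def flushing_print(*args, **kwargs):
    """Call the built-in print(), then flush the target stream."""
    stream = kwargs.get('file')
    if stream is None:
        stream = sys.stdout
    builtins.print(*args, **kwargs)
    stream.flush()

# A script could rebind print at module level so existing calls are unchanged
print = flushing_print
print('comparing folders...')
```

Rebinding the name in one place avoids editing every print() call, and does not depend on the flush=True argument.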
Diffall and cpall—Mac apps: support unbuffered output with "-u"
The diffall.py script grew a "-u" command-line argument
to make its output unbuffered. This is useful for watching diffall's output with a
Unix "tail" in the Mac app or Linux executable, where Python's own "-u" flag cannot be
used, and its PYTHONUNBUFFERED environment equivalent may go unnoticed. It's irrelevant
on Windows, and is not required when using source-code (use Python's "-u" instead).
For symmetry, a "-u" switch was added to cpall.py for use
when tailing its Mac app too. For more usage-level details, see
this post.
Implementation details:
Unbuffered stdout may now be standard for frozen Mac apps per a recent
py2app change,
but the two Mergeall scripts here use a stream-proxy
class
for platform- and version-neutral control.
See subprocproxy.py in PyEdit
for related context and notes.
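A stream-proxy class in this spirit can be quite small. The following is a hedged sketch, not the class shipped with Mergeall: the class and function names are hypothetical, and the way the "-u" switch is detected here is an assumption made for illustration.

```python
import sys

class FlushingStream:
    """Wrap a text stream so that every write is flushed immediately."""
    def __init__(self, stream):
        self._stream = stream

    def write(self, text):
        result = self._stream.write(text)
        self._stream.flush()
        return result

    def __getattr__(self, name):
        # Delegate everything else (flush, encoding, etc.) to the real stream
        return getattr(self._stream, name)

def enable_unbuffered(argv):
    """If '-u' is present in argv, remove it and wrap sys.stdout."""
    if '-u' in argv:
        argv.remove('-u')
        sys.stdout = FlushingStream(sys.stdout)
        return True
    return False
```

Since the proxy replaces sys.stdout itself, it works the same way on any platform and Python version, whether or not the interpreter's own buffering options are reachable.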
Note that a "-u" flag was not added to the
mergeall.py script, because its stdout is flushed
forcibly in all contexts when spawned by the GUI launcher (mergeall.py's primary role),
and can be flushed optionally using Python's own "-u" when the script is run as source code.
This may be less than orthogonal, but no use case has arisen to justify a redesign.
Mac app package—retain original resource-file modtimes
The Mac app's build
script
was redesigned to copy all extra resource items
manually, in order to preserve their original modification times. These
times are especially crucial in Mergeall's test folder
where modtimes influence test results, and
py2app's copy policy for
its automatic "--resources" option did not propagate modtimes correctly.
Resource modtimes were already correct in the source-code package (built by
ziptools),
as well as for files in the Windows and Linux executable packages (built by
PyInstaller,
but using manual resource copies that were tweaked to use shutil's
copy2()
for top-level file times).
The fix-fat-dst-modtimes.py utility
script works on Python 2.X again (it formerly used a keyword argument in os.utime()
unsupported in 2.X). Even so, this script's status has declined over time, because
most users are better off addressing DST rollovers by formatting external drives as
exFAT. All other scripts were reverified to
run under 2.X.
There has also been some attempt to more consistently use "Mergeall" to denote the system
at large, and "mergeall" to refer to just the mergeall.py script. Given
the volume of documentation this project has spawned over its
4-year run, though,
this convention might not be adopted universally for quite some time...
Per the preceding summary list, this was a major release, initially
started as a port to Mac OS X, and expanded with new features over many months
of development. The following list describes 3.0's most prominent changes,
but is not complete. For a more exhaustive look at this version's changes,
search mergeall's source files for string "[3.0]".
{all} Mac OS X port
mergeall, its GUI and console launchers, and its accompanying programs
including diffall and cpall, have all been ported to run on
Mac OS X, in
addition to their prior support of
Windows and
Linux.
The underlying mergeall.py script worked on the Mac largely
unchanged, as it was coded to be portable, and formerly ported to Unix-like Linux.
However, it required changes to avoid using Python 3.5+'s os.scandir() on Macs only,
and eventually replaced the os.scandir() variant altogether with a recoding that uses
saved os.lstat() results.
Before it was dropped, the os.scandir() call ran quicker than os.listdir() on Windows
and Linux, but 3 times slower on Mac OS X as used by mergeall
(see 2.2's notes).
In addition, the Mac OS X port necessitated numerous changes to the
GUI launcher:
Colored Buttons must be replaced with Labels
The Python webbrowser module requires complete file URLs
Special __main__ code may be needed to force initial active-window state
The Desktop default folder for saved logs must be set by platform-specific code
Widgets are better disabled/enabled than erased/redrawn (see ahead)
Dynamic scrolling in the Text widget is impractically slow (see ahead)
File and folder open dialogs require a "message" argument, as "title" doesn't appear
Common dialogs can use a "parent" argument to open as slide-down sheets instead of popups
The Mac ReopenApp event is caught to deiconify and force focus to the main window
Paths in backups' __added__.txt files are now translated portably during rollbacks
Beyond GUI and scandir() impacts, the Mac port motivated broader functional changes,
most notably the new cruft-file skipping modes and removal scripts (see
ahead) and symlinks support which grew to include both Unix
and Windows (also ahead).
On the upside, mergeall and its GUI are now fully usable
on the Mac, and merges both run quickly and seem to sidestep some Unicode-filename issues
described in this document that now appear confined to Windows.
For more details on Mac OS port changes, search for "darwin"—the Mac's
platform name in Python—in the system's source files.
As most users are likely to launch mergeall with its
GUI, some work was
devoted to improving its ergonomics and utility. For example, the GUI launcher
now disables and enables widgets as they fall in and out of relevance, instead
of erasing and redrawing them. This was initially motivated by the Mac OS X
port—where a redraw can trigger a visible flash—but proved
subjectively less chaotic on other platforms too.
The mergeall GUI was also polished in other ways, including bold section headers;
more descriptive selection labels and dialogs text; new toggles to suppress comparison
messages and logfile popups; and less-dense dialog
text layout that yields better readability on Mac and Linux. See the
screenshots page and folder for examples of
the GUI and its dialogs in action.
Also motivated by the Mac OS X port, but useful elsewhere: the
GUI grew a new
toggle to suppress per-folder comparison-phase messages in the GUI only.
These messages serve as status indicators if enabled and still appear in
the saved mergeall log file even if suppressed. However, suppressing
them in the GUI avoids some clutter, and, more critically, avoids delays
for results if the GUI scrolls messages more slowly than the underlying
mergeall process generates them. When not suppressed, the GUI's scrolling
may continue to run after the mergeall process has already finished,
artificially inflating the merge's apparent runtime.
Although text scrolling may add a trivial handful of seconds on Windows and Linux,
it adds an especially long delay on Mac OS X. On Macs, the currently recommended
install's Tk 8.5 Text widget scrolls text messages some 30 times slower
than mergeall prints them. In one test, mergeall may finish in 2 seconds
on a Mac, but the GUI's scrolling of its output can run for one minute
before the final results are displayed. Because of this, the new
suppression toggle is enabled by default on Macs, but disabled on
Windows and Linux where the GUI largely keeps up with mergeall.
This is user-switchable because other platforms may benefit from disabling
messages on slower machines, and the Mac speed issue may be addressed in future
Tks (if it's not already fixed in Tk 8.6—to be tested). The new
toggle is also dynamic: it can be enabled and disabled any time during a
mergeall run to turn comparison messages on and off.
{launchers, mergeall} New configurations: text area, editor popups, cruft, and more
The GUI now supports a much wider variety of user-configurable options,
defined and customizable in the top-level
mergeall_configs.py. Among these,
users can now tailor the colors, font, and initial sizes of the scrollable
message text area; can specify a default for the log-file saves folder that
overrides the per-platform Desktop path; and can enable or disable the
automatic text editor popup after mergeall runs for viewing a
saved logfile (this later became an initial value for the popup's
toggle added to the GUI). Cruft filename patterns are also defined in this
file to support user customization, though they are mostly of interest to
advanced users (see ahead).
The maximum-backups-retentions setting is still present as before.
{launchers} Linux app icon
mergeall's GUI launcher now sets its windows' app-bar icons on Linux
platforms to a custom image. Windows sets window icons as before,
but Mac OS X does not currently set custom icons as these seem outside
the scope of source-code based programs on that platform (update: the
Mac app distribution added later fully supports all Mac icon contexts,
and seems required for icons on this platform).
Given mergeall's new portability to Windows, Linux, and Mac OS X, support
has been added for explicit handling of platform-specific metadata files
(a.k.a. "cruft"). This is especially important on Mac OS X, which adds
numerous hidden files to content, that have no purpose outside a Mac,
and may be undesirable in cross-platform archive copies. To this end,
mergeall 3.0 provides two new tools:
The new script
nuke-cruft-files.py allows
cruft files and folders to be removed manually and on-demand, and
can be used for folders and drives not explicitly managed by mergeall.
The new "-skipcruft" option—available in
mergeall,
diffall,
cpall, and mergeall's GUI and console
launchers—automatically skips
files and folders matching cruft name patterns in both FROM (source) and
TO (destination) folders. In mergeall and diffall comparisons, this
prevents cruft from being reported as differences. In mergeall's updates
mode, this new option allows platform-specific cruft to remain on its
creating platform, but prevents it from being propagated to other
copies and computers where it is irrelevant. When used consistently in
mergeall, merged folders wind up the same except for their unique
cruft items, and the prior bullet's script is unnecessary in most use cases.
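Pattern-based skipping of this kind can be approximated with Python's standard fnmatch module. The patterns and function names below are hypothetical examples for illustration; they are not Mergeall's actual defaults, which are defined in mergeall_configs.py.

```python
import fnmatch

# Hypothetical patterns for illustration; not Mergeall's actual defaults
CRUFT_PATTERNS = ['.DS_Store', '._*', 'Thumbs.db', '.Trash*']

def is_cruft(name, patterns=CRUFT_PATTERNS):
    """True if a file or folder name matches any cruft pattern."""
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)

def filter_cruft(names, patterns=CRUFT_PATTERNS):
    """Drop cruft names from a folder listing before it is compared."""
    return [name for name in names if not is_cruft(name, patterns)]

listing = ['.DS_Store', 'a.txt', '._a.txt', 'Thumbs.db', 'photos']
kept = filter_cruft(listing)
print(kept)    # ['a.txt', 'photos']
```

Applying the same filter to both the FROM and TO listings keeps cruft from showing up as a difference on either side.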
For more on the new cruft-skipping tools, see the new User Guide's
coverage as well as
its cross-platform pointers;
the cruft filename patterns and examples in
mergeall_configs.py; and the
background notes in nuke-cruft-files.py.
Note to Mac users: mergeall itself copies just data forks (normal file
content), not resource forks, and does not merge resource forks back to
data forks if they are present; see dot_clean to address the latter,
and the User Guide for more
background.
New in this release, mergeall supports propagating symbolic links on
both Windows and Unix (Mac OS X and Linux), subject to platform
and portability constraints enumerated in the
User Guide. When present, symlinks are
always copied, not followed, to avoid duplicating data. For a tool
that also supports link following, see the
ziptools system.
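Copying a link rather than following it amounts to recreating the link with the same target text. The sketch below shows the basic Unix-style case under an assumed function name; it is not Mergeall's code, and it omits the Windows-specific handling (such as directory links and permission requirements) that a portable version needs.

```python
import os

def copy_link_sketch(path_from, path_to):
    """Recreate a symlink at path_to with the same target; do not follow it."""
    target = os.readlink(path_from)        # the link's stored path, not its referent
    if os.path.lexists(path_to):
        os.remove(path_to)                 # replace any existing file or link
    os.symlink(target, path_to)
```

Only the link's path text is copied, so the referenced data is never duplicated, and the copy may dangle if its target does not exist at the destination.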
{mergeall} Support long pathnames on Windows
The new module fixlongpaths.py provides
tools that support very-long pathnames on Windows.
It does so by mutating too-long pathnames to use a "\\?\" prefix ("\\?\UNC\"
for network paths), which automatically enables extended-path Windows API tools
(these tools are no-ops on Unix).
mergeall, diffall, and cpall all use these tools for every pathname passed
to system calls, as well as those passed to recursive tree walkers. The
net effect lifts the normal 260-character pathname limit to 32k characters on this
platform.
Long pathnames typically crop up in saved webpage folders; they
formerly generated error messages and failed to update in mergeall, but can
now be processed normally. See the new module
for more details, and search for FWP (uppercase) in mergeall's source files for
the new module's clients; ziptools
uses these tools as well.
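The prefixing step can be sketched as below. The function names are hypothetical and this is not the code in fixlongpaths.py; in particular, the 260-character threshold test and the use of os.path.abspath() are assumptions made for the example.

```python
import os
import sys

def add_extended_prefix(abs_path):
    """Pure string step: expects an absolute Windows path."""
    if abs_path.startswith('\\\\?\\'):
        return abs_path                           # already prefixed
    if abs_path.startswith('\\\\'):
        return '\\\\?\\UNC\\' + abs_path[2:]      # \\server\share -> \\?\UNC\server\share
    return '\\\\?\\' + abs_path                   # C:\dir -> \\?\C:\dir

def fix_long_path_sketch(path, limit=260):
    """Prefix only too-long paths, and only on Windows."""
    if not sys.platform.startswith('win'):
        return path                               # no-op on Unix
    path = os.path.abspath(path)                  # the prefix requires an absolute path
    if len(path) < limit:
        return path
    return add_extended_prefix(path)
```

A caller would pass each pathname through such a function just before handing it to a system call like open() or os.listdir().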
{mergeall, diffall} Code and algorithm optimizations
Some work was done in this release to optimize the code in the mergeall and
diffall programs. Specifically, repeated scans of listing results were
eliminated, and os.path.join() calls were replaced with possibly simpler
direct os.sep concatenations (the former change also improved diffall reports,
by reporting missed files before subdirectories).
In the end, most optimization attempts were fruitless, as the time spent in
either system calls or file I/O far overshadowed the speed of mergeall
programs' code. One exception: on Windows, the time required to compare two
very large archive copies fell from 19 to 14 seconds on Pythons 3.4 and older
(which use os.listdir()). However, there was no impact on mergeall's
7.2-second runtime on Pythons 3.5+ (which use an os.scandir() variant that fully
accounts for its faster speed), or diffall (which spends nearly all of its time
reading files byte-for-byte).
Also in this category: the comparison phase in mergeall was recoded to use
saved os.lstat() results, which made it as fast as its former os.scandir()
variant on Windows; the os.scandir() branch was subsequently dropped.
For more details, see comparetrees() in mergeall.py,
and the main docstring in diffall.py.
For timing results, see this folder.
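The saved-results idea can be illustrated with a short sketch: each item is passed to os.lstat() once, and the result is reused for every later type test. This is not the code in comparetrees(); the function name and the returned structure are assumptions for the example.

```python
import os
import stat

def classify_items(folder):
    """Return {name: kind}, calling os.lstat() just once per item."""
    kinds = {}
    for name in os.listdir(folder):
        info = os.lstat(folder + os.sep + name)    # one system call, reused below
        mode = info.st_mode
        if stat.S_ISLNK(mode):
            kinds[name] = 'link'
        elif stat.S_ISDIR(mode):
            kinds[name] = 'dir'
        elif stat.S_ISREG(mode):
            kinds[name] = 'file'
        else:
            kinds[name] = 'other'
    return kinds
```

By contrast, separate calls to os.path.islink(), os.path.isdir(), and os.path.isfile() would each make their own system call for the same item.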
{launchers} Sanitize non-BMP Unicode characters in scrolled mergeall text
Tk 8.6 and earlier, used by the tkinter Python module underlying mergeall's GUI,
cannot display Unicode characters whose codepoints fall outside the BMP (UCS2)
range of U+0000..U+FFFF. This includes newer "emoji" characters; when such non-BMP
characters are used in filenames, they formerly killed the GUI with an uncaught
exception when the GUI attempted to insert them in the scrolled text area.
To work around this, mergeall now replaces all non-BMP characters in displayed text with
the standard Unicode replacement character, U+FFFD, which Tk displays as a highlighted
question mark diamond. This workaround was coded to assume that Tk 8.7—to be
supported in a future but unknown Python release—will lift the BMP restriction,
per a developer forums post. For details, see fixTkBMP() in the GUI
launcher.
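The replacement itself can be done with a single regular expression. The following is a minimal Python 3 sketch with a hypothetical function name; it is not the fixTkBMP() code in the GUI launcher.

```python
import re

# Matches any code point above the BMP range (U+10000 and up)
NON_BMP = re.compile('[\U00010000-\U0010FFFF]')

def replace_non_bmp(text):
    """Swap each non-BMP character for the Unicode replacement character."""
    return NON_BMP.sub('\uFFFD', text)

print(replace_non_bmp('pic\U0001F600.png') == 'pic\uFFFD.png')    # True
```

Only the displayed text is altered this way; the filename used for file access keeps its original characters.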
{cpall, mergeall} Ignore spurious Mac exceptions from shutil.copystat()
The cpall.copyfile() function used by mergeall now suppresses and ignores EINVAL (a.k.a.
error number 22, "Invalid argument") if it is raised by Python's shutil.copystat().
On Mac OS X, shutil.copystat() can fail this way due to an error raised by
Mac libraries when trying to copy extended attributes with chflags() from a file on
a Mac filesystem drive (e.g., HFS+) to a file on a non-Mac filesystem drive
(e.g., FAT32 or exFAT).
This error occurs after all content and times have been copied, so it's safe to
ignore in this context. It also occurs at the shell on "cp -p", so it's likely
a Mac issue. This cropped up in mergeall for all files saved with Mac's TextEdit,
which adds an extended attribute for Unicode encoding type, but can also occur in
other contexts such as files marked as quarantined. For more details,
see the main docstring in cpall.py, and the shell
session log mac-chflags-error22.txt.
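Selective suppression of this one error might be coded along these lines. This is a simplified sketch with a hypothetical function name, not cpall's actual copyfile(), which does more work than shown here.

```python
import errno
import shutil

def copy_file_sketch(path_from, path_to):
    """Copy content, then metadata, ignoring only EINVAL from copystat()."""
    shutil.copyfile(path_from, path_to)          # file content
    try:
        shutil.copystat(path_from, path_to)      # times, permission bits, flags
    except OSError as why:
        if why.errno != errno.EINVAL:
            raise                                # propagate all other errors
        # EINVAL (22): per the notes above, content and times were already
        # copied when this is raised, so it is ignored in this context
```

Testing the error number, rather than catching every OSError, keeps real failures such as permission errors visible.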
{docs, packaging} New user guide, new folder structure
A completely new user guide was developed:
UserGuide.html, shipped
in the package's top-level folder. This new user guide is designed
to be more user-focused, and provides a less technically heavy overview
of the system and its GUI. It largely subsumes the former documentation,
which was more implementation- and project-focused, and arguably less
approachable for end users. Nevertheless,
the original documents are still shipped in folder
docs/MoreDocs for now:
The prior Usage-Guide.html was retained for its extra background
details on roles and features, and renamed
Whitepaper.html.
The former top-level Readme.html was also kept for its version
history, and rebranded Revisions.html (this file).
In addition, the original top-level launcher-config folder was demoted
to a docetc subfolder due to its declining
relevance;
the somewhat dated Lessons-Learned.html
was kept for its implementation notes; and a new Tools
folder ships with line-end conversion and color-chooser utility scripts.
{screenshots} New screenshots and examples (older items dropped)
New screenshots were taken for this release
on all three of its supported platforms, and new example
session logs were
compiled, including logs from all three platforms formatted as HTML for
readability.
In light of the new screenshots and logs, to reduce the size of the program's
distribution package all prior screenshots were dropped from the package,
and their links in docs were scrubbed.
{packaging} New "frozen" distributions: Mac app, Windows and Linux executables
In addition to its original source-code distribution, mergeall is now available in
Mac app, Windows executable, and Linux executable forms. The new forms run on just
one platform, but do not require a Python install. For more details on these
new packages, see the README file, and the mergeall
downloads page.
{assorted} And so on
Version 3.0 incorporates additional enhancements, including:
A new indicator of preceding error messages in mergeall's report summary
Better error reporting for terminal exceptions during mergeall's comparisons phase
Verbose-level arguments and exception skip-or-fail options in cpall
Better argument error checking and message labels in diffall
A fix for the bogus extra line at the end of scrolled text in mergeall's GUI
A Python-coded tool for unzipping precoded test folders
This is a minor enhancement release, adding two user-visible functionality upgrades.
For all code changes applied in this release, search for "[2.4]" in its recently modified
source files.
{mergeall, launchers} Add quiet log messages mode
Both the main script and the GUI and console launchers now allow users to suppress
per-file backup messages in the generated output. These are informational and may
be of interest to new users, but are arguably superfluous once the system's operation
is clear, because files being replaced or removed are already displayed.
In large merges, the extra lines decrease report readability. To support the new quiet mode:
The main script has a new "-quiet" command-line
option, which is relevant only when "-backup" is also used. Backups themselves
apply only when updating, but the new quiet switch is effective whether automatic
("-auto") or interactive (not "-report") updates mode is selected.
The GUI launcher has a new toggle button, displayed only when backups are enabled,
to force "-quiet" to be passed to the main script. The new toggle appears at the bottom of the
controls section when
the backups toggle is selected, which in turn appears only when automatic updates
run-mode is chosen.
The console launcher similarly prompts the user for this mode only when backups
are chosen, whether updates are automatic or interactive.
When quiet mode is selected—by command-line, GUI, or console—the system
still generates one message indicating that backups mode is enabled and giving the
backups folder path, but it does not print a backups message for every file replaced
or removed. Users may still inspect the backups folder to see results.
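As a rough illustration of the quiet-mode behavior just described, the sketch below always emits the single "backups enabled" line and adds per-file lines only when quiet mode is off. The function name and message wording are hypothetical, not mergeall's actual code or output.

```python
def backup_report_lines(backup_dir, changed_items, quiet=False):
    """Return the backup-related lines one run would print (hypothetical wording)."""
    # One line is always produced, so the backups folder path is never hidden
    lines = ['backups enabled, saving to: %s' % backup_dir]
    if not quiet:
        # Per-file lines are the only part that quiet mode suppresses
        lines.extend('backed up: %s' % item for item in changed_items)
    return lines
```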
{screenshots} Add thumbnails pages for screenshots folders
To make the screenshots collection easier to browse, thumbnail image index
pages were added to the screenshots root folder, as well as its subfolders. See the
new root index page, and
click on its subfolder links. The subfolders display their own thumbs
pages automatically on a server;
click their "index.html" files manually if viewing offline in a file explorer.
These pages are courtesy of the Python-coded
thumbspage program.
This is a minor patch release, to address two issues of minimal impact. No screenshots
were retaken for this release, and documentation changes pertain to this release's
changes only. For all code changes applied in this release, search for "[2.3]" in
source file backup.py.
{mergeall} Use explicit UTF8 (by default) for __added__.txt encoding
In both Python 3.X and 2.X, use an explicit UTF8 Unicode encoding, instead of the
platform default encoding, for writing and reading the __added__.txt files created
in backups mode for use in 2.1 emergency restores. These
files reside in per-run __bkp__ subfolders, and are used for backing out prior archive
additions. The new preset UTF8 encoding should suffice for most use cases, but can
be changed in code if required; see backup.py's ADDENC setting.
This is a minor change unlikely to impact most users (if any at all), as both unencodable
filenames and emergency restores are very rare. Without it, a new file whose name
could not be encoded per the local Unicode default would be added to the TO archive
normally, but also generate an error message in the mergeall log, and not be removed
from the archive automatically by a future emergency restore.
This change is also expected to be largely backward-compatible: because ASCII is a
subset of UTF8, this should not have any major impact for most users' __added__.txt
files written before this change was applied.
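A minimal sketch of the explicit-encoding idea, assuming one added path per line in the file (the real file layout is defined in backup.py and may differ). Only the ADDENC name comes from the text above; the helper names are hypothetical. io.open() is used because it accepts an encoding argument in both 2.X and 3.X.

```python
import io

ADDENC = 'utf8'   # setting named in the text; UTF8 is the stated preset

def write_added(path, names, encoding=ADDENC):
    """Save added item names, one per line, with an explicit encoding."""
    with io.open(path, 'w', encoding=encoding) as f:
        for name in names:
            f.write(name + u'\n')

def read_added(path, encoding=ADDENC):
    """Load the names back using the same explicit encoding."""
    with io.open(path, 'r', encoding=encoding) as f:
        return [line.rstrip(u'\n') for line in f]
```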
{mergeall} Use code portable to Python 2.X for os.makedirs() calls
Python's os.makedirs(), used in backup-mode runs, supports an exist_ok switch in 3.X only
that suppresses an exception if the path already exists. To support backup-mode use on 2.X,
specialize all makedirs calls on 2.X to emulate the 3.X exist_ok behavior without passing
the 3.X-only argument. This patch applies to 2.X users only, but is crucial for such
users. Without it, nearly all backup-mode mergeall runs will fail on 2.X with exceptions.
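One common way to get the effect of Python 3's exist_ok=True without passing that argument is sketched below. This is an assumption about the general technique, not a copy of the patch in backup.py, and the function name is hypothetical.

```python
import errno
import os

def makedirs_exist_ok(path):
    """Create path and any missing parents; do nothing if the folder already exists."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Swallow only "already exists" for a real directory; re-raise anything else
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

The same function runs unchanged on 3.X, so callers need no version test.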
Note that use on Python 2.X is now generally discouraged, as 3.X has better support for
Unicode; 3.5+ allows for much faster execution since mergeall
version 2.2; and mergeall's development
"staff" has limited resources for 2.X testing. As a random compatibility example, filenames
with odd characters may still be skipped by mergeall in 2.X only, because that Python's
os module fails to classify them as either file or directory on Windows (unlike 3.X).
In retrospect, supporting both Python lines in a system-level tool like mergeall has proven
to be a substantial effort, and probably prohibitive in this project's context. Library differences
can impact code more than language differences, and are often more complex to accommodate.
While mergeall largely works the same on 2.X, and 2.X usage is not deprecated, please run mergeall
on 3.X if at all possible.
More Windows FAT32 filename character mangling: emdash versus ASCII dash
This note describes a very rare mergeall usage issue, not a mergeall bug or change.
An erroneous translation of dashes in filenames was recently observed on a FAT32 device,
which seems related to the accent-morphing issue described earlier (ahead) for mergeall
versions 1.7.1 and 1.7.
To date this has been seen only on one USB flashdrive and Windows 7,
but potentially applies to any FAT32 drive.
Specifically, the content-based diffall script reported a spurious file
difference not noted by the timestamp-based mergeall. This happened
on a FAT32 device containing two files of differing content, whose names differed only in one character
position which was an ASCII dash ("-") in one and a Unicode emdash ("—") in the other.
For example, with paths and some output omitted for space:
c:\test> dir /B "d:\xxxxxx*"
xxxxxx - xxxxxx.htm
xxxxxx — xxxxxx.htm
c:\test> dir "d:\xxxxxx*"
04/03/2016 09:46 AM 50,444 xxxxxx - xxxxxx.htm
04/15/2016 11:30 AM 50,573 xxxxxx — xxxxxx.htm
When both such files are present on a FAT32 drive, the Windows operating system may return the wrong
file's content for a given filename, because it internally maps the emdash to an ASCII dash.
This in turn causes diffall to register a false file difference.
Because this occurs in the filesystem level of the operating system, it may not
be addressable in Python code—filename dashes passed correctly by a Python script
are mishandled after they are received by an open() call. In fact, this issue extends
beyond Python: the two files in question also incorrectly report a difference in a
Windows/DOS "fc" command line despite having identical content.
For instance, in the following command-line session, the same issue crops up when comparing
same-named files on an SSD (NTFS filesystem) and USB flashdrive (FAT32 filesystem) having names
with an embedded emdash. Curiously, comparisons fail only after similarly named files
with an ASCII-dash have been accessed once; prior to that, the emdash files compare the same
correctly, suggesting that caching may be a factor:
# After either a fresh insert or removal+reinsert of a FAT32 USB flashdrive on d:
c:\test> fc "c:\xxxxxx — xxxxxx.htm" "d:\xxxxxx — xxxxxx.htm"
Comparing files C:\xxxxxx — xxxxxx.htm and D:\XXXXXX — XXXXXX.HTM
FC: no differences encountered
c:\test> fc "c:\xxxxxx - xxxxxx.htm" "d:\xxxxxx - xxxxxx.htm"
Comparing files C:\xxxxxx - xxxxxx.htm and D:\xxxxxx - xxxxxx.HTM
FC: no differences encountered
c:\test> fc "c:\xxxxxx — xxxxxx.htm" "d:\xxxxxx — xxxxxx.htm"
Comparing files C:\xxxxxx — xxxxxx.htm and D:\XXXXXX — XXXXXX.HTM
***** C:\xxxxxx — xxxxxx.htm
<meta name="bitly-verification" content="3xx1017cyy1d"/>
<title>xxxxxx ΓÇö xxxxxx
***** D:\XXXXXX — XXXXXX.HTM
<meta name="bitly-verification" content="3xx1017cyy1d"/>
<title>xxxxxx - xxxxxx # <= ASCII-dash content
*****
# ....Plus many more diffs....
This issue wasn't addressed in mergeall, because it may be impossible to fix at the Python level,
and seems rare in the extreme—it has been witnessed only once in two years
of frequent mergeall usage; may be limited to a subset of devices used on Windows; and can occur only
for folders containing files with names identical apart from alternative dash characters in the same
positions.
Should this recur anyhow, the suggested workaround is to either ignore the diffall differences,
or simply adjust your filenames. Formatting USB drives with NTFS may help, but this
may also impact drive performance, and whether it helps is yet to be determined.
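If you want to find filenames worth adjusting, a small check like the following can list names in a folder that differ only in their dash characters. It is a sketch, not part of mergeall; the function name is hypothetical, and the set of dash-like characters is an assumption that may not match what Windows folds on FAT32.

```python
import collections

# Dash-like characters treated as equivalent to ASCII '-' (an assumed set)
DASHES = u'\u2010\u2011\u2012\u2013\u2014\u2015\u2212'

def dash_collisions(names):
    """Group names that become identical once every dash variant is '-'."""
    groups = collections.defaultdict(list)
    for name in names:
        key = u''.join(u'-' if ch in DASHES else ch for ch in name)
        groups[key].append(name)
    # Keep only groups holding more than one distinct original name
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

For a quick scan, pass it the result of os.listdir() for the folder in question.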
For more hints on the convoluted—and even tortuous—underlying operating-system issue, see this forum
thread, or this Microsoft
page.
I'd report this as a bug to Microsoft, but a Windows fix for this seems as likely as ski-lift tickets in Hades
(no, really).
Summary: faster execution with os.scandir() using Python 3.5+ or PyPI package install
This version was repackaged three times after its initial release:
On Jan-27-16 with minor code and doc changes:
Correct the script name in diffall.py's usage message;
add total runtime in diffall.py's report;
and add documentation notes about common role,
cross-platform restores, and
diffall purpose.
On Nov-10-15 with doc changes only:
New font, header, and toolbar styling;
minor content tweaks;
and updated URLs for book site relocation.
Update for version 3.0:
The scandir() optimization described below ran comparisons 5X-10X faster on Windows and
2X faster on Linux, but proved to run 3X slower on Mac OS X, as used by mergeall.
Consequently, mergeall 3.0 used this call on Windows and Linux, but not on Mac OS X.
A later recoding to use saved os.lstat() results eventually made the non-scandir() variant
as fast on Windows and Linux, and made the scandir() optimization obsolete.
For more details, see comments in the comparison-phase code of mergeall.py.
Version 2.2 speeds up tree comparisons radically by using the new os.scandir() call, which is standard in
Python 3.5 and later, and available separately as a PyPI package
for other Pythons, including 2.7. In tests on Windows, the mergeall tree comparison phase runs 5 to 10 times quicker
when the 2.2 optimization is used, depending on devices and trees. For larger trees, this can shave dozens of seconds
off total runtime, and more on slower machines. If the scandir() call is not present in the os module or a separate
install, mergeall falls back on the original os.listdir() scheme to support older Pythons (though scandir() is now
recommended for performance).
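The fallback order described above might be coded along these lines. This is a simplified sketch, not the comparison code in mergeall.py; list_entries is a hypothetical helper that only shows how one folder's listing can use scandir() when available and os.listdir() otherwise.

```python
import os

try:
    from os import scandir             # standard in Python 3.5 and later
except ImportError:
    try:
        from scandir import scandir    # separately installed PyPI package
    except ImportError:
        scandir = None                 # neither present: use os.listdir()

def list_entries(dirpath):
    """Return sorted (name, is_dir) pairs for one folder."""
    if scandir is not None:
        # Entry objects can often answer is_dir() without an extra system call
        return sorted((entry.name, entry.is_dir()) for entry in scandir(dirpath))
    return sorted((name, os.path.isdir(os.path.join(dirpath, name)))
                  for name in os.listdir(dirpath))
```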
mergeall's resolution phase was not optimized, because it is bound by file write times, and visits only
differences. Because the optimized tree comparison phase always scans two trees exhaustively, however, it can
dominate mergeall runtimes, especially when there are relatively few changes in large trees. This change impacts
only the mergeall.py script, whose output was augmented with an initial line indicating
use of the new optimization, plus lines giving runtime for each of its phases.
Other:
As no changes were made to the GUI apart from a new version number, most prior screenshots were not retaken
for this release.
One new screenshot was taken on Windows 10
as described in the list below, and a new folder-browse dialog screenshot was taken for its new and improved native
format on Windows as of Python 3.5.
The new folder dialog reflects a change in Python 3.5 (really, in the latest version of the Tk 8.6 library it includes),
not in mergeall code; see
this overview and the
Tk changes note for more details.
Documentation was also revamped for this release as usual (and restyled for the Nov-10 repackaging).
Summary: automatic restores (a.k.a. rollbacks) from automatic backups
April Updates
After its March release, this version was repackaged—most recently on Apr-29-15—with
only very minor changes to its documentation files and retaken
screenshots for its Ultrabook,
Windows tablet,
and Linux use case.
As these changes did not impact any functionality, a new version number was not warranted.
March Release
Version 2.1 was an afterthought to 2.0. By using and extending 2.0's automatic
change backups, 2.1 supports complete and automatic rollback of an immediately preceding
run's changes, including additions, as a failsafe for catastrophic or emergency scenarios.
{mergeall + docs} Automatic restores from automatic backups
Added support for complete rollback of a prior run's changes, by extending the
2.0 "-backup" option and adding a new "-restore" option in
mergeall.py to allow changes to be undone by merging from
a __bkp__ folder's date/time subfolder to its archive's root. These changes are
invoked in consecutive mergeall runs:
Synchronize run: The existing "-backup" option saves replaced and removed items in the TO folder's __bkp__ as before,
but was extended to also list items added to the TO tree in a new __added__.txt file at the top of a
__bkp__ date/time subfolder.
Restore run: The new "-restore" option runs a normal merge from backup to root (in automatic or selective updates mode), but:
Does not delete unique items in the TO tree. In restores, the TO tree is the archive root and FROM is the backup; items
present in the archive but not the backup were unchanged in the prior synchronization run.
As a pre-merge step, removes items from the TO tree that are listed in a __bkp__ subfolder's __added__.txt
(if this file is present). This is pre-merge because order matters for renames on Windows. The __added__.txt
file itself is copied to TO by the merge as well, but manually removed.
Hence, when mergeall is run from a command line with "-restore" to merge from a prior run's backup subfolder
to its archive root, the net effect is a complete rollback of all changes made in a prior run: replacements
and removals are restored, and additions are removed.
Restores require "-backup" to be used in the prior run, and are primarily intended to be used to restore all of an
immediately preceding run's changes in catastrophic scenarios (e.g., transposing FROM and TO folders). They will not
fully reset the TO tree if any changes were made to it since the backup was created (and in this event may erase more
recent changes), and older backups will be out of synch with the current tree unless applied serially.
For general restore operation, see this
backups folder.
For implementation details, see
mergeall.py's changes marked with "[2.1]" and backup.py.
For complete usage details, see Whitepaper.html.
Automatic restore is available in command-line mode only;
because no changes were made to the GUI, no GUI screenshots were retaken for this release. Logfile content
is also unchanged in this release apart from a minor section reordering (per item #3 ahead).
Usage update (defunct): because added items are recorded using the path syntax of the platform on which the
prior mergeall ran, restores with additions should generally be run on the same platform as the prior merge.
On platforms with incompatible path syntax, additions won't terminate a restore operation, but they will trigger
error messages and won't be backed out.
Usage update update: as of mergeall 3.0, the prior note's constraint has been lifted, by converting
__added__.txt path separators from '/' to '\' on Windows, and from '\' to '/' on Unix. This makes these
paths portable, such that backups saved on Windows can now be rolled back on Unix, and vice versa.
For details, see the "CAVEAT" and "UPDATE" in function removeprioradds() of source file
backup.py.
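The separator conversion can be pictured with a one-line helper like the one below. It is a sketch under the assumption that saved paths are relative and contain no other platform-specific syntax; the function name is hypothetical, and the actual logic lives in removeprioradds().

```python
import os

def to_local_path(saved_path, sep=os.sep):
    """Rewrite both '/' and '\\' separators to sep (by default, this platform's)."""
    return saved_path.replace('\\', sep).replace('/', sep)
```

Note that a backslash is a legal filename character on Unix, so a blanket replacement like this would misread such names; that case is assumed away here.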
{utilities} New rollback.py convenience script for restores
As part of the restore enhancement, also added a convenience script, rollback.py. Given
just an archive's root path (on the command line or interactively), this script automatically builds and runs an
automatic-updates restore-mode mergeall command line, by globbing and sorting to find the archive's latest backups folder.
This script also routes prints and prompts to stderr, so that mergeall stdout output (only) can be captured to a file via
a ">" shell redirect, and can be run by command line or filename/icon
clicks. See its example session.
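The "globbing and sorting" step might look roughly like this. The sketch assumes that the date/time subfolder names sort chronologically as plain text; latest_backup is a hypothetical name, not rollback.py's own code.

```python
import glob
import os

def latest_backup(archive_root):
    """Return the newest per-run subfolder in the archive's __bkp__, or None."""
    pattern = os.path.join(archive_root, '__bkp__', '*')
    subdirs = sorted(path for path in glob.glob(pattern) if os.path.isdir(path))
    # Assumed: text order of the folder names matches their date/time order
    return subdirs[-1] if subdirs else None
```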
{mergeall} Reorder categories in differences report for consistency
Minor and cosmetic, but in mergeall's differences report, order the categories to match the order in
which their updates are applied (and later reported), as well as the order of totals printed in the summary report.
This makes the report more consistent, but also reflects the fact that update order can matter on
some platforms (on case-insensitive Windows, deletes must always precede adds for mixed-case renames;
see mergeall.py's mergetrees() docstring for details). This complicates logfile
comparisons to prior versions, but is a user-visible item.
Fetch the limit on number of backup folders per archive copy from the new mergeall_configs.py
module, which can be more easily changed by users than a hard-coded literal in the program's code. After this limit is
reached, backups are pruned by age. Frequent mergers may want a higher number than the default (10), and users with typically
large backup folders may want a lower setting. Errors in this module simply make mergeall fall back on the default (it has
just one setting today).
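A fallback of this kind can be sketched as follows. The setting name MAXBACKUPS is a placeholder (the text does not give the real name), and the check for a positive integer is an added assumption.

```python
import importlib

DEFAULT_MAX_BACKUPS = 10   # the default stated in the text

def load_max_backups(module_name='mergeall_configs', setting='MAXBACKUPS'):
    """Fetch the backups limit from a config module, or fall back on the default."""
    try:
        module = importlib.import_module(module_name)
        limit = int(getattr(module, setting))
        if limit < 1:
            raise ValueError('limit must be positive')
        return limit
    except Exception:
        # Missing module, or missing or malformed setting: use the default
        return DEFAULT_MAX_BACKUPS
```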
{docs} Rewrote Whitepaper material to clarify intended usage
In the main usage overview doc (now Whitepaper), updated the usage modes
section substantially to better describe ways to
use the system; some of this was formerly tentative by design, but practice has solidified its concepts.
Also added a new comparison to Windows explorer
folder merges (which really just combine, not synchronize).
Summary: automatic backup of changed items, more intelligent GUI, help, counts, DST, etc.
This version's development spanned two and a half weeks. It was initially focused
on the new auto-backup for changes option, but spawned additional enhancements,
and warranted a new major version number.
{mergeall + launchers} Automatic backup of changed items
When enabled in the launchers or mergeall command lines, this option makes
backup copies of all files and directories in the TO directory that will
be destructively replaced or deleted in-place during a mergeall run. These
items' prior versions in the TO tree are saved in the automatically created
__bkp__ folder at the top of the TO archive, with their full directory paths,
and segregated by run in a date/time-stamped subfolder. Backup folders are
not synchronized across trees, but are automatically pruned by age when their
number exceeds a limit.
This option makes mergeall generally safer, as unwanted or failed changes can
be later undone by restoring backup copies from any of the latest mergeall run
backups in the __bkp__ of any archive copy. This change's new "-backup" mergeall
command line argument was also integrated into both the GUI and console launchers.
Automatic backups defaults to on (enabled) in the GUI launcher, because it should
normally be used for data safety unless space becomes a concern.
Backup folders can be changed by users arbitrarily; their per-run subfolders
may appear as diffall.py differences that generally can
be ignored. When used, __bkp__ folders can also serve as a record of runs with
changes against a tree, and an alternative to the logfile for inspecting changes,
though only replacements and deletions are recorded; new additions are never backed
up, as they would be just redundant copies (though version 2.1
later extended the backups option to also list additions in a backup folder's __added__.txt).
Also note that, despite its name, this new backups option simply saves prior
versions of files and folders on changes, and is just a nested operation within
a general archive backup performed by a mergeall run. For more complete details,
see the docstrings in the new file backup.py, which hosts
the backup system's implementation, as well as the summary in the version 2.0 update
of Whitepaper.html.
{Launcher GUI} More intelligent and dynamic GUI
The GUI launcher was changed so as to show only configuration items
relevant to run modes selected: the logfile folder chooser appears
only if logging is toggled on, and the new backups toggle frame appears
only if automatic updates are selected (-backup applies only to -auto
mode in the GUI, as it has no interactive/selective update mode).
Both hidden components retain their state while hidden in the GUI.
Also made the mode selections text more descriptive: changed from
"Report only" and "Automatic updates" to "Report differences only" and
"Automatically resolve differences in TO" (this is a GUI, after all).
{Launcher GUI} Help button and popup
Added a "Help" button that spawns the main mergeall user guide
document in a web browser (in the spirit of the
frigcal calendar
GUI).
Just a convenience, but useful nonetheless.
{mergeall} Summary report: number files/directories compared/changed, diffs found
Added counters for both the comparison and resolution phases, displayed
in the log at each phase's end. For comparison: files and folders checked.
For resolution: (replaced, deleted, created) for both files and folders.
Later added counts of the number of differences found in each of the 4 categories,
from differences data.
{Launcher GUI} Workaround for last line covered on repack GO button
mergeall now issues a final 'finished\n\n' message, which prevents the
last output line being covered when the GUI's GO button is unhidden after
resizes (a minor annoyance, that required a scroll). The extra blank line
is now covered, which is easier and less distracting than auto-scrolling.
On Windows, retry shutil.rmtree's os.rmdir directory removal calls that
fail, via a temporary wait-loop callback on errors. Apparently, Windows
deletes may sometimes not be finalized immediately—they are left still
pending after the delete call returns (perhaps due to other activities,
such as indexing or anti-virus software). This is lethal to rmtree, as
directories cannot be removed until after all their contents are removed.
This seems rare; indeed, it's been observed on just one machine after a
year of usage, and may warrant further research. However, its symptoms
were witnessed on failures during the new backup folder pruning, and are
also prone to occur during mergeall's normal deletion of unique TO folders.
To trigger the delete error recovery logic, open a file in a folder to be
deleted.
Note that this recovery logic applies only to os.rmdir calls in shutil.rmtree directory
removals, not to deletions of simple files in the TO folder with os.remove. File
deletes could be retried too, but there seems little point; such failures are very rare,
they're likely to be caused by unrecoverable permission errors, and they just leave an
extra file in TO. Temporary in-use lock failures will be cleaned up by the next mergeall run.
Scan your logfiles' resolution phase messages (or the scrolled text in the GUI) to see if
any updates may have failed.
See backup.py for additional details, links to related
threads on the web, and the workaround's error callback. Python's shutil.rmtree
may address this shortcoming in the future, though failing changes may be a broader
Windows issue (os.rename, not used here, also seems suspect). All such failures
are mostly harmless here, as they simply cancel a single update and continue,
leaving a difference for the next mergeall run to resolve.
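The wait-loop at the heart of this workaround can be sketched on its own, apart from how it is hooked into shutil.rmtree (see backup.py for that). The function name, retry count, and delay below are assumptions chosen for illustration.

```python
import os
import time

def rmdir_with_retries(path, tries=5, delay=0.1):
    """Call os.rmdir, pausing and retrying when it fails; return the calls used."""
    for attempt in range(tries):
        try:
            os.rmdir(path)
            return attempt + 1
        except OSError:
            if attempt == tries - 1:
                raise              # give up: the caller can log and skip this update
            time.sleep(delay)      # allow pending deletes of the contents to finish
```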
{mergeall, Launchers} More error checking for command-line arguments and files
Expanded error checking for command-line arguments passed to mergeall, in both command-line
and launcher modes. Bad from/to file paths formerly showed full Python exception
text in all three usage modes, but no longer do:
In the mergeall script, catch non-existent from/to paths in the
command-line, and report with a simple error message, instead of exception text. Also start
interactive help('mergeall') as before on this and other usage errors, but only if stdin and
stdout are an interactive console—not when connected to subprocess pipes used by the
launchers in most modes. pydoc itself didn't prompt for input when the calling process was
connected to pipes, but mergeall formerly did (though not for bad paths), and prompts in spawned
programs can be problematic (see ahead).
In the GUI launcher,
check for bad from/to paths before starting mergeall, so errors can be reported in
new GUI popups,
instead of mergeall's text output. mergeall's own checks would catch this and display text in the
GUI's text area (without help()), but that's not as nice in a GUI, and showing usage help for a command-line doesn't
make sense for users of a GUI that automates it. The logfile's path was already being handled by pretests
this way.
In the console launcher, also
test for valid from/to paths before starting mergeall, and display a simple message instead of
exception text. mergeall's own error messages would work in both the interactive and non-interactive
modes of this launcher (interactive mode shares its streams with mergeall; non-interactive is non-tty,
so help() would be precluded and not prompt for input), but mergeall's command-line oriented display
doesn't make sense here either.
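The two checks described in this list, valid from/to paths and an interactive console, can be sketched as below. The function names and message wording are hypothetical, not the launchers' or mergeall's actual code.

```python
import os
import sys

def check_paths(frompath, topath):
    """Return simple error messages; an empty list means both paths are usable."""
    errors = []
    for label, path in (('FROM', frompath), ('TO', topath)):
        if not os.path.isdir(path):
            errors.append('%s path is not an existing folder: %s' % (label, path))
    return errors

def console_is_interactive():
    """True only if stdin and stdout are both attached to a terminal."""
    streams = (sys.stdin, sys.stdout)
    return all(s is not None and s.isatty() for s in streams)
```

A script would show help and prompt only when console_is_interactive() is true, since piped streams from a launcher cannot answer prompts.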
{Launcher GUI} Use Desktop for logfiles by default on Windows
Set the initial value of the logfile path to the user's Desktop folder, on Windows machines
where this works and exists (on all others, use the former "select..." message). This is just an
initial suggested default for convenience, and can be changed freely in the GUI. It's intended to
discourage use of a flashdrive for both an archive source and logfile target (which slows progress),
but could prove too user-friendly to retain.
Not part of mergeall itself per se, but in the accompanying diffall.py
script borrowed from the book PP4E,
added a "-recent [days]" command-line option which limits file comparisons to files
modified within the last N days in either tree (N defaults to 90 if not given;
use 365 for a full year). This is a
heuristic, designed to allow quick verifications for recent mergeall changes
only. It assumes that recent changes in a large archive are typically
limited to a small subset of its files.
By default, diffall does a full byte-for-byte compare of every file in two
trees, and should be run occasionally to verify integrity of entire archive
copies. While complete, this script can take a long time for large archives
(1 hour or more for the 72G use case, with a USB stick and micro SD card).
The "-recent" option allows for quicker verifications of just items changed
recently, and hence subject to recent mergeall updates. This option is for
command-line use only; mergeall's "-verify" still does exhaustive compares.
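The recency test behind "-recent" amounts to a modtime cutoff, roughly as sketched here. The helper names are hypothetical; only the 90-day default and the "in either tree" rule come from the text.

```python
import os
import time

def modified_recently(path, days=90, now=None):
    """True if path's modtime falls within the last `days` days."""
    now = time.time() if now is None else now
    return os.path.getmtime(path) >= now - days * 24 * 60 * 60

def either_recent(frompath, topath, days=90, now=None):
    """Compare a file pair only if either copy was modified recently."""
    return (modified_recently(frompath, days, now) or
            modified_recently(topath, days, now))
```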
Like mergeall, diffall also grew new simple counter stats, reported at run end;
its output ends with an extra line of this form:
"Dirs checked 52, Files checked: 8, Files skipped: 1528".
{diffall, cpall} Call file.close explicitly for use outside CPython
Changed the related diffall.py and
cpall.py scripts/modules borrowed from
PP4E
to call file.close explicitly for use outside CPython (e.g., PyPy), rather
than relying on the auto-close-on-collection behavior of file objects in CPython.
diffall.py is run by mergeall for "-verify", and manually for archive integrity
checks; cpall.py is imported and used by mergeall for its core file and tree
copying.
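In code, the change is the difference between leaving a file object for the garbage collector and closing it explicitly, as in this small sketch (read_bytes is a hypothetical example, not a function from diffall.py or cpall.py):

```python
def read_bytes(path):
    """Read a whole file, closing it explicitly rather than on collection."""
    f = open(path, 'rb')
    try:
        return f.read()
    finally:
        f.close()    # runs even if read() raises, on any Python implementation
```

A "with open(path, 'rb') as f:" block has the same effect and is the more common spelling today.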
{mergeall, cpall} Dropped the cpall.copyfile shutil.copystat hack
Got rid of a blatantly evil case of monkey-patching in mergeall.py, by changing
cpall.copyfile in-place to call copystat as a default option. The original
code went to great lengths to avoid changing cpall, but was far too dark to
document further here; see mergeall.py (if you must).
Added a new script, fix-fat-dst-modtimes.py, as one option
for addressing the 1-hour modtime skew of FAT drives on Windows that occurs at
Daylight Savings Time rollovers. Simply run this from a command line after each
DST rollover; it adds or subtracts an hour from the modtime of each file in a FAT
archive copy, to keep them in synch with an NTFS copy, per mergeall's timestamp+size
comparisons. For more on this issue, see the version 1.4 release note below;
it's also mentioned in Lessons Learned and
Usage Overview. See the
script itself for usage pointers.
3.0 update: you can generally avoid this script by formatting external drives with
exFAT.
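The core of such a script is a tree walk that shifts each file's modtime by a whole number of hours. The sketch below illustrates the idea only; it is not the shipped fix-fat-dst-modtimes.py, and shift_modtimes is a hypothetical name.

```python
import os

HOUR = 60 * 60   # seconds

def shift_modtimes(root, hours):
    """Add `hours` (may be negative) to the modtime of every file under root."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            # Keep the access time as is; change only the modification time
            os.utime(path, (info.st_atime, info.st_mtime + hours * HOUR))
            count += 1
    return count
```

Try anything like this on a scratch copy first, since it rewrites timestamps in place.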
{Docs, examples} Relative links, README to HTML, miscellaneous changes
Assorted non-functional changes:
Adjusted links in documents to use a new examples structure that's the same in the
zipfile;
for the web, use links relative to a simple unpacked copy of the zipfile
instead of copying individual items to a website folder. Due to ISP rules, this also required
an .htaccess file in the top folder (only) to display indexes on the web,
and forced this top-level file to be renamed from README.html to Readme.html
(see .htaccess).
Converted this README from plain text to HTML for readability; rewrote much of the
docs folder's existing HTML documentation; generated new screen shots and
logfiles in the (now defunct) examples folder.
Added a note in launch-mergeall-Console.py
with new findings on the streams issue for interactive input prompts from programs spawned by
subprocess. The prompts work if the streams are unbuffered and read by byte instead of line,
and the parent process's stdout is flushed after each byte is printed. This may be still
problematic, though, for multi-byte Unicode characters, endline sequence normalization, and
large outputs.
Added another new script,
__sloc.py__, a simple source lines-count script used for metrics purposes only.
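For the streams finding in the list above, the output side can be sketched as follows: the child's stdout is read unbuffered, one byte at a time, and the parent flushes after each byte so that a prompt without a trailing newline still appears. This is a simplified illustration with a hypothetical function name, not the launcher's code, and it shares the multi-byte and line-end caveats already noted.

```python
import subprocess
import sys

def echo_child_output(cmd, out=None):
    """Run cmd, relaying its stdout byte by byte; return all bytes relayed."""
    if out is None:
        out = getattr(sys.stdout, 'buffer', sys.stdout)   # bytes stream on 3.X
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, bufsize=0)
    relayed = []
    while True:
        byte = proc.stdout.read(1)     # unbuffered: returns as soon as a byte exists
        if not byte:
            break                      # empty result means end of the child's output
        out.write(byte)
        out.flush()                    # show partial lines, such as prompts, at once
        relayed.append(byte)
    proc.stdout.close()
    proc.wait()
    return b''.join(relayed)
```

The child should also run unbuffered (for a Python child, with the "-u" switch), or its prompt may sit in its own output buffer.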
More on Windows FAT daylight savings time rollover issue: 2 copies
It has been pointed out that this issue, documented in version 1.4 notes
below, can also be addressed by keeping two
FAT device archive copies: one to be used when DST is active, and one
when it is not. This way, DST rollover won't require a full archive rewrite
on the currently used copy, and you'll also automatically keep a longer-term
backup copy. Keeping two such copies on the same device is equivalent
to keeping the copies on separate devices, provided your archives are small
enough, and your device is FAT enough (yes, pun intended).
Mergeall's updates can fail for files whose permissions preclude changes.
This includes files marked as:
Read-only (copyable, but not changeable in an archive copy)
Hidden/system (e.g., desktop.ini, thumbnails.db, some media files)
In-use by another process (even the Windows indexer can trigger this)
These failures don't stop a merge; they report as errors in the logfile
and are simply skipped, leaving the difference for the next run. To avoid
these failures, though, make sure that the files are not read-only or
hidden, by right clicking to their Properties, and unclicking these modes
(you may need to enable viewing of hidden files in order to see them in
file explorer).
Mergeall itself does not change permissions, as your files are your property;
read-only mode, for instance, may be set deliberately to avoid overwrites.
In-use errors (and skips) can't be avoided by mergeall in general; be sure
that you don't have a file open in the TO archive when mergeall is run,
or rerun mergeall to pick up changes for files previously in use.
{mergeall} Minor error message text format patch in mergeall.py
The "message" argument in mergeall's file error() message text was not
being displayed. Also prefixed error text produced by cpall.copyfile()
with "**", so the format of errors reported during its recursive tree
copies matches that of mergeall's own top-level file error messages
(they're now both "**Error...").
Update on 1.7's Windows Unicode filenames issue: accents
Update: though details have now been lost to time, it's not impossible
that this issue reflects, or at least is related to, the Unicode normalization
issue addressed in 2021's version 3.3 above.
Update: for a possibly related example of this issue observed later,
see also release 2.3's usage note above.
This note augments a 1.7 usage note below.
On further exploration, this appears to be yet another FAT32
filesystem issue, and dependent on order of directory copies.
The issue occurs only when both:
Copying to FAT32 filesystems, of the sort used by default
on USB flash drives.
Copying the non-accented name first, followed by the
accented name that is otherwise equivalent.
When both conditions are met, both Windows file explorer and
the mergeall Python script issue an error for trying to create
a folder that already exists.
For instance, Windows' file explorer issues the following error message text
in a popup and offers to merge folders, even though the only
folder in the destination is the unaccented "Rodriguez":
"This destination already contains a folder named 'Rodríguez'"
Python—and hence mergeall—issues a Windows 183 exception;
mergeall skips the single folder copy and continues, per the
messages in its run log:
copied new FROM dir, C:/.../test-Rodriguez\Rodriguez
**Error copying FROM dir: skipped C:/.../test-Rodriguez\Rodríguez
[WinError 183] Cannot create a file when that file already exists: 'D:/rodriguez\\Rodríguez'
copied new FROM file, C:/.../test-Rodriguez\findings.txt
Hence, this is the same FAT32-related error, and seems independent
of Python. Conversely, the issue does not occur when either:
Copying to NTFS filesystem devices (e.g., to the C: drive) via
drag-and-drop, cut-and-paste, or otherwise.
Copying the accented name first, followed by the non-accented name
(or when a multi-folder copy is lucky enough to be ordered this way).
In either case, both folders are created, and no error occurs.
If you do manage to copy both folders to a FAT32 device, though, trying
to delete both later either issues an error or leaves one unremoved.
This behavior seems a bug, given that FAT32 on USB drives supports
non-ASCII file and folder names in most other contexts. It may, however,
reflect a fundamental limitation in the older FAT filesystem used by
default for most USB and SD flashcard devices.
There may be a procedural workaround for this issue that requires an
additional and manual step (e.g., code page settings?), but an automatic
resolution may be beyond the scope of a Python script if the issue is
inherent in either the FAT32 implementation, or Python's own choice of
filesystem API calls. In any event, it seems rare enough to warrant a
pass here. The workaround for now is to do one of the following:
Rename without accents
Manually merge the two folders' content once
Manually copy the folders once in the desired order
Watch for "**Error" in your run logs to see if/when this occurs.
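As a rough aid for spotting such names before a run, the following Python 3 sketch groups sibling names that become identical once accents and case are ignored. It is not part of mergeall, and both function names are hypothetical. It also assumes that the collision is equivalent to dropping accents and case, which the behavior above suggests but does not prove.

```python
import unicodedata
from collections import defaultdict

def strip_accents(name):
    """Drop combining marks after NFD decomposition, and ignore case.
    Assumption: this approximates how the two names come to collide."""
    decomposed = unicodedata.normalize('NFD', name)
    base = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.lower()

def accent_collisions(names):
    """Return groups of names that are identical once accents are dropped."""
    groups = defaultdict(list)
    for name in names:
        groups[strip_accents(name)].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]

# Example: check one folder's listing (names here are illustrative only)
listing = ['Rodriguez', 'Rodr\u00edguez', 'findings.txt']
print(accent_collisions(listing))
```

Running this over os.listdir() results for each folder in the FROM tree would flag candidates for a rename or manual merge before copying to a FAT32 device.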
The following links provide background on this issue, but search on "fat32 unicode filenames"
for other pointers:
This page
on MSDN describes (tersely) the underlying issue
{Launcher GUI} Minor fix for Python 2.X only: showerror import
Add an import of Tkinter's showerror when using Python 2.X; else this dialog never
appears if a bad logfile name is used. The import was present for 3.X, but
not 2.X, and was required only by a rare context never tested under 2.X.
{Launcher GUI} Minor fix: catch log open() exceptions
Catch PermissionError (etc.) on logfile open and report error in popup;
else fails silently on Windows, as ".pyw" has no console for exception text.
This can occur if you select "C:\Program Files" for the log dir on Windows.
Formerly, only the existence of the logfile's folder was verified.
{mergeall} Add disposition note lines to differences report
Add message lines for each difference category, reminding the user how they will
be resolved automatically if -auto is used, or if updates are selected in the GUI: "These
items will be replaced", "These items will be permanently deleted", and so on.
{Docs} Assorted minor doc updates, and USB 3.0 speed correction
Assorted minor updates to the HTML files in the docs subfolder, plus one minor
correction added in Lessons-Learned.html:
its USB 3.0-versus-wifi speed figures were off by a factor of 8 due to bytes/bits rating differences
(USB is actually 8X faster than previously stated). Also added new version 1.7
screenshots in examples/, taken on Windows 7.
Update: see further details on this issue in 1.7.1's usage note above.
A bizarre and very rare use case can trigger run-log error messages that
require manually copying a directory after mergeall finishes. The observed
behavior: on Windows 7, if there are two different directories whose names
differ only by an accent (e.g., "Rodriguez" and "Rodríguez", per 1.7.1's note),
then the two are treated as having the same name, and you cannot copy both
to the same folder. This is true for Windows drag-and-drop copies (which
issue an error), so it appears that Windows itself effectively drops the
accent, making the two the same for core file operations.
Mergeall reports an error for trying to create a folder that already exists,
when copying the second of the two. In this likely very rare event, the
simplest workaround is to manually copy the folder whose automatic copy
failed and displayed an error in the log. This is not Python 3.X/2.X-specific.
It may be possible that using bytes (instead of str) for folder names in
mergeall's os.listdir() calls would obviate this issue, but Windows' own
drag-and-drop failures suggest that it might be a deeper issue in Windows
itself, and the issue's rarity and large impact on existing code make
further exploration unwarranted. This would also apply to Python 3.X only, because
2.X has no true bytes object. A Windows 7 (US) console doesn't even print
this character properly, though IDLE does, and your console might (setting
the Windows codepage via a "chcp 65001" helps on mine—see Page 755 in
Learning Python, 5th Edition (LP5E)
for details, and test with the following script):
Wrapped a stream line decode in an exception handler, to prevent its
potential failure on Python 2.X from killing the GUI for some non-ASCII
characters in filenames. This is a process-boundary issue that impacts only
the GUI display (not the logfile, or the underlying mergeall process), and
reflects a 2.X/3.X incompatibility, despite the launcher's automatic propagation
of PYTHONIOENCODING. See the 1.6 change note in
launch-mergeall-GUI.pyw
for details (search on "1.6"). This fix was also applied to the
console launcher, for stream
lines decoded for console display.
Note that this patch applies only to the GUI and console launchers' displays.
Its worst-case impact is that some non-ASCII filenames may be displayed with
"(UNDECODABLE LINE):" prefixes and still-encoded names in the GUI or console
launcher displays under Python 2.X only. This normally happens for just a handful
of filenames, if any, and filenames display correctly in both the logfiles created
by the launchers, and the main mergeall.py script itself,
which processes files with non-ASCII names properly. Nevertheless, this is significant
enough to recommend use of Python 3.X for users with archives having many non-ASCII filenames.
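The general shape of such a guarded decode can be sketched as follows. This is an illustration only, not the launchers' actual code: the function name and the UTF-8 default are assumptions, and the prefix text simply mirrors the one quoted above.

```python
def decode_line(raw, encoding='utf-8'):
    """Decode one stream line for display, flagging lines that cannot be decoded."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Keep the line visible (still encoded) instead of letting the
        # exception terminate the display loop
        return '(UNDECODABLE LINE): ' + repr(raw)

print(decode_line(b'copied new FROM file\n'))
print(decode_line(b'\xff\xfe bad bytes\n'))
```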
Also note that PYTHONIOENCODING must still be set manually in your system shell
when running script mergeall.py directly from a command line, if it may ever process
and thus print non-ASCII filenames, especially in 3.X. This manual setting
isn't required for the GUI launcher, as it automatically sets and propagates
this to its mergeall.py subprocess, and does not route text to a console
(only to a GUI and logfile). However, this setting may be required for both
mergeall.py and the console launcher, as both print filenames to the console.
{Launcher GUI} Verify main window quit
Added a simple quit verify dialog. Caveat: this avoids accidental exits,
but no longer shuts down the GUI immediately if there are queued lines
to be displayed; a sys.exit() might exit quicker, but could result in
GUI error messages in the console.
Synchronized 2 MoreDocs/ HTML files with current versions on book website
(Lessons-Learned, and Whitepaper which is now called mergeall.html
on the website). Also added version number in GUI launcher title (and
console launcher startup), and fixed file launch-mergeall-GUI.pyw to
have Windows eolns (a.k.a. end-of-lines, endlines); as it was, this file
inconsistently had Unix line breaks, which show as a single line in some
text editors like Notepad (though not PyEdit or IDLE);
origin unknown, but likely harmless. None of the changes in this category
impacted program execution.
Summary: Linux compatibility—patch and usage notes.
This system was initially developed and used on Windows (7 and 8).
Testing on Linux (Fedora 20/Gnome 3) has so far yielded one minor
patch, and two usage notes for Linux users.
Note that the patch applied allows mergeall to work on Linux for archives
containing basic files and directories—that is, for normal user data and media.
More exotic Linux file types (e.g., links and FIFOs) remain untested, and
may or may not require additional changes; modify as desired.
In both GUI and console launchers, changed the call to Python's
subprocess.Popen() to pass shell=False on Linux, and other Unix-like
platforms, only. Else, when passing a command-line sequence (not a
single string), this call always spawns just an interactive Python
session—as though the full command run were "python", the first
item in the command-line sequence. However, on Windows, shell=True
is still required if filename associations are to be employed. This
seems counter to the portability goals of subprocess (and is largely
undocumented), but the fix is very minor.
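A minimal sketch of this platform test follows. It is not the launchers' code: the spawn() helper is a hypothetical name, and the child command is just a stand-in for the real mergeall command line.

```python
import subprocess
import sys

def spawn(cmdline):
    """Spawn a command given as a sequence, choosing shell= per platform."""
    # shell=True on Windows only (needed there for filename associations);
    # on Unix-like platforms, shell=True plus a sequence would run just the
    # first item, so shell=False is used instead
    use_shell = sys.platform.startswith('win')
    return subprocess.Popen(cmdline, shell=use_shell,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

proc = spawn([sys.executable, '-c', "print('spawned')"])
output, _ = proc.communicate()
print(output)
```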
With this patch, mergeall's GUI and main script work well for basic file
types on Linux in testing thus far; see the Linux screenshots from versions
1.5 and 2.1 (defunct), and 3.0.
Linux users may want to change some of the "#!" first lines in this
system's script files to name the specific version of Python for which
you have [tT]kinter GUI support installed, if you wish to run the scripts
directly as executables. For instance, a change from "#!/usr/bin/python"
to "#!/usr/bin/python3" in launch-mergeall-GUI.pyw was required for my
Python 3.X install, but was not changed as such in the released code,
as this script also works on Python 2.X systems and other platforms.
Change as needed for your installs and links, or use full "python2 ..."
or "python3 ..." command lines to launch the top-level script.
Also on Linux, it appears that there is another file timestamp DST
rollover issue that makes some files' mod times off by an hour when
synchronizing between Windows and Linux trees. Specifically, a Windows
NTFS volume (e.g., your mounted C:) may report some mod times skewed by
1 hour from Linux times; this appears to happen for files saved in the
past while DST was active. Naturally, this can generate spurious
differences in timestamp-based synchronization tools like mergeall.
This is a TBD, but seems related or similar to the Windows NTFS/FAT
skew reported in release 1.4 notes below (see its item #1).
No fix was coded and no ideal workaround is yet known; but synching once with
auto-update on suffices to remove the timestamp differences, albeit at
the expense of some extra one-time copies. As a demo, the new Linux
desktop screenshot in ./examples/Screenshots shows mergeall runs on Linux
performing and verifying a Windows/Linux timestamp synch. Note that this
is an issue only when comparing trees _between_ Windows and Linux, not
for compares of trees that reside on the same platform.
{GUI Launcher} Enhanced to thread subprocess stream reads
Read the spawned mergeall subprocess's stdout/stderr lines in a spawned
parallel thread, that posts lines to a queue polled by timer events in
the main GUI thread. This structure is more complex, but prevents the
GUI from being blocked and unresponsive while waiting for a next line
from the subprocess—not a bug and normally not a concern, but it
could become apparent if mergeall was busy copying large trees.
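The thread-plus-queue structure described here can be sketched roughly as below, in Python 3 (the queue module is named Queue in 2.X). All names are hypothetical, and a sleep loop stands in for the GUI's timer events so the example runs without a window; in a Tkinter GUI, drain() would be called from a callback rescheduled with the widget after() method.

```python
import queue
import subprocess
import sys
import threading
import time

def start_reader(proc, lines):
    """Read proc's stdout in a daemon thread, posting each line to a queue."""
    def produce():
        for line in proc.stdout:       # blocks in this thread, not the GUI's
            lines.put(line)
        lines.put(None)                # sentinel: stream has closed
    thread = threading.Thread(target=produce, daemon=True)
    thread.start()
    return thread

def drain(lines, show):
    """Timer-callback body: show all queued lines without blocking.
    Returns False once the end-of-stream sentinel has been seen."""
    while True:
        try:
            line = lines.get_nowait()
        except queue.Empty:
            return True                # nothing more yet: poll again later
        if line is None:
            return False
        show(line)

proc = subprocess.Popen([sys.executable, '-c', "print('one'); print('two')"],
                        stdout=subprocess.PIPE)
lines = queue.Queue()
start_reader(proc, lines)
shown = []
while drain(lines, shown.append):
    time.sleep(0.05)                   # stand-in for a GUI timer event
proc.wait()
print(shown)
```

The main thread never waits on the pipe itself, which is the point of the design: only the reader thread blocks between lines.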
Redo on 1.2 issue: forcing the mergeall subprocess to use
the default Unicode encoding in the locale module sufficed to make it agree with
Popen's text-mode stream reader (which always uses the locale setting), but still
failed on encoding errors on Windows for some Unicode filenames as they were
printed in mergeall—before they ever reached the Popen reader. Fixed by
forcing subproc to use the broader UTF8 for its prints via PYTHONIOENCODING, and
reading stdout lines from Popen in binary mode with manual post-read UTF8 decoding.
See the 1.4 change notes in launch-mergeall-GUI.pyw
for more details.
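In outline, the combination of a forced child encoding and a binary-mode read looks something like the following sketch. It is an approximation under stated assumptions, not the launcher's code: run_utf8() is a hypothetical helper, and the child here is a one-line Python command rather than mergeall.py.

```python
import os
import subprocess
import sys

def run_utf8(cmdline):
    """Run a Python subprocess that prints UTF-8, and decode its lines manually."""
    # Force the child's print encoding, regardless of the locale setting
    env = dict(os.environ, PYTHONIOENCODING='utf-8')
    # No text-mode arguments: the pipe yields bytes, decoded after the read
    proc = subprocess.Popen(cmdline, env=env, stdout=subprocess.PIPE)
    lines = [raw.decode('utf-8') for raw in proc.stdout]
    proc.wait()
    return lines

child_code = r"print('Rodr\u00edguez')"    # the child prints a non-ASCII name
lines = run_utf8([sys.executable, '-c', child_code])
print(lines)
```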
{Launchers} Fix for Python 2.X logfile incompatibility (binary mode files)
Prior launcher versions failed in Python 2.X when logfiles were enabled,
because they opened logfiles in text mode using 3.X's open() with encoding,
and didn't account for 2.X's different open(). Temporarily changed to use
open=codecs.open in 2.X, then changed to write logs in binary mode with new
binary stream data to sidestep the issue altogether. 2.X's codecs.open() does
not expand \n to \r\n on Windows when writing decoded Unicode, though the next
item made this a moot point.
{Launchers} Handle Python 2.X -u unbuffered flag in mergeall spawn command-line
This Python switch makes streams unbuffered, but oddly also makes line-ends
\r\n in 3.X but \n in 2.X, which leads to single-line logfiles in Windows
if not special-cased. Temporarily dropped for 2.X compatibility, so all
line-ends are \r\n when written to files on Windows.
Later reinstated: without the Python '-u' unbuffered flag, mergeall output
may not appear for 10 or more seconds on some machines and slower devices
due to internal buffering. Because this flag also makes line-breaks differ
between Python 2.X and 3.X, though, also need to use special-case logfile
writes to map all linebreaks to the platform's version. See 1.4 change notes
in launch-mergeall-GUI.pyw for more details.
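One way to map linebreaks for binary-mode logfile writes is sketched below. The helper name is hypothetical and the launchers may do this differently; the sketch assumes each item is a single line of bytes as read from the subprocess stream.

```python
import os

def to_platform_eoln(raw):
    """Replace a binary line's ending (CRLF, LF, or none) with the
    platform's own line break, as bytes."""
    body = raw.rstrip(b'\r\n')
    return body + os.linesep.encode('ascii')

# Lines from a 2.X or 3.X child now end the same way in the logfile
print(to_platform_eoln(b'copied new FROM file\n'))
print(to_platform_eoln(b'copied new FROM file\r\n'))
```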
{Docs} Added Lessons-Learned.html post implementation notes
This write-up summarizes trade-offs and
issues, and discusses decoupled versus single process architectures.
Update for version 3.0: A new overview of this issue and a new list of fixes now
appear in the User Guide added in release 3.0.
Importantly, users on Windows and Mac OS X are now advised to format their external
drives using the exFAT filesystem, which avoids this issue altogether; Linux exFAT
support is somewhat emerging, but the fixer script and other options below still apply.
On Windows, FAT/FAT32 file systems (e.g., many USB sticks) have an issue with
daylight savings time (DST): they adjust file modtimes for localtime, making them all
appear to be off by one hour when DST begins, versus the true UTC time of NTFS and
exFAT. This is a well-known Windows issue, and seems to occur only if your
Windows system is set to auto-adjust when DST begins, but it can make every
file register as a difference in mergeall if only one of the drives uses FAT.
No solution was coded in mergeall itself here, but there are a variety of procedural
ways to deal with this, from arguably simplest to most complex:
Allow mergeall, and other timestamp-based backup or synchronization
tools, to rewrite your archive in full twice a year.
Clear your Windows auto-dst-adjust setting, and manually change your
time/clock when needed (see below).
Use two FAT device archive copies (e.g., on one or two USB sticks)—one during
DST and one otherwise; this has the advantage of keeping a long-term backup copy
(more on this in 2.0 usage notes).
Write a script to add or subtract 1 hour on all file modtimes, and run on
FAT drive archives at DST rollovers; use os.walk, os.path.getmtime, and os.utime.
Done => see this 2.0 note for a script to run.
Use NTFS instead of FAT on your drives (e.g., a shell command such as
"convert D: /FS:NTFS" can do the job), if this makes sense on your device;
it may degrade performance on some.
Resort to using lower-level C/C++ Windows libraries if they offer a
solution not available in Python directly (this requires recoding, and
possibly C++ skills if no Python API exists).
The first of these is the default if you take no action. The second—clearing
your auto-dst-adjust setting—is easy but manual: see
Control Panel => Date and Time => Timezone, or click your toolbar date/time to
clear your DST setting (be sure to "OK" out of all your Control Panel dialogs).
The third and fourth require some minimal action at DST rollovers;
the new 2.0 script makes the fourth a simple command-line
run, but the third ensures a long-term backup.
See Lessons-Learned.html for more
on this issue, including relevant links on the web.
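The modtime-adjustment option in the list above can be sketched with the three calls it names. This is a minimal illustration only, not the 2.0 script itself: shift_modtimes() is a hypothetical name, it adjusts files but not folders, and it should be tried on a copy of an archive first.

```python
import os

def shift_modtimes(root, seconds):
    """Add seconds (negative to subtract) to the modtime of every file
    under root; returns the number of files changed."""
    changed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            atime = os.path.getatime(path)      # keep access time as is
            os.utime(path, (atime, mtime + seconds))
            changed += 1
    return changed
```

A call such as shift_modtimes(folder, 3600) would move all file modtimes under folder ahead by one hour, and -3600 would move them back.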
Excel on Windows (among others?) can occasionally change a few bytes in a file's
content trivially without updating the file's modification time or size. This registers
as a difference in the bytewise diffall.py but not in the
timestamp/size-based mergeall.py (and is officially considered to be cheating here).
Such modifications appear to reflect changes to unimportant metadata only; thus far
seem limited to older Excel files opened but not saved; and can generally be ignored.
Copy over the impacted files manually, if you don't wish to see the diffall difference.
Update for version 3.0: pathname limits were eventually addressed
and lifted on Windows by automatically adding "\\?\" pathname-prefix strings
universally on that platform (only) to invoke enhanced APIs;
see 3.0 above.
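For illustration, prefixing might look like the sketch below. It is not mergeall's 3.0 code: extend_path() and its on_windows parameter are hypothetical, the input is assumed to be an absolute path already, and the UNC form shown for network shares is the standard Windows convention rather than something this document describes.

```python
import ntpath
import sys

def extend_path(path, on_windows=sys.platform.startswith('win')):
    """Add the Windows long-path prefix to an absolute path (no-op elsewhere)."""
    prefix = '\\\\?\\'                        # the 4 characters \\?\
    if not on_windows or path.startswith(prefix):
        return path
    path = ntpath.normpath(path)              # prefix form requires backslashes
    if path.startswith('\\\\'):               # network share: \\server\share\...
        return prefix + 'UNC' + path[1:]
    return prefix + path

print(extend_path('C:/archive/folder', on_windows=True))
```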
For large/deep trees, you may run up against file path length limits. These
won't terminate mergeall or the GUI, but will manifest as error messages in
mergeall output that will continue to appear in later runs until addressed.
This often is the result of directory renames or moves, and is a filesystem
issue, not a program error—you also may not be able to do much with files
in such long paths in a file explorer, until you shorten the path by renaming files
or folders, moving items closer to the drive's root, or deleting parent directories.
That is the suggested policy and workaround for mergeall as well.
See Lessons-Learned.html for more on this issue.
Perhaps a given to some readers, but mergeall scans USB flash drives quicker
if you route the logfile (if one is requested) to a different device than the
USB drive being scanned—to your Desktop, for example—and copy it over to
the USB drive later if desired. Writing a logfile on the same USB drive being
scanned can slow down the scan by a factor of 3 or 4 in tests run, due to
the read/write combination.
Perhaps also obvious to some, but on Windows, pathnames denote connected drives
by device letter, and name shared network drives by volume syntax. Examples:
C:\folder\... --for a folder on your main drive (normally)
D:\folder\... --for a folder on your USB flashdrive (or other letter)
\\Computer\folder\... --for a shared folder on a computer in your network
Such path formats are passed to the main script as arguments, but are automatic
when selecting folders in the GUI. Other platforms use different naming schemes
(e.g., /dev/..., /mnt/...); see your system docs.
Also on this topic: drives shared on a Windows home network seem to be very
slow (often 35-50 times slower than recent USB drives) and tedious to set up,
but your mileage may vary. Private clouds may or may not be faster, but seem
likely to be bound by similar constraints imposed by network transmission
speed in general (and public clouds are loaded with tradeoffs: see the last
section of Whitepaper.html).
See also USB flashdrive and Internet speed comparisons in
Lessons-Learned.html.
Allowed for FAT32 file system's 2-second resolution/granularity of file
modification times, by replacing equality with a +/- 2 second range test;
else copies on the more accurate NTFS file system (and others) may register
a mismatch despite having identical content. Update: see
Lessons-Learned.html
for more on this issue.
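The range test amounts to something like the following sketch; the function name and the tolerance parameter are illustrative rather than mergeall's own.

```python
def same_modtime(time1, time2, tolerance=2):
    """True if two modtimes (in seconds) differ by no more than FAT32's
    2-second granularity, instead of requiring exact equality."""
    return abs(time1 - time2) <= tolerance

print(same_modtime(1400000000.0, 1400000001.5))
print(same_modtime(1400000000.0, 1400000003.0))
```

In practice the two arguments would come from os.path.getmtime() calls on the FROM and TO copies of a file.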
Fixed encoding disagreement between mergeall subprocess streams and
launcher's Popen text mode auto-decoding, by using PYTHONIOENCODING and
locale module setting used by Popen; else aborts on Unicode exception in
stdlib when reading subproc's stdout lines for non-ASCII filename in report.
This was later revisited in version 1.4.
Please note: the following is now largely subsumed by the new User Guide
added in version 3.0, but is retained for any extra details or context it may provide.
For current information in using the source, app, and executable formats of mergeall, see
the User Guide's pointers, as well as
the main README file. This section covers the original and
still-available source-code format.
Download mergeall's zipfile here, and unpack on your
computer. Its unpacked content may be viewed either locally or
online.
This source-code version of the system requires just Python 3.X or 2.X.
Python 3.X is preferred for Unicode filenames, per version 1.6's release notes.
Python 3.5 or later (or a separate scandir() PyPI install) was recommended for speed on Windows and
Linux, per version 2.2's release notes, but no longer as of 3.0.
The system is known to run on
Windows 7,
Windows 8.1,
Windows 10, and
Linux, and, as of
version 3.0,
Mac OS X. Most usage to date has been on Windows
for archives of normal files and folders, though Mac and Linux have seen more action lately;
as of 3.0, Mac has emerged as both a major source of enhancements and the platform of choice.
This program may be launched in 3 ways, from simplest to advanced:
Run launch-mergeall-GUI.pyw to run mergeall.py easily from a
desktop GUI.
Run launch-mergeall-Console.py to run mergeall.py with
interactive inputs.
Run mergeall.py directly via manual command lines in a
console window.
For more screenshots of modes 1 and 2, see Whitepaper.html.
For script usage details in mode 3, see mergeall.py's topmost docstring,
and the example sessions here.
This system began as a command-line-only tool with this file as its sole documentation in plain
text format, and was later extended with HTML documents. Largely due to this legacy,
you can find documentation for it in multiple places and forms:
UserGuide.html is the latest user guide, added in version 3.0.
Whitepaper.html is the original usage guide, with extra background.
Version history in this file logs project changes and assorted usage notes.
mergeall.py's docstring (among others) has implementation-focused details.
Please note: the following has grown redundant with the new
User Guide and original and
similarly dated
Whitepaper,
but is retained as an alternative (if now largely historical) overview.
mergeall.py, the main script, synchronizes a destination tree to be the
same as a source tree, by copying only differing and unique items in
the source to the destination, and pruning unique items in the destination.
This process is applied to both files and folders in the trees.
For speed, file differences are detected by checking only modification
times and sizes (with an optional limited content test), and all updates
are made in-place in the destination and limited to changed items only.
As of version 2.0, prior versions of changed items can be saved to a
backup folder automatically; as of 2.1, backups may be restored automatically.
This can be useful both for quick backups of changes made in large trees,
and for one-way synchronization of multiple tree copies. In the former
role, a single run suffices to back up changed items. In the latter
role, multiple runs work to broadcast changes to multiple
copies: back up changes to an external device (e.g., USB flashdrive,
backup drive, or network drive), and propagate them from there to one
or more destination devices.
In selective/interactive mode, this system may also be used as a more
peer-level synchronization tool.
In the target use case (currently 73G space, 45K files, and 2.6K directories)
total runtime fell from 2 to 3 hours for a full copy and compare, to just
1 minute for a typical mergeall run with moderate changes on devices tested.
Running twice to leverage an intermediary device normally takes 5 minutes
or less.
The main script is command-line and console based, and runs in report-only,
automatic-update, and selective/interactive modes. The launcher scripts
simplify common usage modes by inputting settings in a shell
console
or a [tT]kinter
GUI,
and spawning the main script automatically. The GUI
launcher scrolls the main script's output in its main window, and
saves the output to a logfile on request.
All scripts in this system run on both Python 3.X and 2.X (and mergeall.py
works around a 2.X library issue regarding modtime digits). To date, this
system has been tested and used on Windows, Linux, and Mac OS X, on Python 3.5,
3.4, 3.3, and 2.7; other Pythons are likely supported, but await formal testing.
This is an extension to similar tools in the book
Programming Python, 4th Edition (PP4E),
from which the
cpall.py and diffall.py here
were borrowed and reused.
See code docstrings for open issues (TBDs) and shortcomings (CAVEATs),
and the first two items in the next major section's table for additional context.
Hints on making clickable desktop icons to launch this (and other) scripts
on Windows (this scheme is largely subsumed by the two launch-* scripts,
coded later in the project)