Mergeall — Backup and Sync Your Stuff Your Way |
Summary: A content backup and propagation tool, and a manual but private alternative to cloud storage
Version: 3.3, October 16, 2022 (see all version history)
Author: © M. Lutz, 2014-2022, learning-python.com
License: Provided freely, but with no warranties of any kind (see also README.txt)
Screenshots: This program's GUI and scripts run on macOS, Windows, Linux, and Android
Usage: Download and start source code, Mac app, and Windows and Linux executables
History: Some of the code in this system originally appeared in the book PP4E
This is Mergeall's usage guide, which provides user-focused documentation for the system. For finer-grained details on installs and download packages, also see the README file. For the complete story on recent changes, consult the revisions doc. If you're reading a local copy of this guide in a frozen executable, some referenced items may be missing; see the complete version online.
fixunicodedups.py. All users are encouraged to upgrade to 3.3.
deltas.py script. This script saves changes instead of applying them, so they can be archived, or applied later on demand with the mergeall.py -restore mode. Fetch the new package here, and see the new script's docs, demo, and client for details.
This release is provisional, pending rollout to other packages.
Welcome to Mergeall—a cross-platform and open-source program for managing backups and duplicate copies of the content stored on your computers. Mergeall quickly propagates changes in your content to other copies. When combined with an intermediate USB drive, this allows you to synchronize multiple devices on demand, both directly from full copies, and indirectly from change sets. Though manual, Mergeall syncs avoid platform limitations, the complexities and sloth of networking, and the change-conflict perils of automatic peer-to-peer syncs.
Mergeall is also software with a message. While it can be used as a simple but portable backup utility, its broader content-synchronization role provides a manual but free, offline, and completely private alternative to commercial cloud storage. With Mergeall, your stuff is your stuff, not someone else's point of control. If you're ready to take charge of your digital property, Mergeall is ready to help.
The Mergeall system includes a user-friendly GUI that runs merges automatically; a command-line script for ultimate control; and a console-based launcher that inputs details interactively. The Mergeall package also comes with extra related tools, including diffall to compare folders, and cpall to copy them. All tools in the package work on macOS, Windows, Linux, and Android (see the screenshots above and the update below), and are available as a Mac app, Windows and Linux executables, and portable source-code; for the latter only, you'll need a Python 3.X and its tkinter GUI library (more on installs ahead).
This document is Mergeall's main user guide. It covers usage fundamentals—what Mergeall does, how it is used, and its GUI—and runs down a collection of pointers designed to help you get the most out of the system. Beyond this guide, you'll find additional help resources in the Mergeall package: Revisions chronicles version history; the original (but now dated) Whitepaper provides more background on features and roles; the screenshots and run logs capture Mergeall in action; and the package's many README files describe its contents.
Before we jump into Mergeall's GUI, let's start with the what, how, and why basics of Mergeall use.
Plus Android: Mergeall's scripts and GUI are also now known to run on Android smartphones and other devices in source-code form. See the separate Android coverage for Mergeall's scripts and GUI, as well as the latter's Android screenshots. Android usage was hashed out after this doc and isn't included in all the platform lists or instructions here, but it's a valid Mergeall host too.
Footnote: Android 11 removed USB access for POSIX programs like Mergeall, so Mergeall can no longer be used directly in 11 and later. Instead, use the Android Deltas Sync package, which runs Mergeall to indirectly sync changes stored in a zipfile. However it's deployed, maintaining copies of your content on both your PCs and your phones is an arguably compelling Mergeall use case.
In short: this system makes a destination folder the same as a source folder quickly.
Folders are locations on your harddrive, SSD, USB flashdrive, or network drive, where you've stored your content (a.k.a. data) in named files. Folders are also known as directories, and sometimes called trees because they may have nested folders with additional content; when nested this way, Mergeall processes all the tree's subfolders as well.
Whatever they are called, Mergeall quickly makes an entire destination folder ("TO") the same as a source folder ("FROM"), by updating TO in-place only for items changed in FROM since the last run. It does this without having to read files in full, by:
The net result allows large data sets (a.k.a. archives) to be brought up to date much faster than brute-force copies or compares. This is sometimes called an "incremental backup," because it updates TO only for items changed in FROM since the last run. For instance, if only two items in the FROM source tree have been changed, only two items will be updated in TO, regardless of FROM's size.
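As a rough illustration of this comparison step, here is a minimal Python sketch that, like Mergeall, decides what TO needs from file modification times and sizes alone, without reading any file contents. The functions snapshot and find_changes are hypothetical names for this example, not Mergeall's own code, and the sketch ignores folders, cruft skipping, and the actual copy and delete work.

```python
import os

def snapshot(root):
    """Map each file's path (relative to root) to its (modtime, size) pair."""
    info = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)  # metadata only; the file is never opened
            info[os.path.relpath(path, root)] = (st.st_mtime, st.st_size)
    return info

def find_changes(from_root, to_root):
    """Return (to_copy, to_delete): files TO needs, and files TO should drop."""
    src, dst = snapshot(from_root), snapshot(to_root)
    # New in FROM, or present in both but with a different modtime or size.
    to_copy = sorted(p for p, sig in src.items() if dst.get(p) != sig)
    # Present in TO but no longer in FROM.
    to_delete = sorted(p for p in dst if p not in src)
    return to_copy, to_delete
```

Because only metadata is consulted, the cost of a run grows with the number of items scanned rather than with their total size, which is the property that makes this style of update fast.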
In more tangible terms, if FROM is your photo archive's folder, Mergeall will copy just the photos you've recently added or edited, not the entire archive. The same goes for your movies, music, books, websites, and anything else you store in your computer's folders—only your latest changes need to be copied when you run Mergeall to update a TO folder. Depending on your computers and storage devices, this can shave an update's time from hours to just minutes or seconds.
Plus deltas: Mergeall 3.2 added a new usage mode, implemented by its new deltas.py command-line script. In this new mode, changes in FROM are saved to a separate folder (a "deltas set") instead of being applied to TO immediately, and can be applied to TO later and on demand with Mergeall's -restore option.
This new deferred-sync mode can be useful both for archiving incremental changes (e.g., after burning a large content collection to optical disk), and applying changes indirectly (e.g., on a device with limited USB access). Android Deltas Sync, for example, uses this mode for on-phone syncs. This new mode isn't yet covered in this guide; see the new script's docstring for full details.
As just described, Mergeall does the job of quickly making a TO folder the same as a FROM. You can leverage this basic utility to process your content in two different modes:
Regardless of the computers you may work on, Mergeall can be used to back up data sets to archive devices quickly, because it updates the archive for changed items only. For example, you can use Mergeall in this mode to periodically save your changed content to a USB or network drive; after a Mergeall run, your external drive will be the same as the original. This mode is especially handy if some of the devices you use are USB flashdrives or other portable drives: Mergeall backup runs make quick copies of your content "to go."
If you work on multiple computers, Mergeall can also be used to echo data-set changes across all your computers quickly. When you want to update your other devices, simply run Mergeall twice: once to copy changes to an intermediate device such as a USB or network drive, and then again to propagate the changes from the intermediate device to other computers. This mode allows you to make changes on one device and keep others in synch if and when needed—your computers will all "mirror" the content at its original location. As a bonus, your intermediate devices serve as automatic backup copies.
In both these modes, you'll want to start with a complete copy of your data set initially, which either Mergeall or its companion cpall.py script can create. From that point forward, though, your updates will be limited to just the items changed since the latest Mergeall run. This not only makes your updates fast, it minimizes wear-and-tear on your drives.
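One detail matters for that first full copy: it should keep each file's modification time, so that later timestamp-based comparisons see the copies as unchanged. cpall.py is the package's tool for this job; the sketch below instead uses Python's standard shutil.copytree with shutil.copy2, which copies file metadata such as modtimes along with the data. The function name initial_copy is hypothetical, and TO must not already exist.

```python
import os
import shutil

def initial_copy(from_root, to_root):
    """Make a complete first copy of from_root, keeping file modtimes.

    shutil.copy2 copies metadata (including modification times) along with
    the data, so a later timestamp-based comparison treats the copies as
    unchanged. to_root must not exist yet. Returns the number of files copied.
    """
    shutil.copytree(from_root, to_root, copy_function=shutil.copy2)
    return sum(len(files) for _, _, files in os.walk(to_root))
```

A plain copy that stamped every file with the current time would instead make every file look changed on the next comparison.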
Though a lesser role, you can also use Mergeall's report-only mode to isolate differences between any two folders on your computer. If you need a reminder of what you've changed in a working folder, for instance, compare it against your stable folder with Mergeall sans updates, and apply the changes to other copies later as you wish.
Also keep in mind that Mergeall works on folders of any size and role. Whether it's a small folder of photos from a recent trip, or the folder containing everything you keep on your computer, Mergeall will quickly back up and synchronize its changed contents to other locations or devices at your request.
For more pointers on using Mergeall to mirror your content to multiple computers, be sure to watch for the additional coverage ahead.
Now that you know what Mergeall does, you may be asking yourself why you should bother with it, especially given the pervasiveness of other backup tools today. In brief, Mergeall can offer the advantages of portability, transparency, privacy, and speed, depending on how it is used:
Mergeall does work similar to other backup programs, but its portable code and open-source model can be strategic advantages. Portability means that you can use Mergeall on all major desktop platforms—Windows, Mac, and Linux. Mergeall handles these systems' differences, so you can copy and use your content seamlessly across all three. Open source means that you can audit and even change the program however you wish. Changes require some programming knowledge, of course, but the fact that you can read Mergeall's code means that it cannot do anything that you cannot know about—a crucial feature in a tool you must trust with your valuable data.
Mergeall provides an alternative to "cloud" storage, where your intermediate devices take the place of cloud servers, and Mergeall runs do the same work as cloud uploads and downloads. Mergeall requires quick-but-manual steps to update content copies, but Mergeall is free, you are not dependent on a cloud provider, and your data remains your private asset. Because you don't need to upload your content to a third-party's cloud, there is no risk of it being unavailable, subject to price increases, or covertly scanned by advertisers, governments, or worse. Moreover, when using Mergeall with local devices, it will usually run much quicker than an Internet-based cloud.
In both modes, Mergeall provides additional advantages we'll explore in this guide, including:
...to exclude files unwanted in cross-platform archives
...to propagate links on both Unix and Windows
...to remove folder path-length limits on Windows
...to allow you to undo any Mergeall change
...to safeguard against both computer and human mistakes
As we'll see, such features help keep your content archives safe, robust, and portable.
In the end, content backup requires some diligence regardless of the software or model you use. The real differentiator today is whether you'll delegate control of this important task to an unknown and self-interested third party, or retain it yourself. Mergeall is dedicated to the notion that your stuff is more valuable than any program, company, or device—and important enough to remain yours.
For an arguably more-provocative look at the tradeoffs between manual merges and third-party cloud providers, see the original Whitepaper's coverage. Here, we'll leave the politics aside and focus on how to use Mergeall to manage your digital property on your own terms.
Mergeall's GUI, run by starting its app, executable, or source-code script launch-mergeall-GUI.pyw, provides the simplest way to use the system. The GUI does not offer Mergeall's "selective-updates" mode, a use case that is rarely employed and covered briefly ahead, but does support both report-only and automatic-update modes, and makes it easy to configure runs and save and view log files.
If you're not sure how to start the GUI's program, check out its README file and the platform pointers ahead (spoiler: a click generally works). The GUI itself is straightforward enough that a test drive probably suffices for most users. As a more formal reference, though, the following is a quick rundown on the GUI's widgets by its major sections:
A logistical note up front: the screenshots on the left below were captured on macOS, Windows 7, and Ubuntu Linux; click to view them full-size, and see the screenshots collection for more GUI captures. The last few here are shown only on a Mac for space, but you can find Windows and Linux equivalents in the screenshots folder. You can disable images in most browsers to go text-only, but they add context.
Recent updates: see also the GUI's off-page Android shots, and Mergeall 3.3's minor GUI changes. The screenshots below are still largely representative for all download packages.
This section explores the most common issues that you may come across in the data-archiving wild, and gives useful hints and advice along the way. Here are the topics we'll be looking at, each of which comes with a quick summary:
This section is also the majority of this document, and provides ample details that will help you use Mergeall well. In the end, Mergeall is more than just its GUI. By nature it relies on system-level properties, and employing it requires a bit of background on things like filesystems, cruft files, and change management. You don't need a PhD in these subjects, of course, but a basic understanding will help you avoid or resolve issues that may crop up.
That said, these notes are mostly self-contained, and some users may wish to pick and choose those most relevant to their goals. Symlinks, for example, may be best avoided (at all costs!), and people who work on just one platform can skim or skip some material here. If you prefer to jump right into test-driving the system live, be sure to come back and take a second look here when you're ready for more details.
For additional pointers, see also the older (and now somewhat-dated and partially redundant) Whitepaper's list.
Short story: Mergeall allows you to customize its GUI, as well as some of its behavior, by changing simple assignments in a module file. Edit this file to tailor the program as you like.
In addition to the per-run option settings available in its GUI and command line, some of Mergeall's appearance and behavior can optionally be customized by changing assignments in the file mergeall_configs.py.
For instance, you can tailor the color, font, and initial size of the GUI's scrolled-messages text area; you can provide an initial value for the log-file popup toggle to save a click; and you can set the maximum number of changed-file backups that are retained in each TO destination folder (per ahead). In general, this file has settings that are unlikely to vary per run; others are options in the GUI or Mergeall command line.
As touched on ahead, this file also has advanced cruft-file pattern settings that most users can safely ignore (though if you know why you may want to change these, you probably also know how). Hint: see also pickcolor.py in the docetc/Tools folder for a simple color-chooser GUI you might find useful for GUI configuration.
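To give a feel for what name-pattern settings like these do, here is a minimal sketch of pattern-based cruft skipping using Python's standard fnmatch module. The pattern list and the function names is_cruft and keep_names are hypothetical examples, not Mergeall's actual settings or code (those live in mergeall_configs.py); the patterns shown are a few well-known Mac and Windows cruft names, and case-sensitive matching is an assumption of this sketch.

```python
from fnmatch import fnmatchcase

# Hypothetical example patterns; see mergeall_configs.py for the real ones.
CRUFT_PATTERNS = [".DS_Store", "._*", "Thumbs.db", "desktop.ini"]

def is_cruft(name, patterns=CRUFT_PATTERNS):
    """True if a file or folder name matches any cruft pattern."""
    return any(fnmatchcase(name, patt) for patt in patterns)

def keep_names(names, patterns=CRUFT_PATTERNS):
    """Filter a directory listing down to the names worth comparing."""
    return [name for name in names if not is_cruft(name, patterns)]
```

For example, keep_names(["a.jpg", ".DS_Store", "._a.jpg", "Thumbs.db"]) keeps only "a.jpg".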
Tip: be sure to save and restore your mergeall_configs.py file (or changes you've made to it) if you upgrade to a new version of Mergeall in the future; because this file is located in the install's folder, it may otherwise be replaced.
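If you prefer to script that save-and-restore step, a sketch like the following would do it with Python's standard shutil. The functions save_configs and restore_configs and the folder arguments are hypothetical; point them at your actual install folder and any backup folder you choose.

```python
import os
import shutil

CONFIGS = "mergeall_configs.py"

def save_configs(install_dir, backup_dir, name=CONFIGS):
    """Copy the configs file out of an install folder before upgrading."""
    os.makedirs(backup_dir, exist_ok=True)
    return shutil.copy2(os.path.join(install_dir, name), backup_dir)

def restore_configs(backup_dir, install_dir, name=CONFIGS):
    """Copy a saved configs file into the (new) install folder afterward."""
    return shutil.copy2(os.path.join(backup_dir, name), install_dir)
```

If a new release adds settings to this file, overwriting it wholesale would drop them; in that case, it may be safer to reapply just your own changes to the new file by hand.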
Short story: if you use Mergeall to copy your content to multiple computers, be sure to limit your changes to one copy (the "golden" copy) at a time between Mergeall runs. This avoids problems that can arise if you change the same data differently.
This section provides some pointers for users who plan on using Mergeall to mirror copies of the same content to multiple computers. If this includes you, there's one guideline that's inherent in multiple-copy scenarios, and important enough to get straight up front: you should generally make changes in only one copy of your content at a time between Mergeall runs.
This isn't required, and may be overkill when your changes are few and trivial. Despite the small investment in discipline it requires, though, this guideline is also generally a very good idea: if you do make changes in multiple copies without mirroring their changes to others, you'll likely wind up either facing a major and manual synchronization job, or losing some changes altogether.
Luckily, this is simpler than it may sound. The way you'll arrange to limit changes to one copy at a time really boils down to where you'll keep your "golden" copy—the official, up-to-date, and changeable version of your archive. It may be kept on one device or many, but this choice more or less determines most of your data-archiving tasks. In brief, you might locate the golden copy:
In this scheme, your golden copy may be on a USB or network drive that all computers either access and change directly, or mirror copies from and to when they wish to make changes. Direct-access drives may not require mirrors, but all common drives can benefit from Mergeall backups (especially network drives, given their well-known reliability issues), and can take advantage of Mergeall's other assets, including its cruft-skipping tools (a topic covered ahead, of special relevance to multiple-platform users).
If you work mostly on one computer, you might locate your golden copy on the primary computer, run Mergeall to mirror read-only content copies to other computers to use but not modify, and run Mergeall to mirror updatable content copies to and from other computers when they are given temporary ownership to make changes. You'll also use Mergeall to backup the golden copy, wherever it may currently reside, and leverage Mergeall's other benefits to keep your content cross-platform.
If you regularly use many computers, you can locate the golden copy in a more ad-hoc fashion, with a rotating temporary ownership for changes granted to one device at a time. This is essentially the data-archiving equivalent of the aboriginal talking stick model: any device can be used, but only the device currently holding the "stick" is allowed to update the data, and others must wait until a peer-to-peer Mergeall run transfers the stick. You don't need an actual stick to use Mergeall, of course, but devices should take turns at updating data that's stored on all of them. Mergeall can also be used here to backup the golden copy from its current owner.
None of these ideas must be followed dogmatically, and there are additional variations we'll omit here. However you proceed, though, you'll likely spare yourself problems down the road if you keep track of where your up-to-date data lives.
As an example, Mergeall's proprietor locates content on a primary computer, and uses Mergeall both for incremental backups, and to mirror copies to other devices as needed for either read-only access or temporary ownership for changes. Both backups and mirrors use USB drives, because home-networking drives proved too slow, unreliable, and unportable. Because there is a Mac in the device mix, it is also crucial to employ exFAT drives and cruft-file skipping for cross-platform use (more on these two options later).
As always, your mileage may vary. Some, for instance, also use Mergeall for full backups and mirrors, but more ad-hoc techniques when changes are few enough to be manageable with a simple staging folder. In the end, Mergeall is just a tool for quickly propagating changes; its role and scope are yours to decide.
Short story: your stuff (a.k.a. content) shows up under different pathnames on different platforms. Either use the GUI's Browse to find folders portably, or follow the naming patterns outlined here.
If you use Mergeall's GUI, the Browse buttons allow you to select your data sets' folders easily. If you're using Mergeall in command-line mode, or need to enter folder locations manually in the GUI for any reason, this section gives pointers on the syntax commonly used for folder locations (a.k.a. pathnames) on various platforms. It may also help you find your data in the GUI if you jump between multiple computers.
Let's assume that you've put all your content in a root folder called YOUR-STUFF, which you've placed at the top of all your drives to help minimize pathname lengths. If this folder is stored on your local drive, on a USB drive named EXTREME128G, and on a network drive named USB_Storage on a router-based server named readyshare.routerlogin.net, its paths look like this on each platform:
Windows:
  Local: C:/YOUR-STUFF
  USB: D:/YOUR-STUFF
  Network: //readyshare.routerlogin.net/USB_Storage/YOUR-STUFF
macOS:
  Local: /YOUR-STUFF
  USB: /Volumes/EXTREME128G/YOUR-STUFF
  Network: /Volumes/USB_Storage/YOUR-STUFF
Linux:
  Local: /YOUR-STUFF
  USB: /media/<username>/EXTREME128G/YOUR-STUFF
  Network: /media/readyshare/YOUR-STUFF (once mounted)
Disclaimer: your paths may vary. For instance, you can use either forward or backward slashes (/ or \) as separators on Windows when entering paths manually; drive letters may differ on Windows if you have additional devices; network drives may also be mapped to drive letters like Z:\ on Windows; Linux allows network drives to be mounted anywhere; and network drive details and requirements can vary more widely still (not to mention their reliability!). See your system's help resources for more details if needed.
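If you script your runs across platforms, the USB naming patterns above can be captured in a few lines of Python with the standard pathlib module. This is a sketch under the example's assumptions: the drive label EXTREME128G, the D: drive letter on Windows, and a /media/<username>/<label> auto-mount layout on Linux; usb_data_path is a hypothetical helper, not part of Mergeall. It also shows that Windows paths compare equal regardless of slash direction.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Names from the example above; substitute your own drive label and folder.
USB_LABEL = "EXTREME128G"
DATA_FOLDER = "YOUR-STUFF"

def usb_data_path(platform, username="<username>", drive_letter="D:"):
    """Build the USB-drive path to the data folder for a sys.platform value."""
    if platform == "win32":
        # Windows reaches USB drives through a drive letter, which may vary.
        return PureWindowsPath(drive_letter + "/", DATA_FOLDER)
    if platform == "darwin":
        # macOS mounts external volumes by label under /Volumes.
        return PurePosixPath("/Volumes", USB_LABEL, DATA_FOLDER)
    # Assumed: a common Linux desktop auto-mount layout.
    return PurePosixPath("/media", username, USB_LABEL, DATA_FOLDER)

# Windows treats forward and backward slashes alike.
same_on_windows = PureWindowsPath("D:/YOUR-STUFF") == PureWindowsPath(r"D:\YOUR-STUFF")
```

On a live machine, you would pass sys.platform (and, on Linux, your account name) to get the path for the current system.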
Also notice that the paths above assume that the root of your data folder is stored at the top of your drives, to minimize the length of your folder pathnames. This makes paths easier to read, but is usually no longer required as of Mergeall 3.0, which lifts pathname length limitations on Windows (see the details ahead), and is not even possible on macOS as of Catalina. If you instead locate your data in your per-user account folder, your local-drive paths may look like this:
Windows: C:\Users\<username>\YOUR-STUFF
macOS: /Users/<username>/YOUR-STUFF
Linux: /home/<username>/YOUR-STUFF
As always, explore your content on your devices for the full story.
Android paths: if you wish to use Mergeall on Android, see also this page for pointers on that platform's file paths. In short, content in internal storage may be located in shared (e.g., /sdcard), app-specific (e.g., /sdcard/Android/data/app), or app-private (e.g., /data/data/app) folders, and removable drives are identified by numeric ID (e.g., /storage/25C9-1405). Due to Android's long history of fragmentation, these paths and their access rules can vary by version and vendor; USB drives, for example, are no longer accessible to POSIX programs like Mergeall at their former /storage mount points in Android 11.
Short story: format external drives as exFAT to avoid FAT32's time-change problems. exFAT is built-in on Windows and macOS, and solves the issue for drives used on either or both. Linux requires an install for exFAT, and may benefit from a provided fixer script or other techniques.
To achieve its speed, Mergeall detects differences in files by checking their last-modified timestamps, instead of reading them byte-for-byte. This normally works well and allows Mergeall to compare archives very quickly, but it's also a dependency that can cause issues in some contexts.
For example, the FAT32 filesystem (born on Windows but supported everywhere, and commonly used by default on older portable devices like USB flashdrives) handles file last-modified times in a unique way that throws off comparisons to internal drives when your computer's clock is adjusted for Daylight Saving Time (DST). If you use a FAT32 drive with Mergeall, you'll need to adopt a policy to address this issue, or all your files will be recopied after DST rollovers.
And while this section focuses mainly on DST updates because they happen automatically, any time-zone change may trigger similar FAT32-to-internal mismatches. For content-management tools like Mergeall which rely on timestamp fidelity, FAT32 is truly an interoperability nightmare.
Without going into all the gritty details, modern filesystems used for internal drives on Windows, macOS, and Linux record file times in UTC, which is the absolute number of seconds since a fixed starting point in the past. Because these time values are standard and absolute, they compare correctly across all filesystems and platforms. By contrast, the older FAT32 filesystem records file times as local time; a file changed at 2PM records 2PM, not seconds since a fixed reference point (no, really).
The problem with this is that the two schemes' times may differ after adjusting for time zones or DST. In particular, the timestamps of files stored on a FAT32 external drive are prone to be skewed from those on internal drives that use UTC-based filesystems. Hence the drama—if you compare copies of your archive on USB sticks to copies on internal drives on a computer that's set to change its clock at DST rollover, timestamp-based programs like Mergeall may report all your files as different twice a year, even though you haven't touched them!
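The skew is easy to model with Python's standard datetime module. This is a simplified model, not how any particular operating system implements FAT32: it assumes fixed US Eastern offsets (UTC-4 in summer, UTC-5 in winter), stores a naive local time the way FAT32 does, and reattaches whatever offset the computer uses when the time is read back. The helpers fat32_record and fat32_read are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for US Eastern time: daylight (EDT) and standard (EST).
EDT = timezone(timedelta(hours=-4))
EST = timezone(timedelta(hours=-5))

def fat32_record(utc_mtime, current_tz):
    """Model a FAT32 write: keep only the local wall-clock time, no zone."""
    return utc_mtime.astimezone(current_tz).replace(tzinfo=None)

def fat32_read(stored_local, current_tz):
    """Model a FAT32 read: reattach whatever offset the computer uses now."""
    return stored_local.replace(tzinfo=current_tz)

# The internal drive keeps this absolute (UTC-based) time for the file.
true_mtime = datetime(2022, 10, 1, 18, 0, tzinfo=timezone.utc)

stored = fat32_record(true_mtime, EDT)   # copied to the FAT32 drive in summer
summer_view = fat32_read(stored, EDT)    # matches the internal drive's time
winter_view = fat32_read(stored, EST)    # read after the fall DST rollover
skew_seconds = (winter_view - true_mtime).total_seconds()  # 3600.0
```

Nothing about the file changed, yet after the rollover its FAT32 time reads one hour later than the internal drive's copy, so a timestamp-based comparison flags it as different.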
It's easy to verify this for yourself. First, copy some files from your computer's local drive to a FAT32-formatted external drive. Then, to trigger the time skew, either:
Time changes are a well-known source of problems on Windows. FAT32 DST rollovers don't impact the diffall program because it reads byte-for-byte (and is much slower as a result); but some programs can be derailed by them just like Mergeall (including software build and source-control systems), and other programs are sensitive to any automatic clock changes (including PyMailGUI, whose timer loop can hang indefinitely). FAT32's time-change issue may be a relic of computers past, but it's not going away anytime soon.
Luckily, this issue is easy to work around. If you will be using Mergeall in a way that makes DST rollover issues a possibility, you can address them with one of the following schemes, in roughly decreasingly recommended order:
fix-fat-dst-modtimes.py script at DST rollovers to quickly and automatically adjust last-modified times on your archive copy's files by an hour. This script changes file timestamps only, and doesn't rewrite any of your data. It's a fast fix that can be applied to either your FAT32 drive or local drive to bring them in synch twice a year as needed.
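The core of such a timestamp fix is small. The following is a hypothetical sketch of the idea using Python's standard os.walk and os.utime; it is not the code of fix-fat-dst-modtimes.py, which you should prefer for real use. It shifts every file's modtime under a folder by a given number of seconds and leaves contents untouched; which direction to shift, and which copy to adjust, depend on the rollover and your setup.

```python
import os

HOUR = 3600

def shift_modtimes(root, seconds):
    """Add `seconds` (may be negative) to every file's modtime under root.

    Only timestamps change; file contents are never opened or rewritten.
    Returns the number of files adjusted.
    """
    shifted = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # this sketch leaves symlinks alone
            st = os.stat(path)
            # Keep the access time; move only the modification time (in ns).
            os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + seconds * 1_000_000_000))
            shifted += 1
    return shifted
```

For instance, shift_modtimes(path, HOUR) moves every file ahead one hour, and shift_modtimes(path, -HOUR) moves them back.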
There's more coverage later on formatting drives with exFAT to work around the DST issue; it's presented in the context of cross-platform use, but applies to single-platform merges as well. The drive-formatting options above may be the best cure for DST tragedy, but they are not to be taken lightly—because formatting erases existing data, you'll want to either format drives up front, or recopy the drive's content from another copy after reformatting to use a new filesystem.
For tips on formatting your drives, try your computer's help resources or a web search. In brief: on Windows, right-click a drive's icon in Computer and select Format; on macOS, open Disk Utility in Launchpad and select your drive and Erase; and on Linux, right-click on the drive in Files and choose Format. We'll skip other techniques for space here, but add that Windows may not allow you to pick some FAT filesystems for large drives, though Mac will.
Also note that many external drives—especially larger ones—are shipped preformatted to use exFAT today, to leverage its features and portability; be sure to check your drive first to see if formatting is required.
This section's recommended fix—formatting external drives with exFAT in order to sidestep DST rollover issues—has now been proven definitively to work on Windows and macOS with no add-ons; and on Linux with the exFAT driver add-on described above. Specifically, the March 12, 2017 DST time change passed without making file timestamps out of synch between any internal and exFAT-formatted external drives, on any of these three platforms. That said, Linux can still be influenced anytime by changes in a system clock shared with Windows on dual-boot machines; fence hoppers beware.
Some systems, including some clouds, try to work around FAT32 time issues by assuming that a file changed exactly one hour (3,600 seconds) later or earlier hasn't really been changed at all, especially if its size is also unchanged. This heuristic hack seems a recipe for disaster: what if you really do change a file an hour later in a way that doesn't make it larger or smaller? Worse, what if you discard the original, assuming the 3,600-second system has copied it elsewhere? For such reasons, Mergeall refuses to employ such "good enough" and "probably never" solutions; software is not supposed to be opinion-based, especially when your content is on the line.
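To make the risk concrete, here is a sketch of the one-hour heuristic just described (not Mergeall's logic, which rejects it), along with a genuine same-size edit that it silently misses. The function hour_tolerant_same and the sample numbers are hypothetical.

```python
HOUR = 3600

def hour_tolerant_same(mtime_a, mtime_b, size_a, size_b):
    """The heuristic described above: an exact one-hour skew with an
    unchanged size is assumed to be no change at all."""
    skew = abs(mtime_a - mtime_b)
    return size_a == size_b and skew in (0, HOUR)

# A real edit saved exactly one hour later that keeps the same size,
# such as fixing a one-character typo in a 1,024-byte note.
original = (1_000_000, 1024)          # (modtime, size)
edited = (1_000_000 + HOUR, 1024)
missed = hour_tolerant_same(original[0], edited[0], original[1], edited[1])
```

Here missed is True: the edited file is judged unchanged and would not be propagated.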
It's worth noting that the FAT32 filesystem also records file modification times with a limited two-second granularity; modtimes are accurate only within a two-second range, because they save seconds as two-second intervals. This can also throw comparisons off: files that are really the same may look different due to their limited timestamp accuracy. In this case, though, no user action is required, because Mergeall works around the problem automatically. Although this also applies a heuristic, you're less likely to change and discard a file just one second after Mergeall compares it. Check out the docstring formerly in mergeall.py for the full story.
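A tolerance window of this kind can be sketched as follows. This is an illustration under stated assumptions, not Mergeall's exact rule (its docstring mentioned above has that): the model assumes a FAT32 write truncates odd seconds to even ones, though some implementations round up instead, which the symmetric window also covers. The names fat32_stored and modtimes_match are hypothetical.

```python
FAT32_STEP = 2  # FAT32 stores modtime seconds in two-second units

def fat32_stored(mtime):
    """Model a FAT32 write, assuming odd seconds are truncated to even ones."""
    return mtime - (mtime % FAT32_STEP)

def modtimes_match(mtime_a, mtime_b, window=FAT32_STEP):
    """Treat two modtimes as equal when they differ by less than the step."""
    return abs(mtime_a - mtime_b) < window

original = 1_000_001               # odd-second modtime on an internal drive
on_fat32 = fat32_stored(original)  # 1_000_000 after copying to FAT32
```

With this window, modtimes_match(original, on_fat32) is True even though the raw values differ, while a difference of three seconds or more still counts as a change.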
Both exFAT and FAT32 also have symlink limitations on Windows only (as in, they cannot recognize or record symlinks at all!), but this is a rare and obscure type of file that is not a part of normal content, and most (if not all) Mergeall users do not need to care. If you do, though, see the coverage ahead.
As of late 2019, the exFAT implementation on Samsung Android devices is nonstandard enough to qualify as unusable for most Mergeall use cases; see the update at the end of this section. On these devices alone, FAT32 is generally your best interoperability bet for external drives. Update: this glitch was fixed in 2020, per the same updates list; exFAT is now normally the best removable-drive option on Samsung Android too.
Short story: ignore spurious diffall reports for older Excel files that you've viewed but haven't changed. Their content may differ in trivial and unimportant ways, even if their modified times are the same.
Also in the timestamp department: if you use a newer Microsoft Excel to open an older spreadsheet, you should be aware that Excel may change the file's content trivially, without updating the file's last-modified timestamp. This does not make the file register as a difference in the timestamp-based Mergeall, but it does in the byte-by-byte diffall. It's safe to simply ignore these files in diffall, as the Excel change is just metadata that doesn't have anything to do with your spreadsheet's content. Still, it's a special case for archive tools, and arguably a bug in Excel's behavior!
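Python's standard filecmp module shows both views of such a file. With shallow=True, filecmp.cmp trusts matching os.stat signatures (file type, size, and modtime) without reading data, roughly like a timestamp-based check; with shallow=False it compares bytes, roughly like diffall. The helper compare_both_ways is hypothetical, and the scenario assumes the trivial change kept the file's size, which may not hold for every Excel rewrite.

```python
import filecmp

def compare_both_ways(path_a, path_b):
    """Return (shallow_result, deep_result) for two files.

    shallow=True accepts matching os.stat signatures (type, size, modtime)
    without reading any data; shallow=False compares the actual bytes.
    """
    shallow = filecmp.cmp(path_a, path_b, shallow=True)
    filecmp.clear_cache()  # don't reuse an earlier cached verdict
    deep = filecmp.cmp(path_a, path_b, shallow=False)
    return shallow, deep
```

For two same-size files with equal modtimes but one differing byte, this returns (True, False): a timestamp-style check sees no difference, while a bytewise one does.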
Though exceedingly rare, it's also worth noting that other programs which change file modtimes may also subvert file timestamp-based programs like Mergeall. In particular, any program that copies over prior modtimes after changing content may make changed files register as unchanged, and prevent Mergeall propagation. This was the case with an initial design in PyPhoto's thumbnail-file generator, a tool which uses modtimes for its own change detection, but was fixed by a later design that stored original modtimes separately from thumb files. Modtime cheaters are rare (and no other instances have been seen or reported to date), but any other cases are officially outside Mergeall's scope. For more on the PyPhoto use case, see its file viewer_thumbs.py available in its online source code.
Short story: Mergeall can't remove or replace files that are marked as read-only, locked, or otherwise in-use. Change permissions and rerun Mergeall to synch these files.
Besides timestamps, Mergeall is also dependent on permission settings of files it must modify to bring your archive copies in synch. If a file to be updated or removed is marked as read-only, is locked, or is otherwise in use when Mergeall reaches it, it will likely fail to update, generate an error message in the Mergeall log file, and leave an unresolved difference to be addressed in future runs.
To spot permission-related errors, either:
To fix permission-related errors, either:
Either way, fixing permission failures may require a manual step, but Mergeall never removes read-only settings itself, because your data is your personal property. If you mark a file as read-only to protect it, Mergeall will respect your choice until you lift the restriction. Shouldn't every program?
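If you'd rather spot read-only files before a merge than after it, a short Python scan can help. This is a minimal sketch, not part of Mergeall, and the function name find_readonly_files is hypothetical. It treats a file as read-only when its owner-write permission bit is off (which is how Python's os.stat reports the Windows read-only attribute), and also flags files carrying the user-immutable flag (the macOS Finder "Locked" setting) where the platform exposes st_flags. Like Mergeall, it only reports, and never changes permissions.

```python
import os
import stat

def find_readonly_files(root):
    """Return paths under root that a merge would likely fail to update.

    Reports files whose owner-write bit is off, plus files carrying the
    user-immutable flag where the platform exposes st_flags.
    Nothing is modified.
    """
    flagged = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable: skip it
            if stat.S_ISLNK(info.st_mode):
                continue  # symlink permission bits are not meaningful here
            not_writable = not (info.st_mode & stat.S_IWUSR)
            locked = getattr(info, "st_flags", 0) & stat.UF_IMMUTABLE
            if not_writable or locked:
                flagged.append(path)
    return flagged
```

For example, find_readonly_files("/path/to/archive") returns a list of paths to review (the path shown is a placeholder); what you do with them is left to you.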
Curiously, some macOS systems may automatically lock files that have not been edited for two weeks, which can all but guarantee future Mergeall update failures in some scenarios (propagating files changed elsewhere to the machine with locked files, for instance). This may have been in support of Apple's Time Machine backup system, and may not be present in all Macs, but seems pointless and extreme (its only rationale seems to be Mac auto-saves, another curious and extreme model). To disable automatic file locking, uncheck the option in Time Machine's System Preferences pane, if present. For more information, run a web search.
Short story: Mergeall reports differences based on file modification times and sizes, and folder structure. diffall does a more-complete bytewise comparison that's much slower, but should be run periodically to verify your content.
Mergeall excels at comparing folder trees fast, but its results are only as
good as file last-modified timestamps, which, as we've seen in prior sections,
can sometimes be unreliable. If you want to be really
sure that an archive copy matches the original, run the included
diffall.py
script
to perform a byte-by-byte comparison of each file.
diffall compares full content instead of detecting last-modified timestamp mismatches. It also runs much slower—one large tree that compares in 8 seconds with Mergeall requires 12 minutes in diffall for a given set of drives. That's almost 100X slower, and an example of the reason Mergeall was written in the first place. Life is too short for brute force copies and compares.
Still, given the many ways that timestamps and storage devices can fail, it's recommended to run diffall on archive copies occasionally to verify that your content is truly in synch. So start a diffall, grab a coffee, and watch the bytes fly. Unlike Mergeall, diffall has no GUI, and is run only from a command line; see ahead for pointers on this mode.
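The difference between the two kinds of comparison can be sketched in a few lines of Python. This is an illustration under assumptions, not code from Mergeall or diffall. quick_same checks only size and modtime, with a 2-second slack chosen here because FAT32 stores modtimes at 2-second granularity. bytes_same uses the standard library's filecmp.cmp with shallow=False, which reads and compares full file content.

```python
import filecmp
import os

def quick_same(path1, path2, slack=2):
    """Timestamp-style check (a sketch): equal sizes and near-equal modtimes."""
    st1, st2 = os.stat(path1), os.stat(path2)
    return st1.st_size == st2.st_size and abs(st1.st_mtime - st2.st_mtime) <= slack

def bytes_same(path1, path2):
    """Bytewise check: compare full content, ignoring modtimes."""
    return filecmp.cmp(path1, path2, shallow=False)
```

The two can disagree in both directions: identical content with different modtimes fails the quick check, while same-size files with equal modtimes but different bytes (the "modtime cheater" case above) pass it. Only the slower bytewise check catches the latter.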
Short story: errors happen—especially when using storage devices with limited lifespans. Be sure to check for error messages in Mergeall logs, and address as needed.
Speaking of verifying results: you should also generally verify Mergeall's actions by checking its saved log file for error messages. See above for notes on enabling log files. They report a brief summary at the end which normally suffices, but they may also contain error messages for operations that may have failed along the way.
When errors occur, you will usually see a line at the end of the "*Summary" section that looks like this:
**There are error messages in the log file above: see "**Error"
When this message appears, search the saved log file for string "**Error" to find any updates-related error messages quickly. This is especially useful for isolating file read-only permission failures in a large archive's merge (more on these above).
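If your editor's search isn't handy, a few lines of Python can pull the matching lines out of a saved log. This is a hypothetical helper, not a Mergeall tool. It assumes only that the log is a text file (read here as UTF-8, with undecodable bytes replaced) and that error lines contain the "**Error" marker.

```python
def find_error_lines(log_path, marker="**Error"):
    """Return (line_number, text) pairs for log lines containing marker."""
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for lineno, line in enumerate(log, start=1):
            if marker in line:
                hits.append((lineno, line.rstrip("\n")))
    return hits
```

Because it matches a plain substring, the summary line quoted above also contains the marker, so expect it among the results too.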
For more fidelity, a quick Mergeall rerun in report-only mode can also verify success or pinpoint any files that failed to merge, and diffall can be used to verify results when you want a full content comparison per the preceding note. Mergeall may be automatic, but automation can and should only go so far, especially when handling your valuable content; human operators (i.e., you) should still expect to step in when things go wrong.
It's worth noting that Mergeall is clever enough to ignore some errors that are irrelevant to your
merge. On Macs, for example, hidden ._X
AppleDouble resource-fork
files (described ahead) may be automatically
removed with their X
data-fork (i.e., real) file counterparts. If a folder removal
gets to the data-fork file first, it ignores errors that may arise when trying to remove
a resource-fork file that has already been removed automatically (rare, but true!). A similar
error is skipped for auto-removed folders on Windows. Like others, both of these cases would be
cleared up by simply rerunning Mergeall, but you should check the log to be sure.
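The error-skipping behavior just described boils down to treating "already removed" as success. Here is a minimal sketch of that pattern in Python; the function name is hypothetical, and nothing about Mergeall's own code is assumed beyond the description above.

```python
import os
import shutil

def remove_if_present(path):
    """Delete a file or folder; return False if it had already disappeared."""
    try:
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    except FileNotFoundError:
        # Something else (e.g., the platform removing a ._X companion
        # along with its X file) got there first; nothing is left to do
        return False
    return True
```

Other errors, such as permission failures, still propagate, which is the point: only the harmless case is skipped.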
Short story: Mergeall's -backup
mode saves backup copies of every
file removed or replaced during a run, so you can restore them manually or
by automatic rollbacks. You want to do this, unless your drive is out of space,
or you're recopying an archive in full.
When running Mergeall in updates mode, always be sure to use its -backup
mode argument—and its corresponding toggle in the
GUI—to
save modified files, unless you are too tight on space to store backup copies.
If used, backups mode allows you to back out any of Mergeall's changes in the
future, and completely roll back a Mergeall run immediately after it finishes.
Backups mode keeps a copy of every file replaced or removed in the TO tree,
and notes all files added to TO. This saved data and additions list show up
in the TO tree's __bkp__
folder—Mergeall's equivalent of a recycle bin—and
is retained for a fixed number of runs. When available, this data can be
used to manually back out specific items, or cancel an entire run's changes
with the included rollback.py
script—indispensable
if you accidentally swap FROM for TO!
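To make the backup-then-change idea concrete, here is a toy model in Python. It is a hypothetical sketch, not Mergeall's implementation, and it does not use Mergeall's actual __bkp__ layout. BackupLog saves a copy of each TO file before replacing or removing it, remembers files it adds, and rollback() reverses all three kinds of change. Unlike Mergeall, it keeps the additions list in memory only.

```python
import os
import shutil

class BackupLog:
    """Toy model of backup-then-change for one run (not Mergeall's format)."""

    def __init__(self, to_root, bkp_dir):
        self.to_root = to_root    # the TO tree being updated
        self.bkp_dir = bkp_dir    # where prior versions are saved
        self.added = []           # relative paths of files newly added to TO

    def _save(self, relpath):
        """Copy the current TO version into bkp_dir, once per run."""
        dst = os.path.join(self.bkp_dir, relpath)
        if os.path.exists(dst):
            return  # keep the version that existed before this run
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(os.path.join(self.to_root, relpath), dst)

    def replace(self, relpath, from_path):
        self._save(relpath)
        shutil.copy2(from_path, os.path.join(self.to_root, relpath))

    def remove(self, relpath):
        self._save(relpath)
        os.remove(os.path.join(self.to_root, relpath))

    def add(self, relpath, from_path):
        dst = os.path.join(self.to_root, relpath)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(from_path, dst)
        self.added.append(relpath)

    def rollback(self):
        """Erase additions, then copy every saved original back into TO."""
        for relpath in self.added:
            os.remove(os.path.join(self.to_root, relpath))
        self.added.clear()
        for dirpath, _dirnames, filenames in os.walk(self.bkp_dir):
            for name in filenames:
                saved = os.path.join(dirpath, name)
                dst = os.path.join(self.to_root, os.path.relpath(saved, self.bkp_dir))
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(saved, dst)
```

As in the text, only items changed in TO are saved, which is why such backups usually stay small.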
Because they record only items changed in the TO tree, backups are usually very small, and there's generally no reason to avoid them (they incur a minor speed hit to save changed data, but it's usually negligible). On the contrary, skipping backups mode means there is no way to undo Mergeall's changes—a sizeable risk, given the tendencies of computers, drives, and, yes, humans to fail. In fact, backups are so important that they are "on" by default in the GUI launcher, so no action is normally required. As a rule, they should be disabled only in the very unlikely event that your target drive has space for changes only.
One exception worth noting: you may not want to use backups mode
if you're allowing Mergeall to rewrite your archive in full twice a year
on DST rollover, as described earlier. Because this
scheme replaces every file in the archive, using backups means you'll
create a complete and redundant copy of your archive in its __bkp__
folder—and
may exceed your drive's space limits in the process. In all other use
cases, though, Mergeall's backups mode is strongly recommended.
For the full story on backups, see the original whitepaper's coverage of backups and rollbacks (a.k.a. restores). We'll revisit this topic in the usage cautions ahead, because it's one of the best things you can do to safeguard your content.
Short story: when backups are enabled, Mergeall's -restore
mode or
rollback.py
script can be used to back out all the changes made by the prior run. Use this in case
of catastrophic failure—if a drive dies, or you accidentally transpose your FROM and
TO trees, for example.
This pointer is an offshoot of the former, but it's worth calling out separately: if a run goes horribly awry, you can back out all its changes in a single step, by running a full rollback of the prior run's updates. This restores your archive to its former self, before the merge gone bad. Rollbacks can be kicked off in two ways:
Run the rollback.py script via command-line or Windows click, and provide the directory path of your archive's root (top-most) folder as either a command-line argument, or console input when prompted.
Run Mergeall with its -restore command-line argument, though you'll need to also locate and provide the latest backup's folder path manually on a command line.
Both rollback techniques assume that you haven't made any changes you wish to save since the merge being rolled back, and require that the prior run was made with backups enabled—yet another reason to do so. Assuming your archive qualifies, though, a rollback will put back items replaced or removed and erase items added, restoring the archive to its state before the latest Mergeall run.
It's even possible to roll back multiple Mergealls: simply delete the
most-recent backup (__bkp__) folder after each rollback.py
run. This
allows you to completely reset all the content on a backup device to the
state it was in on a prior date, as long as that device was used only for
backups-enabled Mergealls since that date.
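When you need a backup folder's path for a manual -restore, or want to confirm which one a rollback will use, a helper like this can locate the newest one. It is a hypothetical sketch under assumptions: it supposes each run's backups live in their own subfolder of the archive's __bkp__ folder, and picks the one with the latest modification time. If your backup folders' names sort by date, sorting by name may be more reliable, since a folder's modtime changes whenever its direct contents do.

```python
import os

def newest_backup_folder(archive_root, bkp_name="__bkp__"):
    """Return the most recently modified subfolder of archive_root/__bkp__, or None."""
    bkp_root = os.path.join(archive_root, bkp_name)
    if not os.path.isdir(bkp_root):
        return None
    subdirs = [os.path.join(bkp_root, name) for name in os.listdir(bkp_root)]
    subdirs = [path for path in subdirs if os.path.isdir(path)]
    if not subdirs:
        return None
    return max(subdirs, key=os.path.getmtime)
```

Always inspect the result before deleting anything; this helper only reports a path.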
Rollbacks are an emergency measure, and shouldn't be performed lightly; you're always better off being careful with your selections when launching a data-changing tool like Mergeall. When needed, though, both piecemeal restores and full rollbacks from backups can set your archive right again.
For more details on how to invoke rollbacks, see the original whitepaper's in-depth coverage, which we'll omit here for space.
Deltas and restores:
as of Mergeall 3.2
in 2021, the -restore
rollback mode is also used to apply and undo changes
saved with the new
deltas.py
script, and rollbacks themselves can finally
be rolled back too. Use cases for the former are potentially many, though the latter may
be rare. See the new script for the latest info. Rollbacks work to apply delta sets because
these sets are saved in the same format as __bkp__
backup folders—though the
effect for deltas is really a "roll forward."
Short story: use the -skipcruft
mode to avoid propagating platform-dependent
metadata files to archive copies and other computers, and/or run a provided script to
remove such files on demand. The mode retains these files on their creating platform—and
only!
If you plan on using your archive on multiple platforms, you should also be aware that some are prone to create numerous files in your archive's folders, to store metadata used only by the platform on which these files are created. These items, including both simple files and complete folders, are usually small in size and are normally hidden by default. But they are also generally useless clutter—a.k.a. cruft—on other platforms, and can wreak havoc in many usage scenarios and programs. They can be especially problematic for data archiving tools like Mergeall which process files generically.
macOS is particularly notorious for generating cruft files alongside your content. As a very partial sample:
.DS_Store file can be created on a Mac just by viewing a folder in Finder.
.TemporaryItems folder might be left behind by file operations.
.com.apple.timemachine.donotpresent pops up on drives excluded from backups.
.fseventsd folder is added to manage file-system events.
._* companion file may appear whenever a file is used on a non-Mac drive, even if it's just opened.
.DS_Store and a ._.DS_Store companion!
This can easily get out of control; like them or not,
Macs can create so many cruft items that it can be difficult to see your content's actual files.
Nor is Mac the only offender. Windows and Linux computers can add cruft files too
(e.g., desktop.ini
folder-view options and Thumbs.db
icon caches on Windows),
though far less often than Macs. Installed programs may also create platform-specific
items—including Python's own .pyc
bytecode files, which always report
differences if compared between platforms, always trigger recompiles if copied
between platforms, and may be fairly labeled as undesirable cruft in
cross-device archives.
To see how big an issue cruft files may be, unhide them on your computer. On Windows, set your Folder View options to show hidden files in Explorer (or try its View tab if it has one). On Mac, run this in Terminal (and replace "TRUE" with "FALSE" to rehide files after you've recovered from the shock!):
defaults write com.apple.finder AppleShowAllFiles TRUE;killall Finder
Even with this trick, Mac's Finder still won't show you ._
companion files on non-Mac
drives (or .DS_Store
files as of Sierra), but a ls -a
command in its Terminal will. If you are familiar with programming, a Python
os.listdir()
run at its interactive prompt on any platform shows all files
too—hidden or not.
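As a slightly fuller version of that os.listdir() check, the sketch below lists a folder's hidden items. The function name is hypothetical. It counts a name as hidden if it starts with a dot (the Unix and Mac convention) or, on Windows, if the file's hidden attribute is set, which Python exposes as st_file_attributes there and nowhere else.

```python
import os
import stat

def hidden_names(folder):
    """Return names in folder hidden by a leading dot or, on Windows, by attribute."""
    hidden = []
    for name in os.listdir(folder):  # listdir shows everything, hidden or not
        info = os.lstat(os.path.join(folder, name))
        attrs = getattr(info, "st_file_attributes", 0)  # Windows only
        if name.startswith(".") or attrs & stat.FILE_ATTRIBUTE_HIDDEN:
            hidden.append(name)
    return sorted(hidden)
```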
While some casual users may safely keep hidden cruft files hidden and ignore this issue altogether, anyone who produces content on computers will likely need to care about these files' presence eventually. If your job title or hobby includes uploading websites, packaging programs for release, writing file-processing tools, or exchanging any sort of data in any sort of way, phantom cruft files can be a major nuisance.
Luckily, Mergeall provides two options to avoid duplicating these files to other copies and computers, as the next section explains.
If you work on just one platform, you may not need to care about any of these items—they serve roles (e.g., Mac companion files simulate Mac filesystems), might be useful to include in your archives, and probably shouldn't be blindly deleted in any event. If you use your archive on multiple platforms, though, you may not want such files and folders to clutter your content. To prevent cruft from showing up in your archives, you can do either of the following:
Run the nuke-cruft-files.py script to delete
all these items from an archive copy on demand. You can run this
to clean up on the platform that creates these files; on an intermediate
transfer device (e.g., a USB or network drive); or on the platform
to which you're mirroring the archive. In any of these modes,
this script provides a brute-force but manual way to purge cruft
from your archive copies. See its code for more on its operation,
as well as more about cruft in general.
Use the -skipcruft option, available in the mergeall.py, diffall.py, and cpall.py programs,
to ignore these items entirely in both difference reports
and updates (the related ziptools
system supports this flag too). This option automatically keeps platform-specific
cruft items out of your archive copies, except on the platform on which they are
created. Mac cruft stays on the Mac, and ditto for Windows and Linux.
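To preview what a cleanup would remove before deleting anything, a dry-run scan like the following can help. It is a hypothetical sketch, not nuke-cruft-files.py: the CRUFT_PATTERNS list here is an illustrative sample drawn from the items named above, not the presets shipped in mergeall_configs.py, and matching is case-sensitive via fnmatch.fnmatchcase.

```python
import fnmatch
import os

# Illustrative sample only; the shipped presets live in mergeall_configs.py
CRUFT_PATTERNS = [".DS_Store", "._*", ".TemporaryItems", ".fseventsd",
                  ".com.apple.timemachine.donotpresent",
                  "desktop.ini", "Thumbs.db", "*.pyc"]

def is_cruft(name, patterns=CRUFT_PATTERNS):
    """True if a bare file or folder name matches any cruft pattern."""
    return any(fnmatch.fnmatchcase(name, pat) for pat in patterns)

def find_cruft(root, patterns=CRUFT_PATTERNS):
    """Dry run: return cruft paths under root without deleting anything."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_cruft(name, patterns):
                found.append(os.path.join(dirpath, name))
        for name in list(dirnames):
            if is_cruft(name, patterns):
                found.append(os.path.join(dirpath, name))
                dirnames.remove(name)  # don't descend into a cruft folder
    return found
```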
The first of these options—the script—can also be used to create
an initial cruft-free archive copy, and to clean a folder accessed on a Mac
but never the subject of a Mergeall (for a graphic example of why this might be
useful, see your Windows USB or network drives with hidden files visible after
a Mac session).
The second option, -skipcruft, is more automatic, and is supported by all
three of the Mergeall system's main programs as follows:
mergeall.py:
The -skipcruft
command-line option—and its corresponding toggle in the
GUI—ignores
cruft files and folders in both the FROM and TO folder trees.
This means that cruft items:
In other words, this option allows you to avoid both copying metadata files to TO if absent, and removing or replacing them on TO if already present. For example, a Mac's cruft is neither deleted when it is TO, nor copied to other drives when it is FROM. It stays on the Mac, but isn't copied to drives or computers where it's irrelevant.
The net effect is that all your archive copies still wind up the same after merges, except for their unique cruft items, which are allowed to vary on each device. When used for all your merges, platform-specific items remain on the creating platform, but are not transferred to other copies or computers. That's ideal for multiple-computer users.
The -skipcruft
option may slow Mergeall runs slightly,
but not enough to be a concern. As one metric, a large archive of 98G space and 60k
files generally compares with and without cruft skipping in 3.8 and 1.8 seconds on
a fast computer, respectively; and in 10 and 7 seconds on a moderately fast computer,
respectively—a trivial 2- or 3-second penalty. In exchange, you can focus on your
actual content, instead of dealing with the union of all your platforms' cruft.
diffall.py:
The -skipcruft
command-line option ignores cruft files and folders in both trees,
so they won't be reported as differences. This works much like Mergeall's
comparison reports, and has similar benefits: you can focus on your content,
not system cruft. This option has no noticeable impact on speed in diffall,
because the script spends most of its time reading data in full.
cpall.py:
The -skipcruft
command-line option ignores cruft files and folders in the source tree,
thereby preventing them from being copied to the destination. This is similar
to what some copy/paste and drag-and-drop copiers do, but it's a switchable
option in cpall. The speed overhead of this option is also irrelevant here,
as the time needed to write files overshadows all else.
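Mergeall's two-sided skipping can be modeled at a single folder level: drop cruft names from both FROM and TO before working out what to copy and what to delete. This is a simplified, hypothetical sketch, not Mergeall's code. The CRUFT list is an illustrative sample, and only names unique to one side are considered, not changed files.

```python
import fnmatch

# Illustrative sample of cruft patterns, not Mergeall's shipped presets
CRUFT = [".DS_Store", "._*", "Thumbs.db", "desktop.ini", "*.pyc"]

def plan_sync(from_names, to_names, skipcruft=True):
    """Return (names to copy to TO, names to delete from TO) for one folder."""
    def keep(name):
        if not skipcruft:
            return True
        return not any(fnmatch.fnmatchcase(name, pat) for pat in CRUFT)
    src = {name for name in from_names if keep(name)}
    dst = {name for name in to_names if keep(name)}
    return src - dst, dst - src
```

With skipcruft on, plan_sync({"notes.txt", ".DS_Store"}, {"old.txt", "Thumbs.db"}) copies only notes.txt and deletes only old.txt: the Mac's .DS_Store is not copied, and TO's Thumbs.db is not removed, so each side keeps its own cruft.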
Cruft filename patterns are defined in the
mergeall_configs.py
file described
earlier. You can tailor
them if needed, but the "factory presets" include common cruft file names on Mac,
Windows, and Linux, and most users can safely use the definitions as shipped.
For instance, the preset for Python bytecode means that .pyc files are skipped in both FROM (so
they're never copied) and TO (so they're never removed).
To wrap up, keep in mind that you might not want to
skip cruft files in your archives if you work on a single platform—and
if this is your story and you can't imagine why you should care, you probably don't need to.
For such users, merges without the -skipcruft
option still treat cruft files
like any other, copying them to and from other drives and computers, and reporting and
synchronizing them when they differ. In this mode, what you save is what you'll get;
Mergeall doesn't treat cruft files specially, but your platform may (if you use a
Mac, see the upcoming pointer on resource forks).
On the other hand, most people who use—or may someday use—an archive copy on
multiple platforms are likely to care about their content being corrupted with system files
which serve no purpose on other platforms, may or may not be hidden in other programs,
and can seem downright rude. Especially for users in this category,
-skipcruft
is generally recommended for content portability.
It keeps your archive copies free of both files that naturally vary per platform,
and unfortunate artifacts of proprietary engineering choices.
For a more technical look at cruft handling, see also
its new feature summary in the Whitepaper;
the cruft-skipping examples sketched in mergeall_configs.py's comments;
and the run logs available in the test folder (the
HTML files there provide the quickest look).
Cruft also rears its head in the ziptools package developed for
Mergeall testing but available separately, as well as website
generation and
upload scripts.
And for more on how to leverage cruft-skipping, read on to the
next pointer's coverage of Mergeall cross-platform usage patterns.
One final crufty hint: Mac users may also wish to turn off the
auto-save mode of common apps, which writes files whenever they
are opened and closed—and updates their last-modified timestamps in the process,
guaranteeing a possibly pointless Mergeall copy, and usually useless ._*
cruft files
on non-Mac drives. System Preference's
"Ask to keep changes when closing documents" or Terminal commands of the following
form may do the job
(see more details here
and here):
defaults write com.apple.Preview ApplePersistence -bool no defaults write com.apple.TextEdit ApplePersistence -bool no defaults write com.apple.TextEdit AutosavingDelay -int 0
Caveat: you'll need to save your files yourself after doing this, but in-place auto-save is a controversial and dubious feature in the first place (why would a text editor automatically overwrite files with experimental edits without your consent?), and its impact on backup and archiving tools seems unfortunate at best. Luckily, Mac users seem to have convinced their vendor to make it optional.
As of macOS Sierra (10.12), setting your defaults to display hidden files as
described above still works as before, but Finder has been special-cased to
never display .DS_Store
files. That is, the .DS_Store
files are still there
(and can be seen via a ls -a
in Terminal, an os.listdir()
in Python,
or a view on Windows, Linux, or Android),
but Finder will no longer show them to you—even if you ask it to.
There's more discussion on this
here.
This seems almost antagonistic towards content producers, who need to care about
all the files in their folders. Hopefully, macOS will someday find a better way
to store Finder metadata than dumping it all over our drives and pretending it's not
there, but this policy is still in force as of High Sierra (10.13).
As it stands, most macOS users are left to puzzle over why the act of viewing a folder
is enough to change its modtime.
Short story: this section ties together prior topics in a generally recommended
model. To simplify using your content on multiple platforms, format external drives as
exFAT where possible, and keep them cruft-free by always using -skipcruft
on all platforms.
You probably also want to run a script to adjust nonportable filenames before transfers
from Unix, per one of this section's updates.
As we've seen, Mergeall works portably on Windows, macOS, and Linux. We've also seen that there are different ways to use Mergeall when working on multiple computers, which we won't rehash here. If you work on computers with multiple operating-system platforms, though, and use external USB drives in one of the multiple-computer usage patterns described earlier, then you will probably want to:
Making a smart choice on filesystems turns out to be complicated but crucial in a multiple-platform scenario. In short, FAT32 is still the gold standard in portability, and is supported by nearly every device out there with a USB port. On the other hand, the newer exFAT is almost as portable, and completely eliminates FAT32's timestamp-skew issues on DST rollovers discussed earlier on both Windows and macOS, though Linux users must enable support, and may still need to address some timestamp skew with a procedural solution.
In a bit more detail, filesystem choice is simpler when you're running on just one platform. Each platform provides a set of native—if often proprietary—options, including FAT32, exFAT and NTFS on Windows; HFS+ (a.k.a. macOS Extended) on Mac; and the ext variants on Linux. Unfortunately, most of these are off the table when you go cross-platform. Linux, for example, lacks exFAT out of the box; Mac's NTFS support is just read-only as shipped; and Windows does neither Mac's HFS+ nor Linux's ext by itself.
FAT32 and exFAT are the exceptions to these rules. Of these, FAT32 is the most widely supported filesystem across Mergeall's platforms today. In fact, it's currently the only direct option for full portability that does not require unsupported switches or third-party drivers. FAT32 may perform less optimally than some alternatives in some contexts, but the difference is probably trivial for most users.
The newer exFAT is almost as portable as FAT32, but not quite. It is supported natively on both Windows and Mac without any extra steps, but must currently be enabled on Linux with an additional install. For the latter, a command-line like the following suffices on modern Ubuntu Linux platforms, after which exFAT drives mount in read/write mode automatically (see the web for other options if this doesn't work for you):
sudo apt-get install exfat-fuse exfat-utils
But wait: if FAT32 is most portable, why not always use it everywhere? In a word, timestamps. As noted earlier, FAT32 records them as a "local time," which triggers spurious differences in tools like Mergeall and others when time zone or DST changes kick in. The exFAT filesystem records timestamps using UTC standard time, which sidesteps the nasty comparison issues of FAT32 altogether on both Windows and Mac; your files will just continue to synch normally after the system adjusts your time at DST changes. exFAT also supports larger files than FAT32 (per earlier), though this is incidental to Mergeall's synchronization.
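The arithmetic behind FAT32's DST skew can be modeled in a few lines. This is a simplified sketch under stated assumptions: fixed UTC offsets of -5 and -4 hours stand in for a zone's standard and daylight times, FAT32 is modeled as storing a bare local wall-clock time, and exFAT is modeled simply as keeping UTC, per the description above. A file written in winter and read back after the spring change appears an hour older than it really is, which is exactly the kind of modtime mismatch that makes a timestamp-based tool see files as changed.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for one zone's standard and daylight time (assumed)
STANDARD = timezone(timedelta(hours=-5))
DAYLIGHT = timezone(timedelta(hours=-4))

def fat32_roundtrip(utc_moment, write_offset, read_offset):
    """Model FAT32: store naive local time at write, reinterpret it at read."""
    stored_local = utc_moment.astimezone(write_offset).replace(tzinfo=None)
    return stored_local.replace(tzinfo=read_offset).astimezone(timezone.utc)

def exfat_roundtrip(utc_moment):
    """Model exFAT as keeping UTC, so the moment reads back unchanged."""
    return utc_moment.astimezone(timezone.utc)

written = datetime(2022, 1, 15, 12, 0, tzinfo=timezone.utc)
seen_after_dst = fat32_roundtrip(written, STANDARD, DAYLIGHT)
skew_seconds = (seen_after_dst - written).total_seconds()  # one hour early
```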
The only significant catch here: Linux exFAT support isn't quite as complete as it is on Windows and Mac, and may do no better in some contexts than FAT32. Its currently available version may survive DST rollovers, but might not adjust on time zone changes, and might not record UTC times on file writes. For the full story, try this page's bug report, these field notes, or a general web search.
The upside is that Linux can be addressed with one of the more custom approaches
described earlier; the fix-fat-dst-modtimes
script,
for example, can be used to bring timestamps back in sync with a local Linux drive
if needed.
If the exFAT story on Linux seems too iffy, you can also resort to FAT32 drives;
if so, a timedatectl set-local-rtc 1
may help if drive times seem radically off
(details). Neither exFAT nor
FAT32 support symlinks on Windows, but this is likely a factor to few (if any)
Mergeall users; see the details ahead.
In sum: exFAT is your best option if you'll be using Mergeall to manage a drive on Windows, Macs, or both—it's an automatic DST fix for both single- and multiple-platform users. exFAT is recommended if Linux will be in your platform mix too, but you may want to either run merges to a shared partition on the Windows side only; use one of the other fixes described earlier to manage timestamps on a local Linux drive; or lobby Linux developers to get past exFAT's patent issues and make it a first-class filesystem citizen. (Update: the exFAT solution has now been proven to work on Windows, macOS, and Linux, by the March, 2017 DST rollover; see above.)
To keep your archives portable, allow your computers to generate as much cruft as
they wish, but keep your external drive cruft-free. The first step in this
scheme is making archive copies on your drives.
To create an initial cruft-free copy for your external drives from another
copy, use either Mergeall or the cpall.py
script
with the -skipcruft
option. To decruft an existing copy on an external drive,
use the nuke-cruft-files.py
script described
earlier. To verify your copies, run Mergeall's
report mode and/or diffall.py's bytewise compare,
again with -skipcruft
in both.
-skipcruft for local drives:
To keep your external drive copies cruft-free, always use the -skipcruft
option
(and its toggle in Mergeall's GUI)
when reporting or transferring changes made on any
computer's local drive, per the prior section.
For reports, this avoids treating the local drive's cruft files as differences.
For transfers, it ensures that platform-specific items will be retained on the
creating computer, but kept off external drives, and thus never propagated to other
computers. More specifically:
The combination of these two means that cruft will remain on platforms where it is used, but won't be propagated to platforms where it is pointless, and won't accumulate in your external copies. Each computer will retain just the cruft that is created by that computer: Mac cruft will never be mirrored through external drives to Windows, and vice versa.
While there are many ways to use Mergeall (and advice for network drive users may vary), these guidelines allow each platform to use its own proprietary files normally, while your external drives remain free of platform-specific additions. That is, your content will retain just your actual content, and will not include the union of each platform's clutter. Yours will be a data-archiving world blissfully ignorant of proprietary platform quirks, designed, perhaps, to rope you in to a single vendor's offerings—except, of course, for other oddments such as end-lines and file pathname syntax which are beyond this note's scope.
The filesystem Tower of Babel recently grew a new floor. As this was being written, Apple announced a new Apple File System (APFS) which is optimized for flash storage, and poised to subsume filesystems on multiple Apple products, including HFS+ on macOS (a.k.a. Mac OS, whose "X" suffix has also been strangely deprecated). Though the future remains to be written, this seems likely destined to be as proprietary as other single-platform filesystems: using it for your external drives may lock you into an Apple-only world. When in doubt, choose a portable filesystem like exFAT. (Update: APFS is now the default filesystem as of macOS High Sierra 10.13, and is even mandatory for flash-based system drives; engineers love to change things, and some seem to enjoy imposing them too.)
This guide recommends using the exFAT filesystem on external drives to avoid FAT32's time-change problems. Regrettably, some devices do not implement exFAT correctly, and may still have interoperability issues when comparing content on exFAT drives to others. Notably, the implementation of exFAT on Samsung Android devices is nonstandard, and cannot generally be used for Mergeall as of late 2019. For more details, see the case study of Samsung's exFAT problem at this page. Until Samsung resolves this, FAT32 is recommended for external drives on this platform alone. Which leads us to the next update...
As of 2020, Samsung has fixed the prior update's exFAT glitch in its version of Android 10, which makes its exFAT fully usable for Mergeall syncs; Linux has added fledgling exFAT support to its kernel software, which may make it more widely available on PCs and smartphones; and Android 11 has revoked general access to USB drives from POSIX programs like those coded in Python (including Mergeall), though a procedural work-around is emerging. As usual, the world forges ahead in two steps forward and one step back.
Filenames containing characters like "?" and "|" cannot be saved in some contexts, including Windows, some Androids' shared storage, and FAT32, exFAT, and BDR drives. Some tools skip such files, and others adjust such names so they can be stored, but the adjustments can cause both file overwrites, and sync failures when filenames no longer match the source. More subtly, some tools may also interpret backslashes, which are legal in Unix filenames, as path separators on Windows, and create unintended folders.
To avoid such interoperability issues, you should generally run the new
and included utility script,
fix-nonportable-filenames.py,
before transferring content from Unix-like platforms to any context which
imposes these filename restrictions. See the script's in-file docs for more
details. Mergeall assumes that filenames in FROM and TO have been adjusted
as needed to match during syncs, and survive saves on TO;
running the fixer script satisfies both these rules.
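Before transfers, you might want a quick report of which names would trip these restrictions. The following sketch is not the fixer script and does not rename anything. It checks each name against the usual Windows rules: the characters <>:"/\|?* plus control codes, trailing dots or spaces, and reserved device names like CON and NUL. Treating these as exactly what your target context rejects is an assumption here; the fixer script's own rules and replacement scheme may differ, so use it, not this, to actually adjust names.

```python
import os
import re

# Characters Windows rejects in file names, plus ASCII control codes
ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Device names Windows reserves, with or without an extension
RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)}

def nonportable_reasons(name):
    """Return reasons why one filename may fail on Windows-style filesystems."""
    reasons = []
    if ILLEGAL_CHARS.search(name):
        reasons.append("illegal character")
    if name.rstrip(". ") != name:
        reasons.append("trailing dot or space")
    if name.split(".")[0].upper() in RESERVED:
        reasons.append("reserved device name")
    return reasons

def find_nonportable(root):
    """Walk root and map each problem path to its reasons (report only)."""
    problems = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            reasons = nonportable_reasons(name)
            if reasons:
                problems[os.path.join(dirpath, name)] = reasons
    return problems
```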
See also the related coverage of backslashes and filenames in the embedded ziptools system; unlike ziptools, Mergeall doesn't mangle nonportable names, but fails on and skips them. The newer Android Deltas Sync package also takes a look at the consequences of not running the fixer script. Filenames remain a curiously persistent interoperability issue in the age of Unicode filenames; Windows somehow manages to support emojis, but not a question mark.
The Unicode standard oddly allows the same text to be represented with different code-point sequences. This poses yet another interoperability hurdle, which can wreak havoc with programs that match filenames. As of late 2021, Mergeall 3.3 accommodates this by normalizing Unicode filenames for comparisons, to avoid mismatches and skew. For more details, see the 3.3 module and release note. In addition, Android 11 shared storage has a bug (at least on some devices and in some contexts) which precludes writing files whose names use one of the alternative Unicode representations (composed form, e.g., NFC). For more details and a work-around, see the converter script. Alas, "Uni" does not necessarily mean "one."
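The mismatch is easy to demonstrate with Python's standard unicodedata module. This is a sketch of the general idea only: using NFC as the comparison form is an assumption here, and Mergeall 3.3's module may normalize differently. Two names that display identically compare unequal as raw strings, but equal once both are normalized to the same form.

```python
import unicodedata

def names_match(name1, name2, form="NFC"):
    """Compare two filenames after normalizing both to the same Unicode form."""
    return unicodedata.normalize(form, name1) == unicodedata.normalize(form, name2)

composed = "caf\u00e9.txt"      # "é" as a single code point (the NFC form)
decomposed = "cafe\u0301.txt"   # "e" plus a combining acute accent (the NFD form)
```

Here composed == decomposed is False, while names_match(composed, decomposed) is True.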
Short story: as a cross-platform tool, Mergeall processes the platform-neutral data portion of files, and ignores the proprietary and normally optional "resource fork" extension that has meaning only on a Mac. You may not need to care, but this section spells out the tradeoffs.
Mergeall works well on macOS (in fact, this is the primary platform of Mergeall's proprietor), but it has an intentional cross-platform focus. Because it aims to provide the same functionality on Windows, Linux, and Macs, Mergeall deals in concepts common to all three, and may not support some platform-specific paradigms as directly as some single-platform tools. Although the cruft-skipping tools described above are recommended on the Mac too (and Mac is by far the biggest cruft offender), some cruft-related Mac scenarios merit a few extra words.
If you make use of some of the Mac's many unique filesystem features, you may be interested to know that Mergeall primarily processes Mac "data forks" only—the normal bytes used to store content by name, that users of Windows, Linux, and almost every other computing system ever created would call the "file." Although this is deliberate, it may impact some Mac users in two ways. Namely, Mergeall:
If you don't know what these forks are all about, check out the watercooler-level overview, and other resources. In short, Mac's native filesystem can represent files in two parts—data and resource—called forks. Actual content (e.g., the bytes of an image or text of a memo) is stored in the data fork, and extra metadata (e.g., icons or last cursor position) can be stored in the resource fork instead of another file. Resource forks are part of a "file" on Mac drives, but are not accessible to most normal file interfaces and tools, and may show up as separate ._* files on non-Mac drives (discussed earlier).
In effect, the Mac's resource forks are a non-standard and proprietary extension to a file's main data, which are a legacy of computers past, have meaning only on Macs, and are not meant for storing data crucial to your digital property. They have also been somewhat subsumed by more recent structures like extended attributes and application bundles and are not used by many programs or files today. In fact, Office appears to create empty resource forks on Macs just for historical reasons. Still, resource forks can cause confusion for cross-platform users if present—especially in the context of mirroring files across machines and drives, which is Mergeall's domain.
The good news is that you probably don't need to care. If you (like Mergeall's developer) use tools on the Mac that create platform-neutral files, you can probably stop reading this note now, and use Mergeall as it is intended. The files that truly contain your images, text, music, Office documents, ebooks, web pages, and other content will be copied to and from your drives by Mergeall as advertised and expected.
For more Mac-centric users, though, Mergeall's behavior can pose tradeoffs that are worth discussing up front. For one thing, you may lose resource-fork attributes associated with some files, but these are Mac-only extensions that are meaningless elsewhere; do not contain the actual content of your files; and are rarely important and generally trivial metadata that can be recreated the next time you use a file on a Mac.
As another consequence, Mac ._* AppleDouble resource-fork files created by Mac apps or Finder on a non-Mac external drive will be either skipped, if you use Mergeall's -skipcruft option, or processed like any other file, if you don't use -skipcruft.
In the latter case, Mergeall will delete ._* files if they are unique in its TO folder, and will copy them verbatim if they are in FROM—without merging them with corresponding data fork files on Mac drives. That is, files that were split by Mac programs into two parts (data + resource) on a non-Mac drive remain in two parts if copied back to a Mac drive by resource-agnostic programs like Mergeall.
Luckily, this is a rare scenario: ._* files show up only on drives using a non-Mac filesystem; Mergeall never creates such files itself (as mentioned, on Mac drives it normally processes data forks only); and most resource forks can be safely ignored in any event. Still, Mac-only users facing the prospect of lost resource forks when files are round-tripped to and from non-Mac drives may wish to either avoid ._* files altogether by using drives formatted with a Mac filesystem, or run the dot_clean -m command in Terminal to merge resource fork files back to data files after they are copied to Mac drives.
If that sounds like an extra hassle, it is, but it's an unavoidable consequence of the Mac's special-case dual-file format on non-Mac drives. Still, most users can safely ignore resource forks completely, and skip them with Mergeall automatically. For programmers interested in more details on this front, check out this session log that demos the main concepts.
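Before choosing between these approaches, it may help to see how many AppleDouble files a non-Mac drive has accumulated. The following is a hypothetical helper sketch, not part of Mergeall: it only lists ._* files and whether their data-fork partner is present alongside them. It neither merges nor deletes anything (that is the job of dot_clean and -skipcruft, respectively).

```python
import os

def appledouble_files(root):
    """Yield (._* path, partner path or None) pairs for AppleDouble files under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith('._'):
                partner = os.path.join(dirpath, name[2:])   # '._photo.jpg' -> 'photo.jpg'
                yield (os.path.join(dirpath, name),
                       partner if os.path.lexists(partner) else None)

if __name__ == '__main__':
    for resource, data in appledouble_files('.'):
        print(resource, '->', data or '(no data-fork partner)')
```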
Short story: if your archive contains any symbolic links, Mergeall copies the links themselves, not the files or directories they reference. This avoids creating duplicate copies of content when it's both stored and referenced by links, though symlinks by nature also have major portability constraints that may be better addressed with ziptools in cross-platform contexts.
Symbolic links (a.k.a. "symlinks") are by-name reference points to other files and folders, which are usually followed automatically, and are more common on Unix systems like macOS and Linux than on Windows, due in part to usability constraints on the latter. (Android has symlinks too, but with rules that vary by storage type, covered separately here.)
Symlinks are also relatively rare. In fact, if you've never heard of them, chances are good that you can skip the rest of this note—and you'll probably be able to sleep better if you do; symlinks are a thorny topic of interest mostly to advanced users accustomed to thorny topics.
Even if you have heard of them, these links are generally discouraged in a tree managed by Mergeall: your archives are better populated with actual content data, not links between locations that may become invalid when items are renamed or moved. Moreover, symlinks cannot generally be used across multiple platforms, due to path-syntax and filesystem-support constraints; symlinks created on Unix are usually best kept on Unix, and ditto for Windows.
That being said, some types of content make use of symbolic links to avoid data repetition, especially in the Unix development world. Mac app bundles, for example, commonly use both links and links to links, and enough of each to confuse many an intrepid code explorer. If your archive does contain such links, Mergeall supports them on both Unix and Windows, and is careful to always compare and copy the link itself—the pathname to the referenced item—instead of the file or folder to which the link ultimately refers.
This is intentional: it avoids making duplicate copies of files and folders that both reside in an archive and are referenced from links within it. If links were followed instead of copied, such duplicates could multiply your storage space requirements arbitrarily: for 1 item and N links to it, you'd wind up with 1 + N copies of the item—and wipe out your symlinks in the process. This means you can't use symlinks to trick Mergeall into copying items external to your archive, but you can always copy such items yourself, and symlinks themselves record important structural information that should be retained in data archives.
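In Python terms, copying a link rather than following it comes down to reading the link's stored path and recreating a link with the same text. A minimal sketch, assuming Python 3 and sufficient symlink permissions on the platform at hand; copy_link is a hypothetical name, and this is not Mergeall's own code.

```python
import os

def copy_link(src_link, dst_link):
    """Recreate the symlink itself at dst_link: copy its path text, not the item it names."""
    target = os.readlink(src_link)              # the stored path, e.g. 'data/file.txt'
    target_is_dir = os.path.isdir(src_link)     # follows the link; only matters on Windows
    os.symlink(target, dst_link, target_is_directory=target_is_dir)
    return target
```

The standard library's shutil.copy2(src, dst, follow_symlinks=False) has a similar effect for a single link; either way, the referenced item is never duplicated.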
The only real downside of copying symlinks instead of following them is the set of constraints this policy imposes. In short, only intra-archive links relative to the archive itself will survive relocation. Here are the specifics:
Your links should not generally reference items outside the archive's tree, because those items may not be present in a copy on another computer. An out-of-archive file referenced by a link, for example, will not be copied by Mergeall, and won't be part of your archive. If the file is absent where another archive copy is used, the link will be broken.
Your links' paths should usually be relative to the archive itself. For instance, they should start with "." (current folder), ".." (parent folder), or the name of a file or subfolder in the link's own folder. They should not use absolute pathnames that begin with "C:" on Windows or "/" on Unix, because those paths may not be valid in other copies stored on other drives or computers.
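Both rules can be checked mechanically before an archive is copied elsewhere. The function below is a hypothetical sketch, not part of Mergeall. It works on path text only and does not chase links-to-links, so treat its verdicts as hints rather than guarantees.

```python
import os

def link_problems(link_path, archive_root):
    """Return reasons a symlink may break when its archive is copied elsewhere."""
    problems = []
    target = os.readlink(link_path)                  # the link's stored path text
    if os.path.isabs(target):
        problems.append('absolute path: ' + target)
        resolved = target
    else:
        # relative link paths are resolved from the folder that contains the link
        resolved = os.path.join(os.path.dirname(link_path), target)
    resolved = os.path.abspath(resolved)             # abspath also normalizes '..'
    root = os.path.abspath(archive_root)
    try:
        inside = os.path.commonpath([root, resolved]) == root
    except ValueError:                               # e.g., paths on different Windows drives
        inside = False
    if not inside:
        problems.append('points outside the archive: ' + target)
    return problems
```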
The prior section's rules are aimed at making links transportable with the rest of your archive. Perhaps more fundamentally, though, symlinks impose major constraints on portability that link-aware, cross-platform programs like Mergeall reluctantly inherit:
Symlinks work on both Unix (e.g., Mac and Linux) and all recent Windows under Python 3.X, but only on Unix under Python 2.X, and Windows symlink support isn't complete until Python 3.3. In other words, if your archives have symlinks, they will work on Unix in any Python, but require 3.3 or later to be updated on Windows. Windows itself doesn't support symlinks until Vista, so even with Python 3.X, symlinks on XP are right out.
Your link updates won't work at all on Windows without either escalated permissions or special set-up tasks. To create symlinks on Windows, for example, you can launch a Command Prompt window with a right-click to select "Run as administrator" privileges, and run the mergeall.py script there using a command line. If you installed Mergeall and its GUI as a self-contained executable, you may need to launch it the same way to make your link updates work. There are other ways to run programs with administrator permissions which we'll omit here for space; see your system resources for more details.
Windows 10 relaxes these rules somewhat, per this blog note—though this still requires special "Developer Mode" software and the initial blessing of an admin user, and hardly constitutes Unix symlink compatibility, given the path and filesystem interoperability issues up next. It may be slightly easier to make symlinks on Windows 10, but don't expect to copy them to or from a Unix box any time soon.
Windows 11 symlinks: Windows 11 further relaxes the rules for symlinks. Some of this came online in later releases of Windows 10, but this guide is not a history book. In 11, you no longer need to use "Run as administrator" or obtain special permissions. Instead, you need only enable Developer Mode in Settings => Privacy & security => For developers. Once you do, symlinks can be made from the shell and Python in both native Windows and WSL. This allows Mergeall and its nested ziptools cousin to propagate symlinks to Windows, but with caveats: link-path syntax and non-NTFS filesystems like exFAT are still perilous, per the next sections.
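Because symlink support depends on the Windows version, settings, privileges, and the drive's filesystem, it can be simpler to test than to predict. A hedged sketch: can_make_symlinks is a hypothetical helper, not part of Mergeall, that simply tries to create and then remove a throwaway link in a scratch folder on the drive of interest.

```python
import os
import tempfile

def can_make_symlinks(folder):
    """Try to create (and then remove) a throwaway symlink in folder; report success."""
    probe_dir = tempfile.mkdtemp(dir=folder)         # private scratch folder on that drive
    link = os.path.join(probe_dir, 'probe-link')
    try:
        os.symlink('probe-target', link)             # a dangling link is fine: nothing is followed
        return os.path.islink(link)
    except (OSError, NotImplementedError):
        # e.g., no privilege or Developer Mode on Windows, or no link support on the filesystem
        return False
    finally:
        if os.path.lexists(link):
            os.remove(link)
        os.rmdir(probe_dir)
```

A True result says only that links can be made there; as the following sections explain, whether another platform will recognize them is a separate question.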
Your links will not be portable between Unix and Windows if their destination paths contain any path-separator characters, or harbor other platform-specific syntax. Such links will always work on similar platforms, but fail on the other side of the Unix/Windows fence. For instance, a Windows link path with any "\" separators won't work on Unix, and a Unix link path with any "/" won't work on Windows. Similarly, Windows-only drive letters are Unix showstoppers, and absolute paths are generally nonportable as noted earlier.
Mergeall cannot automatically compensate for such differences, because it's impossible to know all the places where your archive copy may ultimately be used (a USB drive, for example, might be later plugged into Windows, Unix, or both). Even so, this is probably a moot point for most users: you probably won't be able to throw your links over that cross-platform fence in the first place, for reasons the next and final point explains.
On macOS Unix, symlinks can be created and used on drives formatted with the cross-platform exFAT and FAT32 filesystems. Windows, however, supports symlinks (a.k.a. "soft links") only on drives formatted to use its NTFS filesystem, per both testing and MSDN pages here and here. Though path syntax and other issues described above make symlinks unlikely candidates for cross-platform use, filesystem constraints may pose an absolute catch-22 for some people working on multiple platforms.
Notably, the exFAT and FAT32 portable drive formats are not an option for transporting symlinks between Unix and Windows, and NTFS is a one-way trip to nowhere on macOS:
Windows won't recognize symlinks created by Unix on exFAT or FAT32 drives (they show up as non-link files with link-description text), and won't be able to create new symlinks on such drives to be used on either platform.
macOS won't recognize symlinks created by Windows on NTFS drives (they show up as zero-length non-link files), and its read-only support for NTFS noted earlier precludes creating symlinks on such drives.
For users of portable drives, the combination of these two platforms' policies and implementations renders symlinks even more nonportable. And much like path syntax, even if symlinks retained their content across platforms, automatic conversions in Mergeall would be ruled out by the fact that an archive copy may be used on either platform—or both.
Nor are external drives the only factor here: the platforms themselves record symlinks in proprietary forms. In testing on shared network drives, for example, links made by Windows on NTFS drives were not recognized by Windows or Mac; links made by Mac on exFAT and FAT were not recognized by Windows; and permission issues cropped up regularly. Your networks' file interfaces may vary, and third-party filesystem drivers may lift some constraints, but further options seem limited for most users.
Naturally, this issue concerns cross-platform users only, and symlinks' path syntax may negate their portability before they are ever written to drives. To be sure, you can still use symlinks on the platform that created them—Windows symlinks work on Windows as long as they are stored on NTFS drives, and macOS symlinks work on macOS as long as you save them on non-NTFS drives. For users hoping to use their archives on multiple platforms, however, symlinks are interoperability's end.
The silver lining for Mergeall users here is that symlinks created on Unix will generally survive archive round-trips to and from Windows. A Mergeall run on Windows from an exFAT drive will treat any Unix symlinks in the archive as simple files, and drop their symlink type. Still, because there is no reason to modify these files on Windows (they are broken links there, after all), they won't be recopied back to the intermediate drive as simple files by later merges, and thus won't overwrite the originals on Unix.
In more gory detail: Unix symlinks propagated to Windows on an exFAT drive will be seen by Windows as simple files for both exFAT-to-Windows and Windows-to-exFAT merges, but as unchanged links by Unix when merging from exFAT back to Unix. The net result is that they will be left intact back on Unix, as long as they are unchanged on Windows (and there is no reason to change them there).
Symlinks created on Windows may fare worse: they can't be added to exFAT or FAT32 drives in the first place, making them non-archivable data without NTFS, which, as noted, falls short on Macs. On the other hand, symlinks are very rare on Windows; their relative newness and history of permission requirements virtually guarantee their absence in personal content archives. Thus, for the vast majority of users, exFAT remains the recommended external-drive option for cross-platform archives maintained by Mergeall.
Finally, if you want to be really sure that your links survive round-trips between multiple platforms, you can always zip their enclosing folders for transport with the ziptools package described ahead in this section. This package allows you to zip symlinks for transit to negate platform and filesystem skew, and unzip them if and where they are used. It also translates link path-separator syntax, and allows you to avoid any spurious symlink differences that may be reported by Mergeall runs. When in doubt, zip and unzip links in your archive to make their content visible on a "need to know" basis.
As you can see, the rules of engagement for symlinks are complex, and symlinks may be best avoided in most content archives, especially for cross-platform use. To summarize, Mergeall:
Thorny, indeed; but if you are bold enough to use conforming links on your platform of choice, they will happily redirect accesses in all your Mergeall archive copies.
But wait: if you really need to copy your symlinks between Windows and Unix portably, all is not lost. You can still do so by zipping and unzipping them using the ziptools system—a complementary tool which stores files, folders, and symlinks in zipfiles, and is both shipped with Mergeall in its test/ziptools folder and available separately.
Like Mergeall, ziptools by default copies links, not the items they reference, to avoid duplicate content. Because of its very different goals and purpose, though, ziptools has some strategic advantages when it comes to symlinks:
Unlike Mergeall's incremental file-by-file approach, ziptools stores symlinks in a platform-neutral format within a single, generic file copied as a whole—which makes it immune to both representation and filesystem portability concerns.
Unlike Mergeall's copy-once-use-anywhere model, ziptools by default assumes you'll use data only where you unzip it—which frees it to translate link-path syntax portably for the target platform (Windows "\" is translated to "/" when unzipping on Unix, and vice-versa).
Though this requires zipping and unzipping steps, and is subject to some of the other constraints outlined above, the net effect allows you to transfer symlinks cross-platform on exFAT drives, and makes symlinks almost completely portable between Unix and Windows. As a bonus, ziptools can also follow symlinks instead of copying them; in this mode, links are replaced with the items they reference, yielding self-contained (if potentially redundant) archives.
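For relative links, the separator translation itself is simple. A minimal sketch, assuming link paths are carried as text and the unzipping side knows its own platform; localize_link_target is a hypothetical name, and this is not ziptools' actual code. Absolute paths and drive letters are rejected, since swapping separators can't make them portable.

```python
import re

_DRIVE = re.compile(r'^[A-Za-z]:')    # Windows drive letter, e.g. 'C:'

def localize_link_target(target, to_windows):
    """Translate a relative symlink path's separators for the platform that will use it."""
    if _DRIVE.match(target) or target.startswith(('/', '\\')):
        raise ValueError('nonportable link path: ' + target)
    if to_windows:
        return target.replace('/', '\\')
    return target.replace('\\', '/')
```

For example, localize_link_target('..\\docs\\a.txt', to_windows=False) yields '../docs/a.txt' for use on Unix.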
See ziptools' guide for more details—especially its in-depth coverage of symlink support by platforms and Python. ziptools is not a replacement for Mergeall's incremental updates, because it processes data only as a whole—a 100G archive always requires making, copying, and unpacking a 100G zipfile (sans compression), no matter how few changes you've made! Still, ziptools can be used to manage symlinks in isolated parts of an archive: simply zip symlink-laden folders with ziptools before they are the subject of a Mergeall run, and unzip them when needed... and your archives shall forever dwell in portable-symlink Valhalla (where supported).
A footnote for Unix users (mostly): as a policy, Mergeall never automatically discards any non-cruft items in your archive, unless they are impossible to copy. That is, everything that can be propagated is propagated. Even potentially invalid symbolic links—which point to non-existent or non-file/folder items—are propagated by Mergeall on the grounds that such links may hold a purpose for you, or become valid when moved to other computers. This policy is also adopted by the cpall and ziptools programs; your invalid links are your business, and your asset.
Unlike invalid symlinks, though, Mergeall never processes or propagates any FIFO files in a data archive; it simply prints messages to the log denoting their presence and skips the entry altogether. If you know what FIFOs are, you'll understand why this is so. If not, consider this an introduction to named pipes used in client/server dialogs, which have an inherent and temporal system state; are not normal files but can masquerade as such in folders; and really have no business being mixed in with your archived content!
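For Unix users curious how a tool can tell these cases apart, the key is os.lstat, which describes a link itself rather than the item it points to. The classify function below is a hypothetical sketch of the two policies just described, not Mergeall's code; its labels are illustrative. Per the definition above, a link counts as invalid when it does not lead to a file or folder.

```python
import os
import stat

def classify(path):
    """Label a tree entry without following links (illustrative labels only)."""
    st = os.lstat(path)                    # lstat: inspect the entry itself
    if stat.S_ISLNK(st.st_mode):
        valid = os.path.isfile(path) or os.path.isdir(path)   # these two do follow the link
        return 'link (valid)' if valid else 'link (invalid)'  # propagated either way
    if stat.S_ISFIFO(st.st_mode):
        return 'fifo (skip)'               # named pipe: transient state, not content
    if stat.S_ISDIR(st.st_mode):
        return 'folder'
    if stat.S_ISREG(st.st_mode):
        return 'file'
    return 'other'                         # sockets, devices, etc. (not covered here)
```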
Short story: on all versions of Windows, mergeall.py, diffall.py, and cpall.py automatically remove the usual 260-character length limit on pathnames used to access content, so you can nest folders with wild abandon (and use archives created on less-restrictive platforms).
Since the dawn of PC time, Windows has had a habit of imposing limits that grow absurd as new hardware comes online. Pathnames—the lists of slash-separated names of nested folders used to organize and access files—are a prime example. They have historically been limited to just 260 total characters on Windows including the filename at the end, which makes it difficult (and in some programs impossible) to use nested content. Oddly, this limit is baked into Windows itself, not filesystems or devices; whether you use exFAT or NTFS on flashdrives or SSDs, long paths can fail.
Though this path-length limit was retained partly for backward compatibility, its lifespan was also prolonged by dubious justifications. The rationale that most users would never create such long file or path names has grown moot in today's world of digital storage of everything, and web browsers that regularly save pages with arbitrarily rambling titles as filenames. Windows 10 finally lifts the limit as an option, but not by default: users must enable long paths with registry settings or Group Policy choices, and this obviously doesn't help the 1 billion people (more or less) using prior versions of Windows.
For most Windows users in 2017, Windows pathname limits still seem a throwback to hardware of some PC Paleolithic gone by.
Though not widely known and cumbersome to apply, Windows luckily provides a fix that removes normal pathname length limits. In short, prefixing pathnames that exceed the limit with a "\\?\" substring invokes alternative Windows API tools that lift the limit altogether. Windows network paths require a bit more transformation that we'll ignore here, and some tools that traverse folders require the prefix regardless of length as a preemptive measure. When extended this way, though, long pathnames work as they should.
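In code, the basic transformation amounts to prepending the prefix to an absolute, backslash-separated path. The helper below is a hypothetical sketch of the idea, and Mergeall's own implementation may differ. Per the simplification above, it leaves network (UNC) paths alone, and it uses Python's ntpath module so the logic can be exercised on any platform.

```python
import ntpath

LONG_PREFIX = '\\\\?\\'    # the four characters \\?\ written as a Python string
PATH_LIMIT = 260           # the classic Windows limit described above

def extend_long_path(path, force=False, limit=PATH_LIMIT):
    """Add the long-path prefix to an absolute Windows path if it's too long (or if forced)."""
    if path.startswith(LONG_PREFIX):
        return path                            # already extended
    if not force and len(path) < limit:
        return path                            # short enough to use as is
    path = ntpath.normpath(path)               # the prefix disables Windows' own cleanup of / and ..
    if path.startswith('\\\\'):
        return path                            # UNC network paths need more work (omitted)
    if not ntpath.isabs(path):
        raise ValueError('long-path prefix needs an absolute path: ' + path)
    return LONG_PREFIX + path
```

The force flag covers the folder-traversal case mentioned above, where the prefix is applied regardless of length.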
This trick is automatically applied as needed to remove the Windows path-length limit in mergeall.py and its diffall.py and cpall.py companion tools, so you can use archives with richly nested folder trees and absurdly long filenames without errors or skips. The related and included ziptools system uses the same technique to also support long paths on Windows, both when adding to and extracting from zip archives.
In other words, long paths on Windows just work in all these tools, with no extra action required on your part. This is especially useful in cross-platform use cases, when archives are extended on platforms without such draconian limits (e.g., macOS and Linux).
Because we all inhabit a physical universe with limited resources, there are still a few constraints to keep in mind. Pathnames cannot be infinitely long on any platform even with this Windows fix (after 1K characters, portability grows murky); path components—the names between and after the slashes—still can't be longer than 255 characters apiece; and Windows' own file explorer won't be able to handle long paths in your archive that Mergeall correctly propagates.
Still, Mergeall's extended limits should be adequate to handle any archiving path that qualifies as reasonable. At least by today's definition of reasonable...
New path option: Python 3.6 and later installers for Windows 10 include an option to automatically remove the Windows platform's former path-length limit. Mergeall, and its ziptools cousin, instead use the manual but more-inclusive technique described here to lift the limit for users of all Pythons and all Windows. While useful, the Python 3.6+ enhancement is optional and easy to miss; won't apply to frozen executables like Mergeall's; and doesn't help those using Windows 7 and 8, or Pythons 2.X through 3.5—still-substantial audiences all. For more details, see this online usage note.
Short story: Mergeall ships as both standalone executables and source code. For source, you'll use standard techniques to install the software that Mergeall requires and to launch the programs that Mergeall provides; this section gives tips on both for novices.
Mergeall, its GUI, and the diffall and cpall programs included in the Mergeall package work and are used on all versions of macOS, Windows, and Linux released in recent years. For instance, Mergeall's development team to date has used these programs on a regular basis on Windows 7, 8, 10, and 11; macOS El Capitan through Catalina (10.11 through 10.15); and Ubuntu Linux through version 20. More recently, Android Nougat through 12 have also hosted Mergeall syncs (see the update box ahead). Each platform has usage idiosyncrasies that are beyond this guide's scope, but this section runs over a few basics for users who may be new to running Python on these systems.
First off, Mergeall is packaged and shipped in multiple formats at its download site. It's available in both source-code form that runs on all platforms and provides complete transparency, as well as "frozen" standalone executable forms that are not portable but are easiest to install and more closely reflect many users' concept of a "program."
In the standalone category, Mergeall is available as a Mac app, a Windows .exe executable, and a Linux executable, each of which installs with a simple unzip. These Mergealls each run on only one platform, but:
If you won't be digging through Mergeall's code, chances are good that the standalone packages are the best Mergealls for you. They all install with a single download and unzip, and run with a simple click, so we won't say more about them here. See the README file for complete usage details on these packages; for the purposes of this note, standalone Mergealls are fully off-page.
Use the source: in its defense, Mergeall's source-code package also tends to be updated more frequently, due to the Herculean efforts required to rerelease frozen executables. Upgrades in Mergeall 3.2, for example, were initially released only in the source-code package. In addition, the source-code package is required on Android (see the later update), and may be the only work-around for breakages introduced by platform flux—including recent destructive changes on Linux.
Mergeall's source-code package may be a bit more novel to some users, however, and merits extra coverage here. The programs in the source-code package run on all platforms, and are provided as a zipfile—installing the Mergeall system in this form is as simple as unzipping into a folder on your computer.
When using programs in Mergeall's source-code package, you'll also need to install a Python to run them (if one isn't already available), and use standard Python source-code launching techniques to start them. On the former topic, the latest Python 3.X is generally recommended, and has been the most-used Mergeall host to date. In more detail:
Python 2.X also runs Mergeall, but may have Unicode issues on some platforms, and won't support symlinks outside Unix. If at all possible, install and use a Python 3.X for Mergeall. Prior versions of Mergeall also recommended Python 3.5+ for speed on Windows and Linux, but this constraint has been removed in the latest Mergeall release.
Because this story diverges from this point forward, the following provides some additional pointers on both Python installs and program-launching techniques for the source-code package on each of Mergeall's supported platforms:
The Python self-installers for Windows from python.org ship with everything you need, including the tkinter GUI library and the Tcl/Tk libraries it uses. Get the latest and greatest 3.X if it's not present in your machine's Start menu or screen, and click to install.
After Python is installed, you can run a program from a Command Prompt command line (e.g., py -3 <program> ...); from IDLE's Run menu after opening the program's source file; or by clicking or tapping on the program's file icon in Explorer. Programs which require command-line arguments (including the basic mergeall.py script) won't work if clicked directly; run these in Command Prompt. You can also create a shortcut to the GUI launcher on your desktop for quick access (e.g., Copy + right-click). See the docetc/launcher-configs folder for .ico desktop icon files to use with shortcuts.
Clicks (and taps) launch scripts without setup on Windows, because Python associates itself when installed to open script files—an automatic scheme that makes usage simple for running programs written in scripting languages and shipped as source code. Python also comes with the py launcher on Windows, which makes it easy to specify a version to run; see this page for an introduction.
New Windows options: Windows now also includes PowerShell for running Windows command lines. More radically, it has grown a Linux subsystem (WSL) which runs a Linux kernel and Bash shell, and in theory should support standard Unix install and launch techniques like those described for macOS and Linux here. See the intro and other docs for more details. Virtual machines and Cygwin provide additional options for Unix-oriented content caretakers, but all are out of scope here.
Your computer has a Python preinstalled by Apple, but at this writing it's not very recent, and its tkinter GUI library is buggy. Per the instructions here, download and click to run the latest Python 3.X self-installer for Mac from python.org, and do the same to get the recommended Tcl/Tk—which at this writing is 8.5 from ActiveState. This story is prone to change (e.g., Python might someday include the newer Tcl/Tk 8.6 for Mac as it does for Windows), so watch python.org for details. Some Mac users might also be interested in installing both Python and Tk using the Homebrew package manager, which may offer more recent versions; see its Python page.
After the installs, you can run a program from a Terminal command line (e.g., python3 <program> ...); from IDLE's Run menu after opening the program's source file; by dragging the program's file icon to the Python Launcher you get with the install; or by clicking the program's file icon in Finder after associating it with the launcher using a right- (or control-)click. The Mac Python Launcher can be set to open scripts without a console; you may not want one for GUIs with no text output (like Mergeall's). Once associated, you can also create an Alias for a script and drag it to the desktop for quick access.
Your machine almost certainly already has a usable Python, its tkinter GUI library, and Tcl/Tk, because they are core tools on this platform. If not, or if your versions are out of date, an apt-get on Ubuntu or a yum install on Fedora should allow you to install required packages. For instance, sudo apt-get install python3-tk fetches tkinter for Python 3.X on Ubuntu. It's also straightforward to build Python from its source code on Linux, if you've ever dabbled with configure and make commands.
Once Python is verified or installed, you can run a program from a terminal command line (e.g., python3 <program> ...); from IDLE's Run menu after opening the program's source file; or by clicking on the program's file icon in the system's file explorer after you've configured it to run in this mode. Running by icon clicks may require giving the file executable permission with a chmod +x <filename> or file icon right-click; setting the file explorer's Properties to run scripts on clicks instead of opening them in a text editor; ensuring that the script's top #! line references your Python (see which python3); and converting end-lines in the file to Unix form if needed (depending on the site of their latest edits, they may ship in either Windows/DOS or Unix form—see the fixeoln.py script in the docetc/Tools folder if you have no other converter).
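The end-line and permission steps can also be scripted. A minimal sketch, assuming a Unix-like system; prepare_for_clicks is a hypothetical name, this is not the bundled fixeoln.py, and it leaves the #! line's contents and the file explorer's settings alone.

```python
import os
import stat

def prepare_for_clicks(script_path):
    """Convert DOS/Windows end-lines to Unix form and add executable permission bits."""
    with open(script_path, 'rb') as f:
        data = f.read()
    unix_data = data.replace(b'\r\n', b'\n')    # a trailing \r on the #! line breaks the lookup
    if unix_data != data:
        with open(script_path, 'wb') as f:
            f.write(unix_data)
    mode = os.stat(script_path).st_mode
    # add execute permission for user, group, and others (roughly like chmod +x)
    os.chmod(script_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```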
Naturally, you may find additional install and launch schemes on each platform, but we'll cut this story short here. For more platform pointers, see your local help resources or try a web search. For tips on formatting drives on each platform, see this note. For more on command-line modes, read on to the next and final pointer.
Android usage: though not discussed explicitly in this doc, Mergeall's scripts and GUI can also be installed and run on Android devices in source code form. For the full story on using Mergeall on this platform, see the newer and separate coverage here. Per both that doc's intro and the earlier note above, most Android 11+ users are best served by the Android Deltas Sync package, which runs Mergeall and its ziptools cousin as nested tools.
Short story: you can use a command line to run Mergeall, and must use one to run most other programs in its package. This section gives a brief platform-agnostic tutorial on the subject aimed at current or future power users, and provides links to examples.
Mergeall comes with a GUI, described earlier, to make launches easy. However, all the major programs in the Mergeall system (mergeall.py, diffall.py, and cpall.py) can also be run with direct command lines, and most utility scripts require this mode. Mergeall also includes a console launcher, which asks for run parameters at a console instead of collecting them in a GUI. While many Mergeall users may never need to type a command line, command lines are useful enough to warrant a quick overview here.
Two up-front notes. First, if you are using any of the standalone (a.k.a. "frozen") executable packages described in the prior section, your Mergeall programs are executables instead of source files. All of the command lines in this section still work the same, with no extra steps, if you omit the .py extensions and any Python reference at the front of the command. See the README file for supplemental details on these packages' command lines, omitted here for space.
Second, the command lines used to invoke the newer deltas.py script, and to apply the deferred changes it saves, are not covered here. Please see the script's top-of-file docstring for these later details.
Command lines are somewhat advanced, but also powerful and fast. They're typed into whatever your platform provides to run shell commands—Command Prompt on Windows, Terminal on macOS and Linux, and so on. For instance, Mergeall is structured as a command-line script which is normally launched by the GUI, but you can also run it yourself with direct commands of this sort:
mergeall.py <from> <to> -report -skipcruft
mergeall.py <from> <to> -auto -backup -quiet -skipcruft
Replace "<from>" and "<to>" with your folders' path names. Then either run these commands in Mergeall's folder (cd there first), or replace "mergeall.py" with the full directory path to this script file on your computer.
The first command above runs Mergeall's report-only mode, and the second runs its automatic updates. On some systems the above is all you need to type for source-code programs:

- Windows automatically runs .py files with Python.
- Unix systems (Mac and Linux) do the same if the file's opening #! line points to the Python you want to use, and the file has been marked as executable (e.g., with chmod +x <filename>) per the prior section.
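To get a feel for what a report-only run does, here is a heavily simplified sketch of a tree comparison that changes nothing. For illustration only, it assumes a file counts as changed when its size or whole-second modification time differs between FROM and TO. Mergeall's real comparison logic, its handling of folders, and its output format are not reproduced here. The function names snapshot and report are hypothetical.

```python
import os


def snapshot(root):
    """Map each file's path, relative to root, to its (size, modtime) pair."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.stat(full)
            result[os.path.relpath(full, root)] = (info.st_size, int(info.st_mtime))
    return result


def report(from_root, to_root):
    """Report-only comparison of two trees' files; nothing is changed on disk."""
    src, dst = snapshot(from_root), snapshot(to_root)
    return {
        "unique_to_from": sorted(set(src) - set(dst)),  # candidates to add to TO
        "unique_to_to": sorted(set(dst) - set(src)),    # candidates to delete from TO
        "differ": sorted(p for p in set(src) & set(dst) if src[p] != dst[p]),  # to replace
    }
```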
For more control, on Windows add "py -3" at the front of the command to use an installed Python 3.X instead of 2.X. On macOS and Linux, "python3" has the same effect (as does a shorter "py3" if you also alias py3=python3):
py -3 mergeall.py <from> <to> -report -skipcruft      # Windows
python3 mergeall.py <from> <to> -report -skipcruft    # Unix
py3 mergeall.py <from> <to> -report -skipcruft        # Unix for the lazy
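If you build these command lines from another Python program, you can make the same platform choice in code. The sketch below uses a hypothetical helper, python_command, which picks py -3 on Windows and python3 elsewhere based on sys.platform. It returns the command as a list of strings, the form the standard subprocess module accepts.

```python
import sys


def python_command(script, *args, platform=None):
    """Build a command that runs script with Python 3, per platform conventions.

    On Windows the py launcher's -3 switch selects an installed Python 3.X;
    elsewhere, python3 is the usual command name.
    """
    platform = platform or sys.platform
    prefix = ["py", "-3"] if platform.startswith("win") else ["python3"]
    return prefix + [script] + list(args)
```

For example, python_command("mergeall.py", "<from>", "<to>", "-report", "-skipcruft") yields the Windows or Unix form of the first command above, depending on where it runs.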
You can also send the command's output to a file for later viewing on any platform, by adding a ">" redirection at the end—especially handy for processing large folders that generate lots of output:
mergeall.py <from> <to> -report -skipcruft > results.txt
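The same redirection can also be done from Python code with the standard subprocess module. In the sketch below, run_to_file is a hypothetical helper. It saves a command's standard output in a file, along with its error messages (which a bare > does not capture), and returns the exit status.

```python
import subprocess


def run_to_file(command, outfile):
    """Run command (a list of strings) with stdout and stderr saved in outfile.

    Roughly the Python counterpart of "command > outfile 2>&1" in a shell.
    Returns the command's exit status.
    """
    with open(outfile, "wb") as out:
        # Output bytes are saved as-is; no decoding happens on this side
        completed = subprocess.run(command, stdout=out, stderr=subprocess.STDOUT)
    return completed.returncode
```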
The diffall bytewise folder-comparison program and the cpall folder-copy program don't have GUIs, but can be run from simple command lines too:
diffall.py <from> <to> -skipcruft > results.txt
cpall.py <from> <to> -skipcruft > results.txt
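As a rough illustration of what "bytewise" means, the sketch below compares two folder trees by reading file contents with the standard filecmp module. It does not rely on sizes or timestamps. This is not diffall's code: it handles only files, not empty folders or other details a full tool would cover, and bytewise_diffs is a hypothetical name.

```python
import filecmp
import os


def bytewise_diffs(dir1, dir2):
    """Compare two folder trees' files byte for byte.

    Returns (only_in_1, only_in_2, differing) as sorted lists of relative paths.
    """
    def files(root):
        found = set()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                found.add(os.path.relpath(os.path.join(dirpath, name), root))
        return found

    set1, set2 = files(dir1), files(dir2)
    differing = [
        rel for rel in sorted(set1 & set2)
        # shallow=False compares contents instead of trusting os.stat() signatures
        if not filecmp.cmp(os.path.join(dir1, rel), os.path.join(dir2, rel), shallow=False)
    ]
    return sorted(set1 - set2), sorted(set2 - set1), differing
```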
The same goes for the other programs shipped with Mergeall and mentioned earlier in this guide; see these scripts' source files for more details on their command-line arguments:
fix-fat-dst-modtimes.py <rootpath> -add
nuke-cruft-files.py <rootpath> -listonly -alldots > savereport.txt
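The -listonly idea (report matches, but never delete them) is easy to sketch. The patterns below are assumed examples of common system cruft: macOS .DS_Store and ._ AppleDouble files, and Windows Thumbs.db and desktop.ini. Mergeall's own cruft definitions are configurable and may differ, and list_cruft is a hypothetical name.

```python
import fnmatch
import os

# Assumed example patterns for this sketch only
CRUFT_PATTERNS = [".DS_Store", "._*", "Thumbs.db", "desktop.ini"]


def list_cruft(root, patterns=CRUFT_PATTERNS):
    """List (but never delete) files under root whose names match a cruft pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # fnmatchcase keeps matching case-sensitive on every platform
            if any(fnmatch.fnmatchcase(name, pat) for pat in patterns):
                matches.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(matches)
```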
Not all these programs require a command line. mergeall.py can also be run from its GUI (which itself can be started by either command line or icon click) and from its console interface (up next). nuke-cruft-files prompts for inputs if you don't list any in the command, or if you simply click its file's icon. In general, though, the command-line technique is both quick and direct, and is supported across all desktop computers.
As mentioned, Mergeall also includes a console launcher that asks you for each input instead of collecting them from a GUI. Though most users will prefer the convenience of the GUI, the console interface is simple to run, and supports a mode that asks you to approve or disapprove each update along the way (Mergeall's "selective updates" mode). The console launcher itself can be started by command line or icon click, and runs like this:
c:\...\mergeall> launch-mergeall-Console.py
mergeall 3.0
FROM path = "test\test1"
use this? (y=yes): y
TO path = "test\test2"
use this? (y=yes): y
Report differences only? (y=yes): n
Automatically resolve differences in TO (else asks)? (y=yes): y
...and so on: try it yourself...

For a screenshot of the console launcher at work, click here. Better yet, run it live: its defaults use the shipped test folders, and won't change your content. Mergeall's GUI is simpler for common usage. However, both the console interface and direct Mergeall command lines support selective updates, which give you more control over changes when needed. We'll skip this lesser-used mode here for space, but check out this screenshot to sample its flavor.
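The launcher's accept-or-replace prompts follow a simple pattern, sketched below with two hypothetical functions, ask_path and ask_yes. This is not the launcher's source. The input function is a parameter only so the prompts can be driven by a test or another program.

```python
def ask_path(label, default, ask=input):
    """Show a default path and let the user accept it (y) or type a replacement."""
    print('%s path = "%s"' % (label, default))
    if ask("use this? (y=yes): ").strip().lower() == "y":
        return default
    return ask("Enter %s path: " % label).strip()


def ask_yes(question, ask=input):
    """Return True only for a y answer, as in the launcher's (y=yes) prompts."""
    return ask("%s (y=yes): " % question).strip().lower() == "y"
```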
For examples of Mergeall command lines in action, browse the results of the same basic test sequence run on all three of the leading desktop platforms, formatted as HTML for easy viewing—choose your favorite, or collect the whole set:
These files present a series of test commands and their outputs, including Mergeall runs, diffalls, and rollbacks, all with and without cruft skipping. For more command-line fun, there are additional test results for study in the expected outputs folder, and a screenshot of a command-line run here.
Finally, because you can use command lines to run mergeall.py, diffall.py, and cpall.py, so too can jobs you might schedule to run regularly, and programs you might code in the future. For instance:
Scheduled runs: if a backup drive is always accessible, you could schedule a mergeall.py command line to update it automatically, using a cron job on Unix, Task Scheduler on Windows, or a similar tool.
Other programs: Python's os.system(), os.popen(), and subprocess.Popen() can run a mergeall.py command line, and the latter two can even read its output for custom purposes.
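Here's one way a Python program, or a small script run by cron or Task Scheduler, might launch a command line and read its output. This is a sketch with a few stated assumptions:

- run_and_log and its log-file naming are our own inventions.
- It uses subprocess.run, a higher-level wrapper over subprocess.Popen. Its capture_output argument requires Python 3.7 or later.
- The script path and arguments you pass are up to you.

```python
import datetime
import os
import subprocess
import sys


def run_and_log(script_args, logdir="."):
    """Run Python with script_args, save the output in a timestamped log file,
    and return (exit status, list of output lines) for custom processing.

    script_args is a script path plus its arguments, e.g.
    ["mergeall.py", FROM, TO, "-auto", "-backup", "-quiet", "-skipcruft"].
    """
    command = [sys.executable] + list(script_args)   # reuse the running Python
    env = dict(os.environ, PYTHONIOENCODING="utf8")  # child prints UTF-8 (see this guide's Windows note)
    completed = subprocess.run(command, capture_output=True, env=env,
                               encoding="utf-8", errors="replace")
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    logname = os.path.join(logdir, "sync-%s.txt" % stamp)
    with open(logname, "w", encoding="utf-8") as log:
        log.write(completed.stdout)
        log.write(completed.stderr)
    return completed.returncode, completed.stdout.splitlines()
```

For example, run_and_log(["mergeall.py", "<from>", "<to>", "-report", "-skipcruft"]) would save a report-only run's output and return its lines. The paths shown are placeholders.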
The first of these requires some system knowledge, the second crosses over into the larger realm of programming, and both are officially outside this guide's scope. See system and Python resources to get started on these fronts.
Windows command-line users: per Python's requirements, be sure to make the following environment-variable setting in your session, startup file, or Control Panel, to avoid exceptions when some non-ASCII filenames are printed in output messages on your platform:
set PYTHONIOENCODING=utf8

As of Python 3.6, this setting is required for stdout output (e.g., printed text) only when that output is redirected to a file with script > outputfile command-line syntax.
Python 3.6 changed direct console output to use UTF-8 Unicode to avoid the issue, but left output redirected to files dependent on the Windows code-page encoding (see the 3.6 mod, and more on encodings here, here, and here). Naturally, this makes output behavior version specific: text printed to the console itself works without the setting in 3.6 and later, though not in Pythons 3.5 and earlier.

Without this setting, mergeall.py, diffall.py, and all other scripts in this package may fail on Windows when printing non-ASCII text in two cases:

- printing to files, in all Pythons;
- printing to the console, in Pythons 3.5 and earlier.
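If you write your own scripts that print non-ASCII filenames, Python 3.7 and later also offer an in-program alternative to the environment setting. Text streams like sys.stdout have a reconfigure() method that can switch their encoding. The sketch below applies it only when the stream isn't already UTF-8. ensure_utf8_stdout is a hypothetical name, and this guide does not say that Mergeall's own scripts use this technique.

```python
import sys


def ensure_utf8_stdout(stream=None):
    """Switch a text output stream (default: sys.stdout) to UTF-8 if needed.

    Relies on TextIOWrapper.reconfigure(), added in Python 3.7; returns the
    stream's encoding afterward.
    """
    if stream is None:
        stream = sys.stdout
    name = (getattr(stream, "encoding", None) or "").lower().replace("-", "").replace("_", "")
    if name != "utf8" and hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")  # e.g., replaces a cp1252 code page
    return getattr(stream, "encoding", None)
```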
This setting may or may not also be required when using Cygwin or Windows Subsystem for Linux (WSL); Cygwin uses either Windows' Python or its own, and WSL is both bifurcated and TBD.
The good news is that this setting is required only for direct command lines when using the source-code package on Windows. Mergeall's GUI and console launchers automatically make the setting for you. Mergeall's frozen executables on Windows change non-ASCII prints to ASCII to work around a freeze-tools bug. The related and embedded ziptools system also avoids this issue and setting, by always changing non-ASCII characters to an ASCII representation in text printed on Windows. This is more extreme, but ziptools is mostly a command-line tool.
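The ASCII-conversion approach described for ziptools can be approximated with Python's built-in backslashreplace error handler, which turns each non-ASCII character into an escape code. This is a simplified sketch with a hypothetical name, print_ascii_safe. The exact representation ziptools uses may differ.

```python
def print_ascii_safe(*items, file=None):
    """Print items after replacing non-ASCII characters with escape codes.

    Returns the ASCII-only text that was printed (to file, or sys.stdout if None).
    """
    text = " ".join(str(item) for item in items)
    safe = text.encode("ascii", "backslashreplace").decode("ascii")
    print(safe, file=file)
    return safe
```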
In closing, here's a friendly reminder from our legal department. By design, this program may change your TO destination folder tree in-place, by adding, replacing, and deleting files and folders as needed to make TO the same as the FROM source. Before using this program on folders with content you care about, it is strongly suggested that you do all of the following:
- Use -report mode to preview changes.
- Use -backup mode to save changed data.
- See mergeall.py and the original whitepaper for additional usage details.
Lest that sound too scary, the -backup option (and its toggle in the GUI) greatly lessens the risk of data loss. It makes automatic copies of all items replaced or deleted in the TO destination folder, and notes new additions. Unless you are extremely tight on space, this option should always be used, because it allows Mergeall changes to be rolled back, either by manual piecemeal copies or by a full automatic rollback immediately after a run. See earlier in this guide for more on the backup and rollback options.
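The backup-then-rollback idea can be shown in miniature. This sketch is not Mergeall's implementation, and it makes no attempt to match Mergeall's backup folder layout or logs. It uses two hypothetical functions:

- sync_file saves any existing TO file before overwriting it, and notes files that are new additions.
- rollback restores the saved copies and removes the additions.

Deletions, which Mergeall also backs up, are left out for brevity.

```python
import os
import shutil


def sync_file(src, dst_root, rel, backup_root, added):
    """Copy src to dst_root/rel, first saving any file it would overwrite.

    Saved copies mirror rel under backup_root; rel is appended to added
    when the file is new in TO, so a rollback knows to remove it.
    """
    target = os.path.join(dst_root, rel)
    if os.path.exists(target):
        saved = os.path.join(backup_root, rel)
        os.makedirs(os.path.dirname(saved), exist_ok=True)
        shutil.copy2(target, saved)  # copy2 also preserves the modification time
    else:
        added.append(rel)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.copy2(src, target)


def rollback(dst_root, backup_root, added):
    """Undo sync_file calls: delete additions, then restore saved originals."""
    for rel in added:
        target = os.path.join(dst_root, rel)
        if os.path.exists(target):
            os.remove(target)
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        for name in filenames:
            saved = os.path.join(dirpath, name)
            target = os.path.join(dst_root, os.path.relpath(saved, backup_root))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(saved, target)
```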
That being said, Mergeall's backups and restores should not be considered foolproof, given the many ways that storage devices (and sometimes even humans like us) may fail. Users are encouraged to keep multiple archive copies, whose Mergeall updates are rotated by age. With USB drives getting cheaper every day, there's little good reason for a single point of failure in your backup plans.
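Rotating updates by age can be as simple as always refreshing the archive copy that was updated longest ago. Here's a small sketch of that policy; the drive names and the next_archive helper are hypothetical. How you record each copy's last update (a notes file, a calendar, or memory) is up to you.

```python
import datetime


def next_archive(last_updated):
    """Return the name of the archive copy to update next.

    last_updated maps each copy's name (a drive label, say) to the datetime
    of its last Mergeall update, or None if it has never been updated.
    Never-updated copies go first; otherwise the oldest update wins.
    """
    if not last_updated:
        raise ValueError("no archive copies listed")
    never = sorted(name for name, when in last_updated.items() if when is None)
    if never:
        return never[0]
    return min(last_updated, key=last_updated.get)


if __name__ == "__main__":
    copies = {
        "USB-A": datetime.datetime(2022, 10, 1),
        "USB-B": datetime.datetime(2022, 9, 15),
        "USB-C": datetime.datetime(2022, 10, 12),
    }
    print(next_archive(copies))  # USB-B: its last update is the oldest
```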
Mac Users: please also read this usage pointer about resource forks before using Mergeall on content you care about. Mergeall may be used successfully on Macs too, but has a cross-platform orientation that limits its scope to files that work on all supported computers.
If you like this program, you may also be interested in these other productivity tools brought to you by the makers of Mergeall:
Program | Description
---|---
Frigcal | Personal Calendar GUI; No Login Required
PyEdit | Edit Text. Run Code. Have Fun.
PyMailGUI | Email Without the Evil
PyGadgets | GUI Toys, Just for the Hack of It
You can find these and other free software packages at the programs site.