File: mergeall-products/unzipped/mergeall.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Python 3.X and 2.X are both supported by this script
# Python 3.X  is strongly recommended for trees with non-ASCII Unicode filenames
# Python 3.X  is recommended for trees with symlinks on Unix
# Python 3.3+ is recommended for trees with symlinks on Windows

r"""
==================================================================================
mergeall.py:
  main file-processing script (part of the Mergeall system)

A folder tree-merge tool, and a supplemental example for PP4E/LP5E readers.
See this package's UserGuide.html for version, license, platforms, usage,
and attribution.  See docetc/MoreDocs/Revisions.html for version history.

SYNOPSIS

Basic syncs:
  $ python3 mergeall.py FROM TO -auto -backup -skipcruft 

Report diffs only:
  $ python3 mergeall.py FROM TO -report -skipcruft 

Apply deltas or backups:
  $ python3 mergeall.py FROM TO -restore -auto -skipcruft -backup -quiet 

This script makes a destination folder the same as a source folder quickly.
It works by inspecting tree structure; comparing just file modification 
times and sizes to detect changes; and backing up items changed.  The net 
effect synchronizes a folder tree to be the same as another rapidly, safely, 
and on demand, without reading or copying content in full.  

By employing an intermediate USB drive, this system can also be used to 
synchronize multiple devices on request.  These syncs are invoked manually,
but work across platforms, are not subject to networking tradeoffs, and
are immune to change conflicts inherent in automatic peer-to-peer syncs.

This script is run automatically by the GUI and console launchers provided
(see launch-*), and may be run directly by command lines.  Mergeall is also 
a package of related tools; see its other major scripts in this package:

  - cpall.py performs full tree copies
  - diffall.py performs byte-for-byte tree comparisons
  - fix-* scripts adjust modtimes and sanitize nonportable filenames
  - deltas.py is a variant of this file, which saves changes separately
    instead of applying them, so they can be archived, or applied later 
    with this file's "-restore" mode
  
*USAGE CAUTION*
  Depending on your command-line options or interactive inputs, this script 
  may by design irrevocably change the content of the directory tree named 
  in its "dirto" argument in-place as needed to make it the same as "dirfrom".
  Do not run it against a tree you care about unless you fully understand 
  its operation.  A backup copy of the "dirto" tree is recommended.

  Update: The 2.0+ "-backup" option makes automatic copies of items replaced
  or deleted in "dirto" to mitigate data loss risk, and 2.1's "-restore" can
  fully rollback a run that used the "-backup" option.  Still, these should
  not be considered foolproof, given the many ways that devices may fail.  
  Though designed to be useful and robust, this script and its launchers are 
  provided as is, without warranties of any kind.  By using this system, 
  you accept all responsibility for any actions it takes.

This system's files are heavily documented, because it is intended to be both
a useful program and a learning resource.  Search for "CODE" to skip opening docs.

--------------------------------------------------------------------------------

USAGE

  [py[thon]] mergeall.py dirfrom dirto
                [-report] [-auto]
                [-peek]   [-verify]
                [-backup] [-restore] 
                [-quiet]
                [-skipcruft]

Where:
  dirfrom    => source-tree pathname       (this tree is never changed)

  dirto      => destination-tree pathname  (-auto changes this tree to == dirfrom)

  -report    => report differences only and stop, making no changes

  -auto      => update dirto for dirfrom differences automatically without asking

  -peek      => check N start/stop bytes too, when comparing same-named files

  -verify    => at end, run diffall.py to check update results (or rerun with -report)

  -backup    => save items in dirto that will be replaced or deleted, note adds [2.0]

  -restore   => run mergeall to restore/rollback changes from a prior backup [2.1]
                also used to apply changes-set folders created by deltas.py [3.2]

  -quiet     => suppress per-file backing-up log messages (show just one) [2.4]
                also suppress Unicode-normalization messages during comparisons [3.3]

  -skipcruft => ignore cruft (a.k.a. metadata) files/dirs in both FROM and TO [3.0]


Main run modes:
  if "-report": report differences only
  elif "-auto": report and resolve differences automatically
  else:         report and interact to resolve differences selectively
 
Option dependencies:
  - "-backup" and "-restore" apply to "-auto" and [not "-report"] only
  - "-peek" is used by comparisons, and hence applies to all update modes
  - "-skipcruft" applies to all 3 run modes: "-report", "-auto", and neither
  - "-restore" requires a prior "-backup" run, or a prior run of deltas.py
  - "-quiet" applies only if "-backup" is used, or normalization occurs [3.3]
  - "-verify" won't work for rollbacks or delta-set applies; rerun with "-report"

Update: as of [3.3], "-quiet" also suppresses Unicode-normalization messages for 
non-NFC names during comparisons in all runs (whether "-backup" is used or not).
Otherwise, a message is emitted for every name not already in NFC composed format
(possibly many).  "-quiet" also applies in -restore mode when morphing __added__.txt
path names to match TO.

See UserGuide.html for the pathname syntax of Windows and other platforms used for
dirfrom and dirto.  See backup.py, UserGuide.html, and Whitepaper.html for more 
on the automatic backup and restore features added in releases 2.0 and 2.1.  
See deltas.py for the alternative changes-set deferred mode added in 3.2.  See
fixunicodedups.py for the Unicode normalization of filenames added in 3.3.

--------------------------------------------------------------------------------

DETAILS

This script quickly synchronizes "dirto" to be the same as "dirfrom", by 
updating "dirto" in-place for just the items that differ between the trees.
This is called a merge, because changes in "dirfrom" are merged to "dirto".
It's useful for quick backups and managing multiple tree copies, and can
serve in some contexts as a manual alternative to cloud-based storage.

This Python 3.X/2.X command-line script performs one-way synchronization of
directory trees.  It may be run to update for all differences automatically
(if "-auto"); report differences only (if "-report"); or update differing
items selectively per console user interaction (if no "-auto" or "-report").

Differing items include both unique items and changed files.  Unique items
are found by tree content.  Changed files are normally detected by checking
just file modification-times and sizes.  The script may also inspect the
first and last bytes of files as an option (if "-peek"); can spawn a full
byte-wise comparison as a post-merge step (if "-verify"); can backup items
before they are destructively changed or removed (if "-backup"); can rollback
changes made by a prior run with backups (if "-restore"); and can skip
platform-specific metadata files and dirs in both trees (if "-skipcruft").

When allowed to perform updates, this script writes to "dirto" only the items
that are unique or changed in "dirfrom", and deletes items unique to "dirto".
The net effect synchronizes "dirto" to be the same as "dirfrom" quickly,
without changing "dirfrom" in any way, and without requiring complete tree
copies or full content compares.


PURPOSE

This script allows multiple local tree copies to synchronize their changes,
either to and from a common base, or between each other directly.  It was
written as an alternative to PP4E's cpall and diffall, and to avoid:

1) Long-running full copies and compares of large trees.  Such backups over
   USB 2.0 to flashdrives or other devices can take hours (the target use case
   was 50G, 30K files, 1700 dirs--photos, music, books, and everything else).
   
2) Relying on the semantics and interaction requirements of platform specific
   merges (e.g., drag-and-drop, cut-and-paste, swipe-and-pray).

3) Giving access to and control of important and private digital assets to
   cloud providers (and/or the NSA...).

Unlike brute-force copies, this script updates only for differences,
updates in-place, and allows selective updates via its interactive mode.
Unlike a typical Unix "cp -r" merge, this script copies to dirto only
differing items in dirfrom, and prunes unique items in dirto.  The net
effect allows many typical mergeall runs to finish in just 1 minute.


USAGE PATTERNS

On Windows, first run this for trees with Unicode filenames (see TIP ahead):
    set PYTHONIOENCODING=utf8

--Quick check for differences only:
    mergeall.py dirpath1 dirpath2 -report  

--Quick check for differences only, slightly slower for reads, save results:
    mergeall.py dirpath1 dirpath2 -peek -report > saveoutput
    
--Upload changes from working copy to common copy, automatic:
    mergeall.py workingdirpath commondirpath -auto -backup > saveoutput
    
--Download changes from common copy to other, interactive/selective, with backups:
    mergeall.py commondirpath otherdirpath -backup

--Download changes from common copy to other, automatic, no backups for changes:
    mergeall.py commondirpath otherdirpath -auto 

--Synchronize changes to other work copies directly, automatic, no peek reads:
    mergeall.py workingcopy1path workingcopy2path -auto -backup

--Synchronize changes to other work copies directly, skipping all cruft files:
    mergeall.py workingcopy1path workingcopy2path -auto -backup -skipcruft


COMMON USE CASES

--Sync changes to an intermediate drive and propagate to another copy:
    1) mergeall.py maindirpath  drivedirpath -auto -backup -quiet -skipcruft
    2) mergeall.py drivedirpath otherdirpath -auto -backup -quiet -skipcruft

--Create and apply a deltas.py changes set to TO indirectly:
    1) deltas.py DIRDELTA FROM TO -skipcruft -quiet
    2) mergeall.py DIRDELTA TO -restore -auto -skipcruft -backup -quiet

--Rollback changes from an immediately preceding run that used -backup:
    mergeall.py archiveroot\__bkp__\dateyymmdd-timehhmmss archiveroot -auto -restore
    mergeall.py archiveroot\__bkp__\dateyymmdd-timehhmmss archiveroot -restore
    rollback.py archiveroot

--Verify results after a merge:
    mergeall.py dirpath1 dirpath2 -report -skipcruft (quicker, not byte-by-byte)
    diffall.py  dirpath1 dirpath2 -skipcruft         (slower, but more thorough)
    mergeall.py dirpath1 dirpath2 -verify -skipcruft (runs diffall auto at end)
    diffall.py  dirpath1 dirpath2 -recent -skipcruft (compare recent changes only)

--------------------------------------------------------------------------------

OPERATION

This script's behavior consists of three phases, run in series:


1) COMPARISON PHASE

It first collects and reports differences between dirfrom and dirto, by
comparing structure and modification times.  These differences include:

  --Unique items by name in either tree (files, symlinks, and folders)
  --Same-named items that appear as different types in the two trees
  --Differing same-named files
  
The last of these is detected by default by checking just the files' modification
date/times, and sizes (times are compared with a 2-second granularity to 
accommodate content on FAT32 drives).  If "-peek" is used, the detection also 
compares just the first and last 10 bytes of each file (or < 10 for very small 
files); this is slightly slower, but not nearly as slow as full content reads.
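
As a rough sketch of the checks just described (not this script's actual code;
the name files_differ and its parameters are hypothetical, chosen only for
illustration), the detection can be pictured as follows:

```python
import os

def files_differ(pathfrom, pathto, peek=False, allowance=2, peeksize=10):
    # Hypothetical helper: True if two same-named files appear to differ.
    statfrom, statto = os.stat(pathfrom), os.stat(pathto)

    # modtimes must agree within the allowance (2 seconds, for FAT32)
    if abs(int(statfrom.st_mtime) - int(statto.st_mtime)) > allowance:
        return True

    # sizes must match exactly
    if statfrom.st_size != statto.st_size:
        return True

    # optional peek: compare only the first and last few bytes
    if peek:
        with open(pathfrom, 'rb') as ffrom, open(pathto, 'rb') as fto:
            if ffrom.read(peeksize) != fto.read(peeksize):
                return True
            # sizes are equal here, so one offset serves both files
            start = max(statfrom.st_size - peeksize, 0)
            ffrom.seek(start)
            fto.seek(start)
            if ffrom.read(peeksize) != fto.read(peeksize):
                return True
    return False
```

Files shorter than the peek size are simply read in full by both reads, which
corresponds to the "< 10 for very small files" case above.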

This comparison is not 100% accurate, but suffices for tree merges, and yields
a much quicker comparison than the byte-for-byte scans of diffall (whose output
is also too terse to parse and use here in any event).

Version 2.2 sped the comparison phase with scandir() when using Python 3.5+
or a PyPI install: see Revisions.html.  This was phased out in 3.0 because
the non-scandir() version grew as fast or faster: see scandir_defunct.py.

Version 3.0's "-skipcruft" ignores cruft files in both TO and FROM during this
phase, so they are not reported, copied, deleted, or replaced; see ahead.

Version 3.3 uses Unicode normalization in this phase to equate filenames that
are canonically equivalent, but represented by differing Unicode code points.
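
To illustrate the idea (a minimal sketch only; normalize_names is a hypothetical
stand-in, not the code in fixunicodedups.py), names can be compared in NFC form
while a map back to the original spellings is kept for later file operations:

```python
import unicodedata

def normalize_names(names, form='NFC'):
    # Hypothetical helper: return (normalized names, map back to originals).
    normalized, originals = [], {}
    for name in names:
        norm = unicodedata.normalize(form, name)
        normalized.append(norm)
        if norm != name:
            originals[norm] = name      # needed to find the real file later
    return normalized, originals

composed   = u'caf\u00e9.txt'      # e-acute as a single code point (NFC)
decomposed = u'cafe\u0301.txt'     # 'e' plus a combining acute accent (NFD)

# The raw strings differ, but their normalized forms compare equal
names, originals = normalize_names([decomposed])
same_name = (names[0] == composed)
```

The map matters because a file must still be opened, copied, or deleted by the
name its own filesystem stores, not by the normalized form used for matching.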


2) RESOLUTION PHASE (optional)

If directed to do so, the script next resolves all the differences in dirto,
such that dirto is made the same as dirfrom, but dirfrom is unchanged.
That is, dirto becomes a "mirror" of dirfrom, by the following updates,
run in the following order ("items" means files, symlinks, and folders):

  a) Differing same-named files and links are copied from dirfrom to dirto
  b) Unique items in dirto are removed from dirto
  c) Unique items in dirfrom are copied to dirto
  d) Mixed-mode same-named items are replaced in dirto by their dirfrom version

As these updates are fully disjoint (a name can appear in only one category),
they cannot interfere with each other's correctness, though order matters for
renames on case-insensitive platforms like Windows (deletes must precede adds).
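
The partitioning and ordering above can be sketched as follows (an illustration
under assumptions: plan_updates and its arguments are hypothetical names, and
the real script gathers these categories while walking both trees):

```python
def plan_updates(namesfrom, namesto, differs, mixed):
    # Hypothetical helper: return ordered (action, names) steps for one folder.
    # differs and mixed are sets of common names flagged by the comparison.
    common = [name for name in namesfrom if name in namesto]
    return [
        ('replace',       [n for n in common if n in differs]),        # (a)
        ('delete',        [n for n in namesto if n not in namesfrom]), # (b)
        ('add',           [n for n in namesfrom if n not in namesto]), # (c)
        ('replace-mixed', [n for n in common if n in mixed]),          # (d)
    ]

# A case-only rename shows up as one delete followed by one add
steps = plan_updates(namesfrom=['Readme.txt', 'a.py'],
                     namesto=['readme.txt', 'a.py'],
                     differs={'a.py'},
                     mixed=set())
```

Because names are compared as exact text, Readme.txt and readme.txt land in the
add and delete categories respectively, and running the delete first keeps the
add from colliding with the old name on a case-insensitive filesystem.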

The command-line "-report" argument disables the resolution phase, and "-auto" 
directs it to perform all these updates automatically.  Otherwise, the console
user is asked to confirm each update interactively, and hence may run updates 
selectively.  Updates change dirto in-place, but impact differing items only,
and this yields a much quicker sync or backup than the full tree copies of 
cpall (or drag-and-drops or similar).

Mixed-mode items are replaced in dirto only if they are a file/dir or dir/file
mix; other mixed-mode cases and unknown-mode uniques are ignored (and may include
FIFOs on some platforms, but not Unix symlinks which are always copied instead of
followed as of 3.0: see the symlinks TBD resolution ahead).

All file errors during resolution are caught and reported, and do not end the
script; scan its results for "**Error".  This error message pattern is used both
for top-level file errors here, as well as for file error messages during the
recursive folder copy in cpall.copyfile(), and errors during the comparison phase
(which terminate the run before updates).  Any resolution failures skipped also
register as differences on the next run.

As of version 2.0, prior versions of items (both files and directories) replaced
or removed during the resolution process are automatically backed-up to the TO
archive's __bkp__ folder, if the new "-backup" flag is used.  See TBD 3 ahead.

As of version 2.1, resolution here can also be run in "-restore" mode to 
rollback changes made by a prior run with backups enabled.  This mode merges 
from backup set to archive root, omitting step (b) above, and removing items 
added by the prior run's step (c).

As of version 3.2, resolution here can also be run in "-restore" mode to apply 
changes recorded by the deltas.py script.  This run brings TO in sync with
FROM on demand, and its updates can be backed up and rolled back as usual.
deltas.py also replaces the resolution phase here with code that saves changes.


3) VERIFICATION PHASE (optional)

If "-verify" is used, the script also runs a byte-for-byte diffall.py comparison
as a post step, to verify results.  The diffall summary appears at the end of its
output, and should show "No diffs found." if the merge was successful; search
this output for "*DIFFER" and "*UNIQUE" strings for further diff details.

Note that you can generally skip the (possibly very) slow -verify diffall step,
and simply rerun with -report to view any lingering diffs; this report differs
in form and semantics, but contains the same data.  In practice, diffall may be
better run rarely and by separate command lines, than as part of each mergeall.

--------------------------------------------------------------------------------

ADDITIONAL NOTES

1) See test\expected-output-3.0 for recent logs with example commands and output.
See examples\{Logs, _older\other\mergeall-run.txt} for example commands and output.

  UPDATE: see also later releases' subfolders in test/ for more-recent mods.

2) This script runs on Python 3.X and 2.X.  It should be platform neutral, but
has been tested only on Windows to date. 

  UPDATE: besides its Windows origin, this system was later also verified to work 
  on Linux, macOS, and Android, in both command-line and GUI modes.  Python 3.X was 
  also later recommended for all content having Unicode (non-ASCII) filenames.

3) TIP: on Windows, set environment variable PYTHONIOENCODING=utf8 in your shell 
session or Control Panel if you receive Unicode errors when scripts like mergeall.py 
attempt to print non-ASCII filenames on your platform.  This manual setting isn't 
required for the GUI launcher, as it automatically sets and propagates this variable 
to its mergeall.py subprocess, and does not route text to a console (only to a GUI 
and a log file).  However, this setting may be required when mergeall.py is run
directly from a command line; because it prints filenames to the console, visiting 
any file with a non-ASCII name may otherwise abort this script (and others).

  UPDATE: this setting is now required only when running command lines with the
  source-code package on Windows.  If you're using Python 3.6 or later, it's further
  required only when redirecting script output to a file; Python 3.6 changed output
  sent to the console itself to be UTF8, but left its encoding to be that of the 
  Windows code page when written to files.  Mergeall's GUI and console launchers 
  make the setting automatically, and its frozen executable for Windows neutralizes
  the issue by changing output to be ASCII (similarly to ziptools).

4) This reuses some PP4E book examples: diffall.py logic, and cpall.py file/dir tree
copiers, though the latter required extension to call shutil.copystat() to also copy
file modification times after file content, so that files are the same when later
compared again here (see 2.X caveat ahead).  shutil.copy2() would work too, but
PP4E code reuse was a goal.  Also added __future__ imports of print_function for
2.X in cpall and diffall; these are 2.X compatible with this insert.  [[1.7.1]:
also extended cpall's file error message text slightly to match that here.]

  UPDATE: as of 3.3, compatibility with original PP4E code has now been long 
  abandoned, for the sake of utility.  The real world changes software.

--------------------------------------------------------------------------------

DEVELOPMENT

Recent major changes:
  - [3.3] filenames are now run through Unicode normalization for comparison
  - [3.2] deltas.py is an alternative run mode, which saves changes as a set 
  - [3.1] modtimes are propagated for folders too, and Linux exes flush output
  - [3.0] cruft-file handling, symlink support, and Windows long paths
  - [Etc] FAT 2-second granularity, Windows deferred removals, backups/rollbacks

**TRIMMED**

  [3.3] The rest of this section, including former TBDs, caveats, and notes
  referenced above, has been split off to ./mergeall.py-devdocs.txt.  It's 
  useful info and covers many of the subtler aspects of this program, but is 
  now partly historical, and became too much to scroll through to get to the code.
  See also ./docetc/MoreDocs/Revisions.html for parallel dev docs and history,
  and ./UserGuide.html for additional user-level coverage of this program.

==================================================================================
"""




###################
# CODE STARTS HERE
###################



from __future__ import print_function         # Py 2.X compatibility
import os, sys, pprint, shutil, stat          # shutil has rmtree (and copystat)
if sys.version[0] == '2': input = raw_input   # Py 2.X compatibility

# this script is mostly platform-neutral
RunningOnMac     = sys.platform.startswith('darwin')
RunningOnWindows = sys.platform.startswith('win')
RunningOnLinux   = sys.platform.startswith('linux')

# [3.0] for frozen app/exes, fix module+resource visibility (sys.path)
import fixfrozenpaths    # __file__ may have an empty dir
        
# reuse PP4E book examples
from dirdiff import intersect                 # in both a and b, retains order [3.3]
from dirdiff import difference                # in a but not b, retains order
from cpall   import copyfile, copytree        # copy utils, with own trace/trys

# [2.0/2.1] automatic backups/restores extensions
import backup                                 # save change/deleted files/dirs in TO 

# [3.0] filter out system metadata files
from skipcruft import filterCruftNames        # no longer: filterCruftDirentrys   

# [3.0] fix too-long paths on Windows (only)
from fixlongpaths import FWP

# [3.2] import and display version number
from __version__ import VERSION

# [3.3] Unicode normalization for filename matching
from fixunicodedups import normalizeUnicodeFilenames    # list => (list, map) 
from fixunicodedups import normalizeUnicode             # fix a single string




#-------------------------------------------------------------------------------
# Message control: 2=more, 0=less
# see also print redefinition hack below: printing is custom for some exes
#-------------------------------------------------------------------------------

traceLevel = 1
def trace(level, *args, **kargs):
    if level <= traceLevel: print(*args, **kargs)




#-------------------------------------------------------------------------------
# Use ints for modtimes (losing fractions of a second), not floats;
# else shutil.copystat() values differ in copied files in Py 2.X (only).
# stat_float_times deprecated in Py 3.3: if gone, simply truncate modtimes.
# UPDATE: the [3.0] os.stat rewrite made this code moot - getmtime is now 
# unused, but stat result modtimes are similarly truncated ahead.
#-------------------------------------------------------------------------------

if hasattr(os, 'stat_float_times'):           # use while it lasts?
    os.stat_float_times(False)                # 2.X compatibility (fix)
else:
    orig_getmtime = os.path.getmtime
    os.path.getmtime = lambda path: int(orig_getmtime(path))




#-------------------------------------------------------------------------------
# Sums for comparison and resolution phases (reusable coding) [2.0]
#-------------------------------------------------------------------------------

class Totals:
    """
    a collection of named sums that display nicely;
    each sum is an attribute of the instance object;
    """
    def __init__(self, *sums):
        for name in sums:
            setattr(self, name, 0)
    def __str__(self):
        return ', '.join(('%s: %d' % kvpair)
                         for kvpair in sorted(self.__dict__.items()))

class MultipleTotals:
    """
    a collection of named Totals that display nicely;
    each Total is an attribute of the instance object;
    """
    def __init__(self, kinds, sums):
        for name in kinds:
            setattr(self, name, Totals(*sums))
    def __str__(self):
        maxlen = max(len(k) for k in self.__dict__.keys())
        return '\n'.join(('%s => %s' % (k.ljust(maxlen), v))
                         for k, v in sorted(self.__dict__.items()))




#-------------------------------------------------------------------------------
# Actual sums - e.g., countcompare.files, countresolve.files.replaced.
# [3.2] resolve does not count/show symlinks separately: algorithmically difficult
#-------------------------------------------------------------------------------

countcompare = Totals('files', 'folders', 'symlinks')    # [3.2] +symlinks
countresolve = MultipleTotals(('files', 'folders'), ('replaced', 'deleted', 'created'))




#-------------------------------------------------------------------------------
# [3.0] For summary indicator line; global because too many parameters already
#-------------------------------------------------------------------------------

import cpall                # errors in copytree() (a global in cpall module)
anyErrorsReported = False   # errors printed here  (a global in this module)




#-------------------------------------------------------------------------------
# [3.0] Hack! PYTHONIOENCODING fails in Windows PyInstaller exes: force ASCII.
# Without this, non-ASCII filename prints throw exceptions in this context ONLY.
# This impacts message display only: all files are still processed as usual.
# Works in Python 3.X only, but that is what is used for the frozen executable.
#
# ALSO: force prints (stdout) to flush output to simulate unbuffered mode, which
# is broken in this context (PYTHONUNBUFFERED fails too and -u doesn't apply).
# flush=True works only in Py 3.3+, but the frozen exe embeds Py 3.5 or later.
# See also reportdiffs() which must replace sys.stdout for pprint() calls
# (this looks like the only context that uses sys.stdout directly in mergeall).
#
# ALSO [3.1]: flush stdout in Linux exes too (but don't encode Unicode text).
# Alternatives: this could have used PyInstaller "spec" files (but they offer 
# less control and don't address encodings), or Python stream-proxy classes
# as in autoflush.py (which was added later and is used by diffall and cpall)
# and PyEdit's subprocproxy.py (though they may be slower).
#-------------------------------------------------------------------------------

if (hasattr(sys, 'frozen') and 
   (RunningOnWindows or RunningOnLinux) and sys.version[0] >= '3'):
     
    def isascii(string):
        try:    string.encode('ascii')
        except: return False
        else:   return True

    def _print(*pargs, **kargs):
        if RunningOnWindows:
            pargs = [(arg if isascii(str(arg)) else ascii(arg)) for arg in pargs]
        if sys.version_info >= (3, 3):           # not version[:3]: '3.10' => 3.1
            oldprint(*pargs, flush=True, **kargs)
        else:
            oldprint(*pargs, **kargs)            
            sys.stdout.flush()

    import builtins
    if not hasattr(builtins, '_printredefined'):
        
        # also reset in builtins in case any other modules in the exe print badness too;
        # need redefined flag else may redefine twice due to multiple top-level scans;
        # wrapping the redefine in a function called from __main__ code also avoids this
        # (see "Customizing open" in Learning Python 5E page 539), as does wrapping sys's
        # stdout in a class whose write() flushes on newlines (see PyEdit's subprocproxy);

        assert builtins.print != _print
        oldprint = builtins.print
        builtins.print = _print
        builtins._printredefined = True
        print = _print   # optional: also found in builtins scope




"""
#-------------------------------------------------------------------------------
RIP: The following was blatantly evil monkeypatching: instead,
changed cpall.copyfile in-place to call copystat by default [2.0];

# must copy file _and_ its modtime, else always differs here;
# this is a bit of a hack, but reuses book examples intact

import cpall
cpall_copyfile = cpall.copyfile        # save original

def copyfile(pathfrom, pathto):
    cpall_copyfile(pathfrom, pathto)   # copies file content
    shutil.copystat(pathfrom, pathto)  # extend with modtime step

cpall.copyfile = copyfile              # reset for cpall.copytree
copytree = cpall.copytree              # ...which runs copyfile here
#-------------------------------------------------------------------------------
"""




##################################################################################
# COMPARISON PHASE: analyze trees to collect changes in FROM to apply to TO
##################################################################################




def comparedirs(dirfrom, dirto, namesfrom, namesto, origfroms, origtos, uniques):
    """
    ----------------------------------------------------------------------------
    Compare directory contents, but not actual files, changing uniques in-place.
    dirfrom is not needed for uniques['to'] in the resolution phase, but added
    here for use in difference summary reports (show both from and to paths).
    
    This comparison is by filename text, without normalizing case on case-
    insensitive platforms (e.g., Windows).  This is deliberate, so that file
    renames trigger a delete of the old followed by an add of the new when
    merged.  Normalizing case would trigger same-files for mixed case, not
    uniques, and wouldn't implement the rename.

    [2.0] Moved the listdir call from here to comparetrees; no need to return lists.
    [3.3] Compares Unicode-normalized names: map to originals for resolution.
    [3.3] Froms need only original froms (copied), tos need only tos (deleted).
    ----------------------------------------------------------------------------
    """

    # compare normalized forms
    countcompare.folders += 1
    uniquefroms = difference(namesfrom, namesto)    # in arg1 but not in arg2, ordered
    uniquetos   = difference(namesto, namesfrom)

    # map back to unnormalized forms
    if uniquefroms:
        uniquefroms = [origfroms.get(name, name) for name in uniquefroms]
        uniques['from'].append((uniquefroms, dirfrom, dirto))

    if uniquetos:
        uniquetos = [origtos.get(name, name) for name in uniquetos]
        uniques['to'].append((uniquetos, dirfrom, dirto))




def modtimematch(statfrom, statto, allowance=2):    # [1.3] 2 seconds for FAT32
    """
    ----------------------------------------------------------------------------
    Allows for 2-second modtime granularity on FAT32 file-system drives.
    See comparefiles() for notes: this was pulled out from a nested def in
    that function both for speed, and because it's now shared by comparelinks().
    Minor nit: 2-second granularity is used on all filesystems (not just FAT).
    Note: truncated with int() for a Py 2.X precision issue; see top of file.
    ----------------------------------------------------------------------------
    """

    time1 = int(statfrom.st_mtime)                  # [3.0] not os.path.getmtime(path)
    time2 = int(statto.st_mtime)
    return time2 >= (time1 - allowance) and time2 <= (time1 + allowance)




def comparelinks(namefrom, nameto, dirfrom, dirto, statfrom, statto, diffs):
    """
    ----------------------------------------------------------------------------
    [3.0] Compare symbolic links (symlinks) to either files or dirs specially,
    by their link paths (but see update below).  Record link diffs on 'diffs',
    the same list used for files; cpall.copyfile() will copy links specially.  

    This compares links themselves, not the possibly large items they refer to.
    When called, both of the two items are links.  Mixed cases are never routed
    here, and are handled by resolution-phase logic.  We don't care what the
    links refer to, only that their linkpaths differ (invalid links are okay).
    Unlike files, it's just as quick to read these as to check link modtimes.
    Size also does not matter here: full-content compares subsume content size. 

    UPDATE - ABOUT TIME: this now _does_ check modtimes too (outside Windows),
    and flags a diff if the modtimes differ.  This may seem overkill, but when
    links are merged to Windows on a non-NTFS drive, Windows sees them as simple
    files, which causes merges to process them in comparefiles() below.  In that
    case, if only modtimes differ, Windows will resolve the diff and copy over
    the newer item.  But if modtimes are not compared here too, Windows' copies
    may grow out of synch with those on other non-NTFS drives written on Unix
    with identical content but differing modtimes.  In rare use cases, unchanged
    links might even be propagated wrongly from Windows back to Unix as files,
    via intermediate drives with different modtimes.  Hence, we must mimic files
    here because merges on Windows with non-NTFS drives will too.  

    (Edit: in hindsight, this scenario is so rare that it's difficult to imagine 
    as possible, but syncing for dates might prevent a symlink from being blown
    away by a back sync, if dates change but content does not - e.g., by touch
    or re-edits.  Symlinks assume they won't return from devices that botch them.
    At the least, modtimes may be lost if they are not pushed out to other copies.)

    AT THE SAME TIME: Windows os.utime() does not support follow_symlinks=False,
    and hence cannot propagate symlink modtimes correctly, even when run with
    admin permission and NTFS drives so symlinks work (see cpall.copyinfo()).
    Thus, symlinks always get stamped with the current time on Windows, which
    would make them always register diffs with other copies.  To avoid this, we
    do _not_ test modtimes here on Windows (only), in case Windows is processing
    symlinks as true symlinks, and not simple files.  Naturally, this helps only
    if Windows is used to both create and compare links.

    [3.2] PLUS PYTHONS: Similarly, symlinks get stamped with creation time on 
    all Pythons that do not support follow_symlinks=False; this includes Py 2.X 
    and 3.0..3.2 (see cpall.copyinfo()).  We skip the modtime test here on these
    Pys too to avoid marking every symlink as a diff, though this also helps iff
    the same Py is used to create and compare (it's harmless if that's not so).

    [3.2] AND BDRS: Per deltas.py testing, BDRs burned by macOS don't record 
    symlinks properly, and trigger OSErrors during the comparison phase, which 
    ends the full program run prematurely.  Added a try/except here to catch and
    skip; this may also become an "-ignorelinks" command-line option eventually.
    UPDATE: this also required a work-around in diffall.py, but not cpall.py;
    see also the problem's new demo at test/macos-bad-bdr-symlinks-3.2.txt.

    [3.3] namefrom/nameto are original and unnormalized names here: they differ
    if one has been changed by Unicode normalization, and paths must match the 
    filesystem here.  Also include both in diff lists (resolution has no access 
    to the originals dicts present during the comparison tree walk), and 
    normalize link paths from files prior to comparison (they may differ too!). 
    ----------------------------------------------------------------------------
    """

    trace(2, 'links:', namefrom, 'in', dirfrom, dirto)
    countcompare.symlinks += 1                   # count/display num links too [3.2]

    pathfrom = dirfrom + os.sep + namefrom       # rarely run, avoid os.path.join
    pathto   = dirto   + os.sep + nameto

    # ignore modtimes on Windows and on Pythons 3.2 and earlier;
    # compare version tuples, not floats: '3.10'[:3] would read as 3.1
    ignoremodtimes = RunningOnWindows or (sys.version_info < (3, 3))

    if (not ignoremodtimes) and (not modtimematch(statfrom, statto)):
        # try modtime 1st on some: the easiest diff
        diff = (namefrom, nameto, dirfrom, dirto, 'modtime')
        diffs.append(diff)
    else:
        # compare content on all: link-path strs
        try:
            linkpathfrom = os.readlink(FWP(pathfrom))
            linkpathto   = os.readlink(FWP(pathto))
            if normalizeUnicode(linkpathfrom) != normalizeUnicode(linkpathto):    # [3.3]
                diff = (namefrom, nameto, dirfrom, dirto, 'linkpaths')
                diffs.append(diff)
        except OSError:
            trace(1, 'Unreadable link skipped:', os.path.join(dirto, nameto))




def comparefiles(namefrom, nameto, dirfrom, dirto, statfrom, statto, diffs, 
                 dopeek=False, peekmax=10):
    """
    ----------------------------------------------------------------------------
    Compare same-named files by modtime and size, and possibly by start+stop
    bytes read (up to min file size) if dopeek, changing diffs in-place.
    Tests are run in series until the first difference is found, or all have
    been tried.  This is not 100% accurate (and is subject to filesystem
    diffs), but avoids full reads, and is sufficient for synching large trees.

    Uses binary byte files to prevent Unicode decoding and endline transforms,
    as trees might contain arbitrary binary files as well as arbitrary text.
    Requires shutil.copystat() to also copy file modtimes, else copied files
    will still always differ here; see the hack to reuse copy utilities above.

    Update: version 1.3 allows for 2-second modtime granularity in FAT32 file
    system (as well as NTFS's fractional seconds), by using a +/- 2-second range
    test instead of !=.  Modtime timestamps are returned in seconds, possibly
    truncated.  See details in the CAVEATs section of this file's docstring.

    [3.0] This now gets stat objects, to avoid triggering additional stat calls
    in os.path.getmtime()/getsize().  On Windows, that made this non-scandir()
    comparison phase variant an extra 50%-100% faster, and finally at least as
    fast as the prior scandir() variant (and this remains 2X faster on Mac).

    [3.3] namefrom/nameto are original and unnormalized names here: they will 
    differ if one has been changed by Unicode normalization, and paths must 
    match the filesystem here.  Also include both in diff lists: resolution has
    no access to the originals dicts present during the comparison tree walk. 
    ----------------------------------------------------------------------------
    """

    trace(2, 'files:', namefrom, 'in', dirfrom, dirto)
    countcompare.files += 1     

    # [3.0] don't make pathfrom/pathto yet
    startdiffs = len(diffs)
    
    if not modtimematch(statfrom, statto):                        # try modtime 1st:
        diff = (namefrom, nameto, dirfrom, dirto, 'modtime')      # easiest case
        diffs.append(diff)                                        # [3.3] both names

    else:                                                        
        sizefrom = statfrom.st_size                               # [3.0] not os.path.getsize(path)
        sizeto   = statto.st_size
        if sizefrom != sizeto:                                    # try size next: 
            diff = (namefrom, nameto, dirfrom, dirto, 'filesize') # unlikely case
            diffs.append(diff)
            
        elif dopeek:                                              # rarely: iff peek arg
            pathfrom = dirfrom + os.sep + namefrom                # [3.0] not os.path.join
            pathto   = dirto   + os.sep + nameto                  # try start+stop bytes
            peeksize = min(peekmax, sizefrom // 2)                # scale peek to size/2
            filefrom = open(FWP(pathfrom), 'rb')                  # sizefrom == sizeto
            fileto   = open(FWP(pathto), 'rb')                    # [3.0] long Windows paths 
            if filefrom.read(peeksize) != fileto.read(peeksize):
                diff = (namefrom, nameto, dirfrom, dirto, 'startbytes')
                diffs.append(diff) 
            else:
                filefrom.seek(sizefrom - peeksize)
                fileto.seek(sizeto - peeksize)
                if filefrom.read(peeksize) != fileto.read(peeksize):
                    diff = (namefrom, nameto, dirfrom, dirto, 'stopbytes')
                    diffs.append(diff) 
            filefrom.close()
            fileto.close()
            
    return len(diffs) == startdiffs    # true if did not differ, else extends 'diffs'




def excludeskips(dirfrom, dirto, namesfrom, namesto, skips):
    """
    ----------------------------------------------------------------------------
    [2.0] Remove __bkp__ changes-backup folders at top-level only.
    Could use set difference, but want to retain filesystem order;
    or [name for name in nameslist if name != skip], but extra copies.
    Could be used to exclude other items too, but currently is not.
    
    Update: [3.0]'s later "-skipcruft" added a more general filter
    which might have included __bkp__ if it was available earlier,
    but __bkp__ is a mandatory skip and crufts are user-configurable.
    We could have automatically inserted __bkp__ in the cruft list,
    but retain the "excluding __bkp__" message for this special case,
    and incur 'in' speed hit here just once per run (for skip True,
    at the trees' top levels only).  Also pulled this out from nested 
    def for speed: don't remake a function object at each level.

    [3.2] Generalize arg "skip" to "skips" to also support skips of
    top-level __added__.txt files in "-restore" runs for both deltas
    applies and rollbacks.  This is now passed with both __bkp__ and 
    __added__.txt at the comparison phase's top recursion level only;
    no list scans are performed for lower levels.  This effectively 
    removes __added__.txt from consideration in the resolution phase,
    and avoids both a spurious copy, and a former not-found error 
    message in later "-restore" runs.

    Details: formerly, the __added__.txt file at the root of delta
    sets and backups folders was copied to TO normally as a unique
    FROM during "-restore" runs, but forcibly removed from TO's root 
    at the end of a merge.  That's not quite enough when "-backup" 
    is used with "-restore": in this case, an "__added__.txt" line was
    also added to the new __added__.txt in TO's backup set, and would 
    generate a not-found error message if the backup set was later 
    applied in a "-restore" rollback, because of the forcible removal.

    This never cropped up earlier, because rolling back a rollback was
    previously unsupported (__added__.txt removals were not backed up). 
    It grew important for rolling back a delta set's changes, which is 
    now supported in full; in this use case, two "-restore" runs may be
    made: one to apply deltas and save backups; and another to roll
    back deltas from backups.  Disqualifying __added__.txt here avoids 
    both the TO copy and later "-restore" error, in both deltas rollbacks
    and the now-supported rollbacks of rollbacks (obscure, but true).
    ----------------------------------------------------------------------------
    """

    # remove skips in place
    if skips:
        for skip in skips:
            if skip in namesfrom:
                trace(1, 'excluding', os.path.join(dirfrom, skip))
                namesfrom.remove(skip)

            if skip in namesto:
                trace(1, 'excluding', os.path.join(dirto, skip))
                namesto.remove(skip)

    # else a no-op at all lower recursion levels




def comparetrees(dirfrom, dirto,                # subjects: source, destination paths
                 diffs, uniques, mixes,         # outputs: collected differences 
                 dopeek, skipcruft, quiet,      # modes: reads, skips, messages [3.3]
                 skips=None):                   # items to skip at top-level only [3.2]
    """
    ----------------------------------------------------------------------------
    Compare all subdirectories and files in two directory trees, noting
    differences in-place in diffs, uniques, and mixes, for later updates.
    TBD: Permission error exceptions here end this script; should they?
    
    [2.0] Added skip argument for __bkp__ at top of archives, and moved
    os.listdir calls from comparedirs to here to make removals possible.
    May need bytes listdir arg for undecodable filenames on some platforms.

    [3.0] Added skipcruft argument and code for the new "-skipcruft" option
    described near the top of this file, and in the UserGuide.html document.

    [3.0] Coding note: any exceptions during the comparison phase (e.g., for
    permission errors on listings here) are deliberately ignored, and allowed
    to terminate the run.  Else, error messages in this phase's log would be
    too easy to miss, and failed folders would go silently unprocessed.
    But these are now caught at the top-level and reported (see __main__).

    [3.0] Optimization: don't scan the 'common' list more than once, but
    recur into subdirs immediately (unlike diffall, there is no need to
    postpone subdirs recursion here, because we're building difference
    data structures to be used later).  This automatically avoids calling
    os.path.join() twice on each item name, and halves big-O complexity
    in both this and its 3.5+ optimized variant ahead.

    [3.0] Optimization: also replace os.path.join() calls here with +os.sep+.
    os.path.join() is complex and slow overkill for known path+file cases,
    especially on Windows (see Python's Lib\ntpath.py).  Also replaced in
    comparefiles() above (the savings for passing paths instead is likely
    trivial).  This was not required in the 3.5+ os.scandir() variant
    (which was eventually dropped: see scandir_defunct.py).
 
    OPTIMIZATION RESULTS:
      The prior 2 changes reduced comparison-phase time for an 87G SSD tree
      with 59k files and 3.5k dirs from 19 to 14 seconds on Pythons 3.4 and
      older (which use os.listdir()), but did not impact a 7.2 second runtime
      on Pythons 3.5+ (which use an os.scandir() variant that fully accounts
      for its faster speed).

      Thus, 5 seconds were shaved in Pythons 3.4-, but filesystem call
      overheads overshadow code here in the 3.5+ variant's case.  Moreover,
      nearly all of the 5 second 3.4- gain is due to reduced 'common' scans,
      not os.path.join() removal.  For a typical results set, see log file
      test/expected-output-3.0/optimizations-3.0/mergall-results.txt; its
      relative findings are immune to test variables.
      
      Caveat: tested on Windows only; os.scandir() is not used on Mac OS X.
      Caveat: the following 2 notes' later recodings also impacted speed.

    [3.0] For links, recoded to use os.lstat()+stat instead of os.path.is*()
    to avoid multiple stat calls and narrow type tests (the new calls don't
    classify a link as a file or dir too).  All optimization results noted
    above were true before this recoding, but are likely similar after (TBD).

    [3.0] Also pass comparefiles() stat objects to avoid other os.path.*()
    calls' internal stat calls.  This made this non-scandir() variant's 
    speed >=  scandir()'s on Windows too, obsoleting the scandir() 3.5+
    variant maintained redundantly (see scandir_defunct.py).  Vaya con Dios!

    [3.0] Support long paths on Windows by running paths through FWP in all
    Python file tool calls; this is a no-op if non-Windows or within limits.

    [3.3] Lots of code changes for Unicode normalization here; see above.
    See the new fixunicodedups.py for the implementation of normalization.
    Note that dirfrom and dirto are always unnormalized names at each level.
    Nit: some normalization tasks could be pushed down to dirdiff.py's
    intersect/difference rather than handling them both here and in diffall.py,
    but that still requires normalizing just once here, and seems too implicit.
    ----------------------------------------------------------------------------
    """

    trace(2, '-' * 20)
    trace(1, 'comparing [%s] [%s]' % (dirfrom, dirto))
        
    # get raw dir content lists here
    namesfrom  = os.listdir(FWP(dirfrom))                       # [1.7] or pass bytes?
    namesto    = os.listdir(FWP(dirto))                         # would impact much

    # [3.3] map Unicode variants to common form
    tracer = print if not quiet else lambda *args: None
    namesfrom, origfroms = normalizeUnicodeFilenames(namesfrom, dirfrom, tracer)
    namesto,   origtos   = normalizeUnicodeFilenames(namesto,   dirto,   tracer)

    # drop __bkp__ and __added__.txt names at roots (in place)
    excludeskips(dirfrom, dirto, namesfrom, namesto, skips)

    # [3.0] filter out system metadata files and folders
    if skipcruft:
        namesfrom = filterCruftNames(namesfrom)
        namesto   = filterCruftNames(namesto)

    # compare dir filename lists (normalized) to get uniques (unnormalized)
    comparedirs(dirfrom, dirto, namesfrom, namesto, origfroms, origtos, uniques)

    # analyse names in common: same (normalized) name and case
    trace(2, 'comparing common names')
    common = intersect(namesfrom, namesto)         # compare normalized names lists
    
    for name in common:                            # scan common names just once [3.0]

        origfrom = origfroms.get(name, name)       # use unnormalized names now [3.3]
        origto   = origtos.get(name, name)         # for paths and filesystem access        

        pathfrom = dirfrom + os.sep + origfrom     # avoid os.path.join overkill [3.0]
        pathto   = dirto   + os.sep + origto       # paths have unnormalized names [3.3]

        statfrom = os.lstat(FWP(pathfrom))         # [3.0] os.path.is*() => os.lstat(): 
        statto   = os.lstat(FWP(pathto))           # narrow results, avoid N stat calls

        # 0) compare linkpaths of links in common [3.0]
        if stat.S_ISLNK(statfrom.st_mode) and stat.S_ISLNK(statto.st_mode):
            comparelinks(origfrom, origto, dirfrom, dirto, statfrom, statto, diffs)
        
        # 1) compare times/sizes/contents of (non-link) files in common 
        elif stat.S_ISREG(statfrom.st_mode) and stat.S_ISREG(statto.st_mode):
            comparefiles(origfrom, origto, dirfrom, dirto, statfrom, statto, diffs, dopeek)
                           
        # 2) compare (non-link) subdirectories in common via recursion
        elif stat.S_ISDIR(statfrom.st_mode) and stat.S_ISDIR(statto.st_mode):
            comparetrees(pathfrom, pathto, diffs, uniques, mixes, dopeek, skipcruft, quiet)

        # 3) same name but not both links, files, or dirs (mixed types, fifos)
        else:
            mixes.append((origfrom, origto, dirfrom, dirto))




#-------------------------------------------------------------------------------
# DEFUNCT: redefine comparison phase functions if 3.5+ scandir() applies.
# This was once faster on Windows/Linux (only), but no longer is: punt.
# The production version above uses the original/portable os.listdir() names.
#-------------------------------------------------------------------------------

# this is normally a no-op, to be deleted altogether in 3.N 
# now stubbed-out: break the dependency for frozen apps/exes
# use fixfrozenpaths.fetchMyInstallDir(__file__) if restored
"""
this_mod_dir = os.path.dirname(__file__)
scandir_code = os.path.join(this_mod_dir, 'scandir_defunct.py')
exec(open(scandir_code).read())   # as if pasted here
"""




##################################################################################
# RESOLUTION PHASE: reconcile trees by updating TO for changes in FROM
##################################################################################




def mergetrees(diffs, uniques, mixes,
               doauto, dobackup, toroot, dorestore, fromroot, quiet, skipcruft):
    """
    ----------------------------------------------------------------------------
    Using the comparison phase's result lists, reconcile tree differences per
    the rules given in this script's docstring - replacing diffs, deleting
    uniques in dirto, copying uniques in dirfrom, and resolving mixed types.

    This is a one-way mirror only: it makes dirto the same as dirfrom, without
    changing dirfrom.  Because change sets are disjoint (the same item can
    appear in only one category) they cannot interfere with each other's
    operation or results.  Still, order matters on case-insensitive machines
    (per ahead), and dirto deletes should be run first in case space is
    limited on the target dirto device.
    
    ----
    SUBTLE THING: the order of steps here also matters for correctness.
    In case-insensitive contexts like Windows, it's crucial to delete
    before adding, or else mixed-case renames won't work.

    Because folder contents are compared by name strings, renames result
    in a delete of the old name in TO, and an add in TO of the new name
    in FROM, regardless of the modtimes on either version.  This is as it
    must be to implement a rename; treating differing case versions as the
    same file name on Windows would avoid updating the file if its modtime
    matched, but would not rename it (as it should).

    However, it's critical that we delete the old version in TO before
    adding the new (and similarly, delete the new before adding the old
    in "-restore" mode), or else a new add would be removed by a later
    old delete on Windows.  This is so, because a delete of any case will
    delete any other case.  If adds were first, deletes would negate them.

    Although the same is true for rewrites on Windows -- opening any case
    for output erases any other case -- this isn't an issue for replacements
    of same-named files for modtime differences, because this can only happen
    when case matches; case mismatches are always instead classified as unique
    items by the tree comparison, triggering a delete and add (in that order).
    By the same logic, mixed-type updates are also safe on Windows, because
    this category can only arise if case matches during tree comparison,
    though this category also deletes before adding for space.

    Order is a non-issue on case-sensitive platforms like Linux, because
    mixed-case filenames yield distinct files: deletions and rewrites cannot
    impact a file whose name is differently cased.
    ----
    
    [2.0] Backup mode: if dobackup, save files and dirs that will be
    destructively replaced or removed, to the TO archive's __bkp__ folder.
    Any exceptions during backups cause the change operation to be skipped.
    Dropped old ".bkp" prototype code: insufficient, must special-case.

    [2.1] Restore mode: backups also list added files in __bkp__/__added__.txt,
    and if dorestore, don't delete unique items in the TO tree, but do delete
    items listed in FROM's __added__.txt (first: order matters on Windows!).
    This allows complete rollback of a prior run by merging a __bkp__ subfolder
    to the archive root -- restoring all items replaced and removed, and removing
    all items added.  noteaddition() failures don't cancel copies here, as
    these are non-destructive updates.

    [3.0] Support symlinks, by always copying links themselves, not the items
    they refer to (referents).  See notes below and in this module's docstring.

    [3.0] Support long paths on Windows by running paths through FWP in all
    Python file tool calls; this is a no-op if non-Windows or within limits.

    [3.2] In "-restore" mode, print just one message for unique TO items 
    skipped.  In rollbacks, and especially in the new deltas-set applies,
    this category includes nearly every file in the archive, and makes for
    too much output.  Rollbacks are rare, but this is common with deltas.py.

    [3.2] In "-restore" mode, no longer forcibly delete the __added__.txt
    file formerly added to TO as a unique FROM; excludeskips() now disqualifies
    these, so they never appear in uniques['from'], and won't wind up in TO 
    or its own __added__.txt (producing errors if forcibly removed here). 

    [3.3] Assorted Unicode normalization changes: use the new difference 
    list formats, adjust __added__.txt paths.  Search for [3.3] below.
    Same-name replacements must delete first to avoid dups, but only if the 
    names differ: Android 11 shared-storage deletes are horrifically slow!
    ----------------------------------------------------------------------------
    """
    
    # defs for brevity and uniformity
    join = os.path.join
    from backup import (      # also: handles recursive/circular import
        backupitem,           # save items to be replaced or deleted
        rmtreeworkaround,     # hack/fix for shutil.rmtree: see backup.py
        noteaddition,         # list files added for info and restores
        removeprioradds,      # if restoring, remove prior run's adds
        dropaddsfile,         # if restoring, delete adds file (till [3.2]) 
        indent1)              # same look-and-feel for related message here


    class SkipUnknowns(Exception):
        # for isolated link+other cases handled here
        pass 

    
    def askuser(prompt, query, filename):
        """
        Ask console user (hook for future GUI?).
        """
        print('\n' + prompt)
        domanual = input(query).lower() in ['yes', 'y', '1']
        if not domanual:
            print('no action taken for [%s]' % filename)
        return domanual

        
    def error(message, *args):
        """
        Standard message format + exception data? ([1.7.1] show message too!).
        """
        global anyErrorsReported           # [3.3] mar22: pull out of docstr!
        anyErrorsReported = True           # [3.0] for summary line

        print('**Error', message, *args)
        trace(1, sys.exc_info()[0], sys.exc_info()[1])


    #---------------------------------------------------------------------------
    # 0) For items listed in FROM's __added__.txt in -restore mode: *Delete*
    #
    # - This is prior-run adds for rollbacks, unique TO items for deltas.
    # - This must happen before adds, to support Windows mixed-case renames.
    # - [2.1] On merge of backup to root, delete items added in prior run.
    # - [3.2] The adds file may now also reflect a deltas.py changes-set 
    # - [3.2] Now backs up removed items, for rollbacks of deltas and rollbacks.
    # - [3.3] Now converts path parts to Unicode variants used on the TO device.
    #---------------------------------------------------------------------------

    if dorestore:
        # make counts match prior run
        totals = removeprioradds(fromroot, toroot, dobackup, quiet)     # [3.2] +args
        countresolve.files.deleted, countresolve.folders.deleted = totals
        trace(1, indent1 + 'removed %d/%d files/dirs listed in __added__.txt' % totals)


    #---------------------------------------------------------------------------
    # 1) For differing same-named files and links: *Replace*
    #
    # - This is FROM changes in syncs and deltas, TO replacements in rollbacks.
    # - [2.0] Backup target first, if backups enabled in command arg.
    # - [3.0] This case also handles differing links implicitly via copyfile().
    # - [3.3] Names and dirs in diffs are unnormalized Unicode, and may differ.
    # - [3.3] Explicitly delete TO because its name may differ (else copy=>dup).
    # - [3.3] But iff names differ: Android 11 shared-storage deletes are SLOW.
    #---------------------------------------------------------------------------

    for (namefrom, nameto, dirfrom, dirto, why) in diffs:
        pathfrom, pathto = join(dirfrom, namefrom), join(dirto, nameto)
        if not doauto:
            prompt = '[%s] differs by %s in\n\tFROM dir [%s]\n\tTO dir   [%s]'
            prompt %= (namefrom, why, dirfrom, dirto)
            domanual = askuser(prompt, 'use FROM version?', namefrom)

        if doauto or domanual:
            try:
                backupitem(pathto, toroot, dobackup, quiet)    # save original [2.0]
                if namefrom != nameto:                         # explicit delete [3.3]
                    os.remove(FWP(pathto))                     # but not if same [3.3]
                copyfile(pathfrom, pathto)                     # copy content + modtime
            except:
                error('copying same file: skipped FROM', pathfrom)
            else:
                countresolve.files.replaced += 1
                trace(1, 'replaced same file, using FROM', pathfrom)


    #---------------------------------------------------------------------------
    # 2) For unique files, links, and dirs in TO: *Delete* (unless -restore)
    #
    # - This is FROM removals in syncs, unchanged items in deltas and rollbacks.
    # - This step must be run before #3 below: order matters for renames.
    # - [2.0] Backup target first, if backups enabled in command arg.
    # - [3.0] This case also routes links to os.remove() (rmtree disallows).
    # - [3.3] Names and dirs in uniques are unnormalized Unicode (originals).
    # - [3.3] Don't need original namefrom here: deleting nameto only.
    #---------------------------------------------------------------------------

    if dorestore:
        # [2.1] in -restore mode, leave all formerly unchanged items alone
        # [3.2] this now includes both rollbacks and deltas-set applies
        # [3.2] don't trace each item: irrelevant and possibly numerous
        # [3.3] lifted out of for loop: requires nesting, but cleaner logic

        if uniques['to']:
            trace(1, indent1 + 'retaining all unique TO items')

            # punt: deltas.py's numto is not meaningful here: some uniques['to'] may 
            # have also been in __added__.txt, and hence been deleted at step #0;
            # numto = sum(len(uniqs) for (uniqs, dirfrom, dirto) in uniques['to'])

    else:
        for (uniqs, dirfrom, dirto) in uniques['to']:      # dirfrom unused here
            for nameto in uniqs:
                pathto = join(dirto, nameto)
            
                if os.path.isfile(FWP(pathto)) or os.path.islink(FWP(pathto)):
                    if not doauto:
                        prompt = '[%s] is unique file in\n\tTO dir [%s]' 
                        prompt %= (nameto, dirto)
                        domanual = askuser(prompt, 'delete from TO tree?', nameto)

                    if doauto or domanual:
                        try:
                            backupitem(pathto, toroot, dobackup, quiet)
                            os.remove(FWP(pathto))
                        except:
                            error('removing TO file: skipped', pathto)
                        else:
                            countresolve.files.deleted += 1
                            trace(1, 'removed old TO file,', pathto)

                elif os.path.isdir(FWP(pathto)):
                    if not doauto:
                        prompt = '[%s] is unique dir in\n\tTO dir [%s]'
                        prompt %= (nameto, dirto)
                        domanual = askuser(prompt, 'delete from TO tree?', nameto)

                    if doauto or domanual:
                        try:
                            backupitem(pathto, toroot, dobackup, quiet)
                            shutil.rmtree(FWP(pathto, force=True), onerror=rmtreeworkaround)
                        except:
                            error('removing TO dir: skipped', pathto)
                        else:
                            countresolve.folders.deleted += 1
                            trace(1, 'removed old TO dir,', pathto)

                else: trace(1, 'ignored unknown type, TO:', pathto)


    #---------------------------------------------------------------------------
    # 3) For unique files, links, and dirs in FROM: *Copy*
    #
    # - This is FROM additions in syncs and deltas, TO removals in rollbacks.
    # - [2.1] No backups required, but add note for restores.
    # - [3.0] This case also handles new links implicitly via copyfile().
    # - [3.3] Names and dirs in uniques are unnormalized Unicode (originals).
    # - [3.3] Don't need original nameto here: copying namefrom verbatim.
    #---------------------------------------------------------------------------
   
    for (uniqs, dirfrom, dirto) in uniques['from']:
        for namefrom in uniqs:
            pathfrom, pathto = join(dirfrom, namefrom), join(dirto, namefrom)
            
            if os.path.isfile(FWP(pathfrom)) or os.path.islink(FWP(pathfrom)):
                if not doauto:
                    prompt = '[%s] is unique file in\n\tFROM dir [%s]' 
                    prompt %= (namefrom, dirfrom)
                    domanual = askuser(prompt, 'copy to TO tree?', namefrom)

                if doauto or domanual:
                    try:
                        noteaddition(pathto, toroot, dobackup)
                        copyfile(pathfrom, pathto)
                    except:
                        error('copying FROM file: skipped', pathfrom)
                    else:
                        countresolve.files.created += 1
                        trace(1, 'copied new FROM file,', pathfrom)

            elif os.path.isdir(FWP(pathfrom)):
                if not doauto:
                    prompt = '[%s] is unique dir in\n\tFROM dir [%s]' 
                    prompt %= (namefrom, dirfrom)
                    domanual = askuser(prompt, 'copy to TO tree?', namefrom)

                if doauto or domanual:
                    try:
                        noteaddition(pathto, toroot, dobackup)
                        os.mkdir(FWP(pathto))
                        copytree(pathfrom, pathto, skipcruft=skipcruft)
                    except:
                        error('copying FROM dir: skipped', pathfrom)
                    else:
                        countresolve.folders.created += 1
                        trace(1, 'copied new FROM dir,', pathfrom)

            else: trace(1, 'ignored unknown type, FROM:', pathfrom)


    #---------------------------------------------------------------------------
    # 4) For same-named items that are both file and dir (rare): *Delete+Copy* 
    #
    # - This is mixed-mode names in syncs, deltas, and rollbacks.
    # - [2.0] Backup item being replaced first, if enabled in command arg.
    # - [3.0] This case now also handles mixed types of a link and nonlink.
    # - [3.3] Names and dirs in mixes are unnormalized Unicode (originals).
    # - [3.3] This already explicitly deletes TO: okay if FROM name differs. 
    #---------------------------------------------------------------------------

    for (namefrom, nameto, dirfrom, dirto) in mixes:
        pathfrom, pathto = join(dirfrom, namefrom), join(dirto, nameto)

        # [3.0] link+other or other+link (case #1 above handles differing links);
        # this code almost subsumes dir+file and file+dir too, but differs slightly
        # for unknown FROM types, and better to keep original more-specific cases;
        
        if os.path.islink(FWP(pathfrom)) or os.path.islink(FWP(pathto)):
            if not doauto:
                prompt = '[%s] is mixed with links in\n\tFROM dir [%s]\n\tTO dir   [%s]'
                prompt %= (namefrom, dirfrom, dirto)
                domanual = askuser(prompt, 'use FROM version item?', namefrom)

            if doauto or domanual:
                try:
                    # backup+delete to: link or ?
                    if os.path.isfile(FWP(pathto)) or os.path.islink(FWP(pathto)):
                        backupitem(pathto, toroot, dobackup, quiet)
                        os.remove(FWP(pathto))
                    elif os.path.isdir(FWP(pathto)):
                        backupitem(pathto, toroot, dobackup, quiet)
                        shutil.rmtree(FWP(pathto, force=True), onerror=rmtreeworkaround) 
                    else:
                        # don't fail in backupitem(), not error: TO unchanged
                        raise SkipUnknowns()   # e.g., fifos

                    # copy from ~ to: link or ?
                    if os.path.isfile(FWP(pathfrom)) or os.path.islink(FWP(pathfrom)):
                        copyfile(pathfrom, pathto)
                    elif os.path.isdir(FWP(pathfrom)):
                        os.mkdir(FWP(pathto))
                        copytree(pathfrom, pathto, skipcruft=skipcruft)
                    else:
                        # log an error message: TO was backed up and removed
                        # slightly inconsistent, but too rare to code specially 
                        raise OSError('Unknown FROM not copied')   # e.g., fifos
                    
                except SkipUnknowns:
                    trace(1, 'ignored unknown types, FROM:', pathfrom, 'TO:', pathto)
                except:
                    error('replacing item with FROM item: skipped', pathfrom)
                else:
                    countresolve.files.replaced += 1    # close enough (?)
                    trace(1, 'replaced links mixed-type target, using FROM', pathfrom)

        # original mixed-cases code: make more common cases more explicit
        
        elif os.path.isdir(FWP(pathfrom)) and os.path.isfile(FWP(pathto)):
            if not doauto:
                prompt = '[%s] is mixed dir/file in\n\tFROM dir [%s]\n\tTO dir   [%s]'
                prompt %= (namefrom, dirfrom, dirto)
                domanual = askuser(prompt, 'use FROM version dir?', namefrom)

            if doauto or domanual:
                try:
                    backupitem(pathto, toroot, dobackup, quiet)
                    os.remove(FWP(pathto))
                    os.mkdir(FWP(pathto))
                    copytree(pathfrom, pathto, skipcruft=skipcruft)
                except:
                    error('replacing file with FROM dir: skipped', pathfrom)
                else:
                    countresolve.files.replaced += 1
                    trace(1, 'replaced file with dir, using FROM', pathfrom)

        elif os.path.isfile(FWP(pathfrom)) and os.path.isdir(FWP(pathto)):
            if not doauto:
                prompt = '[%s] is mixed file/dir in\n\tFROM dir [%s]\n\tTO dir   [%s]'
                prompt %= (namefrom, dirfrom, dirto)
                domanual = askuser(prompt, 'use FROM version file?', namefrom)

            if doauto or domanual:
                try:
                    backupitem(pathto, toroot, dobackup, quiet)
                    shutil.rmtree(FWP(pathto, force=True), onerror=rmtreeworkaround)
                    copyfile(pathfrom, pathto)
                except:
                    error('replacing dir with FROM file: skipped', pathfrom)
                else:
                    countresolve.folders.replaced += 1
                    trace(1, 'replaced dir with file, using FROM', pathfrom)

        else: trace(1, 'ignored unknown types, FROM:', pathfrom, 'TO:', pathto)


    #---------------------------------------------------------------------------
    # [2.1] Remove the __added__.txt file copied over by merge, if any;
    # this could be excluded during comparison, but quicker to special-case.
    # [3.2] Defunct: comparisons now do skip __added__.txt (see note above).
    #---------------------------------------------------------------------------
    """CUT
    if dorestore and dropaddsfile(toroot):
        countresolve.files.created -= 1    # make counts match prior run
        trace(1, indent1 + 'removed __added__.txt file from TO tree root')
    CUT"""




##################################################################################
# UTILITIES
##################################################################################




def getargs():
    """
    ----------------------------------------------------------------------------
    Get command-line arguments, return False if any are invalid.

    [2.0] Added new -backup switch here, and in both launchers;
    [2.0] Do more error checking, catch and report bad paths;
    [2.1] Added "-restore": merge __bkp__ to root, no deletes, trim adds;
    [3.2] deltas.py cmds differ much, and use a custom version of this.

    [3.2] This now drops trailing / or \ on folder args, if any.  Else, they
    break unnested path archtail len calcs in backup.noteaddition() (only).
    Note that this fixes folder args from the GUI and console launchers too.
    ----------------------------------------------------------------------------
    """

    def usageerror(message):
        """
        Display usage, show script's docs?
        """
        print('**%s' % message)
        print('mergeall run cancelled.')
        print('Usage:\n'
                   '\t[py[thon]] mergeall.py dirfrom dirto\n'
                   '\t\t[-report] [-auto]\n'
                   '\t\t[-peek] [-verify]\n'
                   '\t\t[-backup] [-restore] [-quiet]\n'
                   '\t\t[-skipcruft]')
        
        if sys.stdin.isatty() and sys.stdout.isatty():
            if input('More?') in ['y', 'yes']:           # [2.0] for shell, not pipe
                try:
                    help('mergeall')                     # never used by launchers
                except NameError:                        # and absent in frozen exe [3.3]
                    print('help unavailable in this package')
    
    class cmdargs: pass   # a set of attributes
    
    try:
        # required args
        cmdargs.dirfrom = sys.argv[1].rstrip(os.sep)     # [3.2] drop trailing / or \
        cmdargs.dirto   = sys.argv[2].rstrip(os.sep)     # else bad len calcs possible
    except:
        usageerror('Missing dirfrom or dirto paths')
        return False
    else:
        if not os.path.isdir(FWP(cmdargs.dirfrom)):      # unlikely long, until it is
            usageerror('Invalid dirfrom directory path')
            return False
        elif not os.path.isdir(FWP(cmdargs.dirto)):
            usageerror('Invalid dirto directory path')
            return False
        else:
            # optional args
            options = ['-report', '-peek', '-auto', '-verify',
                       '-backup', '-restore', '-quiet', '-skipcruft']
            for option in options:
                setattr(cmdargs, option[1:], False)               
            for option in sys.argv[3:]:
                if option in options:
                    setattr(cmdargs, option[1:], True)
                else:
                    usageerror('Bad command-line option: "%s"' % option)
                    return False

    return cmdargs  # this class is True




def reportdiffs(diffs, uniques, mixes, dorestore, stream=sys.stdout):
    """
    ----------------------------------------------------------------------------
    Report tree differences found to file/stream.
    
    [2.1] For consistency in log files, changed the order here to match that in
    which updates are run and summarized; order matters for renames on Windows
    (deletes must precede adds to make mixed-case renames work).

    [3.0]: In PyInstaller Windows frozen exes ONLY, pprint() can raise excs
    for non-ASCII text because it uses direct sys.stdout.write() calls (print()
    is already redefined); fix by sending a stream argument; see start of file.

    [3.2] The labels here aren't completely appropriate for deltas.py 
    apply runs (or rollbacks), but are close enough to give the idea.

    [3.3] Drop all unique TOs in -restore mode (deltas|rollback); this is 
    useless info--items unmatched in TO during comparison to a deltas set.
    The number of these items is also now omitted in the summary report.
    ----------------------------------------------------------------------------
    """
    
    if hasattr(sys, 'frozen') and RunningOnWindows:
        # fix pprint writes 
        class AsciiFlushStream:
            def write(self, text):
                if not isascii(text):         # defined in same context earlier 
                    text = ascii(text)        # escape non-ascii text in string
                sys.stdout.write(text)
                if text.endswith('\n'):       # force a flush while we're at it
                    sys.stdout.flush()
            def __getattr__(self, attr):            # all others to sys's stream 
                return getattr(sys.stdout, attr)    # though pprint uses write() only

        stream = AsciiFlushStream()    # else default arg

    # [3.3] lists now have extra names: match the prior one-name report 
    diffs = [(namefrom, dirfrom, dirto, why) 
         for (namefrom, nameto, dirfrom, dirto, why) in diffs]

    # ditto, but no why
    mixes = [(namefrom, dirfrom, dirto) 
         for (namefrom, nameto, dirfrom, dirto) in mixes]

    # [3.3] also drop unique TOs in -restore mode: TMI (pointless info)
    if dorestore:
        uniques = uniques.copy()
        uniques['to'] = []

    sepln = ('-' * 79) + '\n'
    print(sepln + 'SAMEFILE DIFFERENCES: (name, dirfrom, dirto, why)', file=stream)
    print('**These items will be replaced in dirto by automatic resolution**\n',
          file=stream)                                                            # [1.7]
    pprint.pprint(diffs, stream)

    print(sepln + 'UNIQUE ITEMS IN DIRTO: (names, dirfrom, dirto)', file=stream)
    print('**These items will be deleted from dirto by automatic resolution**\n',
          file=stream)
    pprint.pprint(uniques['to'], stream)

    print(sepln + 'UNIQUE ITEMS IN DIRFROM: (names, dirfrom, dirto)', file=stream)
    print('**These items will be copied over to dirto by automatic resolution**\n',
          file=stream)
    pprint.pprint(uniques['from'], stream)

    print(sepln + 'MIXED MODE NAMES: (name, dirfrom, dirto)', file=stream)
    print('**These items will be replaced in dirto by automatic resolution**\n',
          file=stream)
    pprint.pprint(mixes, stream)




def summaryreport(diffs, uniques, mixes, dorestore=False, deltas=False):
    """
    ----------------------------------------------------------------------------
    [2.0] Show cmp/mod totals at end of run (only, else may be lost in text).
    Also report len of difference lists, to summarize the difference report.
    Counters are in global scope; diffs, uniques, mixes are too, but also passed.
    A dict comp works, but seems too complex: {key: sum(...) for key in uniques}.

    [3.0] Add errors-present indicator, to alert user to search for "**Error";
    this may not be 100% complete, but it's enough to handle most error cases;

    [3.2] The labels here might be mildly misleading for deltas.py apply
    runs (or rollbacks in general), but are close enough to give the point.

    [3.3] mar22: If 'dorestore', show 'uniqueto' in Differences as 'n/a'; it's
    all items in TO unmatched by comparison to deltas|backup set which really 
    doesn't mean anything useful, and is arguably confusing.  This could be
    the number of lines in the __added__.txt file, but the extra work is moot.
    This pertains only to mergeall.py -restore runs, not deltas.py compare+saves.
    Also mod the last label to 'Saved' for deltas.py runs: nothing is 'Changed'.
    ----------------------------------------------------------------------------
    """

    global anyErrorsReported
    trace(1, '-' * 79, '\n*Summary')
    trace(1, 'Compared    =>', countcompare)
            
    numuniqueto    = sum(len(names) for (names, dirfrom, dirto) in uniques['to'])
    numuniquefrom  = sum(len(names) for (names, dirfrom, dirto) in uniques['from'])
    numuniqueto    = 'n/a' if dorestore else '%d' % numuniqueto   # [3.3] deltas|rollback
    trace(1, 'Differences => '
             'samefile: %d, uniqueto: %s, uniquefrom: %d, mixedmode: %d' %
             (len(diffs), numuniqueto, numuniquefrom, len(mixes)))

    label = 'Changed:\n' if not deltas else 'Saved:\n'    # [3.3]
    trace(1, label + str(countresolve))

    if anyErrorsReported or cpall.anyErrorsReported:      # [3.0]
        trace(1, '**There are error messages in the log file above: see "**Error"')

    trace(1, '-' * 79)
    trace(1, 'Finished.')    # add \n for GUI, else last line hidden after resizes [2.0]
                             # nevermind: new enable/disable "GO" model fixes this [3.0]




##################################################################################
# MAIN LOGIC
##################################################################################




if __name__ == '__main__':
    trace(1, 'mergeall %.1f starting' % VERSION)    # [3.2]

    import time
    gettime = (time.perf_counter if hasattr(time, 'perf_counter') else
              (time.clock if RunningOnWindows else time.time)) 

    # get and verify parameters from command line
    cmdargs = getargs()
    if not cmdargs:
        sys.exit(1)


    #---------------------------------------------------------------------------
    # COMPARISON PHASE: collect differences
    #---------------------------------------------------------------------------
    
    trace(1, '-' * 79, '\n*Collecting tree differences')
    if cmdargs.skipcruft:
        trace(1, 'Skipping system cruft (metadata) files in both FROM and TO')

    diffs   = []                         
    uniques = {'from': [], 'to': []}     # lists/dict changed in-place by walker
    mixes   = []
    starttime = gettime()
    try:
        comparetrees(cmdargs.dirfrom, cmdargs.dirto,       # from/to roots
                     diffs, uniques, mixes,                # noted differences
                     cmdargs.peek,                         # file reads?
                     cmdargs.skipcruft,                    # exclude cruft files [3.0]
                     cmdargs.quiet,                        # omit normalization msgs [3.3]
                     skips=['__bkp__', '__added__.txt'])   # exclude top-level specials [2.0] [3.2]

    except Exception as Why:
        # [3.0] friendlier message on comparison failure exits
        print('**Error during comparison phase\n'
              '...The mergeall run was terminated by a folder comparisons error,\n'
              '...to avoid a partial merge.  No data was changed.  Please resolve\n'
              '...the following Python exception before rerunning mergeall against\n'
              '...the same folders:')
        print(Why.__class__.__name__, Why)
        print('\n...A detailed Python traceback follows:')
        import traceback
        traceback.print_exc()
        sys.exit(1)
    else:
        trace(1, 'Phase runtime:', gettime() - starttime)  # [2.2] time phases

    trace(1, '-' * 79, '\n*Reporting tree differences')
    reportdiffs(diffs, uniques, mixes, cmdargs.restore)    # handles own exceptions
    if cmdargs.report:
        # report and exit
        summaryreport(diffs, uniques, mixes, cmdargs.restore)   # show totals [2.0] [3.3]
        sys.exit(0)


    #---------------------------------------------------------------------------
    # RESOLUTION PHASE: reconcile differences
    #---------------------------------------------------------------------------
    
    trace(1, '-' * 79, '\n*Resolving tree differences')
    if cmdargs.skipcruft:
        trace(1, 'Skipping system cruft (metadata) files in FROM folders')

    starttime = gettime()
    mergetrees(diffs, uniques, mixes,                      # noted differences
               cmdargs.auto,                               # make changes? else ask
               cmdargs.backup,  cmdargs.dirto,             # save items replaced/removed [2.0]
               cmdargs.restore, cmdargs.dirfrom,           # keep unique TO, undo adds [2.1]
               cmdargs.quiet,                              # suppress backing-up messages [2.4]
               cmdargs.skipcruft)                          # skip cruft files in copytree [3.0]
    trace(1, 'Phase runtime:', gettime() - starttime)      # [2.2] time phases
    
    # [3.2] keep, though this doesn't work for delta-set applies (or rollbacks!)
    if cmdargs.verify:
        # post verify step
        trace(1, '-' * 79 + '\n*Diffall run follows\n' + '-' * 79)
        starttime = gettime()
        cmd = os.popen('diffall.py %s %s' % (cmdargs.dirfrom, cmdargs.dirto))
        for line in cmd: print(line, end='')                 # or save to a file?
        trace(1, 'Phase runtime:', gettime() - starttime)    # [2.2] time phases

    summaryreport(diffs, uniques, mixes, cmdargs.restore)    # show totals [2.0] [3.3]


