File: shrinkpix/collect-unshrunk-images.py

#!/usr/bin/env python3
"""
=================================================================================
collect-unshrunk-images.py - extract all original images from a folder tree.

This is a utility script for shrinkpix.py, and shares the same date, author,
license, etc., as that script.  See https://learning-python.com/shrinkpix/.

---------------------------------------------------------------------------------
PURPOSE
---------------------------------------------------------------------------------

  Run this script to pull all (or new) saved original images out of a tree.
  When first run, it moves all of the original-image backup subdirs created 
  by shrinkpix to their same paths in a separate folder at the top of the tree.
  The backup folders by default show up at their root-relative paths in the 
  automatically created ALLBACKUPSDIR ('_shrinkpix-all-originals') at the 
  tree's root.  ALLBACKUPSDIR can also be located elsewhere as an option.

  The net effect extracts and isolates the original unshrunk images, and 
  removes their backups from the folder tree.  This may be useful to retain
  the unshrunk originals, but exclude them from uploaded website content.

  When rerun after shrinking new images in a tree, this script simply moves 
  the new originals to existing backup folders in ALLBACKUPSDIR, and removes 
  the backup folders from the source tree.  This allows you to shrink and 
  collect a full tree, and later shrink and collect individual images added.
  Recollecting newly shrunk images this way assumes that you haven't renamed 
  ALLBACKUPSDIR (else new backup folders will be moved to a new root folder).

---------------------------------------------------------------------------------
USAGE
---------------------------------------------------------------------------------

  Command line: 
      $ python3 <script> <folderpath> <collectpath>? -listonly? -toplevel?

  Run this with one required argument, <folderpath>, the root pathname of 
  the tree whose images are to be collected, and the normal location of the 
  ALLBACKUPSDIR tree.  As an option, you can also provide a second pathname, 
  <collectpath>, to serve as the location of the ALLBACKUPSDIR tree; this may 
  be helpful if you've moved that tree before a recollect, or wish to route 
  all collections to a folder outside your main content tree. 

  The command line also allows -listonly and -toplevel options to be passed
  in.  If provided, -listonly works the same as setting LISTONLY=True, showing
  backups to be collected without collecting them; -toplevel works the same as
  setting TOPLEVEL=True, limiting the collection to <folderpath>'s top level.
  
  If the <collectpath> argument isn't given, it defaults to <folderpath>.
  If -listonly or -toplevel aren't given, they default to this script's 
  LISTONLY and TOPLEVEL settings, respectively.  See also "Restore tip" 
  ahead for restoring originals from an ALLBACKUPSDIR.
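
  As a rough sketch of the argument rules above (not this script's actual
  code, which edits sys.argv in place and also checks that <folderpath> is
  a directory), the parsing could be written as the function below.  The
  name parse_command and its return tuple are hypothetical, chosen only 
  for illustration:

```python
def parse_command(argv, listonly=False, toplevel=False):
    # Hypothetical helper: returns (folderpath, collectpath, listonly, toplevel),
    # or None if the command line doesn't match the usage pattern.
    args = list(argv[1:])                    # skip the script name
    if '-listonly' in args:                  # options may appear in any position
        listonly = True
        args.remove('-listonly')
    if '-toplevel' in args:
        toplevel = True
        args.remove('-toplevel')
    if len(args) not in (1, 2) or any(arg.startswith('-') for arg in args):
        return None                          # no folder, extras, or unknown option
    folderpath = args[0]
    collectpath = args[1] if len(args) == 2 else folderpath   # default: same tree
    return (folderpath, collectpath, listonly, toplevel)

print(parse_command(['collect', 'mysite', '-listonly']))
print(parse_command(['collect', '-toplevel', 'mysite', '/backups']))
```

  The listonly and toplevel defaults here stand in for this script's 
  LISTONLY and TOPLEVEL settings, which apply when no option is given.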

  Subtlety: if you shrink new images nested in <folderpath> and rerun the 
  collector, be sure to use the same tree root for <folderpath> here, not 
  the new images' nested folders.  This is required so that relative paths 
  in <collectpath> are the same as in prior runs.  
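
  To see why the root matters, here is a minimal sketch of how a source 
  folder maps to its mirror in the collection tree.  It uses os.path.relpath
  in place of the string slicing in this script's code, and the function 
  name collection_path and the sample folder names are hypothetical:

```python
import os

def collection_path(walkroot, folder, saveroot):
    # Hypothetical helper: mirror folder's walkroot-relative path under saveroot
    relative = os.path.relpath(folder, walkroot)      # '.' if folder is walkroot
    return os.path.normpath(os.path.join(saveroot, relative))

saveroot = os.path.join('mysite', '_shrinkpix-all-originals')
nested   = os.path.join('mysite', 'trips', '2019')

# same root as prior runs: lands at trips/2019 in the collection tree
print(collection_path('mysite', nested, saveroot))

# nested root: the 'trips' level is lost, so the paths no longer line up
print(collection_path(os.path.join('mysite', 'trips'), nested, saveroot))
```

  The second call shows the mismatch described above: the same backups
  would be filed under a different path in the collection tree.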

---------------------------------------------------------------------------------
NOTES
---------------------------------------------------------------------------------

Toplevel collections:
  If you used -toplevel for a folder in the shrink script, you probably want
  to use it for the same folder here too.  Here, it prevents the collector from 
  collecting any originals backed up in nested subfolders below the folder given.
  If those subfolders are managed separately, they should in most cases not be 
  collected along with their ancestor, unless you really want this to happen.

  In -toplevel mode, this script is no more than a simple manual move: it 
  simply relocates a single backups folder within its parent, or moves it 
  to the folder given as an argument.  It may be just as easy to run these 
  steps manually.  Note, however, that a full tree walk here always skips 
  other collection folders, but not uncollected backup folders; this is why
  it's just subtle (and dangerous) enough to call out with an option.
 
Handling duplicates:
  Though atypical, it's not impossible that this script will recollect an 
  image already collected, which duplicates names in the collection folder.
  To resolve, such duplicates are renamed as in the shrinker script, with
  a "__N" counter just before their extension.  This can crop up if an image
  was left too large by a shrink, or you've added an image of the same name
  again; in these contexts, the collector is just emulating what the shrinker
  would have done for backups kept in the source tree.  If this seems too 
  automatic, collect newly shrunk files manually.  Note: this script has
  nothing like the restore's DROPDUPS; all items in backups are retained.
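
  The renaming scheme can be sketched in isolation as follows.  This is an
  illustration, not a replacement for the code below: the function name
  unique_save_path and its exists parameter are hypothetical, added here 
  so the logic can be tried without touching real files:

```python
import os

def unique_save_path(savepath, exists=os.path.exists):
    # Return savepath unchanged if it is free, else insert '__N' before its
    # extension, using the first N (counting up from 2) that is not taken.
    if not exists(savepath):
        return savepath
    savehead, saveext = os.path.splitext(savepath)
    copynum = 2
    while exists(savehead + '__' + str(copynum) + saveext):
        copynum += 1
    return savehead + '__' + str(copynum) + saveext

taken = {'pic.jpg', 'pic__2.jpg'}                  # pretend these already exist
print(unique_save_path('pic.jpg', exists=taken.__contains__))   # pic__3.jpg
print(unique_save_path('new.png', exists=taken.__contains__))   # new.png
```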

Restore tip: 
  If you've run this script and still need to restore your originals, run 
  restore-unshrunk-images.py on ALLBACKUPSDIR directly, and then merge the
  ALLBACKUPSDIR folder back to the original tree.  This works because the
  restore script moves originals up one level from their BACKUPSDIR folders, 
  and removes the BACKUPSDIR levels, leaving just original images at their
  original folder paths in ALLBACKUPSDIR.  In other words, this collapses 
  and removes the backup subfolders in the all-backups collection tree.
  
  The merge from ALLBACKUPSDIR can be done with some file explorers, or a 
  Unix "rsync" command line (for rsync on Windows, see Cygwin, Windows 10's
  Linux subsystem, or other sources).  Here's the full incantation:
  
    $ py3 <code>/restore-unshrunk-images.py <collectpath>/_shrinkpix-all-originals/
    $ rsync -avh <collectpath>/_shrinkpix-all-originals/ <folderpath>/

  Note that <collectpath> and <folderpath> are the same here, unless you 
  gave the former as an option in collector runs earlier.  Either way, 
  <folderpath> matches what you used in earlier collector runs, and the 
  trailing "/" matters on the source in this rsync (to copy contents, not
  the folder; it matters on destinations only when copying files).

  Afterward, you probably also want to manually move, remove, or empty the 
  all-backups folder; it's retained for safety (these are your originals), 
  but may cause backup-tree anomalies if you collect to this same folder again
  (new BACKUPSDIRs will appear alongside collapsed originals in ALLBACKUPSDIR,
  and duplicates may crop up).  For example, delete or move like this on Unix:

    $ rm -rf <collectpath>/_shrinkpix-all-originals
    $ mv -f <collectpath>/_shrinkpix-all-originals <somewhere else>

  Also note: -toplevel can be used for ALLBACKUPSDIR restores, but use cases 
  are unknown (you want to collapse the entire tree for the rsync, not the top).
  See the examples/ folder here for a more detailed demo of this technique.

  *Caution*: restoring from a separate collection tree this way works only if 
  the structure of the source tree has not changed in a way that invalidates
  collection-folder paths.  It's okay to add new folders, and folders having no 
  backed-up originals can be freely changed; but if you move, rename, or delete 
  a source-tree folder that has backups, applying the collection tree's updates
  is haphazard, and may damage the source tree.  This is one reason the restore
  script has no option to restore from collections directly.  Shrinker, beware.  
 
Miscellaneous notes:
  - This script and step might be unnecessary if the shrinker script had a 
    "backup-to" option.  This wasn't implemented, because it seems to add
    too much complexity, justifying use cases are unclear, and this wouldn't
    work when shrinking individual files (root-relative paths are unclear). 
    See shrinkpix.py's CAVEATS->Design for more notes on this.

  - It may be simpler and faster, and would remove subdir nesting, to just:
       os.rename(folder, pathtosavefolder)
    But this assumes POSIX rename, and won't work on Windows; use shutil.move(),
    which keeps the nested BACKUPSDIR folder level, and does copy+delete if needed.
    For details, see: 
    https://www.gnu.org/software/libc/manual/html_node/Renaming-Files.html,
    https://docs.python.org/3/library/shutil.html#shutil.move,
    https://docs.python.org/3/library/os.html#os.rename.
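
  The shutil.move() behavior noted above can be tried safely in a temporary
  folder.  In this sketch, the function name collect_folder and the folder
  name 'backups' are placeholders (the latter is not BACKUPSDIR's real 
  value); when its destination is an existing folder, shutil.move() moves 
  the source inside it, which is what keeps the nested level:

```python
import os, shutil, tempfile

def collect_folder(backupsub, pathtosavefolder):
    # Move folder backupsub into pathtosavefolder, keeping backupsub's own name
    os.makedirs(pathtosavefolder, exist_ok=True)
    return shutil.move(backupsub, pathtosavefolder)    # returns the new path

with tempfile.TemporaryDirectory() as tmp:
    backupsub = os.path.join(tmp, 'tree', 'photos', 'backups')   # placeholder name
    os.makedirs(backupsub)
    open(os.path.join(backupsub, 'pic.jpg'), 'w').close()        # empty stand-in file

    moved = collect_folder(backupsub, os.path.join(tmp, 'collected', 'photos'))
    print(os.path.basename(moved))       # the nested folder level is kept
    print(os.listdir(moved))             # its contents moved along with it
    print(os.path.exists(backupsub))     # the source folder is gone
```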

=================================================================================
"""

import sys, os, shutil
from shrinkpix import BACKUPSDIR           # assumed same as at tree's shrink time
from shrinkpix import ALLBACKUPSDIR        # ditto: made here, skip in shrink+restore
from shrinkpix import askyesno             # [1.3] don't print traceback on ctrl+c
trace = print


#=====================================================================================
# Configure
#=====================================================================================

LISTONLY = False    # True=show backup dirs, but don't move them 
TOPLEVEL = False    # True=limit collection to folder top-level, skipping any subfolders


#=====================================================================================
# Setup
#=====================================================================================

command = '<script> <folderpath> <collectpath>? -listonly? -toplevel?'
confirm = 'This script collects and removes saved original images; proceed?'

# options: any position
if '-listonly' in sys.argv:              # don't make changes
    LISTONLY = True                      # else use setting's value
    sys.argv.remove('-listonly')

if '-toplevel' in sys.argv:              # skip nested subdirs 
    TOPLEVEL = True                      # else use setting's value
    sys.argv.remove('-toplevel')

# trees: positional (and assume -xxx is not a folder)
walkroot = ''   # [1.3] not None: os.path.join fails
if len(sys.argv) > 1:
    walkroot = os.path.abspath(sys.argv.pop(1))         # move all BACKUPSDIR in this,

if len(sys.argv) == 1:
    saveroot = os.path.join(walkroot, ALLBACKUPSDIR)    # to its toplevel ALLBACKUPSDIR,
elif not sys.argv[1].startswith('-'):
    altsave  = os.path.abspath(sys.argv.pop(1))         # or this ALLBACKUPSDIR if arg,
    saveroot = os.path.join(altsave, ALLBACKUPSDIR)     # at paths relative to walkroot.

if not walkroot or not os.path.isdir(walkroot) or len(sys.argv) > 1:
    print('Usage:', command)                            # no folder, not a folder, extras?
    sys.exit()                                          # minimize nesting

if (not LISTONLY) and askyesno(confirm).lower() not in ['y', 'yes']:
    print('Run cancelled.') 
    sys.exit()


#=====================================================================================
# Collect
#=====================================================================================

# walk the source tree
nummoved = numfound = 0
for (folder, subs, files) in os.walk(walkroot, topdown=True):

    if ALLBACKUPSDIR in subs:
        # don't collect from a collection folder of a prior run
        subs.remove(ALLBACKUPSDIR)  # skip later in walk

    if BACKUPSDIR in subs:
        # collect every backup folder reached during the tree walk
        subs.remove(BACKUPSDIR)                                       # prune from walk:
        backupsub = os.path.join(folder, BACKUPSDIR)                  # moved or deleted

        numfound += 1
        pathfromwalkroot = folder[len(walkroot)+1:]                   # relative to root
        pathtosavefolder = os.path.join(saveroot, pathfromwalkroot)
        trace('Collecting', backupsub, '\n' + ' '*6 + 'into', pathtosavefolder) 
        if not LISTONLY:
            nummoved += 1
            savesub = os.path.join(pathtosavefolder, BACKUPSDIR)

            if not os.path.exists(savesub):
                # move new backup folder to collection folder, as a nested subfolder       
                os.makedirs(pathtosavefolder, exist_ok=True)
                shutil.move(backupsub, pathtosavefolder)

            else:
                # move new items in backup folder to existing backup folder (recollect)
                for item in os.listdir(backupsub):
                    itempath = os.path.join(backupsub, item)
                    if not os.path.isfile(itempath):
                        continue  # skip subdirs - remove will fail

                    savepath = os.path.join(savesub, item)
                    if os.path.exists(savepath):
                        # recollecting an already-collected original: rename with "__N"
                        savehead, saveext = os.path.splitext(savepath)
                        copynum = 2
                        while True:
                            savepath = savehead + '__' + str(copynum) + saveext
                            if not os.path.exists(savepath):
                                break
                            copynum += 1

                    trace(' '*11 + '+ ' + itempath)
                    shutil.move(itempath, savepath)   # not os.rename: may cross devices

                try:
                    os.rmdir(backupsub)
                except Exception as why:
                    print(why)
                    print('**Cannot remove backup folder', backupsub)

    # after backupsdir in subs
    if TOPLEVEL: break    # don't collect in subdirs if they're managed separately

# post-walk wrap-up
print('Finished: number subfolders found:', numfound, 'moved:', nummoved)


