File: shrinkpix/collect-unshrunk-images.py
#!/usr/bin/env python3
"""
=================================================================================
collect-unshrunk-images.py - extract all original images from a folder tree.

This is a utility script for shrinkpix.py, and shares the same date, author,
license, etc., as that script.  See https://learning-python.com/shrinkpix/.

---------------------------------------------------------------------------------
PURPOSE
---------------------------------------------------------------------------------

Run this script to pull all (or new) saved original images out of a tree.

When first run, it moves all of the original-image backup subdirs created by
shrinkpix to their same paths in a separate folder at the top of the tree.
The backup folders by default show up at their root-relative paths in the
automatically created ALLBACKUPSDIR ('_shrinkpix-all-originals') at the
tree's root.  ALLBACKUPSDIR can also be located elsewhere as an option.

The net effect extracts and isolates the original unshrunk images, and
removes their backups from the folder tree.  This may be useful to retain
the unshrunk originals, but exclude them from uploaded website content.

When rerun after shrinking new images in a tree, this script simply moves
the new originals to existing backup folders in ALLBACKUPSDIR, and removes
the backup folders from the source tree.  This allows you to shrink and
collect a full tree, and later shrink and collect individual images added.

Recollecting newly shrunk images this way assumes that you haven't renamed
ALLBACKUPSDIR (else new backup folders will be moved to a new root folder).

---------------------------------------------------------------------------------
USAGE
---------------------------------------------------------------------------------

Command line:
    $ python3 <script> <folderpath> <collectpath>? -listonly? -toplevel?

Run this with one required argument, <folderpath>, the root pathname of the
tree whose images are to be collected, and the normal location of the
ALLBACKUPSDIR tree.  As an option, you can also provide a second pathname,
<collectpath>, to serve as the location of the ALLBACKUPSDIR tree; this may
be helpful if you've moved that tree before a recollect, or wish to route
all collections to a folder outside your main content tree.

The command line also allows -listonly and -toplevel options to be passed in.
If provided, -listonly works the same as setting LISTONLY=True, showing
backups to be collected without collecting them; -toplevel works the same
as setting TOPLEVEL=True, limiting the collection to <folderpath>'s top level.

If the <collectpath> argument isn't given, it defaults to <folderpath>.
If -listonly or -toplevel aren't given, they default to this script's
LISTONLY and TOPLEVEL settings, respectively.  See also "Restore tip" ahead
for restoring originals from an ALLBACKUPSDIR.

Subtlety: if you shrink new images nested in <folderpath> and rerun the
collector, be sure to use the same tree root for <folderpath> here, not the
new images' nested folders.  This is required so that relative paths in
<collectpath> are the same as in prior runs.

---------------------------------------------------------------------------------
NOTES
---------------------------------------------------------------------------------

Toplevel collections:

If you used -toplevel for a folder in the shrink script, you probably want
to use it for the same folder here too.  Here, it prevents the collector
from collecting any originals backed up in nested subfolders below the
folder given.  If those subfolders are managed separately, they should in
most cases not be collected along with their ancestor, unless you really
want this to happen.

In -toplevel mode, this script is no more than a simple manual move: it
simply relocates a single backups folder within its parent, or moves it to
the folder given as an argument.  It may be just as easy to run these steps
manually.  Note, however, that a full tree walk here always skips other
collection folders, but not uncollected backup folders; this is why it's
just subtle (and dangerous) enough to call out with an option.

Handling duplicates:

Though atypical, it's not impossible that this script will recollect an
image already collected, which duplicates names in the collection folder.
To resolve, such duplicates are renamed as in the shrinker script, with a
"__N" counter just before their extension.

This can crop up if an image was left too large by a shrink, or you've added
an image of the same name again; in these contexts, the collector is just
emulating what the shrinker would have done for backups kept in the source
tree.  If this seems too automatic, collect newly shrunk files manually.
Note: this script has nothing like the restore's DROPDUPS; all items in
backups are retained.

Restore tip:

If you've run this script and still need to restore your originals, run
restore-unshrunk-images.py on ALLBACKUPSDIR directly, and then merge the
ALLBACKUPSDIR folder back to the original tree.

This works because the restore script moves originals up one level from
their BACKUPSDIR folders, and removes the BACKUPSDIR levels, leaving just
original images at their original folder paths in ALLBACKUPSDIR.  In other
words, this collapses and removes the backup subfolders in the all-backups
collection tree.

The merge from ALLBACKUPSDIR can be done with some file explorers, or a
Unix "rsync" command line (for rsync on Windows, see Cygwin, Windows 10's
Linux subsystem, or other sources).

Here's the full incantation:

    $ python3 <code>/restore-unshrunk-images.py <collectpath>/_shrinkpix-all-originals/
    $ rsync -avh <collectpath>/_shrinkpix-all-originals/ <folderpath>/

Note that <collectpath> and <folderpath> are the same here, unless you gave
the former as an option in collector runs earlier.  Either way, <folderpath>
matches what you used in earlier collector runs, and the trailing "/"
matters on the source in this rsync (to copy contents, not the folder; it
matters on destinations only when copying files).

Afterward, you probably also want to manually move, remove, or empty the
all-backups folder; it's retained for safety (these are your originals),
but may cause backup-tree anomalies if you collect to this same folder again
(new BACKUPSDIRs will appear alongside collapsed originals in ALLBACKUPSDIR,
and duplicates may crop up).  For example, delete or move like this on Unix:

    $ rm -rf <collectpath>/_shrinkpix-all-originals
    $ mv -f <collectpath>/_shrinkpix-all-originals <somewhere else>

Also note: -toplevel can be used for ALLBACKUPSDIR restores, but use cases
are unknown (you want to collapse the entire tree for the rsync, not the
top).  See the examples/ folder here for a more detailed demo of this
technique.

*Caution*: restoring from a separate collection tree this way works only if
the structure of the source tree has not changed in a way that invalidates
collection-folder paths.  It's okay to add new folders, and folders having
no backed-up originals can be freely changed; but if you move, rename, or
delete a source-tree folder that has backups, applying the collection tree's
updates is haphazard, and may damage the source tree.  This is one reason
the restore script has no option to restore from collections directly.
Shrinker, beware.

Miscellaneous notes:

- This script and step might be unnecessary if the shrinker script had a
  "backup-to" option.
  This wasn't implemented, because it seems to add too much complexity,
  justifying use cases are unclear, and this wouldn't work when shrinking
  individual files (root-relative paths are unclear).  See shrinkpix.py's
  CAVEATS->Design for more notes on this.

- It may be simpler and faster, and would remove subdir nesting, to just:
      os.rename(folder, pathtosavefolder)
  But this assumes POSIX rename, and won't work on Windows; use
  shutil.move(), which keeps the nested BACKUPSDIR folder level, and does
  copy+delete if needed.  For details, see:
      https://www.gnu.org/software/libc/manual/html_node/Renaming-Files.html
      https://docs.python.org/3/library/shutil.html#shutil.move
      https://docs.python.org/3/library/os.html#os.rename
=================================================================================
"""

import sys, os, shutil
from shrinkpix import BACKUPSDIR       # assumed same as at tree's shrink time
from shrinkpix import ALLBACKUPSDIR    # ditto: made here, skip in shrink+restore
from shrinkpix import askyesno         # [1.3] don't print traceback on ctrl+c
trace = print


#=====================================================================================
# Configure
#=====================================================================================


LISTONLY = False   # True=show backup dirs, but don't move them
TOPLEVEL = False   # True=limit collection to folder top-level, skipping any subfolders


#=====================================================================================
# Setup
#=====================================================================================


command = '<script> <folderpath> <collectpath>? -listonly? -toplevel?'
confirm = 'This script collects and removes saved original images; proceed?'

# options: any position

if '-listonly' in sys.argv:            # don't make changes
    LISTONLY = True                    # else use setting's value
    sys.argv.remove('-listonly')

if '-toplevel' in sys.argv:            # skip nested subdirs
    TOPLEVEL = True                    # else use setting's value
    sys.argv.remove('-toplevel')

# trees: positional (and assume -xxx is not a folder)

walkroot = ''   # [1.3] not None: os.path.join fails
if len(sys.argv) > 1:
    walkroot = os.path.abspath(sys.argv.pop(1))          # move all BACKUPSDIR in this,

if len(sys.argv) == 1:
    saveroot = os.path.join(walkroot, ALLBACKUPSDIR)     # to its toplevel ALLBACKUPSDIR,
elif not sys.argv[1].startswith('-'):
    altsave  = os.path.abspath(sys.argv.pop(1))          # or this ALLBACKUPSDIR if arg,
    saveroot = os.path.join(altsave, ALLBACKUPSDIR)      # at paths relative to walkroot.

if not walkroot or not os.path.isdir(walkroot) or len(sys.argv) > 1:
    print('Usage:', command)    # no folder, not a folder, extras?
    sys.exit()                  # minimize nesting

if (not LISTONLY) and askyesno(confirm).lower() not in ['y', 'yes']:
    print('Run cancelled.')
    sys.exit()


#=====================================================================================
# Collect
#=====================================================================================


# walk the source tree
nummoved = numfound = 0
for (folder, subs, files) in os.walk(walkroot, topdown=True):

    if ALLBACKUPSDIR in subs:
        # don't collect from a collection folder of a prior run
        subs.remove(ALLBACKUPSDIR)                       # skip later in walk

    if BACKUPSDIR in subs:
        # collect every backup folder reached during the tree walk
        subs.remove(BACKUPSDIR)                          # prune from walk:
        backupsub = os.path.join(folder, BACKUPSDIR)     # moved or deleted
        numfound += 1

        pathfromwalkroot = folder[len(walkroot)+1:]      # relative to root
        pathtosavefolder = os.path.join(saveroot, pathfromwalkroot)
        trace('Collecting', backupsub, '\n' + ' '*6 + 'into', pathtosavefolder)

        if not LISTONLY:
            nummoved += 1
            savesub = os.path.join(pathtosavefolder, BACKUPSDIR)

            if not os.path.exists(savesub):
                # move new backup folder to collection folder, as a nested subfolder
                os.makedirs(pathtosavefolder, exist_ok=True)
                shutil.move(backupsub, pathtosavefolder)

            else:
                # move new items in backup folder to existing backup folder (recollect)
                for item in os.listdir(backupsub):
                    itempath = os.path.join(backupsub, item)
                    if not os.path.isfile(itempath):
                        continue   # skip subdirs - remove will fail

                    savepath = os.path.join(savesub, item)
                    if os.path.exists(savepath):
                        # recollecting an already-collected original: rename with "__N"
                        savehead, saveext = os.path.splitext(savepath)
                        copynum = 2
                        while True:
                            savepath = savehead + '__' + str(copynum) + saveext
                            if not os.path.exists(savepath): break
                            copynum += 1

                    trace(' '*11 + '+ ' + itempath)
                    # shutil.move, not os.rename: <collectpath> may be on another
                    # device, where a plain rename fails (see notes in docstring)
                    shutil.move(itempath, savepath)

                try:
                    os.rmdir(backupsub)
                except Exception as why:
                    print(why)
                    print('**Cannot remove backup folder', backupsub)

    # after backupsdir in subs
    if TOPLEVEL: break   # don't collect in subdirs if they're managed separately

# post-walk wrap-up
print('Finished: number subfolders moved:', nummoved)
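
The duplicate handling described in the script's notes (a "__N" counter placed just before the extension) can be tried in isolation with a small helper. This is a minimal sketch, not the script's own code: the function name unique_save_path and its exists parameter are hypothetical, introduced here only so the naming rule can be exercised on an in-memory set of names instead of real files.

```python
import os


def unique_save_path(savepath, exists=os.path.exists):
    """Return savepath, or a "__N" variant of it that is not yet taken.

    Hypothetical helper: 'exists' is an injectable predicate (default
    os.path.exists), so the rule can be checked without touching disk.
    """
    if not exists(savepath):
        return savepath                       # no clash: keep the name as is
    head, ext = os.path.splitext(savepath)    # split off the extension
    copynum = 2                               # first duplicate becomes __2
    while exists(head + '__' + str(copynum) + ext):
        copynum += 1                          # try __3, __4, ... until free
    return head + '__' + str(copynum) + ext


if __name__ == '__main__':
    # names assumed to be already present in a collection folder
    taken = {'pics/img.jpg', 'pics/img__2.jpg'}
    print(unique_save_path('pics/img.jpg', exists=taken.__contains__))
    print(unique_save_path('pics/new.png', exists=taken.__contains__))
```

With the sample set above, the first call skips the taken "__2" name and yields pics/img__3.jpg, while the second returns pics/new.png unchanged. Passing the predicate as an argument is a testing convenience only; calling the helper with a single argument checks the real filesystem.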