File: shrinkpix/collect-unshrunk-images.py
#!/usr/bin/env python3
"""
=================================================================================
collect-unshrunk-images.py - extract all original images from a folder tree.
This is a utility script for shrinkpix.py, and shares the same date, author,
license, etc., as that script. See https://learning-python.com/shrinkpix/.
---------------------------------------------------------------------------------
PURPOSE
---------------------------------------------------------------------------------
Run this script to pull all (or new) saved original images out of a tree.
When first run, it moves all of the original-image backup subdirs created
by shrinkpix to their same paths in a separate folder at the top of the tree.
The backup folders by default show up at their root-relative paths in the
automatically created ALLBACKUPSDIR ('_shrinkpix-all-originals') at the
tree's root. ALLBACKUPSDIR can also be located elsewhere as an option.
The net effect extracts and isolates the original unshrunk images, and
removes their backups from the folder tree. This may be useful to retain
the unshrunk originals, but exclude them from uploaded website content.
When rerun after shrinking new images in a tree, this script simply moves
the new originals to existing backup folders in ALLBACKUPSDIR, and removes
the backup folders from the source tree. This allows you to shrink and
collect a full tree, and later shrink and collect individual images added.
Recollecting newly shrunk images this way assumes that you haven't renamed
ALLBACKUPSDIR (else new backup folders will be moved to a new root folder).
---------------------------------------------------------------------------------
USAGE
---------------------------------------------------------------------------------
Command line:
$ python3 <script> <folderpath> <collectpath>? -listonly? -toplevel?
Run this with one required argument, <folderpath>, the root pathname of
the tree whose images are to be collected, and the normal location of the
ALLBACKUPSDIR tree. As an option, you can also provide a second pathname,
<collectpath>, to serve as the location of the ALLBACKUPSDIR tree; this may
be helpful if you've moved that tree before a recollect, or wish to route
all collections to a folder outside your main content tree.
The command line also allows -listonly and -toplevel options to be passed
in. If provided, -listonly works the same as setting LISTONLY=True, showing
backups to be collected without collecting them; -toplevel works the same as
setting TOPLEVEL=True, limiting the collection to <folderpath>'s top level.
If the <collectpath> argument isn't given, it defaults to <folderpath>.
If -listonly or -toplevel isn't given, it defaults to this script's
LISTONLY or TOPLEVEL setting, respectively. See also "Restore tip"
ahead for restoring originals from an ALLBACKUPSDIR.
Subtlety: if you shrink new images nested in <folderpath> and rerun the
collector, be sure to use the same tree root for <folderpath> here, not
the new images' nested folders. This is required so that relative paths
in <collectpath> are the same as in prior runs.
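To illustrate the subtlety above, here is a minimal sketch of how a source
folder maps into the collection tree. It assumes, as in this script's code,
that the destination is the folder's walk-root-relative path joined to the
collection root. The helper name collected_path and the sample paths are
hypothetical, and os.path.relpath stands in for the script's own slicing.

```python
import os

def collected_path(walkroot, saveroot, folder):
    # Re-root folder under saveroot, using its path relative to walkroot.
    relative = os.path.relpath(folder, walkroot)
    if relative == os.curdir:
        return saveroot
    return os.path.join(saveroot, relative)

saveroot = os.path.join('site', '_shrinkpix-all-originals')
nested = os.path.join('site', 'trips', '2020')

# Same tree root as earlier runs: the nested path is preserved.
same_root = collected_path('site', saveroot, nested)

# Nested folder passed as the root: the path collapses to the top.
wrong_root = collected_path(nested, saveroot, nested)
```

With the nested folder as the root, new backups would land at the top of
the collection tree instead of alongside those collected earlier.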
---------------------------------------------------------------------------------
NOTES
---------------------------------------------------------------------------------
Toplevel collections:
If you used -toplevel for a folder in the shrink script, you probably want
to use it for the same folder here too. Here, it prevents the collector from
collecting any originals backed up in nested subfolders below the folder given.
If those subfolders are managed separately, they should in most cases not be
collected along with their ancestor, unless you really want this to happen.
In -toplevel mode, this script is no more than a simple manual move: it
simply relocates a single backups folder within its parent, or moves it
to the folder given as an argument. It may be just as easy to run these
steps manually. Note, however, that a full tree walk here always skips
other collection folders, but not uncollected backup folders; this is why
it's just subtle (and dangerous) enough to call out with an option.
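The walk rules above can be sketched roughly as follows. This is only an
illustration of the pruning and top-level logic, not this script's code:
the function name find_backup_folders is hypothetical, and the BACKUPSDIR
value is an assumed placeholder (the real one is imported from shrinkpix).

```python
import os, shutil, tempfile

BACKUPSDIR = '_shrinkpix-originals'          # assumed placeholder name
ALLBACKUPSDIR = '_shrinkpix-all-originals'   # default named in this file

def find_backup_folders(root, toplevel=False):
    found = []
    for folder, subs, files in os.walk(root, topdown=True):
        if ALLBACKUPSDIR in subs:
            subs.remove(ALLBACKUPSDIR)       # never descend into collections
        if BACKUPSDIR in subs:
            subs.remove(BACKUPSDIR)          # record it, but don't walk into it
            found.append(os.path.join(folder, BACKUPSDIR))
        if toplevel:
            break                            # stop after the root folder
    return found

# Demo tree: one backup at the top, one nested, one inside a collection.
base = tempfile.mkdtemp()
for parts in ([BACKUPSDIR], ['sub', BACKUPSDIR],
              [ALLBACKUPSDIR, 'sub', BACKUPSDIR]):
    os.makedirs(os.path.join(base, *parts))
full = find_backup_folders(base)
top = find_backup_folders(base, toplevel=True)
shutil.rmtree(base)
```

Here the full walk finds the top and nested backups but skips the one in
the collection folder, while the top-level walk finds only the first.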
Handling duplicates:
Though atypical, it's not impossible that this script will recollect an
image already collected, which duplicates names in the collection folder.
To resolve, such duplicates are renamed as in the shrinker script, with
a "__N" counter just before their extension. This can crop up if an image
was left too large by a shrink, or you've added an image of the same name
again; in these contexts, the collector is just emulating what the shrinker
would have done for backups kept in the source tree. If this seems too
automatic, collect newly shrunk files manually. Note: this script has
nothing like the restore's DROPDUPS; all items in backups are retained.
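The "__N" renaming described above can be sketched as a small helper. The
name unique_save_path is hypothetical; the counter starts at 2, as in this
script's code, and the demo files are made up for illustration.

```python
import os, tempfile

def unique_save_path(savepath):
    # Return savepath if it is unused; otherwise insert '__N' before the
    # extension, counting N up from 2 until the name is free.
    if not os.path.exists(savepath):
        return savepath
    head, ext = os.path.splitext(savepath)
    copynum = 2
    while os.path.exists(head + '__' + str(copynum) + ext):
        copynum += 1
    return head + '__' + str(copynum) + ext

# Demo: photo.jpg and photo__2.jpg are already in the collection folder.
folder = tempfile.mkdtemp()
for name in ('photo.jpg', 'photo__2.jpg'):
    open(os.path.join(folder, name), 'w').close()
taken = unique_save_path(os.path.join(folder, 'photo.jpg'))
free = unique_save_path(os.path.join(folder, 'other.jpg'))
```

In this demo, a third copy of photo.jpg is saved as photo__3.jpg, and a
name not yet in the folder is used unchanged.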
Restore tip:
If you've run this script and still need to restore your originals, run
restore-unshrunk-images.py on ALLBACKUPSDIR directly, and then merge the
ALLBACKUPSDIR folder back to the original tree. This works because the
restore script moves originals up one level from their BACKUPSDIR folders,
and removes the BACKUPSDIR levels, leaving just original images at their
original folder paths in ALLBACKUPSDIR. In other words, this collapses
and removes the backup subfolders in the all-backups collection tree.
The merge from ALLBACKUPSDIR can be done with some file explorers, or a
Unix "rsync" command line (for rsync on Windows, see Cygwin, Windows 10's
Linux subsystem, or other sources). Here's the full incantation:
$ py3 <code>/restore-unshrunk-images.py <collectpath>/_shrinkpix-all-originals/
$ rsync -avh <collectpath>/_shrinkpix-all-originals/ <folderpath>/
Note that <collectpath> and <folderpath> are the same here, unless you
gave the former as an option in collector runs earlier. Either way,
<folderpath> matches what you used in earlier collector runs, and the
trailing "/" matters on the source in this rsync (to copy contents, not
the folder; it matters on destinations only when copying files).
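If rsync is unavailable, the merge step might be approximated in Python.
The sketch below is an assumption-laden stand-in, not a full equivalent of
"rsync -avh": it requires Python 3.8+ for dirs_exist_ok, recopies every
file, and prints nothing. The names merge_tree, write, and read, and the
demo paths, are hypothetical.

```python
import os, shutil, tempfile

def merge_tree(collapsed, target):
    # Copy the contents of collapsed into target, keeping existing
    # folders and overwriting same-named files (Python 3.8+).
    shutil.copytree(collapsed, target, dirs_exist_ok=True)

def write(path, text):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'w') as f:
        f.write(text)

def read(path):
    with open(path) as f:
        return f.read()

# Demo: a collapsed collection tree is merged over a shrunk source tree.
base = tempfile.mkdtemp()
collapsed = os.path.join(base, 'collected')
target = os.path.join(base, 'site')
write(os.path.join(collapsed, 'trips', 'a.jpg'), 'original')
write(os.path.join(target, 'trips', 'a.jpg'), 'shrunk')
write(os.path.join(target, 'trips', 'b.jpg'), 'untouched')

merge_tree(collapsed, target)
restored = read(os.path.join(target, 'trips', 'a.jpg'))
kept = read(os.path.join(target, 'trips', 'b.jpg'))
shutil.rmtree(base)
```

As with the rsync form, the collapsed originals replace their shrunk
versions, and files with no counterpart in the collection are left alone.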
Afterward, you probably also want to manually move, remove, or empty the
all-backups folder; it's retained for safety (these are your originals),
but may cause backup-tree anomalies if you collect to this same folder again
(new BACKUPSDIRs will appear alongside collapsed originals in ALLBACKUPSDIR,
and duplicates may crop up). For example, delete or move like this on Unix:
$ rm -rf <collectpath>/_shrinkpix-all-originals
$ mv -f <collectpath>/_shrinkpix-all-originals <somewhere else>
Also note: -toplevel can be used for ALLBACKUPSDIR restores, but use cases
are unknown (you want to collapse the entire tree for the rsync, not the top).
See the examples/ folder here for a more detailed demo of this technique.
*Caution*: restoring from a separate collection tree this way works only if
the structure of the source tree has not changed in a way that invalidates
collection-folder paths. It's okay to add new folders, and folders having no
backed-up originals can be freely changed; but if you move, rename, or delete
a source-tree folder that has backups, applying the collection tree's updates
is haphazard, and may damage the source tree. This is one reason the restore
script has no option to restore from collections directly. Shrinker, beware.
Miscellaneous notes:
- This script and step might be unnecessary if the shrinker script had a
"backup-to" option. This wasn't implemented, because it seems to add
too much complexity, justifying use cases are unclear, and this wouldn't
work when shrinking individual files (root-relative paths are unclear).
See shrinkpix.py's CAVEATS->Design for more notes on this.
- It may be simpler and faster, and would remove subdir nesting, to just:
os.rename(folder, pathtosavefolder)
But this assumes POSIX rename, and won't work on Windows; use shutil.move(),
which keeps the nested BACKUPSDIR folder level, and does copy+delete if needed.
For details, see:
https://www.gnu.org/software/libc/manual/html_node/Renaming-Files.html,
https://docs.python.org/3/library/shutil.html#shutil.move,
https://docs.python.org/3/library/os.html#os.rename.
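As a minimal demonstration of the nesting noted above: when its destination
is an existing folder, shutil.move() places the source folder inside it
instead of replacing it. The folder and file names here are hypothetical
stand-ins for a real backups folder and its collection path.

```python
import os, shutil, tempfile

base = tempfile.mkdtemp()
backupsub = os.path.join(base, 'tree', 'trips', 'backups')    # stand-in names
pathtosavefolder = os.path.join(base, 'collect', 'trips')

os.makedirs(backupsub)
open(os.path.join(backupsub, 'a.jpg'), 'w').close()
os.makedirs(pathtosavefolder, exist_ok=True)

# Destination exists, so the backups folder is moved into it.
shutil.move(backupsub, pathtosavefolder)

nested = os.path.join(pathtosavefolder, 'backups', 'a.jpg')
kept_level = os.path.isfile(nested)          # folder level was retained
source_gone = not os.path.exists(backupsub)  # and the source was removed
shutil.rmtree(base)
```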
=================================================================================
"""
import sys, os, shutil
from shrinkpix import BACKUPSDIR # assumed same as at tree's shrink time
from shrinkpix import ALLBACKUPSDIR # ditto: made here, skip in shrink+restore
from shrinkpix import askyesno # [1.3] don't print traceback on ctrl+c
trace = print
#=====================================================================================
# Configure
#=====================================================================================
LISTONLY = False # True=show backup dirs, but don't move them
TOPLEVEL = False   # True=limit collection to folder top-level, skipping any subfolders
#=====================================================================================
# Setup
#=====================================================================================
command = '<script> <folderpath> <collectpath>? -listonly? -toplevel?'
confirm = 'This script collects and removes saved original images; proceed?'
# options: any position
if '-listonly' in sys.argv:              # don't make changes
    LISTONLY = True                      # else use setting's value
    sys.argv.remove('-listonly')

if '-toplevel' in sys.argv:              # skip nested subdirs
    TOPLEVEL = True                      # else use setting's value
    sys.argv.remove('-toplevel')

# trees: positional (and assume -xxx is not a folder)
walkroot = ''                            # [1.3] not None: os.path.join fails
if len(sys.argv) > 1:
    walkroot = os.path.abspath(sys.argv.pop(1))            # move all BACKUPSDIR in this,
    if len(sys.argv) == 1:
        saveroot = os.path.join(walkroot, ALLBACKUPSDIR)   # to its toplevel ALLBACKUPSDIR,
    elif not sys.argv[1].startswith('-'):
        altsave = os.path.abspath(sys.argv.pop(1))         # or this ALLBACKUPSDIR if arg,
        saveroot = os.path.join(altsave, ALLBACKUPSDIR)    # at paths relative to walkroot.

if not walkroot or not os.path.isdir(walkroot) or len(sys.argv) > 1:
    print('Usage:', command)             # no folder, not a folder, extras?
    sys.exit()                           # minimize nesting

if (not LISTONLY) and askyesno(confirm).lower() not in ['y', 'yes']:
    print('Run cancelled.')
    sys.exit()
#=====================================================================================
# Collect
#=====================================================================================
# walk the source tree
nummoved = numfound = 0
for (folder, subs, files) in os.walk(walkroot, topdown=True):

    if ALLBACKUPSDIR in subs:
        # don't collect from a collection folder of a prior run
        subs.remove(ALLBACKUPSDIR)                         # skip later in walk

    if BACKUPSDIR in subs:
        # collect every backup folder reached during the tree walk
        subs.remove(BACKUPSDIR)                            # prune from walk:
        backupsub = os.path.join(folder, BACKUPSDIR)       # moved or deleted
        numfound += 1

        pathfromwalkroot = folder[len(walkroot)+1:]        # relative to root
        pathtosavefolder = os.path.join(saveroot, pathfromwalkroot)
        trace('Collecting', backupsub, '\n' + ' '*6 + 'into', pathtosavefolder)

        if not LISTONLY:
            nummoved += 1
            savesub = os.path.join(pathtosavefolder, BACKUPSDIR)

            if not os.path.exists(savesub):
                # move new backup folder to collection folder, as a nested subfolder
                os.makedirs(pathtosavefolder, exist_ok=True)
                shutil.move(backupsub, pathtosavefolder)

            else:
                # move new items in backup folder to existing backup folder (recollect)
                for item in os.listdir(backupsub):
                    itempath = os.path.join(backupsub, item)
                    if not os.path.isfile(itempath):
                        continue                           # skip subdirs - remove will fail

                    savepath = os.path.join(savesub, item)
                    if os.path.exists(savepath):
                        # recollecting an already-collected original: rename with "__N"
                        savehead, saveext = os.path.splitext(savepath)
                        copynum = 2
                        while True:
                            savepath = savehead + '__' + str(copynum) + saveext
                            if not os.path.exists(savepath):
                                break
                            copynum += 1

                    trace(' '*11 + '+ ' + itempath)
                    # shutil.move, not os.rename: <collectpath> may be on another device
                    shutil.move(itempath, savepath)

                # the emptied backup folder is removed; fails if subdirs were skipped
                try:
                    os.rmdir(backupsub)
                except Exception as why:
                    print(why)
                    print('**Cannot remove backup folder', backupsub)

    # after backupsdir in subs
    if TOPLEVEL: break     # don't collect in subdirs if they're managed separately

# post-walk wrap-up
print('Finished: number subfolders moved:', nummoved)