tagpix: Combine Your Photos for Easy Access
This is the tagpix user guide. It includes an overview, usage instructions, and version changes. Whether you consider yourself a programmer or end user, you'll find resources to help get you started organizing photos with tagpix here.
Because tagpix by default moves and renames photos, users are encouraged to read this guide first, especially its caution, before running tagpix on valued photo collections. For this program's license and author, see its main script. For code and examples, see the install folder. To download this program, visit its web page.
tagpix is a photo organizer that merges and labels your photos for convenient access. It collects, renames, and sorts them into a normalized folder structure, resolving duplicate content and filenames automatically in the process. This section introduces the basics of its roles and operation.
If your digital photo collection has become scattered over many folders; uses filenames that are not unique because of their origin on multiple cameras; hosts modification dates that reflect retouches instead of events; or contains arbitrary duplicates, tagpix may be the photo-organizing tool you've been looking for. Running it on your photo folders transforms them into a simple, uniform format that's ideal for both viewing and archiving.
Just as importantly, tagpix is an open-source program that makes hidden agendas impossible, and its merged result is as private as the device on which it is stored. With tagpix, access to a folder, and a few simple commands, control of your photo archives remains with you, not a proprietary, closed program or device.
tagpix transfers all the files in an entire folder tree to a flat folder, without changing their content. Along the way, it adds date-of-origin to the front of the names of files transferred to make them unique and sortable; skips any truly duplicate content, and adds a unique serial number to the end of any remaining duplicate filenames; isolates movies and other non-photo files in folders of their own; and groups all transferred items into by-year subfolders on request.
The net effect is useful for organizing the contents of disparate photo collections holding pictures and movies shot on multiple cameras over many years. By running tagpix, all the items of each media type are merged on your local computer into a single flat folder, or a set of flat by-year subfolders, for fast, convenient, and private access.
In more detail, the following list summarizes the main assets that tagpix brings to your photo-normalization jobs:
tagpix can transfer items by moves, copy-and-delete, or copy-only. Because it's fastest, moving is the default. Copy-and-delete mode has the same effect as moves, but allows items to be moved between different devices and drives (albeit more slowly than direct moves on the same device). Copy-only mode leaves items in the source tree and works across devices, but may require manual steps to avoid reprocessing prior content on later runs.
Each transferred file is renamed with a date-of-origin prefix (e.g., xxx.jpg becomes 2017-10-14__xxx.jpg, in the destination only). For photos, the prefix uses date taken, extracted from standard photo-file Exif metadata tags when available. For photos with no Exif date-taken tag, and for other types of files, the prefix instead uses either the date-taken string embedded in Android photo filenames, or else the date-modified value of the file itself.
As an example, when tagpix encounters the first of the following in a source folder, the file's name is expanded to the second form to make it unique across multiple cameras that may produce the same filename for different photos shot on different dates:
DSC03249.JPG
2018-02-05__DSC03249.JPG
Whether the added date comes from an Exif photo tag, Android filename, or the file itself, the net effect makes the names of photos shot on different dates unique in the result's flat merged folders. When date taken is available in Exif tags or Android filename, the expanded name also reflects the date of the scene capture, not the most recent retouch.
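Files whose content is truly duplicate are skipped. Files whose name and date of origin match an item already in the result, but whose content differs, are renamed with a unique serial-number suffix (e.g., date__xxx__N.jpg) and added to the result.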
As another example, if the preceding example's file has already been processed by tagpix, and a new same-named and same-dated file like the first of the following is encountered in the source folder, the new image will be either discarded if its content is the same as the file already processed, or expanded to the second form to make its name unique if its content differs:
DSC03249.JPG
2018-02-05__DSC03249__1.JPG
This means your merged folders will keep just one copy of true duplicates, but will keep every version of same-named and same-dated content that differs. That's a rare scenario across different cameras, but it is possible and even normal if you've retouched or resized a photo and saved it with the same filename in a different folder, with the same date of origin per its Exif tags, Android filename, or file-modification date.
As of version 2.1, tagpix also skips even rarer duplicates of duplicates, which may arise if modified copies are copied unmodified to multiple folders (just one is retained). Regardless of their source, tagpix keeps true duplicates out of the merged result automatically, and renames files with the same name and date of origin but differing content to make them unique.
As an option, items in all three file-type folders can also be grouped by year of origin. If this option is selected, each content type's folder will be grouped into by-year subfolders instead of a flat list of items. Either way, the duplicate-resolution steps of the preceding two items are applied to all three content-type folders. For instance, duplicate copies of movies in the source tree are skipped too.
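The duplicate handling described above can be pictured with a short Python 3 sketch. It is not tagpix's own code. `resolve_target` is a hypothetical name, and the sketch assumes that a "true duplicate" means byte-for-byte identical content, checked with the standard library's `filecmp.cmp(..., shallow=False)`. Because it compares against every existing same-named variant, it also catches duplicates of earlier `__N` copies.

```python
import filecmp
import os


def resolve_target(src_path, dest_dir, new_name):
    """Return a unique destination path for src_path, or None to skip it.

    Hypothetical sketch: new_name is the date-prefixed name, such as
    '2018-02-05__DSC03249.JPG'. Each existing same-named variant is
    compared by content, so copies of earlier __N files are skipped too.
    """
    base, ext = os.path.splitext(new_name)
    candidate = os.path.join(dest_dir, new_name)
    serial = 0
    while os.path.exists(candidate):
        # shallow=False compares actual contents, not just os.stat() data
        if filecmp.cmp(src_path, candidate, shallow=False):
            return None                      # true duplicate: skip it
        serial += 1                          # same name and date, new content
        candidate = os.path.join(dest_dir, f'{base}__{serial}{ext}')
    return candidate
```

A caller would skip the item when the result is None. Otherwise it would move or copy the item to the returned path.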
Read on to learn how to use tagpix to organize your photos.
This section describes tagpix install requirements, inputs and results, usage modes, and other operational details.
tagpix is a Python program that runs on all major platforms, and is provided in source-code form. To install the program itself, download its zipfile from the following web page's Download section and unzip it on your computer:
learning-python.com/tagpix.html
tagpix also requires installs of either a Python 3.X or 2.X to run its source code, plus the third-party Pillow (a.k.a. PIL) image library for the installed Python to access photo tags. Fetch and install these items if needed from the following sites, respectively (or search the web for other links):
www.python.org/downloads/
pypi.python.org/pypi/Pillow
tagpix will work on any platform that runs Python and Pillow, and has the required folder and file access permissions. For example, the program has been verified on Windows, Mac OS, Linux, and Android, and may work on iOS (there's more about running tagpix on mobile devices in the notebox below).
For pointers on Pillow installs, see this page. A note for developers: the exif.py tags-processing alternative to Pillow failed for some files when tested in 2013 for tagpix version 1.0, though your results may vary, and there are other Exif alternatives in the open-source domain.
tagpix on mobile: per this shot and log, tagpix works on Android devices in apps that support Python and command lines. For instance, it can be used in Termux after running both this command (without its "-y" if you want to be asked about changes) and pip install Pillow. It can also be used in Pydroid 3 after running the same pip command in its Terminal or using its Pip. This makes tagpix ideal for organizing photos on Android, though keyboards can boost usability. Also note that Android imposes proprietary access rules that limit the folders accessible to your Python app, and hence to tagpix. For more on these rules, which are beyond this guide's scope, see this doc.
tagpix may work on iOS too (e.g., the Pythonista app bundles a version of Pillow), but this is untested, and iOS's access rules have historically been tighter than Android's.
To launch, run script tagpix.py with no command-line arguments. It can be run from a console (e.g., Terminal on Unix and Command Prompt on Windows) and from most Python IDEs (e.g., PyEdit as captured here, or Python's own IDLE), though IDEs may not support the output-report routing described ahead. A basic run from a console looks like this:
$ python3 tagpix.py ...input run parameters at prompts...
All run parameters are requested by the following prompts at the program's console:
1. tagpix renames and moves photos to a merged folder; proceed?
2. Source - pathname of folder with photos to be moved?
3. Destination - pathname of folder to move items to?
4. Group items into by-year subfolders?
5. List only: show new names, but do not rename or move?
6. Delete all prior-run outputs in "output folder name"?
For all prompts except #2 and #3, type y for yes, and type n or simply press Enter (a.k.a. return) for no. Some of these prompts are self-explanatory, but here are a few details to help you get started, with the most important first:
#2 (source): the default is a SOURCE folder in the current working directory (e.g., in the script's own directory, if run from the same). Move or copy all your camera folders and images there before running this script.
#3 (destination): this is the folder that will hold the results' MERGED folder (described in more detail ahead). You can either enter an explicit folder, or press Enter to accept the default, which places MERGED in the current working directory (e.g., within the script's own directory, if run from the same). Move or copy the result folders from there after running this script. Either way, the destination's MERGED subfolder will hold all your combined source-tree items after the tagpix run. Per usage-modes coverage ahead, if you enter a prior run's folder at this prompt, it will be extended. If you enter a new folder, it will be generated.
Among the other prompts, #4 allows you to bunch items by their year of creation (there's more on its effect ahead). #6 may appear up to three times, once each for non-empty photo, movie, and other folders, and is important when rerunning tagpix (see here and here for its roles, as well as its verifications added in version 2.1). You'll want to reply n (no) unless you are erasing an existing archive.
Finally, to end the script immediately without making any changes, reply no to prompt #1, or enter control+C (or otherwise kill the program) at any other prompt. You can also preview changes before applying them, by replying yes to prompt #5; this enables a list-only mode that analyzes content and shows planned updates, but does not perform any.
For more comprehensive tagpix command-line usage examples, browse the examples folder included in its install package. There, you'll find console logs that demonstrate a variety of options on a variety of platforms. Perhaps the most typical use case is captured in this example.
For simplicity, all tagpix inputs are provided as console replies instead of command-line arguments, but it's still possible to automate tagpix runs by providing canned inputs for the run command. This requires a bit of shell-programming skill and can vary per both platform and shell, but it's straightforward to provide inputs with one of two general techniques. First, and most portably, you can redirect stdin (the stream from which input is read) to a file that contains one reply per line:
$ python3 tagpix.py < inputs.txt
Second, and perhaps more conveniently, you can use a shell 'here' document to provide inputs in the run script itself. The exact syntax of this can vary, but here's a simple example coded as a Unix Bash script named runtagpix.sh. It provides canned inputs as tabbed lines between EOF markers, and suppresses spurious input prompts by routing stderr to an output sink with 2> (nit: the latter may also discard some error messages, including those of uncaught Python exceptions):
#!/bin/bash
python3 tagpix.py 2> /dev/null <<-EOF
	y
	New-unmerged
	.
	y
	y
EOF
You wouldn't type all this at the console, of course (it's just as easy to reply to the prompts), but placing it in a script means you can run tagpix with a single command and no input replies. You won't be able to vary inputs this way, but it's noticeably simpler than typing up to eight responses on each typical updates run:
$ bash runtagpix.sh # or just 'runtagpix.sh' if you make it executable with chmod
For complete examples of precoded scripts that automate inputs this way for both list-only and full-merge tagpix runs, study the included Bash scripts here and here.
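If you'd rather script runs in Python than in a shell, the same canned-input technique can be sketched with the standard library's `subprocess` module. This is an illustration, not a tagpix feature. `run_with_replies` is a hypothetical helper, and it discards stderr (where tagpix's prompts go) just as `2> /dev/null` does in the Bash example above.

```python
import subprocess
import sys


def run_with_replies(script_path, replies, report_path=None):
    """Run a console script, feeding one reply per line to its stdin.

    Prompts written to stderr are discarded (like 2> /dev/null); the
    report written to stdout is returned, and optionally saved to a file.
    """
    completed = subprocess.run(
        [sys.executable, script_path],
        input='\n'.join(replies) + '\n',   # one reply per line, like inputs.txt
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    if report_path is not None:
        with open(report_path, 'w', encoding='utf-8') as report:
            report.write(completed.stdout)
    return completed.stdout
```

For example, `run_with_replies('tagpix.py', ['y', 'New-unmerged', '.', 'y', 'y'], 'report.txt')` would replay the Bash script's replies and save the report text.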
In all usage modes, the pathnames you input at prompts #2 and #3 can be either relative or absolute:
Relative: enter MyPhotos to name a folder of that name in the current working directory, or use . for the current folder or .. for its parent.
Absolute: /Users/you/MyPhotos names a folder on Unix, and C:\Users\you\MyPhotos names a folder on Windows.
To minimize the lengths of the paths you'll input, it's often helpful to first run a cd command in your console to go to the folder containing your MERGED destination folder and/or source folder. Then run tagpix there, giving folder paths relative to where you are working.
To illustrate, the following kicks off a tagpix updates run on Unix after changing to the folder containing both the MERGED results tree and a New-unmerged folder holding the new photos to add to MERGED. After the cd command, both folders are in the current directory (a.k.a. .), so they can be given as relative paths. The tagpix script itself is elsewhere, so it is given as an absolute path. User-entered commands follow each $ prompt and replies follow each ? prompt (and ~ is your user folder on Unix):
~$ cd ~/MY-STUFF/Camera/Digital-cameras-merged
~/MY-STUFF/Camera/Digital-cameras-merged$ python3 ~/MY-STUFF/Code/tagpix/tagpix.py
tagpix renames and moves photos to a merged folder; proceed? y
Source - pathname of folder with photos to be moved? New-unmerged
Destination - pathname of folder to move items to? .
Group items into by-year subfolders? y
List only: show target names, but do not rename or move? n
Delete all prior-run outputs in "./MERGED/PHOTOS"? n
Delete all prior-run outputs in "./MERGED/MOVIES"? n
Delete all prior-run outputs in "./MERGED/OTHERS"? n
...report messages show up here...
Absolute paths are generally required when running tagpix from an IDE such as PyEdit, because the IDE's current directory may not be related to your image folders and may not be useful for relative paths. To paste a folder's absolute path at tagpix prompts easily, use your file explorer's copy-path option.
As usual, the tagpix.py script's path in command lines can be relative or absolute too, depending on where commands are run. The path is not required if the script is opened and run from an IDE.
This script's initial prompts are printed to the stderr stream, and its report is printed to stdout (see the intro to streams here). Both go to the console by default, but this two-stream model allows you to save the tagpix report to a file for later inspection, which is especially handy for larger runs.
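A minimal Python sketch of this two-stream model follows. It is illustrative only, and the names `ask` and `report` are hypothetical. Python's built-in `input()` writes its prompt to stdout, so a script that wants prompts kept out of a redirected report has to write them to stderr itself, as this sketch does.

```python
import sys


def ask(prompt, instream=None, errstream=None):
    """Write prompt to stderr and read one reply line from stdin.

    Prompting on stderr keeps prompts out of a report that is
    redirected to a file with '> report.txt'.
    """
    if instream is None:
        instream = sys.stdin
    if errstream is None:
        errstream = sys.stderr
    errstream.write(prompt + ' ')
    errstream.flush()                        # show the prompt before blocking
    return instream.readline().rstrip('\n')


def report(message, outstream=None):
    """Report lines go to stdout, so they can be routed to a file."""
    print(message, file=sys.stdout if outstream is None else outstream)
```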
To start tagpix and save just its report to a file, use a console command line like this to route stdout to a file (> shell syntax will not work when running tagpix from most IDEs):
$ python tagpix.py > report.txt
This technique works with any command-line form, and can be combined with the automated inputs we met earlier. Any special message lines in the report all begin with ***. Search for this in the saved report text after a tagpix run (more on this ahead).
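If you'd rather scan a saved report programmatically, a small Python sketch like the following would collect the special lines. The *** marker comes from the text above. Grouping by the first word (e.g., ***Duplicate, ***Error) is an assumption about the message layout, and the function names are hypothetical.

```python
from collections import Counter


def special_lines(report_text, marker='***'):
    """Return the report lines that begin with the special-message marker."""
    return [line for line in report_text.splitlines()
            if line.startswith(marker)]


def count_by_kind(report_text, marker='***'):
    """Tally special lines by their first word, such as '***Duplicate'."""
    return Counter(line.split()[0]
                   for line in special_lines(report_text, marker))
```

For example, `count_by_kind(open('report.txt').read())` would give a quick summary of a saved report.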
For a sample of report content, see the demo logs in the example-runs folder; the report text is everything that follows the last input prompt. For a comprehensive report example from a tagpix run on a very large photo collection, including duplicates, locked-file errors, prior-run dates, and more, see this file.
The script's results show up in the MERGED folder nested in the destination folder (prompt #3). MERGED is split into PHOTOS, MOVIES, and OTHERS subfolders, each of which contains merged and uniquely named content files. If you reply yes to prompt #4, these three subfolders further group their content into year subfolders. Specifically, the results are organized into a shallow tree as follows:
Destination or ./
    MERGED/
        PHOTOS/    flat content, or year subfolders with flat content
        MOVIES/    flat content, or year subfolders with flat content
        OTHERS/    flat content, or year subfolders with flat content
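Where an item lands in this tree can be sketched as a small path computation. This is a hypothetical helper, not tagpix's code. It assumes the year subfolder is taken from the first four characters of the item's date prefix.

```python
import os


def target_folder(dest_root, kind, new_name, by_year):
    """Return the results-tree folder for a date-prefixed item.

    kind is 'PHOTOS', 'MOVIES', or 'OTHERS'. With by_year, the year comes
    from the item's YYYY-MM-DD__ prefix (e.g., 2017-10-14__xxx.jpg).
    """
    folder = os.path.join(dest_root, 'MERGED', kind)
    if by_year:
        folder = os.path.join(folder, new_name[:4])
    return folder
```

A caller would then create the folder with `os.makedirs(folder, exist_ok=True)` before transferring the item.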
As described earlier, all filenames at the bottom levels of the results tree include date prefixes added to make them unique (e.g., 2017-10-14__file.jpg). The dates added come from one of three sources: date-taken Exif tag values (for most shot photos), the date-taken date in Android filenames (for Android photos with no Exif date), or date-modified file attributes (for all others).
For photo files, date taken is always used if present, because it both ensures that names are unique (different cameras may reuse the same names), and reflects the recorded event's date (modification date may instead be a latest-retouch date after edits, but a date-taken tag is likely to survive). Although date taken may not apply to photo scans, for most photos shot on digital cameras the expanded names chronologically identify both the photos themselves and the scenes they capture.
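Exif records date taken as text in the form YYYY:MM:DD HH:MM:SS. The following Python 3 sketch shows how such a value could become a tagpix-style prefix. Reading the tag from the file (via Pillow) is left out, and both function names are hypothetical.

```python
from datetime import datetime


def exif_date(value):
    """Convert an Exif date string like '2018:02:05 15:49:10' to a date.

    Returns None for missing or malformed values (some cameras write
    blanks or zeros), so callers can fall back to other date sources.
    """
    if not value:
        return None
    try:
        # Strip trailing NULs and spaces that some writers leave behind
        return datetime.strptime(value.strip('\x00 '), '%Y:%m:%d %H:%M:%S').date()
    except ValueError:
        return None


def add_date_prefix(filename, origin_date):
    """Prefix a filename with its date of origin: 2018-02-05__DSC03249.JPG."""
    return f'{origin_date:%Y-%m-%d}__{filename}'
```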
Items not recognized as movies or tagged photos are moved (or copied) to OTHERS.
After a tagpix run, you may wish to manually remove items from OTHERS that reflect camera-specific cruft. For example, some cameras create .THM or .CTG files, which are irrelevant to your content in PHOTOS and MOVIES. tagpix does not omit these automatically, because it prefers to err on the side of caution (only well-known .* hidden files and user-selected subfolders are skipped, per the next section). Be sure to delete only cruft: the OTHERS result folder may contain non-camera images like PNGs and GIFs too.
For a more graphical look at results trees, see the examples folder's screenshots of both flat and group-by-year modes.
Following a run, you should check the report's final Missed section to see if any files were skipped due to:
Skip rules: .* hidden items, and items in subfolders matching a configurable skip pattern. These are also noted in Skipping message lines at the top of the report.
Duplicates: content already present in the result, noted in ***Duplicate message lines earlier in the report.
Errors: failed transfers (e.g., locked files), noted in ***Error message lines earlier in the report.
All items skipped are left intact in the source tree, and listed in the Missed section.
If the Missed line shows 0 skips, or if you are okay with the items skipped, you can delete the contents of your source folder after the run if desired. If there were no skips, the source folder holds just empty directories (but see also the mode variations note ahead).
If the Missed line's skip count is not 0 and valid items were skipped due to errors, resolve their issues (e.g., fix locks or permissions, or use a shorter destination path on Windows) and rerun tagpix to transfer them. For the rerun, use the same source and destination folders as the original run, and do not delete the prior run's results (at prompts #2, #3, and #6).
Mode variations: most of the above pertains only to the file-move and copy-and-delete transfer modes. When tagpix is run in copy-only mode, added in version 2.1, it does not produce a Missed line or section in the results report, because no files are removed from the source tree. Instead, the report in this mode ends with the message "Nothing was removed from the source tree". To analyze skips in copy-only mode, search for messages earlier in the report, as described for the three skip categories listed above.
Depending on the replies you provide to input prompts, you can use this script to either extend an existing archive or make one anew, and can do both with the aid of another program:
A. To extend an existing archive, enter a prior run's destination at prompt #3 (the folder containing its MERGED result folder), and answer no to #6 prompts. New source items will be moved (or copied) to the prior run's folders.
For an example of usage mode A, see the logs here and here. For an example of mode B, see the log here. For additional usage-mode examples, see the full examples folder. For alternative file transfer modes, see version 2.1 release notes.
This section collects smaller usage notes and tips. Some summarize earlier coverage.
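Result pathnames are longer than their sources, because tagpix adds folders and date prefixes (e.g., MERGED/PHOTOS/2018/2010-12-03__). If merged results exceed pathname limits on your platform, try using a shorter destination path (i.e., a folder higher on your drive).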
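tagpix drops any date prefixes (the YYYY-MM-DD parts) already added to source filenames by prior tagpix runs, so that reprocessed items don't get a second date. It also ensures the new and prior dates match, to avoid stripping any user-added text in the process.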
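The matching-date rule can be sketched as follows. This is an illustrative assumption about the logic, not tagpix's implementation. `rename_with_prefix` is a hypothetical name, and dates are handled as YYYY-MM-DD strings.

```python
import re

# A tagpix-style prefix: year, month, day, then two underscores
PREFIX = re.compile(r'^(\d{4}-\d{2}-\d{2})__(.+)$')


def rename_with_prefix(filename, new_date):
    """Add the new_date prefix, first dropping a matching prior-run prefix.

    A leading YYYY-MM-DD__ is removed only if it equals new_date, so
    text that merely looks like a date (or was added by a user) survives.
    """
    match = PREFIX.match(filename)
    if match and match.group(1) == new_date:
        filename = match.group(2)
    return f'{new_date}__{filename}'
```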
tagpix skips well-known hidden items whose names begin with a . (e.g., Mac OS .DS_Store files), as well as all items in subfolders whose names match the user-configurable skips pattern added in version 2.1 (described ahead). All other items in the source tree are transferred to the destination's folders. See also Resolving Skips above.
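A source-tree scan with these skip rules might look like the following Python sketch. The `skip_dirs` regular expression is a hypothetical stand-in for tagpix's configurable skips pattern, whose actual setting is described elsewhere in this guide.

```python
import os
import re


def source_items(root, skip_dirs=None):
    """Yield paths of files to transfer from a source tree.

    Skips hidden items whose names start with '.', and prunes any
    subfolder whose name matches the skip_dirs regular expression.
    """
    pattern = re.compile(skip_dirs) if skip_dirs else None
    for folder, subdirs, files in os.walk(root):
        # Edit subdirs in place so os.walk does not descend into skipped folders
        subdirs[:] = [d for d in subdirs
                      if not d.startswith('.')
                      and not (pattern and pattern.search(d))]
        for name in files:
            if not name.startswith('.'):
                yield os.path.join(folder, name)
```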
Non-photo images such as PNGs and GIFs are transferred to the OTHERS folder instead of PHOTOS (which merits a separate note, up next).
Moving OTHERS images to PHOTOS: you may find images in the OTHERS results folder. This is by design: tagpix recognizes photos as images with MIME types that imply Exif tags (as described earlier), and always moves other image types to the OTHERS folder, not PHOTOS. This means that PHOTOS gets all JPEGs and TIFFs (Exif tags or not), but non-photo image types like PNGs, GIFs, and BMPs are routed to OTHERS. If you'd rather see the latter bunch in PHOTOS too, simply move them across manually after a tagpix run. Because items in OTHERS are also labeled with dates, they'll work well in PHOTOS alongside your camera JPEGs.
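The routing rule above can be approximated with Python's standard `mimetypes` module, which guesses types from filename extensions. This is a sketch under stated assumptions: the set of photo types mirrors the JPEG-and-TIFF rule described here, `content_folder` is a hypothetical name, and tagpix's real type tests may differ in detail.

```python
import mimetypes

PHOTO_TYPES = {'image/jpeg', 'image/tiff'}   # types that normally carry Exif tags


def content_folder(filename):
    """Pick PHOTOS, MOVIES, or OTHERS for a file from its guessed MIME type."""
    mimetype, _encoding = mimetypes.guess_type(filename)
    if mimetype in PHOTO_TYPES:
        return 'PHOTOS'
    if mimetype and mimetype.startswith('video/'):
        return 'MOVIES'
    return 'OTHERS'                          # PNGs, GIFs, BMPs, unknown types
```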
Request for comments: if you think that combining all image types as described here should be automated with a new tagpix option, please send feedback via the Input link in this guide's bottom toolbar. To date, no user (including tagpix's creator) has asserted a need for this, and software growth sans use case is a Generally Bad Thing.
As a last resort, tagpix uses a file's modification date (from Python's os.path.getmtime()), after trying photo Exif tags and then the Android filename date (per this).
Modification date reflects either the file's creation date (if it has not been edited), or its latest modification (if it has). For unretouched photos, this is normally the true date of origin.
It's worth noting that tagpix by design does not try to use a file's creation date, a datum that depends on both operating system and filesystem. Specifically, file creation date is generally available on Windows only. It is not available on Unix, where it is weakly supported on Mac OS and no better than modification time on Linux. Even where it is available, it can sometimes be irrelevant when content changes. For background, try this discussion thread, this filesystems comparison, and Python's os.path.getctime() and os.stat().
Because tagpix works in the woefully unstandardized filesystems realm, it must use modification dates in the name of portability, interoperability, and results that are the same across all supported platforms.
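For reference, the portable date source described above can be read and formatted in a few lines of Python. This sketch assumes local-time dates are wanted, and `modified_date` is a hypothetical name.

```python
import os
from datetime import datetime


def modified_date(path):
    """Return a file's modification date as 'YYYY-MM-DD', in local time.

    os.path.getmtime() means the same thing on Windows, Mac OS, and Linux,
    unlike os.path.getctime(), which is creation time only on Windows.
    """
    return datetime.fromtimestamp(os.path.getmtime(path)).strftime('%Y-%m-%d')
```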
This cannot be remedied (merging metadata of arbitrary tools is impossible), but you can avoid the issue altogether by applying such tools to tagpix destination folders only, not source folders. That is, run other tools on merged results, not unmerged input. Because merged destination folders are only ever extended, their content is never scattered by tagpix. Source folders are generally best used for staging photos to be later moved by tagpix, per the recommendations ahead.
Modes update: though it comes with some tradeoffs, version 2.1's new copy-only mode can now be used to extract images from a source tree without destroying it. See 2.1's release notes ahead. The preceding still applies to both the original and default file-move mode, as well as the new copy-and-delete mode.
tagpix uses Python's os.rename() to move files from source to destination, which is normally correct, fast, and atomic. File moves can be problematic, though, when run between different devices or filesystems. If a run's moves all fail due to differing devices, make sure your source and destination folders reside on the same writable device: copy the source folder to the same hard drive or SSD as your destination folder before the tagpix run. This is a minor inconvenience, but it makes all tagpix runs quicker. Copying new source images to a temporary staging folder is also recommended practice anyhow, because merging directly from a camera or camera card leaves no backup copy if anything goes wrong.
Developer notes: Python's os.replace() doesn't help here, because it still raises an exception across different drives and devices on Windows, Mac OS, and Linux (this call just avoids Windows exceptions if the target file exists on the same device). The only alternative to moves is to copy and delete, which can be much slower for large photo archives. Cross-device moves seem too rare and dangerous to justify that slowdown for all use cases, especially when a manual pre-run copy of the source folder takes roughly the same amount of time.
Modes update: though they come with some tradeoffs, version 2.1's new copy-only and copy-and-delete modes can now be used to merge across different drives and devices directly. See 2.1's release notes ahead. The preceding still applies to the original and default file-move mode.
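The three transfer modes can be sketched with standard library calls. This is an illustration, not tagpix's code, and the mode names are hypothetical. `shutil.copy2` is used because it preserves modification times, which matter when a file's date prefix comes from its modification date.

```python
import os
import shutil


def transfer(src, dst, mode='move'):
    """Transfer one file: 'move', 'copydelete', or 'copyonly'.

    'move' uses os.rename(), which is fast and atomic on one device but
    fails across devices; the two copy modes work across devices.
    """
    if mode == 'move':
        os.rename(src, dst)
    elif mode == 'copydelete':
        shutil.copy2(src, dst)      # copy content plus timestamps
        os.remove(src)              # delete only after the copy succeeded
    elif mode == 'copyonly':
        shutil.copy2(src, dst)      # source tree is left intact
    else:
        raise ValueError(f'unknown transfer mode: {mode!r}')
```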
This section describes changes made in recent tagpix versions. It is meant primarily for developers and prior-version users, though additional usage-level details and context are presented along the way. tagpix is occasionally repackaged with minor documentation-only changes (e.g., to this doc and its demos), but code and functionality changes occur only in the versions listed here.
tagpix was patched and rereleased on September 29, 2020 with two upgrades. The first was a minor UI improvement: at input prompts, typing control+C to exit now yields a user-friendly message instead of a Python exception traceback, and the source folder's existence is checked ASAP. For example:
~/Desktop/camera$ python3 ~/MY-STUFF/Code/tagpix/tagpix.py
tagpix renames and moves photos to a merged folder; proceed? y
Source - pathname of folder with photos to be moved? ^C
Script not run: no changes made.
~/Desktop/camera$ python3 ~/MY-STUFF/Code/tagpix/tagpix.py
tagpix renames and moves photos to a merged folder; proceed? y
Source - pathname of folder with photos to be moved? Spam
Script not run: source folder does not exist, no changes made.
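The early-exit behavior shown in this log can be sketched like this. The messages are copied from the log above. The function name and its `read_reply` parameter are hypothetical, and tagpix's own code may be structured differently.

```python
import os


def get_source(read_reply):
    """Ask for the source folder; return (folder, None) or (None, message).

    read_reply is any callable that prompts and returns the user's reply
    (e.g., input). Control+C, end of input, and missing folders all end
    the run before any changes are made.
    """
    try:
        source = read_reply('Source - pathname of folder with photos to be moved? ')
    except (KeyboardInterrupt, EOFError):
        return None, 'Script not run: no changes made.'
    if not os.path.isdir(source):
        return None, 'Script not run: source folder does not exist, no changes made.'
    return source, None
```

For instance, `source, problem = get_source(input)`, followed by printing `problem` and exiting when it is not None, would reproduce the log's behavior.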
The second upgrade was more urgent: code was added to silence a bogus DecompressionBombWarning message now issued senselessly by the underlying Pillow library for all large images.
Specifically, when running tagpix on images larger than 89MP, the Pillow library by default prints a single DOS (denial of service) warning message in program output that looks like this (with line breaks added here for marginal readability):
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/PIL/Image.py:2797:
DecompressionBombWarning: Image size (108000000 pixels) exceeds limit of 89478485 pixels,
could be decompression bomb DOS attack.
  warnings.warn(
This baseless warning is completely harmless, and does not impact tagpix results (large images work either way). But it's also stupidly excessive, and needlessly confuses users of this and many other Pillow-based programs.
It was first seen for perfectly valid 108MP images shot on a Galaxy Note20 Ultra smartphone in 2020, and will crop up for large images created on many other devices and tools in widespread use. Obviously, these are not "attacks," despite the warning's language. Users who see this, however, may assume it reflects bugs or viruses.
To see the changes applied to silence the message, search for Sep-2020 in the source. The fix was trivial, but the cost of rereleasing this and other programs impacted at tagpix's host site was not. Such is life when "batteries included" meets open-source agendas.
Postscript: though scantly documented, it turns out that Pillow later turned the warning described here into a full error for images larger than twice the warning's size limit. This error takes the form of an exception that will cause client programs to fail or terminate. Despite this, its only mention seems to be in an obscure release note. To avoid kills, tagpix's warning-silencing code has been updated to use a new and broader fix, which will suffice only until Pillow tightens the screws again. This check should clearly be opt-in for programs that need to care.
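For programs that process trusted photos, Pillow's documented `Image.MAX_IMAGE_PIXELS` setting controls both the warning and the later error. The sketch below shows one way to relax them. It is not tagpix's Sep-2020 code, and `allow_large_images` is a hypothetical name.

```python
import warnings

try:
    from PIL import Image
except ImportError:          # Pillow not installed: nothing to configure
    Image = None


def allow_large_images():
    """Disable Pillow's decompression-bomb size checks for trusted photos.

    MAX_IMAGE_PIXELS = None turns off both the warning (above the limit)
    and DecompressionBombError (above twice the limit); the filter also
    silences the warning if other code later restores a limit.
    """
    if Image is None:
        return False
    Image.MAX_IMAGE_PIXELS = None
    warnings.simplefilter('ignore', Image.DecompressionBombWarning)
    return True
```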
Version 2.2, finalized in December 2018, was a minor release that addressed just one specific issue. It was enhanced to automatically process origin dates added to photo filenames by Android cameras. It uses these dates if no Exif date-taken tag is present, and discards these dates (but not times) to avoid redundancy with tagpix-added dates. A utility script was also coded to drop Android filename dates on demand for users of prior tagpix releases.
This change applies only to tagpix users who have shot photos on Android devices, or may do so in the future. Given the potential magnitude of this subset, though, the rest of this section provides complete coverage. For a brief look at this change's results, see this log and shot. For the full story, read on.
Most digital cameras assign filenames to images using a simple format that accommodates the basic but portable FAT filesystem's 8.3 naming convention. For instance, a DSC or IMG prefix followed by a sequence number suffices to identify images on a given camera, though not across different cameras. That's one of the main limitations tagpix solves, by expanding the first of the following forms to the second, with a date-of-origin prefix:
DSC03249.JPG
2018-02-05__DSC03249.JPG
By contrast, cameras on some Android devices (and perhaps others) add a date in photo filenames which, combined with an added time, identifies images by their moment of creation, but is redundant with that added by tagpix's own renaming logic. For example, such images' filenames are initially expanded by tagpix from the first of the following to the second:
20180205_154910.jpg
2018-02-05__20180205_154910.jpg
While the Android-added date and time (separated by _ in the first name above) might be a good idea in a world begun anew, they bifurcate the digital-photos world that is.
This is a unique and nonstandard naming scheme that stamps files with a date, which makes tagpix filenames needlessly longer. In most cases the stamp is fully redundant with both standard in-file Exif creation-date tags (when present and unchanged) and the date-of-origin prefix added to all photos by tagpix (when its source agrees with the Android stamp).
That said, blindly deleting the Android date in filenames is too extreme, because it may be the only record of creation date in some scenarios. For example, Android photos edited in tools that discard Exif tags won't have a date-taken tag, but will retain a creation date in their filenames that normally differs from the file's modification date (which is generally a last-edit date).
More subtly, some recent Samsung Android devices never record Exif date-taken tags for front—a.k.a. "selfie"—cameras. This is a known issue that you can explore on the web here and here. It may be a temporary bug that Samsung will fix in an update, and back cameras on these devices do record Exif dates correctly. But discarding the Android filename date of photos shot on such devices' front cameras would also drop valuable metadata found nowhere else.
Because the filename date is potentially useful in such cases, tagpix 2.2 has generalized the way it chooses a date of origin for the prefix it adds to filenames. Formally, it now always tries three sources in turn, until a date is selected:
1. The date-taken tag in a photo's Exif metadata
2. A date embedded in an Android-style filename
3. The file's modification date
The first step above is applied to photos only (other content types don't have Exif tags). The other steps are run for all types of content in source trees, including photos without usable Exif tags.
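The three-step fallback can be expressed as a short Python 3 sketch. It assumes the inputs are already extracted: Exif text, a parsed Android date, and a modification time in seconds. `choose_origin_date` is a hypothetical name, not part of tagpix.

```python
from datetime import datetime


def choose_origin_date(exif_taken, android_date, mtime):
    """Pick a date of origin from three sources, in priority order.

    exif_taken:   Exif date-taken text like '2018:02:05 15:49:10', or None
    android_date: a date parsed from an Android-style filename, or None
    mtime:        the file's modification time in seconds (always present)
    Returns the chosen date and a label naming its source.
    """
    if exif_taken:
        try:
            return datetime.strptime(exif_taken, '%Y:%m:%d %H:%M:%S').date(), 'exif'
        except ValueError:
            pass                    # malformed tag: fall through to step two
    if android_date is not None:
        return android_date, 'android'
    return datetime.fromtimestamp(mtime).date(), 'modified'
```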
The second step above is new, requires heuristics to detect dates, and applies only to a subset of users and images, but is necessary to accommodate metadata recorded outside the Exif model by a handful of devices and manufacturers. A special case to be sure, but exceptions seem as much the norm in the digital camera domain as the computer field at large!
Because step two is partly heuristic (it looks for matching strings and checks their content for valid dates), it can also be disabled by setting UseAndroidFilenameDates in the user configs file. This switch is preset to True to cover the norm; set it to False in the unusual event that filenames in your source tree appear to embed Android dates just by coincidence.
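The date fallback and the step-two heuristic can be sketched in Python as below. This is a minimal illustration, not tagpix's own code: the Android stamp pattern (eight date digits, an underscore, and six time digits, as in 20180205_154910.jpg), the function names, and the use of the file's modification date as the last resort are assumptions, and the Exif date is passed in as an already-extracted YYYY-MM-DD string rather than read from the image.

```python
import os
import re
import time
from datetime import datetime

# Assumed Android camera stamp: 8 date digits, "_", 6 time digits, not part
# of a longer digit run (modeled on names like 20180205_154910.jpg)
ANDROID_STAMP = re.compile(r'(?<!\d)(\d{8})_(\d{6})(?!\d)')

def android_filename_date(filename):
    """Return 'YYYY-MM-DD' if filename embeds a valid Android date, else None."""
    match = ANDROID_STAMP.search(filename)
    if not match:
        return None
    day, clock = match.groups()
    try:
        # Heuristic check: the digits must form a real calendar date and time
        stamp = datetime(int(day[:4]), int(day[4:6]), int(day[6:]),
                         int(clock[:2]), int(clock[2:4]), int(clock[4:]))
    except ValueError:
        return None          # matching digits, but no valid date: a coincidence
    return stamp.date().isoformat()

def select_date(path, exif_date=None, use_android_dates=True):
    """Pick a date prefix: Exif date, then filename date, then file mtime."""
    if exif_date:                        # step 1: photos with a usable Exif tag
        return exif_date
    if use_android_dates:                # step 2: can be switched off
        found = android_filename_date(os.path.basename(path))
        if found:
            return found
    mtime = os.path.getmtime(path)       # step 3 (assumed): modification date
    return time.strftime('%Y-%m-%d', time.localtime(mtime))
```

Passing use_android_dates=False here plays the role of setting UseAndroidFilenameDates to False: a name that only looks like an Android stamp then falls through to the modification date.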
After the tagpix date has been selected per the prior section, tagpix 2.2 addresses the redundancy of Android filename dates with a new renaming step, run before duplicates detection and file move or copy. If enabled by setting DropAndroidFilenameDates to True in the user configs file, the tagpix.py main script now automatically renames merged photo files to drop the superfluous Android date and keep only the tagpix date (along with the Android-added time, which helps identify the photo). For instance, it shortens filenames from the first of the following to the second:
2018-02-05__20180205_154910.jpg
2018-02-05__154910.jpg
This step is enabled by default, because it yields shorter names, and normally has no impact on duplicates processing or content access—the shorter form is no less unique or meaningful than the longer. The tagpix date is usually the same as the Android date, whether it is taken from Exif tags or filename.
As a special case, though, this new renaming step can also be limited with the switch KeepDifferingAndroidFilenameDates to drop only Android dates that are the same as the tagpix date. Though unlikely, the two dates may differ if a photo's Exif-tag date is not the same as its Android-filename date, which is generally possible only after manual changes to either, given tagpix's date-selection algorithm. In such rare cases, the tagpix and Android dates may disagree, as in the following inconsistently changed photo:
2018-08-03__20180408_073757.jpg
Keep this switch set to True in the user configs file if you wish to retain the Android date when it differs this way. It defaults to True to be cautious, because an auto-shortened filename carries less information in this case only. Still, this case seems too unlikely to apply to most, if any, users (and if it does apply to you, you probably understand both the perils of manual metadata changes and the need for such an obscure switch!).
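The drop and keep rules described above can be sketched as a single filename transform. This is an illustrative sketch, not the tagpix.py implementation: the regular expression assumes names shaped like this section's examples (a YYYY-MM-DD__ tagpix prefix followed by an Android YYYYMMDD_HHMMSS stamp), and drop_android_date and its keep_differing parameter are hypothetical stand-ins for the DropAndroidFilenameDates and KeepDifferingAndroidFilenameDates switches.

```python
import re

# Assumed merged-name shape: tagpix prefix "YYYY-MM-DD__", then an Android
# stamp "YYYYMMDD_HHMMSS", then the rest of the name (usually the extension)
MERGED_ANDROID = re.compile(r'^(\d{4})-(\d{2})-(\d{2})__(\d{8})_(\d{6})(.*)$')

def drop_android_date(name, keep_differing=True):
    """Drop a redundant Android date from a merged name, keeping its time."""
    match = MERGED_ANDROID.match(name)
    if not match:
        return name                      # not a merged Android-camera name
    year, month, day, android_date, android_time, rest = match.groups()
    if keep_differing and android_date != year + month + day:
        return name                      # dates disagree: keep both of them
    return f'{year}-{month}-{day}__{android_time}{rest}'
```

With the keep rule on, 2018-02-05__20180205_154910.jpg becomes 2018-02-05__154910.jpg, while the inconsistent 2018-08-03__20180408_073757.jpg is left alone.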
For an example of 2.2's automatic handling of Android filename dates, see the console log here, and the screenshot of its results folder here. In the end, the combination of using and dropping such dates shortens filenames of all photos shot on Android cameras, without sacrificing filename metadata when useful.
For more specialized roles, 2.2 also adds a new utility script, _drop-redundant-dates.py, which can be run on demand to drop all Android dates in images already processed by a former version of tagpix (or a later version run with auto-renaming disabled).
This utility script is never required for users of tagpix 2.2+ if auto-renaming is enabled, and usually must be run just once by pre-2.2 users who have upgraded. It is also somewhat naive: it makes no attempt to determine if the Android date dropped differs from that of the tagpix date formerly added. Be sure to use its list-only mode to preview changes before running it to update photos: because prior versions of tagpix didn't use filename dates in the absence of Exif dates, some formerly merged Android photos may be labeled with a file-modification date instead, and dropping their Android dates would discard their only record of creation date.
One special case here: as described in the new utility script's docstring, if you're using a tool that relies on the names of images, you may need to rerun the tool after running the utility script, to pick up the new names. This requirement naturally varies per tool. For instance, the HTML viewer pages generated by the thumbspage gallery builder hardcode image filenames, which can be invalidated by later renames. On the other hand, this isn't a concern for the PyPhoto GUI viewer, which updates its thumbnails cache automatically on image changes.
This special case is also completely irrelevant when using the 2.2 automatic renaming of tagpix.py, because its renaming occurs before other tools can be run on its merged results. Where possible, use automatic renaming instead of the on-demand utility script.
Request for comments: there are undoubtedly additional device-specific photo-naming conventions beyond the Android camera pattern addressed here (e.g., some Windows screenshot names may redundantly embed date/time information too). If you'd like to see other filenames accommodated by tagpix, please send feedback via the Input link in this doc's bottom toolbar. As it stands, device manufacturers seem to be climbing over each other to come up with proprietary naming conventions with no interest in standardization or interoperability, and supporting all the constantly changing variants in this context would be akin to herding cats.
Version 2.1, finalized in October 2018, was a major update, which generalized source-tree subfolder skips; added a simple but crucial deletion verification; improved duplicates detection; introduced new file-transfer modes that copy instead of move; and cleaned up a few dark but rare corners.
moveall()). This had no impact on program operation or results, but makes future changes easier. A new file was also added for user configurations, user_configs.py. This has only a small number of settings at present, but better supports future customizations.
IgnoreFoldersPattern in the user-configurations file user_configs.py. This pattern's new preset skips .* hidden folders; thumbs thumbnail folders created by some tools (including older versions of PyPhoto that predate its single-file caches); and _thumbspage thumbnail/viewer-page folders created by the latest thumbspage image-gallery builder.
For a demo of 2.1 subfolder skipping, see this example.
Note that this matters only for subfolders having irrelevant images (e.g., thumbnails); applies only to folders in your source tree (the destination tree is not scanned for images to add to the collection); and is not required if your source folders to be skipped are named with a leading . (the pattern preset already skips all such folders, and some zip and backup tools may skip them too).
The code now also correctly skips multiple matching folders when present.
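Folder skipping of this kind is commonly done by pruning os.walk's list of subfolders in place, which also handles several matching folders at one level. The sketch below assumes that approach; the regular expression is a hypothetical stand-in for the IgnoreFoldersPattern preset (hidden .* folders, thumbs, and _thumbspage), not its actual value, and source_files is an illustrative name.

```python
import os
import re

# Hypothetical stand-in for the IgnoreFoldersPattern preset: hidden ".*"
# folders, "thumbs" folders, and "_thumbspage" folders
IGNORE_FOLDERS = re.compile(r'^(\..*|thumbs|_thumbspage)$')

def source_files(root, ignore=IGNORE_FOLDERS):
    """Yield every file path under root, skipping matching subfolders."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Rebuild dirnames in place (rather than deleting while iterating),
        # so os.walk never descends into any of the skipped folders
        dirnames[:] = [name for name in dirnames if not ignore.match(name)]
        for name in filenames:
            yield os.path.join(dirpath, name)
```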
n or simply press the Enter/return key to cancel the delete (a control+C at any prompt works to kill the program in general, but may be too late for weary users to apply):
Delete all prior-run outputs in "./MERGED/PHOTOS"? y
....About to delete: ARE YOU SURE? n
Delete all prior-run outputs in "./MERGED/OTHERS"? y
....About to delete: ARE YOU SURE?
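The double confirmation can be sketched as below, as a minimal illustration rather than tagpix's code. The names ask and confirm_delete are hypothetical and the prompt text is copied from the transcript above. The reply function writes prompts to stderr, in line with the 2.0 change noted later in this guide, so a redirected stdout keeps only report text and stdin can come from a file of precoded replies.

```python
import sys

def ask(prompt):
    """Write a prompt to stderr and read one reply line from stdin."""
    sys.stderr.write(prompt)       # keeps prompts out of a saved stdout report
    sys.stderr.flush()
    return sys.stdin.readline().strip()

def confirm_delete(folder, reply=ask):
    """Return True only if both prompts are answered with 'y'."""
    first = reply(f'Delete all prior-run outputs in "{folder}"? ')
    if first.lower() != 'y':
        return False               # 'n' or a bare Enter cancels
    second = reply('....About to delete: ARE YOU SURE? ')
    return second.lower() == 'y'   # the second answer must also be 'y'
```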
The simple fix, in moveone() of the script, is to increment the numeric-ID suffix in a loop, until the resulting filename either does not exist in the destination folder or matches an existing same-named file there by content (as formerly done in the related music-file program flatten-itunes).
This avoids file overwrites in all contexts (the former defect), but also correctly skips all same-content images for a given filename, whether they match the first instance of the filename moved to the destination (as before), or any differing-content duplicate added later with a uniquely suffixed ID (new behavior).
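The suffix loop described above can be sketched as follows. This is a sketch under assumptions, not moveone() itself: the __N suffix format and the unique_destination name are hypothetical, and filecmp.cmp with shallow=False is used here to compare file contents byte by byte.

```python
import filecmp
import os

def unique_destination(source, destdir):
    """Return a free destination path for source, or None for a duplicate.

    Tries name.ext, then name__1.ext, name__2.ext, and so on, stopping at the
    first name not yet in destdir, or at any same-named file whose content
    matches source (including a suffixed duplicate added on an earlier pass).
    """
    base, ext = os.path.splitext(os.path.basename(source))
    candidate = os.path.join(destdir, base + ext)
    counter = 1
    while os.path.exists(candidate):
        if filecmp.cmp(source, candidate, shallow=False):
            return None                          # same content: skip it
        # Hypothetical suffix style; tagpix's own ID format may differ
        candidate = os.path.join(destdir, f'{base}__{counter}{ext}')
        counter += 1
    return candidate
```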
For a short demo of the new duplicates-resolution logic in action, see this example. The new behavior—skipping duplicates having content the same as another duplicate—addresses the unlikely event of modified copies being copied to multiple folders unmodified. This works well and as it should, but is also the tagpix equivalent of a second lightning strike...
user_configs.py. These are alternatives to the original and default file-move mode, which always removes files from the source tree by definition. The two new modes copy source files byte-for-byte to the destination, instead of directly moving them. This makes the new modes run slower, but in some roles can make manual source-content copies unnecessary, and lets you use tagpix in additional contexts:
In short, these two new modes provide extra utility, as captured in this example. Nevertheless, the original file-move mode is still the tagpix preset default, both because moves always run faster than copies, and because this mode promotes better practice. In terms of practice:
Hence, as both general rule and recommended usage: copy your initial or new source images to a temporary staging folder to be used as the tagpix source tree, and use the default file-move mode. Unless your use case is more custom, this is still the best and safest way to use tagpix.
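The speed difference between the default move mode and the copy modes comes down to what each operation does on disk. The sketch below contrasts the two with Python's shutil module; the transfer function and its mode names are hypothetical, and it models only move versus copy, not the exact behavior of either of tagpix's two copy modes.

```python
import shutil

def transfer(source, dest, mode='move'):
    """Send one file to dest, either by moving it or by copying it."""
    if mode == 'move':
        # A simple rename when source and dest share a filesystem, so it is
        # usually fast; the file is gone from the source tree afterward
        shutil.move(source, dest)
    elif mode == 'copy':
        # Reads and writes every byte (slower) and leaves the source intact;
        # copy2 also tries to preserve metadata such as modification time
        shutil.copy2(source, dest)
    else:
        raise ValueError(f'unknown transfer mode: {mode!r}')
```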
Version 2.0, finalized in October 2017, was a major step up from the former, simplistic script, as summarized below.
Among version 2.0's foremost improvements, it now:
stdin to a file of precoded replies (or a shell in-script 'here' document), as described above and later here.
Per earlier, also sends prompts to stderr so stdout report text can be saved for easier review.
mimetypes module, so other images may be treated as photos too. Still, because Exif tags are apparently used only by JPEG and TIFF images and WAV audio (PNG and WebP images may have metadata too, but their standards and support are evolving), only JPEG and TIFF mime types are treated as 'photos' here; others go to the OTHERS folder: as images, but not photos. For more details, try this page or a web search. 2.0 also uses mimetypes for movie detection, adding newer video types in case some platforms do not.
./SOURCE (in the current working directory, or CWD, which is the script's own folder if it's run from there), and press Enter when asked for the source folder's path.
./MERGED.
MOVIES subfolder, instead of lumping them in with OTHERS as before (or PHOTOS).
None-indexing error message
enumerate(): ID was too high if many items
thumbs/ thumbnail subfolders created by programs like PyPhoto
.* Unix hidden items like Mac .DS_Store files (but they may reform!)
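The mimetypes-based classification described in the list above can be sketched as follows. The three category names match the destination folders this guide mentions, but the category function and the particular video extensions registered with mimetypes.add_type are illustrative assumptions, not tagpix's own lists.

```python
import mimetypes

# Illustrative extra video types, in case a platform's tables lack them
mimetypes.add_type('video/mp4', '.mp4')
mimetypes.add_type('video/quicktime', '.mov')
mimetypes.add_type('video/x-m4v', '.m4v')

# JPEG and TIFF: the image types treated as photos because they use Exif
PHOTO_TYPES = {'image/jpeg', 'image/tiff'}

def category(filename):
    """Classify a file as PHOTOS, MOVIES, or OTHERS by its guessed type."""
    mimetype, _encoding = mimetypes.guess_type(filename)
    if mimetype in PHOTO_TYPES:
        return 'PHOTOS'
    if mimetype and mimetype.startswith('video/'):
        return 'MOVIES'
    return 'OTHERS'      # includes PNG and WebP images, and unknown types
```

Because guess_type falls back to a lowercase extension lookup, camera names like IMG_0001.JPG are classified the same as their lowercase forms.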
Despite its upgrades, version 2.0 left the following issues on the table (see also the later changes in 2.1 and 2.2):
stderr/stdout split model, but it could instead always save the report in the MERGED root folder of the results, with an appended date/time suffix. This was not implemented because the reports might become unwelcome trash after many runs, but that rationale is open to debate.
\\?\ pathname-prefix trick (like Mergeall and ziptools). But this case is rare; it can be addressed by using a shorter (higher) destination-folder path; and users may not be able to view the results in Explorer anyhow. Punt in this release, but revisit if feedback warrants (see Input in the toolbar below).
Prior to version 2.0, tagpix was a basic, tactical script that was neither robust nor customizable. And then it was used.
tagpix has been tested extensively and used successfully on extremely large photo collections, including all those of its creator, and it will likely perform well on yours too. It is provided freely because it can help you organize your photo libraries. Especially given the many ways that computers can fail, however, a word of caution is in order:
By design, this script's default operation moves and renames all photos and other files in an entire source folder tree. No automated method for undoing the changes it makes is provided, and no warranty is included with this program. Please read all usage details in this document carefully before running tagpix on your photos. It is strongly recommended to preview changes with list-only mode before applying them, and to either run tagpix on a temporary copy of your source folder tree or enable its copy-only transfer mode in file user_configs.py to avoid source-tree changes.
Lest that sound too dire, keep in mind that tagpix never changes photo content (it transfers and renames them only), and errors simply leave items in their original location in all transfer modes (a rerun can propagate them to the destination). Moreover, if you always copy/paste new images from your camera's storage to a tagpix staging folder (per the preceding notebox's recommendation), the camera's storage will automatically serve as a backup copy, regardless of this program's operation.
Still, the importance of your photos merits a complete understanding of any tool that modifies them—this one included.