tagpix — Combine Your Photos for Easy Viewing and Archiving

This is the tagpix user guide. It includes an overview, usage instructions, and version changes. Because tagpix renames and moves photos, users are encouraged to read this guide before running tagpix on valued photo collections. For this program's license, see its main script.

This is an online-only version of this document. It has the same content as the original desktop version shipped in the program's download package, but has been styled for viewing on both desktop and mobile devices. To use this instead of the original version, use your browser to save this page's HTML in the install folder.

Overview

Why tagpix?

If your digital photo collection has become scattered over many folders; uses filenames that are not unique because of their origin on multiple cameras; hosts modification dates that reflect retouches instead of events; or contains arbitrary duplicates, tagpix may be the photo-organizing tool you've been looking for. Running it on your photo folders transforms them into a simple, uniform format that's ideal for both viewing and archiving, and as private as the device on which it is stored.

What tagpix Does

tagpix moves all the photo files in an entire folder tree to a single folder, adding date taken (or modified) to the front of filenames to make them unique and sortable; discarding any truly duplicate content and adding a unique serial number to the end of any remaining duplicate filenames; isolating movies and other non-image files in folders of their own; and optionally grouping all the merged items into by-year subfolders.

The net effect is useful for organizing the contents of disparate photo collections holding pictures and movies shot on multiple cameras over many years. By running tagpix, all the items of each media type are merged on your local computer into a single flat folder, or a set of flat by-year subfolders, for fast, convenient, and private access.

In more detail, here are the main assets that tagpix brings to your photo-normalization jobs:

Resolving same-name conflicts
tagpix resolves same-name conflicts between different cameras' content by adding a date prefix to all filenames (e.g., "2017-10-14__file.jpg"). For photos, the prefix uses date taken, extracted from standard photo-file Exif metadata tags. For other types of files, and for photos with no Exif tag, date modified is used instead. Either way, the date makes photo names unique in the result's flat merged folders. When date taken is available, the expanded name also reflects the date of the scene captured, not the most recent retouch.
Grouping by type and year
tagpix groups tree content by file type, creating separate folders for photos, movies, and others. Photos from cameras are usually JPEG files, but are recognized by both MIME type (which is derived from the filename extension) and Exif tag use; this means that both JPEGs and TIFFs are treated as photos by the program. Movies are similarly classified by MIME type and segregated from photos for direct access, and each type's folder may optionally be grouped into by-year subfolders.
Resolving duplicate content
tagpix automatically detects and resolves duplicates in the tree. It first runs a full byte-by-byte comparison of files with the same date-prefixed name. If the files are the same, the redundant copy is skipped and not added to the result. If they differ, the new copy's filename is extended with a unique serial-number suffix (e.g., "date__file__N.jpg"). This means your merged folders will keep just one copy of true duplicates, but all versions of same-named and same-dated content that differs—a rare scenario across different cameras, but normal if you've retouched or resized a tagged photo and saved it with the same name in a different folder.
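The duplicate-resolution rule just described can be sketched in a few lines of Python. This is an illustration only, not tagpix's actual code: the function name resolve_target, its return convention, and the choice to start serial numbers at 2 are all assumptions made here, and the standard library's filecmp.cmp (with shallow=False) stands in for the byte-by-byte comparison.

```python
import filecmp
import os

def resolve_target(source_path, dest_dir, dated_name):
    """Return the path to move source_path to, or None to skip it.

    Hypothetical helper: skips true duplicates, and appends a serial
    number ("__N") to same-named files whose content differs.
    """
    base, ext = os.path.splitext(dated_name)
    candidate = os.path.join(dest_dir, dated_name)
    serial = 1
    while os.path.exists(candidate):
        # shallow=False forces a comparison of the files' full content
        if filecmp.cmp(source_path, candidate, shallow=False):
            return None                      # same bytes: redundant copy
        serial += 1                          # assumed numbering: 2, 3, ...
        candidate = os.path.join(dest_dir, "%s__%d%s" % (base, serial, ext))
    return candidate
```

A None result means the item is a true duplicate and can be skipped; any other result is a name that is not yet taken in the destination folder.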

In addition, tagpix strips prior runs' date prefixes so you can rerun it any number of times on prior results; comes with a list-only mode that allows you to preview its intentions without making any changes; and generates a report that describes all the updates it performs and any files it skips. Read on to learn how to use tagpix to organize your photos.
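To make the rerun behavior concrete, here is a minimal sketch of prefix stripping. It assumes the prefix always has the "YYYY-MM-DD__" shape shown in the examples above, and it drops a prior prefix only when its date equals the newly computed one (the guide notes later that dates are matched to avoid stripping user-added text). The names PREFIX, strip_prior_prefix, and tagged_name are hypothetical; the real script's logic may differ.

```python
import re

# Assumed prefix shape, based on the "2017-10-14__file.jpg" example
PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2})__(.+)$")

def strip_prior_prefix(filename, new_date):
    """Drop a leading date prefix, but only if it equals new_date."""
    match = PREFIX.match(filename)
    if match and match.group(1) == new_date:
        return match.group(2)
    return filename                  # no prefix, or a non-matching date

def tagged_name(filename, new_date):
    """Build the date-prefixed name without doubling a prior prefix."""
    return new_date + "__" + strip_prior_prefix(filename, new_date)
```

With this rule, tagging a file that a prior run already named "2017-10-14__file.jpg" yields the same name again instead of a doubled prefix.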

Usage Details

This section describes tagpix install requirements, inputs and results, usage modes, and other operational details.

Platforms and Installs

tagpix is a single-file program that runs on all major platforms, and is provided in source-code form. To run its source code, it requires an install of either Python 3.X or 2.X, plus the third-party Pillow (a.k.a. PIL) image library for the installed Python, which tagpix uses to access photo tags. Fetch and install these items if needed from the following sites, respectively (or search the web for other links):

https://www.python.org/downloads/
https://pypi.python.org/pypi/Pillow

For pointers on Pillow installs, see this page. A note for developers: the exif.py tags-processing alternative to Pillow failed for some files when tested in 2013 for tagpix version 1.0, though your results may vary.

Input Prompts

To launch, run script tagpix.py with no command-line arguments. It can be run from a console (e.g., Terminal on Unix and Command Prompt on Windows) and most Python IDEs (e.g., PyEdit and IDLE), though IDEs may not support report routing described ahead.

All run parameters are requested by the following prompts at the program's console:

  1. tagpix renames and moves photos to a merged folder; proceed?
  2. Source - pathname of folder with photos to be moved?
  3. Destination - pathname of folder to move items to?
  4. Group items into by-year subfolders?
  5. List only: show new names, but do not rename or move?
  6. Delete all prior-run outputs in "<output folder name>"?

For all prompts except #2 and #3, type "y" for yes, and type "n" or simply press Enter (return) for no.

For #2 (the source):
You can either enter an explicit folder, or press Enter to accept the default. To use an explicit folder, enter the pathname of the root folder containing all the photo subfolders you wish to combine; for example, you might give the folder just above those where you store photos from your camera cards, copies, or imports. If you prefer to use the default, it is the "SOURCE" folder in the script's own directory (technically, the current working directory); move or copy all your camera folders and images to there before running this script. Whether the source is explicit or default, all its content and subfolders will be scanned to collect items.
For #3 (the destination):
You can either enter an explicit folder, or press Enter to accept the default. To use an explicit folder, enter the pathname of the folder to which you wish tagpix to move your merged source photos; result folders will be created there automatically as needed. If you prefer to use the default, it is the "MERGED" folder in the script's own directory (technically, the current working directory); move or copy the result folders from there after running this script. Whether the destination is explicit or default, it will hold all your relocated items after the tagpix run. Per usage-modes coverage ahead, if you enter a prior run's folder at this prompt, it will be extended; if you enter a new folder, it will be generated.

To end the script immediately without making any changes, reply no to prompt #1, or enter control+C (or kill the program) at any other prompt. List-only mode (replying yes to #5) analyzes content and shows planned changes but does not perform them; use this to preview and verify the script's updates. Prompt #6 is important when rerunning tagpix; see ahead for its roles.

A Brief Primer on Pathnames

In all usage modes, the paths you input at prompts #2 and #3 can be either relative to your current location in a console (e.g., "." for the current folder), or absolute (e.g., "/Users/you/photos" on Unix, "C:\My-Photos\unmerged" on Windows). For instance, when running tagpix via command lines, you can "cd" to the folder containing your MERGED destination folder and/or source folder, and give folder paths relative to where you are working. Absolute paths are generally required when running tagpix from an IDE such as PyEdit. As usual, the tagpix.py script's path in command lines can be relative or absolute too.
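If relative paths are unfamiliar, the following sketch shows how a reply at a folder prompt might be turned into a full pathname: an empty reply selects a default folder name, and anything relative is resolved against the current working directory. The function name resolve_folder is hypothetical, and this is not necessarily how tagpix itself processes its inputs.

```python
import os

def resolve_folder(reply, default_name):
    """Map a prompt reply to an absolute folder path.

    An empty reply (just Enter) selects default_name; relative paths
    are resolved against the current working directory.
    """
    reply = reply.strip()
    chosen = reply if reply else default_name
    return os.path.abspath(chosen)
```

For example, resolve_folder("", "SOURCE") names the SOURCE folder in the current working directory, while resolve_folder(".", "SOURCE") names the current working directory itself.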

Results Report

This script's initial prompts are printed to the stderr stream, and its report is printed to stdout. Both go to the console by default, but this two-stream model allows you to save the tagpix report to a file for later inspection—especially handy for larger runs. To start tagpix and save just its report to a file, use a console command line like this to route stdout to a file (">" shell syntax will not work when running tagpix from most IDEs):

python tagpix.py > report.txt

Any special message lines in the report all begin with "***"; search for this in the saved report text after a tagpix run.
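The two-stream model and the "***" search can both be sketched briefly. The first function below writes a prompt to stderr and reads the reply from stdin, which is why redirecting stdout captures only the report; the second pulls the special message lines out of saved report text. Both function names are hypothetical, and the code is an approximation of the idea rather than an excerpt from tagpix.

```python
import sys

def ask(prompt, stdin=sys.stdin, stderr=sys.stderr):
    """Show a prompt on stderr and return the reply read from stdin."""
    stderr.write(prompt + " ")
    stderr.flush()
    return stdin.readline().rstrip("\n")

def special_lines(report_text):
    """Return the report lines that begin with the "***" marker."""
    return [line for line in report_text.splitlines()
            if line.startswith("***")]
```

After a run saved with the command line above, special_lines(open("report.txt").read()) would list every flagged line in one pass.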

For a sample of report content, see the demo logs in the example runs folder; the report text is everything that follows the last input prompt. For a comprehensive report example from a tagpix run on a very large photo collection, including duplicates, locked-file errors, prior-run dates, and more, see this file.

Results Tree

The script's results show up in the "MERGED" folder nested in the destination folder (prompt #3), split into "PHOTOS," "MOVIES," and "OTHERS" subfolders that each contain merged and uniquely named content files. If you reply yes to prompt #4, these three subfolders further group their content into year subfolders. Specifically, the results are organized into a shallow tree as follows:

Destination or ./
    MERGED/
        PHOTOS/
            flat content, or year subfolders with flat content
        MOVIES/
            flat content, or year subfolders with flat content
        OTHERS/
            flat content, or year subfolders with flat content
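The tree above maps directly onto a path-building rule, sketched here. The function name result_path is hypothetical, and naming each year subfolder after the four-digit year at the front of the date prefix is an assumption; the guide says only that items are grouped into year subfolders.

```python
import os

def result_path(destination, category, dated_name, by_year=False):
    """Build an item's path in the results tree.

    category is "PHOTOS", "MOVIES", or "OTHERS"; dated_name already
    carries its "YYYY-MM-DD__" prefix.
    """
    parts = [destination, "MERGED", category]
    if by_year:
        parts.append(dated_name[:4])     # assumed year-folder name
    parts.append(dated_name)
    return os.path.join(*parts)
```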

As described earlier, all filenames at the bottom levels of the results tree include date prefixes added to make them unique (e.g., "2017-10-14__file.jpg"). The dates added reflect either date-taken tag values (for most shot photos), or date-modified file attributes (for all others).

For photo files, date taken is always used if present, because it both ensures that names are unique (different cameras may reuse the same names), and reflects the recorded event's date (date modified may instead be a latest-retouch date after edits, but a date-taken tag is likely to survive). Although date taken may not apply to photo scans, for most photos shot on digital cameras the expanded names chronologically identify both the photos themselves and the scenes they capture.
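As a rough sketch of this date-selection rule, the code below tries the Exif date-taken tag first and falls back to the file's modification date. It is written for Python 3 and a recent Pillow release (it relies on Pillow's getexif() and get_ifd() calls), and the function names are hypothetical. Unlike tagpix, which requires Pillow, the sketch also falls back to date modified when Pillow is not installed.

```python
import os
import time
from datetime import datetime

EXIF_IFD = 0x8769            # pointer to the Exif sub-directory
DATE_TIME_ORIGINAL = 0x9003  # the "date taken" tag

def exif_date_taken(path):
    """Return 'YYYY-MM-DD' from the Exif date-taken tag, or None."""
    try:
        from PIL import Image            # third-party Pillow
        with Image.open(path) as image:
            value = image.getexif().get_ifd(EXIF_IFD).get(DATE_TIME_ORIGINAL)
        # Exif stores dates as "YYYY:MM:DD HH:MM:SS"
        taken = datetime.strptime(str(value)[:10], "%Y:%m:%d")
        return taken.strftime("%Y-%m-%d")
    except Exception:        # no Pillow, not an image, or no usable tag
        return None

def date_prefix(path):
    """Prefer date taken; otherwise use the file's date modified."""
    taken = exif_date_taken(path)
    if taken:
        return taken
    modified = time.localtime(os.path.getmtime(path))
    return time.strftime("%Y-%m-%d", modified)
```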

Items not recognized as movies or tagged photos are moved to OTHERS. After a tagpix run, you may wish to manually remove items from OTHERS that reflect camera-specific cruft. For example, some cameras create ".THM" or ".CTG" files which are irrelevant to your content in PHOTOS and MOVIES. tagpix does not omit these automatically, because it prefers to err on the side of caution (only well-known ".*" hidden files are skipped). Be sure to delete only cruft: OTHERS may contain PNGs and GIFs too.
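The routing of items to PHOTOS, MOVIES, and OTHERS can be approximated with Python's standard mimetypes module, as in the sketch below. The function name classify is hypothetical, and the sketch keys off MIME type alone; any additional check that tagpix makes on Exif tags is not modeled here.

```python
import mimetypes

PHOTO_TYPES = ("image/jpeg", "image/tiff")   # per this guide: JPEG and TIFF

def classify(filename):
    """Return the result subfolder name for a file, by MIME type."""
    mime, _encoding = mimetypes.guess_type(filename)
    if mime in PHOTO_TYPES:
        return "PHOTOS"
    if mime and mime.startswith("video/"):
        return "MOVIES"
    return "OTHERS"          # PNGs, GIFs, camera cruft, unknown types
```

Because mimetypes keys off the filename extension, a misnamed file is classified by its name, not its content.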

For a more graphical look at results trees, see the examples folder's screenshots of both flat and group-by-year modes.

Resolving Skips

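Following a run, you should check the report's final "Missed" section to see if any files were skipped due to move errors (e.g., file locks, permission failures, or destination paths that are too long on Windows).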

All items skipped are left intact in the source tree, and listed in the "Missed" section.

If the "Missed" line shows "0" skips, or if you are okay with the items skipped, delete the contents of your source folder after the run if desired; if there were no skips, it's just empty directories.

If the "Missed" line's skip count is not "0" and valid items were skipped, resolve their issues (e.g., fix locks or permissions, or use a shorter destination path on Windows) and rerun tagpix to transfer them; use your same source and destination folders, and do not delete the prior run's results (for prompts #2, #3, and #6).

Usage Modes

Depending on the replies you provide to input prompts, you can use this script to either extend an existing archive or make one anew, and can do both with the aid of another program:

  A. To extend an archive (e.g., for viewing, or full optical-disc burn), for prompt #3 give the same destination-folder path as a prior run (i.e., the path to the folder containing a prior run's MERGED result folder), and answer no to #6 prompts; new source items will be moved to the prior run's folders.
  B. To make a new archive (e.g., for an initial or incremental optical-disc burn), for prompt #3 give a new destination-folder path, perhaps with the run date in its name; source items will be moved to the new archive's folders.
  C. To add new items to both an incremental archive for burning and an existing archive for viewing, use option B above first, and then merge the new archive's contents into an existing archive with another tool (a GUI cut/paste or drag-and-drop will generally suffice).

For an example of usage mode A, see the examples here and here. For additional usage-mode examples, see the full examples folder.

Other Usage Notes

This section collects smaller usage notes and tips. Some summarize earlier coverage.

Dates, not times
Time is not included in filename prefixes, because it would make names longer, and camera-added sequence numbers will normally suffice to identify and order photos taken on the same day. Dates are more crucial, as different cameras may use the same sequence numbers.
Result path lengths
Even with date alone, the combination of folders and date prefixes created by tagpix can be 31 characters long. If the result exceeds pathname limits on your platform, try using a shorter destination path (i.e., a folder higher on your drive).
Preventing changes
tagpix makes no changes if the source folder does not exist; the user cancels the run verification or requests a list-only run (via prompts #1 or #5); or the script is killed while waiting for any input (e.g., control+C in a console, or a kill request in an IDE).
Reruns on prior results
It's safe to rerun tagpix on items and folders it created in the past, because it automatically detects and discards any date prefixes added to filenames by prior runs. It also ensures the new and prior dates match, to avoid stripping user-added text.
Duplicates
Per the overview above, it's safe to run tagpix to combine trees with duplicate item copies: they are automatically discarded (for duplicate content) or renamed (for duplicate filenames).
Rerunning after errors
It's safe to rerun the script if it exits early, or skips items due to move errors described earlier. The next run will simply rename and move all the items left in the source folder (but be careful not to delete the prior run's results when asked by prompt #6).
Moves across devices
tagpix uses Python's os.rename() to move files from source to destination, which is normally correct, fast, and atomic. File moves can be problematic, though, when run between different devices or filesystems. If all of a run's moves fail, make sure your source and destination folders reside on the same writable device (e.g., your hard drive or SSD).
Choosing folders to merge
As a rule of thumb, files that are not movies or photos with date-taken tags may be better left out of the tree that tagpix will merge. This includes both scanned photos, whose dates will all reflect scan date instead of event date, and images such as PNGs and GIFs that have no date-taken information. You can merge these too, but scans will be renamed with their scan date, and untagged images will wind up in the OTHERS folder instead of PHOTOS.
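Regarding the "Moves across devices" note above: if you are unsure whether two folders share a device, a quick check like the following can help before a run. It is a standalone sketch rather than part of tagpix, the function name same_device is hypothetical, and both folders must already exist.

```python
import os

def same_device(source_dir, dest_dir):
    """True if both existing folders report the same device ID."""
    return os.stat(source_dir).st_dev == os.stat(dest_dir).st_dev
```

When this returns False, os.rename() moves between the two folders are likely to fail, so choosing a destination on the source's own drive is the simpler fix.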

Usage Caution

tagpix has been tested extensively and used successfully on extremely large photo collections, and will likely perform well on yours too. It's provided freely because it can help you simplify your photo libraries. Especially given the many ways that computers can fail, however, a word of caution is in order.

By design, this script renames all photos and other files in an entire source folder tree, and moves them to destination folders. No automated method for undoing the changes it makes is provided, and no warranty is included with this program. Please read all usage details here carefully before running tagpix on your photos. Its list-only mode can be used to view but not apply changes, and it is recommended to run tagpix on a temporary copy of your folder tree.

Lest that sound too dire, keep in mind that errors simply leave items in their original location, and a rerun can propagate them to the destination. Still, the importance of your photos merits a complete understanding of any tool that modifies them—this one included.

New in This Version

This section tersely describes changes made in the most recent release—version 2.0, released October 17, 2017 (and republished January 12, 2018 with only minor user-guide and example changes). It is primarily meant for developers and prior-version users. No new usage-level details are introduced in this section, though it may serve as additional context.

Changes Made

Parameters
Gets all run parameters as console inputs (not code variables). Command-line arguments are not used, because they are cryptic; to provide input programmatically, redirect stdin to a file of replies. Per earlier, also sends prompts to stderr so stdout report can be saved.
List-only mode
Adds an option to list planned changes only, making no changes. Use this to inspect and verify proposed changes without applying them.
Year subfolders
Adds an option to group the resulting flat folders into by-year subfolders automatically (for photos, movies, and others).
TIFFs and mimetypes
Handles non-JPEG images by using Python's mimetypes module, so other images may be treated as photos too. Still, because Exif tags are apparently used only by JPEG and TIFF images and WAV audio, only JPEG and TIFF mime types are treated as 'photos' here (others go to the 'others' folder: as images, but not photos). For more details, try this page or a web search. Also uses mimetypes for movie detection, adding newer video types in case some platforms do not.
Source folder
Allows the source folder to be separate from this script's own folder. Moving huge photo archives to a temp folder can be expensive (one subject folder was 75G). To use the prior model, copy images to "./SOURCE" (in the current working directory (cwd), which is the script's own folder if it's run from there), and press Enter when asked for the source folder's path.
Destination folder
Allows the results folder to be separate from this script's own folder (i.e., cwd). This in turn allows the program to extend a prior run's results when desired, instead of always making a new archive folder (see Usage Modes). To use the prior model, press Enter when asked for the destination folder's path, and copy results from "./MERGED".
Movies folder
Moves all video mime-type files to a new "MOVIES" subfolder, instead of lumping them in with "OTHERS" as before (or "PHOTOS").
Additional changes
See the code for more details on additional changes.
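Returning to the Parameters note above, here is one way to supply replies programmatically by feeding them to the script's stdin. This is a sketch for Python 3.5 or later, and its function names are hypothetical. It assumes one reply line per prompt in the order #1 through #6; how many times prompt #6 is actually asked in a given run is not stated in this guide, so the delete_prompts count is a guess you may need to adjust. List-only mode is the default here so that a trial run changes nothing.

```python
import subprocess
import sys

def build_replies(source, destination, by_year=False, list_only=True,
                  delete_prompts=1):
    """Build one reply line per prompt, in prompt order #1 to #6."""
    def yes_no(flag):
        return "y" if flag else "n"
    replies = ["y", source, destination, yes_no(by_year), yes_no(list_only)]
    replies += ["n"] * delete_prompts    # never delete prior-run outputs
    return "\n".join(replies) + "\n"

def run_tagpix(replies, script="tagpix.py"):
    """Run the script with replies on stdin; return its stdout report."""
    result = subprocess.run([sys.executable, script], input=replies,
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout
```

The same reply text can be saved to a file and redirected to the script's stdin from a console command line instead.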

Open Issues

Report location
This release allows its output to be routed to a file with its stderr/stdout split model, but it could instead always save the report in the "MERGED" root folder of the results, with an appended date/time suffix. This was not implemented because the reports might become unwelcome trash after many runs, but that rationale is open to debate.
Windows path lengths
tagpix could support too-long pathnames on Windows with the "\\?\" pathname-prefix trick (like Mergeall and ziptools). But this case is rare, it can be addressed by using a shorter (higher) destination-folder path, and users may not be able to view the results in Explorer anyhow. Punt in this release, but revisit if feedback warrants.
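For reference, the prefix trick mentioned above amounts to rewriting an absolute Windows pathname before passing it to file calls. The sketch below shows the idea only; tagpix 2.0 does not do this, the function name to_long_path is hypothetical, and the input is assumed to be an absolute path.

```python
import ntpath

def to_long_path(path):
    r"""Add the \\?\ prefix to an absolute Windows pathname."""
    if path.startswith("\\\\?\\"):
        return path                          # already prefixed
    path = ntpath.normpath(path)             # the prefix requires backslashes
    if path.startswith("\\\\"):              # UNC: \\server\share\...
        return "\\\\?\\UNC\\" + path[2:]
    return "\\\\?\\" + path
```

The normpath() call matters because prefixed paths are not normalized by Windows: forward slashes and "." or ".." parts must be resolved first.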



©M.Lutz