File: shrinkpix/shrinkpix.py

#!/usr/bin/env python3
"""
=============================================================================
shrinkpix.py - shrink images for faster (and politer) online viewing.

Version: 1.3, September 30, 2020 (see VERSIONS below)
Author:  © M. Lutz (learning-python.com)
License: provided freely but with no warranties of any kind
Website: https://learning-python.com/shrinkpix/
Bundled: restore-unshrunk-images.py and collect-unshrunk-images.py 
Related: thumbspage gallery builder, at learning-python.com/thumbspage.html

-----------------------------------------------------------------------------
CAUTION
-----------------------------------------------------------------------------

  This script changes images in place, but saves originals first, and 
  includes a utility that backs out all changes made.  More fundamentally, 
  this script works well for the website it targets (see RESULTS ahead), 
  but has not yet been widely used, and remains experimental.  Run it on 
  a temporary copy of your website first, and always inspect the quality 
  of its results before publishing them.  Image-size reduction is a complex
  task; this script demos just a few techniques in this domain, and other 
  tools may do better.  On the other hand, this program is free, fun to 
  code, runs locally on your machine, and may serve as inspiration or base.

-----------------------------------------------------------------------------
INSTALL
-----------------------------------------------------------------------------

  Download shrinkpix from here and unzip:
     https://learning-python.com/shrinkpix/shrinkpix-full-package.zip

  shrinkpix requires Python 3.X to run its code, and the Pillow (a.k.a.
  PIL) third-party library for image processing, and is expected to run
  on any system that supports both tools, including Mac OS, Windows, 
  Linux, and Android.  Install if needed from here:

     https://www.python.org/downloads/
     https://pypi.python.org/pypi/Pillow

  Shrunken-image quality is the same for Pillow versions 4.2 through 7.0.
  shrinkpix also uses the piexif third-party library in a minor role, but
  includes its code directly; see the piexif/ folder here for details.  
 
-----------------------------------------------------------------------------
USAGE
-----------------------------------------------------------------------------

  This program is configured by uppercase settings at "# Configurations" 
  in code ahead, and may be run with a command line of this form:

     $ python3 shrinkpix.py (<folderpath> | <filepath>)? -listonly? -toplevel?

  As usual, add a "> saveoutput.txt" to retain the script's output.  In 
  more detail, shrinkpix can be run with 0 to 3 command-line arguments
  in any order, as follows:

  - If an argument other than -listonly and -toplevel is given, shrinkpix
    expects it to be the pathname of either a folder tree or an image file.
    For a folder tree, it walks the entire tree and shrinks all its images.
    For an image file, it shrinks just that one specific image alone.

  - If -listonly is included in the arguments, shrinkpix shows large images
    to be shrunk, but does not update them.  This previews changes in both
    folder-tree and image-file modes, and works similarly in utility scripts.

  - If -toplevel is included in the arguments and a folder-tree name is 
    provided, shrinkpix shrinks just the images in the top level of the 
    tree, skipping any subfolders.  This option walks a folder instead 
    of a full tree, and is ignored when shrinking an individual image.
    Use the same option to limit walks in utility scripts (see CAVEATS).

  - In all cases, arguments omitted default to settings in the code below.
    The folder-or-file argument defaults to SHRINKEE, and -listonly and 
    -toplevel default to LISTONLY and TOPLEVEL, respectively.  This may 
    be useful for focused goals and IDEs lacking command-line support.
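
  The argument rules above can be sketched as follows.  This is a minimal
  illustration, not this script's own parsing code: parse_args() and the
  DEFAULTS dict are hypothetical names standing in for the SHRINKEE,
  LISTONLY, and TOPLEVEL settings, and the sketch assumes that the last
  path given wins if more than one is passed.

```python
# Hypothetical defaults, standing in for SHRINKEE, LISTONLY, and TOPLEVEL.
DEFAULTS = dict(target='/YOUR-STUFF/Websites/UNION', listonly=False, toplevel=False)

def parse_args(argv, defaults=DEFAULTS):
    # Build settings from 0 to 3 arguments given in any order.
    settings = dict(defaults)                 # omitted arguments keep defaults
    for arg in argv:
        if arg == '-listonly':
            settings['listonly'] = True       # preview only, change nothing
        elif arg == '-toplevel':
            settings['toplevel'] = True       # skip subfolders in tree walks
        else:
            settings['target'] = arg          # folder tree or image file path
    return settings

# Example: the flag order does not matter.
print(parse_args(['-toplevel', 'site/images', '-listonly']))
```

  Scanning the arguments in a loop, instead of reading them by position,
  is what lets the two flags and the path appear in any order.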

-----------------------------------------------------------------------------
PURPOSE
-----------------------------------------------------------------------------

  Run this script to reduce the filesize of images you post online.
  This program may be used to shrink either a specific image or all images 
  in a folder tree.  In its broadest role, it finds all image files larger 
  than a given size in a website's tree, and attempts to shrink them down 
  to the configurable target size using a series of transformations.  For 
  more focused goals, the same reduction may be applied to individual images.

  When posted online, the smaller images this program creates can avoid (or
  at least minimize) delays for slow servers and/or clients, and are politer 
  to visitors with limited or metered bandwidth.  Per evidence so far (and
  the next section), views of images shrunk by this script are plainly faster.  
  As a bonus, shrinkpix runs locally on your computer from a console, IDE, 
  or website build script, and neither uses nor requires network connectivity.

  This script's original impetus was the full-size image-viewer pages in 
  galleries created by thumbspage (learning-python.com/thumbspage.html).
  If you use thumbspage, after this script is run, be sure to remake your 
  site's image galleries for the new image filesize and dimensions info 
  displayed in popups, and upload the results to your host.  For more on 
  this utility's roles, see thumbspage's UserGuide.html#imagesizeandspeed.

-----------------------------------------------------------------------------
RESULTS
-----------------------------------------------------------------------------

  Today, the smaller image files created by this program are good enough
  to be adopted globally at learning-python.com.  It's not uncommon for the
  program to reduce a 6M image to 200-300K: a 20-30X decrease in size, with 
  proportionate decreases in download time and bandwidth usage, and no readily
  discernable decrease in visual quality.  This also lessens the size of 
  image-laden download packages massively; the largest fell from 250M to 110M. 

  Though results will vary per network and image, the speedup for views of 
  shrunken images seems palpable and noticeable, and justifies the minor and
  rare quality hits.  This program does come with some caveats listed ahead, 
  and may improve in time; for now, it's already a win for its target use case.

-----------------------------------------------------------------------------
BACKUPS
-----------------------------------------------------------------------------

  This script always saves original (unshrunk) images in subfolders named
  "_shrinkpix-originals/" and located in the same folder as the original 
  image itself, unless backups are disabled in configurations.  These 
  backup subfolders are automatically created when needed.  When a new 
  image is added and shrunk in an already-shrunk folder, its original 
  is simply added to the existing backup subfolder (if still present).

  Because images below the size cutoff are skipped, images are normally
  shrunk just once.  If an image is ever reshrunk (by missing the cutoff,
  or being added anew), same-named versions in the backups subfolder 
  saved by later runs will have a counter "__N" added just before their 
  original filenames' extensions to make them unique (e.g., "xxxx__2.jpg"). 

  To restore unshrunk originals, move images from all backups subfolders
  in a tree to their parent folder, ignoring any files with "__N" names.
  The bundled utility "restore-unshrunk-images.py" does this automatically,
  and fully restores the folder tree to its preshrink state in the process.
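
  The manual restore described above might look like the sketch below.
  It is not the code of the bundled restore-unshrunk-images.py, which
  may differ: restore_originals() and DUPNAME are hypothetical names,
  and the sketch assumes the default backup folder name.  It also leaves
  the backup folders and any "__N" files in place, as a simplification.

```python
import os, re

BACKUPSDIR = '_shrinkpix-originals'      # assumed: the default backup folder name
DUPNAME = re.compile('__[0-9]+$')        # "__N" just before the extension

def restore_originals(root, backupsdir=BACKUPSDIR):
    # Move saved originals back over shrunken images; return restored paths.
    restored = []
    for folder, subdirs, files in os.walk(root):
        if os.path.basename(folder) != backupsdir:
            continue                      # only backup subfolders hold originals
        parent = os.path.dirname(folder)
        for name in files:
            stem = os.path.splitext(name)[0]
            if DUPNAME.search(stem):
                continue                  # ignore "__N" copies saved by later runs
            target = os.path.join(parent, name)
            os.replace(os.path.join(folder, name), target)   # overwrite shrunken
            restored.append(target)
    return restored
```

  Try a sketch like this on a temporary copy of a tree first, because
  it overwrites the shrunken images in place.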

  For convenience, the bundled "collect-unshrunk-images.py" instead moves 
  all of a tree's backup folders to its root "_shrinkpix-all-originals/" 
  (e.g., to retain but exclude them from site uploads), and moves later 
  additions to existing backup folders there.  You can also collect backup
  folders to an alternate folder outside the source tree, and can later 
  restore originals from a collection tree by a restore + rsync combination. 

  For usage details on restores and collections, see the utility scripts.
  In typical usage, you might use this system's scripts to just shrink; 
  shrink and restore; shrink and collect; or shrink, collect, and restore.
  Note that backup folder names can be changed in settings, but must agree
  between script runs for restores and collections to function properly.
 
-----------------------------------------------------------------------------
MECHANICS
-----------------------------------------------------------------------------

  This script primarily supports and shrinks JPEG, PNG, GIF, and BMP
  images, because these are broadly supported by web browsers, but some 
  other image types work in its code too (see issupported() ahead).  Its 
  shrinking algorithm may seem arbitrary, but yields acceptable results:

  0) Images already below the target size are skipped and unchanged.

  1) For all other images, transformations are tried in turn until the image 
     file is smaller than the target size, or no more transformations remain.

  2) For JPEGs, the script tries to "optimize" the image and then decrease
     its "quality" per the Pillow options of these names; then resizes the
     original image by lowering its dimensions repeatedly using a progression
     of increasing scale factors, using "optimize" and "quality" for each.

  3) For PNGs, the script attempts Pillow's "optimize" setting; then its 
     "quantize()" method; then resizes the original but quantized image as 
     in #2, applying "optimize" on each result tried.

  4) For GIFs and any others, the script attempts Pillow's "optimize" 
     setting; then resizes the original image as in #2, with "optimize".
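
  The control flow shared by steps 1 through 4 can be sketched in plain
  Python, as below.  This is only an illustration of "try each
  transformation in turn": shrink_in_turn() is a hypothetical name, and
  the lambdas are stand-ins that return dummy bytes where the real
  script would save an image to a buffer with Pillow.

```python
def shrink_in_turn(steps, maxsize):
    # steps: list of (label, function returning candidate file bytes).
    # Returns (label, bytes, made_cutoff) for the first candidate that
    # fits in maxsize, else for the last candidate tried.
    assert len(steps) > 0
    for label, make_bytes in steps:
        candidate = make_bytes()          # run this transformation only when reached
        if len(candidate) <= maxsize:
            return label, candidate, True
    return label, candidate, False        # as small as can be, but above cutoff

# Stand-in transformations: each returns a dummy payload of a given size.
steps = [('optimize',         lambda: bytes(900)),
         ('optimize+quality', lambda: bytes(600)),
         ('resize at 0.80',   lambda: bytes(400))]

label, data, ok = shrink_in_turn(steps, maxsize=500)
print(label, len(data), ok)               # prints: resize at 0.80 400 True
```

  Passing functions instead of finished results means that later, more
  costly transformations run only if the earlier ones miss the cutoff.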

  JPEGs also retain their original Exif tags when resaved to files, with
  dimension tags updated to reflect the new size of images shrunk by the
  last-resort resizeTillSmall().  This update is not crucial (this script
  by design produces images of reduced quality that are meant only for online
  display), but it may make downscaled images work better in other tools.

  PNGs currently do not propagate Exif tags, though Exif has recently been 
  standardized for PNGs; some PNGs record Exif data in ad-hoc ways; and
  Pillow's PngImageFile.getexif() may provide options on this front (TBD).  

  Unsupported image types over the size cutoff are reported at the end of 
  tree-walk runs, but unchanged.  In single-image runs, some additional
  image types may work in the code as is; this was largely soft-pedaled,
  because most browsers don't support exotic types even if Pillow does.

  The tree walker always skips Unix-hidden and developer-private folders,
  whose names start with a "." and "_", respectively; any other folder 
  names you configure to skip in code ahead; and any backup folders found  
  along the way (else this would shrink saved originals).  If the walker's
  scope is still too broad, run it on individual subdirs in your tree.
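
  The skip rules above can be sketched with os.walk(), pruning folders
  in place so that they are never visited.  This is an illustration, not
  this script's walker: skip_folder() and walk_folders() are hypothetical
  names, the settings are copied from the defaults in this file, and the
  sketch assumes that SUBDIRFORCE overrides every skip except the one
  for backup folders.

```python
import os

BACKUPSDIR  = '_shrinkpix-originals'      # copied from the defaults in this file
SUBDIRSKIPS = ('_thumbspage', 'thumbs')
SUBDIRFORCE = ()

def skip_folder(name):
    # True if the walker should not descend into a subfolder of this name.
    if name == BACKUPSDIR:
        return True                       # never shrink saved originals
    if name in SUBDIRFORCE:
        return False                      # assumed: overrides the other skips
    return name.startswith('.') or name.startswith('_') or name in SUBDIRSKIPS

def walk_folders(root, toplevel=False):
    # Yield (folder, files) for each folder that should be processed.
    for folder, subdirs, files in os.walk(root):
        if toplevel:
            subdirs.clear()               # top level only: visit no subfolders
        else:
            subdirs[:] = [sub for sub in subdirs if not skip_folder(sub)]
        yield folder, files
```

  Assigning to subdirs[:] changes the list that os.walk() itself uses,
  which is how a top-down walk is told to prune a subtree.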

  References for the Pillow library's tools employed by this program, 
  all of which reside at https://pillow.readthedocs.io/en/stable:
     Resizing:       /reference/Image.html#PIL.Image.Image.resize
     Save options:   /handbook/image-file-formats.html
     Quantize:       /reference/Image.html#PIL.Image.Image.quantize
     Color modes:    /handbook/concepts.html#concept-modes
     Resize filters: /handbook/concepts.html#filters

-----------------------------------------------------------------------------
CAVEATS
-----------------------------------------------------------------------------

  Though this script's results are good enough to adopt at its target 
  site (see RESULTS above), it comes with some tradeoffs you should be
  aware of up front.  In addition to the caution at the top of this file:

  Speed
     This script can run a _long_ time for trees with many large images.
     Per the examples/ folder, an older 2015 MacBook Pro took 3 minutes 
     to shrink 71 images, and about 8 minutes to shrink 125.  This likely
     makes shrinkpix impractical to run in some websites' build scripts.
     The upside is that it runs locally; its slowest run is usually an 
     initial one-time event for existing sites; and using it on individual
     images later is quick.  Shrunken images themselves generally load much
     faster in all contexts, though the improvement depends on many factors.

       UPDATE: large images naturally take longer to shrink.  In newer
       testing with the latest Python, Pillow, and 2020 devices, shrinking 
       a 108MP JPEG image in a 20M file required a full 13 seconds on a 
       2019 MacBook Pro with an 8-Core Intel Core i9 and 16G.  Even then, 
       the image had to be reshrunk (at 2-3 seconds) to hit size < 512K.
       There seems ample room for optimization, both in Pillow and here. 

  Failures
     Though rare, this script may fail to shrink some images to the target
     size; search for "*SAVED ABOVE MAXSIZE*" in run output to see failures.
     Convert these manually from originals to avoid re-shrinkpixing them
     if desired (reshrinking may leave duplicates in backup folders).

       UPDATE: especially for large JPEGs, it may suffice to simply _rerun_ 
       shrinkpix after images are saved above the target size; the next run's 
       resizing will likely complete the shrinkage, and it may be difficult
       to tell the difference in quality except when blown up to actual size.
       PNG mileage may vary, though disabling quantize() ahead may help.

     Individual images may also fail to shrink at all, most commonly due
     to miscoded Exif tags that trigger failures in the piexif library;
     look for "***Unable to shrink: image skipped" in the output for details. 
     These don't cause a run to terminate, but the failing image is unshrunk.

       UPDATE: a few miscoded tags are now fixed to minimize piexif failures: 
       see "[1.3]" changes here.  Unfixed tags may still fail and cause images
       to be skipped as described; edit the miscoded tag with another tool.
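
     To find both kinds of failures in saved run output, a scan along
     these lines may help.  It is a sketch only: find_failures() and
     FAILURE_MARKERS are hypothetical names, and the code relies on
     nothing about the output's layout beyond the two message strings
     quoted above.

```python
FAILURE_MARKERS = ('*SAVED ABOVE MAXSIZE*', '***Unable to shrink')

def find_failures(output_text):
    # Return (line number, line) pairs for output lines that name a failure.
    hits = []
    for number, line in enumerate(output_text.splitlines(), start=1):
        if any(marker in line for marker in FAILURE_MARKERS):
            hits.append((number, line.strip()))
    return hits

# Example use on output saved with "> saveoutput.txt":
#     with open('saveoutput.txt') as saved:
#         for number, line in find_failures(saved.read()):
#             print(number, line)
```

     The line numbers returned can be used to look up the nearby image
     name in the saved output.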

  Quality
     In use so far, both JPEGs and PNGs normally shrink with little or no
     visual degradation, but a few PNGs may appear subpar.  Though atypical,
     shadows may render as discernable bands instead of being continuous;
     portions of images may lose colors occasionally; and subtle details 
     like light text may render blurry.  This appears to be the work of 
     the default Pillow quantize(), though resizing is worse.  Always be 
     sure to inspect results before posting, restore and manually shrink 
     those you don't like, and watch for possible shrinkpix updates.  See
     examples/_subpar-pngs for examples and more details on PNG quality loss.

       UPDATE: in later practice, JPEGs have done well with this script, 
       but some PNGs have suffered visible quality loss.  The shrinkpix 
       team is happy to take suggestions for improvements by email, at 
       lutz@learning-python.com.  As is, this script has more potential
       than developer attention, though its code is good enough for many 
       use cases, and can be used as a framework for exploring ideas.

  Thumbnails
     The shrunken images produced by this script might yield thumbnails
     of lesser quality than the originals.  The thumbspage system solved
     this by converting some images to "RGBA" color mode temporarily when 
     making their thumbnails; this produces results as good as for unshrunk
     originals.  You may also avoid quality loss by making thumbnails from 
     originals before downscaling, or from auto-saved originals afterwards.
     See thumbspage's UserGuide.html#thumbnails17 for more background.

  Merges
     As described at BACKUPS above, before shrinking an image in place, 
     this program saves the original in a "_shrinkpix-originals" backups
     subfolder located in the same folder as the original.  This avoids 
     files with different extensions (e.g., ".backup"), but can lead to 
     collisions if multiple folders with backups are merged into a union. 
     If this applies to your use case, rename backup folders manually to 
     make them unique before merging, or use the collector script to move 
     them out of the source trees to a folder that's unique or unmerged.

  Utilities
     If you use -toplevel for a folder here, you probably want to use it 
     for the folder in the restore and collector scripts too, to prevent
     these scripts from processing backups in separately managed subfolders; 
     see these scripts' -toplevel docs for more background.  Also see the 
     collector script for a caveat regarding tree-structure changes; in 
     short, you may not be able to restore from a separate collection tree 
     if its backup paths have been invalidated by source-tree changes.

       UPDATE: you may also need to change the DROPDUPS preset in the 
       restore script to avoid restoring unwanted duplicates.  See that
       script for more details; its preset in version [1.3] differs from
       prior releases for good (but subtle) reasons, though you may 
       need to manually remove "__N" duplicates after a restore.
 
  Settings
     The run options of this script have evolved with use, but some may be
     easier to vary if moved to a separate file or command-line argument
     (though complicated command lines are explicitly discouraged here).

  Design
     A "backup-to" option here and a "backup-from" in the restore script
     could make the collector script and its rsync restores unnecessary.
     Neither was implemented because they seem to muddy the waters with too
     much complexity; restores from collections are error prone if the source
     tree has changed; and this is not (yet?) justified by use cases or users.
     This also wouldn't make sense when shrinking individual files directly
     here: what would the root-relative path be in the "backup-to" folder? 
     In the end, pulled collections may be best used for archiving originals.

  File counts
     The total files count displayed at the end of a run may be one higher
     than you expect on Mac OS, due to a ".DS_Store" file now fully hidden 
     by Mac OS's Finder.  Alas, shrinkpix can't fix OSs that cheat.

-----------------------------------------------------------------------------
VERSIONS
-----------------------------------------------------------------------------

  - 1.3, Sep-30-2020:
         1) Silence a bogus DOS warning issued by the Pillow library for
            large (>89MP) images, and avoid a Pillow exception for images
            twice this size.  108MP is now common on high-end smartphones.
         2) Also repair some miscoded Exif tags to avoid piexif failures
            which were formerly caught but caused miscoded-image skips.

         Both were adopted from thumbspage, whose user guide has details:
         [DOS]  https://learning-python.com/thumbspage/UserGuide.html#_20E
         [Exif] https://learning-python.com/thumbspage/UserGuide.html#_17G

         The restore script also now ships with DROPDUPS=False, and the
         collector script can now be run with no arguments for usage info.
         All 3 scripts no longer print a traceback on ctrl+c at run confirm.

  - 1.2, Jun-12-2020: 
         Catch piexif lib's bad-tag exception, and skip subject image.
         piexif raises excs for unexpected types for rare tags from some 
         cameras.  This change also catches and recovers from other excs.
         More details: examples/example5-1.2-piexif-exc-skips-demo.txt.

  - 1.1, Mar-08-2020: 
         Add -toplevel option to command lines in all three scripts.
         Use piexif to update some Exif dimension tags propagated to JPEGs 
         whose dimensions have been resized.  New screenshots + examples.

  - 1.0, Mar-02-2020: 
         Initial release, separate from initial dev home in thumbspage.

=============================================================================
"""


import os, sys, shutil, math, mimetypes, io
from PIL import Image
import piexif


#----------------------------------------------------------------------------
# [1.3] Sep-2020: silence a harmless but excessive Pillow-library warning
# now issued stupidly for all large images.  This includes perfectly valid
# 108MP images shot on a Note20 Ultra smartphone, among other >89M image
# devices.  This also impacted thumbspage, tagpix, and PyPhoto, requiring
# program rereleases - a typical open-source-agenda result, and an example 
# of the pitfalls of "batteries included" development.  Fix, please.
# More details: thumbspage or tagpix UserGuide.html#pillowdoswarning.
# Update: Pillow makes this an error _exception_ at limit*2: disable too.
#----------------------------------------------------------------------------

Image.MAX_IMAGE_PIXELS = None    # stop both warning, and error at limit*2

# in case the preceding fails
if hasattr(Image, 'DecompressionBombWarning'):    # not until 2014+ Pillows
    import warnings
    warnings.simplefilter('ignore', Image.DecompressionBombWarning)



#============================================================================
# Configuration
#============================================================================


# Main settings (command-line arguments override some: see above)

TRACE    = True                              # True=print transformation used too
LISTONLY = False                             # True=list but do not change large images
MAXSIZE  = 500 * 1024                        # if size > this, shrink to this or less (500k)
SHRINKEE = '/YOUR-STUFF/Websites/UNION'      # path to folder tree or image to shrink 

# Advanced settings - don't change unless you're sure of the impact

JPEGQUALITY = 85                             # JPEG quality downscale value (95 max)
TRYRESIZES  = [.80, .60, .40]                # All-types resize scale-down %s (1.0 max)
PNGQUANTS   = dict()                         # extra args for quantize() (none for now)

BACKUPSDIR    = '_shrinkpix-originals'       # where unshrunk originals are autosaved
ALLBACKUPSDIR = '_shrinkpix-all-originals'   # where collector script moves backup dirs
NOBACKUPS     = False                        # True=don't save originals (iff trusted!)

SUBDIRSKIPS  = ('_thumbspage', 'thumbs')     # walker: subfolders to skip (thumbnails,..)
SUBDIRFORCE  = ()                            # walker: override all skips to shrink these 
TOPLEVEL     = False                         # walker: skip all tree subs - folder walk



#============================================================================
# Utilities
#============================================================================


def trace(message):
    if TRACE: print(' '*3, '[%s]' % message)


def isimage(filename):
    mimetype = mimetypes.guess_type(filename)[0]                    # (type?, encoding?)
    return mimetype is not None and mimetype.split('/')[0] == 'image'   # e.g., 'image/jpeg'


# Conveniences
mimeType  = lambda filename: mimetypes.guess_type(filename)[0]
imageType = lambda filename: mimetypes.guess_type(filename)[0].split('/')[1]


def issupported(filename):
    """
    -------------------------------------------------------------------------
    Define the image types shrunk in tree-walker mode.  Change it to be more
    or less inclusive (e.g., skip just icons?).  Image types that don't pass
    the test here can be converted in single-image mode.  Example: TIFFs work 
    as singles, but may degrade in quality too much to support in tree walks.
    -------------------------------------------------------------------------
    """
    return imageType(filename) in ['jpeg', 'png', 'gif', 'bmp']    # not extension



#============================================================================
# Mechanics
#============================================================================


def backupOriginal(folder, file, path):
    """
    -------------------------------------------------------------------------
    Save original, unshrunk image to a subfolder before changing it.
    The saves subfolder is in the same folder as the original image.
    Use "__N" filenames to avoid overwriting backups from prior runs.
    NOBACKUPS saves cleanup time, if you _really_ trust this script.
    -------------------------------------------------------------------------
    """
    if NOBACKUPS: return

    savedir = os.path.join(folder, BACKUPSDIR)
    if not os.path.exists(savedir):
        os.mkdir(savedir)
    savepath = os.path.join(savedir, file)

    if os.path.exists(savepath):
        # don't clobber prior-run copies
        savehead, saveext = os.path.splitext(savepath)   # xxxxx, .yyy
        copynum = 2
        while True:
            savepath = savehead + '__' + str(copynum) + saveext
            if not os.path.exists(savepath):
                break
            copynum += 1

    shutil.copy2(path, savepath)   # backup: data + metadata



def getImageFormat(imgname):
    """
    -------------------------------------------------------------------------
    Get an image format from filename for buffer saves, where types vary
    and the filename cannot be used for the save.  This also works in older
    Pillows, where setting image.name won't suffice.  Copied from PyPhoto: 
    learning-python.com/pygadgets-products/unzipped/_PyPhoto/PIL/viewer_thumbs.py.
    Note that Pillow's types may or may not match Python's mimetypes maps.
    -------------------------------------------------------------------------
    """
    try:
        from PIL.Image import registered_extensions    # where available
        EXTENSION = registered_extensions()            # ensure plugins init() run
    except ImportError:
        from PIL.Image import EXTENSION                # else assume init() was run

    ext = os.path.splitext(imgname)[1].lower()         # lookup ext in Pillow table
    format = EXTENSION[ext]                            # fairly brittle, this...
    return format



def saveToFile(imgbytes, path):
    """
    -------------------------------------------------------------------------
    Save image's bytes to a file: no need for another Pillow save() here.
    -------------------------------------------------------------------------
    """
    with open(path, 'wb') as savefile:      # closes the file even if write fails
        savefile.write(imgbytes)



def saveToBuffer(image, path, **saveoptions):
    """
    -------------------------------------------------------------------------
    Get content of image without an actual file, by saving to bytes buffer.
    saveoptions vary between JPEG and others (JPEG uses quality and exif).
    Others would silently ignore JPEG's quality, but might not ignore exif.
    -------------------------------------------------------------------------
    """
    filename = os.path.basename(path)
    imgformat = getImageFormat(filename)    # for Pillows that require it
    image.name = filename                   # for Pillows that recognize it

    buffer = io.BytesIO()
    image.save(buffer, imgformat, **saveoptions)
    filebytes = buffer.getvalue()
    return filebytes



def fixFailingExifTags(parsedexif):
    """
    -------------------------------------------------------------------------
    [1.3] Sep-2020: piexif bug temp workaround: correct uncommon Exif tags 
    (e.g., 41729, which is piexif.ExifIFD.SceneType) whose miscoding on some
    devices triggers an exception in piexif's dump() - but not its load(). 
    Other failing tags will still fail in dump(), and skip shrinkage in full.

    Until piexif addresses this more broadly, this munges SceneType from
    int to byte, and converts another from tuple to bytes*4 if needed (and 
    defensively: stuff happens).  The piexif exceptions formerly caused 
    miscoded images to be skipped by shrinkpix processing.  Bug reports:
        tag 41729 => https://github.com/hMatoba/Piexif/issues/95
        tag 37121 => https://github.com/hMatoba/Piexif/issues/83

    This code was adapted from thumbspage (which borrowed from here too).
    -------------------------------------------------------------------------
    """
    parsedExif = parsedexif['Exif']                 # parsed tags: dict of dicts

    # fix SceneType? 1 => b'\x01'
    if 41729 in parsedExif:
        tagval = parsedExif[41729]                  # miscoded on some Galaxy
        if type(tagval) is int:                     # munge from int to byte                
            if 0 <= tagval <= 255:
                parsedExif[41729] = bytes([tagval])
                trace('--Note: bad SceneType Exif tag type was corrected--')
            else:
                del parsedExif[41729]
                trace('--Note: bad SceneType Exif tag type was dropped--')

    # fix ComponentsConfiguration? (1, 2, 3, 0) => b'\x01\x02\x03\x00'
    if 37121 in parsedExif:    
        tagval = parsedExif[37121]
        if type(tagval) is tuple:
            if (len(tagval) == 4 and 
                all(type(x) is int for x in tagval) and
                all(0 <= x <= 255  for x in tagval)):
                parsedExif[37121] = bytes(tagval)
                trace('--Note: bad ComponentsConfiguration Exif tag was corrected--')
            else:
                del parsedExif[37121]
                trace('--Note: bad ComponentsConfiguration Exif tag was dropped--')

    # other tag failures cause shrinking to be skipped for an image



def fixJpegExifSize(image, saveoptions):
    """
    -------------------------------------------------------------------------
    Change the dimension tags in a JPEG's Exif data to reflect the image's
    new, reduced size.  This uses the piexif third-party lib because Pillow 
    has almost no Exif support, apart from fetching and saving raw Exif bytes;
    piexif parses and composes the data, and is easy-to-ship pure-Python code. 
    Largely copied from thumbspage, where changing Orientation is more crucial.
    Could run the parse just once, but image processing is much more costly.

    [1.2] piexif.dump() can fail for some oddball tags; skip excs in caller;
    we don't try to fix tags in shrinkpix - for a program which does, plus 
    additional background detail on the piexif design flaw, see thumbspage, 
    at: learning-python.com/thumbspage/UserGuide.html#piexifworkaround
 
    [1.3] this now _does_ try to fix a few tags, with code borrowed from
    thumbspage: see the new fixFailingExifTags(); other tags may still fail.
    -------------------------------------------------------------------------
    """
    if "exif" in saveoptions:                       # set for JPEGs only 
        origexifs1 = saveoptions['exif']            # Exif bytes from Pillow
        if origexifs1:                              # piexif bombs if b''
            parseexifs = piexif.load(origexifs1)    # parse into a dict

            # [1.3] piexif work-around: fix tags known to trigger exceptions
            fixFailingExifTags(parseexifs)

            # update dimension tags
            parseexifs["Exif"][piexif.ExifIFD.PixelXDimension] = image.width
            parseexifs["Exif"][piexif.ExifIFD.PixelYDimension] = image.height
             
            origexifs2 = piexif.dump(parseexifs)    # back to a bytes
            saveoptions['exif'] = origexifs2        # for Pillow save

    return saveoptions



def resizeTillSmall(image, path, **saveoptions):
    """
    -------------------------------------------------------------------------
    Apply resize factors till small enough, always restarting with original.
    The resizing here shrinks image, but preserves the original aspect ratio.
    resize() returns a new copy (unlike thumbnail()); assume image unchanged.
    This is a last-ditch attempt to shrink, iff image-specific options fail.
    For PNGs, image is not the original: it has already been quantized().
    -------------------------------------------------------------------------
    """
    assert len(TRYRESIZES) > 0
    oldwide, oldhigh = image.width, image.height   # or image.size

    for resizepct in TRYRESIZES:
        # floor to whole pixels, but never below 1: Pillow rejects 0-pixel sizes
        newwide = max(1, math.floor(oldwide * resizepct))
        newhigh = max(1, math.floor(oldhigh * resizepct))
        resize = image.resize((newwide, newhigh), resample=Image.LANCZOS)

        saveoptions  = fixJpegExifSize(resize, saveoptions)
        newfilebytes = saveToBuffer(resize, path, **saveoptions)
        if len(newfilebytes) <= MAXSIZE:
            break

    # last resize is small enough, or as small as can be
    trace('resized at %0.2f' % resizepct)
    saveToFile(newfilebytes, path)

    # did we make the cutoff?
    if len(newfilebytes) > MAXSIZE: 
        trace('*SAVED ABOVE MAXSIZE*')



def shrinkJPEG(image, path):
    """
    -------------------------------------------------------------------------
    Optimize, then set quality, then resize; propagate Exif tags if present.
    JPEGs generally shrink very well, with little or no resizing required.
    Converting PNGs to JPEGs doesn't work well for things like screenshots.
    TBD: could use quality here first, but size/visual diffs negligible.
    Exif tags are propagated; their dimensions are also updated if needed.
    -------------------------------------------------------------------------
    """
    oldexifs = image.info.get('exif', b'')    # raw bytes, if any, via Pillow

    saveoptions = dict(optimize=True, exif=oldexifs)
    newfilebytes = saveToBuffer(image, path, **saveoptions)
    if len(newfilebytes) <= MAXSIZE:
        trace('optimize')
        saveToFile(newfilebytes, path)
    else:
        saveoptions.update(quality=JPEGQUALITY)
        newfilebytes = saveToBuffer(image, path, **saveoptions)
        if len(newfilebytes) <= MAXSIZE:
            trace('optimize+quality')
            saveToFile(newfilebytes, path)
        else:
            trace('optimize+quality+resize')
            resizeTillSmall(image, path, **saveoptions)



def shrinkPNG(image, path):
    """
    -------------------------------------------------------------------------
    Optimize (no quality), then quantize+optimize ("P" format, 256 colors), 
    then resize the quantized image.  quantize()'s "method" option didn't 
    help: 0-1 aren't for RGBA, 3 requires a plugin, and 2's diff was trivial.
    There are more advanced quantize() options, but they're beyond scope here.
    Avoid resizing if at all possible: it can blur text/detail badly for PNGs.
    Some PNGs may have Exifs by kludge or a newer standard: ignore them here.
    -------------------------------------------------------------------------
    """
    newfilebytes = saveToBuffer(image, path, optimize=True)
    if len(newfilebytes) <= MAXSIZE:
        trace('optimize')
        saveToFile(newfilebytes, path)
    else:
        quantimage = image.quantize(**PNGQUANTS)
        newfilebytes = saveToBuffer(quantimage, path, optimize=True)
        if len(newfilebytes) <= MAXSIZE:
            trace('optimize+quantize')
            saveToFile(newfilebytes, path)
        else:
            trace('optimize+quantize+resize')
            resizeTillSmall(quantimage, path, optimize=True)



def shrinkOther(image, path):
    """
    -------------------------------------------------------------------------
    Optimize (no quality), then resize.  Used for GIFs, and issupported() or 
    directly shrunk others (e.g., TIFFs work here, but browsers may not show).
    TBD: could specialize here: TIFFs have quality arg, quantize() others?
    -------------------------------------------------------------------------
    """
    newfilebytes = saveToBuffer(image, path, optimize=True)
    if len(newfilebytes) <= MAXSIZE:
        trace('optimize')
        saveToFile(newfilebytes, path)
    else:
        trace('optimize+resize')
        resizeTillSmall(image, path, optimize=True)



def resizeOne(path, folder, file, indent=' '*4):
    """
    -------------------------------------------------------------------------
    Resize a single image; used by both tree-walker and single-image modes,
    though the walker's usage may depend on the coding of issupported().
    [1.2] catch all exceptions here - including piexif's JPEG tag failures.
    -------------------------------------------------------------------------
    """
    print(indent + 'Old size: %d bytes' % os.path.getsize(path))

    try:
        # save a backup copy first
        backupOriginal(folder, file, path)

        # downscale and update in place
        image = Image.open(path)

        mimetype = mimeType(file)       # compute the type just once

        if mimetype == 'image/jpeg':
            shrinkJPEG(image, path)

        elif mimetype == 'image/png':
            shrinkPNG(image, path)

        elif mimetype == 'image/gif':
            shrinkOther(image, path)

        else:
            assert isimage(file)
            shrinkOther(image, path)

        image.close()   # just in case
        print(indent + 'New size: %d bytes' % os.path.getsize(path))

    except Exception:
        # any error (but not ctrl+c or exits): report, continue
        print(indent + '***Unable to shrink: image skipped')
        print(indent + ('Exception: %s' % sys.exc_info()[1]).replace('\n', '\n'+indent))



#============================================================================
# Tree-walker mode
#============================================================================


def treeWalkerMode(treeroot):
    """
    -------------------------------------------------------------------------
    Walk tree, resize all its images; tree path in configs or command line.
    The focus here is on JPEG, PNG, GIF, and BMP because that's what browsers
    support, but other types work too if you modify issupported() above.
    -------------------------------------------------------------------------
    """
    treeroot = os.path.abspath(treeroot)
    missedimgs = []
    numfile = numimage = numimagelarge = 0
    for (folder, subs, files) in os.walk(treeroot, topdown=True):

        # prune the walk tree below here for subtrees to skip
        prunes = []
        for sub in subs:
            if sub in SUBDIRFORCE:        # always visit these despite skip tests
                continue
            if (sub == BACKUPSDIR or      # skip prior-run backup dirs: originals
                sub == ALLBACKUPSDIR or   # skip collected backup dirs: originals
                sub in SUBDIRSKIPS or     # skip config subdirs: thumbnails, etc.
                sub[0] in ['.', '_'] or   # skip Unix hidden and developer private
                TOPLEVEL):                # skip all subs: just top-level images
                prunes.append(sub)
        for prune in prunes:
            subs.remove(prune)            # skip this later - and all subs below it

        # shrink this folder's images
        for file in files:
            numfile += 1
            if isimage(file):
                path = os.path.join(folder, file)
                size = os.path.getsize(path)

                if size > MAXSIZE and not issupported(file):
                    # report large images missed
                    missedimgs.append((path, size))

                elif issupported(file):
                    # downscale these types if large
                    numimage += 1
                    if size > MAXSIZE:
                        numimagelarge += 1
                        if LISTONLY:
                            print(path, '[%d bytes, not changed]' % size)
                        else:
                            print(path)
                            resizeOne(path, folder, file)   # resize this one

    # walker wrap-up 
    if missedimgs:
        print('\nMissed %d large images:' % len(missedimgs))
        for missed in missedimgs: print('...', missed)
        print()

    print('Done: %d files, %d images, %d large images' %  
                                (numfile, numimage, numimagelarge))



#============================================================================
# Single-image mode
#============================================================================


def singleImageMode(filename):
    """
    -------------------------------------------------------------------------
    Check and resize just one specific image, path/name in command line.
    issupported() is not tested here on purpose: TIFFs, etc., work too.
    -------------------------------------------------------------------------
    """
    path = os.path.abspath(filename)
    size = os.path.getsize(path)
    if not isimage(path):
        print('Not an image file.')
    elif size <= MAXSIZE:
        print('Already below size cutoff.')
    elif LISTONLY:
        print('Current size: %d bytes.' % size)
    else:
        folder, file = os.path.split(path)
        resizeOne(path, folder, file)
        print('Done.')



#============================================================================
# Top level
#============================================================================


def askyesno(prompt, opts=' (y|n) '):
    try:
        return input(prompt + opts)
    except KeyboardInterrupt:            # [1.3] ctrl+c: don't print traceback
        print()
        return 'no'


if __name__ == '__main__':

    if '-listonly' in sys.argv:          # for folder or file
        LISTONLY = True                  # else use setting's value
        sys.argv.remove('-listonly')

    if '-toplevel' in sys.argv:          # ignored if shrinkee is a file 
        TOPLEVEL = True                  # else use setting's value
        sys.argv.remove('-toplevel')

    if len(sys.argv) > 1:
        shrinkee = sys.argv.pop(1)       # any position, last remaining 
    else:
        shrinkee = SHRINKEE              # from arg, or else setting

    command = 'shrinkpix.py (<folderpath> | <filepath>)? -listonly? -toplevel?'
    confirm = 'This script shrinks images in place, after saving originals; continue?'

    if len(sys.argv) > 1:
        print('Usage:', command)         # any more left?: bad args

    elif (not LISTONLY) and askyesno(confirm).lower() not in ['y', 'yes']:
        print('Run cancelled.')

    elif os.path.isdir(shrinkee):        # arg|setting=dir: walk this tree or dir
        treeWalkerMode(shrinkee)

    elif os.path.isfile(shrinkee):       # arg|setting=file: resize this file only
        singleImageMode(shrinkee)

    else:
        print('Usage:', command)         # what in the world are you shrinking?


