File: genhtml/

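===========================================================================================
genhtml - static HTML inserts
===========================================================================================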

Version: 2.7, March 2022 (see VERSIONS ahead)
License: provided freely but with no warranties of any kind
Attribution: © M. Lutz (, 2015-2022


genhtml builds HTML files by performing key-based text insertions on all the 
files in a folder (dir) at site-build time.

Given an HTML templates dir and an HTML inserts dir, it generates final HTML files 
by applying text replacements to the templates, where replacement keys and text 
correspond to insert file names and contents.  The text which replaces keys in the
generated files can be any textual component: HTML, CSS, JavaScript, and so on.

For static insert content (common text that can change over time, but need not
be generated on each new page request), the net result provides a basic webpage 
macro utility, and is an alternative to:

  - Mass file edits on every common-item change - which can be painfully tedious
  - Client-side includes via embedded JavaScript - which not all visitors may run
  - Server-side includes via PHP or Apache - which require a server to view pages
  - cpp - which may not be present, plus makefiles - which must be manually coded

This script adds a local admin step, as it must be run after every HTML content
change (e.g., from a publishing script), but HTML itself has no direct include mechanism.


Given the SOURCEDIR, TARGETDIR, and INSERTDIR settings in this script's code:

  % python3  (or run via icon or shortcut click)

      1) Copy all changed non-HTML files in SOURCEDIR to TARGETDIR (if any)

      2) Regenerate all HTML files whose SOURCEDIR template or any INSERTDIR 
         insert file it uses has changed since the HTML's latest generation, 
         replacing all insert-file references with insert-file text, and storing
         the expanded HTML files in TARGETDIR

  % python3 [filename]+

      Same, but apply to just one or more SOURCEDIR files, listed without dir name

As shipped, SOURCEDIR, INSERTDIR, and TARGETDIR all are in the script's current
working directory (a.k.a. '.', from where the script is run); change as desired.

To use this script to maintain a site's files:

  1) Change HTML template files in SOURCEDIR, and/or insert files in INSERTDIR.
  2) Run this script to regenerate all changed HTML files in TARGETDIR as needed.
  3) Upload newly generated files (or all) from the TARGETDIR to the web server.
  Do not change HTML files in TARGETDIR: they may be overwritten by generations!

There are two ways to structure a site's files:

  A) Keep both HTML templates and all other site files in SOURCEDIR.  In this mode,
     changed non-HTML files are copied to TARGETDIR when HTML files are regenerated.
  B) Use SOURCEDIR for HTML template files only, keep other web site files in TARGETDIR.
     This mode avoids copying other non-HTML files to TARGETDIR on HTML regenerations. 
  Either way, TARGETDIR is always the complete web site, for viewing and uploads.


Text replacements are flexible, and derived from insert folder content:

  - Replacement keys are '$XXX$' for all INSERTDIR/XXX.txt filenames.
  - Replacement values are the contents of the INSERTDIR/XXX.txt files.
  - Algorithm:
    For each changed '*.htm' and '*.html' (caseless) HTML template in SOURCEDIR:
        For each XXX in INSERTDIR/XXX.txt:
            Replace any and all '$XXX$' in HTML template 
            with the content of file INSERTDIR/XXX.txt
        Save the result in TARGETDIR
    Other changed non-HTML files (if any) are copied to TARGETDIR verbatim.
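The replacement algorithm above can be sketched in a few lines. This is a minimal illustration, not the script's own code. The helper names `load_inserts_from` and `expand_template` are hypothetical, and the sketch assumes all inserts are UTF-8 `*.txt` files in one folder:

```python
import os

def load_inserts_from(insertdir, encoding='utf8'):
    """Map '$XXX$' keys to the text of insertdir/XXX.txt files (hypothetical helper)."""
    inserts = {}
    for name in os.listdir(insertdir):
        if name.startswith('.') or not name.endswith('.txt'):
            continue                                    # skip hidden cruft and non-inserts
        with open(os.path.join(insertdir, name), encoding=encoding) as f:
            inserts['$' + name[:-4] + '$'] = f.read()   # 'STYLE.txt' -> '$STYLE$'
    return inserts

def expand_template(text, inserts):
    """Replace all occurrences of every key; str.replace is a no-op when absent."""
    for key, value in inserts.items():
        text = text.replace(key, value)
    return text
```

Keys that have no insert file, such as a stray `$OTHER$`, are simply left in the output text unchanged.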

To automate changing DATES in both HTML files and insert files, the script also
replaces special non-file '$_DATE*$' keys: e.g., '$_DATELONG$' => 'November 06, 2015'.
See the script's code ahead for the full set of date keys available.
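The following sketch builds a subset of the date keys from an explicit `time.struct_time`, so its output is repeatable. The helper name `date_inserts` is hypothetical. The formats mirror those in the script's `addDateInserts`, including the portable `'%%s'` trick for dropping a day's leading zero. Month and day names come from the current `LC_TIME` locale, which is the C locale (English) unless a program changes it:

```python
import time

def date_inserts(t=None):
    """Build some '$_DATE*$' replacement keys from a struct_time (default: now)."""
    if t is None:
        t = time.localtime()
    # '%%' survives strftime as a literal '%', leaving a '%s' slot for the day number
    shorter = time.strftime('%b-%%s-%Y', t) % time.strftime('%d', t).lstrip('0')
    return {
        '$_DATELONG$':    time.strftime('%B %d, %Y', t),
        '$_DATESHORT$':   time.strftime('%b-%d-%Y', t),
        '$_DATENUM$':     time.strftime('%m/%d/%Y', t),
        '$_DATETIME$':    time.asctime(t),
        '$_DATEYEAR$':    time.strftime('%Y', t),
        '$_DATESHORTER$': shorter,
    }
```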

Example key=>file replacements (with possible nested inserts, described ahead):

  Coded in <HEAD>
      $STYLE$  => INSERTDIR/STYLE.txt   (a <style> or <link rel...>)
      $SCRIPT$ => INSERTDIR/SCRIPT.txt  (analytics or other JS code block)
      $ICON$   => INSERTDIR/ICON.txt    (site-specific icon link spec)

  Coded in <BODY>
      $FOOTER$ => INSERTDIR/FOOTER.txt  (a standard sites links toolbar)
      $HEADER$ => INSERTDIR/HEADER.txt  (a standard header block)
      $TOTOC$  => INSERTDIR/TOTOC.txt   (a standard go-to-index button line)

See also "__docs__/template-pattern.html" for a skeleton use-case example
file, and the "Html-templates" test folder for additional template examples.


To allow insert files to be built up from other insert files (in an intentionally
limited fashion), the script also replaces any '$XXX$' keys in the loaded text of
insert files, before regenerating any HTML files.  For an example use case, see
FOOTER-COMMON.txt and its clients in the Html-inserts folder; it is inserted into
other footer insert files which have varying footer parts.  For dependency checking,
any newer modtimes of nested inserts are also propagated to their inserter (see ahead).

Limitation: by design, nesting is only 1-level deep - an HTML template may insert an
insert file which inserts other insert files, but no more (this is not recursive).
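One-level nesting can be sketched as follows. The helper `expand_nested` is hypothetical. It swaps in a single-pass `re.sub` over all keys in place of the script's chained `str.replace` calls, so text pulled in from a nested insert is not rescanned for further keys. With chained replaces, whether a deeper key happens to be expanded can depend on the order in which keys are visited:

```python
import re

def expand_nested(inserts):
    """Return a new dict in which keys found in each insert's text are replaced
    by the *original* text of those inserts: exactly one nesting level."""
    original = dict(inserts)                    # snapshot: results don't depend on order
    if not original:
        return {}
    # longest keys first, so overlapping alternatives prefer the fuller key
    pattern = re.compile('|'.join(re.escape(k) for k in
                                  sorted(original, key=len, reverse=True)))
    # re.sub scans each text once; substituted text is never rescanned
    return {key: pattern.sub(lambda m: original[m.group(0)], text)
            for key, text in original.items()}
```

For example, if FOOTER inserts COMMON and COMMON inserts DEEP, FOOTER's expansion still contains a literal `$DEEP$`.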


This script acts like a makefile: changing files suffices to trigger regeneration
on the next run.  In more detail, an HTML file is automatically generated if its
expansion does not yet exist, and regenerated if:

  (a) Its HTML template file has been changed since the HTML's last generation; or
  (b) Its HTML template file inserts any file that has been changed since the
      HTML's last generation; or
  (c) Its HTML template file inserts any file that inserts any other file which
      has been changed since the HTML's last generation (nested inserts).

In other words, the script generates expanded HTML files for all HTML templates
that have no expansion yet; are newer than their expanded versions; or use any
insert file that is newer than their expanded versions.  Conversely, an HTML
file is not regenerated if neither the HTML template nor any used insert is
newer than the expansion target.
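The regeneration rules can be written as one pure function over modtimes, with no file access. The function `needs_regen` and its parameters are hypothetical. The sketch assumes insert modtimes have already been propagated from nested inserts, so rule (c) reduces to rule (b). It also uses the same 2-second allowance as the script's `sourceNewer` and `insertsNewer`:

```python
def needs_regen(template_mtime, template_text, target_mtime, insert_mtimes,
                force=False, allowance=2):
    """Decide whether a template's expansion must be (re)generated.

    target_mtime is None when no expansion exists yet; insert_mtimes maps
    '$XXX$' keys to (already propagated) insert modtimes in epoch seconds.
    """
    if force or target_mtime is None:
        return True                                     # forced, or never generated
    limit = target_mtime + allowance                    # slack for 2-second FAT32 times
    if template_mtime > limit:
        return True                                     # (a) template itself changed
    return any(key in template_text and mtime > limit   # (b)/(c) a *used* insert changed
               for key, mtime in insert_mtimes.items())
```

An insert that changed but is not referenced by the template does not trigger regeneration. This is the 2.0 "smarter dependency checking" behavior.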

Additional dependency notes:

  To force a regeneration of all HTML files, open and save any insert file used 
  by every template (if any), or set the FORCEREGEN variable below.  When True,
  this switch effectively ignores dependency tests and generates all.

  By design, there is no dependency checking for non-file '$_DATE*$' inserts 
  (else each date-key client would be regenerated every day!).  Open and save 
  date-client files to force their regen with updated dates when appropriate.

  When nested inserts are used, dependencies are transitive: the modtime of any 
  insert file is considered to be the greater of the modtime of the insert file 
  itself, or the modtimes of any nested insert files which the insert file uses.  
  Modtimes are propagated to nested inserters before testing template dependencies.
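The transitive-modtime rule can be sketched as a function returning a new mapping. The name `propagate_modtimes` is hypothetical. It uses plain dicts in place of the script's list of `[key, modtime]` pairs. It covers only one nesting level, matching the nesting limit, and takes only file-based inserts, since date keys have no modtimes:

```python
def propagate_modtimes(texts, modtimes):
    """Return {key: latest modtime}, where each insert's time is the latest of
    its own and those of any insert keys appearing in its unexpanded text."""
    return {key: max([modtimes[key]] +
                     [modtimes[other] for other in modtimes if other in texts[key]])
            for key in modtimes}
```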


genhtml now handles ".*" macOS/Unix cruft files lurking in inserts and templates 
folders.  This includes both ".DS_Store" Finder files, plus any "._*.{txt, html}" 
AppleDouble resource-fork files on non-macOS filesystem drives.  These files are
skipped in inserts so they don't cause errors, and are simply treated like other
non-HTML files in templates and copied over blindly, because more may crop up after
this script runs and before the site is uploaded (.DS_Store happens!).
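The skip rule for the inserts folder amounts to a simple name filter, sketched below with hypothetical helper names. On the templates side, the script instead copies such files like any other non-HTML item:

```python
import os

def is_cruft(filename):
    """True for hidden '.*' files such as '.DS_Store' or AppleDouble '._x.txt'."""
    return os.path.basename(filename).startswith('.')

def usable_insert_names(names):
    """Filter a folder listing down to real '*.txt' insert files."""
    return [name for name in names if not is_cruft(name) and name.endswith('.txt')]
```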

To handle cruft files post site generation, filter them out with your zip or upload
tools; ziptools (, for example, skips cruft with
its "-skipcruft" option, and macOS Finder's Compress isolates proprietary items in folders.


When loading page templates, this script tries Unicode encodings on a changeable
list, and uses the first that succeeds.  This is flexible, but may require 
configuration for your site's files.  As an example, this script can generally 
get away with treating CP-1252 files as Latin-1, because bytes whose decoding
interpretations differ between the two are passed through unchanged from load
to save (what Latin-1 reads and writes as 0x93 is still '“' to CP-1252):

  >>> c = '“'
  >>> c.encode('latin1')
  UnicodeEncodeError: 'latin-1' codec can't encode character '\u201c'...
  >>> n = open('temp', 'w', encoding='cp1252').write(c)
  >>> open('temp', 'r', encoding='cp1252').read()
  '“'
  >>> L = open('temp', 'r', encoding='latin1').read()      # genhtml load
  >>> L
  '\x93'
  >>> n = open('temp', 'w', encoding='latin1').write(L)    # genhtml save
  >>> open('temp', 'r', encoding='cp1252').read()          # quote retained
  '“'
  >>> open('temp', 'rb').read()
  b'\x93'

More generally, because Latin-1's encoded bytes are also code-point values, 
'latin1' often works as well as 'cp1252' and other 8-bit encodings in this
program's TemplateEncodings Unicode encodings list, as long as you don't need
to match replacement keys containing text outside Latin-1's character set:

  >>> '“'.encode('cp1252').decode('latin1').encode('latin1').decode('cp1252')
  '“'
  >>> '“'.encode('cp1252').decode('latin1') == '“'    # cp1252's meaning lost
  False

As guidelines, though, use CP-1252 if you know your files are this type, and 
convert files to a common site-wide file encoding like UTF-8 to avoid encoding 
mismatch issues altogether.  For more tips on choosing encodings, see the 
following - an online article with an expanded version of this note in its 
"Footnote," and which uses a similar Unicode-choices scheme:

Both mention the third-party "chardet" as an alternative for encoding guesses.
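The "first encoding that succeeds" scheme can be sketched on raw bytes as follows. The helper `decode_first` is hypothetical. The script itself reopens the file in text mode once per encoding, but the ordering logic is the same, and the default tuple copies the script's TemplateEncodings. Because Latin-1 maps every byte value to a character, it never fails to decode. In this ordering, any encodings listed after it (here 'utf16') are never reached:

```python
def decode_first(data, encodings=('ascii', 'utf8', 'latin1', 'utf16')):
    """Return (text, encoding) for the first encoding that decodes data, else (None, None)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue                          # try the next, broader encoding
    return None, None
```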


Subtlety: as coded, genhtml assumes that the content of insert files is 
compatible with the Unicode encoding used to load and later save template 
files which use the inserts.  This can be an issue if the template file's 
encoding is too narrow for a used insert's text.  

For example, if a template file is loaded as ASCII and uses a UTF-8 insert 
file containing non-ASCII text, the expanded result's save will fail when 
trying to save as ASCII again.  To avoid this problem, either make sure 
your inserts are Unicode-compatible with template files that use them (e.g., 
all ASCII, Latin-1, or UTF-8), or change the "TemplateEncodings" list below 
to accommodate all possible cases.

As an example of the latter approach, if this list is set to begin with UTF-8
(not ASCII), it will handle simple ASCII templates and inserts (as a subset 
encoding), but will also allow for non-ASCII inserts in ASCII templates and 
save the results as UTF-8.  This works because ASCII templates will be loaded
as UTF-8, though you may wish to update such files' charset declarations for 
the broader UTF-8 encoding also used to save the expanded result (this is why 
saves are still allowed to fail if the load encoding does not work).

[2.7] UPDATE MAR-2022: saves are now retried as UTF-8 if saving with the template's
load encoding fails.  This allows inserts to differ, but be sure to update the page's
charset declaration too.  The issue cropped up for a non-ASCII insert file and an ASCII
template, which may be common in practice.  To see saves retried, look for output
messages like this:

=> index.html: 
	** ascii failed - retrying save as UTF-8: GENERATED, using UTF-8

To disable this and fall back to the prior exception behavior, set RETRYSAVES to False ahead.
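The retry behavior can be sketched as a standalone function. The name `save_text` and its `retry_utf8` flag are hypothetical stand-ins for the script's inline code and its RETRYSAVES setting. Unlike the script, this sketch returns the encoding actually used, so a caller can tell when a page's charset declaration may need updating:

```python
def save_text(path, text, encoding, retry_utf8=True):
    """Write text using encoding; on UnicodeEncodeError, optionally retry as UTF-8.

    Returns the name of the encoding the saved file actually uses.
    """
    try:
        with open(path, 'w', encoding=encoding) as file:   # the template's load encoding
            file.write(text)
        return encoding
    except UnicodeEncodeError:
        if not retry_utf8:
            raise                                         # like RETRYSAVES = False
        with open(path, 'w', encoding='utf8') as file:    # reopening truncates the file
            file.write(text)
        return 'utf8'
```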


Other notes:

  - Skips replacement targets not present in HTML text (replace() is a no-op).
  - Skips replacement targets having no INSERTDIR file (no .replace() is run).
  - Tries multiple Unicode encodings for HTML text: expand the set as needed.
  - Assumes insert files are all in same Unicode encoding: change as needed.
  - Changed external CSS files must be uploaded, but do not require or trigger
    HTML regeneration here (unlike changed CSS <link>s or inline CSS code inserts).
  - See "Programming Python, 4th Edition" for automated FTP site upload scripts.
  - File modtimes are simply floats, giving seconds since the "epoch": 

        >>> import os, time
        >>> t1 = os.path.getmtime('.')
        >>> t1
        >>> time.ctime(t1)
        'Tue Nov  3 09:48:39 2015'
        >>> t1 += 1
        >>> time.ctime(t1)
        'Tue Nov  3 09:48:40 2015'


Possible extensions:

  - Nested inserts could be allowed to be arbitrarily deep, rather than limiting
    them to just one level.  This entails substantial change and extra complexity,
    though, which has not yet been justified by the proprietor's use cases.

  - Subdirectories are not directly supported, though they can be maintained as
    separately generated and uploaded working folders (e.g., via Bash scripts).

  - Automatic "<!-- -->" comment wrappers could be emitted, but they may be 
    invalid for shorter text inserted into other lines (versus text blocks).

  - It might be useful to parameterize inserts in some fashion.  For instance,
    between '@' delimiters, allow a script name and arguments defining a
    command line whose stdout gives the insert text.  This is an order of
    magnitude more complex, though, and is not warranted by any use case so far.

  - Per the Dec2018 note above: should save failures try a broader type like 
    UTF-8; and should the preset TemplateEncodings start with UTF-8 instead of 
    ASCII?  Neither is done automatically today, because a load/save encoding 
    difference may require additional user actions (e.g., charset changes).
    UPDATE: see the Mar2022 update in the encoding notes above; saves are now retried as UTF-8.


VERSIONS:

  2.7, Mar-07-22: retry page saves with UTF-8 if template's encoding fails.
  2.6, Sep-26-20: docs only - reformatted/rewrote this docstr for readability.
  2.6, Dec-07-18: docs only - add note about mixing template/insert encodings.
  2.5, Sep-01-18: docs only - add note that Latin-1 handles cp1252 files.
  2.5, Feb-28-18: add new date format, use just *.txt inserts (skip dirs, etc.).
  2.4, Jun-11-17: skip/handle ".*" macOS cruft files in inserts and templates dirs. 
  2.3, Jun-03-16: add built-in date format that strips leading "0" from day number.
  2.2, Dec-27-15: change summary text to "generated", to match trace displays.
  2.1, Dec-26-15: minor patch to propagate modtimes of nested insert files to their
       nesters correctly when > 1 file is nested (compare to loop max, not nester).
  2.0, Nov-26-15: smarter dependency checking, refactor code.
       Don't regenerate an HTML file for a changed insert file, unless the HTML's
       template actually USES the insert file, or an insert file that inserts it.
       Also refactor as functions: at ~250 lines, top-level script code becomes
       too scattered to read (and at ~1K lines, class structure is nearly required).
  1.X, Initially released sometime in 2015 (presumably).



import os, sys, shutil, time
trace = lambda *args: None     # set to print to see more output

# user settings (all three dirs are in '.' cwd as shipped)

INSERTDIR = 'Html-inserts'     # insert text, filename gives key: XXX.txt.
SOURCEDIR = 'Html-templates'   # load html templates (and others?) from here
TARGETDIR = 'Complete'         # save expanded html files (and others?) to here

CLEANTARGET = False            # True = empty all files in TARGETDIR first
FORCEREGEN  = False            # True = regenerate all HTML files, ignoring dependencies
RETRYSAVES  = True             # True = try UTF-8 if template encoding fails on saves [2.7]

# customize Unicode encodings iff needed - see Sep2018 and Dec2018 notes above

TemplateEncodings = ('ascii', 'utf8', 'latin1', 'utf16')   # try each, in turn
InsertsEncoding   = 'utf8'                                 # use for all inserts

def loadInserts():
    """
    Load insert files/keys and modtimes.
    (Jun2017) Skip any hidden ".*" macOS cruft files lurking in inserts dir,
    to avoid errors on "._*.txt" AppleDouble or any other Unix hidden files.
    """
    inserts, insmodtimes = {}, []
    for insertfilename in os.listdir(INSERTDIR):
        # Added Jun2017
        if insertfilename.startswith('.'):                   # skip macOS (and Unix) cruft
            continue

        # Added Feb2018                                      # skip non-text files, dirs
        if not insertfilename.endswith('.txt'):
            continue                                         # use just *.txt inserts

        insertkey = '$' + insertfilename[:-4] + '$'          # key='$XXX$' from 'XXX.txt'
        try:
            path = os.path.join(INSERTDIR, insertfilename)   # load insert text for key
            file = open(path, encoding=InsertsEncoding)      # platform default or custom
            text = file.read()
            file.close()                                     # close for non-CPython
            inserts[insertkey] = text
            insertmodtime = os.path.getmtime(path)           # modtime for changes test
            insmodtimes.append([insertkey, insertmodtime])   # add file-based inserts only
        except Exception:
            inserts[insertkey] = ''                          # empty if file error
    return inserts, insmodtimes

def addDateInserts(inserts):
    """
    Add special non-file replacement keys (evolve me).
    Not file-based, so never added to insmodtimes list.
    """
    inserts['$_DATELONG$']  = time.strftime('%B %d, %Y')     # 'November 06, 2015'
    inserts['$_DATESHORT$'] = time.strftime('%b-%d-%Y')      # 'Nov-06-2015'
    inserts['$_DATENUM$']   = time.strftime('%m/%d/%Y')      # '11/06/2015'
    inserts['$_DATETIME$']  = time.asctime()                 # 'Fri Nov  6 10:44:58 2015'
    inserts['$_DATEYEAR$']  = time.strftime('%Y')            # '2015'

    # Added Jun2016
    shortenday  = time.strftime('%b-%%s-%Y')
    shortenday %= time.strftime('%d').lstrip('0')            # drop day leading 0
    inserts['$_DATESHORTER$'] = shortenday                   # 'Jun-3-2016'

    # Added Feb2018                                          # drop day leading 0 too
    try:
        shortenedday = time.strftime('%B %-d, %Y')           # 'November 6, 2015'
        inserts['$_DATELONG2$'] = shortenedday               # alt technique: -d
    except ValueError:                                       # '%-d' is not portable
        inserts['$_DATELONG2$'] = inserts['$_DATELONG$']     # if not supported

def propagateModtimes(insmodtimes):
    """
    For nested inserts, the modtime of any insert file is considered
    to be the greater (later) of that of the file itself and that of
    any file it inserts.  Propagate newer modtimes from nested inserts
    to their clients before expanding nested inserts or HTML templates.
    """
    # note: reads the global 'inserts' dict (still unexpanded text at this point)
    for pair1 in insmodtimes:                # pairs are mutable lists
        [key1, modtime1] = pair1
        latest = modtime1
        for [key2, modtime2] in insmodtimes:
            if (key2 in inserts[key1]) and (modtime2 > latest):   # dec15: not modtime1
                latest = modtime2
        pair1[1] = latest

def expandInserts(inserts):
    """
    Globally replace any keys in loaded insert-file text.
    Expands any nested inserts before inserts applied to html.
    """
    for key1 in inserts:                                     # for all insert texts
        text = inserts[key1]
        for key2 in inserts:                                 # for all insert keys
            text = text.replace(key2, inserts[key2])         # no-op if no match
        inserts[key1] = text                                 # inserts changed in-place

def sourceNewer(pathfrom, pathto, allowance=2):
    """
    Was pathfrom changed since pathto was generated?
    2-seconds granularity needed for FAT32: see Mergeall.
    This and its follower assume both file paths exist.
    """
    fromtime = os.path.getmtime(pathfrom)
    totime   = os.path.getmtime(pathto)
    return fromtime > (totime + allowance)

def insertsNewer(textfrom, pathto, insmodtimes, allowance=2):
    """
    Was any used insert file changed since pathto was generated?
    2-seconds granularity needed for FAT32: see Mergeall.
    This could use any() and generators... but should it?
    """
    # for all insert files, check if newer and used
    totime = os.path.getmtime(pathto)
    for (inskey, instime) in insmodtimes:
        if instime > (totime + allowance) and inskey in textfrom:
            return True
    return False

def loadTemplate(pathfrom):
    """
    Try to load an HTML template file, using various Unicode types,
    from simpler to more complex.  Return tuple of text + encoding.
    """
    for encoding in TemplateEncodings:
        try:
            with open(pathfrom, mode='r', encoding=encoding) as file:
                text = file.read()                # decoding errors are raised here
            return (text, encoding)               # success: return now
        except Exception:
            trace(encoding, 'invalid, ', sys.exc_info()[0])
    return (None, None)                           # no encoding worked

def generateHtmls(filestoprocess, inserts, insmodtimes):
    """
    Generate expanded HTML files for all HTML templates that have no
    expansion yet, are newer than their expanded versions, or use any
    insert files which are newer than the templates' expanded versions.
    (Jun2017) Copy any ".*" macOS/Unix cruft files like non-HTML items,
    to avoid errors for "._*.html" (filter out when zip or upload).
    """
    global numcnv, numcpy, numskip, numfail
    for filename in filestoprocess:
        print('=>', filename, end=': ')
        pathfrom = os.path.join(SOURCEDIR, filename)
        pathto   = os.path.join(TARGETDIR, filename)
        if not os.path.isfile(pathfrom):
            # skip any subdirs, etc.
            print('non-file, skipped')
            numskip += 1

        elif filename.startswith('.') or not filename.lower().endswith(('.htm', '.html')):
            # non-html file: don't attempt regen, copy to target if changed;
            # used when entire site in templates dir, not just templates;
            # do macOS ".*" cruft here - skip in ziptools/uploadall (Jun2017);
            if os.path.exists(pathto) and not sourceNewer(pathfrom, pathto):
                # source file unchanged, don't copy over
                print('unchanged, skipped')
                numskip += 1
            else:
                # copy in binary mode unchanged
                with open(pathfrom, mode='rb') as file:
                    rawbytes = file.read()
                with open(pathto, mode='wb') as file:
                    file.write(rawbytes)
                shutil.copystat(pathfrom, pathto)   # copy modtime over too
                print('COPIED unchanged')
                numcpy += 1

        else:
            # html file: regen to target if html or used inserts changed;
            # whether templates dir is entire site, or template files only;
            # macOS "._*.html" cruft files won't reach here (loads won't fail);
            (text, encoding) = loadTemplate(pathfrom)
            if text is None:
                print('FAILED: no TemplateEncodings entry worked')
                numfail += 1
            elif (os.path.exists(pathto)                           # target generated
                  and not (
                      sourceNewer(pathfrom, pathto) or             # source not changed
                      insertsNewer(text, pathto, insmodtimes) or   # no used insert changed
                      FORCEREGEN                                   # not forcing full regen
                  )):
                # neither html template nor any used insert newer than target
                print('unchanged, skipped')
                numskip += 1
            else:
                # globally replace keys in text and copy over;
                # no copystat(): insertsNewer() needs new modtime
                for key in inserts:                                # for all filename keys
                    text = text.replace(key, inserts[key])         # no-op if no match

                # 2.7: try utf8 for save if template encoding fails: inserts may differ
                resaved = False
                try:
                    with open(pathto, mode='w', encoding=encoding) as file:   # template's
                        file.write(text)
                except UnicodeEncodeError:
                    if not RETRYSAVES:
                        raise                                      # pre-2.7 behavior
                    resaved = True
                    print('\n\t** %s failed - retrying save as UTF-8' % encoding, end=': ')
                    with open(pathto, mode='w', encoding='utf8') as file:
                        file.write(text)

                print('GENERATED, using', encoding if not resaved else 'UTF-8')
                numcnv += 1

if __name__ == '__main__':
    """
    Top-level code run on script invocation.
    """
    numcnv = numcpy = numskip = numfail = 0    # globals all

    # empty target dir?
    if CLEANTARGET:
        for filename in os.listdir(TARGETDIR):
            os.remove(os.path.join(TARGETDIR, filename))
        print('--Target dir cleaned')

    # load/add inserts text
    inserts, insmodtimes = loadInserts()
    addDateInserts(inserts)
    print('Will replace all:',
          *(key for key in sorted(inserts)), sep='\n\t', end='\n\n')

    # copy newer modtimes of nested inserts to clients
    propagateModtimes(insmodtimes)

    # expand nested inserts replacements first
    expandInserts(inserts)

    # check run mode
    if len(sys.argv) == 1:
        filestoprocess = os.listdir(SOURCEDIR)    # all files in source dir
    else:
        filestoprocess = sys.argv[1:]             # or just filename(s) in args

    # expand and copy templates, copy others
    generateHtmls(filestoprocess, inserts, insmodtimes)

    # wrap up
    summary = '\nDone: %d generated, %d copied, %d skipped, %d failed.'
    print(summary % (numcnv, numcpy, numskip, numfail))
    if sys.platform.startswith('win'):
        input('Press enter to close.')   # retain shell if clicked
