File: thumbspage/build/insert-analytics.py

#!/usr/bin/env python3
"""
===========================================================================
Add analytics code to online resources before they are uploaded [2.1].
Part of the thumbspage program (learning-python.com/thumbspage.html).

This script was recoded to support two usage modes:

1) Add analytics to all thumbspage resources (original)
2) Add analytics to any single file (new)

The first mode is invoked with zero command-line arguments, as before. 
The second is invoked with a pathname, and used by external-demo builds.
The second mode is not thumbspage specific, but reuses its common code.
The second mode can also be used by importing its function from here.
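
A hedged sketch of that import: the hyphen in this script's filename
rules out a plain import statement, so the script can be loaded by path
instead.  The name load_script_by_path and the module alias are
hypothetical (not part of this script), and the paths in the usage
comments are assumptions to adapt.

```python
import importlib.util

def load_script_by_path(path, modname='insert_analytics'):
    # Hypothetical helper: load a script whose filename is not a valid
    # module name (e.g., it has a hyphen) and return it as a module.
    spec = importlib.util.spec_from_file_location(modname, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)      # runs the file's top-level code
    return module

# Assumed usage; adjust both paths for your own tree:
# ia = load_script_by_path('thumbspage/build/insert-analytics.py')
# ia.insertOneFile('some/page.html')
```

Loading this way runs the script's top-level code (including its HOME
lookup) but not its __main__ block, because __name__ is the alias.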

For both modes:

Works by replacing comment keys in files: see 'replacekeys' list below.
Makes no changes to files that already have analytics (e.g., reruns).
Note that analytics could be left in resources always, because it's not
run when docs are viewed locally via file:// URLs; omitted for courtesy. 
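
A minimal sketch of the replace-and-skip idea, on strings rather than
files.  The two keys are copied from this script; insert_after_key is a
hypothetical stand-in for insertOneFile, not its actual code.

```python
REPLACE_KEY = '<!-- plus analytics code, etc. -->'    # replace2 in this script
PRESENT_KEY = "gtag('js', new Date())"                # analyticskeyGA4 here

def insert_after_key(text, code):
    # Illustrative only: return (newtext, changed) for one page's text.
    if PRESENT_KEY in text:
        return text, False           # already has analytics: rerun is a no-op
    if REPLACE_KEY not in text:
        return text, False           # no key to replace: leave text alone
    return text.replace(REPLACE_KEY, REPLACE_KEY + code), True

page = '<head>' + REPLACE_KEY + '</head>'
code = '<script>' + PRESENT_KEY + ';</script>'
first, changed1 = insert_after_key(page, code)
second, changed2 = insert_after_key(first, code)
print(changed1, changed2)            # first call inserts, second call skips
```

The rerun is skipped because the inserted code itself contains the key
that the presence test looks for; a custom code file would presumably
need to contain one of the analyticskeys strings to get the same effect.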

Analytics is anonymized (IPs are truncated), and is used only for 
prioritizing website development.  For more, see the website's
privacy statement: https://learning-python.com/privacy-policy.html.

For mode 1:

Run me from anywhere: always changes to the website-publishing folder.
Inserts analytics code in the online versions of the user guide and all 
example-gallery index pages.  This does not add analytics to individual 
images' _viewer_ pages in a gallery (even though they have the same key 
as default index headers, replace1 ahead); that seems too much data.

This automates roughly 20 file edits every time thumbspage is released.
To use it elsewhere, change folder paths, and possibly replacement keys.

----
Update Oct22: this script is now used by all publish scripts at the site
hosting thumbspage - and it's no longer clear why it's embedded here.
Changed to insert both UA and GA4 analytics tags temporarily; the UA
tag will be removed here after it stops collecting data in July-2023.
Some pages get tags via genhtml inserts; others run this when published.

A handful of pages that still hardcode the UA tag will now have to use
a replacekeys, because pages with just the UA tag need to be extended 
with the GA tag (they're no longer a skip here), and all pages need
to be regenerated to use just the GA4 tag eventually (thanks Google!).

----
Update Mar23: this script now supports > 1 website (learning-python.com
and the new quixotely.com).  To generalize it, the UA and GA4 site IDs 
can now be passed in to the insert functions, and will be pasted into 
the analytics template string (default=l-p.com for older build scripts).

BUT this won't help for new sites that are GA4 only - generating UA
code too would be wrong.  For these cases, the path to an analytics
code file can be passed in instead, both on command lines and in direct
function calls.  quixotely.com's publish script passes in a GA4-only code 
file, which matches the code generated by tweak-analytics.py for l-p.com.
This script doesn't do very much for such cases (and could be imitated),
but it's better to centralize these auto edits here for future morph.
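
A small sketch of the ID pasting described above, assuming a shortened
two-slot template in place of the full analytics string.  render_tags
and the placeholder IDs are hypothetical, for illustration only.

```python
# Hypothetical two-ID template, standing in for the 'analytics' string.
template = "ga('create', '%(UA_ID)s', 'auto'); gtag('config', '%(GA_ID)s');"

def render_tags(template, tagids):
    # Dict-based % formatting fills each %(NAME)s slot from tagids.
    return template % tagids

other_site_ids = dict(UA_ID='UA-0000000-0', GA_ID='G-0000000000')  # placeholders
rendered = render_tags(template, other_site_ids)
print(rendered)
```

Any literal percent sign in such a template must be doubled (%%) to
survive % formatting.  A code file passed in is inserted as is, without
formatting, so it has no such constraint.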
===========================================================================
"""
import os, sys


#--------------------------------------------------------------------------
# This will naturally vary per site (don't use this verbatim!)
#
# Oct22; changed to use both a UA and GA4 tag (script) for now, per above.
#--------------------------------------------------------------------------

analytics = """
<!-- 
Anonymous analytics to prioritize work, enabled in online resources 
only.  Automatically inserted at publish time by insert-analytics.py.
-->

<!-- 1) Universal Analytics tag (custom): stops collecting data on Jul-1-2023 -->
<SCRIPT>
  // Start async JS-file fetch, if not already cached

  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

  // Queue actions to run in order after async JS-file fetch finished

  ga('create', '%(UA_ID)s', 'auto');       // Create tracker object (and queue)
  ga('set', 'anonymizeIp', true);              // Anonymize IP addr (&aip) [Jun-2019]
  ga('send', 'pageview');                      // Send page-view event now 
</SCRIPT>

<!-- 2) Google Analytics 4 tag: added to site Oct-2022 (okay to keep UA tag) -->
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=%(GA_ID)s"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', '%(GA_ID)s');
</script>

<!-- End analytics insert -->
"""


# default, original
learning_python_ids = dict(UA_ID='UA-52579036-1', GA_ID='G-J8CTEZHX3L')



#--------------------------------------------------------------------------
# Analytics may be formatted arbitrarily if already present: look for this.
#
# Oct22: if found, this is now a caution, because there are both UA and GA4
# tags, and pages need both UA and GA4 now, and just GA4 later.  Change any
# such pages to use a replacekeys instead of hardcoding the UA key.
#
# Mar23: added analyticskeyGA4, and skip files with this too, for new
# quixotely.com site.  This test doesn't care about files with only 
# a replacement tag (or no replacement tag); files with UA key were
# skipped before; and files with GA4 key are now skipped too, but this 
# seems innocuous: (UA=>GA4) and ((UA+GA4))=>GA4 both have UA already. 
# 
# l-p.com's tweak-analytics.py and $SCRIPT$ inserts aren't used for
# quixotely.com, and this script (i-a) is used only for explicit inserts.  
# That is: l-p.com and qx.com are disjoint, and files processed here with
# _either_ UA tag, GA4 tag, or both, should clearly be skipped, else tag
# code is added redundantly.  This mod is logically sound for l-p.com too.
#--------------------------------------------------------------------------

# skip files with either
analyticskeyUA  = "ga('send', 'pageview')"            # works for UA tag only
analyticskeyGA4 = "gtag('js', new Date())"            # works for GA4 tag only
analyticskeys   = [analyticskeyUA, analyticskeyGA4]


#--------------------------------------------------------------------------
# Replacement keys - change or pass as needed to use this script elsewhere:
#
# - replace1: auto-added by thumbspage in default headers <head> since 2.0 
# - replace2: inserted manually in custom headers over the years before 2.0
#
# And some examples have analytics already present; this is less than consistent,
# but legacy.  replace1 is also auto-added to viewer pages' <head> by 2.0+,
# but viewer pages on this site never have analytics inserted (it's TMI).
#
# Caution: files with just commented-out analytics must be changed to use 
# a replacekeys, else they will be skipped as already having analytics.
# This is also legacy, from manual procedures that uncommented code; alas.
#--------------------------------------------------------------------------

# replace either key
replace1 = '<!-- Plus analytics code, custom styles, etc. (replace me) -->'
replace2 = '<!-- plus analytics code, etc. -->'
replacekeys = [replace1, replace2]


#--------------------------------------------------------------------------
# Paths used for mode 1 (thumbspage): change|pass to use script elsewhere
#--------------------------------------------------------------------------

homedir = os.environ['HOME']
codedir = homedir + '/MY-STUFF/Code/thumbspage'
pubdir  = homedir + '/MY-STUFF/Websites/Programs/Current/Complete/thumbspage'



def insertOneFile(filepath, 
                  replacekeys=replacekeys, 
                  trace=print,
                  tagcode=None,
                  tagids=learning_python_ids):
    """
    --------------------------------------------------------------------
    Insert analytics code at any replacement key in filepath, in place.
    This can be used for any site file, not just thumbspage examples.

    Skip files that already have analytics, by key test.  This includes
    both hard-coded tags, as well as files already processed by this
    and not copied to a collection folder (l-p.com does, qx.com doesn't).

    tagcode is an optional code-file path, tagids is an optional dict.
    If tagcode, use it instead of default analytics code in this script.
    Else, insert tagids in the default analytics string in this script.
    --------------------------------------------------------------------
    """
    changed = False
    with open(filepath, 'r', encoding='utf8') as filein:
        filetext = filein.read()
 
    if any(key in filetext for key in analyticskeys):
        trace('caution'.upper() + ' - file already has analytics:', filepath)    # Oct22

    elif not any(key in filetext for key in replacekeys):
        trace('File lacks replacement key:', filepath)

    else:
        if tagcode:
            with open(tagcode, 'r', encoding='utf8') as codefile:     # override tag code
                insertcode = codefile.read()
        else:
            insertcode = analytics % tagids                           # insert code's ids

        changed = True
        replace = [key for key in replacekeys if key in filetext]
        filetext = filetext.replace(replace[0], replace[0] + insertcode)

        with open(filepath, 'w', encoding='utf8') as fileout:
            fileout.write(filetext)      # closed on block exit: flush for diff or other

        replaceindexes = [replacekeys.index(key) for key in replace]
        trace('Inserted analytics in file: '
              'keys=%s, file=%s' % (replaceindexes, filepath))
        if len(replace) > 1:
            trace('caution'.upper() + ' - multiple replacement keys in file')

    return changed



def insertThumbspageExamples(pubdir, codedir, verbose=False):
    """
    --------------------------------------------------------------------
    Insert analytics code into user guide and all thumbspage examples.
    Run just before uploads, works on the website publication folder.
    --------------------------------------------------------------------
    """
    os.chdir(pubdir)    # thumbspage upload copy

    # user guide
    ugname = 'UserGuide.html'
    changed = insertOneFile(ugname)

    if changed and verbose:
        print('Diff of new user guide to original:')
        os.system('diff %s %s/%s' % (ugname, codedir, ugname))

    # example indexes
    numexchanged = 0
    for (dirhere, subshere, fileshere) in os.walk('examples'):
        for name in fileshere:
            if name == 'index.html':
                print()
                expath = os.path.join(dirhere, name)
                changed = insertOneFile(expath)

                if changed:
                    numexchanged += 1
                    if verbose:
                        print('Diff of new example index to original:')
                        os.system('diff %s %s/%s' % (expath, codedir, expath))

    print('Examples changed:', numexchanged)



if __name__ == '__main__':
    if len(sys.argv) == 1:
        insertThumbspageExamples(pubdir, codedir)    # no args: all thumbspage files

    elif len(sys.argv) == 2:
        insertOneFile(sys.argv[1])                   # one arg: this specific file only
 
    elif len(sys.argv) == 3:                                 # [mar23] for lutzware.com
        insertOneFile(sys.argv[1], tagcode=sys.argv[2])      # two args: this file+code  

    # tagids is not supported in command-line mode

    else:
        print('Usage: python3 insert-analytics.py [singleFilePath tagcodefile?]?')    




"""
==================================================================================
Example output (none are inserted on reruns):

Inserted analytics in file: keys=[1], file=UserGuide.html

Inserted analytics in file: keys=[1], file=examples/Screenshots/index.html

Inserted analytics in file: keys=[1], file=examples/1.7-upgrades/index.html

Inserted analytics in file: keys=[0], file=examples/unicode/images/index.html

Inserted analytics in file: keys=[1], file=examples/dynamiclayout/index.html

File already has analytics: examples/dynamiclayout/Demo-Wide-Filename-Labels/index.html

File already has analytics: examples/dynamiclayout/Demo-Narrow-Filename-Labels/index.html

Inserted analytics in file: keys=[1], file=examples/reorientation/index.html

File already has analytics: examples/reorientation/Unrotated-images-in-browsers/index.html

Inserted analytics in file: keys=[0], file=examples/mixedtypes/index.html

Inserted analytics in file: keys=[0], file=examples/mixedtypes/Limited-Support-Types/index.html

Inserted analytics in file: keys=[0], file=examples/subfolders/index.html

Inserted analytics in file: keys=[0], file=examples/subfolders/Subfolder3/index.html

Inserted analytics in file: keys=[0], file=examples/subfolders/Subfolder2/index.html

Inserted analytics in file: keys=[0], file=examples/subfolders/Subfolder1/index.html

Inserted analytics in file: keys=[0], file=examples/subfolders/Subfolder1/SubSubfolder/index.html

Inserted analytics in file: keys=[1], file=examples/2.1-upgrades/index.html

Inserted analytics in file: keys=[1], file=examples/2.0-upgrades/index.html

Inserted analytics in file: keys=[0], file=examples/2.0-upgrades/MORE-INFO-POPUP/index.html
Examples changed: 15
==================================================================================
"""


