File: cgi/

============================================================================== - on a URL query from a client, display any text file 
in an HTML page, auto-scrolled horizontally, with a raw-text link.

Author/Copyright: 2018, M. Lutz (
License: provided freely, but with no warranties of any kind.
Version: February 23, 2018 - initial release, for mobile site redesign.

This is a Python CGI script: it runs on a web server, reads URL "?"
query parameters, and prints HTTP headers and HTML text to the client.
Its code runs on both Python 2.X and 3.X (and 2.X on its current host).


Whenever invoked, this script dynamically builds an HTML reply page 
containing the subject file's text, with styling and auto-scrolling
in both directions.  This is done primarily for ease of viewing on 
smaller screens (e.g., mobile devices).  Else, text may be too small
to read, or worse, line-wrapped, which is awful for program code.

Links to view and save the file's raw (plain) text are also generated 
as options for browsers that handle them well (e.g., opening text in 
an editor).  This script is run for _every_ ".py" and ".txt" file on 
this site displayed directly, per the invocation schemes up next.


This script is run by both explicit HTML links, and automatic Apache
rewrite rules.  In general, it is invoked with a URL of this form, 
where the subject file's name appears as a query-string parameter:

The site name can be relative in links as usual, and the subject file 
is assumed to live in ".." (the site root, above the cgi/ folder of 
this script), so links in HTML files are coded this way when explicit:

  <A HREF="cgi/"></A>

The current site uses a few of the links above, but also uses Apache 
URL rewrite rules in .htaccess files to automatically route all other 
requests for both "*.py" and "*.txt" files to this script.  The rules
use PCRE patterns to map basic URLs to the form above automatically, 
thereby avoiding many manual link edits. 

For example, the following rewrite rule maps all URLs not starting in
'cgi/' and ending in '.py' or '.txt' to the script, thus handling all
direct Python and text file links, while skipping script invocations
(other extensions, including '.pyw', are being added as needed):

  rewriterule ^(?!cgi\/)(.*).(py|txt|pyw)$ 
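
The pattern half of this rule can be tried out in Python, whose re
module supports the same negative-lookahead syntax as PCRE.  This is
only a sketch for experimenting with the pattern: the is_rewritten()
helper and the sample paths are made up for illustration, and Apache
itself is not involved.

```python
import re

# pattern copied verbatim from the rewrite rule in this docstring
RULE = re.compile(r'^(?!cgi\/)(.*).(py|txt|pyw)$')

def is_rewritten(urlpath):
    # hypothetical helper: True if the pattern matches a site-relative path
    return RULE.match(urlpath) is not None

for path in ('pymailgui/main.py', 'notes.txt', 'cgi/script.py', 'index.html'):
    print('%-20s %s' % (path, is_rewritten(path)))
```

Note that the '.' before the extension group is unescaped, so it
matches any character: a path such as 'happy' also matches.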

This works, but makes raw-text support complex.  Because the Apache
rule maps *all* Python and text file links to the script's URL (and 
it's weirdly difficult to prevent a rewrite of a rewrite in Apache), 
this script also supports a "rawmode" parameter, for use in the 
template file's URLs meant to fetch a raw-text copy:

  <A HREF="cgi/"></A>
  <A HREF="cgi/"></A>

A "rawmode=view" triggers inline plain-text output in this script 
instead of HTML; its effect is the same as a direct file link sans
rewrites.  A "rawmode=save" sends plain text as attachment, which asks 
browsers to save immediately; where supported, this is arguably easier
and more reliable than cmd/ctl-A+C to select text, or link right-clicks.
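
As a rough sketch, links of both kinds might be generated like this.
The "name" and "rawmode" parameter names are the ones this script
reads; SCRIPT_URL and make_link() are placeholders invented for the
example, not names used on this site.

```python
try:
    from urllib.parse import urlencode      # py 3.X
except ImportError:
    from urllib import urlencode            # py 2.X

SCRIPT_URL = 'cgi/script.py'    # assumption: stands in for the real script URL

def make_link(name, rawmode=None):
    # name: file path relative to the site root
    # rawmode: None (formatted HTML page), 'view', or 'save'
    params = [('name', name)]
    if rawmode:
        params.append(('rawmode', rawmode))
    return SCRIPT_URL + '?' + urlencode(params)    # URL-escapes each value

print(make_link('my file.txt'))
print(make_link('a/b.py', rawmode='save'))
```

When such a URL is embedded in an HTML page, its "&" separators
should also be HTML-escaped as "&amp;".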


When loading code files, this script tries a set of Unicode encodings 
in turn, until one works or all fail.  Most Python and text files on 
this site are UTF8 (or its ASCII subset), but a few Latin-1 files crop
up as examples.  The UNICODE_IN encodings list reflects this, but may 
be changed for use elsewhere.  Once loaded, code text is just decoded 
codepoints in memory, and is always output as UTF8-encoded bytes.
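
The try-in-turn idea can be sketched in isolation as follows.  This
is not the loading code used below: load_text() is a made-up helper,
and it uses io.open(), which accepts the same arguments on 2.X and
3.X.  Latin-1 must come last in the list, because it maps every
possible byte to a character and so never fails.

```python
import io

def load_text(path, encodings=('utf-8', 'latin-1')):
    # Return (text, encoding) for the first encoding that decodes the
    # whole file, or (None, None) if the file is unreadable or all fail.
    for enc in encodings:
        try:
            with io.open(path, mode='r', encoding=enc, newline='') as f:
                return f.read(), enc        # newline='' keeps any \r as is
        except (IOError, UnicodeError):
            continue                        # try the next encoding on the list
    return None, None
```

Because a permissive encoding like Latin-1 always succeeds, a file in
some other 8-bit encoding would load without error but display wrong.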


  Besides making raw-text support complex, the Apache rewrite rule also 
  breaks "README.txt" files in auto-generated index pages; their text no
  longer appears (the leading theory is that their names are rewritten).

  This can be addressed by coding manual "index.html" pages.  But it's 
  simpler to rename or copy to "README.html" with a <PRE> or <P> around
  the file's text and a "ReadmeName README.html" in the .htaccess file. 
  For less-important cases, rename to "_README.txt" and let the user 
  click if they really wish to view; a script can easily automate this:
  see for an example.
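
  A minimal sketch of such a renaming script follows.  It is not the
  example script referenced above: hide_readmes() and its defaults
  are invented here, and it assumes a plain rename is all that is
  needed in each folder.

```python
import os

def hide_readmes(root, old='README.txt', new='_README.txt'):
    # Rename each README.txt under root to _README.txt; return new paths.
    renamed = []
    for dirpath, dirnames, filenames in os.walk(root):
        if old in filenames:
            target = os.path.join(dirpath, new)
            if not os.path.exists(target):      # never overwrite a file
                os.rename(os.path.join(dirpath, old), target)
                renamed.append(target)
    return renamed
```

  Running it a second time on the same tree renames nothing.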

  The only files that _require_ an explicit URL for display are those 
  in this folder (cgi/).  In the HTML template file, for example, the 
  self-display links must be explicit URLs.  For scripts, appearance
  in URL query parameters also avoids invocation.  All other files
  can be displayed by either explicit URL _or_ the Apache rewrite rule. 

  This script can also be invoked by URL in the "action" attribute of a
  form in an HTML page; could be requested by a script (see Python's
  urllib); and might work as an Apache handler (to be explored).

As is, this script reflects a number of tradeoffs:

-Its code must run on the Python 2.X version which is default on the host.
-Its footer code must avoid copies of text normally generated by genhtml.
-Its error checking is minimal, as it is used only in well-known contexts.
-Its ".." assumption for subject files' paths is not very general.
-Its Apache rewrite rule breaks "README.txt" in index pages (see above).
-Its always-UTF8 output policy means others are converted to this on saves.

OTOH, it works as intended, and demos CGI; expand and improve as desired.

import cgitb; cgitb.enable()      # route python exceptions to browser/client
import cgi, os, sys, codecs

if sys.version[0] == '3':                             # py 3.X/2.X compatible
    from html import escape as html_escape            # run on 2.X only to date
    from urllib.parse import quote_plus
else:
    from cgi import escape as html_escape             # for text added to HTML 
    from urllib import quote_plus                     # for text added to URL

# Switches and constants

MOCK        = False                 # 1=simulate URL invoker for testing
UNICODE_IN  = ['UTF8', 'latin1']    # try in turn for code file content
UNICODE_OUT = 'UTF8'                # for text in generated reply page

TEMPLATE    = 'showcode-template.txt'    # the reply-page format
FOOTER      = '../dummy-footer.html'     # site-wide footer code

# Get input filename (and raw-text mode?) sent from the client

if not MOCK:
    form = cgi.FieldStorage()         # parse live form/url input data
else:
    class Mock:                       # or simulate form input to test
        def __init__(self, value):
            self.value = value
    form = dict(name=Mock(''))    # + rawmode=Mock('view')?

if 'name' not in form:
    name = 'cgi/'          # show myself: more useful 
    # disabled error check: custom reply = hdr + blankline=\n + msg
    # print('Content-type: text/plain\n')
    # print('Please provide a value for "name" in the request.')
else:
    name = form['name'].value         # real or mocked, pathname relative to '..'

if 'rawmode' not in form:
    rawmode = False
else:
    rawmode = form['rawmode'].value   # 'view' or 'save' or absent=formatted

# Load the code from a file in "..", in 1 of N Unicode encodings

# name may be a basename or a pathname relative to ".." (site root);
# both open()/read() flavors retain \r on Windows, decode to codepoints,
# and return a Unicode object: a py2 u'xx' unicode, or a py3 'xx' str; 
# tries N Unicode types for input, but always outputs as UTF8 bytes;

path = '..' + os.sep + name
for tryenc in UNICODE_IN:
    try:
        if sys.version.startswith('3'):
            code = open(path, mode='r', encoding=tryenc, newline='').read()
        else:
            code = codecs.open(path, mode='r', encoding=tryenc).read()
    except Exception:
        pass     # try next encoding on list
    else:
        break    # load successful: skip else
else:
    code = ("Error: could not open file.\n\n"
            "Please adjust the script's UNICODE_IN list.\n")

# Load and expand the HTML template

# it's okay to load the template as str, even though code is unicode:
# in py3 they're the same: both are str, which is always Unicode text;
# in py2 they differ, but str is coerced up: '%s' % u'spam' => u'spam',
# even for dicts: see;

if rawmode:
    reply = code    # send text as is

else:
    template = open(TEMPLATE).read()        # template file in '.', ASCII only

    codehtml = html_escape(code)            # HTML-escape any characters special in HTML
    namehtml = html_escape(name)            # template also hardcodes some URL escapes
    # no longer need to strip 'cgi/' here: used in URL query, not raw link
    nameurl = quote_plus(name)              # URL-escape this: added to query in template 

    footer = open(FOOTER).read()            # load dummy ASCII generated footer html in ..
    for link in ('HREF', 'href', 'SRC'):    # munge it to add ".." to all nested item refs
        old = '%s="'    % link              # this avoids copying code (see template text)
        new = '%s="../' % link
        footer = footer.replace(old, new)

    for undo in ('mailto', '#'):            # undo up-rerouting for two special-cases 
        new = 'HREF="../%s' % undo          # still beats maintaining copied code...
        old = 'HREF="%s'    % undo
        footer = footer.replace(new, old)

    reply = template % dict(                # unicode reply: replace template targets
                __NAME__     = namehtml,    # the sent and escaped filename
                __NAMEURL__  = nameurl,     # the filename for raw-text link
                __CODE__     = codehtml,    # the loaded and escaped Unicode text 
                __FOOTER__   = footer)      # the munged dummy generated toolbar html

# Print the reply stream back to the client

# write UTF8-encoded bytes, use "charset" to force Unicode type to match;
# "inline" is always view, but may require cmd/ctl-A+C to save contents;
# "attachment" is usually save, but opens may fail on some platforms, and
# this is just view on others (notably, iOS: there's no user file access);

if not rawmode:
    contenthdr  = 'Content-type: text/html; charset=%s' % UNICODE_OUT
else:
    dispostype  = 'inline' if rawmode == 'view' else 'attachment'
    basename    = os.path.basename(name)
    contenthdr  = 'Content-type: text/plain; charset=%s\n' % UNICODE_OUT
    contenthdr += 'Content-Disposition: %s; filename="%s"' % (dispostype, basename)

replybytes = reply.encode(UNICODE_OUT)      # send encoded bytes: print is iffy

print(contenthdr)                           # reply = hdrs + blankline + html
print('')                                   # need '' for 2.X, else a tuple!
if sys.version[0] == '2':
    sys.stdout.write(replybytes)            # py2 accepts a str for the bytes
else:
    sys.stdout.flush()                      # py3: send buffered header text first
    sys.stdout.buffer.write(replybytes)     # py3 stdout is str: use io layer  
