# File: cgi/

"""
==============================================================================
 - submit a site-specific search query to a provider.
Author/Copyright: 2016, M. Lutz (
License: provided freely, but with no warranties of any kind.
Version: January 25, 2016 - log search terms to a server file for metrics.

A Python CGI script: runs on the server, reads form (or URL query) input,
prints HTTP headers + HTML text to the client.  This script builds a query
URL and delegates it to a search provider via an HTTP redirect.

Example - search entire site:
  user inputs = [site=Entire site, term=spam, host=DuckDuckGo]
  search string = "spam"
  URL =

Example - search individual parts (if any):
  user inputs = [site=Books only, term=decorator, host=DuckDuckGo]
  search string = "decorator"
  URL =

Normally invoked by the "action" tag of the form in the HTML page at:
Use your browser's "View Source" to see the form in this HTML page's code.
To use for other sites, edit the HTML select list and the "sites" dict below.

As usual, this script can also be invoked from a browser or script using a
GET-style URL with query parameters at the end like this (all on one line):

Usage note: although search providers can be selected for comparison,
DuckDuckGo, Ixquick, or StartPage are strongly recommended.  Other providers
may insert ads and unrelated photos in results and track searchers, and Google
occasionally disables the Back button.  See the HTML page for more details.

Coding note: uses \n instead of \r\n for line breaks, because all known
clients accept it; print adds a \n by default; and Windows may expand \n
to \r\n automatically, which could change an explicit \r\n to \r\r\n.

Coding note: this script is portable to both Python 2.X and 3.X, but its code
is constrained by the need to run on Python 2.4 -- the most recent Python at
the ISP hosting this site (GoDaddy).  OTOH, 2.4 works fine, which raises the
question: were all the Python changes since 2004 really that important?...

Update, June 2017: is now a single site/part; the former
"Books only" in examples is defunct but harmless (it's disabled in the HTML).

TBD: this scheme may or may not support non-ASCII Unicode in search terms.
While it could encode the redirect URL to something like UTF-8, it's not
clear that servers would use this in "Location:" lines, even given a content
line of "Content-type: text/html; charset=UTF-8".  Python 2.X's handling of
non-ASCII text in the CGI input stream is also a bit gray.  Resolve me.
"""

import cgitb; cgitb.enable()   # route python exceptions to browser/client

import cgi, sys, os, time

# fetch url reply, url-encode/decode text
if sys.version[0] == '3':                             # py 3.X/2.X compatible
    from urllib.request import urlopen                # *urlopen not yet used*
    from urllib.parse import quote_plus, unquote_plus
else:
    from urllib import urlopen, quote_plus, unquote_plus

# jan-22/25-16: save search terms to this server file for metrics
SAVETERMSFILE = 'sitesearch-savedterms.txt' 

# testing
MOCK  = False    # True=simulate form/url-query inputs
DEBUG = False    # True=display url without submitting it

# inputs -> url parts
sites = {
    'Entire site':   '',
    'Books only':    '',      # url-encodes '/' ahead  
   #'Training only': ''    # not yet supported
}

hosts = {
    'DuckDuckGo': '',          # no tracking, ads, or images
    'Google':     '',          # all of the above + Back breaks...
    'Bing':       '',            # ads + images + less fruitful?
    'Ixquick':    '',             # metasearch, no tracking
    'StartPage':  '',           # google results, no tracking
    'Yahoo':      '',        # you be the judge
}

# get inputs
# from html form fields, url query parameters, or mock-up
if not MOCK:
    form = cgi.FieldStorage()                # parse live form/url input data
else:
    class Mock:                              # simulate form input to test
        def __init__(self, value):
            self.value = value

    form = {'searchsite': Mock('Books only'),
            'searchterm': Mock('"class decorator"'),
            'searchhost': Mock('Google')}
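
# A minimal sketch, not part of the original script, of the GET-style query
# string this script accepts, e.g. for testing it from another script.  The
# helper name build_test_query is hypothetical.  The parameter names come from
# the mock form above.  The script's own URL is not included, so only the part
# after "?" is built.  A list of pairs, not a dict, keeps the parameter order
# stable on older Pythons.
def build_test_query(site, term, host):         # hypothetical helper
    try:
        from urllib.parse import urlencode      # Python 3.X
    except ImportError:
        from urllib import urlencode            # Python 2.X
    # urlencode applies quote_plus to each value: spaces become '+'
    return urlencode([('searchsite', site),
                      ('searchterm', term),
                      ('searchhost', host)])

# e.g. build_test_query('Entire site', 'spam', 'DuckDuckGo') returns
# 'searchsite=Entire+site&searchterm=spam&searchhost=DuckDuckGo'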

# translate inputs
# missing and invalid keys in url queries trigger cgitb KeyError displays
site = sites[form['searchsite'].value]
host = hosts[form['searchhost'].value]

if 'searchterm' in form:                # but handle missing term: starts empty
    term = form['searchterm'].value     # py 2.4 has no 'a if b else c' expr
else:
    term = ''

# unique action tags for some providers' queries
if host in ('', ''):
    atag = 'search' 
elif host in ('', ''):
    atag = 'do/search'
else:
    atag = ''

# url-encode text input, site '/', label ':'
term = quote_plus(term) 
site = quote_plus(site)                 # '"A/B C:D E"' -> '%22A%2FB+C%3AD+E%22'
pref = quote_plus('site:')              # unquote_plus() reverses this
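
# The TBD note in the module docstring asks about non-ASCII search terms and
# mentions UTF-8 as one option.  What follows is a minimal sketch of that
# option, assuming UTF-8 is what the providers expect; the helper name
# quote_term_utf8 is hypothetical.  Python 3.X's quote_plus already encodes a
# str as UTF-8 before percent-escaping it.  Python 2.X's version needs Unicode
# encoded to bytes first.  Either way the result is pure ASCII.
def quote_term_utf8(term):                      # hypothetical helper
    import sys
    if sys.version[0] == '3':
        from urllib.parse import quote_plus as _quote_plus
        return _quote_plus(term)                # str -> UTF-8 -> %XX escapes
    else:
        from urllib import quote_plus as _quote_plus
        if isinstance(term, unicode):           # byte strings pass through
            term = term.encode('utf8')
        return _quote_plus(term)

# e.g. quote_term_utf8(u'caf\xe9') returns 'caf%C3%A9'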

# fallback option (or via HTTP Refresh header)
manualredirect = """<HTML><HEAD>
<BODY><FONT face=Arial>
<P>Redirecting to search host...
<P>If this fails, please click this link instead:
    <B><A HREF="%s">%s</A></B>

# pass 'term site:xxx' query to providers
searchurl = 'https://%(host)s/%(atag)s?q=%(term)s+%(pref)s%(site)s' % vars()
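
# A minimal sketch of the URL shape that searchurl takes, written as a
# side-effect-free function so it can be tried in isolation.  The name
# build_search_url and the host 'search.example.com' in the example below are
# hypothetical placeholders, not one of the real providers.
def build_search_url(host, atag, term, site):   # hypothetical helper
    try:
        from urllib.parse import quote_plus     # Python 3.X
    except ImportError:
        from urllib import quote_plus           # Python 2.X
    # same encoding steps as above: term, 'site:' label, and site URL
    return 'https://%s/%s?q=%s+%s%s' % (host, atag, quote_plus(term),
                                        quote_plus('site:'), quote_plus(site))

# e.g. build_search_url('search.example.com', 'search', 'class decorator',
#                       'www.example.org/books')
# returns 'https://search.example.com/search?q=class+decorator+site%3Awww.example.org%2Fbooks'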

if not term:
    # error check, custom reply
    print('Content-type: text/plain\n')        # start reply: hdr + blankline=\n
    print('Please provide a search term in field "Search for this:".')
elif DEBUG:
    # display built link only
    print('Content-type: text/plain\n')        # start reply: hdr + blankline=\n
    print(searchurl)
else:
    # valid and live: redirect client to search provider site/page;
    # hosting site auto generates a 3xx status header if "Location:";
    # print HTML reply too, in case client doesn't redirect (unlikely);
   #print('HTTP/1.1 302 Found')                # added by host if 'Location:'
    print('Location: %s' % searchurl)          # cgi http redirect header
   #print('Connection: close')                 # this seems optional or auto
    print('Content-type: text/html')           # reply = hdrs + blankline + html
    print('')                                  # need '' for 2.X, else a tuple!
    print(manualredirect % ((searchurl,) * 3))
    # jan-22/25-16: save search term (pre-encode) to flat file on server;
    # O_APPEND sends each write to end-of-file for possibly-concurrent web
    # access, but O_EXCL without O_CREAT is not a lock (POSIX leaves it
    # undefined); encode str to bytes for Py 3.X, a no-op on Py 2.X for ASCII
    # terms (non-ASCII raises there, and the except below drops the entry);
    try:
        filename = SAVETERMSFILE
        message = '[%s] [%s]' % (time.asctime(), unquote_plus(term))   # jan25
        if not os.path.exists(filename):
            open(filename, 'w').close()    # make file first time
        fd = os.open(filename, os.O_EXCL | os.O_APPEND | os.O_WRONLY)
        line = (message + os.linesep).encode('utf8')
        os.write(fd, line)
        os.close(fd)                       # release the descriptor
    except:
        pass   # neither server nor client care if this fails
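
# If real mutual exclusion among concurrent CGI processes is wanted, one
# swapped-in option is an advisory flock() held around the write.  It is POSIX
# only (via the fcntl module) and excludes only writers that also call flock.
# Here is a minimal sketch; the helper name append_line_locked is hypothetical.
def append_line_locked(filename, message):      # hypothetical helper
    import os, fcntl
    # O_CREAT makes the file on first use; 420 == octal 644 (rw-r--r--),
    # written in decimal so the literal parses on both Python 2.4 and 3.X
    fd = os.open(filename, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 420)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)          # block until lock is held
        os.write(fd, (message + os.linesep).encode('utf8'))
    finally:
        os.close(fd)                            # closing releases the lock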
