#!/usr/bin/env python3
# File: genhtml/__docs__/publish-halves.py
"""
================================================================================
publish-halves.py: part of genhtml (with same copyright, author, and license).

   A last-resort option for large sites if Unix "unzip" fails and no  
   recent Python 2.X or 3.X (or other usable unzip tool) is available.

This is a variant of PUBLISH.py that zips and uploads content in halves to 
avoid Unix "unzip" issues when size is over the 2.1G ZIP64 cutoff.  Its
2-file zip step (only) is similar to running 2 Unix zip commands, though
Unix zip may not handle cruft or links as well (also see POSTSCRIPT ahead):

    $ zip -r UNION-1.zip UNION/.htaccess UNION/[2-n]*
    $ zip -r UNION-2.zip UNION/[o-z]*

Both ziptool's zip-extract.py and the Mac's Archive Utility (Finder clicks)
handle files above this size created by zip-create.py (and hence, Python's
zipfile), but some command-line Unix unzips may not extract correctly unless
split up as done here.  The "unzip" on Mac and Godaddy.com both failed,
though both also have a Python 2.7 that can generally run zip-extract.py
(as long as Python 2.X's zipfile doesn't bomb on odd Unicode filenames!).
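Both tools ultimately lean on the ZIP64 support in Python's standard zipfile
module.  As a rough sketch only (these helpers are illustrative, and not
ziptools' actual API), big-file creates and extracts boil down to:

```python
# illustrative sketch, not ziptools' API: zipfile's allowZip64=True
# enables archives and members past the 2.1G non-ZIP64 cutoff
import zipfile

def create(zipname, sources):
    # add each named file to a new deflated archive
    with zipfile.ZipFile(zipname, 'w', zipfile.ZIP_DEFLATED, allowZip64=True) as zf:
        for src in sources:
            zf.write(src)

def extract(zipname, dest='.'):
    # unpack all members to dest, creating folders as needed
    with zipfile.ZipFile(zipname) as zf:
        zf.extractall(dest)
```

ziptools layers cruft skipping, symlink handling, and folder recursion on top
of these calls; the sketch covers simple file paths only.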

This script is meant only for large files, and hosts with just Unix "unzip" 
and no way to run ziptools' zip_extract.py with a recent Python 3.X or 2.X.
When possible, use PUBLISH.py and tools that allow a single-file unzip;
upload ziptools and run zip_extract.py if a recent Python is available.

Ideally, this script should be size- (not name-) based, and should produce 
as many parts as needed to keep all under the size limit, but this sufficed
for the subject site (2.3G at last count [Feb-18: now 3.36G/3.28G raw/zipped]).
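For reference, such a size-based split might look like the following greedy
sketch (a hypothetical helper, not part of this script): walk the top-level
items in name order, and start a new part whenever adding the next item
would push the current part over the limit.

```python
# hypothetical sketch of a size-based split: pack top-level items into
# as many parts as needed to keep each part under the byte limit
import os

def folder_size(path):
    # total bytes of all files under a folder
    total = 0
    for root, dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def split_by_size(folder, limit):
    parts, current, size = [], [], 0
    for item in sorted(os.listdir(folder)):
        full = os.path.join(folder, item)
        itemsize = folder_size(full) if os.path.isdir(full) else os.path.getsize(full)
        if current and size + itemsize > limit:
            parts.append(current)          # current part is full: start a new one
            current, size = [], 0
        current.append(item)
        size += itemsize
    if current:
        parts.append(current)
    return parts
```

A single item larger than the limit still lands in a part by itself here;
items that big would require splitting below the top level.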

See PUBLISH.py for more general docs omitted here.

----
UPDATE, Feb-2018: fix-readmes.py

Added a post-generate fix-readmes.py run step to copy any local README*.txt
to _README*.txt for broken Apache autoindex pages.  This ensures that the 
local UNION copy matches the remote site (see fix-readmes.py for details).
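Per the description above, the copy amounts to roughly the following for a
single folder (a sketch of the described behavior only; see fix-readmes.py
itself for the real and full logic):

```python
# sketch of the described README fix for one folder (see fix-readmes.py
# for the real logic): give each README*.txt an _README*.txt twin, since
# Apache autoindex pages may mishandle the original names
import glob, os, shutil

def add_readme_twins(folder):
    for path in glob.glob(os.path.join(folder, 'README*.txt')):
        twin = os.path.join(folder, '_' + os.path.basename(path))
        shutil.copy2(path, twin)   # copy content plus modtime and mode
```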

----
POSTSCRIPT, Feb-2018: versus "zip -s"

It should be noted that some (many?) Unix command-line zips support a "-s"
option which limits zipfiles to a given size, and produces multiple files
if needed.  For example, to zip the UNION folder into two files, UNION.zip
and UNION.z01, neither of which is larger than the 2.1G ZIP64 constraint:

    $ zip -s 2g -r UNION.zip UNION

This works, but it's not as portable (or fun) as a Python solution, and was 
not required when this script was first developed.  It may be easier than 
ad-hoc name divisions if simple name-list halves don't suffice, though this 
script should really split on sizes anyhow (barring a ziptools extension).

The real problem with the Unix zip "-s" option, though, is that its split 
multi-file archives ARE NOT YET SUPPORTED by Unix unzips in common use 
(yes, !!).  Splits work on Mac OS only if all parts are combined as follows:

    $ zip -s- UNION.zip --out single.zip
    $ unzip -d . single.zip

This workaround may or may not work on other platforms - which is one of the
main reasons for coding portable Python alternatives in the first place.
================================================================================
"""

import sys, os, shutil
join = os.path.join

# switches
KEEPDIR = True     # retain zipped union folder for testing?
ZIPDIR  = 0#True   # zip union upload dir into two files?
UPLOAD  = 0#True   # upload zipped file by ftp automatically?

thedir  = 'UNION'             # where final joined content appears
verbose = False               # trace file copies? (else just folders)

homedir = os.getcwd()         # zipfile name generated from thedir here
python  = sys.executable 

def say(msg):  
    print('\n\n' + msg + '\n', flush=True)

def check(stat, msg):
    if stat != 0:
        say('Error: ' + msg); sys.exit(1)


say('Generating sites---------------------------------------------------------')

# this step is unchanged: genhtml all parts 

# edit me
PARTS = ['Books', 'Programs', 'Posts', 'Author', 'Training', 'OldMain', 'Class']
GENER = [part for part in PARTS if part not in ['Class']]   # or set() - set()

for gendir in GENER:
    say('Generating ' + gendir)
    os.chdir(join(gendir, 'Current'))
    stat = os.system('%s /MY-STUFF/Code/genhtml/genhtml.py' % python)   # works on '.'
    check(stat, 'genhtml failed')
    os.chdir(homedir)


say('Collecting sites---------------------------------------------------------')

# this step is unchanged: make a single union dir

FROMS = [(join(part, 'Current', 'Complete') if part in GENER else part) 
                    for part in PARTS]

# favicon.ico, .htaccess => in Books only 
DUPOK = ('PythonPowered.gif', '.DS_Store', '_main.css')   # zip drops cruft later!

if os.path.exists(thedir):
    shutil.rmtree(thedir)
os.mkdir(thedir)

def copy(item, dest):
    """
    retains original files' modtimes, as does the zip:
    this is a bit gray, but the history can be useful,
    and this avoids full copies on incremental backups;
    NOTE: this follows any symlinks, and copies what 
    they reference (not the links); fix if it matters;
    """
    if os.path.isdir(item):
        shutil.copytree(item, dest)    # does copy2() = content + stat
    else:
        shutil.copyfile(item, dest)    # content only
        shutil.copystat(item, dest)    # retain modtime and mode (permissions)

# merge all into site's root folder
for root in FROMS: 
    print('\nCopying part', root)
    for item in os.listdir(root):
        if item in os.listdir(thedir) and item not in DUPOK:
            print('\tDuplicate item: %s in %s' % (item, root))   # and fix!
            sys.exit()
        else:
            if verbose: print('\tCopying %s from %s' % (item, root))
            copy(join(root, item), join(thedir, item))

# Feb-2018: fix READMEs for Apache autoindex pages (all "Add": UNION rebuilt)
print('\nFixing READMEs in', thedir)
stat = os.system('%s fix-readmes.py %s' % (python, thedir))
check(stat, 'fix-readmes failed')    # not critical, but should fix


say('Zipping sites to zipfiles------------------------------------------------')

# this step is changed: make two half zips by listing names

halves = {}
if not ZIPDIR:
    print('--Skipping union dir zips--')
else:
    # run zip in union dir with source="*" so can unzip in site root directly,
    # else must unzip to temp folder and move all items (or dir) on server;
    # "zip-create.py zipfile folder" records items as nested in a folder
    # instead: that requires a post-zip move, but may be arguably safer;
    # nit: the os.system assumes names are quoted or have no spaces in them;
    #
    # Feb18: len(all) // 2 now leaves half2 @ 2.3g; use larger half1 ratio;
    # could use sizes (not names), but may be better handled in ziptools;
    # manually quote filenames (no shell), or use shlex.quote() in py 3.3+;

    os.chdir(thedir)
    #extras = ('.htaccess')   # not in shell * expansion (don't care here)
    ziphome = '/MY-STUFF/Code/mergeall/test/ziptools'   # edit me
    all = os.listdir('.')

    # original: list halves
    #mid = len(all) // 2                                     # by name, evenly

    # alternative 1: numeric fudge
    #mid = len(all) // 2                                     # yuck, but sufficient
    #mid = mid + (len(all) // 5) + (len(all) // 30) - 1      # fix me to use sizes?

    # alternative 2: by manual name (edit me)                # also yuck, but better
    mid = all.index('pygadgets-products')                    # pick half2 start point

    halves['half1'], halves['half2'] = all[:mid], all[mid:]
    assert halves['half1'] + halves['half2'] == all

    # bail if no non-hidden items in either (or empty)
    if (not any(not item.startswith('.') for item in halves['half1']) or
        not any(not item.startswith('.') for item in halves['half2'])):
        assert False, 'Cannot split zips into halves'

    for half in sorted(halves):
        print('\nZipping', half)
        thezip = '%s-%s.zip' % (thedir, half)                # build zip name
        quoted = ("'%s'" % item for item in halves[half])    # quote, separate
        items  = ' '.join(quoted)                            # or shlex.quote()
        zipcmd = '%s %s/zip-create.py %s/%s %s -skipcruft' % \
                                (python, ziphome, 
                                 homedir, thezip,  
                                 items)
        stat = os.system(zipcmd)
        check(stat, 'zip-create failed')   # see learning-python.com/ziptools.html

    os.chdir(homedir)

if KEEPDIR:
    print('--Retaining union folder--')
else:
    shutil.rmtree(thedir)   # or keep around for testing


say('Uploading sites zipfiles-------------------------------------------------')

# this step is changed: upload two zipfiles

if not UPLOAD:
    print('--Skipping upload step--')
else:
    import ftplib
    from getpass import getpass

    remotesite = 'learning-python.com'   # edit me
    remotedir  = '.'
    remoteuser = input('User name? ')
    remotepass = getpass('Password for %s on %s: ' % (remoteuser, remotesite))

    for half in sorted(halves):
        thezip = '%s-%s.zip' % (thedir, half)
       
        # upload call is atomic
        zipsize = os.path.getsize(thezip)
        print('Uploading site zipfile %s, %d bytes...' % (thezip, zipsize))

        connection = ftplib.FTP(remotesite)           # connect to FTP site
        connection.login(remoteuser, remotepass)      # log in as user/password
        connection.cwd(remotedir)                     # cd to directory to xfer
        with open(thezip, 'rb') as localfile:
            connection.storbinary('STOR ' + thezip, localfile)   # xfer zip in binary mode
        connection.quit()


say('Done.--------------------------------------------------------------------')

thezips = ' and '.join('%s-%s.zip' % (thedir, half) for half in sorted(halves))

if KEEPDIR:
    print('See the combination site in local folder %s.' % thedir)
if ZIPDIR:
    print('See the zipfiles %s in the local root folder.' % thezips)
if UPLOAD:
    print('Ssh to user@domain and move+unzip %s in the site HTML root folder.' % thezips)


