File: genhtml/__docs__/publish-halves.py
PLEASE NOTE: this script is posted here for its genhtml and ziptools
usage examples only, and no longer reflects the version currently in use.
Its algorithm is much in need of major redesign; a suggested exercise...
----
#!/usr/bin/env python3
"""
================================================================================
publish-halves.py: part of genhtml (with same copyright, author, and license).
A last-resort option for large sites if Unix "unzip" fails and no
recent Python 2.X or 3.X (or other usable unzip tool) is available.
This is a variant of PUBLISH.py that zips and uploads content in halves to
avoid Unix "unzip" issues when size is over the 2.1G ZIP64 cutoff. Its
2-file zip step (only) is similar to running 2 Unix zip commands, though
Unix zip may not handle cruft or links as well (also see POSTSCRIPT ahead):
$ zip -r UNION-1.zip UNION/.htaccess html/[2-n]*
$ zip -r UNION-2.zip UNION/[o-z]*
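For reference, the ziptools commands effectively run by this script's zip
step take roughly the following form (paths vary by machine, and "..."
stands in for each half's top-level item names; see the zip step below):
$ python3 ziptools/zip-create.py UNION-half1.zip ... -skipcruft
$ python3 ziptools/zip-create.py UNION-half2.zip ... -skipcruft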
Both ziptools' zip-extract.py and the Mac's Archive Utility (Finder clicks)
handle files above this size created by zip-create.py (and hence, Python's
zipfile), but some command-line Unix unzips may not extract them correctly
unless the content is split up as done here. The "unzip" on both the Mac
and Godaddy.com failed, though both also have a Python 2.7 that can
generally run zip-extract.py (as long as Python 2.X's zipfile doesn't bomb
on odd Unicode filenames!).
This script is meant only for large sites, and for hosts with just Unix
"unzip" and no way to run ziptools' zip-extract.py with a recent Python 3.X
or 2.X. When possible, use PUBLISH.py and tools that allow a single-file
unzip; upload ziptools and run zip-extract.py if a recent Python is available.
Ideally, this script should be size- (not name-) based, and should produce
as many parts as needed to keep all under the size limit, but this sufficed
for the subject site (2.3G at last count [Feb-18: now 3.36G/3.28G raw/zipped]).
See PUBLISH.py for more general docs omitted here.
----
UPDATE, Feb-2018: fix-readmes.py
Added a post-generate fix-readmes.py run step to copy any local README*.txt
to _README*.txt for broken Apache autoindex pages. This ensures that the
local UNION copy matches the remote site (see fix-readmes.py for details).
----
POSTSCRIPT, Feb-2018: versus "zip -s"
It should be noted that some (many?) Unix command-line zips support a "-s"
option which limits zipfiles to a given size, and produces multiple files
if needed. For example, to zip the UNION folder into two files, UNION.zip
and UNION.z01, neither of which is larger than the 2.1G ZIP64 constraint:
$ zip -s 2g -r UNION.zip UNION
This works, but it's not as portable (or fun) as a Python solution, and was
not required when this script was first developed. It may be easier than
ad-hoc name divisions if simple name-list halves don't suffice, though this
script should really split on sizes anyhow (barring a ziptools extension).
The real problem with the Unix zip "-s" option, though, is that its split
multi-file archives ARE NOT YET SUPPORTED by Unix unzips in common use
(yes, !!). Splits work on Mac OS only if all parts are combined as follows:
$ zip -s- UNION.zip --out single.zip
$ unzip -d . single.zip
This workaround may or may not work on other platforms - which is one of the
main reasons for coding portable Python alternatives in the first place.
================================================================================
"""
import sys, os, shutil
join = os.path.join
# switches
KEEPDIR = True # retain zipped union folder for testing?
ZIPDIR = 0#True # zip union upload dir into two files?
UPLOAD = 0#True # upload zipped file by ftp automatically?
thedir = 'UNION' # where final joined content appears
verbose = False # trace file copies? (else just folders)
homedir = os.getcwd() # zipfile name generated from thedir here
python = sys.executable
def say(msg):
    print('\n\n' + msg + '\n', flush=True)

def check(stat, msg):
    if stat != 0:
        say('Error: ' + msg); sys.exit()
say('Generating sites---------------------------------------------------------')
# this step is unchanged: genhtml all parts
# edit me
PARTS = ['Books', 'Programs', 'Posts', 'Author', 'Training', 'OldMain', 'Class']
GENER = [part for part in PARTS if part not in ['Class']] # or set() - set()
for gendir in GENER:
    say('Generating ' + gendir)
    os.chdir(join(gendir, 'Current'))
    stat = os.system('%s /MY-STUFF/Code/genhtml/genhtml.py' % python)   # works on '.'
    check(stat, 'genhtml failed')
    os.chdir(homedir)
say('Collecting sites---------------------------------------------------------')
# this step is unchanged: make a single union dir
FROMS = [(join(part, 'Current', 'Complete') if part in GENER else part)
         for part in PARTS]
# favicon.ico, .htaccess => in Books only
DUPOK = ('PythonPowered.gif', '.DS_Store', '_main.css') # zip drops cruft later!
if os.path.exists(thedir):
    shutil.rmtree(thedir)
os.mkdir(thedir)
def copy(item, dest):
"""
retains original files' modtimes, as does the zip:
this is a bit gray, but the history can be useful,
and this avoids full copies on incremental backups;
NOTE: this follows any symlinks, and copies what
they reference (not the links); fix if it matters;
"""
if os.path.isdir(item):
shutil.copytree(item, dest) # does copy2() = content + stat
else:
shutil.copyfile(item, dest) # content only
shutil.copystat(item, dest) # retain modtime and mode (permissions)
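# [sketch, not used by this script] a variant of copy() that recreates
# symlinks instead of following them, as suggested by the NOTE above;
# a minimal, untested alternative, assuming the host's zip/unzip tools
# also preserve links;
def copylinks(item, dest):
    if os.path.islink(item):
        os.symlink(os.readlink(item), dest)          # remake the link itself
    elif os.path.isdir(item):
        shutil.copytree(item, dest, symlinks=True)   # keep links in subtrees
    else:
        shutil.copyfile(item, dest)                  # content only
        shutil.copystat(item, dest)                  # retain modtime and mode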
# merge all into site's root folder
for root in FROMS:
    print('\nCopying part', root)
    for item in os.listdir(root):
        if item in os.listdir(thedir) and item not in DUPOK:
            print('\tDuplicate item: %s in %s' % (item, root))   # and fix!
            sys.exit()
        else:
            if verbose: print('\tCopying %s from %s' % (item, root))
            copy(join(root, item), join(thedir, item))
# Feb-2018: fix READMEs for Apache autoindex pages (all "Add": UNION rebuilt)
print('\nFixing READMEs in', thedir)
stat = os.system('python fix-readmes.py ' + thedir)
check(stat, 'fix-readmes failed') # not critical, but should fix
say('Zipping sites to zipfiles------------------------------------------------')
# this step is changed: make two half zips by listing names
halves = {}
if not ZIPDIR:
    print('--Skipping union dir zips--')
else:
    # run zip in union dir with source="*" so can unzip in site root directly,
    # else must unzip to temp folder and move all items (or dir) on server;
    # "zip-create.py zipfile folder" records items as nested in a folder
    # instead: that requires a post-zip move, but may be arguably safer;
    # nit: the os.system assumes names are quoted or have no spaces in them;
    #
    # Feb18: len(all) // 2 now leaves half2 @ 2.3g; use larger half1 ratio;
    # could use sizes (not names), but may be better handled in ziptools;
    # manually quote filenames (no shell), or use shlex.quote() in py 3.3+;
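    # [sketch, not used] one possible size-based split, per the note above
    # and the main docstring: greedily pack top-level items into as many
    # parts as needed to keep each part under a byte limit (a hypothetical
    # helper, not part of this script's original algorithm);
    def splitbysize(items, limit=2 * 1024**3):
        def itemsize(path):                            # total bytes under path
            if not os.path.isdir(path):
                return os.path.getsize(path)
            return sum(os.path.getsize(join(dirpath, name))
                       for (dirpath, subs, names) in os.walk(path)
                       for name in names)
        parts, size = [[]], 0
        for item in items:
            nbytes = itemsize(item)
            if parts[-1] and size + nbytes > limit:    # start a new part
                parts.append([])
                size = 0
            parts[-1].append(item)
            size += nbytes
        return parts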
    os.chdir(thedir)
    #extras = ('.htaccess')   # not in shell * expansion (don't care here)
    ziphome = '/MY-STUFF/Code/mergeall/test/ziptools'   # edit me
    all = os.listdir('.')
    # original: list halves
    #mid = len(all) // 2   # by name, evenly
    # alternative 1: numeric fudge
    #mid = len(all) // 2   # yuck, but sufficient
    #mid = mid + (len(all) // 5) + (len(all) // 30) - 1   # fix me to use sizes?
    # alternative 2: by manual name (edit me)   # also yuck, but better
    mid = all.index('pygadgets-products')   # pick half2 start point
    halves['half1'], halves['half2'] = all[:mid], all[mid:]
    assert halves['half1'] + halves['half2'] == all
    # bail if no non-hidden items in either (or empty)
    if (not any(not item.startswith('.') for item in halves['half1']) or
        not any(not item.startswith('.') for item in halves['half2'])):
        assert False, 'Cannot split zips into halves'
    for half in sorted(halves):
        print('\nZipping', half)
        thezip = '%s-%s.zip' % (thedir, half)   # build zip name
        quoted = ("'%s'" % item for item in halves[half])   # quote, separate
        items = ' '.join(quoted)   # or shlex.quote()
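        # [sketch] the shlex.quote() route mentioned above, assuming Python
        # 3.3+; it escapes arbitrary names for the shell command line built
        # next (the manual single-quoting above breaks on embedded quotes):
        #
        #   import shlex
        #   items = ' '.join(shlex.quote(item) for item in halves[half])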
        zipcmd = '%s %s/zip-create.py %s/%s %s -skipcruft' % \
                 (python, ziphome,
                  homedir, thezip,
                  items)
        stat = os.system(zipcmd)
        check(stat, 'zip-create failed')   # see learning-python.com/ziptools.html
    os.chdir(homedir)
if KEEPDIR:
    print('--Retaining union folder--')
else:
    shutil.rmtree(thedir)   # or keep around for testing
say('Uploading sites zipfiles-------------------------------------------------')
# this step is changed: upload two zipfiles
if not UPLOAD:
    print('--Skipping upload step--')
else:
    import ftplib
    from getpass import getpass
    remotesite = 'learning-python.com'   # edit me
    remotedir = '.'
    remoteuser = input('User name? ')
    remotepass = getpass('Password for %s on %s: ' % (remoteuser, remotesite))
    for half in sorted(halves):
        thezip = '%s-%s.zip' % (thedir, half)
        # upload call is atomic
        zipsize = os.path.getsize(thezip)
        print('Uploading site zipfile %s, %d bytes...' % (thezip, zipsize))
        connection = ftplib.FTP(remotesite)        # connect to FTP site
        connection.login(remoteuser, remotepass)   # log in as user/password
        connection.cwd(remotedir)                  # cd to directory to xfer
        localfile = open(thezip, 'rb')
        connection.storbinary('STOR ' + thezip, localfile)   # xfer zip in binary mode
        localfile.close()
        connection.quit()
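    # [sketch] if the host supports FTPS, ftplib.FTP_TLS could replace the
    # plain FTP object above to avoid sending the password in cleartext;
    # a possible drop-in, not verified against this site's server:
    #
    #   connection = ftplib.FTP_TLS(remotesite)
    #   connection.login(remoteuser, remotepass)
    #   connection.prot_p()    # encrypt the data connection too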
say('Done.--------------------------------------------------------------------')
thezips = ' and '.join('%s-%s.zip' % (thedir, half) for half in sorted(halves))
if KEEPDIR:
    print('See the combination site in local folder %s.' % thedir)
if ZIPDIR:
    print('See the zipfiles %s in the local root folder.' % thezips)
if UPLOAD:
    print('Ssh to user@domain and move+unzip %s in the site HTML root folder.' % thezips)