PLEASE NOTE: this script is posted here for its genhtml and ziptools usage
examples only, and no longer reflects the version currently in use.  Its
algorithm is much in need of a major redesign; a suggested exercise...
(see the rough size-based sketch at the end of this posting).

----

#!/usr/bin/env python3
"""
================================================================================
publish-halves.py: part of genhtml (with same copyright, author, and license).

A last-resort option for large sites if Unix "unzip" fails and no recent
Python 2.X or 3.X (or other usable unzip tool) is available.

This is a variant of PUBLISH.py that zips and uploads content in halves
to avoid Unix "unzip" issues when size is over the 2.1G ZIP64 cutoff.
Its 2-file zip step (only) is similar to running 2 Unix zip commands,
though Unix zip may not handle cruft or links as well (also see
POSTSCRIPT ahead):

   $ zip -r UNION-1.zip UNION/.htaccess html/[2-n]*
   $ zip -r UNION-2.zip UNION/[o-z]*

Both ziptools' zip-extract.py and the Mac's Archive Utility (Finder clicks)
handle files above this size created by zip-create.py (and hence, Python's
zipfile), but some command-line Unix unzips may not extract correctly unless
split up as done here.  The "unzip" on Mac and Godaddy.com both failed,
though both also have a Python 2.7 that can generally run zip-extract.py
(as long as Python 2.X's zipfile doesn't bomb on odd Unicode filenames!).

This script is meant only for large files, and for hosts with just Unix
"unzip" and no way to run ziptools' zip_extract.py with a recent Python 3.X
or 2.X.  When possible, use PUBLISH.py and tools that allow a single-file
unzip; upload ziptools and run zip_extract.py if a recent Python is available.

Ideally, this script should be size- (not name-) based, and should produce
as many parts as needed to keep all under the size limit, but this sufficed
for the subject site (2.3G at last count [Feb-18: now 3.36G/3.28G raw/zipped]).
See PUBLISH.py for more general docs omitted here.

----
UPDATE, Feb-2018: fix-readmes.py
Added a post-generate fix-readmes.py run step, to copy any local README*.txt
to _README*.txt for broken Apache autoindex pages.  This ensures that the
local UNION copy matches the remote site (see fix-readmes.py for details).

----
POSTSCRIPT, Feb-2018: versus "zip -s"
It should be noted that some (many?) Unix command-line zips support a "-s"
option which limits zipfiles to a given size, and produces multiple files
if needed.  For example, to zip the UNION folder into two files, UNION.zip
and UNION.z01, neither of which is larger than the 2.1G ZIP64 constraint:

   $ zip -s 2g -r UNION.zip UNION

This works, but it's not as portable (or fun) as a Python solution, and was
not required when this script was first developed.  It may be easier than
ad-hoc name divisions if simple name-list halves don't suffice, though this
script should really split on sizes anyhow (barring a ziptools extension).

The real problem with the Unix zip "-s" option, though, is that its split
multi-file archives ARE NOT YET SUPPORTED by Unix unzips in common use
(yes, !!).  Splits work on Mac OS only if all parts are combined as follows:

   $ zip -s- UNION.zip --out single.zip
   $ unzip -d . single.zip

This workaround may or may not work on other platforms - which is one of
the main reasons for coding portable Python alternatives in the first place.
================================================================================
"""

import sys, os, shutil
join = os.path.join

# switches
KEEPDIR = True            # retain zipped union folder for testing?
ZIPDIR  = 0#True          # zip union upload dir into two files?
UPLOAD  = 0#True          # upload zipped file by ftp automatically?

thedir  = 'UNION'         # where final joined content appears
verbose = False           # trace file copies? (else just folders)
homedir = os.getcwd()     # zipfile name generated from thedir here
python  = sys.executable

def say(msg):
    print('\n\n' + msg + '\n', flush=True)

def check(stat, msg):
    if stat != 0:
        say('Error: ' + msg); sys.exit()


say('Generating sites---------------------------------------------------------')

# this step is unchanged: genhtml all parts

# edit me
PARTS = ['Books', 'Programs', 'Posts', 'Author', 'Training', 'OldMain', 'Class']
GENER = [part for part in PARTS if part not in ['Class']]    # or set() - set()

for gendir in GENER:
    say('Generating ' + gendir)
    os.chdir(join(gendir, 'Current'))
    stat = os.system('%s /MY-STUFF/Code/genhtml/genhtml.py' % python)   # works on '.'
    check(stat, 'genhtml failed')
    os.chdir(homedir)


say('Collecting sites---------------------------------------------------------')

# this step is unchanged: make a single union dir

FROMS = [(join(part, 'Current', 'Complete') if part in GENER else part)
         for part in PARTS]

# favicon.ico, .htaccess => in Books only
DUPOK = ('PythonPowered.gif', '.DS_Store', '_main.css')    # zip drops cruft later!

if os.path.exists(thedir):
    shutil.rmtree(thedir)
os.mkdir(thedir)

def copy(item, dest):
    """
    retains original files' modtimes, as does the zip: this is a bit gray,
    but the history can be useful, and this avoids full copies on
    incremental backups;
    NOTE: this follows any symlinks, and copies what they reference (not
    the links); fix if it matters;
    """
    if os.path.isdir(item):
        shutil.copytree(item, dest)     # does copy2() = content + stat
    else:
        shutil.copyfile(item, dest)     # content only
        shutil.copystat(item, dest)     # retain modtime and mode (permissions)
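# What follows is a rough, optional sketch of a link-preserving variant of
# copy(), per the NOTE in its docstring; it is illustrative only and is not
# called anywhere in this script (the name and the need for it are assumptions):

def copy_keeping_links(item, dest):
    if os.path.islink(item):
        os.symlink(os.readlink(item), dest)            # recreate the link itself
    elif os.path.isdir(item):
        shutil.copytree(item, dest, symlinks=True)     # keep links inside trees
    else:
        shutil.copyfile(item, dest)                    # content only
        shutil.copystat(item, dest)                    # retain modtime and mode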
# merge all into site's root folder
for root in FROMS:
    print('\nCopying part', root)
    for item in os.listdir(root):
        if item in os.listdir(thedir) and item not in DUPOK:
            print('\tDuplicate item: %s in %s' % (item, root))    # and fix!
            sys.exit()
        else:
            if verbose: print('\tCopying %s from %s' % (item, root))
            copy(join(root, item), join(thedir, item))

# Feb-2018: fix READMEs for Apache autoindex pages (all "Add": UNION rebuilt)
print('\nFixing READMEs in', thedir)
stat = os.system('%s fix-readmes.py %s' % (python, thedir))    # same interpreter
check(stat, 'fix-readmes failed')    # not critical, but should fix


say('Zipping sites to zipfiles------------------------------------------------')

# this step is changed: make two half zips by listing names

halves = {}
if not ZIPDIR:
    print('--Skipping union dir zips--')
else:
    # run zip in union dir with source="*" so can unzip in site root directly,
    # else must unzip to temp folder and move all items (or dir) on server;
    # "zip-create.py zipfile folder" records items as nested in a folder
    # instead: that requires a post-zip move, but may be arguably safer;
    # nit: the os.system assumes names are quoted or have no spaces in them;
    #
    # Feb18: len(all) // 2 now leaves half2 @ 2.3g; use larger half1 ratio;
    # could use sizes (not names), but may be better handled in ziptools;
    # manually quote filenames (no shell), or use shlex.quote() in py 3.3+
    # (a no-shell sketch also follows this step below);

    os.chdir(thedir)
    #extras = ('.htaccess')    # not in shell * expansion (don't care here)
    ziphome = '/MY-STUFF/Code/mergeall/test/ziptools'    # edit me
    all = os.listdir('.')

    # original: list halves
    #mid = len(all) // 2     # by name, evenly

    # alternative 1: numeric fudge
    #mid = len(all) // 2                                  # yuck, but sufficient
    #mid = mid + (len(all) // 5) + (len(all) // 30) - 1   # fix me to use sizes?

    # alternative 2: by manual name (edit me)
    # also yuck, but better
    mid = all.index('pygadgets-products')    # pick half2 start point

    halves['half1'], halves['half2'] = all[:mid], all[mid:]
    assert halves['half1'] + halves['half2'] == all

    # bail if no non-hidden items in either (or empty)
    if (not any(not item.startswith('.') for item in halves['half1']) or
        not any(not item.startswith('.') for item in halves['half2'])):
        assert False, 'Cannot split zips into halves'

    for half in sorted(halves):
        print('\nZipping', half)
        thezip = '%s-%s.zip' % (thedir, half)                 # build zip name
        quoted = ("'%s'" % item for item in halves[half])     # quote, separate
        items  = ' '.join(quoted)                             # or shlex.quote()
        zipcmd = '%s %s/zip-create.py %s/%s %s -skipcruft' % \
                     (python, ziphome, homedir, thezip, items)
        stat = os.system(zipcmd)
        check(stat, 'zip-create failed')    # see learning-python.com/ziptools.html

    os.chdir(homedir)
    if KEEPDIR:
        print('--Retaining union folder--')
    else:
        shutil.rmtree(thedir)    # or keep around for testing
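# What follows is a rough, optional sketch of running zip-create.py without a
# shell, which sidesteps the filename-quoting nit noted in the zipping step
# above; it is illustrative only and is not called by this script (it assumes
# the same ziphome, homedir, and argument order as the os.system command):

import subprocess

def run_zipcreate_noshell(thezip, items):
    cmd = ([python, ziphome + '/zip-create.py', join(homedir, thezip)] +
           list(items) + ['-skipcruft'])
    return subprocess.call(cmd)    # list args: spaces in names need no quoting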
say('Uploading sites zipfiles-------------------------------------------------')

# this step is changed: upload two zipfiles

if not UPLOAD:
    print('--Skipping upload step--')
else:
    import ftplib
    from getpass import getpass
    remotesite = 'learning-python.com'    # edit me
    remotedir  = '.'
    remoteuser = input('User name? ')
    remotepass = getpass('Password for %s on %s: ' % (remoteuser, remotesite))

    for half in sorted(halves):
        thezip  = '%s-%s.zip' % (thedir, half)    # upload call is atomic
        zipsize = os.path.getsize(thezip)
        print('Uploading site zipfile %s, %d bytes...' % (thezip, zipsize))

        connection = ftplib.FTP(remotesite)           # connect to FTP site
        connection.login(remoteuser, remotepass)      # log in as user/password
        connection.cwd(remotedir)                     # cd to directory to xfer
        localfile = open(thezip, 'rb')
        connection.storbinary('STOR ' + thezip, localfile)   # xfer zip in binary mode
        localfile.close()
        connection.quit()


say('Done.--------------------------------------------------------------------')

thezips = ' and '.join('%s-%s.zip' % (thedir, half) for half in sorted(halves))
if KEEPDIR:
    print('See the combination site in local folder %s.' % thedir)
if ZIPDIR:
    print('See the zipfiles %s in the local root folder.' % thezips)
if UPLOAD:
    print('Ssh to user@domain and move+unzip %s in the site HTML root folder.' % thezips)
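----

A size-based split, sketched: as the note above and the script's comments
suggest, the halving really belongs on item sizes, not names.  Below is a
rough, untested sketch of one way the suggested exercise might look: it packs
the union folder's top-level items into as many parts as needed to keep each
part under a byte limit.  The helper names and the 2.1G limit are illustrative
assumptions only, and a single item larger than the limit would still need
finer-grained handling (perhaps better done in ziptools itself):

import os
from os.path import join, islink, isdir, getsize

def itemsize(path):
    """total bytes in a file or folder tree, skipping symlinks"""
    if islink(path):
        return 0
    if not isdir(path):
        return getsize(path)
    total = 0
    for dirpath, subdirs, files in os.walk(path):
        total += sum(getsize(join(dirpath, name))
                     for name in files if not islink(join(dirpath, name)))
    return total

def splitbysize(folder, limit=2100 * 1000 * 1000):    # roughly the 2.1G cutoff
    """group a folder's top-level items into parts each under limit bytes"""
    parts, partsize = [[]], 0
    for item in sorted(os.listdir(folder)):
        size = itemsize(join(folder, item))
        if parts[-1] and partsize + size > limit:
            parts.append([])     # start a new part (an oversized single
            partsize = 0         # item still gets a part all its own)
        parts[-1].append(item)
        partsize += size
    return parts                 # one name list per part

Each list returned by splitbysize('UNION') could then drive one zip-create.py
command, much as halves['half1'] and halves['half2'] do in the script above.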