Programs Post-Release Updates

This page is a supplemental resource for the application programs and other software available on this site. Initially posted after the first major applications release on June 15, 2017, this page hosts both:

Why this page? Although each application program comes with a README.txt file that describes known issues and workarounds at the time of its latest publication, new items are bound to arise over time in any nontrivial software. This page documents issues uncovered after major releases and hence unmentioned in prior versions' documentation, and logs later software releases. Users are invited to consider this page a virtual docs appendix, and check back for updates over time.

The latest content update here: September 28, 2018.

New Releases

This section announces new releases of published applications and programs. Items here:

Aug-14-2017 Frigcal and PyMailGUI Mac Apps Rerelease: Broken-Pipe Workaround
Sep-28-2017 PyGadgets GUI "Toy Box" Added to the Applications Lineup
Oct-5-2017 PyGadgets Rerelease: PyClock Optimization + Hand Redraws Fix
Oct-17-2017 New Industrial-Strength Version of the tagpix Photos Organizer
Nov-1-2017 PyGadgets Rerelease: PyPhoto Mod for Single-File Thumbnails Cache
Dec-9-2017 Mergeall Rerelease: Folder Modtimes, Linux Flushes, Scripts "-u"
Apr-28-2018 All Apps' User Guides: New Mobile-Friendly Versions Online
Sep-28-2018 PyGadgets Rerelease: PyPhoto Image Auto-Rotation + Pillow Fixes

 

Frigcal and PyMailGUI Mac Apps Rerelease: Broken-Pipe Workaround (Aug-14-2017)

The Mac app packages of the Frigcal and PyMailGUI programs were rereleased on August 14, 2017, to address a rare output error that likely stems from an issue in the Mac's app-launcher system. Users of prior releases of the Mac app versions of Frigcal and PyMailGUI are encouraged to fetch and install the new versions; see your program's main README file for upgrade pointers. The source-code packages of these two programs were also rereleased with the fix's code, but just as examples for developers; the error occurs only when these two programs are run as Mac apps—not when they are run as source code on Macs, and never on other platforms. No other program packages were updated.

The complete description of the fix applied to Frigcal and PyMailGUI apps is off-page here, because it is mostly of interest to developers. In short, the main GUI in these two programs runs as a child process spawned by a launcher GUI. Likely due to a misfeature or bug in the Mac's app-launching system, it was not impossible that printed console output generated by these apps' main GUIs could eventually trigger a broken-pipe exception after the launcher GUI exited. Though rare, generally harmless, and an issue in Mac apps only, these failures would manifest as GUI popup error messages naming "Broken Pipe" as cause. The only observed consequence to date was the need to rerun a calendar save operation in Frigcal just once in nearly one year of daily usage.

[Back to Index]

 

PyGadgets GUI "Toy Box" Added to the Applications Lineup (Sep-28-2017)

PyGadgets, a set of 4 smaller GUI programs, was released as an addition to the applications available on this site. Its main page lives here, and its entry on the main programs page is here. Though less broadly focused than the other applications here, the PyGadgets toolset—a calculator, clock, photo viewer, and game—are both potentially useful and educational; work the same on all major desktop platforms; and are available as both stand-alone executables and full source code. Note that this was a separate, additional package release; no other application packages were updated.

[Back to Index]

 

PyGadgets Rerelease: PyClock Optimization + Hand Redraws Fix (Oct-5-2017)

PyGadgets was rereleased (all its packages) to fix a minor defect in PyClock, which delayed redraws of the analog display's minute and hour hands too long in some contexts. These two hands are normally not redrawn until the second hand reaches 12, as a valid and important optimization. This delay can be problematic, though, if used after any state that precludes analog clock updates — including window minimization, digital-display mode, system suspend, and menu and modal-dialog view on some platforms. In all such cases, the analog time might not be current or correct until the second hand again reaches 12. To avoid this lag, PyClock now monitors the last-analog-redraw time to force all hands to be redrawn immediately in all these contexts.

As a related optimization, PyClock now also avoids updating the analog display's AM/PM label every second, just like the minute and hour hands. This seems to have further reduced the memory leakage that occurs while the analog display window is open on Macs: open-window growth is now just 1M per 20 minutes, which translates to 3M/hour, 72M/day, and 500M/week. This is roughly half what it was formerly — a compelling reason to retain the prior paragraph's optimization — and some popular Mac apps use substantially more memory over time (see your Mac Activity Monitor). Oddly, the digital display now leaks memory fastest, but is likely lesser-used; optimizing it remains a low-priority TBD.

Also changed PyCalc to automatically set the focus on new "cmd" popups' entry fields (which also shifts focus to their windows on Windows), and added a Mac All Desktops tip (below) to the README.

[Back to Index]

 

New Industrial-Strength Version of the tagpix Photos Organizer (Oct-17-2017)

tagpix—a command-line script used to normalize photo collections—has been released in a much-enhanced version, with new support for duplicates resolution, error recovery, by-year grouping, and much more. For the full story, including a list of changes in this version, see the new User Guide. To download a copy, visit the tagpix web page.

Update: tagpix was released again in 2018 with new support for copy-based file transfer modes, customizable subfolder skips, prior-result-deletion verifications, and improved duplicates handling. For more details, see its release notes.

[Back to Index]

 

PyGadgets Rerelease: PyPhoto Mod for Single-File Thumbnails Cache (Nov-1-2017)

PyGadgets was rereleased (all its packages) to include a new version of PyPhoto, which stores a folder's thumbnail images in a single pickle file, instead of individual image files in a subfolder. This new design requires no more space or time, but avoids extra files (15k images formerly meant 15k thumbnail files); multiple file loads and saves; and nasty modtime-copy issues for backup programs too rare and complex to cover here. See the source code for full details.

This is a mildly backward-incompatible change. Prior PyGadgets/PyPhoto version users: when upgrading to a newer release, run the included delete-pyphoto2.0-thumbs-folders.py script (or its frozen executable) from a command line to delete all former PyPhoto thumbnail subfolders. This program is run with no command line arguments, and asks for a folder path and delete verifications. In the Mac app, it's an executable at PyGadgets.app/Contents/MacOS (see Show Package Contents); in other packages, it's in your install folder. PyPhoto will still work if you don't delete the former subfolders, but they will be unused trash.

Apart from this compatibility fix, the main user-visible artifact of this upgrade is a single _PyPhoto-thumbs.pkl file per opened folder, instead of the former thumbs subfolder. This thumbnail cache is still built on first open and auto-updated on changes as before, to make later opens fast. Though less prominent, the new PyPhoto also displays images in the same order everywhere; uses a placeholder thumbnail for photos with errors (instead of omitting); and works on unwriteable folders (though slowly).

PyPhoto now also supports a new NoThumbChanges configuration-file setting and command-line argument, which can be used to prevent rare but spurious thumbnail regenerations for large, static archives, when file modification times are skewed between platforms or filesystems. See PyGadgets_configs.py. This setting's False default need not be changed in typical usage. For example, PyPhoto photo archives and thumb caches have been seen to work correctly without this change when used on a single platform, burned to BD-R discs, or transferred between Mac OS and Windows on exFAT drives.

This release also applied a minor change to PyCalc, to allow fractional floating-point numbers to be entered with a leading . instead of 0.. To halve a number, for instance, 24 * .5 and 24 * 0.5 both now work. Numbers with E exponents allow both forms too: .1E-99 or 0.1E-99.

[Back to Index]

 

Mergeall Rerelease: Folder Modtimes, Linux Flushes, Scripts "-u" (Dec-9-2017)

Version 3.1 of the Mergeall content backup and propagation application includes multiple minor enhancements for all download packages, and is a recommended upgrade for all users (be sure to save and restore your mergeall_configs.py customizations). In this release:

For the full story on these changes, see the release notes.

[Back to Index]

 

PyGadgets Rerelease: PyPhoto Image Auto-Rotation + Pillow Fixes (Sep-28-2018)

The PyPhoto image-viewer GUI program included in the PyGadgets bundle has been updated to include three enhancements, the first two which were borrowed from the thumbspage image-gallery builder. The new PyPhoto:

This PyPhoto upgrade is now available in all PyGadgets download packages. Users of the prior PyPhoto version should delete their _PyPhoto-thumbs.pkl cache files in image folders to enable the image auto-rotation enhancement. Otherwise, images will rotate when viewed, but already-created thumbnails will remain askew till deleted and recreated.

For more details, see the version 2.2 release notes in code files here and here

[Back to Index]

New Usage Pointers

This section collects usage pointers for published applications that arose after releases, and hence are not covered in earlier releases' documentation. Items here:

Jun-2017 All Apps: Ignore First-Run Warnings
Jun-2017 Windows Exes: Don't Install to "C:\Program Files"
Jun-2017 Mac Apps and Source: Homebrew Tk 8.6 is DOA
Aug-2017 PyEdit Auto-Saves: Converting from UTF-8 Encoding
Aug-2017 Mac Apps: Avoid Dock-Menu Zombies with 3-Finger Downswipes
Aug-2017 Mac Apps: Avoid Installing to Desktop if Apps Fail?
Sep-2017 PyEdit RunCode: Package Imports May Require __init__.py Files
Sep-2017 Mac Apps: Assign to All Desktops for Quick Access Everywhere
Oct-2017 PyEdit on Mac: Restart if Memory Use Grows Too High
Oct-2017 Mergeall: Mac OS Sierra's Finder Hides ".DS_Store" Files
Nov-2017 Mergeall: Tailing Redirected Output of the diffall Utility
Dec-2017 Mergeall: Unzipped Files May Trigger Differences and Copies
Mar-2018 PyEdit: Dropping the BOM in Unicode Files
Mar-2018 PyEdit: More About Unicode Encoding Defaults
Apr-2018 All Apps: User Guides Mobile-Friendly Makeover
May-2018 Mac Source: New Python Installers Include Tk 8.6
Aug-2018 PyGadgets on Mac: PyPhoto Source, "Too many open files"
Sep-2018 showcode and PyEdit: The Perils of Mixed Unicode Encodings
Sep-2018 Mergeall on Android: It Runs — but It Doesn't Work

 

All Apps: Ignore First-Run Warnings (Jun-2017)

On some Mac OS X systems, and on Windows 7, 8, and 10, you may get a warning when first trying to use this site's applications, because of defaults regarding unverified sources — arguably overkill at best, and a step towards proprietary lockdown at worst. You can safely ignore these, but may have to approve a program the first time you use it; see the warning popup for more details. On Macs, for example, Open in the 2-finger-press or control-click menu approves a program quickly and permanently (and may be faster than opening the security-preferences screen).

This inconvenience is regrettable, but this site's proprietor is an independent developer who does not work for Apple or Microsoft, and has no interest in the supplication inherent in program registration. Some web browsers can be Orwellian about zip files too; and Windows 10 S, unfortunately, is right out.

[Back to Index]

 

Windows Exes: Don't Install to "C:\Program Files" (Jun-2017)

On Windows, you should generally avoid saving unzipped exe folders in C:\Program Files because neither you nor programs may have permission to save files there. The current program READMEs suggest this location as one possibility, but this can lead to issues. For instance, using that folder can complicate config-file edits (you may need to run editors as administrator), and can even prevent the Frigcal launcher from closing (it won't find a sentinel file because one cannot be written in its install folder). To avoid such issues, save your unzipped exe folders to your Desktop or elsewhere instead.

[Back to Index]

 

Mac Apps and Source: Homebrew Tk 8.6 is DOA (Jun-2017..May-2018)

On Mac OS X, the apps have not been tested or built with Python 3.X and Tk 8.6 from Homebrew (a leading alternative distribution that supports the newer Tk) because the Homebrew install is currently broken—Python and Tk build correctly, but crash immediately with an "Abort trap: 6." This is a widespread issue that impacts both app builds and source-code use, and makes it difficult to explore possible fixes for Tk 8.5 Mac issues (e.g., Dock zombies and scroll speed). Given the requirements of a manual build, this effectively puts further research on hold.

You can read others' reports about this issue here and here. The latter includes a curt refusal to fix from the project, delegating the job to impacted users. No, really. It's not clear where this bug lies, but projects that publish a product clearly have some responsibility for that product. That is: broken + rude = punt; Homebrew is currently neither viable nor recommended. Hopefully, python.org's Mac Python3 will support Tk 8.6 soon.

Update: as of August 7, it appears that this critical Homebrew Tk 8.6 bug may have been fixed, per later posts on the GitHub thread. Apparently, an early release of the next version of Tk was required. This is good news if true (and the Mac apps here may be revisited soon), but the outright crash on startup doesn't exactly instill confidence in Homebrew and/or its Tk on Macs going forward. To be fair, though, the Mac's rich user interface is stateful enough to pose challenges to any GUI toolkit.

Update: after testing Homebrew Python 3.6 + Tk 8.6 on Mac OS Sierra in September, 2017, it now appears that Tk 8.6 on Mac is a non-starter. It does indeed fix the Dock menu zombies problem of ActiveState's Tk 8.5. But 8.6 also:

Alas, Tk on Mac is not always all it should be. It's possible to use it for programs like those on this site, but this requires substantial workarounds (of the duct-tape-and-twine sort), and the resulting programs have to live with a set of defects that varies per Mac Tk release. If you're developing commercial-grade GUIs, see PyQt for one portable alternative, and PyObjC for a non-portable option; and mind the pitfalls inherent in development under the open-source "batteries included" banner, and its proprietary cousins.

Update: as of spring 2018, python.org now offers Python 2.7 and 3.6 installers for Mac OS 10.9 and later that include Tk 8.6. This is welcome news and obviates the need to install Tk separately from third-party sources, though early testing suggests that the new install may pose issues and bugs of its own. See the details ahead.

[Back to Index]

 

PyEdit Auto-Saves: Converting from UTF-8 Encoding (Aug-2017)

A technical note for PyEdit users only: as documented in PyEdit's UserGuide.html and textConfig.py, its auto-save feature always writes text with unsaved changes to files using the general UTF-8 Unicode encoding scheme. This is necessary, because there may be no known encoding (e.g., the text may not yet have been saved to a file), and a known encoding may fail (e.g., Unicode symbols may be inserted into the text of a file originally opened as ASCII, precluding ASCII encoding on saves).

This UTF-8 policy may cause issues, however, for HTML files that declare a different encoding explicitly using <meta> tags. If you must recover such a file from the auto-saves folder, you can either change its <meta> tag to declare UTF-8, or convert its text back to the original encoding. For the latter, you can easily restore the original encoding by:

  1. Opening the auto-saved file in PyEdit as UTF-8
  2. Clicking the File=>Save As menu action (or typing its control/Alt-s accelerator)
  3. Entering your desired Unicode encoding name in the popup issued in the GUI

Naturally, this assumes your text is still compatible with the encoding you enter (else, PyEdit will generally fall back on UTF-8 again), and be sure to use "Save As" ("Save" silently uses the encoding provided on "Open" if the text came from a file). This issue is both rare and subtle, but unavoidable in files with explicit and usage-specific encoding declarations that may diverge from actual content. For more encoding-conversion options, see savesUseKnownEncoding in textConfig.py, and the command-line conversion utility script unicodemod.

[Back to Index]

 

Mac Apps: Avoid Dock-Menu Zombies with 3-Finger Downswipes (Aug-2017)

As mentioned in the main README files of all the complete applications available on this site, the current releases of the Mac apps can leave zombie entries for closed windows in their Dock menus due to a bug in the underlying Tk 8.5 GUI library used. This remains a to-be-fixed item, pending adoption of a new Tk version (which now seems unlikely; see above).

Fortunately, it turns out that there is a standard and easy way to see the apps' truly active windows anyhow: simply use a 3-finger downswipe on the trackpad (or its control+downarrow keyboard equivalent) to activate the "App Exposé" view of active app windows. This gesture can be performed on any of the app's windows, or on its Dock icon. When run on the Dock icon, this is no more difficult than opening the Dock menu with a 2-finger click, and yields an arguably better and full-screen display that does not include any zombie entries.

This is a standard Mac feature, but may be unknown to some users, and is mentioned only in passing in the apps' READMEs. You may need to enable it once in System Preferences by clicking the App Exposé checkmark. Once enabled, though, this provides a simple way to view an app's open windows, and is immune to Tk 8.5 zombies.

[Back to Index]

 

Mac Apps: Avoid Installing to Desktop if Apps Fail? (Aug-2017)

Similar to the Windows exes note above: one user has reported that the Frigcal Mac app can fail if copied to and run from Desktop on Mac OS X Sierra, because the app does not have permission to write files (Frigcal needs to create an initial calendar file on first run, and a sentinel file on each run). This couldn't be recreated on El Capitan or Sierra machines, may be user-specific, and is outside the apps' scope. If your apps have similar problems on your Desktop, though, copy them to your /Applications folder instead; this has the added advantage of adding the app to Launchpad for quick access.

[Back to Index]

 

PyEdit RunCode: Package Imports May Require __init__.py Files (Sep-2017)

For reasons to be determined, import statements in code run by PyEdit's RunCode can fail if they attempt to import a module package having no __init__.py file. That is, Python 3.3+ namespace packages don't seem to be fully operative in code run by a normal compile()/exec() pair, despite all the run-time context set up by PyEdit's code proxy. This may reflect an anomaly or bug in Python's import machinery (which changes so often as to be fairly accused of thrashing), and may require use of Python's runpy module or similar code.

Barring a future fix, though, the workaround is simple: simply make sure all your package folders to be used in RunCode have an __init__.py, even if it's completely empty. Unless your use case really requires namespace packages (and almost none do), an __init__.py is good and recommended practice anyhow. It makes your package imports more efficient, and your code's structure more explicit. For more on PyEdit's RunCode, see its Tools menu docs. For a primer on 3.3+ namespace packages, try Chapter 24 in Learning Python, 5th Edition.

[Back to Index]

 

Mac Apps: Assign to All Desktops for Quick Access Everywhere (Sep-2017)

Here's another Mac user-level tip that may not be obvious to everyone, but is especially relevant to utility programs like PyGadgets: to have access to an app on every Mac desktop, right-click its Dock app icon, select Options, and choose the Assign To section's All Desktops. Once you do so, single-clicking the minimized program's Dock icon will reopen it on whatever desktop you happen to be viewing at the time.

PyGadgets' calculator and clock, for example, are the sorts of desktop utilities you might want to access occasionally and quickly. Simply open once and set to All Desktops per above, then minimize when not in use, and click the Dock icon to reopen. This both displays an open gadget on every desktop, and reopens a hidden gadget immediately on the current desktop. Perhaps best of all, this avoids the annoying and attention-shattering desktop switches that occur by default when you reopen an app assigned to its single, original desktop.

Two fine points here. First, this works whether your Dock preferences set "Minimize windows into application icon" or not, but the All Desktops setting is in the Dock's application icon. Second, this can also be used for the Frigcal calendar GUI (which also reopens its month image on the current desktop), but may be less desirable for apps like the PyEdit text editor that create many windows or take special actions on Dock clicks (see Programs for both).

[Back to Index]

 

PyEdit on Mac: Restart if Memory Use Grows Too High (Oct-2017)

Though usually not a concern, PyEdit's memory usage on Mac OS might grow high if used for a long time without a restart. The exact cause remains to be isolated, but this seems to occur when using PyEdit's Run Code option to run edited programs; is noticeable only after intense work spanning multiple days; and isn't particularly grievous by Mac standards. The worst case to date saw PyEdit reach 2G memory (from its 36M start) on El Capitan, but it was still #3 on the worst-memory-offenders list at the time, behind both Firefox and WindowServer, and just ahead of Excel. Moreover, almost all of PyEdit's memory space was compressed (not in active use).

Still, if this grows problematic on your machine, the simplest solution is to periodically close all PyEdit windows and restart—an unfortunately common cure for Mac app ills. For a related topic, see the memory leak workarounds in the PyClock program of PyGadgets, covered in its README; though nonfatal, memory issues seem a recurring theme for Tk apps on Macs.

[Back to Index]

 

Mergeall: Mac OS Sierra's Finder Hides ".DS_Store" Files (Oct-2017)

The Mergeall backup/mirroring application goes to great lengths to avoid propagating "cruft" files (platform-specific trash), and in its User Guide points to the numerous .DS_Store hidden files on Mac OS as prime offenders. These files can be pathological on Macs for anyone involved in programming or content production, and were responsible for many of the changes required to support the Mac platform.

As of Mac OS Sierra (10.12), setting your defaults to display hidden files as described in that guide still works as before, but Finder has been special-cased to never display .DS_Store files. That is, the .DS_Store files are still there (and can be seen via a ls -a in Terminal, or an os.listdir() in Python), but Finder will no longer show them to you; even if you ask it to.

You can read more about this curious new Finder policy on the web. This seems the worst of both worlds. Not only does Finder still create these files in every folder you view (changing your folders' modification times in the process), but not displaying them can easily lead to major problems if they wind up being inadvertently uploaded, transferred, or otherwise included with actual content. Pretending a problem doesn't exist is not a valid solution to a problem—especially when users may have to pay the price for the deception!

Luckily, you can still take control of cruft like .DS_Store files with tools like Mergeall and ziptools that callout such items explicitly to help you minimize their impacts. We can also hope that Apple someday finds a better way to record Finder information than dumping it in hidden-but-real .DS_Store files all over your drives. Sadly, this still seems wishful thinking as of the new High Sierra and its oddly mandatory APFS filesystem.

Footnote: also in the oddly column, Mac OS High Sierra abruptly dropped the longstanding and widely used ftp client program, in yet another agendas-versus-customers move. See the web for discussion; in short, secure sftp is still present, but works only for sites that support it, and this doesn't help programs or users that relied on the functionality removed. Alas, open-source software is not the only domain where the whims of the few can rudely trounce the needs of the many. On the upside, a simple Python script can shatter many an Orwellian decree...

Postscript: and in the too-ironic-to-bear department, Mac OS High Sierra also came with a massive security flaw which allowed anyone to gain root access to a machine without a password, and required an emergency overnight patch. But ftp was too risky to ship.

[Back to Index]

 

Mergeall: Tailing Redirected Output of the diffall Utility (Nov-2017)

The Mergeall content backup/propagation program is usually run from its GUI launcher, but can also be run from a command-line, and includes some extra command-line scripts useful for managing archives. The most notable of the extras may be diffall, a program which does a byte-by-byte comparison of everything in two folder trees, as described in Mergeall's User Guide.

Because diffall can run for a long time on large trees, it's convenient on Unix to run it in the background and monitor its output file with a tail using command-lines like the following (typed in Terminal on Mac OS):

~$ python3 diffall.py /MY-STUFF /MY-COPY -skipcruft > Desktop/temp.txt &
~$ tail -f Desktop/temp.txt 

That works on Mac OS's El Capitan release, but not quite on its High Sierra. For reasons that aren't clear, when redirected to a file, Python 3.5's stdout stream—the target for basic print() calls—is not buffered (or not buffered as much) on the former, but is fully buffered on the latter. Hence the tail may not show anything for quite some time on High Sierra, and even then, will print only in spurts. Technically, El Capitan may buffer stdout too, but its buffer blocksize may be so small that its output appears regularly, while High Sierra's does not.

To make printed text show up in the output file immediately on both Mac OS versions (as well as other Unix-like platforms), pass Python's -u unbuffered flag in the first command above:

~$ python3 -u diffall.py /MY-STUFF /MY-COPY -skipcruft > Desktop/temp.txt &

Or, set the equivalent environment variable in your shell (e.g., in ~/.bash_profile) and skip the -u argument in the command line:

~$ vi ~/.bash_profile
export PYTHONUNBUFFERED=1

Either way, this forces Python print() calls to send their output immediately on all platforms, so that it can be watched with a Unix tail. Unfortunately, -u doesn't apply and the environment variable has no effect in the Mac app's frozen diffall executable, so app users will want to grab the source-code version to tail its stdout on platforms where stdout is buffered. This isn't required on El Capitan, because the frozen diffall's stdout is not buffered much there either (though to be fair, it's not clear which systems are broken!).

For more possible-but-unverified ideas, see also this discussion thread. Per preliminary testing, however, its export NSUnbufferedIO=YES suggestion appears to have no effect on the app's frozen diffall.

And if you're willing to change code, you can also reset sys.stdout to an object whose write() method always calls flush(), or, in Python 3.3+ only, use the extended form print(x, flush=True) for all prints. It's not clear that this should be done, though, as buffering is an optimization, and diffall's output can be large (e.g., it's 6MB big and 144K-lines long for an archive with 101K files and 10k folders); if implemented, this should probably be a diffall command-line option. Consider these suggested exercises—until the next Mergeall release (spoiler: it grew its own dedicated -u).

[Back to Index]

 

Mergeall: Unzipped Files May Trigger Differences and Copies (Dec-2017)

Short story: though rare, Mergeall may report unexpected differences for files extracted by unzipping a zip file, due to the odd and inconsistent way unzipping programs handle zipped modification times. There is no complete fix for this, but you can use the same unzip tool each time to lessen impact; allow Mergeall to recopy unzipped files after they are extracted; or avoid including frequently unzipped files in your archive—include their zip file instead.

Details: due to inconsistent handling of file-modification times across the many unzipping tools in use, it is not guaranteed that a given file's times will survive a zip and unzip combination. Just as for FAT32, zip files generally record file times in "local" time, which may be adjusted on unzips for both daylight savings time (DST) and time-zone changes. This can in turn throw off any program that relies on file-modification times, including Mergeall; its change-detection is fully dependent on timestamps.

As discussed in more detail in Mergeall's User Guide, the FAT32 issue can be addressed by using a different file system such as exFAT for cross-platform drives. The unzip issue, however, is much more thorny: an unzipping program may actually modify a file's recorded modification time as it recreates the file, and only for files last modified in a given time zone or DST phase. Hence, the differences reported by Mergeall are real but spurious (timestamps differ even if content does not), and globally adjusting all files' times up or down isn't an option (only a subset of files may have their times changed on extracts).

Perhaps worse, different unzip tools may apply time-adjustment rules differently, precluding an automatic workaround. The ziptools system available at this site, for example, defers to the local-time handling of Python's zipfile and time modules, which has been observed to differ from that of the Archive Manager used by Finder on Mac OS. For more background on this issue, try a web search like this one.

The upshot of all these factors is that Mergeall may report differences and run recopies for arbitrarily many files in an archive after they are re-unzipped from a zip file. This is a rare issue (and has arisen just once in 4 years of regular Mergeall use), but has no absolute fix. It may be minimized by using the same unzipping tool every time for a given set of files (see ziptools for a portable option). Barring this, you'll need to allow Mergeall to recopy the unzipped files that differ after unzips, or avoid keeping their unzipped versions in a Mergeall archive tree in the first place. The latter may be the simplest approach for files that will be unzipped often.

Interestingly, standard zip-file times are also limited to two-second precision just like FAT32, but Mergeall automatically accommodates this thanks to former fixes. Zip files' bizarre munging of time can also impact thumbnail-change management in the PyPhoto gadget, but in this context would simply trigger one-time thumb rebuilds. Other programs may fare worse after unzips.

The real solution here, of course, lies in either abandoning zip files altogether, or standardizing time formats across all computer systems in use today. Given both the popularity of zip and this industry's tendency towards fragmentation and flux, the odds of either solution appearing in our lifetimes seem about as good as those of an open-source project settling on a feature set...

[Back to Index]

 

PyEdit: Dropping the BOM in Unicode Files (Mar-2018)

Short story: if you wish to use PyEdit to edit Unicode text files that begin with a BOM character, be sure to open them with an encoding name that discards the BOM if present (e.g., 'utf-8-sig' for UTF-8, and 'utf-16' for UTF-16). You can also delete their BOMs permanently by opening the same way and saving with an encoding that doesn't add a BOM on output (e.g., 'utf-8' for UTF-8), or removing the BOM in the edit window as it is displayed. PyEdit doesn't add BOMs unless your encodings ask it to, but other editors may insert them automatically. If not accommodated or removed, a BOM will make the first line render oddly and difficult to edit, though the effect varies per platform.

The Issue

To understand this issue, you need to know a bit about one of Unicode's darker corners. In brief, text may start with an identifying marker known as a BOM, in the UTF-8, UTF-16, and UTF-32 encoding schemes. This marker can be used to declare the bit order (little- or big-endian) and encoding scheme of the encoded text that follows.

Widely used UTF-8 files, for example, can begin with a BOM or not. When present, the BOM in such files is a nonprintable Unicode character with code point \ufeff, which is encoded as bytes b'\xef\xbb\xbf'. Because encodings handle BOMs differently, selecting the right one can be crucial. In Python (and Python programs like PyEdit), neither 'utf-8' nor 'utf-8-sig' require a BOM to be present, but only the latter discards a BOM on input and adds one back on output.

It's easy to see this in code. A binary-mode file read always retains an encoded BOM at the front, and text mode gives the BOM's decoded code point unless it is discarded by 'utf-8-sig'. Here's the story for a BOM-laden UTF-8 file in Python 3.X, the version PyEdit uses (codecs.open() works essentially the same for text mode in 2.X, sans endline transforms):

$ python3
>>> b = open('purchase-pointers.html', 'rb').read()
>>> b[:50]
b'\xef\xbb\xbf$DOCTYPE$\n\n<HTML>\n\n<HEAD>\n\n<TITLE>Python Books:'
>>>
>>> t1 = open('purchase-pointers.html', 'r', encoding='utf-8').read()
>>> t2 = open('purchase-pointers.html', 'r', encoding='utf-8-sig').read()
>>>
>>> t1[:10]
'\ufeff$DOCTYPE$'
>>> t2[:10]
'$DOCTYPE$\n'
>>> 
>>> t1[1:] == t2     # Just the added BOM differs
True

This issue is rare, but it cropped up recently in the HTML of a web page edited in PyEdit. Somewhere along the way, a text editor on Windows or Mac OS silently inserted a BOM at the start of the file's UTF-8 content (as usual, Windows Notepad is the prime suspect). The covertly added BOM is harmless in web pages with content-type UTF-8, but causes the file's first line to be munged in PyEdit when opened with its 'utf-8' encoding default.

Specifically—and for reasons known only to the underlying Tk GUI library it uses—PyEdit displays the first line of a file oddly if it begins with a Unicode BOM character not discarded by the encoding used to open it. The BOM's impact, though, varies per PyEdit platform:

On Linux
The effect is mild: the BOM renders as a blank character at the start of the first line, which you can skip over or delete as usual.

On Windows
The effect is moderate: the BOM renders as an invisible character at the start of the first line, which requires an extra right-arrow to skip over. It can, however, be deleted with a normal Delete press at the start of the first line (and a leap of faith...).

On Mac OS
The effect is worst: the first line's content appears to be doubled, and editing it is chaotic at best. The BOM is actually the first character of the duplicate-text line; it can be removed by deleting the first character displayed in the line, but this isn't obvious, and retyping the line may seem the only recourse.

In other words, the BOM is rendered as the first character of the first line—whether you can tell or not. To see what happens on your machine, run code like the following to emulate the BOM-happy policies of editors like Notepad, and open the created file in PyEdit as 'utf-8':

>>> open('spam.txt', 'w', encoding='utf-8-sig').write('spam\nSPAM\n')
10
>>> open('spam.txt', 'r', encoding='utf-8').read()
'\ufeffspam\nSPAM\n'

By contrast, PyEdit never discards or adds BOMs automatically, because it supports the full spectrum of Python Unicode encodings for both opens and saves, as a major distinguishing feature; it could not guess your wishes for BOMs in output, especially if they were stripped; and it refuses to enforce implicit global policies that are invariably incorrect in some contexts eventually. The last point is paramount; to be blunt, the simple-minded policies in other editors are the reason that HTML files sprouted unwanted and error-prone BOMs in the first place!

The Fix

Because explicit beats implicit in programs that you trust with your content, PyEdit expects you to clarify your BOM goals, by either:

Either approach works because 'utf-8-sig' discards a BOM if present on input, and only 'utf-8-sig' adds one back on output. If you go with the first option, be sure to use the correct encoding name on each open; the second option is a one-time delete, after which 'utf-8' will suffice for opens.

You can arrange these combinations in PyEdit's configurations file by either fixing the open and/or save encodings, or having PyEdit ask for them (save's encoding defaults to open's if not fixed; see the end of your install's textConfig.py for more details). Some might even propose that PyEdit should automatically use the 'utf-8-sig' of these schemes for opens and/or saves, but magic is a very slippery slope: implicit BOM deletions and additions seem equally error-prone (and rude); this wouldn't work for people using other encodings like Latin-1; and most PyEdit users can safely ignore the issue altogether and stick with the preset 'utf-8' default.

In fact, if you don't care to deal with encoding names, you can generally accept the default 'utf-8' for both opens and saves, and simply delete any BOM characters as they are displayed in PyEdit, if and when they are added by other editors. The platform-specific renderings above give display details, but a delete at the top of the file suffices for all. This works well, but may not be as intuitive as explicit encoding names.

Either way, the net effect of deleting BOMs in PyEdit is also easy to verify in Python. The following was run after using the PyEdit Open/Save-As combination on the UTF-8 web-page file we met earlier, to save to a "-nobom" BOM-free copy:

>>> b1 = open('purchase-pointers.html', 'rb').read()
>>> b2 = open('purchase-pointers-nobom.html', 'rb').read()
>>> 
>>> b1[:50]
b'\xef\xbb\xbf$DOCTYPE$\n\n<HTML>\n\n<HEAD>\n\n<TITLE>Python Books:'
>>> b2[:50]
b'$DOCTYPE$\n\n<HTML>\n\n<HEAD>\n\n<TITLE>Python Books: Pu'
>>> 
>>> b1[3:] == b2     # Just the dropped BOM differs
True

UTF-8 is common for web pages, but it's not the only offender. UTF-16 and UTF-32 files may also be BOM-ridden, though their encodings work oppositely. In UTF-16, the general 'utf-16' always both discards a BOM on input and adds one on output (like the specific 'utf-8-sig'); but the more specific 'utf-16-le' does neither (like the general 'utf-8'); and ditto for UTF-32. To you, this means 'utf-16' and 'utf-32' generally suffice in PyEdit, because they both strip and restore BOMs in files:

>>> open('spam16.txt', 'w', encoding='utf-16').write('1\n2\n3\n')
6
>>> open('spam16.txt', 'rb').read()
b'\xff\xfe1\x00\n\x002\x00\n\x003\x00\n\x00'
>>> 
>>> open('spam16.txt', 'r', encoding='utf-16').read()
'1\n2\n3\n'
>>> open('spam16.txt', 'r', encoding='utf-16-le').read()
'\ufeff1\n2\n3\n'

Finally, if you want to see which files in a folder tree may be clandestinely harboring BOMs, try something like the following (this code looks for UTF-8 BOMs in all HTML files in the current working directory; tweak as needed):

import os
for (adir, subs, files) in os.walk('.'):
    for file in files:
        if file.endswith(('.htm', '.html')):
            path = os.path.join(adir, file)
            try:
                text = open(path, 'r', encoding='utf8').read()
            except:
                print('Not UTF8:', path)
            else:
                if text[:1] == '\ufeff':         # Or try file.read(1)
                    print('BOM=>', path)         # This file has a BOM 

When run by command line, file click, IDLE, or PyEdit's own Run Code, your output will be similar to this:

Not UTF8: ./lp3e-updates-notes-python.html
Not UTF8: ./lp4e-preface-preview.html
BOM=> ./lp4e-updates-clarifications-first-printing.html
BOM=> ./lp4e-updates-clarifications-recent.html

For more background on the Unicode BOM—including more about its behavior in the UTF-16 and UTF-32 encodings omitted here for space—see the documentation at the top of the unicodemod.py script on this site, as well as the more in-depth coverage in the Advanced Topics part of the book Learning Python. For related tips, also see the Unicode conversion note earlier on this page, and the next note.

[Back to Index]

 

PyEdit: More About Unicode Encoding Defaults (Mar-2018)

Speaking of PyEdit's configurations file: if you look at the Unicode settings near the end of textConfig.py, you'll notice that its fallback and prefill default encoding is sys.getdefaultencoding()—which is Python 3.X's default for string-object methods, and not locale.getpreferredencoding()—which is Python 3.X's default for open().

This is by design, because the former's UTF-8 setting is the same everywhere. If PyEdit used the latter, default file encodings could vary per platform. The net effect would be that people who work across multiple machines with different locale results (e.g., Unix and Windows) might have to remember where each file was last edited in order to provide an encoding that opens it properly! This is a major downside of 3.X's open() defaults, and one more reason that you should use explicit encodings whenever possible.

That said, if you generally work on just one platform and really want to use the locale module's setting (or any other value) as your PyEdit encoding default, it's easy to do so; the configurations file is just a Python module, after all:

import locale
opensEncoding = savesEncoding = locale.getpreferredencoding()

If you use locale and skip the encoding-input dialog, though, please remember that your files' encodings may vary per editing platform. Python 2.X allows the sys module's setting to vary too (it can be changed at start-up, and by dark hackery intentionally omitted here), but Python 3.X, PyEdit's implementation language, makes it more of a constant. For additional coverage of Python 3.X's encoding defaults, see the manuals or the overview in this article.

[Back to Index]

 

All Apps: User Guides Mobile-Friendly Makeover (Apr-2018..Aug-2018)

This update is half release announcement and half usage pointer, so it shows up in both tables on this page. As of April 2018, the user guides of all major apps (i.e., programs) on this site are now mobile-friendly. These guides' content is unchanged, but they have been restyled for viewing on both desktop and mobile browsers and devices. They have also improved in general, if only cosmetically, marginally, and subjectively.

At this writing, the new mobile-friendly versions are currently available online only. The original desktop versions are still shipped in product zipfiles, and are opened by in-program help widgets by default (these are desktop-only programs, after all). The original versions are no longer kept online, as they are prone to fall out of synch with upgrades; see your zipfile for prior versions.

You can find the new mobile-friendly user guides on program pages, as well as here:

There are additional examples of how these render on mobile devices here and here; their desktop rendering is similar. These new versions may eventually find their way into zipped packages in future builds. For now, to use any one of the new user guides locally on your machine, simply save the new page's UserGuide.html file in your browser (e.g., via right-click), and place it in the root folder of your program's install location. These docs are self-contained HTML files.

Notice that the PyGadgets program is not listed above, because it has no user guide document; for usage pointers, see its README.txt file and the in-program help of each of the programs it launches. Also note that the screenshot pages of major desktop apps are still not mobile-friendly; given that they span some 75 indexes among 25 zipped products (5 apps, 5 products each, and 3 platform pages apiece), they too await future app releases.

Update: as of August 2018, the online versions of all apps' screenshots have now also been converted to be mobile-friendly, and have been refreshed to use the latest version of thumbspage and its viewer-navigation pages. Updating online was a smaller-scale task (just 15 index pages and folders among 5 zipfiles, one for each app including PyGadgets). The new screenshot galleries will be incorporated into app zip packages over time as they are rereleased. To see the new screenshots now, visit app screenshot pages here, or browse the full set in the live demos list here.

[Back to Index]

 

Mac Source: New Python Installers Include Tk 8.6 (May-2018)

As of spring 2018 and Pythons 3.6.5 and 2.7.15, python.org now offers installers for Mac OS 10.9 and later that bundle Tcl/Tk 8.6. Assuming the new installs' Tks work properly, this means that users of any of the source-code versions of apps on this site who install these Pythons no longer need to install a Tk GUI library separately.

There are additional notes about this change on this page, and three fine points to keep in mind:

Verification results for the apps on this site will be posted here as time allows. As always, though, solutions of the past are still available if those of the present come up short.

Update: per early testing in May 2018, it now appears that the new Mac Python 3.6 + Tk 8.6 install may have serious issues, if not outright bugs. Notably, the new install crashed on a simple file-save dialog immediately after it was launched. While this does not impact apps or executables, such results make it difficult to recommend the new install for users of GUI source-code packages on this site. At the least, users of the new install should expect to find and resolve an arbitrary number of issues.

For the full story on the new install's testing results, see this post.

[Back to Index]

 

PyGadgets on Mac: PyPhoto Source, "Too many open files" (Aug-2018)

As uncovered in thumbspage, the third-party Pillow image library has a bug that can make it run into Too many open files errors during thumbnails generation in the PyPhoto program shipped with PyGadgets. In short, Pillow doesn't close image files when it should, which can cause it to breach system limits.

This can occur only for very large folders, and runs that generate very many new thumbnails (several hundred suffice). Moreover, it is generally a concern only when running PyPhoto's source code from a shell (a.k.a. Terminal) on Mac OS, because that platform's shell imposes a low open-files limit by default. This bug has not been seen to impact Windows, Linux, or users of PyPhoto in the PyGadgets Mac OS app. Where it does occur, the bug kills the PyPhoto GUI, but doesn't write or corrupt thumbnail information.

Luckily, the workaround is simple. When using source-code PyPhoto in a Mac OS shell, simply run the following command to raise the open-files limit before starting PyPhoto to view a large folder:

ulimit -n 9999

This handles as many images as you're ever likely to have in one folder, but use a higher number if needed. After running the above, launch PyPhoto's source code as usual:

python3 .../pygadgets/_PyPhoto/PIL/pyphoto.py -InitialFolder bigfolder

You can read much more about this Pillow issue in thumbspage's forked version of the viewer_thumbs.py module, available here. That module uses a code workaround that makes the ulimit fix above unnecessary, supports arbitrarily large folders everywhere, and may find its way into a future PyPhoto (the trick is to take manual control of file opens and closes). Given the obscurity of both the bug and use of PyPhoto's source code in general (even its author had to scrounge for details...), this is a low-priority item.

Update: PyPhoto eventually incorporated the thumbspage workaround too, in its 2.2 release that was included in PyGadgets' September 2018 rerelease. This means that PyPhoto, like thumbspage, is immune to the Pillow file-close bug. There's more on the PyPhoto patch release above.

[Back to Index]

 

showcode and PyEdit: The Perils of Mixed Unicode Encodings (Sep-2018)

This tip comes from showcode—a CGI script that loads files by trying candidate Unicode encodings in a list, and is used in conjunction with an Apache rewrite rule to display this site's code and other text files in HTML pages. showcode doesn't belong to the apps category, but this note's principles are widely applicable. Notably, they also touch on both genhtml, which uses a similar Unicode-choices scheme, and PyEdit, which relies on user inputs and defaults to correctly open and save files. Coders and users of any such system may find relevance here.

Short story: When using the showcode script, a site's displayable text files should generally all use a common Unicode encoding type for reliable display (e.g., UTF-8, which handles all text, and is the preset first candidate). Else, it's possible that some files may be loaded per an incorrect encoding if their data passes under other schemes. This is especially possible if files use several incompatible 8-bit encoding schemes: the first on the encodings list that successfully loads the data will win, and may munge some characters in the process.

The Issue

This issue cropped up in an older file at this site created with the CP-1252 (a.k.a. Windows-1252) encoding on Windows, whose tools have a nasty habit of silently using its native encodings. This file's slanted quotes failed to display correctly in showcode because Python happily loads the file as Latin-1 (a.k.a. ISO-8859-1), despite its non-Latin-1 quotes. The loaded text encodes as UTF-8 for transmission, but decodes with junk bytes.

Here's the story in code. Python does not allow the character to be encoded as Latin-1, in either manual method calls or implicit file-object writes. This is as it should be: the quote's 0x201c Unicode code point maps to and from byte value 0x93 in Windows' CP-1252, but is not defined in Latin-1's 8-bit-oriented character set:

>>> c = '“'               # run in Python 3.X 
>>> hex(ord(c))           # same in 2.X (using u'“', codecs.open(), print)
'0x201c'

>>> c.encode('cp1252')    # valid in CP-1252, but not Latin-1
b'\x93'
>>> c.encode('latin1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u201c' in position 0: ordinal not in range(256)

>>> n = open('temp', 'w', encoding='cp1252').write(c)
>>> n = open('temp', 'w', encoding='latin1').write(c)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u201c' in position 0: ordinal not in range(256)

Conversely, decoding this character's CP-1252 byte to Latin-1 works both in manual method calls and file-object reads. This is presumably because byte value 0x93 maps to an obscure and unprintable "STS" C1 control character in some Latin-1 definitions, though the decoder may simply allow any 8-bit value to pass. It's not a CP-1252 quote in any event:

>>> b = b'\x93'
>>> b.decode('cp1252')    # the proper translation
'“'
>>> b.decode('latin1')    # but it's not a quote in latin1
'\x93'

>>> n = open('temp', 'wb').write(b)
>>> open('temp', encoding='cp1252').read()
'“'
>>> open('temp', encoding='latin1').read()    # <= what showcode did
'\x93'

This is problematic in showcode, because this script relies on encoding failures to find one that matches the data and translates its content to code points correctly. Because a CP-1252 file loads without error as Latin-1, its UTF-8 encoding for reply transmission is erroneous; the quote's code point never makes the cut:

>>> b.decode('cp1252').encode('utf8').decode('utf8')    # load, reply, browser
'“'
>>> b.decode('latin1').encode('utf8').decode('utf8')    # the Latin-1 munge...
'\x93'

>>> n = open('temp', 'w', encoding='utf8').write(b.decode('cp1252'))
>>> open('temp', encoding='utf8').read()
'“'
>>> n = open('temp', 'w', encoding='utf8').write(b.decode('latin1'))
>>> open('temp', encoding='utf8').read()
'\x93'

The net effect turns the quote into a garbage byte that browsers simply ignore (it's an odd box in Firefox's view-source, but is otherwise hidden).

The Fix

If your non-UTF-8 files are only CP-1252, replacing Latin-1 with CP-1252 in the encodings list fixes the issue. However, if your site's files use multiple encodings whose byte ranges overlap but map to different characters, using CP-1252 may fix some files but break others. Latin-1 files using the 0x93 control code, for example, would sprout quotes when displayed (unlikely, but true). The real issue here is that content of mixed encodings is inherently ambiguous in the Unicode model.

The better solution is to make sure your site's displayable text files don't use incompatible encoding schemes. At showcode's site, the simplest fix was to adopt UTF-8 as the site-wide encoding, by opening its handful of CP-1252 files as CP-1252, and saving as UTF-8. The set of suspect files can be easily isolated by trying UTF-8 opens (in a variation of other code on this page):

>>> import os
>>> textexts = ('.html', '.htm', '.py', '.pyw', '.txt')
>>> for (dirhere, subshere, fileshere) in os.walk('/Websites/path'):
...     for filename in fileshere:
...         if filename.endswith(textexts):    # or mimetypes
...             pathname = os.path.join(dirhere, filename)    
...             try:
...                 x = open(pathname, mode='r', encoding='utf8').read()
...             except:
...                 print('Failed:', pathname)

Converting to UTF-8 universally will not only help avoid corrupted text in showcode, it might also avoid issues in text editors that are given or guess encoding types. If you give the wrong encoding to an editor, saves may corrupt your data. If you expect a tool to deal with mixed encoding types, guessing may be its only recourse. But guessing is overkill; is impossible to do accurately anyhow; and is not science. Skip the drama and convert your files. We can't fix Unicode's built-in ambiguity, but we can take it out of the game.

Because mixed encodings are such a common concern, you'll find ample background on the web. As a sampler: learn about encoding guesses here; read more about the Latin-1 encoding here and here; and dig deeper into the politically charged Latin-1/CP-1252 encoding mess here and here.

Update: in light of the above, 'latin1' was eventually replaced by 'cp1252' in showcode's preset input-encodings list, to accommodate a few files at this site that are intentionally not UTF-8 (this is similar in spirit to the policies for parsing web pages in HTML5). CP-1252 is a superset of Latin-1 and should work more broadly, but change as needed for your site's files. This is still only a partial solution for mixed-content ambiguity; use a common Unicode type to avoid encoding mismatches altogether.

Footnote: Latin-1 Pass-Through

Subtly, some scripts, including this site's genhtml page builder, can often get away with treating CP-1252 files as Latin-1 files, because bytes whose interpretations differ between the two are passed through unchanged from load to save:

>>> c = '“'
>>> n = open('temp', 'w', encoding='cp1252').write(c)    # save as cp1252
>>> open('temp', 'r', encoding='cp1252').read()
'“'
>>> L = open('temp', 'r', encoding='latin1').read()      # load as Latin-1
>>> L
'\x93'
>>> n = open('temp', 'w', encoding='latin1').write(L)    # save as Latin-1
>>> open('temp', 'r', encoding='cp1252').read()          # retains CP-1252 quote
'“'
>>> open('temp', 'rb').read()                            # 0x93's meaning varies  
b'\x93'

In other words, what Latin-1 reads and writes as 0x93 is still to CP-1252. This means that 'latin1' generally works as well as 'cp1252' and other 8-bit encodings in genhtml and other pass-through contexts. In fact, it's tempting to think of Latin-1 files as bytes files, because their encoded values are also code-point values for the characters Latin-1 supports:

>>> x = 'Ä'
>>> ord(x)                        # Ä is code point 196
196
>>> x.encode('latin1')            # a non-ASCII byte
b'\xc4'
>>> x.encode('latin1')[0]         # Latin-1 encoded bytes == code points
196
>>> chr(196)
'Ä'

But this analogy doesn't quite survive contact with Unicode reality. For one thing, Latin-1 can't decode text encoded outside its 8-bit range—whether the text's characters are in the Latin-1 alphabet or not:

Latin-1 characters
>>> x = 'Ä'
>>> x.encode('utf8'), x.encode('utf16')    # Latin-1 can't load these 
(b'\xc3\x84', b'\xff\xfe\xc4\x00')

>>> x.encode('latin1').decode('latin1')    # file save, file load
'Ä'
>>> x.encode('utf8').decode('latin1')      # not an 8-bit encoding
'Ã\x84'

Non-Latin-1 characters
>>> c = '⻨'
>>> hex(ord(c))                            # code point is > 8 bits
'0x2ee8'
>>> c.encode('utf8')                       # encoding is > 8 bits
b'\xe2\xbb\xa8'
>>> c.encode('utf8').decode('utf8')        # Latin-1 can't encode or decode
'⻨'
>>> c.encode('utf8').decode('latin1')
'⻨'

For another, even in the limited 8-bit world, Latin-1's results will fail to match text outside its character set (this was ultimately to blame for showcode's missing quotes):

Load/save pass-through works
>>> '“'.encode('cp1252').decode('latin1').encode('latin1').decode('cp1252')
'“'

But comparisons may not
>>> '“'.encode('cp1252').decode('latin1') == '“'    # cp1252's meaning is lost
False

Though only for non-Latin-1 code points
>>> for char in 'xē':
...     print(char.encode('cp1252').decode('cp1252') == char,    # load as cp1252
...           char.encode('cp1252').decode('latin1') == char)    # load as latin1
... 
True True
True True
True False

In the end, Latin-1's pass-through behavior is a mixed bag:

Using the correct encoding—and preferably just one encoding—is still the safest bet.

But don't quote me on that. (Hey, I had to get a pun on this page somewhere...)

[Back to Index]

 

Mergeall on Android: It Runs — but It Doesn't Work (Sep-2018)

The Mergeall incremental-backup and content-propagation program has been recently tested on Android devices, as a possible way to simplify synchronization of onboard content copies. Merges between the external SD card and a connected USB flashdrive, for example, might make it unnecessary to pop the SD card in and out of the phone. Results so far have been illuminating but mixed—and in the end, exemplary of the designed-in limitations of mobile devices in general.

The Good News

Although Mergeall's GUI cannot be used on Android, both its interactive-console and command-line-script usage modes run as advertised on that platform, and without any code changes. For example, the console mode can be used in the QPython3 app, and the command-line mode can be run in the Termux app after installing its Python package (these apps currently run source code with Python 3.2 and 3.6, respectively; QPython also offers a 2.X option). Mergeall's GUI mode won't work, because it is based on the Tk desktop GUI toolkit which has yet to be ported to Android. The underlying Mergeall script which its GUI launches, however, can be used directly anywhere that Python runs—an advantage of decoupled program architecture.

The Bad News

While Mergeall's console and command-line modes on Android can correctly read and compare files on both SD cards and USB flashdrives (using their /storage/xxxx-xxxx access paths on devices tested), these modes still cannot change content on either type of device. This unfortunately makes Mergeall mostly useless on Android: its comparison phase works well, but its resolution phase fails to update any files, and produces an error message for each change attempted. In other words, Mergeall is currently a comparison-only tool on Android; it cannot be used to update archives on SD cards or USB drives for changes.

The Android Showstopper

The update failures stem from Android's convoluted and wildly proprietary permissions model, which does not directly support the paradigm of general command-line scripts. In short, apps—including those that execute text-file scripts—run a sandbox, which by design restricts access to writeable media in ways that vary per Android version, and may involve dedicated folders, app manifests, Java-oriented code to trigger permission dialogs, and unique APIs and filesystem requirements for USB drives.

Some Android file explorer apps use these measures to support updates to SD and USB media. For a cross-platform file-processing program like Mergeall, though, such constraints are crippling, if not lethal. Android may be based on Linux, but it's been gutted of much of the Linux development experience, and most of its open-access philosophy.

The Android Workaround

Because the permissions and access code required by Android is too custom to integrate into Mergeall without substantial changes, users of both systems today are encouraged to employ an Android device with a removable SD card. When it's time to synchronize either to or from your phone's content copy, simply pop the card out and merge on a real computer; any Mac OS, Windows, or true Linux device will do.

It's not impossible that a future Mergeall could support use on Android directly. The program already has some unique code for each desktop platform, though none impose third-party dependencies as Android would (a port would minimally require a Python-to-Java interface for permissions, and may necessitate a complete and custom Android app). As it stands, however, the only conceived use case does not justify the effort.

It's worth noting that this post pertains to devices as they are shipped. "Rooting" (a.k.a. "jailbreaking") yours may open up additional Mergeall prospects. This option was not tested, because rooting is not possible for every user and device, and is strongly and even actively discouraged by most hardware and software vendors—which brings us to this post's conclusion.

The Mobile Corral

Mergeall issues aside, it's difficult to recommend mobile platforms for content storage in general. The net effect of Android's proprietary permissions model both limits device scope and locks down user options—outcomes surely much more in line with the goals of multiple revenue-seeking parties, than those of device owners. This is hardly a basis for trust in a data-storage relationship.

Nor is Android the only rustler in the mobile corral. iOS is even more closed, with no general-purpose filesystem or removable media to be found, and a complete lockout of software outside the company store. This is about as proprietary as a computer system can be (and would almost certainly have raised government eyebrows in decades past).

As a result of both of these systems' practices and dominance, mobile users are compelled to choose between one platform seemingly designed to reap advertising data and boost cloud subscriptions, and another ostensibly crafted to trap users' media and coerce brand dependency. These may not be the best places to keep your cherished photos and personal documents.

But it doesn't have to be this way. Today's mobile options are far too pervasive and powerful to justify user-experience constraints. In a world where computing-device companies truly have their customers' best interests at heart, interoperability would take a back seat only to privacy. Let's hope that world shows up soon.

[Back to Index]




[Home] Books Programs Blog Python Author Training Search Email ©M.Lutz