This page is a supplemental resource for the application programs and other software available on this site. Initially posted after the first major applications release on June 15, 2017, this page hosts both:
Why this page? Although each application program comes with a README.txt file that describes known issues and workarounds at the time of its latest publication, new items are bound to arise over time in any nontrivial software. This page documents issues uncovered after major releases and hence unmentioned in prior versions' documentation, and logs later software releases. Users are invited to consider this page a virtual docs appendix, and check back for updates over time. The latest content update here: March 26, 2018.
This section announces new releases of published applications and programs. Items here:
|▶ (Aug-14-2017)||Frigcal and PyMailGUI Mac Apps Rerelease: Broken-Pipe Workaround|
|▶ (Sep-28-2017)||PyGadgets GUI "Toy Box" Added to the Applications Lineup|
|▶ (Oct-5-2017)||PyGadgets Rerelease: PyClock Optimization + Hand Redraws Fix|
|▶ (Oct-17-2017)||New Industrial-Strength Version of the tagpix Photos Organizer|
|▶ (Nov-1-2017)||PyGadgets Rerelease: PyPhoto Update for Single-File Thumbnails Cache|
|▶ (Dec-9-2017)||Mergeall Rerelease: Folder Modtimes, Linux Flushes, Scripts "-u"|
The Mac app packages of the Frigcal and PyMailGUI programs were rereleased on August 14, 2017, to address a rare output error that likely stems from an issue in the Mac's app-launcher system. Users of prior releases of the Mac app versions of Frigcal and PyMailGUI are encouraged to fetch and install the new versions; see your program's main README file for upgrade pointers. The source-code packages of these two programs were also rereleased with the fix's code, but just as examples for developers; the error occurs only when these two programs are run as Mac apps—not when they are run as source code on Macs, and never on other platforms. No other program packages were updated.
The complete description of the fix applied to Frigcal and PyMailGUI apps is off-page here, because it is mostly of interest to developers. In short, the main GUI in these two programs runs as a child process spawned by a launcher GUI. Likely due to a misfeature or bug in the Mac's app-launching system, it was possible that printed console output generated by these apps' main GUIs could eventually trigger a broken-pipe exception after the launcher GUI exited. Though rare, generally harmless, and an issue in Mac apps only, these failures would manifest as GUI popup error messages naming "Broken Pipe" as cause. The only observed consequence to date was the need to rerun a calendar save operation in Frigcal just once in nearly one year of daily usage.
PyGadgets, a set of 4 smaller GUI programs, was released as an addition to the applications available on this site. Its main page lives here, and its entry on the main programs page is here. Though less broadly focused than the other applications here, the PyGadgets tools (a calculator, clock, photo viewer, and game) are potentially useful and educational; work the same on all major desktop platforms; and are available as both stand-alone executables and full source code. Note that this was a separate, additional package release; no other application packages were updated.
PyGadgets was rereleased (all its packages) to fix a minor defect in PyClock, which delayed redraws of the analog display's minute and hour hands too long in some contexts. These two hands are normally not redrawn until the second hand reaches 12, as a valid and important optimization. This delay can be problematic, though, if used after any state that precludes analog clock updates — including window minimization, digital-display mode, system suspend, and menu and modal-dialog view on some platforms. In all such cases, the analog time might not be current or correct until the second hand again reaches 12. To avoid this lag, PyClock now monitors the last-analog-redraw time to force all hands to be redrawn immediately in all these contexts.
As a related optimization, PyClock now also avoids updating the analog display's AM/PM label every second, just like the minute and hour hands. This seems to have further reduced the memory leakage that occurs while the analog display window is open on Macs: open-window growth is now just 1M per 20 minutes, which translates to 3M/hour, 72M/day, and 500M/week. This is roughly half what it was formerly — a compelling reason to retain the prior paragraph's optimization — and some popular Mac apps use substantially more memory over time (see your Mac Activity Monitor). Oddly, the digital display now leaks memory fastest, but is likely lesser-used; optimizing it remains a low-priority TBD.
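The redraw policy described above can be sketched roughly as follows. This is not PyClock's actual code: the class name, method name, and the 2-second staleness threshold are all hypothetical, chosen only to illustrate the idea of tracking the last analog redraw time.

```python
import time

class AnalogRedrawPolicy:
    """Hypothetical helper: picks which clock hands to redraw on each tick."""

    def __init__(self, max_gap_secs=2.0):
        self.max_gap_secs = max_gap_secs   # assumed threshold, not PyClock's
        self.last_redraw = None            # time of the last analog redraw

    def hands_to_redraw(self, now=None):
        now = time.time() if now is None else now
        # Stale if never drawn, or if analog updates were paused for a while
        # (minimized window, digital mode, system suspend, modal dialog...)
        stale = (self.last_redraw is None or
                 now - self.last_redraw > self.max_gap_secs)
        self.last_redraw = now
        second = time.localtime(now).tm_sec
        if stale or second == 0:                   # second hand at 12
            return ('hour', 'minute', 'second')    # full redraw
        return ('second',)                         # optimized common case
```

A timer callback would call hands_to_redraw() once per tick and redraw only the hands it names, so the optimization stays in force except right after a pause.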
Also changed PyCalc to automatically set the focus on new "cmd" popups' entry fields (which also shifts focus to their windows on Windows), and added a Mac All Desktops tip (below) to the README.
tagpix—a command-line script used to normalize photo collections—has been released in a much-enhanced version, with new support for duplicates resolution, error recovery, by-year grouping, and much more. For the full story, including a list of changes in this version, see the new User Guide. To download a copy, visit the tagpix web page.
PyGadgets was rereleased (all its packages) to include a new version of PyPhoto, which stores a folder's thumbnail images in a single pickle file, instead of individual image files in a subfolder. This new design requires no more space or time, but avoids extra files (15k images formerly meant 15k thumbnail files); multiple file loads and saves; and nasty modtime-copy issues for backup programs too rare and complex to cover here. See the source code for full details.
This is a mildly backward-incompatible change. Prior PyGadgets/PyPhoto version users: when upgrading to a newer release, run the included "delete-pyphoto2.0-thumbs-folders.py" script (or its frozen executable) from a command line to delete all former PyPhoto thumbnail subfolders. This program is run with no command line arguments, and asks for a folder path and delete verifications. In the Mac app, it's an executable at "PyGadgets.app/Contents/MacOS" (see Show Package Contents); in other packages, it's in your install folder. PyPhoto will still work if you don't delete the former subfolders, but they will be unused trash.
Apart from this compatibility fix, the main user-visible artifact of this upgrade is a single "_PyPhoto-thumbs.pkl" file per opened folder, instead of the former "thumbs" subfolder. This thumbnail cache is still built on first open and auto-updated on changes as before, to make later opens fast. Though less prominent, the new PyPhoto also displays images in the same order everywhere; uses a placeholder thumbnail for photos with errors (instead of omitting); and works on unwriteable folders (though slowly).
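For the curious, a single-file thumbnail cache along these lines might look like the sketch below. Only the "_PyPhoto-thumbs.pkl" file name comes from the text above; the dictionary layout, function names, and the make_thumb callback are assumptions for illustration, and PyPhoto's real code may differ substantially (see its source).

```python
import os
import pickle

CACHE_NAME = '_PyPhoto-thumbs.pkl'   # file name cited in the text

def load_cache(folder):
    """Return {filename: (modtime, thumb_bytes)}, or {} if no usable cache."""
    try:
        with open(os.path.join(folder, CACHE_NAME), 'rb') as f:
            return pickle.load(f)
    except (OSError, pickle.UnpicklingError, EOFError):
        return {}

def save_cache(folder, cache):
    """Write the whole cache back as one pickle file."""
    with open(os.path.join(folder, CACHE_NAME), 'wb') as f:
        pickle.dump(cache, f)

def refresh_cache(folder, make_thumb):
    """Rebuild entries only for new or changed images; drop removed ones."""
    old = load_cache(folder)
    new = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if name == CACHE_NAME or not os.path.isfile(path):
            continue
        mtime = os.path.getmtime(path)
        if name in old and old[name][0] == mtime:
            new[name] = old[name]                   # unchanged: reuse
        else:
            new[name] = (mtime, make_thumb(path))   # new or modified
    if new != old:
        try:
            save_cache(folder, new)
        except OSError:
            pass   # unwriteable folder: thumbs are simply rebuilt next time
    return new
```

Here make_thumb is any function that maps an image path to thumbnail bytes; one load and at most one save replace the per-image file traffic of a thumbs subfolder.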
PyPhoto now also supports a new "NoThumbChanges" configuration-file setting and command-line argument, which can be used to prevent rare but spurious thumbnail regenerations for large, static archives, when file modification times are skewed between platforms or filesystems. See PyGadgets_configs.py. This setting's False default need not be changed in typical usage. For example, PyPhoto photo archives and thumb caches have been seen to work correctly without this change when used on a single platform, burned to BD-R discs, or transferred between Mac OS and Windows on exFAT drives.
This release also applied a minor change to PyCalc, to allow fractional floating-point numbers to be entered with a leading "." instead of "0.". To halve a number, for instance, "24 * .5" and "24 * 0.5" both now work. Numbers with "E" exponents allow both forms too: ".1E-99" or "0.1E-99".
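As a quick check, Python's own number syntax treats the two spellings as equivalent. This shows only the language's behavior, not PyCalc's input-handling code, which isn't reproduced here:

```python
# Leading "." and leading "0." denote the same float in Python
print(float('.5') == float('0.5'))          # True
print(24 * .5 == 24 * 0.5)                  # True
print(float('.1E-99') == float('0.1E-99'))  # True
```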
Version 3.1 of the Mergeall content backup and propagation application includes multiple minor enhancements for all download packages, and is a recommended upgrade for all users (be sure to save and restore your mergeall_configs.py customizations). In this release:
For the full story on these changes, see the release notes.
This section collects usage pointers for published applications that arose after releases, and hence are not covered in earlier releases' documentation. Items here:
|▶ (Jun-2017)||All Apps: Ignore First-Run Warnings|
|▶ (Jun-2017)||Windows Exes: Don't Install to "C:\Program Files"|
|▶ (Jun-2017)||Mac Apps and Source: Homebrew Tk 8.6 is DOA|
|▶ (Aug-2017)||PyEdit Auto-Saves: Converting from UTF-8 Encoding|
|▶ (Aug-2017)||Mac Apps: Avoid Dock-Menu Zombies with 3-Finger Downswipes|
|▶ (Aug-2017)||Mac Apps: Avoid Installing to Desktop if Apps Fail?|
|▶ (Sep-2017)||PyEdit RunCode: Package Imports May Require __init__.py Files|
|▶ (Sep-2017)||Mac Apps: Assign to All Desktops for Quick Access Everywhere|
|▶ (Oct-2017)||PyEdit on Mac: Restart if Memory Use Grows Too High|
|▶ (Oct-2017)||Mergeall: Mac OS Sierra's Finder Hides ".DS_Store" Files|
|▶ (Nov-2017)||Mergeall: Tailing Redirected Output of the diffall Utility|
|▶ (Dec-2017)||Mergeall: Unzipped Files May Trigger Differences and Copies|
|▶ (Mar-2018)||PyEdit: Dropping the BOM in Unicode Files|
|▶ (Mar-2018)||PyEdit: More About Unicode Encoding Defaults|
On some Mac OS X systems, and on Windows 7, 8, and 10, you may get a warning when first trying to use this site's applications, because of defaults regarding unverified sources — arguably overkill at best, and a step towards proprietary lockdown at worst. You can safely ignore these, but may have to approve a program the first time you use it; see the warning popup for more details. On Macs, for example, Open in the 2-finger-press or control-click menu approves a program quickly and permanently (and may be faster than opening the security-preferences screen).
This inconvenience is regrettable, but this site's proprietor is an independent developer who does not work for Apple or Microsoft, and has no interest in the supplication inherent in program registration. Some web browsers can be Orwellian about zip files too; and Windows 10 S, unfortunately, is right out.
On Windows, you should generally avoid saving unzipped exe folders in "C:\Program Files" because neither you nor programs may have permission to save files there. The current program READMEs suggest this location as one possibility, but this can lead to issues. For instance, using that folder can complicate config-file edits (you may need to run editors as administrator), and can even prevent the Frigcal launcher from closing (it won't find a sentinel file because one cannot be written in its install folder). To avoid such issues, save your unzipped exe folders to your Desktop or elsewhere instead.
On Mac OS X, the apps have not been tested or built with Homebrew Python 3.X and Tk 8.6 (a leading alternative distribution that supports a newer Tk) because the Homebrew install is currently broken—Python and Tk build correctly, but crash immediately with an "Abort trap: 6." This is a widespread issue that impacts both app builds and source-code use, and makes it difficult to explore possible fixes for Tk 8.5 Mac issues (e.g., Dock zombies and scroll speed). Given the requirements of a manual build, this effectively puts further research on hold.
You can read others' reports about this issue here and here. The latter includes a curt refusal to fix from the project, delegating the job to impacted users. No, really. It's not clear where this bug lies, but projects that publish a product clearly have some responsibility for that product. That is: broken + = punt; Homebrew is currently neither viable nor recommended. Hopefully, python.org's Mac Python3 will support Tk 8.6 soon.
Update: as of August 7, it appears that this critical Homebrew Tk 8.6 bug may have been fixed, per later posts on the GitHub thread. Apparently, an early release of the next version of Tk was required. This is good news if true (and the Mac apps here may be revisited soon), but the outright crash on startup doesn't exactly instill confidence in Homebrew and/or its Tk on Macs going forward. To be fair, though, the Mac's rich user interface is stateful enough to pose challenges to any GUI toolkit.
Update: after testing Homebrew Python 3.6 + Tk 8.6 on Mac OS Sierra in September, 2017, it now appears that Tk 8.6 on Mac is a non-starter. It does indeed fix the Dock menu zombies problem of ActiveState's Tk 8.5. But 8.6 also:
Alas, Tk on Mac is not always all it should be. It's possible to use it for programs like those on this site, but this requires substantial workarounds (of the duct-tape-and-twine sort), and the resulting programs have to live with a set of defects that varies per Mac Tk release. If you're developing commercial-grade GUIs, see PyQt for one portable alternative, and PyObjC for a non-portable option; and mind the pitfalls inherent in development under the open-source "batteries included" banner, and its proprietary cousins.
A technical note for PyEdit users only: as documented in PyEdit's UserGuide.html and textConfig.py, its auto-save feature always writes text with unsaved changes to files using the general UTF-8 Unicode encoding scheme. This is necessary, because there may be no known encoding (e.g., the text may not yet have been saved to a file), and a known encoding may fail (e.g., Unicode symbols may be inserted into the text of a file originally opened as ASCII, precluding ASCII encoding on saves).
This UTF-8 policy may cause issues, however, for HTML files that declare a different encoding explicitly using <meta> tags. If you must recover such a file from the auto-saves folder, you can either change its <meta> tag to declare UTF-8, or convert its text back to the original encoding. For the latter, you can easily restore the original encoding by:
Naturally, this assumes your text is still compatible with the encoding you enter (else, PyEdit will generally fall back on UTF-8 again), and be sure to use "Save As" ("Save" silently uses the encoding provided on "Open" if the text came from a file). This issue is both rare and subtle, but unavoidable in files with explicit and usage-specific encoding declarations that may diverge from actual content. For more encoding-conversion options, see "savesUseKnownEncoding" in textConfig.py, and the command-line conversion utility script unicodemod.
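If you'd rather convert a recovered auto-save file in code, a minimal sketch follows. It is not the unicodemod script mentioned above; the function name and arguments are hypothetical, and it assumes the recovered file is UTF-8 as auto-saves always are.

```python
def convert_encoding(src, dst, to_encoding, from_encoding='utf-8'):
    """Re-encode a text file; return False if the text doesn't fit to_encoding."""
    # newline='' keeps the file's original line endings intact on input
    with open(src, 'r', encoding=from_encoding, newline='') as f:
        text = f.read()
    try:
        data = text.encode(to_encoding)
    except UnicodeEncodeError:
        return False                 # incompatible text: dst is not written
    with open(dst, 'wb') as f:       # bytes mode: no endline translation
        f.write(data)
    return True
```

A False result corresponds to the case noted above, where the text is no longer compatible with the encoding you request.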
As mentioned in the main README files of all the complete applications available on this site, the current releases of the Mac apps can leave zombie entries for closed windows in their Dock menus due to a bug in the underlying Tk 8.5 GUI library used. This remains a to-be-fixed item, pending adoption of a new Tk version (which now seems unlikely; see above).
Fortunately, it turns out that there is a standard and easy way to see the apps' truly-active windows anyhow: simply use a 3-finger downswipe on the trackpad (or its control+downarrow keyboard equivalent) to activate the "App Exposé" view of active app windows. This gesture can be performed on any of the app's windows, or on its Dock icon. When run on the Dock icon, this is no more difficult than opening the Dock menu with a 2-finger click, and yields an arguably-better and full-screen display that does not include any zombie entries.
This is a standard Mac feature, but may be unknown to some users, and is mentioned only in passing in the apps' READMEs. You may need to enable it once in System Preferences by clicking the App Exposé checkmark. Once enabled, though, this provides a simple way to view an app's open windows, and is immune to Tk 8.5 zombies.
Similar to the Windows exes note above: one user has reported that the Frigcal Mac app can fail if copied to and run from Desktop on Mac OS X Sierra, because the app does not have permission to write files (Frigcal needs to create an initial calendar file on first run, and a sentinel file on each run). This couldn't be recreated on El Capitan or Sierra machines, may be user-specific, and is outside the apps' scope. If your apps have similar problems on your Desktop, though, copy them to your /Applications folder instead; this has the added advantage of adding the app to Launchpad for quick access.
For reasons to be determined, import statements in code run by PyEdit's RunCode can fail if they attempt to import a module package having no __init__.py file. That is, Python 3.3+ namespace packages don't seem to be fully operative in code run by a normal compile()/exec() pair, despite all the run-time context set up by PyEdit's code proxy. This may reflect an anomaly or bug in Python's import machinery (which changes so often as to be fairly accused of thrashing), and may require use of Python's runpy module or similar code.
Barring a future fix, though, the workaround is simple: simply make sure all your package folders to be used in RunCode have an __init__.py, even if it's completely empty. Unless your use case really requires namespace packages (and almost none do), an __init__.py is good and recommended practice anyhow. It makes your package imports more efficient, and your code's structure more explicit. For more on PyEdit's RunCode, see its Tools menu docs. For a primer on 3.3+ namespace packages, try Chapter 24 in Learning Python, 5th Edition.
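To apply this workaround to an existing code tree, a helper like the following sketch may save some typing. Its names are hypothetical, and it defaults to a dry run that only reports folders; review the list before creating files, because a top-level scripts folder usually need not be a package.

```python
import os

def missing_init_folders(root):
    """Yield folders at or below root that hold .py files but no __init__.py."""
    for folder, subdirs, files in os.walk(root):
        has_code = any(name.endswith('.py') for name in files)
        if has_code and '__init__.py' not in files:
            yield folder

def add_empty_inits(root, dry_run=True):
    """Create empty __init__.py files where missing; only report if dry_run."""
    found = list(missing_init_folders(root))
    if not dry_run:
        for folder in found:
            # append mode creates the file without touching existing content
            open(os.path.join(folder, '__init__.py'), 'a').close()
    return found
```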
Here's another Mac user-level tip that may not be obvious to everyone, but is especially relevant to utility programs like PyGadgets: to have access to an app on every Mac desktop, right-click its Dock app icon, select Options, and choose the Assign To section's All Desktops. Once you do so, single-clicking the minimized program's Dock icon will reopen it on whatever desktop you happen to be viewing at the time.
PyGadgets' calculator and clock, for example, are the sorts of desktop utilities you might want to access occasionally and quickly. Simply open once and set to All Desktops per above, then minimize when not in use, and click the Dock icon to reopen. This both displays an open gadget on every desktop, and reopens a hidden gadget immediately on the current desktop. Perhaps best of all, this avoids the annoying and attention-shattering desktop switches that occur by default when you reopen an app assigned to its single, original desktop.
Two fine points here. First, this works whether your Dock preferences set "Minimize windows into application icon" or not, but the All Desktops setting is in the Dock's application icon. Second, this can also be used for the Frigcal calendar GUI (which also reopens its month image on the current desktop), but may be less desirable for apps like the PyEdit text editor that create many windows or take special actions on Dock clicks (see Programs for both).
PyEdit on Mac: Restart if Memory Use Grows Too High (Oct-2017)
Though usually not a concern, PyEdit's memory usage on Mac OS might grow high if used for a long time without a restart. The exact cause remains to be isolated, but this seems to occur when using PyEdit's Run Code option to run edited programs; is noticeable only after intense work spanning multiple days; and isn't particularly grievous by Mac standards. The worst case to date saw PyEdit reach 2G memory (from its 36M start) on El Capitan, but it was still #3 on the worst-memory-offenders list at the time, behind both Firefox and WindowServer, and just ahead of Excel. Moreover, almost all of PyEdit's memory space was compressed (not in active use).
Still, if this grows problematic on your machine, the simplest solution is to periodically close all PyEdit windows and restart—an unfortunately common cure for Mac app ills. For a related topic, see the memory leak workarounds in the PyClock program of PyGadgets, covered in its README; though nonfatal, memory issues seem a recurring theme for Tk apps on Macs.
The Mergeall backup/mirroring application goes to great lengths to avoid propagating "cruft" files (platform-specific trash), and in its User Guide points to the numerous ".DS_Store" hidden files on Mac OS as prime offenders. These files can be pathological on Macs for anyone involved in programming or content production, and were responsible for many of the changes required to support the Mac platform.
As of Mac OS Sierra (10.12), setting your defaults to display hidden files as described in that guide still works as before, but Finder has been special-cased to never display ".DS_Store" files. That is, the ".DS_Store" files are still there (and can be seen via a "ls -a" in Terminal, or an "os.listdir()" in Python), but Finder will no longer show them to you, even if you ask it to.
You can read more about this curious new Finder policy on the web. This seems the worst of both worlds. Not only does Finder still create these files in every folder you view (changing your folders' modification times in the process), but not displaying them can easily lead to major problems if they wind up being inadvertently uploaded, transferred, or otherwise included with actual content. Pretending a problem doesn't exist is not a valid solution to a problem—especially when users may have to pay the price for the deception!
Luckily, you can still take control of cruft like ".DS_Store" files with tools like Mergeall and ziptools that call out such items explicitly to help you minimize their impacts. We can also hope that Apple someday finds a better way to record Finder information than dumping it in hidden-but-real ".DS_Store" files all over your drives. Sadly, this still seems wishful thinking as of the new High Sierra and its oddly-mandatory APFS filesystem.
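Since Finder won't show these files, a few lines of Python can. The sketch below simply walks a folder tree and reports every ".DS_Store" it finds; the function name is hypothetical, and this is unrelated to how Mergeall or ziptools detect cruft internally.

```python
import os

def find_ds_store(root):
    """Return the paths of all .DS_Store files at and below root."""
    hits = []
    for folder, subdirs, files in os.walk(root):
        if '.DS_Store' in files:
            hits.append(os.path.join(folder, '.DS_Store'))
    return hits

# Example use: list the hidden files under your Desktop folder
# for path in find_ds_store(os.path.expanduser('~/Desktop')):
#     print(path)
```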
Footnote: also in the oddly column, Mac OS High Sierra abruptly dropped the longstanding and widely-used "ftp" client program, in yet another agendas-versus-customers move. See the web for discussion; in short, secure "sftp" is still present, but works only for sites that support it, and this doesn't help programs or users that relied on the functionality removed. Alas, open-source software is not the only domain where the whims of the few can rudely trounce the needs of the many. On the upside, a simple Python script can shatter many an Orwellian decree...
Postscript: in the too-ironic-to-bear department, Mac OS High Sierra also came with a massive security flaw which allowed anyone to gain root access to a machine without a password, and required an emergency overnight patch. But "ftp" was too risky to ship.
The Mergeall content backup/propagation program is usually run from its GUI launcher, but can also be run from a command line, and includes some extra command-line scripts useful for managing archives. The most notable of the extras may be diffall, a program which does a byte-by-byte comparison of everything in two folder trees, as described in Mergeall's User Guide.
Because diffall can run for a long time on large trees, it's convenient on Unix to run it in the background and monitor its output file with a "tail" using command-lines like the following (typed in Terminal on Mac OS):
~$ python3 diffall.py /MY-STUFF /MY-COPY -skipcruft > Desktop/temp.txt &
~$ tail -f Desktop/temp.txt
That works on Mac OS's El Capitan release, but not quite on its High Sierra. For reasons that aren't clear, when redirected to a file, Python 3.5's stdout stream—the target for basic print() calls—is not buffered (or not buffered as much) on the former, but is fully buffered on the latter. Hence the "tail" may not show anything for quite some time on High Sierra, and even then, will print only in spurts. Technically, El Capitan may buffer stdout too, but its buffer blocksize may be so small that its output appears regularly, while High Sierra's does not.
To make printed text show up in the output file immediately on both Mac OS versions (as well as other Unix-like platforms), pass Python's "-u" unbuffered flag in the first command above:
~$ python3 -u diffall.py /MY-STUFF /MY-COPY -skipcruft > Desktop/temp.txt &
Or, set the equivalent environment variable in your shell (e.g., in ~/.bash_profile) and skip the "-u" argument in the command line:
~$ vi ~/.bash_profile
export PYTHONUNBUFFERED=1
Either way, this forces Python print() calls to send their output immediately on all platforms, so that it can be watched with a Unix "tail." Unfortunately, "-u" doesn't apply and the environment variable has no effect in the Mac app's frozen diffall executable, so app users will want to grab the source-code version to tail its stdout on platforms where stdout is buffered. This isn't required on El Capitan, because the frozen diffall's stdout is not buffered much there either (though to be fair, it's not clear which systems are broken!).
For more possible-but-unverified ideas, see also this discussion thread. Per preliminary testing, however, its "export NSUnbufferedIO=YES" suggestion appears to have no effect on the app's frozen diffall.
And if you're willing to change code, you can also reset sys.stdout to an object whose write() method always calls flush(), or, in Python 3.3+ only, use the extended form "print(x, flush=True)" for all prints. It's not clear that this should be done, though, as buffering is an optimization, and diffall's output can be large (e.g., it's 6MB big and 144K-lines long for an archive with 101K files and 10k folders); if implemented, this should probably be a diffall command-line option. Consider these suggested exercises—until the next Mergeall release (spoiler: it grew dedicated "-u").
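Both code-level options just mentioned can be sketched in a few lines. The wrapper class name here is hypothetical, and this is an illustration of the general technique rather than code from diffall itself:

```python
import sys

class FlushingStream:
    """Wrap a text stream so that every write() is followed by a flush()."""

    def __init__(self, stream):
        self.stream = stream

    def write(self, text):
        count = self.stream.write(text)
        self.stream.flush()
        return count

    def __getattr__(self, name):
        return getattr(self.stream, name)    # delegate everything else

sys.stdout = FlushingStream(sys.stdout)      # option 1: all later prints flush
print('comparing...')

print('comparing...', flush=True)            # option 2: per call, Python 3.3+
```

Option 1 changes every print() in the program at once, while option 2 must be applied call by call; either defeats buffering, with the speed tradeoff noted above.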
Short story: though rare, Mergeall may report unexpected differences for files extracted by unzipping a zip file, due to the odd and inconsistent way unzipping programs handle zipped modification times. There is no complete fix for this, but you can use the same unzip tool each time to lessen impact; allow Mergeall to recopy unzipped files after they are extracted; or avoid including frequently-unzipped files in your archive—include their zip file instead.
Details: due to inconsistent handling of file-modification times across the many unzipping tools in use, it is not guaranteed that a given file's times will survive a zip and unzip combination. Just as for FAT32, zip files generally record file times in "local" time, which may be adjusted on unzips for both daylight saving time (DST) and time-zone changes. This can in turn throw off any program that relies on file-modification times, including Mergeall; its change-detection is fully dependent on timestamps.
As discussed in more detail in Mergeall's User Guide, the FAT32 issue can be addressed by using a different file system such as exFAT for cross-platform drives. The unzip issue, however, is much more thorny: an unzipping program may actually modify a file's recorded modification time as it recreates the file, and only for files last modified in a given time zone or DST phase. Hence, the differences reported by Mergeall are real but spurious (timestamps differ even if content does not), and globally adjusting all files' times up or down isn't an option (only a subset of files may have their times changed on extracts).
Perhaps worse, different unzip tools may apply time-adjustment rules differently, precluding an automatic workaround. The ziptools system available at this site, for example, defers to the local-time handling of Python's zipfile and time modules, which has been observed to differ from that of the Archive Manager used by Finder on Mac OS. For more background on this issue, try a web search like this one.
The upshot of all these factors is that Mergeall may report differences and run recopies for arbitrarily many files in an archive after they are re-unzipped from a zip file. This is a rare issue (and has arisen just once in 4 years of regular Mergeall use), but has no absolute fix. It may be minimized by using the same unzipping tool every time for a given set of files (see ziptools for a portable option). Barring this, you'll need to allow Mergeall to recopy the unzipped files that differ after unzips, or avoid keeping their unzipped versions in a Mergeall archive tree in the first place. The latter may be the simplest approach for files that will be unzipped often.
Interestingly, standard zip-file times are also limited to two-second precision just like FAT32, but Mergeall automatically accommodates this thanks to former fixes. Zip files' bizarre munging of time can also impact thumbnail-change management in the PyPhoto gadget, but in this context would simply trigger one-time thumb rebuilds. Other programs may fare worse after unzips.
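The two-second precision is easy to demonstrate with Python's standard zipfile module. The sketch below stores one member in an in-memory zip with an odd seconds value and reads its time back; the helper name and the sample timestamp are arbitrary. Note also that the stored time is a plain tuple with no time-zone field, which is the root of the local-time ambiguity described above.

```python
import io
import zipfile

def zip_roundtrip_time(date_time):
    """Store one member with the given time tuple; return the time read back."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as zf:
        info = zipfile.ZipInfo('note.txt', date_time=date_time)
        zf.writestr(info, b'content')
    # Reopen from the raw bytes so the time comes from the zip's own records
    with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as zf:
        return zf.infolist()[0].date_time

print(zip_roundtrip_time((2017, 12, 9, 10, 30, 31)))   # (2017, 12, 9, 10, 30, 30)
```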
The real solution here, of course, lies in either abandoning zip files altogether, or standardizing time formats across all computer systems in use today. Given both the popularity of zip and this industry's tendency towards fragmentation and flux, the odds of either solution appearing in our lifetimes seem about as good as those of an open-source project settling on a feature set...
Short story: if you wish to use PyEdit to edit Unicode text files that begin with a BOM character, be sure to open them with an encoding name that discards the BOM if present (e.g., 'utf-8-sig' for UTF-8, and 'utf-16' for UTF-16). You can also delete their BOMs permanently by opening the same way and saving with an encoding that doesn't add a BOM on output (e.g., 'utf-8' for UTF-8), or removing the BOM in the edit window as it is displayed. PyEdit doesn't add BOMs unless your encodings ask it to, but other editors may insert them automatically. If not accommodated or removed, a BOM will make the first line render oddly and difficult to edit, though the effect varies per platform.
Details: to understand this issue, you need to know a bit about one of Unicode's darker corners. In brief, text may start with an identifying marker known as a BOM, in the UTF-8, UTF-16, and UTF-32 encoding schemes. Widely-used UTF-8 files, for example, can begin with a BOM or not. When present, the BOM in such files is a nonprintable Unicode character with code point '\ufeff', which is encoded as bytes b'\xef\xbb\xbf'. Because encodings handle BOMs differently, selecting the right one can be crucial. In Python (and Python programs like PyEdit), neither 'utf-8' nor 'utf-8-sig' requires a BOM to be present, but only the latter discards a BOM on input and adds one back on output.
It's easy to see this in code. A binary-mode file read always retains an encoded BOM at the front, and text mode gives the BOM's decoded code point unless it is discarded by 'utf-8-sig'. Here's the story for a BOM-laden UTF-8 file in Python 3.X, the version PyEdit uses (codecs.open() works essentially the same for text mode in 2.X, sans endline transforms):
$ python3
>>> b = open('purchase-pointers.html', 'rb').read()
>>> b[:50]
b'\xef\xbb\xbf$DOCTYPE$\n\n<HTML>\n\n<HEAD>\n\n<TITLE>Python Books:'
>>>
>>> t1 = open('purchase-pointers.html', 'r', encoding='utf-8').read()
>>> t2 = open('purchase-pointers.html', 'r', encoding='utf-8-sig').read()
>>>
>>> t1[:10]
'\ufeff$DOCTYPE$'
>>> t2[:10]
'$DOCTYPE$\n'
>>>
>>> t1[1:] == t2                    # Just the added BOM differs
True
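The permanent BOM removal described in the short story can also be scripted outside PyEdit. The sketch below works at the bytes level for UTF-8 files only, which has the same effect as opening with 'utf-8-sig' and saving with 'utf-8' but leaves line endings untouched; the function name is hypothetical.

```python
import codecs

def strip_utf8_bom(path):
    """Rewrite a UTF-8 file without its leading BOM; True if one was removed."""
    with open(path, 'rb') as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):     # b'\xef\xbb\xbf'
        return False                             # no BOM: file left as is
    with open(path, 'wb') as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True
```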
This issue is rare, but it cropped up recently in the HTML of a web page edited in PyEdit. Somewhere along the way, a text editor on Windows or Mac OS silently inserted a BOM at the start of the file's UTF-8 content (as usual, Windows Notepad is the prime suspect). The covertly-added BOM is harmless in web pages with content-type UTF-8, but causes the file's first line to be munged in PyEdit when opened with its 'utf-8' encoding.
Specifically—and for reasons known only to the underlying Tk GUI library it uses—PyEdit displays the first line of a file oddly if it begins with a Unicode BOM character not discarded by the encoding used to open it. The BOM's impact, though, varies per PyEdit platform:
In other words, the BOM is rendered as the first character of the first line—whether you can tell or not. To see what happens on your machine, run code like the following to emulate the BOM-happy policies of editors like Notepad, and open the created file in PyEdit as 'utf-8':
>>> open('spam.txt', 'w', encoding='utf-8-sig').write('spam\nSPAM\n')
10
>>> open('spam.txt', 'r', encoding='utf-8').read()
'\ufeffspam\nSPAM\n'
By contrast, PyEdit never discards or adds BOMs automatically, because it supports the full spectrum of Python Unicode encodings for both opens and saves, as a major distinguishing feature; it could not guess your wishes for BOMs in output, especially if they were stripped; and it refuses to enforce implicit global policies that are invariably incorrect in some contexts eventually. The last point is paramount; to be blunt, the simple-minded policies in other editors are the reason that HTML files sprouted unwanted and error-prone BOMs in the first place!
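Because explicit beats implicit in programs that you trust with your content, PyEdit expects you to clarify your BOM goals, by either opening and saving BOM-laden files with 'utf-8-sig' each time, which hides the BOM while you edit and keeps it in the file; or opening with 'utf-8-sig' and saving with 'utf-8' once, which removes the BOM from the file permanently.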
Either approach works because 'utf-8-sig' discards a BOM if present on input, and only 'utf-8-sig' adds one back on output. If you go with the first option, be sure to use the correct encoding name on each open; the second option is a one-time delete, after which 'utf-8' will suffice for opens.
You can arrange these combinations in PyEdit's configurations file by either fixing the open and/or save encodings, or having PyEdit ask for them (save's encoding defaults to open's if not fixed; see the end of your install's textConfig.py for more details). Some might even propose that PyEdit should automatically use the 'utf-8-sig' of these schemes for opens and/or saves, but magic is a very slippery slope: implicit BOM deletions and additions seem equally error-prone (and rude); this wouldn't work for people using other encodings like Latin-1; and most PyEdit users can safely ignore the issue altogether and stick with the preset 'utf-8' default.
In fact, if you don't care to deal with encoding names, you can generally accept the default 'utf-8' for both opens and saves, and simply delete any BOM characters as they are displayed in PyEdit, if and when they are added by other editors. The platform-specific renderings above give display details, but a delete at the top of the file suffices for all. This works well, but may not be as intuitive as explicit encoding names.
Either way, the net effect of deleting BOMs in PyEdit is also easy to verify in Python. The following was run after using the PyEdit Open/Save-As combination on the UTF-8 web-page file we met earlier, to save to a "-nobom" BOM-free copy:
>>> b1 = open('purchase-pointers.html', 'rb').read()
>>> b2 = open('purchase-pointers-nobom.html', 'rb').read()
>>>
>>> b1[:50]
b'\xef\xbb\xbf$DOCTYPE$\n\n<HTML>\n\n<HEAD>\n\n<TITLE>Python Books:'
>>> b2[:50]
b'$DOCTYPE$\n\n<HTML>\n\n<HEAD>\n\n<TITLE>Python Books: Pu'
>>>
>>> b1[3:] == b2        # Just the dropped BOM differs
True
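If you'd rather script the one-time delete than do it in the editor, something along these lines may work. This is a minimal sketch, not part of PyEdit: the strip_utf8_bom() helper and the throwaway demo file are hypothetical names used only for illustration, and the code assumes the file really is UTF-8.

```python
import codecs
import os
import tempfile

def strip_utf8_bom(path):
    """Rewrite path without a leading UTF-8 BOM; return True if one was removed."""
    with open(path, 'rb') as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False                              # nothing to do: leave the file alone
    with open(path, 'wb') as f:
        f.write(data[len(codecs.BOM_UTF8):])      # everything after the 3 BOM bytes
    return True

# Demo on a throwaway file (hypothetical name), built the way a BOM-happy editor would
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.html')
with open(demo_path, 'wb') as f:
    f.write('<HTML>\n'.encode('utf-8-sig'))       # 'utf-8-sig' prepends the BOM

first = strip_utf8_bom(demo_path)                 # removes the BOM
second = strip_utf8_bom(demo_path)                # no BOM left to remove
with open(demo_path, 'rb') as f:
    result = f.read()
```

Working at the byte level leaves line endings and all other content untouched, which matches the "just the dropped BOM differs" check above.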
UTF-8 is common for web pages, but it's not the only offender. UTF-16 and UTF-32 files may also be BOM-ridden, though their encodings work oppositely. In UTF-16, the general 'utf-16' always both discards a BOM on input and adds one on output (like the specific 'utf-8-sig'); but the more specific 'utf-16-le' does neither (like the general 'utf-8'); and ditto for UTF-32. To you, this means 'utf-16' and 'utf-32' generally suffice in PyEdit, because they both strip and restore BOMs in files:
>>> open('spam16.txt', 'w', encoding='utf-16').write('1\n2\n3\n')
6
>>> open('spam16.txt', 'rb').read()
b'\xff\xfe1\x00\n\x002\x00\n\x003\x00\n\x00'
>>>
>>> open('spam16.txt', 'r', encoding='utf-16').read()
'1\n2\n3\n'
>>> open('spam16.txt', 'r', encoding='utf-16-le').read()
'\ufeff1\n2\n3\n'
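The "ditto for UTF-32" claim can be checked the same way. The sketch below works in memory on made-up sample text, and tests for either byte order because the general 'utf-32' codec writes in the machine's native order.

```python
import codecs

text = '1\n2\n'                                   # made-up sample: 4 characters

general = text.encode('utf-32')                   # BOM + native byte order
specific = text.encode('utf-32-le')               # no BOM, little-endian

# The general codec adds a 4-byte BOM on output...
has_bom = general.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE))
sizes = (len(general), len(specific))             # 4 bytes per character, plus 4 for the BOM

# ...and discards it on input
round_trip = general.decode('utf-32')

# The specific codec does neither: a BOM in the data is kept as '\ufeff'
le_with_bom = (codecs.BOM_UTF32_LE + specific).decode('utf-32-le')
```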
Finally, if you want to see which files in a folder tree may be clandestinely harboring BOMs, try something like the following (this code looks for UTF-8 BOMs in all HTML files in the current working directory; tweak as needed):
import os
for (adir, subs, files) in os.walk('.'):
    for file in files:
        if file.endswith(('.htm', '.html')):
            path = os.path.join(adir, file)
            try:
                text = open(path, 'r', encoding='utf8').read()
            except:
                print('Not UTF8:', path)
            else:
                if text[:1] == '\ufeff':        # Or try file.read(1)
                    print('BOM=>', path)        # This file has a BOM
When run by command line, file click, IDLE, or PyEdit's own Run Code, your output will be similar to this:
Not UTF8: ./lp3e-updates-notes-python.html
Not UTF8: ./lp4e-preface-preview.html
BOM=> ./lp4e-updates-clarifications-first-printing.html
BOM=> ./lp4e-updates-clarifications-recent.html
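To cover UTF-16 and UTF-32 files too, and to avoid decoding whole files, a variant can inspect just the first few bytes. The following is a heuristic sketch only: sniff_bom() and sniff_file() are hypothetical helper names, the suggested encoding names are simply the BOM-discarding ones discussed above, and a UTF-16 file whose first character is a null would be misread as UTF-32.

```python
import codecs

# Check the 4-byte BOMs first: the UTF-32-LE BOM begins with the UTF-16-LE BOM's bytes
BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8,     'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
]

def sniff_bom(head):
    """Return a BOM-discarding encoding name suggested by leading bytes, or None."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None

def sniff_file(path):
    """Apply sniff_bom() to the first 4 bytes of the file at path."""
    with open(path, 'rb') as f:
        return sniff_bom(f.read(4))
```

In the folder walker above, a call like sniff_file(path) could replace the full text read when only the BOM's presence matters.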
For more background on the Unicode BOM—including more about its behavior in the UTF-16 and UTF-32 encodings omitted here for space—see the documentation at the top of the unicodemod.py script on this site, as well as the more in-depth coverage in the Advanced Topics part of the book Learning Python. For related tips, also see the Unicode conversion note earlier on this page, and the next note.
Speaking of PyEdit's configurations file: if you look at the Unicode settings near the end of textConfig.py, you'll notice that its fallback and prefill default encoding is sys.getdefaultencoding()—which is Python 3.X's default for encoding methods, and not locale.getpreferredencoding()—which is Python 3.X's default for open().
This is by design, because the former's UTF-8 setting is the same everywhere. If PyEdit used the latter, default file encodings could vary per platform. The net effect would be that people who work across multiple machines with different locale results (e.g., Unix and Windows) might have to remember where each file was last edited in order to provide an encoding that opens it properly! This is a major downside of 3.X's open() defaults, and one more reason that you should use explicit encodings whenever possible.
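To compare the two defaults on a given machine, a quick check like the following may help. It is just a sketch: the variable names are illustrative, and the locale result depends on your platform, Python version, and settings.

```python
import locale
import sys

# Default for str.encode() and bytes.decode(): 'utf-8' in Python 3.X
methods_default = sys.getdefaultencoding()

# Locale-based setting: varies per platform and locale configuration
locale_default = locale.getpreferredencoding(False)

# With no argument, encoding methods use the first of these
encoded = 'sp\xe4m'.encode()

print('methods:', methods_default, '| locale:', locale_default)
```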
That said, if you generally work on just one platform and really want to use the locale module's setting (or any other value) as your PyEdit encoding default, it's easy to do so; the configurations file is just a Python module, after all:
import locale
opensEncoding = savesEncoding = locale.getpreferredencoding()
If you use locale and skip the encoding-input dialog, though, please remember that your files' encodings may vary per editing platform. Python 2.X allows the sys module's setting to vary too (it can be changed at start-up, and by dark hackery intentionally omitted here), but Python 3.X, PyEdit's implementation language, makes it more of a constant. For additional coverage of Python 3.X's encoding defaults, see the manuals or the overview in this article.