PP4E: Updates Page
Last revised: February 18, 2018
This page collects notes, updates, and some errata related to the book Programming Python, 4th Edition (PP4E). It does not list every erratum reported over the years, because this book's more advanced audience will probably be tolerant of the few inevitable typos along the way in a 1600-page book. Instead, this page mainly serves to collect supplemental notes and materials for readers.
For more book resources, be sure to also see the following external pages, some of which are newer than this page, and continue its mission:
Note that only the last section below is true errata (book corrections), patched in later reprints. The other items here form an informal book "blog" of sorts. If you're looking for a complete corrections list, or find a new issue you wish to report, please see the publisher's errata page for this book; I'm automatically emailed posts made at that page. In recent years, that errata page has also grown to host a few answers to reader questions and general book notes not duplicated here, and should be considered an extension to this page.
This book-related example, Mergeall, consists of a script and GUI that synchronize directory trees, and can provide a manual alternative to cloud storage. Its main script reuses a number of directory-processing examples that appear in the book's systems programming part. This program's coverage includes code and screenshots, but grew too long for inline treatment, and was moved off page.
⇨ Click here to go to the Mergeall page
Update: for more book-related example code, see also the newer Frigcal calendar GUI example, which uses Python's tkinter GUI library covered extensively in the book, and the even newer programs page, which leads to updated versions of many of the book's major examples.
I've recently begun running book examples on a Linux dual-boot system, under Fedora 20 and Gnome 3 (and later, Ubuntu). This is partly in response to the focus in Windows on clouds, subscriptions, advertising, and devices that are proprietary and seem intentionally crippled, but that's not what this note is about (see the related post).
So far, all the major GUI-based examples work well unchanged, with one minor exception: the script used to launch PyMailGUI after selecting from one of N email accounts contains an unfortunately nonportable and hardcoded Windows path. You may never encounter this; PyMailGUI can be run directly too—and is by the book's demo launchers—and this script is in part coded to work when PYTHONPATH has not been set to include the book's examples root. But to fix the account selector script so it works on Linux too, in this book examples tree file:
.../PP4E/Internet/Email/PyMailGui/altconfigs/launch_PyMailGui.py
change the 2nd-from-last line from the first of the following to the second, in order to pick up the underlying platform's path separator portably:
os.environ['PYTHONPATH'] = r'..\..\..\..\..' # hmm; generalize me
os.environ['PYTHONPATH'] = '..%s..%s..%s..%s..' % ((os.path.sep,) * 4) # hmm; generalize me
You may also want to change the last line from "os.system('PyMailGui.py')" to something like "os.system('python3 PyMailGui.py')" in order to force 3.X execution on Linux for the spawned PyMailGUI, but this depends on your system's links and configuration (Python Windows launcher settings don't apply in any event; you'll need to specialize this line's code per sys.platform if needed). There are undoubtedly other Linux portability issues in smaller book examples, especially those in the Systems section; more here as they surface. See also: tkinter Linux portability notes elsewhere on this page.
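As a rough sketch only, the two changes just described might be factored into helpers like the following. The names examples_root_relpath and launch_command are hypothetical (they are not in the book's script), and the 'python3' command name is an assumption about how your Linux system links Python 3.X:

```python
import os
import sys

def examples_root_relpath(levels=5):
    """Return '..' repeated `levels` times, joined by the platform separator."""
    return os.path.sep.join(['..'] * levels)

def launch_command(script='PyMailGui.py', platform=sys.platform):
    """Pick a spawn command per platform (hypothetical helper)."""
    if platform.startswith('win'):
        return script                  # rely on Windows filename associations
    return 'python3 ' + script         # assumption: 'python3' runs 3.X here

if __name__ == '__main__':
    os.environ['PYTHONPATH'] = examples_root_relpath()
    print(os.environ['PYTHONPATH'])    # five '..' parts, platform separator
    print(launch_command())            # command that os.system would be given
```

The join produces the same string as the '%s' formatting expression in the one-line fix; it just makes the number of directory levels explicit.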
Short story: due to a temporary regression in Python's email package, you probably should not run the book's PyMailGUI email client on Python 3.3.3. Instead, use any other Python 3.X version—3.1 or 3.2; 3.3.0 through 3.3.2; 3.3.4 or later 3.3; or 3.4.0 or later 3.4.
In 3.3.3 only, Python's email package changed in a way that broke this book's PyMailGUI. The break occurs when replying to or forwarding a message whose main body text contains a non-ASCII character that was encoded per base64 or quoted-printable in the original message. Such email messages worked fine in PyMailGUI from Pythons 3.1 through 3.3.2. In 3.3.3, though, a simple slanted quote or emdash suffices to cause problems; when such characters are present in the body text, PyMailGUI doesn't crash, but the message can't be sent, and the GUI displays an error dialog with text:
Send failed: <class 'UnicodeEncodeError'> 'utf-8' codec can't encode character '\udce2' in position 688: surrogates not allowed
This makes no sense, given that surrogates are supposed to be employed in the email package's new bytes API only—an API which PyMailGUI predates, and does not use in any way (PyMailGUI decodes message fulltext to str text instead). The 3.3.3 email package must be mutating the already-decoded body text, and inserting surrogate Unicode escapes—something it absolutely should not do, and did not do until the 3.3.3 point release.
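For readers unfamiliar with the error text, here is a small illustration of how such surrogate characters arise in Python 3.X in general, and why encoding them fails. It is not a reproduction of the 3.3.3 email package behavior, only a sketch of the underlying mechanism; the sample bytes are made up:

```python
# Undecodable bytes become lone surrogates under the 'surrogateescape' handler
raw = b'caf\xe2'                                   # \xe2 is not valid ASCII
text = raw.decode('ascii', errors='surrogateescape')
print(ascii(text))                                 # 'caf\udce2'

# A strict UTF-8 encode of that text fails with the message quoted above
try:
    text.encode('utf-8')
except UnicodeEncodeError as exc:
    print(exc.reason)                              # surrogates not allowed

# The same handler reverses the escape and restores the original bytes
restored = text.encode('ascii', errors='surrogateescape')
print(restored == raw)                             # True
```

Text that was decoded normally to str never contains such characters, which is why their appearance in already-decoded body text is suspicious.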
Timing: because the error occurs on Send in the Message.set_payload() call following the fix_text_required() workaround for an earlier email issue, a change in character-set output encoding logic is the prime suspect. Before this point, the fetched raw text of mails is correct (double-clicks show its original encoded form), as is the result of mail parsing (View, Reply, and Fwd all display correctly decoded text, including any non-ASCII characters). Both failing cases observed were attempting to encode text per UTF8 and base64 on Send. In any event, the next section makes this largely a moot point.
The good news is that this Python regression appears to have been present in just one point release, and was fixed quickly. It has been observed in 3.3.3 only (plus an early 3.4 beta which inherited the issue temporarily). It is not present in 3.3.2, and is fixed as of 3.3.4. Its repair was also propagated to later 3.4 prereleases. Since the latest official 3.3 and 3.4 downloads available at python.org—currently 3.3.4, 3.3.5rc2, and 3.4.0rc2—do not have the problem, its impact should be minimal.
PyMailGUI itself was coded for Pythons 3.1 and 3.2, current at book development time, but is known to work well through 3.4.0, apart from this temporary surrogates issue in 3.3.3. I use this program constantly, but only recently discovered the issue when using a newer 3.3.3 install. It's less than ideal for point releases to break working programs this way, of course, but mistakes happen, and programs like PyMailGUI have to mind the bleeding edge of Python releases more than most; book readers tend to prefer the latest Python either way.
For examples of other PyMailGUI breakages caused by Python email package changes, see the patch for item #3 on this page; it's been a potential source of problems with each new Python release installed. I've also observed some Windows line-break strangeness in recent email package versions (text is sometimes saved as one long line), but this is to be investigated. In the end, this makes for a reasonable lesson in itself: library dependencies are an unavoidable aspect of real-world software development.
Footnote: You can verify the Python version that PyMailGUI is using by clicking Write, entering the following program code in the Write window's main text area, and then clicking its Tools -> Run Code menu option; this is a feature of the PyEdit component, which runs the edited text as program code, and shows its output in the console window where PyMailGUI was launched (don't try that in Outlook...):
import sys
print(sys.version)
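If you would rather have code make the check, a sketch along these lines could be run the same way. The has_email_regression name is hypothetical, and the test covers only the 3.3.3 final release named above, not the early 3.4 beta that briefly inherited the issue:

```python
import sys

def has_email_regression(version_info=sys.version_info):
    """True only for Python 3.3.3, the release noted above as affected."""
    return tuple(version_info[:3]) == (3, 3, 3)

if has_email_regression():
    print('Warning: Python 3.3.3 detected; PyMailGUI sends may fail')
else:
    print('Python %d.%d.%d: not the affected release' % sys.version_info[:3])
```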
After using the threaded PyMailGUI on a daily basis for 8 years (more than 3 in its latest 4th Edition form), a new issue cropped up when someone sent an email whose alternative text part contained a Unicode character not supported by the underlying Tk GUI library—character 🙊, which is Unicode codepoint U+1f64a and u'\U0001F64A' in Pythonese, the "Speak-No-Evil Monkey" character (no really; look it up). The Tk GUI system can't handle character codes over 16 bits like this one, and PyMailGUI relies on Tk's rendering prowess to do the right thing for Unicode, as described in the book; see Pages 538-548.
As is, PyMailGUI reports the Tk error message in the console window and doesn't crash per se, but the GUI is partly disabled, because this error is raised and uncaught in a thread-exit callback, thus preventing a thread-busy lock from being released, which in turn disables future Loads, Views, Deletes, and Quits (in fact, Task Manager may be required to close the GUI on Windows).
To do better, fetch this updated ViewWindows.py, and copy it into your book example tree's PP4E\Internet\Email\PyMailGUI folder. It simply catches the Tk library exception, displays a popup and stack trace, and continues, so that thread-busy locks are released. Search for "1.5" in the file for more on the changes; the too-large Unicode character also triggers a Tk exception in other places (e.g., viewing the text part later), but these are already caught and reported with popups, and don't impact thread locks.
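The general idea behind the fix can be sketched as follows. This is not the code in ViewWindows.py; the function and variable names are hypothetical, a plain list stands in for the GUI popup, and a RuntimeError stands in for the Tk exception:

```python
import threading
import traceback

def run_exit_callback(callback, busy_lock, report=print):
    """
    Run a thread-exit callback, reporting any exception instead of
    letting it propagate, and always releasing the busy lock.
    """
    try:
        callback()
    except Exception:
        report(traceback.format_exc())     # stand-in for a GUI popup
    finally:
        busy_lock.release()                # later actions are not blocked

busy = threading.Lock()
busy.acquire()                             # a transfer thread marks itself busy

def failing_callback():
    raise RuntimeError('simulated Tk error')   # hypothetical failure

messages = []
run_exit_callback(failing_callback, busy, report=messages.append)
print(busy.locked())                       # False
```

The key point is the finally clause: the lock is released whether or not the callback raises, so later Loads, Views, Deletes, and Quits are not disabled.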
Update: for more on this Tk limitation, see its later description in the Frigcal docs. Also, the limitation in PyMailGUI was eventually lifted in its standalone release available here; to fix, non-BMP Unicode characters are replaced with the Unicode replacement character � for display (until Tk supports more of Unicode, including emojis).
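A replacement of this kind can be sketched with a regular expression. This is an illustration rather than the standalone release's actual code; the fix_for_tk name is hypothetical, and the sketch assumes Python 3.3 or later, where strings store full code points:

```python
import re

# Any code point above the 16-bit Basic Multilingual Plane (BMP)
NON_BMP = re.compile('[\U00010000-\U0010FFFF]')

def fix_for_tk(text, replacement='\uFFFD'):
    """Replace characters Tk can't render with the replacement character."""
    return NON_BMP.sub(replacement, text)

sample = 'monkey: \U0001F64A!'
print(ascii(fix_for_tk(sample)))   # 'monkey: \ufffd!'
```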
Unrelated room for improvement: additional PyMailGUI changes to support POP over SSL, SMTP over SSL/TLS, and POP servers that limit logins by time (thereby perhaps requiring a single persistent login instead of one login per transaction) are in progress, but are also a suggested exercise. Accounts on outlook.com are the motivation for these mods. The first two (SSL/TLS) are now supported in Python's libs, but not yet in PyMailGUI; see Python manuals for usage details. This also eventually found its way into the standalone release.
Update, Oct-1-15: it's now also been verified that this patch and the 1.4 examples release suffice to make the book examples mentioned here work under Python 3.5, per its final 3.5.0 release.
Update, Nov-26-13: it's now been verified that this patch and the 1.4 examples release also suffice to make the book examples described here work under Python 3.4, per its beta releases.
Update, Oct-15-13: There is a new 1.4 release of the book's examples package which incorporates the small Python 3.3 patch described below. Get the new examples release here, and read about its 3.3 (and later) changes here.
Per the description elsewhere on this page, a Python 3.3 standard library change broke some email address displays of non-ASCII names in PyMailGUI, the largest example in the book. In short, the Python 3.3 email package's formataddr utility function now applies a new automatic MIME encoding for names, which it did not in the past—a curious and undocumented incompatible change, which did not account for display-oriented use cases, and can break code that worked well under Pythons 3.0 through 3.2 (including some in this book). Luckily, this is fairly easy to repair.
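To see the difference for yourself, the following sketch contrasts formataddr with a display-only formatter. The display_addr helper is hypothetical and is not the code in the patch files; it simply shows the kind of unencoded text a GUI wants to display, and the sample name and address are made up:

```python
from email.utils import formataddr

def display_addr(name, addr):
    """Format a name/address pair for display only, without MIME encoding."""
    return '%s <%s>' % (name, addr) if name else addr

pair = ('J\xfcrgen', 'jurgen@example.com')     # non-ASCII name, u-umlaut
print(formataddr(pair))                        # MIME-encoded name on 3.3+
print(ascii(display_addr(*pair)))              # name left as readable text
```

The encoded form is appropriate for headers of a message being sent, but not for text shown to a user.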
To apply and use the patch for this Python change, simply fetch the following two files, and copy them to the PP4E\Internet\Email\PyMailGui folder in your book examples tree, per the more detailed instructions in the first of these:
(And yes, this is a module-level example of what's called "monkey patching" today, though applying a new label to an old technique doesn't necessarily make it any more palatable...)
Because it may be a FAQ, this post includes the important bits from a dialog with a reader who was having trouble running the web examples in the preview chapter of the book. In short, on some machines you may need to change the hardcoded port number used in this script to something other than "80", and list it in the URL explicitly (and read ahead in the book itself to the full coverage of this subject later in the book).
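As a minimal sketch of the port change, and not the book's webserver.py itself, the following creates a server on a configurable port and builds the matching URL. It substitutes SimpleHTTPRequestHandler for the book's CGI-capable handler so that it runs anywhere without CGI scripts; make_server and server_url are hypothetical names:

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

def make_server(port=8080, handler=SimpleHTTPRequestHandler):
    """Create, but do not start, a server bound to the given port."""
    return HTTPServer(('', port), handler)

def server_url(server, page=''):
    """Build the URL to type in a browser, naming the port explicitly."""
    return 'http://localhost:%d/%s' % (server.server_address[1], page)

server = make_server(0)                 # port 0: let the OS pick a free port
print(server_url(server, 'cgi101.html'))
server.server_close()                   # call server.serve_forever() to run
```

Ports below 1024, including 80, often require elevated privileges or may be claimed by other software, which is why a higher number such as 8080 is a common fallback.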
> > -----Original Message-----
> > To: firstname.lastname@example.org
> > Subject: Programming Python 4th Ed
> > Date: Wed, 21 Aug 2013 13:49:19 +0100
> >
> > Dear Sir
> >
> > Programming Python 4th Edition.
> > I'm stuck on page 53. Example 1-30 runs ok but I can't get example
> > 1-31 to reply. When I run the html script it list the contents of
> > 1-31, How do I get it to execute 1-31?
> > I am using Python 3.3 on Windows 7.
> >
> -----Original Message-----
> From: Mark Lutz [mailto:email@example.com]
> Sent: 21 August 2013 17:14
> Subject: Re: Programming Python 4th Ed
>
> There are too many things that can go wrong in the Web realm to offer advice
> based on your email (including but not limited to running the web server
> shown a page or two ahead). My advice is to read ahead to the server side
> scripting chapter of the book for the full story on the Web/CGI domain.
>
>
> -----Original Message-----
> To: 'Mark Lutz'
> Subject: RE: Programming Python 4th Ed
> Date: Thu, 22 Aug 2013 11:39:13 +0100
>
> Hi Mark
> Thanks for your prompt reply. I followed your advice to read further.
> When I run Example 1-32 Pg56 webserver.py I get the following output:-
>
> Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32
> Type "copyright", "credits" or "license()" for more information.
> >>> ================================ RESTART================================
> >>>
> Traceback (most recent call last):
>   File "C:\Users\...\Documents\AAAPROJECTS\COMPUTERSCIENCE\PYTHON\PROGPY33\C01\webserver.py", line 15, in <module>
>     srvrobj = HTTPServer(srvraddr, CGIHTTPRequestHandler)
>   File "C:\Python33\lib\socketserver.py", line 430, in __init__
>     self.server_bind()
>   File "C:\Python33\lib\http\server.py", line 135, in server_bind
>     socketserver.TCPServer.server_bind(self)
>   File "C:\Python33\lib\socketserver.py", line 441, in server_bind
>     self.socket.bind(self.server_address)
> OSError: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions
> >>>
>
> Could you please help me with "socket access permissions" as I am new to web
> programming. This is the reason I purchased your book to extend my Python
> Programming into web applications.
>
> I am running on my own Dell desktop as administrator using Windows 7
> Internet Explorer 10.
>
The webserver script works fine for me on Python 3.3 and Windows7.
Running the server in a Command Prompt window:
c:\PP4E\Examples\PP4E\Preview> py -3.3 webserver.py
127.0.0.1 - - [22/Aug/2013 09:04:08] code 404, message File not found
127.0.0.1 - - [22/Aug/2013 09:04:08] "GET /favicon.ico HTTP/1.1" 404 -
127.0.0.1 - - [22/Aug/2013 09:04:17] "POST /cgi-bin/cgi101.py HTTP/1.1" 200 -
127.0.0.1 - - [22/Aug/2013 09:04:17] command: C:\Python33\python.exe -u c:\PP4E\Examples\PP4E\Preview\cgi-bin\cgi101.py ""
127.0.0.1 - - [22/Aug/2013 09:04:17] CGI script exited OK
And responding to this URL typed in a web browser window:
http://localhost/cgi101.html
Probably, you cannot run a server on port #80 (the script's default) on your machine, because it is locked down by something else (e.g., virus software?). Try changing the port# in the webserver script, and then name the port# in the URL explicitly:
port = 8080 # default http://localhost/, else use http://localhost:xxxx/
c:\PP4E\Examples\PP4E\Preview> py -3.3 webserver.py
http://localhost:8080/cgi101.html
Or, pass the port# in on the command line to the expanded version of this script that appears later in the book, and run the examples in that later section's directory:
c:\PP4E\Examples\PP4E\Internet\Web> py -3.3 webserver.py . 8080
webdir ".", port 8080
...server log...
http://localhost:8080/languages.html
This is all explained in detail later in the book in the server-side scripting chapter. It's also mentioned in the preview chapter you're reading; quoting from page 57:
"""
One pragmatic note here: you may need administrator privileges in order to run a server on the script’s default port 80 on some platforms: either find out how to run this way or try running on a different port. To run this server on a different port, change the port number in the script and name it explicitly in the URL (e.g., http://localhost:8888/). We’ll learn more about this convention later in this book.
"""
If changing port #s doesn't suffice, I'm afraid there's nothing more I can offer; server setup is widely variable, and may require some supplemental exploration.
Best wishes,
--Mark Lutz (http://learning-python.com, http://rmi.net/~lutz)
There is a simple patch for an address display issue introduced by a change in Python 3.3's email package, that is included in version 1.4 of the examples package. For details, see the bug, as well as its fix.
I've posted a new release of the book examples package, version 1.3, which has patches for all 6 of the PyMailGUI updates listed in this section below. The first two of these were already patched in release 1.2, but the rest are new in 1.3. Get the change log and the complete new examples zip file at O'Reilly's site, or fetch just the files changed within it in this zip file. See the book for details on running PyMailGUI in the examples package (via auto launchers, command lines, etc.). I don't distribute this program standalone, partly because it uses many other files in the book examples tree, and partly because the book is its documentation.
One admin note: Some of the changes made in version 1.3 of the examples package are too large to find their way into reprints of the book itself, but I recommend using the new version in general, and studying the files changed to see what was involved; it's a fair example of code maintenance in action. For changes too big to merge into the book, versions of the changed source files which mirror the code in the book are retained in the examples package with a "BOOK-" name prefix. For details, see the change log which is also file changes\CHANGES.txt in the examples package.
After using the book's PyMailGUI email client for just over a year, I've collected a list of additional enhancements beyond those already described in the book (see the original enhancements list at the end of PyMailGUI's Chapter 14). I am the entire testing department and user base for this program, so some issues have taken longer to shake out than others. The following is a list of all these additional PyMailGUI enhancements discovered and applied after the book was published, for completeness; their write-ups are located elsewhere on this page:
|1||[Feb-01-11]||Using POP and SMTP timeout parameters (patched in 1.2, and book)||write-up|
|2||[Jan-10-11]||Closing temporary output files for HTML-only emails (patched in 1.2, and book)||write-up|
|3||[Aug-08-11]||Decoding and encoding non-ASCII attachment filenames (patched in 1.3, and book)||write-up|
|4||[Oct-01-11]||Improved sent-time display in list windows (patched in 1.3)||write-up|
|5||[Sep-29-11]||Delete and Save timing issue, rare bug (patched in 1.3)||write-up|
|6||[Jul-29-11]||Using authenticating SMTP servers for sends in mailconfig (patched in 1.3)||write-up|
Interestingly, two of these changes, #1 and #3, are also inherited by the less functional PyMailCGI webmail example of Chapter 16, because they were applied in the common mailtools package. There were a handful of additional changes made in the examples package and their book listings (e.g., a focus fix in the PyEdit component used by PyMailGUI); see the change log as well as the changes' write-ups on this page for more details.
To sample the effect of changes #3 and #4 above, see the following PyMailGUI screenshots:
The support for non-ASCII attachment filenames and local-relative time in these is new in the 1.3 example package, but the rest is original behavior. See the book for more on PyMailGUI's i18n and Unicode support in other headers and mail content.
If a book example which uses the input() built-in seems to be failing, and you are using Python 3.2.0 in a Windows console window, see this post on Learning Python 4E's update pages.
This built-in was apparently broken temporarily in 3.2.0 (3.2) in Windows console mode, but has been fixed in later Python releases. The quickest fix is to upgrade to 3.2.1 or later, or try a different environment; the book examples work fine in all other Pythons and most other contexts such as IDLE. Scripts in both books may be impacted by this regression.
Another cross-post from the Learning Python update pages about a Python change which impacts examples in Programming Python too—see this note for details on Python 3.2's decision to drop support of str strings for the "s" type code in struct.pack.
This impacts a variety of examples in this book. The simplest fix is to manually encode str Unicode strings to bytes byte strings when passing to struct.pack, per the referenced note. You can also run these examples in 3.1 or earlier if that's an option, though newer Pythons are generally better Pythons.
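A brief illustration of the manual encoding fix follows. The format string and values here are arbitrary samples rather than code from a specific book example:

```python
import struct

name = 'spam'

# Encode str to bytes before packing with the 's' type code
packed = struct.pack('>i4sh', 7, name.encode('ascii'), 8)
print(packed)                              # b'\x00\x00\x00\x07spam\x00\x08'

# Unpacking yields bytes for 's'; decode to get a str back
number, raw, short = struct.unpack('>i4sh', packed)
print(number, raw.decode('ascii'), short)  # 7 spam 8

try:
    struct.pack('>i4sh', 7, name, 8)       # str is rejected in 3.2 and later
except struct.error as exc:
    print('struct.error:', exc)
```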
I've started testing the book's examples under Python 3.3, the latest release which features:
Of these, the last 3.3 enhancement listed above will probably have the broadest impact (in fact, it affects every Python 3.3+ user on Windows), and merits a few more words. Those words have grown too large for this page, however, so I've moved them to this separate article:
⇨ The New Windows Launcher in Python 3.3
The very short story on the launcher is that it registers new executables which are installed on your system path normally; attempts to parse "#!" Unix-style lines at the top of scripts to determine which version of Python to run; and supports command-line arguments that give Python version numbers. The net effect is to better support multiple Pythons coexisting on the same machine, by allowing Python version numbers to be specified on both a per-file and per-command-line basis, and in both full and partial form. It's quite a useful trick, though not without the pitfalls described below. For much more on the launcher in general, see the link above, or the new appendix on the subject in 2013's Learning Python, 5th Edition.
The book's examples were initially developed on 3.1, but tested successfully on 3.2 alpha before publication. In general, most examples tested so far appear to work well on 3.3 and as shown in the book. As expected, though, the evolution in the 3.X line has impacted some behaviors. Among the most notable 3.3 changes that affect book examples:
The short story in this department is that the new launcher:
With respect to book examples, the first point requires changing "#!/bin/env" to "#!/usr/bin/env" in a dozen examples files, and the second point can be addressed by setting the launcher's default to 3.X, via "set PY_PYTHON=3" (or the equivalent in Control Panel). The Python 3.3 installer also has an optional PATH extension feature, which seems contradictory to the new launcher's goals, but shouldn't cause scripts to fail in general.
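The "#!" line change could be applied by hand, or with a small helper along these lines. This is a hypothetical sketch that handles a single line of text; looping over the affected files and rewriting them is left out:

```python
def fix_shebang(first_line):
    """Rewrite a '#!/bin/env' line to '#!/usr/bin/env'; leave others alone."""
    old, new = '#!/bin/env', '#!/usr/bin/env'
    if first_line.startswith(old):
        return new + first_line[len(old):]
    return first_line

print(fix_shebang('#!/bin/env python\n'), end='')       # #!/usr/bin/env python
print(fix_shebang('#!/usr/bin/env python3\n'), end='')  # unchanged
```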
As the off-page launcher write-up concludes, the new launcher is a net Good Thing, but you need to be aware that it may break some formerly valid scripts with "#!" lines, and may choose a default version you don't expect which causes many scripts to fail initially.
More here on 3.3 in general as testing continues.
Update, May 2013: It looks like the PIL ports at the site described below are now an official fork named Pillow, and are now also available at the PyPI site. Despite the name and location changes, this package is still imported as "PIL", and is fully compatible with PIL for the book's examples, and others. I've also used it successfully to extract EXIF metadata tags from photos in this script (tagpix.py).
Update, July 2012: I've now verified that the "unofficial" PIL ports for Python 3.X described in the prior update do work correctly, at least on Windows under Python 3.2 and 3.3 and for the PIL subset used by the book's examples—tkinter image display, thumbnail generation, and resize operations. Specifically:
Update, May 2012: a quick web search says that there are unofficial PIL installers for Python 3.2 and 3.3, including those here: http://www.lfd.uci.edu/~gohlke/pythonlibs/, though I have yet to test their operation with book examples.
This book uses the PIL (Python Imaging Library) extension for some image-based examples, both to render thumbnail images, and to display additional image file types in tkinter GUIs. Because PIL was not yet ported to 3.X, the book employed a custom installer provided by PIL's creator, and included this installer in its examples package as a temporary measure pending an official 3.X port.
A reader wrote recently to note that the PIL installer in the book's examples package works only under Python 3.1, and not for 3.2. I don't track PIL's progress, but it has much more utility than the book leverages, and I suspect that this has held up the 3.X port (naturally, this is a non-issue for 2.X readers, for which PIL installers are available). Since this is a general issue which other readers have asked about too, the relevant portion of my reply follows:
About a PIL installer for 3.2: an official 3.X PIL port has yet to materialize; it was considered imminent two years ago. The stop-gap installer I was given by PIL's creator and shipped in the book examples package is an executable for 3.1 only, which I unfortunately have no way to update. I recommend contacting PIL's creator, Fredrik Lundh, about this, and/or browsing the archives of and posting your query to PIL's email list to see what may be possible today. Fredrik's last known email address (two years ago): (please search pythonware.com) and the image-sig email list for PIL lives here: http://mail.python.org/mailman/listinfo/image-sig If you get a resolution on this and can spare the time, I'd appreciate a copy on what you find; other readers have run into the same issue. If I'm able to uncover anything myself, I'll follow-up. In the worst case, you can always install 3.1 alongside 3.2 to experiment with PIL examples, or take the examples' code as demonstrative if not runnable.
I posted a note about pickling and bound methods on the clarifications page of the book Learning Python 4th Edition which provides some additional background on how and why bound methods cannot be pickled.
Since that note also pertains to the coverage of pickling in this book—in Chapter 1's quick tour, Chapter 5's multiprocessing module section, and Chapter 17's in-depth database material—I'm posting a cross reference to it here too: read this related note here.
On Pages 279-282, the book discusses in substantial depth how print calls can fail for some Unicode filenames in Python 3.X, and uses a try statement to catch such failures in the tree walker script on page 276. As described in the book, scripts should generally print filenames with care in 3.X if those names might ever be non-ASCII, by either catching print call exceptions or changing the PYTHONIOENCODING environment variable to allow for specific Unicode encodings in the standard streams. For simplicity, though, some of the scripts in the book do not take this advice: they simply print filenames blindly, and assume that you understand the issue given the general description starting at Page 279, and will configure your environment if/as needed.
As an illustrative example, I recently noticed a print failure in the diffall.py directory comparison script on pages 311-313 after I added a few files with non-ASCII names to the examples tree (tests for PyMailGUI enhancements described elsewhere on this page). As is and without environment configurations, the book's version of the script dies with an exception on Windows when printing non-ASCII filenames, even if the output is redirected to a file (which sometimes avoids such errors):
c:\...\PP4E\System\Filetools>diffall.py e: C:\SD-card-xfer-oct2711 > temp
Traceback (most recent call last):
  File "C:\...\PP4E\System\Filetools\diffall.py", line 80, in <module>
    comparetrees(dir1, dir2, diffs, True) # changes diffs in-place
  File "C:\...\PP4E\System\Filetools\diffall.py", line 69, in comparetrees
    comparetrees(path1, path2, diffs, verbose)
  ...
  File "C:\...\PP4E\System\Filetools\diffall.py", line 56, in comparetrees
    if verbose: print(name, 'matches')
  File "C:\Python31\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-6: character maps to <undefined>
The end of the temp output file reflects the location of the failure:
--------------------
Comparing e:Books\4E\PP4E\examples-official\1.3\unpacked\PP4E-Examples-1.3\changes\detailed-diffs\1.3\patched-files-13\i18n-filenames-tests to [...]
Directory lists are identical
Comparing contents
Mail-saved-after-sent--OpenMeInGUI.txt matches
Now, as suggested in the book's footnote on Page 282, this script works as is without errors if you simply change your PYTHONIOENCODING environment variable to UTF-8 Unicode encoding for the standard streams. No code changes are needed, and the script winds up printing Unicode text to the output file. On Windows (where you can also set this once and for all via the System icon in your Control Panel):
c:\...\PP4E\System\Filetools>set PYTHONIOENCODING=utf-8
c:\...\PP4E\System\Filetools>diffall.py e: C:\SD-card-xfer-oct2711 > temp
c:\...\PP4E\System\Filetools>notepad temp
Here is part of the temp output file, at the place where the prints failed before the environment setting:
--------------------
Comparing e:Books\4E\PP4E\examples-official\1.3\unpacked\PP4E-Examples-1.3\changes\detailed-diffs\1.3\patched-files-13\i18n-filenames-tests to [...]
Directory lists are identical
Comparing contents
Mail-saved-after-sent--OpenMeInGUI.txt matches
Поворот IMG_1412.txt matches
从~技~术~走~向~管~理.xls matches
金牌销售2天一夜实战训练.xls matches
--------------------
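To confirm what the environment setting does to the standard streams, a sketch like the following runs a child Python with and without it and reports the child's stdout encoding. The child_stdout_encoding name is hypothetical, and subprocess.run assumes Python 3.5 or later:

```python
import os
import subprocess
import sys

def child_stdout_encoding(setting=None):
    """Report sys.stdout.encoding in a child Python started with the setting."""
    env = dict(os.environ)
    env.pop('PYTHONIOENCODING', None)       # start from an unset variable
    if setting is not None:
        env['PYTHONIOENCODING'] = setting
    result = subprocess.run(
        [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
        env=env, stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout.strip()

print(child_stdout_encoding())          # platform-dependent default
print(child_stdout_encoding('utf-8'))   # utf-8
```

Because the child's output goes to a pipe here, this mirrors the redirected-to-file case in the session above.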
Alternatively, it might be a bit more robust and convenient in some scenarios to catch the exception in the script itself and print as raw bytes, instead of printing Unicode text that must be encodable per the print setting. A modified version of this script that does this and works without environment setting changes is available here: diffall-SAFE.py. Here are the parts added and modified in diffall.py:
def tryprint(*args): """ Added Oct-27-11, post publication and post examples 1.3: Don't fail with an exception for unprintable filenames; See pages 279-282, and the similar tryprint on page 276; Started failing for non-ASCII filenames in email test dirs; In general, any filename printers in 3.X may require this, unless PYTHONIOENCODING is set as needed (e.g., to utf-8); """ try: print(*args) # filenames might fail to encode except UnicodeEncodeError: print('--UNPRINTABLE FILE NAME--', *(arg.encode() for arg in args)) def comparetrees(dir1, dir2, diffs, verbose=False): ... if (not bytes1) and (not bytes2): if verbose: tryprint(name, 'matches') ... if bytes1 != bytes2: diffs.append('files differ at %s - %s' % (path1, path2)) tryprint(name, 'DIFFERS') ... for name in missed: diffs.append('files missed at %s - %s: %s' % (dir1, dir2, name)) tryprint(name, 'DIFFERS') if __name__ == '__main__': ... for diff in diffs: tryprint('-', diff)
This changed script still makes some assumptions (it must be able to encode the filename per the platform default, which is UTF-8 on Windows) and may need further honing on some platforms. When run, though, it avoids print failures explicitly; here's what it does with non-ASCII filenames in the printed output in the temp file:
-------------------- Comparing e:Books\4E\PP4E\examples-official\1.3\unpacked\PP4E-Examples-1.3\changes\detailed-diffs\1.3\patched-files-13\i18n-filenames-tests to [...] Directory lists are identical Comparing contents Mail-saved-after-sent--OpenMeInGUI.txt matches --UNPRINTABLE FILE NAME-- b'\xd0\x9f\xd0\xbe\xd0\xb2\xd0\xbe\xd1\x80\xd0\xbe\xd1\x82 IMG_1412.txt' b'matches' --UNPRINTABLE FILE NAME-- b'\xe4\xbb\x8e~\xe6\x8a\x80~\xe6\x9c\xaf~\xe8\xb5\xb0~\xe5\x90\x91~\xe7\xae\xa1~\xe7\x90\x86.xls' b'matches' --UNPRINTABLE FILE NAME-- b'\xe9\x87\x91\xe7\x89\x8c\xe9\x94\x80\xe5\x94\xae2\xe5\xa4\xa9\xe4\xb8\x80\xe5\xa4\x9c\xe5\xae\x9e[...]\x83.xls' b'matches' --------------------
This works, but in many cases it might be simpler and will require much less code to set your PYTHONIOENCODING as needed, rather than trying to safeguard all your filename prints with try statements on the off chance that they may someday fail. Even in the diffall.py case, other modules that this script uses could potentially fail on filename prints too, and may require additional error trapping code unless we use the simpler and broader PYTHONIOENCODING scheme.
I'm not going to mark this for changing in reprints or new example releases, partly because this is a general issue which is already well documented in the book; partly because this is not the only book example that takes a loose approach to printing filenames; and mostly because such scripts will work without changes if you set your PYTHONIOENCODING as needed. In other words, one can make a very strong case that this is more an operational issue than a program bug, and so does not merit a code change. As usual, if your scripts fail on filename prints in Python 3.X, fix as prescribed.
This book-related program recursively walks a web site's links to locate unused files, by parsing the HTML files accessible from one or more root pages, and noting the files of any type they reference. Its goal is to help locate unused files that may be removed. There is a new version 1.1 of this script available in this zipfile: cleansite11.zip. This newer version of the script, cleansite.py, handles parsing of non-ASCII Unicode HTML files better; catches unused local files that have the same name as a remote site's version named in a link; and is a bit more coherent on parameter selection.
The zipfile also has new example run logs, and updated versions of the book's downloadflat_modular.py and uploadflat_modular.py FTP utility examples, updated to skip local and remote directories (now present in my sites). Typical usage: run the download script to copy the site's toplevel; run cleansite to move likely-unused files to a subdirectory; manually inspect and adjust as needed (e.g., restore favicon.ico); and run the upload script to clear the server site and upload the used files.
These scripts were used to purge old, unreferenced files from my websites—some now nearly 2 decades old—by adding pages to ignore until old material was discarded as desired. For the target site, cleansite caught 115 unused files among 317 total, and shaved 3G off its 10G total size. You may need to augment this with multifile searches (see the Grep tool in the book's PyEdit GUI), and this is still a bit user-specific, but serves its purpose for my use cases.
As a sort of reward for stumbling onto this page, I've uploaded an extra example script here which would have appeared with the book's HTML parsing coverage in Chapter 19, had this book project enjoyed unlimited time and space. This script uses Python's HTML and URL parsing tools to try to isolate all the unused files in a web site's directory.
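As a rough illustration of the idea only (this is not the code of cleansite.py), the following sketch uses the standard library's html.parser and urllib.parse modules to collect the local files that a set of pages reference, and reports the rest as candidates for removal. The names LinkCollector, local_references, and unused_files are hypothetical, and the sketch assumes that only href and src attributes matter and that relative links name files directly.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote

class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes in one HTML page."""
    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ('href', 'src') and value:
                self.links.add(value)

def local_references(html_text):
    """Return paths named by relative links (no scheme, no host)."""
    collector = LinkCollector()
    collector.feed(html_text)
    local = set()
    for link in collector.links:
        parts = urlparse(link)
        if not parts.scheme and not parts.netloc and parts.path:
            local.add(unquote(parts.path))      # drop #fragment and ?query
    return local

def unused_files(all_files, html_pages):
    """Return the files that no page references, in sorted order."""
    used = set()
    for text in html_pages:
        used |= local_references(text)
    return sorted(set(all_files) - used)
```

Note that a link to a remote site's c.html does not mark a local c.html as used in this sketch, because links with a scheme or host are skipped.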
I use this script for my training web site, as well as my book support site (the latter after fixing some HTML errors that rendered the script inaccurate when Python's strict HTML parser failed and caused some used files to be missed). This script also includes code to delete the unused files from a remote site by FTP if you wish to enable it (pending a resolution on the parser failures issue), and includes suggestions for parsing with pattern matching instead of the HTML parser.
Download: The script itself lives here: cleansite.py. To see what it does, read its docstring, and see two sample runs provided in a zip file here: testruns.zip. See also the newer version of this script in the update note above.
In light of security constraints in a recent class, I've completely rewritten the PyLotto script for worst-case scenarios. It can now select from emails as well as from a names file created manually or via web form submits, and can be run in both console and remote CGI modes. In pathological cases, it can be run locally to select from a local names file, a need underscored by a recent class somewhere in the wilds of California. I've also ported the single script to work on both Python 3.X and 2.X, and updated it to properly escape student names in the reply HTML. Here's the new code; use the "view source" option to view the form's HTML:
The rest of this section describes the original, now defunct version, but also gives the back story. The new version has the same goals, but supports web form and local file sign-up modes in addition to emails.
Here's another supplemental example that might have appeared in the book, if not for time and its conceptual dependencies. I wrote this script, PyLotto, in order to give away free books in some of the classes I teach. O'Reilly always sends a batch of free copies to authors, and if I kept a dozen copies of every one of the 12 Python books I've written, some of which are not exactly small, I'd probably need a bigger house.
To enter the book lottery, students send an email message to the book's account, with "PYLOTTO" in the subject line. At the end of the class, the script scans, parses, and deletes these emails, and selects a set of their "From" addresses at random as winners. It's not Vegas, but it's fair, and serves as a nice example of practical Python programming that ties together a number of tools presented in the book. This script also has a test mode that sends test emails, and an as-CGI mode for running on a Web server if the training site doesn't allow POP email access or SSH (many don't).
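The selection step alone might be sketched as follows. This is not PyLotto's code: the function name pick_winners and its arguments are hypothetical, and the sketch assumes the "From" header strings have already been fetched and that each address may win only once.

```python
import random
from email.utils import parseaddr

def pick_winners(from_headers, count, rng=None):
    """Pick up to count distinct addresses at random from From headers."""
    rng = rng or random.Random()
    addresses = {parseaddr(header)[1].lower() for header in from_headers}
    addresses.discard('')                       # unparseable or empty headers
    entrants = sorted(addresses)                # one entry per address
    return rng.sample(entrants, min(count, len(entrants)))
```

Passing a seeded random.Random instance makes a run repeatable for testing; omitting it gives a fresh draw each time.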
Download: Fetch the script here: pylotto-orig.py. To see what it does, read its docstring, and see the text file that traces its outputs in its various modes here: pylotto-orig-run.txt.
New: See also pylotto-orig-24.py, a version of pylotto.py modified to run remotely as a CGI script on a Python 2.4 web server (that's the latest Python available on godaddy.com, as of January 2011!).
Naturally, I don't give away books in every class I teach (this basically depends on how many freebies O'Reilly has given to me, and how willing I am to lug around a big, giant book in my checked luggage). Even so, scripts such as this one and others in the book which address real, practical needs can go far to help illustrate Python applications in action once students or readers have mastered Python language fundamentals. As stated in the book, Python tends to become a sort of enabling technology for most people who've learned to use it well.
I've written a new and very different version of this script to be used to isolate differences in iTunes collections on different laptops or archives, rather than just flatten directories and detect protected files. Grab the new 2.X/3.X version here: flatten-itunes-2.py.
I wrote this version to help resolve differences between iTunes collections on multiple machines that have fallen out of sync over time. It's easy to get in this state if you buy songs on whatever laptop you have at the moment and don't synchronize religiously (cloud storage addresses this in theory, though not without potential downsides of its own). Because iTunes may accumulate different files and directory structures on different machines, it's nearly impossible to synchronize unless you normalize its files into a simpler, uniform structure.
This new version addresses this by sorting all files in the iTunes directory tree into 4 flat directories (playable, protected, irrelevant, and other), for each collection it's run against. This makes it simpler both to run later comparisons to spot differences (e.g., using the book's dirdiff.py or more in-depth diffall.py scripts of Chapter 6), and to copy merged collections from device to device. This version also retains all files in the tree; renames duplicates with a numeric suffix; produces a richer report; and was updated to run on both Python 2.X and 3.X. It's safe to run against an iTunes location just to experiment, because it only copies files from there, and does not modify the iTunes tree itself in any way.
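The general approach can be sketched like this. It is a simplified illustration rather than the code of flatten-itunes-2.py: the extension sets, the "--N" suffix format, and the function names are all assumptions made for the example.

```python
import os
import shutil

# Assumed extension sets, for illustration only
PLAYABLE = {'.mp3', '.m4a'}
PROTECTED = {'.m4p'}
IRRELEVANT = {'.ini', '.db'}

def categorize(filename):
    """Map a filename to one of the four flat result directories."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in PLAYABLE:
        return 'playable'
    if ext in PROTECTED:
        return 'protected'
    if ext in IRRELEVANT:
        return 'irrelevant'
    return 'other'

def unique_name(dest_dir, filename):
    """Add a numeric suffix if filename is already present in dest_dir."""
    base, ext = os.path.splitext(filename)
    candidate, number = filename, 1
    while os.path.exists(os.path.join(dest_dir, candidate)):
        number += 1
        candidate = '%s--%d%s' % (base, number, ext)
    return candidate

def flatten(tree_root, dest_root):
    """Copy every file under tree_root into flat per-category directories."""
    counts = {}
    for dirpath, dirnames, filenames in os.walk(tree_root):
        for name in filenames:
            category = categorize(name)
            dest_dir = os.path.join(dest_root, category)
            os.makedirs(dest_dir, exist_ok=True)
            target = os.path.join(dest_dir, unique_name(dest_dir, name))
            shutil.copy2(os.path.join(dirpath, name), target)   # copy only
            counts[category] = counts.get(category, 0) + 1
    return counts
```

The sketch assumes dest_root lies outside tree_root, so that copied files are not walked again.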
After collecting files into flat directories with this script, I later merged the playable files directories with manual drag-and-drop operations, and also ran a simpler script, renamer.py, on the result to strip the leading track/disk numbers at the front of some filenames, making it easier to compare and sort, and isolate more duplicates (e.g., "02 xxx.mp3" and "xxx.mp3"). The end result is a single, flat directory of song files to use on multiple devices.
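The renaming step might look something like the following. This is a guess at the idea, not the code of renamer.py: the pattern assumes prefixes such as "02 " or "1-02 " followed by whitespace, and the function name is hypothetical.

```python
import re

# Leading track number, optionally preceded by a disk number and a dash
TRACK_PREFIX = re.compile(r'^\d+(?:-\d+)?\s+')

def strip_track_prefix(filename):
    """Return filename without a leading track/disk number, if any."""
    return TRACK_PREFIX.sub('', filename)
```

Because the pattern requires whitespace after the digits, a name such as "2001.mp3" is left alone, though a title that really begins with a number and a space would be shortened too.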
Note: this is only an iTunes utility, for analyzing your collections' content or moving your music files. It's not a player or iTunes replacement, and although its result directories retain all iTunes music files, they may not retain all iTunes information. For example, due to the way the script flattens music directory trees, its result directories may lose some associations between music files and their album artwork images, at least for images not embedded in music files themselves (e.g., via ID3v2 frames for MP3 files). Writing a full replacement for iTunes in Python (PyTunes?) would be an interesting project, and others have already made progress along these lines—check out:
Here's something similarly practical, but a bit simpler than the prior two sections' programs: a Python script which walks all the folders and subfolders in an iTunes directory tree, to copy all music files in the tree to a flat directory. I use this to create a single directory of all my music on a memory stick, so it can be used conveniently on the hard drive in a vehicle I drive. iTunes seems fond of nested directories, and the vehicle in question doesn't do well in their presence. This example might have appeared in the larger systems examples chapter, if not for time, space, and the fact that it's not too much different from tree-walkers already in that chapter.
Download: Fetch the script here: flatten-itunes.py. To see what it does, read its code and docstring, and see the text file that traces its outputs here: flatten-itunes.out.txt.
For another directory-walker media tool, see also tagpix.py, which extracts EXIF metadata tags from photos with PIL, and is referenced elsewhere on this page.
As described in its Preface, this book was written under Python 3.1, and its major examples were retested and verified to work under Python 3.2 alpha just before publication. Because of that, this book is technically based on both 3.1 and 3.2, though it addresses the entire 3.X line in general.
That said, you will find some discussion of 3.1 library issues in the book that have changed or improved in the upcoming 3.2 version, which is due to be released roughly two months after this book's release date (3.2 final is currently scheduled for mid-February 2011). Some of the issues in 3.1's email package which the book must work around, for instance, have been improved or repaired in 3.2.
In fact, many or most of the issues of the 3.1 email package described in Chapter 13 are fixed in 3.2. The email workarounds coded in that chapter still work under 3.2 (and were verified and even enhanced to do so before publication), but some are no longer required with 3.2. Notably, the email package in 3.2 now supports parsing the raw bytes returned by the POP module (poplib), thereby eliminating the need for the partially heuristic and potentially error prone pre-parse decoding to str that the book's 3.1-based examples must perform. The next section explains how this works in 3.2.
As a prominent example of email's improvements, 3.2's What's New document states that the 3.2 email package's "New functions message_from_bytes() and message_from_binary_file(), and new classes BytesFeedParser and BytesParser allow binary message data to be parsed into model objects". Interestingly, the 3.2 email parser still does not parse bytes internally. Instead, these extensions work their magic by decoding raw binary bytes data to Unicode str text prior to parsing, using the ASCII encoding and passing "surrogateescape" for the decoding call's errors flag.
In short, the surrogateescape error replacement scheme translates undecodable bytes to Unicode codepoint escape sequences, which allow the bytes' original values to be recovered when the text is encoded back again to bytes by compatible software. When parsed message parts are later fetched through the Message API, re-encoding back to binary form with the same errors replacement scheme is expected to restore the original data. At least potentially, this arguably clever trick could resolve the initial decode-to-str issue for parsing email messages in Chapter 13.
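The round trip just described can be seen in a few lines. This is a minimal sketch run against a made-up message: the byte string and headers are invented for the example, while message_from_bytes and the surrogateescape error handler are the standard tools named above.

```python
import email

# 1) The error handler alone: undecodable bytes become lone surrogates
raw_bytes = b'caf\xe9'                                    # 0xE9 is not ASCII
text = raw_bytes.decode('ascii', errors='surrogateescape')
restored = text.encode('ascii', errors='surrogateescape')   # original bytes

# 2) The same idea through the email package's bytes interface
raw_message = (b'From: someone@example.org\r\n'
               b'Subject: test\r\n'
               b'Content-Type: text/plain; charset="latin-1"\r\n'
               b'Content-Transfer-Encoding: 8bit\r\n'
               b'\r\n'
               b'caf\xe9\r\n')
message = email.message_from_bytes(raw_message)
body = message.get_payload(decode=True)     # bytes, with the 0xE9 restored
```

Here text holds the escape 'caf\udce9', and encoding it with the same handler yields the original bytes again.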
On the downside, because this scheme assumes that message data is both decoded to Unicode text and re-encoded to bytes later using the surrogateescape error handler for both steps, this trick works for data passed through Python's APIs which follow this translation protocol, but can fail for data which is not. Moreover, this scheme also assumes that any data mangled by the surrogates replacement step is not significant to the parser's analysis, as it might not match expected characters in the stream—a non-issue for binary data or encoded text parts, but potentially significant for some forms of full-message raw text (though non-ASCII bytes are unlikely to mean much to a message parser in any form).
Also note that while this change may be a first step towards addressing the related CGI uploads issue described in Chapters 15 and 16, this issue still exists in Python 3.2. As described in the book, CGI uploads are somewhat broken in 3.X today because Python's CGI module uses the email parser, but its uploaded data can be arbitrary combinations of both binary data and text of a variety of Unicode encodings, with or without MIME encodings and content type headers. Such data cannot be decoded to str in 3.1 as required by its email parser. Unfortunately, the CGI module in Python 3.2 still uses the str-based email parsing API, not the new bytes-based API, so this CGI uploads limitation appears to still be present in 3.2. I verified that this is the case in 3.2 final: cgi does not use the email's new bytes parser interface, but still performs a pre-parse decoding from bytes to str per UTF-8, which may fail for some data streams. A resolution to this appears to await a future Python.
For email, though, 3.2's library fixes represent a significant improvement over 3.1: the decode-to-str preparse issue for email, as well as other Chapter 13 email package workarounds, may have been rendered superfluous in 3.2. On the other hand, the book's 3.1 workarounds code is harmless under 3.2, and is representative of the sorts of dilemmas faced by real-world development in general—a major theme of this book. Unless you're lucky enough to use the same version of software for all time, change is probably an inevitable part of your job description.
For more on the Python 3.2 release, including its new __pycache__ subdirectory bytecode storage model, please see its note on this site in the Learning Python 4E updates page (a book less impacted by 3.2, since 3.2 was supposed to change only libraries, not core language—and nearly succeeded).
Not covered in that note is the very late 3.2 addition of its concurrent.futures library. This library, based upon a Java package, provides yet another way to generalize the notion of multitasking with threads and processes, in addition to the existing subprocess and multiprocessing modules which are covered in this book. This new library is also a bit of a work in progress, intended for future expansion. For more details, please see 3.2 release details and manuals.
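For a quick taste of the library's style, here is a minimal sketch using its thread pool; the square function and the pool size are arbitrary choices for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def square(number):
    return number * number

# Run the calls in a small pool of threads; map yields results in input order
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(square, range(5)))
```

Swapping in ProcessPoolExecutor runs the same calls in processes instead of threads, though on some platforms that requires the usual "if __name__ == '__main__'" guard around the pool code.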
While you're at the Learning Python site, see also its preview of mid-2012's expected Python 3.3.
As an afterthought, note that the 3.2 email changes bear specifically on the Chapter 13 discussion of Unicode and email that starts on page 926. Most notably, the preparse decode from raw bytes to Unicode str needed in 3.1 is no longer required (but is harmless) in 3.2, because the email package can now parse bytes data directly, using the errors replacement scheme described above. I'd mark this as an insert for future reprints, but the book can't possibly track all future Python changes (especially with a new and possibly incompatible email package under development); instead, this web page is meant to serve as a virtual and more easily updated appendix to the book.
(Update Oct-19-11: I eventually fixed this one after all: see the release description above for more on version 1.3 of the examples package in which this fix appears. This fix was too large to add to the book itself.)
As originally coded, message sent time is not displayed very usefully in PyMailGUI's list windows. The GUI blindly displays the full, raw text of the Date header field. Worse, the time portion of this header is truncated by the display such that the "+NNNN" field which denotes what the sent time really is relative to GMT is not shown. The net effect is that you can't tell in the GUI when a message was sent or which messages were sent before others without looking at the raw text and deciphering the Date time string manually. The GUI lists emails in order received at the mail server only.
To fix: the time in the Date field of list windows should be shown in full, and be shown relative to either the local time zone or GMT uniformly for all emails received. See the Python email package for pointers and possible tools; this doesn't seem crucial enough to detail the code fix here. As a hint, though, formatting dates for use in new mails can be either relative to GMT or the local time zone:
>>> from email.utils import formatdate           # in Python 3.2
>>> formatdate()
'Thu, 29 Sep 2011 17:27:53 -0000'                # relative to gmt
>>> formatdate(localtime=True)
'Thu, 29 Sep 2011 13:27:55 -0400'                # relative to local (us eastern, -4 hours)
>>> formatdate(usegmt=True)
'Thu, 29 Sep 2011 17:27:59 GMT'                  # explicit gmt relative for http
Applying the corresponding technique for adjusting a received date/time string to the local time zone in PyMailGUI's code is officially left as a suggested exercise, but the following might help. To convert a GMT-based date/time string to a date/time string in the local US Eastern time zone, try this (it's 5:45 PM GMT and 1:45 PM locally):
>>> from email.utils import formatdate
>>> from email.utils import parsedate_tz, mktime_tz      # public names for the email._parseaddr tools
>>> now = formatdate()                                   # gmt-based => local (eastern)
>>> now
'Thu, 29 Sep 2011 17:45:26 -0000'
>>> parsedate_tz(now)                                    # time string => time tuple
(2011, 9, 29, 17, 45, 26, 0, 1, -1, 0)
>>> mktime_tz(parsedate_tz(now))                         # time tuple => to utc timestamp
1317318326.0
>>> formatdate(mktime_tz(parsedate_tz(now)))             # utc timestamp => time string
'Thu, 29 Sep 2011 17:45:26 -0000'
>>> formatdate(mktime_tz(parsedate_tz(now)), localtime=True)
'Thu, 29 Sep 2011 13:45:26 -0400'
Using this scheme to convert from local to GMT, or local to local could proceed as follows:
>>> here = formatdate(localtime=True)                    # eastern => gmt or local (eastern)
>>> here
'Thu, 29 Sep 2011 13:45:47 -0400'
>>> formatdate(mktime_tz(parsedate_tz(here)))
'Thu, 29 Sep 2011 17:45:47 -0000'
>>> formatdate(mktime_tz(parsedate_tz(here)), localtime=True)
'Thu, 29 Sep 2011 13:45:47 -0400'
And finally, converting a date/time string from the US Pacific time zone to either GMT or the local US Eastern time zone might be done this way (it's now 10:46 AM Pacific, 5:46 PM GMT, and 1:46 PM Eastern/local):
>>> there = 'Thu, 29 Sep 2011 10:46:32 -0700'            # pacific => gmt or local (eastern)
>>> formatdate(mktime_tz(parsedate_tz(there)))
'Thu, 29 Sep 2011 17:46:32 -0000'
>>> formatdate(mktime_tz(parsedate_tz(there)), localtime=True)
'Thu, 29 Sep 2011 13:46:32 -0400'
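The conversions in these sessions could be wrapped in a small helper for use in a list window's display code. The function name normalize_date is hypothetical, and returning unparseable text unchanged is an assumption about what a GUI would want; this is not PyMailGUI's code.

```python
from email.utils import formatdate, parsedate_tz, mktime_tz

def normalize_date(header_text, localtime=False):
    """
    Re-express a Date header relative to GMT (the default) or to the
    local time zone; text that cannot be parsed is returned unchanged.
    """
    parsed = parsedate_tz(header_text)          # None if not a date string
    if parsed is None:
        return header_text
    return formatdate(mktime_tz(parsed), localtime=localtime)
```

Calling normalize_date(text, localtime=True) on every fetched Date header would show all messages relative to the viewer's own zone; the result of that call depends on the machine's time zone setting.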
(Update Oct-19-11: I wound up patching (and fixing) this in version 1.3 of the examples package, to prohibit delete overlaps via the code changes described below; see the release description above. There appeared to also be a similar potential for timing issues for Saves due to their file selection dialog, which was also patched to check for blocking state before the dialog instead of after, though this seemed much less likely or harmful. These fixes were too large to add to the book itself.)
The short story: though unlikely, it's possible that deletes may delete the wrong message if you request a new delete while one is already running. Don't do this, or read ahead to see how to fix this potential issue in code.
Here's the longer story. There is an obscure timing issue related to delete operations. As coded in the book, it's not impossible that a deletion thread's exit action may be allowed to run and clear the delete-in-progress flag, before a new delete request has a chance to check this flag. This can occur apparently because the new delete's confirmation dialog popup releases control to the GUI event stream, which can then run a prior delete's exit action from an after() timer callback event. Unfortunately, the new delete issues the confirmation dialog before checking the delete-in-progress flag, and after fetching message numbers to be deleted from the GUI.
The net result is that a new delete might overlap with one in progress, and incorrectly delete the wrong POP message numbers made invalid in the GUI by the prior delete—a scenario the book and code both explicitly state must be avoided. Note that the system does go to great lengths to compare mail headers so as to ensure that each message being deleted in the GUI matches the message being deleted on the server once deletions begin, in case the server's inbox changes before a selected message is deleted (see method deleteMessagesSafely in mailFetcher.py). That doesn't help in this scenario, though, because the GUI client's mail list has been updated after selected message numbers were fetched, such that the prior selections no longer match the GUI or the server.
This behavior is timing dependent, rare, and can occur only if you issue a new delete request while one is already in progress, and then only if you're unlucky enough to have the prior request's exit action run exactly after the time you press Delete for the new request and before you're able to click "OK" in the delete confirmation popup. However, this is also a classic and even illustrative timing bug; it reflects both the lack of broader testing for a book's examples, and a misconception of the confirmation dialog's modality: the program assumed this dialog was truly modal (blocking), such that all the code from the start of a new delete callback handler through its delete-in-progress test ran atomically. This must not be the case, as I've seen a few incorrect mails deleted when running many deletes in parallel.
To avoid this potential entirely, don't run overlapping delete requests. To fix the code to avoid it in all cases, run the delete-in-progress test immediately in the delete callback handler, and before the confirmation dialog is issued. Because the delete-in-progress test logic differs between the server and file list windows, this can be done by either moving the confirmation dialog call into each subclass's code, or adding a subclass-specific okay-to-delete method called from the superclass delete callback code. For more details, see onDeleteMail in the ListWindow class on Page 1074, as well as its two subclasses' doDelete methods on Pages 1079 and 1083. I may patch this in the examples package eventually, and in a book reprint if possible; for now, consider it a maintenance exercise, as well as a lesson on both the need for rigorous testing and the complexity of code that may overlap in time.
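The ordering change can be shown in miniature with a GUI-free stand-in. Everything here is hypothetical (the class, its method names, and the ask_ok callable that plays the role of the confirmation dialog); it only illustrates testing the flag before anything that might yield to the event loop, and is not the book's ListWindow code.

```python
class ListWindowSketch:
    """Hypothetical stand-in for a list window's delete handling."""

    def __init__(self, ask_ok):
        self.ask_ok = ask_ok            # callable standing in for the dialog
        self.delete_busy = False        # the delete-in-progress flag
        self.started = []               # selections handed to delete threads

    def on_delete(self, selection):
        # Test the flag first: the dialog may let other callbacks run, so
        # nothing that depends on the flag should follow the dialog call
        if self.delete_busy:
            return 'busy'
        if not self.ask_ok():
            return 'cancelled'
        self.delete_busy = True
        self.started.append(list(selection))
        return 'started'

    def on_delete_exit(self):
        # Exit action of a finished delete (a timer callback in a real GUI)
        self.delete_busy = False
```

With this ordering, a second request made while a delete is running is refused before the dialog ever appears, so a prior delete's exit action cannot slip in between the dialog and the test.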
(Update Oct-19-11: this was patched in release 1.3 of the examples package: see the release description which includes screenshots of the fix, and the formal patch description below. This fix was small enough that it will appear in the book itself in a future reprint. Note that because the fix was applied in common mailtools package code, it is inherited by PyMailGUI, and also by the less functional PyMailCGI webmail example in Chapter 16, though for the latter you may need to set your browser's encoding to UTF-8 to view the non-ASCII filenames embedded in the HTML reply stream.)
After presenting the PyMailGUI client, Chapter 14 discusses a variety of suggested improvements to this system. Among them, its Unicode enhancements section on Page 1123 mentions that the filenames of attached parts might also be in i18n encoded form in some rare cases, and require the same MIME and Unicode decoding steps that are already applied to other primary email headers such as Subject, From, and To. In PyMailGUI the latter of these are properly decoded for display and encoded for sends, but attachment filenames are currently not.
Encoded attachment filenames weren't present in the test cases used to develop this version of PyMailGUI, so they received only a brief mention in the improvements list. Moreover, some minor Unicode issues were intentionally given limited attention, partly because this book's size and time constraints limit its scope, but also because this edition uses the Python 3.1 email package which has well-known Unicode issues and limitations described in the text.
Lately, however, I've noticed that encoded filenames are becoming more common. This doesn't seem like a great feature of email in general—Unicode filenames that are encoded in a way not supported by the receiving platform's filesystem won't work and would have to be renamed automatically (e.g., sending a Russian or Chinese filename may fail when saved on an ASCII-only filesystem). Because of the increasing prevalence of such emails, though, I want to elaborate on ways to address them here.
As is, the only GUI-based way to handle parts with encoded filenames in the client is to save the full enclosing email in a mail save file (list window: select, and Save); edit the mail's text in its save file to rename the attachment file name in its mail header line; and then reopen the mail from its save file (list window: Open save file, select in popup file list window, and View). This works, but obviously isn't a very user-friendly procedure.
To do better, it might be simple to augment the partName method in Chapter 13's MailParser class to route the raw filename fetched from headers, to the decodeHeader header text decoding method already present in this class. This would apply the required email, MIME, and Unicode decodings to such filenames, and yield a decoded Unicode filename string. Since the result might use a character set that doesn't work on the underlying platform's filesystem, though, this code would also need to try to encode it per the local platform's filesystem encoding type, and come up with a different name if the encode fails; it could follow the same naming pattern used for attachments that don't have a filename or name header present ("partNNN.xxx").
For the adventurous, the MailParser class, used by PyMailGUI and other clients, appears on Page 976; its partName method on Page 978; and the required decodeHeader method on Page 980. The platform's filesystem encoding is described elsewhere in the book. It's too late to add this enhancement in the book, of course, but changing this would make a nice exercise in code maintenance (or see the next section, unless you want to try this on your own!).
To help you get started, it appears that the first part of the coding fix, decoding i18n filenames (but not also testing for their correctness on the receiving platform), is simply a matter of changing the very last line of the partName method in the MailParser class from the first of the following to the second:
return (filename, contype)
return (self.decodeHeader(filename), contype)       # aug 2011: decode fname
At least per minimal testing so far, this does the trick—Chinese and Russian encoded attachment filenames are properly decoded, just like other encoded text in primary email headers. These decoded filenames also happen to work unchanged on the Windows operating system (per UTF-8) in their decoded forms, though they may not work on some platforms, and their original i18n undecoded string forms always fail as filenames on Windows too and generate error popups in the GUI.
For instance, a Russian email's jpeg image attachment whose i18n filename was given in its part headers with a non-ASCII character set ("KOI8-R"), base64 translation ("B"), and the encoded text ("8M/..."):
Content-Type: image/jpeg; name="=?KOI8-R?B?8M/Xz9LP1CBJTUdfMTQxMi5KUEc=?="
Content-Disposition: attachment; filename="=?KOI8-R?B?8M/Xz9LP1CBJTUdfMTQxMi5KUA==?= =?KOI8-R?B?Rw==?="
Content-Transfer-Encoding: base64
with the patch is now correctly decoded to filename: Поворот IMG_1412.JPG. The original encoded filename text in the headers does not work as is. Similarly, a Chinese spam email attachment's headers using UTF-8, base64 encoded values:
Content-Type: application/vnd.ms-excel; name="=?utf-8?B?6YeR54mM6ZSA5ZSuMuWkqeS4gOWknOWunuaImOiuree7gy54bHM=?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="=?utf-8?B?6YeR54mM6ZSA5ZSuMuWkqeS4gOWknOWunuaImOiuree7gy54bHM=?="
now decodes to filename 金牌销售2天一夜实战训练.xls (which will look right in this web page if your browser understands UTF-8 text). With this patch, these decoded filenames now appear correctly in the GUI's part buttons and its Parts list pop-up, and when saved to files and opened in other applications. Non-encoded filenames pass through the decoder unchanged as expected, so this should not break any cases that worked previously.
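Outside the book's MailParser class, the same decoding can be tried with the standard library alone. The sketch below uses email.header.decode_header on the Russian name header shown earlier, and adds the still-missing filesystem check in simplified form. The function names, the errors='replace' policy, and the explicit encoding argument are assumptions for illustration; this is not the book's decodeHeader method.

```python
import sys
from email.header import decode_header

def decode_filename(raw_name):
    """Apply MIME and Unicode decoding to an attachment filename header."""
    pieces = []
    for text, charset in decode_header(raw_name):
        if isinstance(text, bytes):             # encoded-word content
            text = text.decode(charset or 'ascii', errors='replace')
        pieces.append(text)
    return ''.join(pieces)

def usable_filename(name, fallback, encoding=None):
    """Return name if it encodes per the filesystem encoding, else fallback."""
    encoding = encoding or sys.getfilesystemencoding()
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return fallback                         # e.g., a generated part name
    return name

decoded = decode_filename('=?KOI8-R?B?8M/Xz9LP1CBJTUdfMTQxMi5KUEc=?=')
```

Here decoded is the string 'Поворот IMG_1412.JPG', while usable_filename(decoded, 'part001.jpg', encoding='ascii') would fall back to the generated name.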
Because this is a simple change that doesn't alter line numbers, and because encoded filenames are becoming more common, I may mark this as a future reprints update, but be sure to patch your code copy this way if it does not include the enhancement and you start seeing encoded filenames in emails which you care to view.
Also keep in mind that the prior section's fix is partial: it still does not address filenames which won't work on the underlying platform's file system and may need to be renamed. Perhaps just as glaring, the fix handles decoding for display of fetched emails only; there is no special logic for encoding non-ASCII attachment filenames per i18n standards for sends—something which might have to be very similarly addressed near the end of the MailSender class's addAttachments method by passing filenames through that class's encodeHeader method, which modifies non-ASCII text only. As is, on sends non-ASCII attachment filenames cause smtplib to raise ASCII encoding exceptions for the full mail text in which the filename is embedded. Given that such encoding is not supported by Python's email package automatically, the change may be as simple as this—on Page 964:
basename = os.path.basename(filename)
basename = self.encodeHeader(os.path.basename(filename))     # aug 2011
This should suffice: filenames that encode as ASCII are left intact and others are encoded per the UTF-8 default, though it would ideally also apply the mailconfig module's headersEncodeTo setting if present. But I'll leave this in the to-be-tested column; I don't send such emails, and doing so seems a bit dubious in any event on the portability grounds mentioned above. In other words, there is still plenty of room for improvement. Enhance as desired.
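For experimenting with the send side in isolation, the following sketch shows one way to get the behavior described: ASCII names pass through untouched, and others are MIME-encoded with the standard library's email.header.Header class. The function name and the UTF-8 default are assumptions; this is not the book's encodeHeader method.

```python
from email.header import Header

def encode_filename_header(filename, charset='utf-8'):
    """MIME-encode a filename for a header only if it is not plain ASCII."""
    try:
        filename.encode('ascii')
    except UnicodeEncodeError:
        return Header(filename, charset).encode()   # '=?utf-8?...?=' form
    return filename                                 # ASCII: leave intact
```

Very long names may be folded across lines by Header.encode, so the result is best treated as header text rather than as a single token.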
Ideally, Python's email package would decode filename header contents automatically and provide an alternative call for the rare cases when the raw text is required, and encode filenames for sends automatically in non-ASCII cases. Unfortunately, a future version of the email package may either decode and encode filenames this way or not, so it's impossible to predict the optimal resolution to this in future Pythons (as stated in the book, this was a primary reason some email issues were not addressed as completely as they might have been). As usual for software dependent on external libraries, be sure to watch for changes on this front.
As stated in the book, you may need to change your SMTP mail server configurations to use the book's email clients to send email directly. This is especially true when you try to send mail by SMTP on public networks. Some of this is ISP-specific, but here are a few pointers.
This recently resurfaced for me because my email sends stopped working after my broadband provider changed their network. In short, they locked down the standard SMTP send port 25 in order to prevent spam, but this prevented direct sends using the server configurations I had been using. In the mailconfig.py file used by the PyMailGUI example, my SMTP server configurations were originally the following—a simple non-authenticating server at one of my third-party ISPs (Godaddy), which worked fine on my home network and some others:
    smtpuser       = None                          # per your ISP, None = no login
    smtppasswdfile = ''                            # if login, set to '' to be asked
    smtpservername = 'smtpout.secureserver.net'    # if port 25 open for SMTP on network
This no longer worked after the SMTP port lockdown. To send emails, I had to change my SMTP server details to use an authenticating SMTP server at another third-party ISP, running on a non-standard port number (this server runs on Earthlink, which is different from my broadband provider):
    smtpuser       = 'firstname.lastname@example.org'        # login, authenticate
    smtppasswdfile = ''                                      # ask for password in GUI
    smtpservername = 'smtpauth.hosting.earthlink.net:587'    # 3rd party ISP, port 587
When so configured, I have to log in to my email account at this ISP on the first email send in a session (PyMailGUI pops up a prompt for password input), but Python's smtplib module automatically parses off and converses over the custom SMTP port number included at the end of the server name string. This all just works in both PyMailGUI and smtplib, with no code changes required.
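To make the "host:port" convention concrete, the following sketch shows the kind of split that happens to such a server name string. This is not smtplib's actual code; split_host_port is a hypothetical helper written only to illustrate the idea, and its default of port 25 is an assumption matching the standard SMTP port mentioned above.

```python
def split_host_port(server, default_port=25):
    """Split 'host:port' into (host, port); fall back to default_port."""
    host, sep, port = server.rpartition(':')
    if sep and port.isdigit():
        return host, int(port)        # explicit port given after the colon
    return server, default_port       # no port suffix: use the default

with_port = split_host_port('smtpauth.hosting.earthlink.net:587')
no_port = split_host_port('smtpout.secureserver.net')
```

Because smtplib handles this itself, the book's clients never need such a helper; the port simply rides along in the mailconfig server name.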
With these authenticated SMTP settings, email sends seem to work on many more networks than before, and I avoid having to resort to webmail with all its annoying advertising. As an alternative, I could have routed email sends through the broadband provider's SMTP server directly, but this scheme can sometimes be tagged as spam, and may require a direct connection to the provider's network which may not always be possible:
    smtpuser       = None                 # per your ISP
    smtppasswdfile = ''                   # set to '' to be asked
    smtpservername = 'mail.mailmt.com'    # your isp or local network's server
See your ISP for more on your server settings, and the comments in mailconfig.py and the book for more on client configuration settings.
Postscript: I recently had to change servers yet again to use a broadband provider account when Earthlink's SMTP server started timing out—the following works in all contexts now for me, but your mileage will certainly vary (and possibly, very often!):
    smtpuser       = 'myloginname'             # nonblank=authenticated
    smtppasswdfile = ''                        # ''=ask in GUI once
    smtpservername = 'smtp.comcast.net:587'
(Update Oct-19-11: per the release description above, this was patched in release 1.2 of the examples package, as well as in reprints of the book itself. See the formal patch description ahead.)
I recently spotted something unusual in Chapter 14's PyMailGUI, in module ListWindows.py, method PyMailCommon.contViewFmt (after about a year, you get to be your own code reviewer!). This method's latter part opens an HTML-only email in a web browser after displaying its extracted plain text in a PyEdit frame, and has worked well for my email ever since it was coded. To be robust and explicit, though, it should probably run a tmp.close() after its tmp.write(asbytes), to ensure that the output file's buffers are flushed to disk before the browser opens it.
This isn't a bug and the code works as is, but apparently only because of fortunate timing: the browser started by the webbrowser module doesn't get around to opening the temporary file until well after this method exits, which deletes its local variables, thereby reclaiming the temporary file object and automatically closing and hence flushing it in the process. This works, but seems too implicit in retrospect, and may not be the best coding pattern to emulate in any event. In general, you should run an output file's close or flush methods (or use the with statement or unbuffered open modes) to flush output buffers to disk if you expect to be able to read the file in the same program.
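A minimal sketch of the more explicit pattern follows, using a with statement so the file is closed (and hence flushed) before anything else tries to read it. The function name save_for_browser and the temporary filename are made up for this example; the book's code uses its own names and a manual close.

```python
import os
import tempfile

def save_for_browser(asbytes, tempname):
    """Write already-encoded bytes and close the file before returning."""
    with open(tempname, 'wb') as tmp:      # closed and flushed on block exit
        tmp.write(asbytes)
    return 'file://' + tempname

tempname = os.path.join(tempfile.mkdtemp(), 'temp-demo.html')
url = save_for_browser(b'<html><body>Hello</body></html>', tempname)

# A real client would now call webbrowser.open_new(url); here we just
# read the file back to confirm its content is already on disk.
with open(tempname, 'rb') as check:
    saved = check.read()
```

With this structure, the browser's timing no longer matters: the data is on disk before the URL is ever handed off.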
See Chapter 4 for much more on output file closes (including rules of thumb which I failed to heed in this context), especially pages 137-141 and 145. The prior edition strung the open and write calls together: open(tempname, 'w').write(content), which, though still somewhat implicit, made the automatic close of the temporary file on collection more apparent and immediate, and not as dependent on timing.
After seeing this issue manifest itself for very small HTML-only emails on a much faster machine running a different operating system, I'm reclassifying this as a correction to be patched in reprints and next example package version (1.2); please see the patch below. When and where this problem occurs, an HTML-only email may open as a blank web page, because its temporary file has not been flushed to disk. Oddly, on the faster machine, the web browser somehow opens and reads the temporary file before the GUI method has a chance to close it on exit. Moreover, this only happens for very small emails (< 10K), suggesting a buffer size role. Adding an explicit close() ensures proper behavior for all platforms and emails.
This issue is platform- and timing-specific; impacts just one minor aspect of a very large program; and, even when it does, can always be worked around in the user interface in three different ways—by pressing the web browser's refresh/reload, by viewing the extracted plain-text in the GUI's main window, and by clicking the sole HTML's part button in the main GUI window to open it on demand. Still, it's annoying and simple enough to merit a patch.
For detail-minded readers, the faster machine with the incorrect behavior was a multicore Windows Vista machine; the book test machine where the code works as is was a single core Atom Windows 7 machine (a beefy netbook). Since the Python method in question returns immediately after this code, on Vista a web browser apparently starts and opens a local web page faster than a Python method function can exit. This may reflect differences in the implementations of start commands in the two systems. I'd be surprised if this occurred on other platforms which spawn processes to open browsers; on the other hand, unpredictable timing has a way of being unpredictable.
As described in the book, there are other issues related to the handling of HTML-only emails which I've left up to an interested reader to address. As is, display of such emails employs a somewhat temporary scheme in lieu of an HTML-enabled text viewer, which was tested and used by just a single user on a single platform. For example, its single temporary file model means only the last such email viewed is ever stored at any one point in time, regardless of how many are opened. Such oddness does not occur for HTML parts opened on demand, because their files are resaved and closed correctly in mailtools each time an open is requested. As for much of this system, improve as desired.
(Update Oct-19-11: per the release description, this was finally patched in release 1.2 of the examples package, as well as in reprints of the book itself. See the formal patch description ahead.)
The email-based code in Chapters 13 (and its clients in Chapters 14 and 16) should probably pass timeout arguments to the poplib and smtplib connection calls, to avoid waiting indefinitely. I've recently seen the book's PyMailGUI desktop client get stuck waiting for a poplib connection call that never returns, for example. This is rare, but without a timeout argument, your only recourse is to wait seemingly forever, or kill and restart the GUI. See Python's library manuals for more about passing timeout arguments in these modules.
I've added the timeout arguments as patches to be made in the next reprint and examples package version (1.2): see the patch below. This became a priority when I started seeing the email server at my ISP suddenly failing to respond to connect requests on a very regular basis; probably a temporary problem at the ISP, but killing and restarting the email client's process manually was much more painful than patching to use timeouts for server connections.
Please Note: If you wish to use PyMailGUI for real email work, you may need to increase its email transfer timeout values to accommodate slower Internet access speeds. As patched and shipped in the 1.2 examples package, both POP and SMTP timeouts are set to 20 seconds, which suffices in most cases. Especially for sending large emails over slow connections or slow servers, though, a higher SMTP timeout value such as 60 or more seconds may be required. See the patch ahead for the location of this value in the example code.
Discussion: This is a surprisingly subtle issue, not covered in the book. POP and SMTP timeout values were added after the book's publication to avoid transfer threads hanging indefinitely—a scenario which may require manually killing the email client's process in worst cases. For instance, failure to contact the server on mail Load requests can render the GUI largely inoperative (loads preclude most other operations, including Quit). This is despite the fact that the load is run in a thread; the GUI itself remains active, but cannot perform most server-based email processing in this state.
Passing timeouts when connected to email servers avoids this by triggering exceptions when server transactions hang, thereby terminating the server transfer thread and producing an error pop-up in the GUI. However, Python's library currently applies these timeouts to every server interaction step performed on the socket created for an email transfer: not just the initial connection calls, but also later data sends and receives. This makes the timeout setting sensitive to the speeds of email servers, the speed of your own Internet connection on the client, and message size in general. This is true for both fetches and sends (POP and SMTP), though it is more crucial for sends, which transfer a message's text all at once, than for fetches, which read messages line by line.
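The effect can be seen without a real mail server. In the sketch below, a local socket accepts TCP connections but never sends a POP greeting, standing in for an unresponsive server; the connect step succeeds, and it is the later greeting read that times out. The fetch_welcome function and the half-second timeout are assumptions chosen for this demo only; the poplib.POP3 call and its timeout argument are standard.

```python
import poplib
import socket

def fetch_welcome(host, port, timeout=20):
    """Return the POP server's greeting, or None if it fails or times out."""
    try:
        server = poplib.POP3(host, port, timeout=timeout)
    except OSError as exc:                 # includes socket timeouts
        print('POP connect failed:', exc)
        return None
    try:
        return server.getwelcome()
    finally:
        server.close()

# Stand-in for a hung server: listens, but never replies to clients
silent = socket.socket()
silent.bind(('127.0.0.1', 0))              # any free local port
silent.listen(1)
port = silent.getsockname()[1]

result = fetch_welcome('127.0.0.1', port, timeout=0.5)
silent.close()
```

Without the timeout argument, the same call would simply block, which is the hang described earlier.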
Unfortunately, this seems a bit of a Catch-22—larger timeout values allow larger mails to be transferred on slow connections, but also mean that email transfer threads will be hung longer when servers are truly inactive. Moreover, the timeouts can't generally be omitted altogether, or the transfer threads may hang interminably—as mentioned, in worst cases this may block other user operations in the GUI, and require the email client's process to be killed manually when email servers become unresponsive (this was the original motivation for the timeout patch). Really, Python's email library modules should probably support different timeout settings for different operations; one for all doesn't quite make sense—sending a large email requires very different timeout treatment than initial connection—but that's the API that exists today.
Although it's possible to implement more custom transfer timeouts manually instead of relying on the existing library module support (e.g., add top-level timer code around initial connect calls only), this would require too many post-publication code changes, and would not suffice if later operations hang. In principle, it's also possible to selectively enable and disable timeouts for the socket embedded in and used by the smtplib's object (e.g., disabling them with server.sock.settimeout(None) for sends), but this is less than ideal from a software perspective, as it makes smtplib module clients too dependent on the module's internals: the module's API is intended to encapsulate and hide the underlying socket object used. This would also fail to address servers which become unresponsive during sends.
For sends, timeouts also trigger what looks like a bug in Python 3.1's standard library—the mail send appears to fail after the mail's text has been fully sent and while trying to read the server's truncated reply to it, but only because the socket sendall() call 4 levels down from the book's code seems to simply stop sending data and truncate it when the timeout expires, without correctly raising an exception to signal this error (on a Windows Vista client, talking to my ISP's server, at least). This leads later to a confusing SMTPServerDisconnected exception with the error text "Connection unexpectedly closed" reported in the GUI and the console, even though the true cause was the earlier data send's timeout. (This is complex: I don't have time to explore it fully, or space to explain it here; for the full story, you'll have to trace through Python's source-code that raises this error, including its socket module's C code). Increasing the timeout allows sendall() to finish sending the data without truncation, and is required in some contexts in any event, but it is also effectively a workaround for this erroneous behavior in Python itself.
Summary: Because of all this, in one of my own contexts, I had to bump up the SMTP timeout to 60 seconds to allow for sending a large email on a slow network; otherwise, the send failed with an exception and an error pop-up. Change likewise if and as needed for your context. Ideally, the mail server timeouts would be configurable in the mailconfig module instead of in executable source code files, but they were added after publication when larger-scale changes in book code listings were no longer possible (examples in books are ultimately meant to be demonstrative, and don't enjoy, and may not even warrant, the level of update flexibility common to software at large). Also ideally, Python's email library modules' APIs would support different timeout settings for different operations as mentioned above, but this will have to await a core developer's attention. For better and worse, software dependencies make your software, well, dependent.
(Update Feb-2018: PyEdit has undergone major development in recent years, which addressed usability issues and limitations far in excess of those listed here. To get the standalone PyEdit release with all the new enhancements, visit its webpage. It's available as standalone executables, plus a source-code package that's suggested follow-up study (if far too large to include in a book). While you're browsing, see also the standalone release of PyMailGUI and other programs, which have similarly evolved since their book appearances.)
(Update Oct-19-11: per the release description above, one of the items here (focus loss) was patched in release 1.2 of the examples package, as well as in reprints of the book itself. See the formal patch description ahead.)
Chapter 11's PyEdit text editor is a large program with lots of functionality, which I use nearly every day in one context or another. Besides text edits, it's also a key component in the book's PyMailGUI email client. Still, like most non-trivial user interfaces, there are plenty of ways its interaction might be customized or improved, depending on its users' preferences. After using it recently with the more critical eye afforded by the passage of time, five items seem prime candidates for improvement; all are minor nits and not bugs, but would be simple to improve:
Update Feb-1-11: This is trivial to improve, so I'm posting this to be patched in book reprints and the next book example package release (1.2). Please see the patch details below.
Because PyEdit is a book example, written and provided in easily scriptable Python code, I'll leave applying these items as suggested exercise, along with anything else you might care to tweak. This is Python, after all.
A reader posted an errata report for this book on O'Reilly's site, which claimed that a db.close() call is required to avoid file corruption at the end of a Chapter 1 script that displays but does not update a shelve. This report was about Example 1-19, but pertains to many others in the book. Per the report, the shelve file triggers errors later, after it is updated by the next script and then displayed again. Here is the post's text:
Type: Minor technical mistake
Description: There are no page numbers when using the Kindle version of the text. In the discussion of building dictionaries using classes, Example 1-19 (dump_db_classes.py) requires closing the db at the end of the script. The final line of the code example should be:
    db.close()
If the db is not closed, the subsequent updating (Example 1-20) and then re-printing of the db (using dump_db_classes.py again) will fail, giving an error code:
    Traceback (most recent call last):
      File "C:\Python31\dump_db_classes.py", line 6, in
        print(key, '=>\n', db[key].name, db[key].pay)
      File "C:\Python31\lib\shelve.py", line 113, in __getitem__
        value = Unpickler(f).load()
    EOFError
I have not been able to reproduce this error, and suspect that the poster's Kindle cut-and-paste simply dropped the close() call that appears at the end of the update script in Example 1-20. In my testing, the scripts in question work without error and as shown, both in Python 3.1 (using the "dumb" DBM file interface default in 3.X), as well as in Python 2.7 (using the bundled "bsddb" file interface default in 2.X). Moreover, these examples worked fine under Python 2.5 for the prior edition of this book, and similar scripts have been used successfully in earlier editions dating back to 1995. In general, shelves should not require close() calls unless the shelve has been updated. By this rule, a close() is not required in Example 1-19 (though it wouldn't hurt if added); however, all shelve update scripts in the book do call close() before exiting as required.
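The create, update, and display sequence can be sketched in a single script as follows. The shelve path, key, and record layout here are invented for the demo (the book's examples store class instances in separate scripts); the point is only where close() calls fall.

```python
import os
import shelve
import tempfile

dbpath = os.path.join(tempfile.mkdtemp(), 'class-shelve-demo')

# Create: the with statement closes the shelve on exit
with shelve.open(dbpath) as db:
    db['bob'] = {'name': 'Bob Smith', 'pay': 30000}

# Update: fetch, change, store back, then close explicitly
db = shelve.open(dbpath)
rec = db['bob']
rec['pay'] += 5000
db['bob'] = rec
db.close()                                  # required after updates

# Display only: nothing is written, so close is optional but harmless
db = shelve.open(dbpath)
shown = {key: db[key]['pay'] for key in db}
db.close()
```

If the close after the update step is dropped, a later display may fail on some DBM interfaces, which is one plausible route to the error in the report.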
Because I can't reproduce this issue, I'm posting this as a clarification instead of a correction for now, barring a more detailed reader report. If you do see the same error, please email me with full context; the platform, Python version, and underlying DBM interface in use may factor into behavior too, and there are too many possible combinations for me to test exhaustively. In the poster's defense, shelves are notoriously error-prone; it's not impossible that file paths may have differed between scripts (the current working directory can sometimes change unexpectedly in IDLE), or that a bad cut-and-paste dropped the close() call in either the creation script of Example 1-18 or the update script of Example 1-20.
Update: The reader who posted this note was later unable to reproduce the error in question by repeating the steps which led to it, but believes it existed initially. That may close the case, though there are too many variables related to shelves to be certain.
O'Reilly (the book's publisher) recently asked me for written replies on a few questions related to this book's scope, audience, and goals. Since this might help both current and prospective readers understand the book in general, I've cut and pasted the replies on this page.
Update, Nov-2016: Mac OS X poses even more Tk portability issues; see mergeall 3.0's release notes.
Recent development on the frigcal calendar GUI has underscored a number of portability issues and other usage notes regarding Python's tkinter GUI library.
In sum, for programs run on Linux:
In addition to Linux use, the frigcal program unearthed a number of noteworthy updates for tkinter in general:
Python's tkinter GUI library module, used extensively in the book and its examples, works largely as expected, and well enough to power tools like email clients and calendars that I use on a daily basis. Recently, though, I noticed that on rare occasions the book's PyMailGUI email client GUI can appear to hang, and not update its display for email server transactions run in threads. This most often manifests itself as non-modal message transfer "Busy" dialogs that are never erased.
After much puzzling, it was eventually determined that this occurs only when the system time has been changed, for instance on daylight saving time adjustments or other automatic or manual changes. And it turns out that this reflects a known issue in the Tcl/Tk library underlying Python's tkinter, which underlies PyMailGUI. The issue may be known, but not widely so, and underscores the tradeoffs inherent in software stack reliance.
In short, the widget.after() method arranges for an event to occur after a fixed duration, by scheduling it to occur at an absolute time in the future, based upon the current system time. If the system time changes, the event may never fire—or, more accurately, it will be postponed until its absolute scheduled time is eventually reached with respect to the new system time. To quote a Tcl/Tk document:
Tcl depends on the system time (converted to seconds from the start of the Unix epoch) increasing fairly close to monotonically for the correct behaviour of a number of things, but most particularly anything in the after command. Internally, after computes the absolute time that an event should happen and only triggers things once that time is reached, so that things being triggered early (which can happen because of various OS events) don't cause problems. If you set the system time back a long way, Tcl will wait until the absolute time is reached anyway, which will look a lot like a hang.
In PyMailGUI, this looks like a hang, but it's not. Really, the thread queue checker's widget.after() is postponed indefinitely, such that finished and queued email server transactions are never noticed in the main GUI thread. The GUI itself remains active, and the email and thread logic works as planned; the widget.after() timer loop simply stalls due to the clock change, and no longer pops and dispatches callbacks that have been added to the thread queue. Hence, the email operations succeed, but the GUI isn't updated.
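The consumer side of such a thread queue can be sketched without any GUI at all. In the following, dispatch_pending and its per_event limit are hypothetical names for illustration, not the book's thread tools; in a real GUI, a checker function would call something like this and then reschedule itself with widget.after().

```python
import queue

def dispatch_pending(callback_queue, per_event=10):
    """Run up to per_event queued (callback, args) pairs; return the count."""
    ran = 0
    for _ in range(per_event):
        try:
            callback, args = callback_queue.get(block=False)
        except queue.Empty:
            break                           # nothing more to do this round
        callback(*args)                     # runs in the caller's thread
        ran += 1
    return ran

# Worker threads would put items like these on the queue when they finish
results = []
pending = queue.Queue()
pending.put((results.append, ('loaded',)))
pending.put((results.append, ('sent',)))

count = dispatch_pending(pending)           # normally triggered by after()
```

If the timer that triggers this dispatch never fires, finished work just sits in the queue, which matches the symptom described above.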
There is no known fix to this. If your PyMailGUI appears to not be processing email transactions, simply restore your system clock, or restart PyMailGUI. There is a good chance that a future Tcl/Tk release may address this by using a "monotonic" clock: one that has no reference point but can never go backwards, and so isn't susceptible to such change. In fact, Python now has such an interface—see time.monotonic(), new in 3.3 and always available as of 3.5. A Tcl/Tk fix on this front would be inherited by both tkinter and PyMailGUI, as well as all other GUIs with timer loops based on after().
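For comparison, here is a small sketch of deadline logic based on time.monotonic(), which is unaffected by system clock changes. The MonotonicDeadline class is an invention for this note; it is not how Tcl/Tk or tkinter schedule events, just an illustration of the idea.

```python
import time

class MonotonicDeadline:
    """A deadline measured on the monotonic clock, immune to clock resets."""
    def __init__(self, delay_secs):
        self.deadline = time.monotonic() + delay_secs

    def expired(self):
        return time.monotonic() >= self.deadline

    def remaining(self):
        return max(0.0, self.deadline - time.monotonic())

timer = MonotonicDeadline(0.05)
before = timer.expired()                    # checked immediately
time.sleep(0.1)
after = timer.expired()                     # checked after the delay
```

Because the monotonic clock never moves backwards, resetting the system time between the two checks would not postpone the deadline.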
I've gotten numerous emails from readers who have trouble using some of the book's FTP and email Internet examples as coded in the book. In short, you must change the configuration settings in these scripts to use your own FTP server and email accounts. The CGI server-side scripting examples require no such change, as they run entirely locally—including their web server. I thought this was stated clearly in the book, but it's come up often enough to warrant pasting a reader query and reply on the subject here.
> -----Original Message-----
> From: ....
> To: email@example.com
> Subject: chapter 13 example
> Date: Fri, 13 Nov 2015 16:35:41 -0800
>
> I have a question on the getone.py example. When I run it the ftp.rmi.
> net doesn't seem to work. I see you have changed to the learning python
> site but when I use the new site the same problem occurs with the
> password input. I've tried (), anonymous, and my email address. ? what
> to do to connect.
> I've been reading your books now for about 13mths. I'm
> approaching this like learning a foreign language and enjoying it. I
> haven't moved to 3.5 as the 3.4 v seems to work well. Thanks for any
> help you can give.
>

Thanks for your note. Unfortunately, I do not maintain an FTP account for use by readers of the book. With many thousands of readers around the world, this would be too large a task and security risk. Instead, the assumption is that you will replace the site and login details with those of servers to which you have access, if you wish to run such code live. This also applies to the later email (and other) examples: please change configuration files to use your own account, as described in the book. The server-side examples are more immune to this, as they use a Python-coded server running locally on your machine.

Best wishes on Python and the book,
--Mark Lutz, http://learning-python.com
In a book this large there are bound to be a significant number of typos, and I don't plan on listing them all here. Instead, this section will collect those typos that seem most grievous to me on purely subjective grounds (of course, your subjective grounds may vary). The items here reflect patches made to the book itself in reprints; code patches too large for the book appear in the example package only. See O'Reilly's webpage and its formal errata list for this book for the full list of typos collected and patched in reprints over time.
    if askuser:
        try:
            text = open(file, 'r', encoding=askuser).read()
to the following, adding the new first line (the rest of this code is unchanged):
    self.text.focus()     # else must click
    if askuser:
        try:
            text = open(file, 'r', encoding=askuser).read()
Second, on page 704, at code line 8 near top of page, similarly change:
    if askuser:
        try:
            text.encode(askuser)
to the following, again just adding the new first line:
    self.text.focus()     # else must click
    if askuser:
        try:
            text.encode(askuser)
Reprints: please let me know if there is not enough space for the inserts; I'd rather avoid altering page breaks in the process. This patch will also be applied to future versions of the book's examples package; in the package, the code in question is in file PP4E\Gui\TextEditor\textEditor.py, at lines 298 and 393.
Update Feb-24-11: Patched in version 1.2 of the book examples package (PP4E-Examples-1.2.zip).
    server = smtplib.SMTP(self.smtpServerName)                # this may fail too
    server = smtplib.SMTP(self.smtpServerName, timeout=15)    # this may fail too
Similarly, change code line 4 on page 970 from the first of the following to the second:
    server = poplib.POP3(self.popServer)
    server = poplib.POP3(self.popServer, timeout=15)
In the book examples package, these changes would be applied to line 153 of mailSender.py, and line 34 of file mailFetcher.py, both of which reside in directory PP4E\Internet\Email\mailtools. They'll be patched in a future examples package version.
Update Feb-24-11: Patched in version 1.2 of the book examples package (PP4E-Examples-1.2.zip). I made the timeout 20 seconds in the examples package, to allow for slower email servers; 15 is more than enough to detect a problem with mine, but tweak this as desired.
Update Mar-17-11: Because timeout settings are used for every server
interaction step, you may need to use a bigger timeout value (e.g., 60 or more seconds)
in some contexts, especially when sending a large email over a slow client-side connection.
See the update at the detailed description above for more on this.
    tmp = open(tempname, 'wb')      # already encoded
    tmp.write(asbytes)
    webbrowser.open_new('file://' + tempname)
to read as follows, adding the text that starts with the semicolon (I'm combining statements to avoid altering page breaks):
    tmp = open(tempname, 'wb')      # already encoded
    tmp.write(asbytes); tmp.close() # flush output now
    webbrowser.open_new('file://' + tempname)
In the book's examples package, this code is located at line 209 in file ListWindows.py at PP4E\Internet\Email\PyMailGUI; it will be patched there too in a future examples package release (version 1.2, date TBD).
Update Feb-24-11: Patched in version 1.2 of the book examples package (PP4E-Examples-1.2.zip).
    return (filename, contype)
    return (self.decodeHeader(filename), contype)      # oct 2011: decode i18n fnames
Second, on Page 964, change the 5th and 4th last lines of the addAttachments method def statement from the first of these to the second (this is mid page line -22, in file mailSender.py at PP4E\Internet\Email\mailtools):
    # set filename and attach to container
    basename = os.path.basename(filename)

    # set filename (ascii or utf8/mime encoded) and attach to container
    basename = self.encodeHeader(os.path.basename(filename))     # oct 2011
Update Oct-19-11: Patched in version 1.3 of the book examples package (PP4E-Examples-1.3.zip), and scheduled to be applied in future reprints of the book itself.