See the newer version of this page online here.

Programming Python 4th Edition: Updates Page

I've started collecting notes, updates, and errata related to the book on this page. This page may eventually grow to have a more sophisticated multipage structure like that of the Learning Python 4E updates pages, but I'm adopting a simpler, informal policy here for now, partly because the book was just released, and partly because this book's more advanced audience will probably be more tolerant of the few inevitable typos along the way in a 1600-page book.

Below are the current and official lists of corrections and general notes for this edition of the book:

Only the last of these lists is true errata (corrections, intended to be patched in future reprints). The first two lists represent an informal book "blog." If you find an issue you wish to report, please post it on O'Reilly's errata page for this book; I'm automatically emailed such posts.


Book Notes

I use this section to post general notes about the book, and supplemental materials.


[Dec-28-10] Bonus example: web site maintenance, with HTML and URL parsing, FTP

As a sort of reward for stumbling onto this page, I've uploaded an extra example script here which would have appeared with the book's HTML parsing coverage in Chapter 19, had this book project enjoyed unlimited time and space. This script uses Python's HTML and URL parsing tools to try to isolate all the unused files in a web site's directory.

I use this script for my training web site, as well as my book support site (the latter after fixing some HTML errors that rendered the script inaccurate when Python's strict HTML parser failed and caused some used files to be missed). This script also includes code to delete the unused files from a remote site by FTP if you wish to enable it (pending a resolution on the parser failures issue), and includes suggestions for parsing with pattern matching instead of the HTML parser.

Download: The script itself lives here: cleansite.py. To see what it does, read its docstring, and see two sample runs provided in a zip file here: testruns.zip.
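
The general idea can be sketched in a few lines. This is not the code in cleansite.py; it is a minimal illustration that assumes a flat, local site directory, and the names LinkCollector, local_names, and unused_files are invented for the example. It parses each HTML page with html.parser, collects href and src values, and reports the files that no page mentions. Pages that nothing links to (a root page, for instance) are reported too, so the result is a list of candidates, not a delete list.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote

class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes found in one page."""
    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ('href', 'src') and value:
                self.links.add(value)

def local_names(links):
    """Reduce links to bare filenames, skipping links that name another host."""
    names = set()
    for link in links:
        parts = urlparse(link)
        if not parts.netloc and parts.path:
            names.add(os.path.basename(unquote(parts.path)))
    return names

def unused_files(sitedir):
    """Return names in sitedir that no HTML page in sitedir refers to."""
    allfiles = set(os.listdir(sitedir))
    used = set()
    for name in allfiles:
        if name.lower().endswith(('.html', '.htm')):
            parser = LinkCollector()
            path = os.path.join(sitedir, name)
            with open(path, encoding='utf8', errors='replace') as page:
                parser.feed(page.read())
            used |= local_names(parser.links)
    return sorted(allfiles - used)
```

Calling unused_files on a directory name returns a sorted list of names to review by hand before anything is deleted.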


[Jan-11-11] Bonus example: book lottery, with POP, email, SMTP, and CGI

Update Feb-22-11: New version, rewrite for more usage modes

In light of security constraints in a recent class, I've completely rewritten the PyLotto script for worst-case scenarios. It can now select from emails as well as from a names file created manually or via web form submits, and can be run in both console and remote CGI modes. In pathological cases, it can be run locally to select from a local names file (and in fact had to be in a recent class somewhere in the wilds of California!). I've also updated the single script to work on both Python 3.X and 2.X, and to properly escape student names in the reply HTML. Here's the new code -- use the "view source" option to view the form's HTML:

Read the script's docstring for all the details. Among other things, it demonstrates how to implement simple state retention in CGI scripts, using flat files and locks for possible concurrent updates to the players file. You can also view the sign-up instructions files: lotto-howto-email.txt and lotto-howto-web.txt, as well as a last-resort script which became a class exercise and gave rise to (and is subsumed by) this new generalized PyLotto: simple-pylotto.py.

The rest of this section describes the original, now defunct version, but also gives the back story. The new version has the same goals, but supports web form and local file sign-up modes in addition to emails.

Older (original) version description

Here's another supplemental example that might have appeared in the book, if not for time and its conceptual dependencies. I wrote this script, PyLotto, in order to give away free books in some of the classes I teach. O'Reilly always sends a batch of free copies to authors, and if I kept a dozen copies of every one of the 12 Python books I've written, some of which are not exactly small, I'd probably need a bigger house.

To enter the book lottery, students send an email message to the book's account, with "PYLOTTO" in the subject line. At the end of the class, the script scans, parses, and deletes these emails, and selects a set of their "From" addresses at random as winners. It's not Vegas, but it's fair, and serves as a nice example of practical Python programming that ties together a number of tools presented in the book. This script also has a test mode that sends test emails, and an as-CGI mode for running on a Web server if the training site doesn't allow POP email access or SSH (many don't).
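
The selection step itself is small. The following is a sketch of the idea only, not the code in the PyLotto script: the function name pick_winners is hypothetical, and collapsing duplicate entries so that each address counts once is an assumption made for the example.

```python
import random
from email.utils import parseaddr

def pick_winners(from_headers, count, rng=random):
    """Choose up to count distinct sender addresses at random."""
    addresses = (parseaddr(header)[1].lower() for header in from_headers)
    entrants = sorted({addr for addr in addresses if addr})   # one entry each
    return rng.sample(entrants, min(count, len(entrants)))

headers = ['Ann <ann@example.com>', 'ann@example.com', 'Bob <bob@example.com>']
print(pick_winners(headers, 1))
```

Passing a seeded random.Random object as rng makes a drawing repeatable, which is handy when testing.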

Download: Fetch the script here: pylotto-orig.py. To see what it does, read its docstring, and see the text file that traces its outputs in its various modes here: pylotto-orig-run.txt.

New: See also pylotto-orig-24.py, a version of pylotto.py modified to run remotely as a CGI script on a Python 2.4 web server (that's the latest Python available on godaddy.com, as of January 2011!).

Naturally, I don't give away books in every class I teach (this basically depends on how many freebies O'Reilly has given to me, and how willing I am to lug around a big, giant book in my checked luggage). Even so, scripts such as this one can help illustrate Python applications in action once students or readers have mastered Python language fundamentals; it's important to spend some time running after you've learned to walk.


[Feb-3-11] Bonus example: Extracting music files from an iTunes directory tree

Here's something similarly practical, but a bit simpler than the prior two sections' programs -- a Python script which walks all the folders and subfolders in an iTunes directory tree, to copy all music files in the tree to a flat directory. I use this to create a single directory of all my music on a memory stick, so it can be used conveniently on the hard drive in a vehicle I drive. iTunes seems fond of nested directories, and the vehicle in question doesn't do well in their presence. This example might have appeared in the larger systems examples chapter, if not for time, space, and the fact that it's not too much different from tree-walkers already in that chapter.

Download: Fetch the script here: flatten-itunes.py. To see what it does, read its code and docstring, and see the text file that traces its outputs here: flatten-itunes.out.txt.
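
For readers who want the gist before downloading, here is a rough sketch of such a tree flattener. It is not the code in flatten-itunes.py: the function name, the extension list, and the scheme for renaming files whose names collide are all assumptions made for illustration. The destination folder should live outside the source tree, or the walk may revisit its own copies.

```python
import os
import shutil

MUSIC_EXTS = ('.mp3', '.m4a', '.wav')      # assumed set of music file types

def flatten_tree(srcdir, destdir, exts=MUSIC_EXTS):
    """Copy every matching file under srcdir into the single folder destdir."""
    os.makedirs(destdir, exist_ok=True)
    copied = 0
    for dirpath, dirnames, filenames in os.walk(srcdir):
        for name in filenames:
            if not name.lower().endswith(exts):
                continue
            base, ext = os.path.splitext(name)
            target = os.path.join(destdir, name)
            serial = 1
            while os.path.exists(target):  # same filename in two folders
                target = os.path.join(destdir, '%s--%d%s' % (base, serial, ext))
                serial += 1
            shutil.copy2(os.path.join(dirpath, name), target)
            copied += 1
    return copied
```

The return value is the number of files copied, which makes a quick sanity check against the source tree easy.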


[Jan-10-11] General book question replies

O'Reilly recently asked me for written replies on a few questions related to the book's scope, audience, and goals. I'm not sure if they'll post these separately, but since this might help readers understand the book in general, I've cut and pasted the replies on this page.


[Dec-31-10] Python 3.2 updates

As described in its Preface, this book was written under Python 3.1, and its major examples were retested and verified to work under Python 3.2 alpha just before publication. Because of that, this book is technically based on both 3.1 and 3.2, though it addresses the entire 3.X line in general.

That said, you will find some discussion of 3.1 library issues in the book that have changed or improved in the upcoming 3.2 version, which is due to be released roughly two months after this book's release date (3.2 final is currently scheduled for mid February 2011). Some of the issues in 3.1's email package which the book must work around, for instance, have been improved or repaired in 3.2.

In fact, many or most of the issues of the 3.1 email package described in Chapter 13 are fixed in 3.2. The email workarounds coded in that chapter still work under 3.2 (and were verified and even enhanced to do so before publication), but some are no longer required with 3.2. Notably, the email package in 3.2 now supports parsing the raw bytes returned by the poplib module, thereby eliminating the need for the partially heuristic and potentially error-prone pre-parse decoding to str that the book's 3.1-based examples must perform. The next section explains how this works in 3.2.

Email and bytes in 3.2: the surrogates trick

As a prominent example of email's improvements, 3.2's What's New document states that the 3.2 email package's "New functions message_from_bytes() and message_from_binary_file(), and new classes BytesFeedParser and BytesParser allow binary message data to be parsed into model objects". Interestingly, the 3.2 email parser still does not parse bytes internally. Instead, these extensions work their magic by decoding raw binary bytes data to Unicode str text prior to parsing, using the ASCII encoding and passing "surrogateescape" for the decoding call's errors flag. In short, this translates undecodable bytes to Unicode codepoint escape sequences which allow the bytes' original values to be recovered when the text is encoded back again to bytes by compatible software -- a trick which works for data passed through Python APIs which follow this translation protocol to decode to Unicode text and re-encode to bytes later using the surrogateescape error handler for both steps, but can fail for data which is not.
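
The round trip described above is easy to see in isolation. This short sketch uses only the built-in bytes and str methods; the sample header line is made up for the demonstration.

```python
raw = b'Subject: caf\xe9 menu'        # \xe9 is not valid ASCII

# Decode: the undecodable byte becomes a lone surrogate code point
text = raw.decode('ascii', errors='surrogateescape')
print(ascii(text))                    # 'Subject: caf\udce9 menu'

# Encode with the same handler: the original byte is restored exactly
assert text.encode('ascii', errors='surrogateescape') == raw

# Encode without the handler: software outside the protocol fails
try:
    text.encode('utf-8')
except UnicodeEncodeError as exc:
    print('plain encode fails:', exc.reason)
```

The last step shows the "can fail" case: text carrying surrogate escapes is safe only if whatever encodes it later uses the same error handler.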

At least potentially, this change could resolve the initial decode-to-str issue for parsing email messages in Chapter 13, and may be a first step towards addressing (but is not a fix for) the related CGI uploads issue described in Chapters 15 and 16. The latter of these arises because the CGI module uses the email parser, but its uploaded data can be arbitrary combinations of both binary data and text of a variety of Unicode encodings, with or without content type headers; such data cannot be decoded to str in 3.1 as required by its email parser. Unfortunately, the CGI module still uses the str-based email parsing API in 3.2 beta2, so this CGI uploads limitation appears to still be present in 3.2. On the upside, the decode-to-str preparse issue for email, as well as other Chapter 13 email package workarounds, may have been rendered superfluous in 3.2, though they are harmless, and representative of the sorts of dilemmas faced by real-world development in general -- a major theme of this book.
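
As a small illustration of the bytes-based parsing API quoted earlier, the following sketch parses a message directly from bytes under Python 3.2 or later, with no decode-to-str step beforehand. The message content is invented for the example, and this is not code from the book's mailtools package.

```python
import email

raw = (b'From: Ann <ann@example.com>\r\n'
       b'Subject: test\r\n'
       b'Content-Type: text/plain; charset="latin-1"\r\n'
       b'Content-Transfer-Encoding: 8bit\r\n'
       b'\r\n'
       b'caf\xe9\r\n')

msg = email.message_from_bytes(raw)           # no pre-parse decoding needed
payload = msg.get_payload(decode=True)        # body comes back as bytes
charset = msg.get_content_charset()           # taken from the message header
body = payload.decode(charset)                # decode per the declared charset
print(msg['Subject'], repr(body))
```

The body's non-ASCII byte survives the trip through the parser and is decoded only at the end, using the encoding the message itself declares.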

Other 3.2 changes

For more on the Python 3.2 release, please see its note on this site in the Learning Python 4E updates page (a book less impacted by 3.2, since 3.2 was supposed to change only libraries, not core language -- and nearly succeeded).

Not covered in that note is the very late 3.2 addition of its concurrent.futures library. This library, based upon a Java package, provides yet another way to generalize the notion of multitasking with threads and processes, in addition to the existing threading and multiprocessing modules which are covered in this book. This new library is also a bit of a work in progress, intended for future expansion. For more details, please see 3.2 release details and manuals.
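
For a first taste of the new library, here is a minimal sketch that runs a function over several inputs in a pool of threads. The word_count function and sample texts are invented for the example; see the library manual for the full interface.

```python
from concurrent.futures import ThreadPoolExecutor

def word_count(text):
    return len(text.split())

texts = ['spam eggs', 'spam spam spam', 'ni']

# The pool is shut down automatically when the with block exits
with ThreadPoolExecutor(max_workers=3) as pool:
    counts = list(pool.map(word_count, texts))   # results keep input order

print(counts)    # [2, 3, 1]
```

The same module also provides ProcessPoolExecutor, which offers the same map and submit calls but runs tasks in separate processes; on Windows, code that uses it belongs under an if __name__ == '__main__' test.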


[Jan-9-11] Minor usability tweaks for PyEdit (suggested changes)

Chapter 11's PyEdit text editor is a large program with lots of functionality, which I use nearly every day in one context or another. Besides text edits, it's also a key component in the book's PyMailGUI email client. Still, like most non-trivial user interfaces, there are plenty of ways its interaction might be customized or improved, depending on its users' preferences. After using it recently with the more critical eye afforded by the passage of time, five items seem prime candidates for improvement; all are minor nits and not bugs, but would be simple to improve:

Because PyEdit is a book example, written and provided in easily scriptable Python code, I'll leave applying these items as suggested exercise, along with anything else you might care to tweak. This is Python, after all.


Book Clarifications

This section provides clarifications for the book's material or code, which don't qualify as errors, but may address some common reader questions or concerns.


[Jan-10-11] Close of a temporary file seems too implicit in PyMailGUI

I recently spotted something unusual in Chapter 14's PyMailGUI, in module ListWindows.py, method PyMailCommon.contViewFmt (after about a year, you get to be your own code reviewer!). This method's latter part opens an HTML-only email in a web browser after displaying its extracted plain text in a PyEdit frame, and has worked well for my email ever since it was coded. To be robust and explicit, though, it should probably run a tmp.close() after its tmp.write(asbytes), to ensure that the output file's buffers are flushed to disk before the browser opens it.

This isn't a bug and the code works as is, but apparently only because of fortunate timing: the browser started by the webbrowser module doesn't get around to opening the temporary file until well after this method exits, which deletes its local variables, thereby reclaiming the temporary file object and automatically closing and hence flushing it in the process. This works, but seems too implicit in retrospect, and may not be the best coding pattern to emulate in any event. In general, you should run an output file's close or flush methods (or use the with statement or unbuffered open modes) to flush output buffers to disk if you expect to be able to read the file in the same program.

See Chapter 4 for much more on output file closes (including rules of thumb which I failed to heed in this context), especially pages 137-141 and 145. The prior edition strung the open and write calls together: open(tempname, 'w').write(content), which, though still somewhat implicit, made the automatic close of the temporary file on collection more apparent and immediate, and not as dependent on timing.
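
As one illustration of the rule above, the with statement closes (and so flushes) the file as soon as its block exits, before anything else is asked to read it. This is a sketch and not the PyMailGUI code: the file name and content are stand-ins, and the browser call is left as a comment so the example runs without opening a window.

```python
import os
import tempfile

content = b'<html><body>hello</body></html>'    # stand-in for the mail's HTML
tempname = os.path.join(tempfile.mkdtemp(), 'temp.html')   # hypothetical name

# Leaving the with block closes the file, flushing its buffers to disk
with open(tempname, 'wb') as tmp:
    tmp.write(content)

# Only now is it safe to hand the file to another reader, for example:
# webbrowser.open_new('file://' + tempname)
with open(tempname, 'rb') as check:
    assert check.read() == content
```

Unlike relying on garbage collection at method exit, this form does not depend on how quickly the other reader starts.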

Update, Feb-1-11: explicit close required on some machines

After seeing this issue manifest itself for very small HTML-only emails on a much faster machine running a different operating system, I'm reclassifying this as a correction to be patched in reprints; please see the patch below. When and where this problem occurs, an HTML-only email may open as a blank web page, because its temporary file has not been flushed to disk. Oddly, on the faster machine, the web browser somehow opens and reads the temporary file before the GUI method has a chance to close it on exit. Moreover, this only happens for very small emails (< 10K), suggesting a buffer size role. Adding an explicit close() ensures proper behavior for all platforms and emails.

This issue is platform- and timing-specific; impacts just one minor aspect of a very large program; and, even when it does, can always be worked around in the user interface in three different ways -- by pressing the web browser's refresh/reload, by viewing the extracted plain-text in the GUI's main window, and by clicking the sole HTML part's button in the main GUI window to open it on demand. Still, it's annoying and simple enough to merit a patch.

For detail-minded readers, the faster machine with the incorrect behavior was a multicore Windows Vista machine; the book test machine where the code works as is was a single core Atom Windows 7 machine (a beefy netbook). Since the Python method in question returns immediately after this code, on Vista a web browser apparently starts and opens a local web page faster than a Python method can exit. This may reflect differences in the implementations of start commands in the two systems. I'd be surprised if this occurred on other platforms which spawn processes to open browsers; on the other hand, unpredictable timing has a way of being unpredictable.

As described in the book, there are other issues related to the handling of HTML-only emails which I've left up to an interested reader to address. As is, display of such emails employs a somewhat temporary scheme in lieu of an HTML-enabled text viewer, which was tested and used by just a single user on a single platform. For example, its single temporary file model means only the last such email viewed is ever stored at any one point in time, regardless of how many are opened. Such oddness does not occur for HTML parts opened on demand, because their files are resaved and closed correctly in mailtools each time an open is requested. As for much of this system, improve as desired.


[Feb-1-11] Using timeouts for poplib and smtplib connections

The email-based code in Chapter 13 (and its clients in Chapters 14 and 16) should probably pass timeout arguments to the poplib and smtplib connection calls, to avoid waiting indefinitely. I've recently seen the book's PyMailGUI desktop client get stuck waiting for a poplib connection call that never returns, for example. This is rare, but without a timeout argument, your only recourse is to wait seemingly forever, or kill and restart the GUI. See Python's library manuals for more about passing timeout arguments in these modules.
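
A sketch of what a caller might do with such a timeout follows. The helper name connect_pop and its factory parameter are hypothetical, added here so the function can be tried without a real mail server; the only library facts relied upon are that poplib.POP3 accepts a timeout keyword argument, and that a connection which times out raises an exception derived from OSError in current Python 3.X releases.

```python
import poplib

def connect_pop(servername, timeout=15, factory=poplib.POP3):
    """Return a connected POP3 object, or None if the connect fails or times out."""
    try:
        return factory(servername, timeout=timeout)
    except OSError as exc:            # includes socket timeouts
        print('Could not connect to', servername, '-', exc)
        return None
```

A GUI client could test for a None result and post an error popup, where it would otherwise have to be killed and restarted.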

Update, Feb-22-11: add timeouts patch for next reprint

I've added the timeout arguments as patches to be made in the next reprint and examples package version: see the patch below. This became a priority when I started seeing the email server at my ISP suddenly failing to respond to connect requests on a very regular basis; probably a temporary problem at the ISP, but killing and restarting the email client was much more painful than patching to use timeouts for connects.


[Feb-1-11] Shelves close() calls for non-update script in Example 1-19?

A reader posted an errata report for this book on O'Reilly's site, which claimed that a db.close() call is required to avoid file corruption at the end of a Chapter 1 script that displays but does not update a shelve. Per the report, the shelve file triggers errors later, after it is updated by the next script and then displayed again. Here is the post's text:

Type: Minor technical mistake
Description: There are no page numbers when using the Kindle version of 
the text.

In the discussion of building dictionaries using classes, Example 1-19 
(dump_db_classes.py) requires closing the db at the end of the script.  
The final line of the code example should be:

db.close()

If the db is not closed, the subsequent updating (Example 1-20) and then 
re-printing of the db (using dump_db_classes.py again) will fail, giving 
an error code:

Traceback (most recent call last):
  File "C:\Python31\dump_db_classes.py", line 6, in <module>
    print(key, '=>\n', db[key].name, db[key].pay)
  File "C:\Python31\lib\shelve.py", line 113, in __getitem__
    value = Unpickler(f).load()
EOFError

I have not been able to reproduce this error, and suspect that the poster's Kindle cut-and-paste simply dropped the close() call that appears at the end of the update script in Example 1-20. In my testing, the scripts in question work without error and as shown, both in Python 3.1 (using the "dumb" DBM file interface default in 3.X), as well as in Python 2.7 (using the bundled "bsddb" file interface default in 2.X). Moreover, these examples worked fine under Python 2.5 for the prior edition of this book, and similar scripts have been used successfully in earlier editions dating back to 1995. In general, shelves should not require close() calls unless the shelve has been updated. By this rule, a close() is not required in Example 1-19 (though it wouldn't hurt if added); however, all shelve update scripts in the book do call close() before exiting as required.

Because I can't reproduce this issue, I'm posting this as a clarification instead of a correction for now, barring a more detailed reader report. If you do see the same error, please email me with full context; the platform, Python version, and underlying DBM interface in use may factor into behavior too, and there are too many possible combinations for me to test exhaustively. In the poster's defense, shelves are notoriously error-prone; it's not impossible that file paths may have differed between scripts (the current working directory can sometimes change unexpectedly in IDLE), or that a bad cut-and-paste dropped the close() call in either the creation script of Example 1-18 or the update script of Example 1-20.

Update: The reader who posted this note was later unable to reproduce the error in question by repeating the steps which led to it, but believes it existed initially. That may close the case, though there are too many variables related to shelves to be certain.
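
To make the close() rule concrete, here is a compact sketch of an update step followed by a display-only step. It is not the book's Example 1-19 or 1-20: the shelve file name and record are made up, and the file is placed in a temporary directory so the example leaves nothing behind in the current directory.

```python
import os
import shelve
import tempfile

dbname = os.path.join(tempfile.mkdtemp(), 'people-shelve')   # hypothetical name

# Update step: changes must be followed by close() to be flushed to disk
db = shelve.open(dbname)
db['bob'] = {'name': 'Bob Smith', 'pay': 30000}
db.close()

# Display-only step: nothing is written, so close() is optional but harmless
db = shelve.open(dbname)
for key in db:
    print(key, '=>', db[key]['name'], db[key]['pay'])
db.close()
```

If the close() in the update step is dropped, whether the data reaches the file depends on the underlying DBM interface, which is one more variable in reports like the one above.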


Book Corrections

In a book this large there are bound to be a significant number of typos, and I don't plan on listing them all here. Instead, this section will collect those typos that seem most grievous to me on purely subjective grounds (of course, your subjective grounds may vary). See O'Reilly's webpage for this book for the full list of typos collected and patched in reprints over time.



  1. Page xxviii, line 3 from page top: two typos in same sentence
    This text's "larger and more compete example" should be "larger and more complete examples".



  2. Page 678 in Chapter 11, line 3 of last paragraph on page, figure description off
    The text misstates Figure 11-4's content here: it does not show a Clone window (the original version of this screenshot did, but was retaken very late in the project to show Grep dialogs with different Unicode encodings). To fix, change this line's "a window and its clone" to read "a main window".



  3. Page 702 and 704, PyEdit: add text.focus() calls after askstring() Unicode popups
    For convenience, and per the detailed description above, we should add a call to reset focus back to the text widget after the Unicode encoding prompt popups which may be issued on Open and Save/SaveAs requests (depending on textConfig settings). As is, the code works, but requires the user to click in the text area if they wish to resume editing it immediately after the Unicode popup is dismissed; this standard popup itself should probably restore focus, but does not. To fix, add focus calls in two places. First, on page 702, at code line 21 at roughly mid page, change:
                if askuser:
                    try:
                        text = open(file, 'r', encoding=askuser).read()
    
    to the following, adding the new first line (the rest of this code is unchanged):
                self.text.focus() # else must click
                if askuser:
                    try:
                        text = open(file, 'r', encoding=askuser).read()
    
    Second, on page 704, at code line 8 near top of page, similarly change:
                if askuser:
                    try:
                        text.encode(askuser)
    
    to the following, again just adding the new first line:
                self.text.focus() # else must click
                if askuser:
                    try:
                        text.encode(askuser)
    
    Reprints: please let me know if there is not enough space for the inserts; I'd rather avoid altering page breaks in the process. This patch will also be applied to future versions of the book's examples package; in the package, the code in question is in file PP4E\Gui\TextEditor\textEditor.py, at lines 298 and 393.

    Update Feb-24-11: Patched in version 1.2 of the book examples package (PP4E-Examples-1.2.zip).



  4. Page 963 line 9, and page 970 line 4: add timeout arguments to email server connect calls
    For robustness, and per the detailed description above, add "timeout=15" arguments to the POP and SMTP connect calls, so that email clients don't hang when email servers fail to respond. In the book, change code line 9 on page 963 from the first of the following to the second:
            server = smtplib.SMTP(self.smtpServerName)           # this may fail too
            server = smtplib.SMTP(self.smtpServerName, timeout=15)  # this may fail too
    
    Similarly, change code line 4 on page 970 from the first of the following to the second:
            server = poplib.POP3(self.popServer)
            server = poplib.POP3(self.popServer, timeout=15)
    
    In the book examples package, these changes would be applied to line 153 of mailSender.py, and line 34 of file mailFetcher.py, both of which reside in directory PP4E\Internet\Email\mailtools. They'll be patched in a future examples package version.

    Update Feb-24-11: Patched in version 1.2 of the book examples package (PP4E-Examples-1.2.zip). I made the timeout 20 seconds in the examples package, to allow for slower email servers; 15 is more than enough to detect a problem with mine, but tweak this as desired.



  5. Page 1072, code line 10 from top of page, PyMailGUI: add a close() for HTML mail files
    For portability, and per the detailed description above, we should add an explicit close() call to flush the temporary file of an HTML-only email before starting a web browser to view it, so that this code works in all contexts. As is, it works on the test platform used for the book, and likely works on most others, because the method in question exits and thus reclaims, closes, and flushes the file before the spawned web browser gets around to reading it. However, this is timing and platform dependent, and may fail on some machines that start browsers more quickly; it's been seen to fail on a fast Vista machine. To fix in the book, change the middle line of the following three current code lines:
                            tmp = open(tempname, 'wb')      # already encoded
                            tmp.write(asbytes)
                            webbrowser.open_new('file://' + tempname)
    
    to read as follows, adding the text that starts with the semicolon (I'm combining statements to avoid altering page breaks):
                            tmp = open(tempname, 'wb')      # already encoded
                            tmp.write(asbytes); tmp.close() # flush output now
                            webbrowser.open_new('file://' + tempname)
    
    In the book's examples package, this code is located at line 209 in file PP4E\Internet\Email\PyMailGUI\ListWindows.py; it will be patched there too in a future examples package release (version 1.2, date TBD).

    Update Feb-24-11: Patched in version 1.2 of the book examples package (PP4E-Examples-1.2.zip).



  6. Page 1226, two filename typos in same sidebar
    This will probably be obvious to most readers who inspect the external example files referenced here, but in this sidebar: "test-cgiu-uploads-bug*" should read "test-cgi-uploads-bug*", and the bullet item text "test-cgi-uploads-bug.html/py saves the input stream" should read "test-cgi-uploads-bug2.html/py saves the input stream".



  7. Page 1555, top of page, quotes are misplaced in heading line
    A typo inherited from the prior edition: the quotes and question mark in the heading line at the very top of this page are slightly off. Change the heading line: So What's "Python: The Sequel"? to read as: "So What's Python?": The Sequel. Quotes are angled in the original and revision. This header refers back to the sidebar in the Preface titled "So What's Python?". Arguably trivial, as this sidebar was 1500 pages (and perhaps a few months) ago by this point in the book, but it would be better to get this right. This header was broken by a copyedit change on the prior edition, and fell through the cracks on this one.

