Learning Python 3rd Edition: General Book Notes

Latest update on this page: February 5, 2009

Please see the main updates page for more details about this page's content, as well as links to the book corrections list, and Python 2.6/3.0 information.

This page collects notes about the text, designed to augment or clarify the published material. These are not errata, and require no patches; they are just supplements for readers of this book. Some of this page's material may also ultimately appear in future editions of this book, so this page is also provided as something of a very early draft preview of upcoming additions.

Contents

This list is roughly ordered by date of addition, from older to newer, not by page number. Items here:

Using time.time() versus time.clock() (page 367)
The "+=" statement is not explained in preview chapter (page 79)
In figure 9-3, why is set under Numbers, not Collections? (page 190)
Enclosing scopes, state retention, and lambda (page 345)
Why mention file.xreadlines() at all? (page 259)
Closing output streams in Processor classes (page 525)
Completing the Tkinter GUI code snippet (page 349)
The rest of the solution to Part IV exercise #4 (page 379)
How do lists and dictionaries work internally? (pages 153, 161)
More on two scope subtleties (pages 311, 312)
New OOP tutorial chapter (pages 512-514)


Using time.time() versus time.clock() (page 367)

On this page, the iteration alternatives timing script uses time.time to compute elapsed time. This works fine for this example, but on Windows, using time.clock instead may give better timer precision than time.time (time.clock is microsecond granularity). On Linux, however, time.time is the preferred alternative. See the library manuals for more details, or use the suggested timeit module to finesse such details altogether.
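
For example, a portable way to finesse the choice by hand is to select the timer per platform (a minimal sketch; the loop is just a stand-in for code to be timed):

import time, sys
timer = time.clock if sys.platform[:3] == 'win' else time.time

start = timer()
for i in range(1000000): pass       # stand-in for the code being timed
print timer() - start               # elapsed time in seconds

This is essentially what timeit does for you: its default timer is time.clock on Windows and time.time elsewhere.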


The "+=" statement is not explained in preview chapter (page 79)

A reader filed an errata report with O'Reilly, marked as serious, stating that the "D['quantity'] += 1" on the 3rd line from the bottom of this page should be "D['quantity'] + 1". This is not correct -- it must be "+=" here, not "+". The suggested change would break this example. Admittedly, the "+=" statement has not been explained yet at this point in the book, but as stated clearly on page 68, this chapter is a preview that does not explain most of its content in any sort of depth.

Please keep in mind that Chapter 4 is a preview intended to whet readers' appetites for later details, and deliberately avoids explaining much of its content. In this specific case, the "+=" means: add to the item in place, which is why the dictionary shows a new value when printed immediately after this line. The statement shown is essentially shorthand, equivalent to the longer "D['quantity'] = D['quantity'] + 1", as covered in detail later in the book on pages 223-225.
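
To see this at the interactive prompt (the starting value here is made up for illustration):

>>> D = {'quantity': 4}
>>> D['quantity'] += 1              # in place: D['quantity'] = D['quantity'] + 1
>>> D
{'quantity': 5}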

To clarify, we might add a sentence at the very end of this page which reads: "Python's X += Y in-place addition statement used here is shorthand for X = X + Y." This seems superfluous, though; if we tried to explain every unexplained item in this chapter, we'd wind up repeating the rest of the book!


In figure 9-3, why is set under Numbers, not Collections? (page 190)

Another reader wrote to ask about this. The choice is a bit ambiguous and subjective. Really, sets are both numeric in nature and collections, so they could arguably be placed in either category. In this figure's tree, an item can't appear in two categories at once, so the numeric choice is as good as the other. In fact, sets are covered in the Numbers chapter of this book for this reason, not in the collections parts. Sets become more collection-like in 3.0 (with comprehensions and such), but they still have a dual-mode nature; to most people, set intersection, union, and difference have a strong mathematical basis.
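
For instance, this quick session shows the math-flavored side of sets (results are sorted here only to make the display order deterministic):

>>> x = set('spam')
>>> y = set('ham')
>>> sorted(x & y), sorted(x | y), sorted(x - y)     # intersection, union, difference
(['a', 'm'], ['a', 'h', 'm', 'p', 's'], ['p', 's'])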

Speaking of 3.0, keep in mind that Figure 9-3 must be revised for the new types introduced in 3.0: bytes and bytearray are new string types with sequence behavior, dictionary methods return "view" objects that are akin to both iterators and sets, and so on. See the Python 2.6 and 3.0 notes links at the top of this page for more on 3.0 changes, and consult 3.0's manuals for the new type hierarchy.


Enclosing scopes, state retention, and lambda (page 345)

A reader wrote to suggest that the first line of the example at the end of this page should read "def knights(name)", and the fourth line should read "return action(name)". This is incorrect, but underscores the subtleties of nested function scopes in general. Here is the example code, and a few additional words of explanation.

>>> def knights1():
        title = 'Sir'
        action = (lambda x: title + ' ' + x)
        return action

>>> act = knights1()
>>> act('robin')
'Sir robin'

This example is similar to that on page 324. Here, the point is to use enclosing scope references to remember the current value of "title", for use when the function assigned to "action" is later called. "action" is not called by "knights1", but is created and returned by it. Notice that "knights1" is called with no arguments; the function it creates and returns is assigned to "act". When "act" is finally called, the string "robin" matches the lambda's "x" argument, but the value of "title" was remembered by the function object created during the "knights1" call. Hence, "Sir" is tacked onto the front of the string returned by the lambda function.

If we make the changes suggested by the reader, we would need to pass an argument to "knights", and the "Sir robin" string would be passed back from the "knights" call, not the "act" call. And that's the larger point of the example: enclosing scope references are retained by nested functions, even after the call to the enclosing function has returned. See page 322 for a similar, and more deeply explained, example.

Confusing, perhaps, but that's what enclosing scope references are mostly for -- state retention from enclosing scopes. Lambdas in general are largely used for deferring execution of code, and for retaining state to be used in a later call. If the lambda in this example makes it more confusing than it need be, you can always achieve the same behavior with a nested def instead:

>>> def knights2():
        title = 'Miss'
        def action(x):
            return title + ' ' + x
        return action

>>> act = knights2()
>>> act('demeanor')
'Miss demeanor'

See also the discussion of the 3.0 nonlocal statement on the Python 2.6/3.0 notes page; it allows enclosing scope names to be assigned and thus changed, not just referenced.
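
As a preview, here is a minimal 3.0-style sketch of what that statement allows (this requires Python 3.0, and the counter here is a made-up example, not code from the book):

def counter():
    count = 0
    def incr():
        nonlocal count              # allows assignment to the enclosing name
        count += 1
        return count
    return incr

c = counter()
print(c(), c(), c())                # prints: 1 2 3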


Why mention file.xreadlines() at all? (page 259)

In the "Why You Will Care: File Scanners" box, second to last code example, xreadlines() is presented very briefly as an option. One reader wrote to point out to me that, according to the Python Library Reference, xreadlines() is "...Deprecated since release 2.3. Use "for line in file" instead", stating that it shouldn't be mentioned at all.

I know about the deprecation, of course, but I disagree with the argument. It doesn't matter that the library manual labels this deprecated; it is still used in much 2.X code that people have to use and maintain today. As of 2.6, in fact, xreadlines() is still available, and does not issue a deprecation warning. It is still part of standard Python.
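
For reference, here are the two forms side by side (the file name is hypothetical):

# legacy 2.X style, still common in code you may have to maintain
for line in open('data.txt').xreadlines():
    print line,                     # trailing comma: line keeps its '\n'

# the preferred file-iterator form today
for line in open('data.txt'):
    print line,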

Not to pick on this particular reader, but I get quite a few comments about omitting things like this, and they seem, frankly, a bit controlling. My job as author is to teach what people need, not what I personally believe they should do. Best practice does matter, of course, but this is one of many cases where common practice is just as important. For example, the fact that the file iterator version is preferred today is stated clearly in the immediately following section of this chapter. It's one thing to suggest better alternatives, though, and quite another to try and blot out history altogether. This is especially true when people still need to deal with that history today.

xreadlines() will certainly go away in a future 3.0 edition of this book, but because it is still present in existing code, it merits at least a sentence fragment in the current one. Ditto for xrange(), the memory-efficient alternative to range() until 3.0. For now, the very brief mention they get is justified.

And if you still aren't buying this, it might help you to know that I still occasionally teach Python to people who, for various sad reasons, are still compelled to use Python 1.5.2, where xreadlines() was a Good Thing. Although shiny new releases are always more fun to focus on, these people are Python programmers too.


Closing output streams in Processor classes (page 525)

A reader wrote to suggest that a "self.writer.close()" call should be inserted after the loop in the Processor.process method on page 525, in order to properly close the output stream file object, and make the interaction at the top of page 526 work. The examples on page 526 do work as shown in the book without this change, but this raises some important points about files.

First of all, there are some subtle issues in this code, which, as the reader found, make explicit close calls tricky. Adding the close call won't work as is for the HTMLize class (you would need to add a close method to that class that does nothing but pass), and probably isn't what you want when sys.stdout is the writer on page 525 (you won't be able to print anymore).
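
If you do want an explicit close, one fix is a do-nothing close method along these lines (a sketch only: the write method here paraphrases the book's HTMLize class rather than copying it verbatim):

class HTMLize:
    def write(self, line):
        print '<PRE>%s</PRE>' % line.rstrip()
    def close(self):                # no file to close: make writer.close() a no-op
        pass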

More importantly, close calls are not generally required -- Python file objects automatically close themselves when garbage collected, and flush their output buffers in the process. Hence, an output file should be automatically flushed and finalized after the last reference to the file object is lost. This always happens when you exit Python or a Python script, and happens immediately if you never save the file object by assigning it to a variable.

The only place where you might notice exceptions to this rule is when working interactively -- to support debugging, some shells like IDLE may hold onto file objects longer than expected, thus preventing garbage collection and auto-close. This doesn't happen in the interactive session shown in the book. If that is an issue for you when testing interactively, though, try assigning the output file to a variable, and run the file close method through that variable after the process call returns:

>>> temp = open('spamup.txt', 'w')
>>> prog = converters.Uppercase(open('spam.txt'), temp)
>>> prog.process()
>>> temp.close()

It's probably better to handle this in your interactive session this way, rather than in the class itself, since this is only an issue in certain interactive shells. It's not an issue for Python itself.


Completing the Tkinter GUI code snippet (page 349)

Someone wrote to say that the first code snippet in sidebar "Why You Will Care: Callbacks" did not work when they typed it on their machine. This code is not intended to be a complete working program; the "...use message..." in the other listing in this sidebar attempts to imply as much.

To make the snippet actually work, though, you also need to import Tkinter, pack the button to arrange it with the geometry manager, and kick off the GUI event loop (unless your IDE is already running one). Here's the complete version:

import sys
from Tkinter import Button, mainloop
x = Button(
        text='Press me',
        command=(lambda: sys.stdout.write('Spam\n')))
x.pack()
mainloop()

This still won't work if you're on a machine without Tk GUI support installed (it should be on Macs, Windows, and most Linux systems). If you're really interested in Tkinter GUIs, though, that is largely the realm of the larger book Programming Python; it's fun stuff, but there's more to it than Learning Python can or should get into.

The real point of the sidebar was that lambdas defer execution; without the lambda in this example, the code would write to stdout when the Button is being created, instead of when it is later pressed. Lambdas also serve to save state information for later use; the lambda in this example defers the write call, but also effectively "remembers" both the function to be called, and the text to be printed.
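
To see why, compare what happens if the lambda is dropped (a sketch reusing the imports above; don't actually code it this way):

x = Button(
        text='Press me',
        command=sys.stdout.write('Spam\n'))     # wrong: writes now, at creation
                                                # time, and passes None as command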


The rest of the solution to Part IV exercise #4 (page 379)

A reader asked if I could provide the second part of the solution to exercise #4 on page 379 -- the part that asks you to generalize adder for any number of keyword arguments. This isn't given in its entirety in the solutions appendix. It's straightforward to iterate over dictionary keys, but a bit trickier to fetch the values to sum or concatenate, because the total must be initialized from the first value, whatever its type.

Complete and alternative solutions to the "**" keywords problem are given in the script below. This is actually a fairly difficult problem due to the nested indexing required, unless you "cheat" by converting the dictionary to a list of values and falling back on the positional version. Run this on your machine to see its output.

# expanded solutions to Part IV #4

def adder1(*args):                  # sum any number of positional args
    tot = args[0]
    for arg in args[1:]:
        tot += arg
    return tot

def adder2(**args):                 # sum any number of keyword args
    tot = args[args.keys()[0]]
    for key in args.keys()[1:]:
        tot += args[key]
    return tot

def adder3(**args):                 # same but convert to list of values
    args = args.values()
    tot = args[0]
    for arg in args[1:]:
        tot += arg
    return tot

def adder4(**args):                 # same, but reuse positional version
    return adder1(*args.values())

if __name__ == '__main__':
    print adder1(1, 2, 3),       adder1('aa', 'bb', 'cc')
    print adder2(a=1, b=2, c=3), adder2(a='aa', b='bb', c='cc')
    print adder3(a=1, b=2, c=3), adder3(a='aa', b='bb', c='cc')
    print adder4(a=1, b=2, c=3), adder4(a='aa', b='bb', c='cc')


How do lists and dictionaries work internally? (pages 153, 161)

Someone wrote to ask for clarification on the internal implementation of lists and dictionaries. The book touches very briefly on these: on page 153, it explains that lists are implemented as C arrays of pointers instead of linked lists; and on page 161, it states that dictionaries are implemented as expandable hashtables.

There's more to it than this, of course, and these are low-level internal details that most programmers don't need to care about. Further, this varies in alternate Pythons. Jython and IronPython, for instance, may use very different techniques, and even standard Python is free to change the details of how this works over time (in fact, it has). To underscore how carefully Python has been optimized, though, here are a few more words on the subject as of Python 2.6.

List implementation

Basically, lists are stored as arrays of pointers to other objects, and the arrays used to implement lists are overallocated. The array's allocated block of memory includes extra space at the end to allow for future expansion. That way, most additions don't require making a new array and copying over -- most appends simply store a pointer near the end of the block, and most insertions and deletions require just a quick memory copy to shift some items within the block.

Eventually, if the list grows large enough to overflow its array's block of memory, a new, larger block is created, overallocated again; all items in the old block are copied over; and a header in the list object is set to point to the new, larger block. The list's array is also shrunk if it becomes less than half full, by copying all items to a new, smaller block. Copies can be expensive, but by including space in the arrays for future expansion, and delaying contraction until the arrays are half empty, the need to reallocate and copy over is relatively rare.
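
You can glimpse the overallocation from Python code, without reading any C (a quick sketch for CPython 2.6; the byte counts printed are implementation- and platform-specific):

import sys
L = []
for i in range(20):
    print len(L), sys.getsizeof(L)  # length versus allocated size in bytes
    L.append(i)                     # the size jumps only when the block is outgrown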

This scheme turns out to be better on space and time than a linked-list structure, at least for typical Python code. Python core developers actually did some heavy-duty analysis of how both lists and dictionaries are commonly used, in order to come up with the optimal data structures for Python. For lists, the cost of occasionally shifting and copying items in arrays is less than the memory management overheads associated with a linked-list structure, where items are kept in individual blocks.

If you want to see how this works for lists, check out the listobject.c file in Python's source code distribution (one of the advantages of open source). Although more complex in the past, today Python uses a fairly simple algorithm to compute the size of over-allocation in the blocks: (newsize >> 3) + (newsize < 9 ? 3 : 6). Per this file in Python 2.6: "This over-allocates proportional to the list size, making room for additional growth. The over-allocation is mild, but is enough to give linear-time amortized behavior over a long sequence of appends() in the presence of a poorly performing system realloc()." In other words, they've done a great job at handling low-level details, so we don't have to.
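
Translated to Python, that C growth expression reads as follows (a sketch only; the real logic lives in listobject.c):

def overallocate(newsize):
    # extra slots added beyond newsize when a list's block must grow
    return (newsize >> 3) + (3 if newsize < 9 else 6)

print overallocate(8), overallocate(100)    # 4 and 18: mild, proportional growth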

Dictionary implementation

Dictionaries use a similar expandable model, though they are hashtables, and their structure is thus a bit more complex. Essentially, Python dictionaries today use table probing instead of chains of items at hash table slots, along with hashing algorithms tailored for common Python usage patterns. According to Python 2.6's dictobject.c file: "This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4. Open addressing is preferred over chaining since the link overhead for chaining would be substantial (100% with typical malloc overhead)."

In terms of memory, dictionary tables also begin small or presized, and may grow or shrink over time; they double or quadruple in size when they become 2/3 full, and may shrink as items are removed. Also from dictobject.c: "If fill >= 2/3 size, adjust size. Normally, this doubles or quaduples the size, but it's also possible for the dict to shrink [...] Quadrupling the size improves average dictionary sparseness (reducing collisions) at the cost of some memory and iteration speed (which loops over every possible entry). It also halves the number of expensive resize operations in a growing dictionary. Very large dictionaries (over 50K items) use doubling instead. This may help applications with severe memory constraints." In other words, dictionaries are already more efficient than you or I could probably make them.
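
The same getsizeof trick shows the table steps for dictionaries (again CPython 2.6-specific, and exact sizes will vary):

import sys
D = {}
for i in range(20):
    print len(D), sys.getsizeof(D)  # items versus allocated size in bytes
    D[i] = None                     # jumps occur near the 2/3-full resize points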

Reusing objects

As another optimization, Python 2.6 also maintains tables of up to 80 empty lists and dictionaries, to be reused. Object deallocations add to a table if it's not full, and new object requests take from a table if it isn't empty, thereby minimizing the number of expensive memory allocation/release operations. This is similar in spirit to the caches of reused small integers and strings that Python keeps internally, except that integers and strings can be referenced many times while being retained in the cache; lists and dictionaries, being mutable, cannot be shared by multiple references, and so must be removed from the table when in use. That is, a given integer or string in the cache can be reused any number of times, whereas the list and dictionary tables are just temporary staging areas for recently freed and soon-to-be-reallocated objects. (Python's string and number reuse caches are described in the book, first on page 114, and again on pages 119-121).
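
You can sometimes catch this recycling in action, though it is a CPython implementation detail and never guaranteed (a sketch; the reuse shows up as a repeated object address):

L = [1, 2, 3]
print id(L)                         # address of the list object's block
del L                               # the freed list goes back to the reuse table
M = ['different']
print id(M)                         # often the same address: the block was recycled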

See these Python source files for more details. Again, keep in mind that this is all prone to change over time, and specific to Python implementations. In general, Python programmers aren't supposed to have to care about the underlying C implementation details, though they can help you understand performance implications, and are instructive to study in general.


More on two scope subtleties (pages 311, 312)

I get quite a few questions about scope issues. To address some confusion, I think two points already made in the book could probably stand to be highlighted a bit more:

Also see the discussion of the new nonlocal statement in 3.0 on the Python updates page; 3.0 not only allows references to names defined in enclosing functions' local scopes, as in 2.X, it also allows them to be assigned and changed if they are declared nonlocal.


New OOP tutorial chapter (pages 512-514)

In the next edition of this book, the example sketched on pages 512-514 of the current edition, "A More Realistic Example," will be expanded into a full chapter, tentatively carrying the same title as the current shorter example. This new chapter will appear where the current example does -- between the current chapters "Class Coding Details" and "Designing With Classes" -- to break up some of the more heavily detailed chapters.

This new chapter is going to be based directly upon a demonstration I go through in all the live classes I teach. It will build up a set of classes gradually, one step at a time, to show how they are constructed from scratch. The example will cover all the basics of OOP in Python, and introduce object persistence along the way. In some sense, this new tutorial is in the same spirit as the current "OOP: The Big Picture" chapter, because it presents OOP basics without getting mired in syntax details. Its end result is practical working code, though.

Although the final content of this chapter is prone to change arbitrarily, I've posted a very early draft of it as a supplement for readers of the current edition. It's available at the following page (note: if this link doesn't work for you, it probably means I am in the process of modifying the tutorial substantially, so please check back later):


Back to this book's main updates page


