Below are recent book clarifications: notes which provide additional coverage
of language topics, and are intended as supplements to the book. The items
on this page were posted after the first reprint of this book, which was dated
January 2010. Any book changes they propose do not appear in either the first
printing or the first reprint.
To make this page's content easier to apply in reprints, I've divided
its notes into 2 sections--those that merit book changes,
and those that do not. These lists are in no strict order, though items
appear mostly by page number and/or date of addition.
- [Jan-4-12] Page 902: more on Unicode internal storage models
I inserted a short footnote at the bottom of Page 902 in reprints to describe
the internal storage of Unicode characters in Python 3.X (per a July 2010
note below). Because this is changing in 3.3, and because it looks like there
is space on this page for elaboration, I want to change the footnote's current text:
"It may help to know that Python internally stores decoded
strings in UTF-16 (roughly, UCS-2) format, with 2 bytes per
character (a.k.a. Unicode "code pont"), unless compiled for
4 bytes/character. Encoded text is always translated to and
from this internal form, in which text processing occurs."
to read as follows (reprints: please ask me how to shorten if this is too large
to fit at the bottom of the page, as it's not worth changing page breaks; this
ideally should have been a sidebar, but it's too late for that much change):
"It may help to know that Python always stores decoded text
strings in a encoding-neutral, multi-byte format in memory.
All text processing occurs in this uniform internal format.
Text is translated to and from an encoding-specific format
only when it is transferred to or from byte strings, external
text files, or APIs with specific ecoding requirements.
Through Python 3.2, strings are stored internally in
UTF-16 (roughly, UCS-2) format with 2 bytes per character,
unless Python is configured to use 4 bytes/character.
Python 3.3 and later will instead use a variable-length
scheme with 1, 2, or 4 bytes per character, depending on
a string's content.
Either way, encoding pertains mostly to files and transfers;
once loaded into a Python string, text in memory has no
notion of encoding, and is simply a sequence of Unicode
characters (a.k.a. "code points") stored generically."
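Discussion only: for the curious, one rough way to observe 3.3's
variable-length scheme is the sys.getsizeof call, whose results reflect a
string's per-character storage width. This is just a sketch -- exact sizes
are implementation details which vary per Python version and build:
>>> import sys                                 # Python 3.3 and later
>>> asciisize = sys.getsizeof('a' * 100)       # ASCII content: 1 byte per character
>>> widesize = sys.getsizeof('\u20ac' * 100)   # non-Latin-1 content: 2 bytes per character
>>> widesize > asciisize                       # wider characters: larger string object
True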
- [Jul-29-11] Page 728: more on right-side operator overloading methods: __radd__ = __add__
At the empty space at the bottom of page 728, after the final code listing on
this page, add a new short paragraph that reads as follows with all
__radd__ and __add__ in literal font (it looks like there's ample room,
but please ask how to shorten if not):
"For truly commutative operations which do not require special-casing
by position, it is also sometimes sufficient to alias the right-side __radd__
to the left-side __add__, by simply assigning the former name to the latter
at the top level of the class statement. Right-side appearances will then
trigger the single, shared __add__ method, passing the right operand to self."
Discussion only follows:
On Pages 727-729 the book introduces right-side operator overloading
methods such as __radd__ in sufficient though somewhat cursory
fashion. As mentioned there, this was intentional, given that
most applications programmers and readers of this book will do
little operator overloading if any, and even fewer will need to
implement commutative expression operations for their objects.
As one example, __radd__ shows up nowhere in the follow-up book
Programming Python 4E, despite the fact that that book
constructs larger and fully functional programs including desktop
email clients, webmail sites, text editors, and image viewers, some
of which span thousands of lines of code. Right-side methods are
more important for implementing objects of truly numeric nature,
a task which is relatively rare in practice. Where required, the
full story is readily available in Python's manuals.
Still, Learning Python 4E omits a common coding pattern for
the right-side operator methods, which I thought was present in
earlier editions or other books, but is absent today. In short, for
operations that are truly commutative, it's somewhat common for a
class to simply alias a right-side appearance method such as __radd__
to the left-side appearance method such as __add__, by assigning the
former name to the latter at the class level. Abstractly:
class C:
def __add__(self, other):
....
__radd__ = __add__
Because self is actually on the right side of the operator when
__radd__ is invoked, the effect is to treat this case the same
as left-side appearances: __radd__ triggers __add__ with operand
order swapped. The following code illustrates, by tracing __add__
calls and arguments:
# radd.py
from __future__ import print_function         # for 2.7

class C:
    def __init__(self, value):
        self.data = value
    def __add__(self, other):
        print(self, '+', other, '=', end=' ')
        return self.data + other
    __radd__ = __add__
    def __str__(self):
        return '[%s]' % self.data
x = C(1)
y = C(2)
print(x + 3) # [1] + 3 => left: __add__
print(3 + y) # 3 + [2] => right: __radd__==__add__
print(x + y) # [1] + [2] => both: __add__, then __radd__==__add__
When run, this code's print calls trace how every call is
routed into the single __add__ method, with operands swapped
for right-side appearances:
...>C:\Python32\python radd.py
[1] + 3 = 4
[2] + 3 = 5
[1] + [2] = [2] + 1 = 3
...>C:\Python27\python radd.py
[1] + 3 = 4
[2] + 3 = 5
[1] + [2] = [2] + 1 = 3
There's no reason to define __radd__ separately as shown
in the book's brief call-tracing examples, unless right-side
appearances require special-case processing. For instance,
consider the book's second Commuter class example:
class Commuter:                                # Propagate class type in results
    def __init__(self, val):
        self.val = val
    def __add__(self, other):
        if isinstance(other, Commuter): other = other.val
        return Commuter(self.val + other)
    def __radd__(self, other):
        return Commuter(other + self.val)
    def __str__(self):
        return '<Commuter: %s>' % self.val
This class works the same if it simply assigns __radd__ to __add__,
though it must still do some type testing to avoid nesting Commuter
objects in expression results (comment-out the "if" to see why):
class Commuter:                                # Propagate class type in results
    def __init__(self, val):
        self.val = val
    def __add__(self, other):
        if isinstance(other, Commuter): other = other.val
        return Commuter(self.val + other)
    __radd__ = __add__
    def __str__(self):
        return '<Commuter: %s>' % self.val
Trace this to see why the equivalence works. The book's examples are
designed to trace calls or illustrate concepts, of course, but they
could use simpler patterns in real code.
Also notice that it's possible to achieve a similar effect by
adding in reverse -- the following works the same as the former -- but
name aliasing by simple assignment is more direct and does not incur
an extra call and operation:
class C:
    def __add__(self, other):
        ....
    def __radd__(self, other):                 # other + self (__radd__) => self + other (__add__)
        return self + other                    # but __radd__ = __add__ is more direct and quick
- [May-30-11] Page 778: The object superclass comes with some __X__ defaults
A minor clarification for new-style classes, and all classes
in 3.X: the built-in object class at the top of each
class tree in this model also comes with a small handful
of default __X__ operator-overloading methods. Run a
dir(object) to see what these are.
These defaults are described explicitly by the book's
diamond search order discussion (especially on Page 787);
are demonstrated by the book's Lister mix-in examples
(Pages 758-767); and are mentioned at various __str__ method
appearances (Pages 971 and 1031). Still, this might have
been called out more explicitly in the introductory bullet
lists too, and mentioned as a footnote in the operator
overloading chapter, though this would be a forward
reference there.
Because of this, I posted a minor insert for reprints at this
book's errata page at oreilly.com:
On page 778, 5th line from bottom, change:
"and all classes (and hence types) inherit from object."
by adding text at the end to read:
"and all classes (and hence types) inherit from object,
which comes with a small set of default operator overloading
methods."
This is described ahead on Page 787 and in the context of other
examples, but it seems important enough to mention in this
summary (and it looks like there is ample space on this page).
The default methods of object in new-style classes such as
__str__ can sometimes be problematic if not anticipated.
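For example, a class that doesn't define a __str__ of its own inherits
object's default, which produces the generic display format shown in examples
throughout the book (a quick sketch; the object address here is arbitrary):
>>> '__str__' in dir(object)                   # object comes with __str__ and other defaults
True
>>> class C: pass                              # new-style: all classes in 3.X
>>> print(C())                                 # inherits object's default display
<__main__.C object at 0x02553650>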
- [Sep-29-10] Page 27 and 534, note "magic" version number check for byte-code recompiles
On page 27, at the very end of the 2nd paragraph that begins "Python saves",
add this sentence: "Imports also check to see if the file must be recompiled
because it was created by a different Python version, using a "magic" number in
the byte-code file itself."
Also, on page 534, at the very end of the second last paragraph which begins
"Python checks", add another sentence: "As noted in Chapter 2, imports also
recreate byte code if its "magic" Python version number does not match."
Discussion only: Technically, in order to know if a recompile is required,
imports check both source/bytecode timestamps as well as the bytecode file's
internal version/implementation "magic" number. The book describes the
timestamp check because it's fundamental to all users, but does not go into
further detail about the extra magic number test because this is arguably
more low-level and detailed than most Python newcomers using a single Python
require. It becomes more important to know once you start installing new or
alternative Pythons, of course, though it would be difficult to imagine how
an incompatible Python could work at all without such a mechanism. See also
the note about the upcoming
byte-code storage model changes in Python 3.2.
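If you want to see this check at work yourself, the first 4 bytes of a
byte-code file are its magic number, and imp.get_magic gives the running
Python's value. The following is a minimal sketch, which assumes a compiled
spam.pyc file exists (in 3.2 and later, look in the __pycache__ subdirectory):
import imp

file_magic = open('spam.pyc', 'rb').read(4)   # first 4 bytes: version "magic" number
if file_magic != imp.get_magic():             # the running Python's magic number
    print('compiled by a different Python: recompile required')
else:
    print('byte code matches this Python version')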
- [Nov-3-10] Pages 164 (200, 233): note the use of 3.X print function form for 2.X readers
A reader wrote with confusion about why a 3.X print call in one of the early
examples did not run under his Python 2.X. To minimize confusion, expand the
text of the comment on the second last line of page 164 to make the usage
explicit; change the first of the following lines to the second:
>>> for c in myjob: print(c, end=' ') # Step through items
>>> for c in myjob: print(c, end=' ') # Step through items (3.X print call)
Similarly, extend two comments on Pages 200 and 233 to add the same text;
change these lines as follows (and make sure the indentation of both the code
and the "#" characters in all these lines is the same as it was originally):
... print(x, end=' ') # Iteration (3.X print call)
>>> for line in open('myfile'): # Use file iterators, not reads (3.X print call)
(Discussion only follows):
The example on page 164 uses the Python 3.X print function, instead of
the 2.X print statement. It doesn't say so explicitly, but the new
print function in 3.X is described in the Preface (see Table P-2
in particular), and the 3.X/2.X printing differences are covered
in depth later in the book (see page 298).
To run on 2.6, use the following -- the 2.X trailing comma syntax works
like the 3.X end=' ' keyword function argument to avoid a newline:
>>> myjob = "hacker"
>>> for c in myjob: print c, # versus 3.X print(c, end=' ')
This is an unfortunate byproduct of having to address 2 Python versions
in one book. Per its Preface, this book is primarily 3.X by default, with
coverage of 2.X divergences. In this case, the 3.X print form might have
been called out explicitly, and we'll expand the comment in reprints
as noted above to minimize confusion. In general, though, if the book
exhaustively noted every occurrence of an incompatibility for 2.X readers,
it may have been much larger than it already is. In fact, the end=' ' form
appears two more times before the book gets to print call/statement
details in Chapter 11, and there are many other instances of 3.X-only
usage, some of which are undoubtedly not explicitly noted as such.
When in doubt, refer to the tables of 3.X changes in the Preface
(ideally, you should at least scan the Preface up front), and check
the index for details on 3.Xisms that create unavoidable forward
dependencies like this in a dual-version book.
- [Oct-13-10] Page 792, third sentence: main point lost by edit made during production
In this sentence, change the clause:
"but they incur an extra method call for any accesses to names that require dynamic computation."
to read as worded in my original text:
"but they incur an extra method call only for accesses to names that require dynamic computation."
This clause describes how properties differ from tools like __getattr__, and the "only" in
my original wording is really the main point. As changed by editors, that main point (the
contrast that stems from their focus on a specific attribute instead of many) was lost.
While we're at it, please add page 792 to the Index entry for "property built-in function" --
this is a crucial first definition of them.
- [Jul-7-10] Assorted Unicode clarifications
Three related minor updates meant to clarify the scope of Python 3.X Unicode strings.
- Page 896, end of second last paragraph: Unicode -- clarify impacts (new sentence)
At the very end of the paragraph which begins "Even if you fall into",
add a new last sentence which reads: "Though applications are beyond our scope
here, especially if you work with the Internet, files, directories, network
interfaces, databases, pipes, and even GUIs, Unicode may no longer be an optional
topic for you in Python 3.X."
I'm adding this because the existing text seems a bit misleading, after
seeing firsthand how much Unicode permeates 3.X applications work. See
this note
for related discussion. Reprints: delete the first clause of this new sentence
if it won't fit as is; it looks like there is plenty of room.
(This and 6 other Unicode items on this page arose from a recent reread of the
Unicode chapter a year after writing it; it's fine as is, but a few key concepts
could be polished with simple inserts in the next printing.)
- Page 901, start of 1st paragraph on page: Unicode -- same policy for read, write (reword)
The start of this paragraph seems potentially misleading in retrospect--it's
not clear if writes work the same as reads. This is clarified later on (see page 920
and later), but it may be worth tightening up here.
Change:
"When a file is opened in text mode, reading its data automatically decodes
its content (per a platform default or a provided encoding name) and returns it
as a str; writing takes a str and automatically encodes it before transferring
it to the file."
to read as this (the parenthesized part has been pulled out):
"When a file is opened in text mode, reading its data automatically decodes
its content and returns it as a str; writing takes a str and automatically
encodes it before transferring it to the file. Both reads and writes translate
per a platform default or a provided encoding name."
- Page 936, last sentence of page: Unicode -- mention filename tools too (new text)
Change the last part of the text: "For more details on re, struct, pickle,
and XML tools in general, consult" to read: "For more details on re,
struct, pickle, and XML, as well as the impacts of Unicode on other library
tools such as filename expansion and directory walkers, consult".
The section here dealing with tools impacted by Unicode could also have
mentioned that os.listdir returns decoded Unicode str for str arguments,
and encoded raw binary bytes for bytes arguments, in order to handle
undecodable filenames. In short, pass in the directory name as a bytes
object to suppress Unicode decoding of filenames per the platform default,
or else an exception is raised if any filenames fail to decode. Passing
in a str invokes Unicode filename decoding on platforms where this matters.
By proxy, os.walk and glob.glob work the same way, because they use
os.listdir internally to generate filenames in directories. This was omitted
here because the section already encroaches on the language/applications line.
Instead, the impacts of Unicode on these and other tools are covered in depth
in the new 4th Edition of Programming Python, where application topics are
collected in general.
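To illustrate the model just described (a quick sketch only -- the filenames
here are hypothetical, and results naturally vary per directory content):
>>> import os
>>> os.listdir('.')[:2]                        # str argument: decoded str filenames
['eggs.txt', 'spam.py']
>>> os.listdir(b'.')[:2]                       # bytes argument: raw encoded bytes filenames
[b'eggs.txt', b'spam.py']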
- [Jul-2-10] Assorted Unicode clarifications
Four related minor updates meant to clarify the scope of Python 3.X Unicode strings.
- Page 898: Unicode -- mention UTF-16 and UTF-32 in intro (new text)
Near the end of the second last paragraph on this page, expand the start of the
second last line by adding the parenthesized text in the following, to read:
"sets in similar ways (e.g., UTF-16 and UTF-32 format strings with 2 and 4 bytes
per character, respectively), but all of these".
This is implied by later
UTF-16 examples, but UTF-16 is so common on Windows now that it merits a word here.
- Page 899: Unicode -- bytes is for encoded str too (new text)
At the second bullet item in the second bullet list on this page,
add the following text in parentheses at the end, so that the bullet item reads:
"* bytes for representing binary data (including encoded text)".
This is shown and implied in later examples, but this seems like a key link concept.
- Page 900: Unicode -- internal str format (new footnote)
I avoided internals discussion in this chapter on purpose, using terms such as
"character" instead, but in retrospect some readers might find a more tangible
model useful too. Add a footnote at the bottom of page 900, with its star at the
very end of the last paragraph before header "Text and Binary Files", which reads:
"It may help to know that Python internally stores decoded strings in UTF-16
(roughly, UCS-2) format, with 2 bytes per character (a.k.a. Unicode "code pont"),
unless compiled for 4 bytes/character. Encoded text is always translated to and
from this internal string form, in which text processing occurs.".
Reprints: if this doesn't fit at the bottom of this page as is, please ask me how
it could be shortened.
- Page 909: Unicode -- "conversion" means encoding differently (new sentence)
At the very end of the last paragraph on this page, add the following new sentence:
"Either way, note that "conversion" here really just means encoding a text string
to raw bytes per a different encoding scheme; decoded text has no encoding type, and
is simply a string of Unicode code points (a.k.a. characters) in memory.".
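This is simple to demonstrate at the interactive prompt -- "converting" just
encodes the same in-memory code points differently on the way out:
>>> S = 'spam'
>>> S.encode('ascii')                          # same text, two different encodings
b'spam'
>>> S.encode('utf-16')                         # byte order mark, 2 bytes per character
b'\xff\xfes\x00p\x00a\x00m\x00'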
- [Aug-1-10] Page 1139 and entire Index: Index additions list with comments
A reader posted a nice list of Index additions on
O'Reilly's errata site
for this book, and I replied there with a handful of clarifications and additions (I won't repeat
the details here). The indexes which O'Reilly creates have improved much over the years, and this
book is primarily tutorial rather than reference. Instead, Python Pocket Reference provides
a quick-reference supplement in a more concise format. Still, we should try to pick up as many of
these additions in a future reprint as space allows; this is exactly the sort of reader feedback
needed to make improvements in this department.
- [Sep-1-10] Page 976 and 954: Note that descriptor state cannot vary per client class instance
Add a new sentence at the very end of paragraph 3 on page 976, which reads
"The downside of this scheme is that state stored inside a descriptor itself is
class-level data which is effectively shared by all client class instances, and so
cannot vary between them.".
Also, after the first sentence of the last paragraph on
page 954, add a new sentence which reads "Unlike data stored in the descriptor itself,
this allows for data that can vary per client class instance.". It looks like there
is space for both inserts, but please ask me how to shorten if not.
(Discussion only follows):
There is an implication of descriptor state options which might have been called
out more explicitly than it was. Crucially, storing state in the descriptor instance instead
of the owner (client) class instance means that the state will be effectively shared by
all owner class instances. That is, because descriptors are class-level data, their content
cannot vary per instance of client classes. To see this at work, in the descriptor-based
CardHolder example on page 976-977, try printing attributes of the "bob" instance after creating
the second instance, "sue". The values of sue's managed attributes ("name", "age", and "acct")
effectively overwrite those of the earlier object bob, because both share the same,
single descriptor instance attached to their class:
class CardHolder: ...as is...
bob = CardHolder('1234-5678', 'Bob Smith', 40, '123 main st')
print(bob.name, bob.acct, bob.age, bob.addr)
sue = CardHolder('5678-12-34', 'Sue Jones', 35, '124 main st')
print(sue.name, sue.acct, sue.age, sue.addr) # addr differs: cardholder instance data
print(bob.name, bob.acct, bob.age, bob.addr) # name,acct,age same: descriptor data!
...> C:\Python31\python test.py
bob_smith 12345*** 40 123 main st
sue_jones 56781*** 35 124 main st
sue_jones 56781*** 35 123 main st
There are valid uses for descriptor state, of course (to manage descriptor
implementation, for example), and this code was implemented to illustrate the
technique. Moreover, the state scope implications of class versus instance attributes
should be more or less a given at this point in the book. However, in this particular
use case, attributes of CardHolder objects are probably better stored as per-instance
data instead of descriptor instance data, perhaps using the same __X naming convention
as the property-based equivalent to avoid name clashes in the instance.
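Here's a rough sketch of that alternative -- hypothetical code, not the
book's, using a simplified Name descriptor which stores its value on each
client class instance under a mangled name instead of on the shared descriptor:
class Name:
    def __get__(self, instance, owner):
        return instance._Name__name           # fetch from the client instance
    def __set__(self, instance, value):
        instance._Name__name = value           # store on the client instance

class CardHolder:
    name = Name()                              # one descriptor shared by all instances...
    def __init__(self, name):
        self.name = name                       # ...but its data varies per instance

bob = CardHolder('Bob Smith')
sue = CardHolder('Sue Jones')
print(bob.name, sue.name)                      # Bob Smith Sue Jones: no overwrite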
- [Sep-27-10] Chapter 23: The 3.X package-relative import model precludes using directories as both program and package
If we have space, on Page 569, 3rd paragraph from the end, extend the second sentence with
the following's parenthesized text to read:
"
..., they can keep saying just import utilities and expect to find their own files
(when they are run as top-level programs, at least; per the next section, when
used as a package in 3.X, their same directory inter-package imports may need to
be changed to use absolute directory paths or package-relative imports).
".
Also if we have space, add a paragraph to the end of the note on page 580,
which reads:
"
Python 3.X's package-relative import model today also complicates using
a directory of code as both program and library. To import
a file from the same directory, an inter-package importer generally must use
package-relative syntax when it is being used in package mode, but cannot use
this syntax when it is being used in non-package mode. Hence, you may need to
either isolate externally visible files in their own package subdirectory;
use fully specified package path imports instead; extend the import search
path; or special-case imports per usage mode via the __name__ variable
described in the next chapter. See the interactive prompt imports run earlier
for equivalent cases.
"
(Discussion only follows):
There is a bit of gotcha to the Python 3.0 package-relative
import model change for inter-package imports, which is implied
by the examples and narrative of this chapter, but isn't called
out or illustrated as explicitly as it might have been. I ran
into it first-hand when updating some examples for Programming Python
4th Edition. In short, because 3.X:
- Does not search a package's own directory when it's used
in package mode unless "from ." package-relative syntax is used, and
- Does not allow "from ." syntax to be used unless
the importer is being used as part of a package,
you can no longer directly create directories that serve as
both standalone programs and importable packages--because
import syntax can vary per usage mode, importers in such directories
may need to pick between package relative import syntax (and assume
use as package only) or normal import syntax (and assume non-package
usage only). The workarounds are as follows:
- Always use fully specified "dir.dir.mod" absolute package imports
  instead of "from ." package-relative imports,
- Specialize your import statements according to their
  usage context (package or program) by testing __name__,
- Add the package's directory to the sys.path module search path directly, or
- Move all files meant to be visible outside a directory
  into a nested subdirectory package so they are always used in package mode.
The latter may be the ultimate solution, but it implies
substantial program restructuring for existing code meant to
be used as both program and importable library.
This cropped up in multiple cases in the PP4E book, but as a
simple case, the PyEdit text editor is meant to be both run
standalone, but also to be imported as attachable component
classes. Since this system is nested in the PP4E package, it
is referenced with absolute package import syntax by clients
outside the package:
from PP4E.Gui.TextEditor import textEditor # component and pop up
In Python 2.X, PyEdit's own files imported files in its own
directory with simple imports, relying on 2.X's implied package
directory relative imports model:
import textConfig # startup font and colors
This worked in 2.X for both package and top-level program usage modes.
However, unless this module is also located elsewhere on the import
search path, this fails for package-mode in 3.X because the package
directory itself is no longer searched. Simply using package-relative
imports:
from . import textConfig
suffices when PyEdit is imported externally, but then fails when it is
run standalone, because "from ." is allowed only for code being used
as a package. To work around cases where the text config file
had to be imported from the package directory, I specialized the
imports per usage mode:
if __name__ == '__main__':
    from textConfig import (                   # my dir is on the path
        opensAskUser, opensEncoding,
        savesUseKnownEncoding, savesAskUser, savesEncoding)
else:
    from .textConfig import (                  # always from this package
        opensAskUser, opensEncoding,
        savesUseKnownEncoding, savesAskUser, savesEncoding)
Other cases instead run a top-level script one level
up from the package subdirectory to avoid the conflict.
Restructuring PyEdit as a top-level script plus a package
subdirectory may be arguably better, but seems like too much
of a change to existing code just to accommodate the new model.
Moreover, using full absolute paths from the PP4E root in
every import seems to be overkill in the cases I observed,
and is prone to requiring updates if directories are moved.
I'm not sure if such a dual program/library role was taken
into account in the 3.X inter-package import model change
(indeed, package-relative import semantics is being discussed
anew on the Python developers list as I write this note), but
it seems to be a primary casualty.
- [Nov-22-10] Pages 767, 786: more on new-style inheritance method resolution order (MRO)
Two inserts in the name of completeness.
First, on page 767, at the end
of the very last paragraph before the note box on this page, add the following
new sentence ("class.mro()" in both of the text inserts should be literal font):
"
For more ideas, see also Python manuals for the class.mro() new-style class object
method, which returns a list giving the class tree search order used by inheritance;
this could be used by a class lister to show attribute sources.
".
Second, at the very end of the last paragraph on page 786, add a new sentence
which reads:
"
To trace how new-style inheritance works by default, see also the class.mro()
method mentioned in the preceding chapter's class lister examples.
".
[Discussion only follows]
I resisted a formal description of new-style class method
resolution order (MRO -- the order in which inheritance searches
classes in a class tree), partly because most Python programmers
don't care and probably don't need to care (this really only
impacts diamonds, which are relatively rare in real-world code);
partly because it differs between 2.X and 3.X; and partly because
the details of the new-style MRO are a bit too arcane and academic
for this book. As a rule, this book avoids formal, rigid description,
and prefers to teach informally by example; see its treatment of
function argument matching for another example of this approach.
Having said that, some readers may still have an interest in
the formal theory behind new-style MRO. If this set includes
you, it's described in detail online at:
this web page.
Apart from such formalities, if you just want to see how Python's
new-style inheritance orders superclasses in general, new-style
classes (and hence all classes in 3.X) have a class.mro() method
which returns a list giving the linear search order. Here are some
illustrative examples:
>>> class C: pass
>>> class A(C): pass # diamonds: order differs for newstyle
>>> class B(C): pass # breadth-first across lower levels
>>> class D(A, B): pass
>>> D.mro()
[<class '__main__.D'>, <class '__main__.A'>, <class '__main__.B'>,
<class '__main__.C'>, <class 'object'>]
>>> class C: pass
>>> class A(C): pass # nondiamond: order same as classic
>>> class B: pass # depth-first, then left-to-right
>>> class D(A, B): pass
>>> D.mro()
[<class '__main__.D'>, <class '__main__.A'>, <class '__main__.C'>,
<class '__main__.B'>, <class 'object'>]
>>> class X: pass
>>> class Y: pass
>>> class A(X): pass # nondiamond: depth-first then left-to-right
>>> class B(Y): pass # though implied "object" always forms a diamond
>>> class D(A, B): pass
>>> D.mro()
[<class '__main__.D'>, <class '__main__.A'>, <class '__main__.X'>,
<class '__main__.B'>, <class '__main__.Y'>, <class 'object'>]
The mro method is only available on new-style classes (it's not present
in 2.X unless classes derive from "object"). It might be useful to
resolve confusion, and in tools that must imitate Python's inheritance
search order. For instance, tree climbers such as the book's class tree
lister (Chapter 30, pages 757-767) might benefit, though climbers might
also need to map this linear list to the structure of the tree being traced.
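As a rough sketch of that last idea (a hypothetical helper, not the book's
lister code), mro() makes it trivial to report the class from which an
attribute would be inherited:
>>> def attrsource(cls, name):
...     for klass in cls.mro():                # inheritance search order
...         if name in klass.__dict__:         # first definer is the source
...             return klass
...
>>> class A: attr = 1
>>> class B(A): pass
>>> attrsource(B, 'attr')
<class '__main__.A'>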
- [Jan-4-12] Python 3.2.0 breaks scripts using input() on Windows [LP4E]
[No fix required]
If a book example which uses the input() built-in seems to be failing,
and you are using Python 3.2.0 in a Windows console window, see
this post
on this book's Notes page. This built-in was apparently broken temporarily in the
initial 3.2.0 release in Windows console mode, but has been fixed in later Python releases.
The quickest fix is to upgrade to 3.2.1 or later, or try a different environment;
the book examples work fine in all other Pythons and most other contexts such as IDLE.
- [Oct-17-11] More on pickle module constraints: bound methods
Python's pickle object serialization module is mentioned a few times in this book: in Chapter 9
for flat files; in Chapter 27 to store an object database during a classes demo; in a Chapter 30
sidebar to describe storing a composite object; and in Chapter 36 in conjunction with string tool
changes in 3.X (see the index for page numbers). Though really an application tool in the realm of the
book Programming Python, which covers it in more depth, pickle has very broad utility, and is even
at the heart of some newer distributed computing libraries such as
Pyro -- a system which implements remote procedure
calls by pickling function arguments and return values across network sockets, providing a Python-focused
alternative to web service protocols such as XML-RPC and SOAP. Pickled data is also the transport medium
in the newer multiprocessing
module in Python itself -- a portable threading API implemented with processes.
Learning Python doesn't go into much detail about the rules for what can and cannot be pickled,
but only lists common types that can, and defers to Python's manuals and other books for more details.
As described in those other resources, in general most built-in types and class instances can be pickled,
but objects with system state such as open files cannot. Moreover, pickled functions and classes,
and by proxy the classes of pickled instances, must all be importable -- they must live at the top of
a module file on the import search path, because they are saved and loaded by name only. For such an
object, pickle data records just the names of the function or class and its enclosing module, not its
bytecode; unpickling reimports and fetches by name to recreate the original object. This applies to
classes of pickled instances too: pickle saves the instance's attributes, and relinks them to the
automatically imported class on loads.
One notable item that cannot be pickled, which is implied but not mentioned explicitly in most other
resources, is bound methods: callable method/instance pairs described explicitly on Pages 752-758.
Python could not recreate the bound method properly if pickled. Technically, these fail because they
do not conform to the importability rule for functions: class methods are not directly importable at the
top of a module. More subtly, Python cannot pickle function objects except by name, and cannot
assume that the function object referenced inside a bound method object originated from any particular
name's binding. For instance, the original method name may have been reassigned in a class or instance
between the time a bound method is created and pickled, and may thus reference an object different than
the bound method's function if fetched anew.
The net effect of all this is that you cannot serialize or store bound methods themselves, though you
might devise other similar schemes that make assumptions reasonable for a given program. For example,
a program may pickle an instance along with the desired method's name string, and fetch the method by
name with getattr() after unpickling to call immediately or create a new bound method. In some
cases it may also suffice to pickle a simple top-level function along with an instance to be passed into
it after unpickling. The pickle module doesn't directly support such schemes itself, however.
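Here's a rough sketch of the first of those schemes -- hypothetical code
which pickles an instance plus a method-name string, and rebinds with
getattr after loading (it assumes the importable test.py class shown in the
next listing):
import pickle
from test import C

x = C(99)
blob = pickle.dumps((x, 'spam'))               # instance plus method-name string

obj, name = pickle.loads(blob)
bound = getattr(obj, name)                     # recreate the bound method on load
bound()                                        # prints 99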
Here's an illustration of this limitation in code run with Python 3.1. The following creates, pickles,
and unpickles an instance of an importable class. In this test the class lives in an importable module
file, but the test works the same if this class is instead typed at the interactive shell where all this
code runs, because the shell's namespace is then equivalent to the top of a module file (when typed
interactively, the class is named __main__.C in object displays):
>>> print(open('test.py').read())
class C:
    def __init__(self, data):
        self.state = data
    def spam(self):
        print(self.state)
>>> from test import C
>>> X = C(99)
>>> X.spam()
99
>>>
>>> X
<test.C object at 0x02695310>
>>>
>>> import pickle
>>> pickle.dump(X, open('test.pkl', 'wb'))
>>> pickle.load(open('test.pkl', 'rb'))
<test.C object at 0x02695350>
>>>
>>> Y = pickle.load(open('test.pkl', 'rb'))
>>> Y.spam()
99
As described in the book, bound methods allow us to treat an instance's methods as though they were
simple callable functions -- especially useful in callback-based code such as GUIs to implement functions
with state to be used while processing an event (see Pages 729-730 and the sidebar on Page 758 for more
on this bound method role, as well as its __call__ alternative coding):
>>> X
<test.C object at 0x02695310>
>>> X.spam()
99
>>>
>>> X.spam
<bound method C.spam of <test.C object at 0x02695310>>
>>>
>>> T = X.spam
>>> T()
99
You won't be able to pickle bound (or unbound) methods directly, though, which precludes using them in
roles such as persistently saved or transferred callback handlers without extra steps on unpickles:
>>> pickle.dump(X.spam, open('test.pkl', 'wb'))
Traceback (most recent call last):
...more...
_pickle.PicklingError: Can't pickle <class 'method'>: attribute lookup builtins.method failed
>>> pickle.dump(C.spam, open('test1.pkl', 'wb'))
Traceback (most recent call last):
...more...
_pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtins.function failed
Of course, pickling things like bound method callback handlers may not work in some cases anyhow,
because the instance may contain state information that is valid in the pickling process only;
references to GUI objects in callback handlers, for example, are likely invalid in an unpickling
program. Unpickled state information might be less transient in other applications.
I'm not marking this as a book update because this book doesn't go into this level of detail
on pickling. See Programming Python and Python's Library Manual for
more on
pickle, as well as the related
shelve
module which adds access to objects by key.
As described elsewhere, there is additional pickler protocol for providing and restoring object
state which may prove useful in this case. For instance, an object's __getstate__ and
__setstate__ methods can be used for purposes such as reopening files on unpickling, and
might be used to recreate a bound method when loading a pickled instance of a suitable wrapper
class.
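For instance, here's a rough sketch of the file-reopening use case just
mentioned -- a hypothetical Logger class, not a bound-method solution per se,
but illustrative of the protocol:
import pickle

class Logger:
    def __init__(self, filename):
        self.filename = filename
        self.file = open(filename, 'a')        # an open file: not picklable
    def __getstate__(self):
        state = self.__dict__.copy()           # copy, so the live object keeps its file
        del state['file']                      # drop the unpicklable entry
        return state
    def __setstate__(self, state):
        self.__dict__.update(state)
        self.file = open(self.filename, 'a')   # reopen the file on unpickling

log = pickle.loads(pickle.dumps(Logger('app.log')))
print(log.file.name)                           # app.log: restored with a fresh file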
- [May-30-11] Page 711-718: Using yield within the __iter__ method (or not!)
(Note: I'm going to have more to say on this technique in the 5th Edition of
this book; it is as implicit as this describes, but also does have some advantages
in code size which are not described here as well as they might be.)
[No fix required]
I recently saw an iterator coding technique in Python standard
library code which is described only tersely and abstractly in
Python's own manuals, and implied but not covered explicitly in
the book itself. Given that understanding this technique at all
requires two big leaps of faith in the implicit and the magic,
I'm not sure I would recommend it in general. Still, a brief
look might help if you stumble onto it in code too.
On Pages 711-718, the book teaches user-defined iterator objects
by coding their classes to either return self, for a single-pass
iteration:
class C:
    def __iter__(self, ...):                   # called on iter()
        ...configure state
        return self
    def __next__(self):                        # called on next()
        ...use state                           # use .next() in 2.X
        ...return next or raise StopIteration
or return a different object, to support multiple active
iterations:
class C:
    def __iter__(self, ...):
        return Citer(state)

class Citer:
    def __init__(self, ...):
        ...configure state
    def __next__(self):
        ...use state
        ...return next or raise StopIteration
This part of the book also compares such classes to generator functions
and expressions, as well as simple list comprehensions, to show how the
classes better support state retention and minimize memory requirements.
Though not shown explicitly in the book, as implied directly by its coverage
of generator functions on Pages 492-505, it's also possible to achieve similar
effects by yielding values from the __iter__ method itself:
class C:
    def __iter__(self, ...):                   # __iter__ returns obj with __next__
        ...configure state                     # yield makes this a generator
        for loop...:                           # generators make objs with __next__
            yield next                         # return raises StopIteration
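To make this concrete, here is a runnable instance of the pattern -- a
Squares class similar in spirit to the book's iteration examples:
class Squares:
    def __init__(self, start, stop):
        self.start, self.stop = start, stop
    def __iter__(self):                        # yield makes this a generator function
        for value in range(self.start, self.stop + 1):
            yield value ** 2                   # StopIteration raised on return

for x in Squares(1, 5):                        # iter() returns the generator object
    print(x, end=' ')                          # prints: 1 4 9 16 25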
This technique works too, but seems like too deep magic to me.
To understand this at all, you need to know two very implicit
things:
- First, that __iter__ is invoked as a first step in iteration,
  and must return an object with a __next__ method (next in 2.X)
  to be called on each iteration. This is the iteration protocol
  in general, discussed in multiple places in the book; see the two
  iteration chapters especially.
- Second, that this coding scheme only works because calling a
  generator function (a def statement containing a yield statement)
  automatically creates and returns an iterable object which has
  an internally created __next__ method, which automatically raises
  StopIteration on returns. This is the definition of generator
  functions, discussed in detail on Pages 492-505.
In other words, this sort of __iter__ does return an object with a
__next__ to be run later too, but only because that's what generator
functions do automatically when they are first called. The combined
effect is therefore the same as explicitly returning an object with
an explicit __next__ method as in the book's examples, but there seems
to be a magic multiplier factor at work here which makes the yield-based
scheme substantially more obscure.
I would even suggest that this qualifies the __iter__/yield scheme
as non-Pythonic, at least by that term's original conception.
Among other things, it soundly violates Python's longstanding EIBTI
motto -- for Explicit is better than implicit, the second rule
listed by the "import this" statement of Python's underlying philosophies.
(Run this command yourself at an interactive Python prompt to see what
I mean; it's as formal a collection of goals and values as Python has.)
Of course, the Python world and time are the final judges on such
matters. Moreover, one could credibly argue that the very meaning
of the term Pythonic has been modified in recent years to incorporate
much more feature redundancy and implicit magic than it originally did.
Consider the growing prominence of scope closure state retention in
recent Python code, instead of traditional and explicit object attributes.
The __iter__/yield iterator coding scheme is ultimately based on the
former and more implicit of these, and reflects a growing shift in the
language from object-oriented towards functional programming patterns.
All of which is to me really just another instance of a general property
I've observed often in the last two decades: Despite their many advantages,
open source projects like Python sometimes seem to stand for no more
than what their current crop of developers finds interesting. Naturally,
whether you find that asset, liability, or both is up to you to decide.
As a rule, though, and as underscored often in the book, code like
this that requires the next programmer to experience "moments of
great clarity" is probably less than ideal from a typical
software lifecycle perspective. Academically interesting though
such examples may be, magic and engineering do not generally mix
very well in practice.
- [Feb-3-11] More concise coding option for transitive reloads example, page 596
[No fix required]
I was recently reviewing the transitive module reloading utility example on
page 596, and noticed that it may be a bit more verbose than needed (a year's
time has a way of affording fresh perspectives on such things). If I were
to recode this today, I'd probably go with the version that follows -- by
moving the loop to the top of the recursive function, it eliminates one of
the two loops altogether. Compare this with the original in the book; it
works the same, but is arguably simpler, and comes in at 4 lines shorter:
"""
reloadall.py: transitively reload nested modules
"""
import types
from imp import reload # from required in 3.0
def status(module):
print('reloading ' + module.__name__)
def transitive_reload(objects, visited):
for obj in objects:
if type(obj) == types.ModuleType and obj not in visited:
status(obj)
reload(obj) # Reload this, recur to attrs
visited[obj] = None
transitive_reload(obj.__dict__.values(), visited)
def reload_all(*args):
transitive_reload(args, {})
if __name__ == '__main__':
import reloadall # Test code: reload myself
reload_all(reloadall) # Should reload this, types
Also keep in mind that both this and the original version reload only
modules which were loaded with "import" statements; since names
copied with "from" statements do not cause a module to be nested
in the importer's namespace, their containing module is not
reloaded. Handling "from" importers may require either source
code analysis, or customization of the __import__ operation.
If the recursion used in this example is confusing, see
the discussion of recursive functions in the advanced
function topics of Chapter 19; here is a simple example
which demonstrates the technique:
>>> def countdown(N):
        if N == 0:
            print('stop')                      # 2.X: print 'stop'
        else:
            print(N, end=' ')                  # 2.X: print N,
            countdown(N-1)

>>> countdown(20)
20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 stop
For more on Python recursion, see also the recursive
stack limit tools in the sys module (Python has a fixed
depth limit on function calls, which you can increase for
pathologically deep recursive use cases):
>>> import sys
>>> help(sys.setrecursionlimit)
Help on built-in function setrecursionlimit in module sys:

setrecursionlimit(...)
    setrecursionlimit(n)

    Set the maximum depth of the Python interpreter stack to n. This
    limit prevents infinite recursion from causing an overflow of the C
    stack and crashing Python. The highest possible limit is platform-
    dependent.
>>> sys.getrecursionlimit()
1000
- [Feb-5-11] More on using the super() built-in function in Python 3.X and 2.X ... or not!
(Note: I'm going to have more to say on this call in the 5th Edition of
this book; it has a valid use case--cooperative method dispatch in multiple
inheritance trees--which is not given here, although this is still a rare
and obscure role, relies on the esoteric MRO ordering of classes, and
generally requires universal deployment in all classes of a tree to be used
reliably--something that seems highly unrealistic in the many millions of
lines of existing Python code.)
[No fix required]
This book very briefly mentions Python's super() built-in function on page 787,
but this call probably merits further elaboration given its increase in popularity.
Frankly, in my classes it seems to be most often requested by Java programmers
starting to use Python anew, because of its conceptual origins in that language
(for better or worse, many a new Python feature owes its existence to programmers
of other languages bringing their old habits to a new tool). It was given limited
coverage in this introductory-level book on purpose because it's arguably not
best Python practice today, but it might help some readers to explain the
rationale for that choice.
Traditional form: portable, general
In general, the book's examples prefer to call back to superclass methods when
needed by naming the superclass explicitly, because this technique is traditional
in Python; because it works the same in both Python 2.X and 3.X; and because it
sidesteps limitations and complexities related to this call in both 2.X and 3.X,
especially its weak support of multiple inheritance trees.
For reasons I'll outline here, super() is not broadly used today, and might even
be better avoided altogether, in favor of the more general and widely applicable
traditional call scheme. As shown in the book, to augment a superclass method,
the traditional superclass method call scheme works as follows:
[Python 2.7 and 3.1]
>>> class C:
...     def act(self):
...         print('spam')
...
>>> class D(C):
...     def act(self):
...         C.act(self)                        # 2.X and 3.X: name superclass explicitly, pass self
...         print('eggs')
...
>>> X = D()
>>> X.act()
spam
eggs
This form works the same in 2.X and 3.X, follows Python's normal method
call mapping model, applies to all inheritance tree forms, and does not
lead to confusing behavior when operator overloading is used. To see
why these distinctions matter, let's see how super() compares.
Use in Python 3.X: a magic proxy
One of the two goals of the super() built-in (see Python's manuals for the other)
is to allow superclasses to be named generically in single-inheritance trees instead,
in order to promote simpler code maintenance, and to avoid having to type long superclass
reference paths at calls. In Python 3.X, this call seems at least on first glance to
achieve this purpose well:
[Python 3.1]
>>> class C:
...     def act(self):
...         print('spam')
...
>>> class D(C):
...     def act(self):
...         super().act()                      # 3.X: reference superclass generically, omit self
...         print('eggs')
...
>>> X = D()
>>> X.act()
spam
eggs
>>> super # a "magic" proxy object that routes later calls
<class 'super'>
This works, but perhaps the biggest potential downside with this call in 3.X
is its reliance on deep magic: it operates by inspecting the call stack in
order to automatically locate the self argument and find the superclass, and pairs
the two in a proxy object which routes the later call to the superclass version of
the method. If that sounds complicated and strange, it's because it is.
Really, this call's semantics resembles nothing else in Python -- it's neither
bound nor unbound itself, and somehow finds a self even though you omit one
in the call. In single inheritance trees, a superclass is available from self
via the path self.__class__.__bases__[0], but the heavily implicit
nature of this call makes this difficult to see, and even flies in the face
of Python's explicit self policy that holds true everywhere else. That
is, this call violates a fundamental Python idiom for a single use case.
It also flies in the face of Python's general EIBTI rule at large (see
earlier on this page for more on this rule).
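To make that point concrete, in a single-inheritance tree the class which
super() locates is also explicitly available from the instance itself (a
quick sketch):
>>> class C:
...     def act(self): print('spam')
...
>>> class D(C): pass
...
>>> X = D()
>>> X.__class__.__bases__[0]                   # the single superclass
<class '__main__.C'>
>>> X.__class__.__bases__[0].act(X)            # what super().act() amounts to here
spam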
Limitation in Python 3.X: multiple inheritance
Besides its unusual semantics, even in 3.X this super() role really applies only
to single inheritance trees, not to multiple inheritance. This is a major
limitation of scope; due to the utility of mix-in classes in Python, multiple inheritance
is probably more the norm than the exception in realistic code. If your classes use
more than one superclass, or may in the future, super() is essentially unusable -- it
does not raise an exception for multiple inheritance trees, but will pick just the
leftmost superclass, which may or may not be the one you want, and may silently
mask the fact that you should really select superclasses explicitly in this case:
[Python 3.1]
>>> class A:
...     def act(self): print('A')
...
>>> class B:
...     def act(self): print('B')
...
>>> class C(A):
...     def act(self):
...         super().act()                      # super applies to single-inheritance only
...
>>> X = C()
>>> X.act()
A
>>> class C(A, B):
...     def act(self):
...         super().act()                      # doesn't fail on multi, but picks just one!
...
>>> X = C()
>>> X.act()
A
>>> class C(B, A):
...     def act(self):
...         super().act()                      # if B is listed first, A.act() is no longer run!
...
>>> X = C()
>>> X.act()
B
>>> class C(A, B):                             # traditional form
...     def act(self):                         # you probably need to be more explicit here
...         A.act(self)                        # this form handles both single and multiple inher
...         B.act(self)                        # and works the same in both Python 3.X and 2.X
...                                            # so why use the super() special case at all?
>>> X = C()
>>> X.act()
A
B
Here's a real world example of a case where super() does not apply, taken from the PyMailGUI
case study in Programming Python 4th Edition --
the following very typical Python classes use multiple inheritance to mix in both application
logic and window tools, and hence must invoke both superclass constructors
explicitly with direct calls by name, because super() does not apply:
class PyMailServerWindow(PyMailServer, windows.MainWindow):
    "a Tk, with extra protocol and mixed-in methods"
    def __init__(self):
        windows.MainWindow.__init__(self, appname, srvrname)
        PyMailServer.__init__(self)

class PyMailFileWindow(PyMailFile, windows.PopupWindow):
    "a Toplevel, with extra protocol and mixed-in methods"
    def __init__(self, filename):
        windows.PopupWindow.__init__(self, appname, filename)
        PyMailFile.__init__(self, filename)
The crucial point here is that using super() for just the single inheritance cases where it
applies means that programmers must remember two ways to accomplish the same goal,
when just one, direct calls, would suffice for all cases. Which begs the question of super()
advocates: Wasn't such feature creep one of the main things that Python originally sought
to avoid?
Even more fundamentally, it's also not clear that the trivial amount of code maintenance
that super() is envisioned to avoid fully justifies its presence. In Python
practice, superclass names in headers are rarely changed; when they are, there
are usually at most a very small number of superclass calls to update within the
class. And consider this: if you do use super() in a single-inheritance
tree, and then add a second superclass in the future to leverage multiple
inheritance (as in the example above), you may very well have to change all
the super() calls in your class to use the traditional explicit call scheme
instead -- a maintenance task which seems just as likely and tedious as the one
that super() is supposed to address!
Limitation in Python 3.X: operator overloading
As mentioned briefly in Python's library manual, super() also doesn't quite
work in the presence of __X__ operator overloading methods. If you study the
following code, you'll see that direct named calls to overload methods in the superclass
operate normally, but using the super() result in an expression fails to dispatch to the
superclass's overload method:
[Python 3.1]
>>> class C:
...     def __getitem__(self, ix):             # index overload method
...         print('C index')
...
>>> class D(C):
...     def __getitem__(self, ix):             # redefine to extend here
...         print('D index')
...         C.__getitem__(self, ix)            # traditional call form works
...         super().__getitem__(ix)            # direct name calls work too
...         super()[ix]                        # but operators do not! (__getattribute__)
...
>>> X = C()
>>> X[99]
C index
>>>
>>> X = D()
>>> X[99]
D index
C index
C index
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in __getitem__
TypeError: 'super' object is not subscriptable
This behavior is apparently due to the same new-style (and 3.X) class change described
at numerous places in the book (see the sidebar on Page 662 for the first) -- because
the proxy object returned by super() uses __getattribute__ to catch and dispatch later
method calls, it fails to intercept the automatic __X__ method invocations run by
expression operators, as these begin their search in the class instead of the instance.
This may seem less severe than the multiple-inheritance limitation, but operators should
generally work the same as the equivalent method call, especially for a built-in like
this, and not supporting this adds another exception for super() users to confront.
Your Java mileage may have varied, but in Python, self is explicit, multiple
inheritance and operator overloading is common, and superclass name updates
are rare. Frankly, the super() call seems intended more to placate Java
programmers than to address real Python problems. Because it adds an odd
special case to the language -- one with strange semantics, limited scope,
and questionable reward -- most Python programmers may be better served by
the more broadly applicable traditional call scheme.
Use in Python 2.X: verbose calls
Just as bad for current 2.X users, as well as for this dual-version book, the super()
technique is not portable between Python lines. To make this call work in Python
2.X, you must first use new-style classes. Worse, you must also explicitly pass
in the immediate class name and self to super(), making this call so complex and
verbose that in most cases it's probably easier to avoid it completely, and simply
name the superclass explicitly per the traditional code pattern above (and for
brevity, I'll leave it to readers to consider what changing a class's own name
means for code maintenance when using the 2.X super() form!):
[Python 2.7]
>>> class C(object):                           # for new-style classes only
...     def act(self):
...         print('spam')
...
>>> class D(C):
...     def act(self):
...         super(D, self).act()               # 2.X: call format seems too complex
...         print('eggs')                      # "D" may be just as much to type as "C"!
...
>>> X = D()
>>> X.act()
spam
eggs
>>> class D(C):
...     def act(self):
...         super().act()                      # simpler 3.X call format fails in 2.X
...         print('eggs')
...
>>> X = D()
>>> X.act()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in act
TypeError: super() takes at least 1 argument (0 given)
>>> class D(C):
...     def act(self):
...         C.act(self)                        # but traditional pattern works portably
...         print('eggs')                      # and may be simpler in 2.X code
...
>>> X = D()
>>> X.act()
spam
eggs
Summary
Like all new Python language features, you should be the judge on
this one too, of course, but because this call:
- Differs between 2.X and 3.X
- In 3.X, relies on arguably non-Pythonic magic, and does not
fully apply to multiple-inheritance and operator overloading
- In 2.X, seems so verbose in this intended role that it
may make code more complex instead of less
- Claims code maintenance benefits which may be more
hypothetical than real in Python practice
even ex-Java programmers should also consider the book's preferred
traditional technique of explicitly naming superclasses in calls
to be at least as valid a solution as Python's super() -- a call
which seems an unusual and limited answer to a question which was
not being asked for most of Python's history.
Having said that, I recently found a use for this call
in code that would only run on 3.X, and which used a very long superclass
reference path (through a module package -- see the parser class in
this code). As usual,
time will tell if such limited contexts lead to broader adoption for this call.
Update Aug-27-11:
For other opinions on Python's super() which go into further details both
good and bad, see also:
Python's Super Considered Harmful, as well as
Python's super() considered super!.
You can find additional positions near or between these two with a simple web search.
- [Feb-5-11] Page 86, paragraph 2: punctuation inside quotes in non-code text
[No fix required]
A reader wrote to suggest that the "Hello," in the first line of this paragraph
be changed to "Hello",, with the comma moved after the closing quote to match
the pattern's substring. This isn't an erratum, though it is an interesting point.
I agree with the poster in principle, but this text has to follow writing style
conventions. The text in question is not code, it simply quotes a word in the
narrative (if this had been code, it would be in literal font). In non-code text
like this, a comma, or other punctuation which would normally follow quoted text,
is by standard moved inside the quotes, just before the closing quote. The same
thing happens to "world." later in this paragraph. This doesn't exactly match
the pattern, of course, but English isn't Python.
- [Feb-10-11] Chapter 38, decorators: annotations, aspects, and (not) macros
[No fix required]
Python's function and class decorators are covered in depth in the book, especially
in Chapters 31 and 38. In a prior clarification which I posted on the
first printing's page,
I noted that Python's function decorators are similar to what is sometimes called
aspect-oriented programming in some other languages -- code inserted to run
automatically before or after a function call runs. Python's decorators also very
closely resemble Java's annotations, even in their "@" syntax, though Python's
model is usually considered more flexible and general.
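To make the runtime nature of this concrete, here's a minimal sketch of a
"before and after" tracing decorator (hypothetical names): the decorator is
ordinary code run at function definition time, which rebinds the decorated
name to a proxy object:
>>> def tracer(func):                      # receives the decorated callable
...     def proxy(*args, **kwargs):        # wraps each later call
...         print('calling', func.__name__)
...         return func(*args, **kwargs)
...     return proxy
...
>>> @tracer                                # rebinds spam to the proxy
... def spam(x):
...     return x * 2
...
>>> spam(3)                                # runs proxy: trace, then original
calling spam
6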
Recently, though, I've also heard some people comparing decorators to macros, but I don't
think this is entirely apt, and it might even be misleading. Macros (e.g., C's #define
preprocessor directive) are typically associated with textual replacement and expansion,
and are designed for generating code. By contrast, Python's decorators are a runtime
operation, based upon name rebinding, callable objects, and often, proxies. While the
two may have use cases that sometimes overlap, decorators and macros are fundamentally
different in scope, implementation, and coding patterns. Comparing the two seems akin
to comparing Python's import operation with a C #include, which similarly confuses a
runtime object-based operation with text insertion.
Of course, the term "macro" has been a bit diluted over time (to some, it now can
also refer to any canned series of steps or procedure), and some might find the macro
analogy useful anyhow. But they should probably also keep in mind that decorators
are about callable objects managing callable objects, not text expansion. Python tends
to be best understood and used in terms of Python idioms.
- [Oct-27-10] Notes on using example code cut-and-paste from PDF or HTML
[No fix required]
A reader wrote with questions on using book example code obtained from
HTML (online) and PDF (ebook) forms of the book. Indentation matters in
Python code, and some formatting protocols support this better than others.
In short, indentation in example code displays correctly when viewed
in both formats, but copying the code may require special handling:
line-break formatting may be lost when copying from HTML to Windows-only
text editors; indentation is lost altogether when copying from the PDF;
and the text files in the example distribution package avoid such
issues altogether. In more detail:
- When code is cut from an HTML display, indentation whitespace is usually
retained, but lines may follow the Unix end-line convention. Hence, pasting it
into NotePad on Windows loses its original formatting (it becomes one long line).
To avoid this, try pasting code into something like WordPad or Python's own IDLE
text edit windows; both retain the original indentation and line breaks correctly.
Note that some combinations of browsers, editors, and platforms might still
mangle formatting in cut HTML--jump to the third bullet below if this applies
to you.
- When code is cut from the PDF format, its indentation whitespace is
always lost in every text editor I've pasted it into (every line starts in
column 1). I raised this problem with O'Reilly in the past, and their
position is that this is an unfortunate byproduct of PDF formatting. The
best you can do here is to restore indentation manually, or read the next
bullet.
- To sidestep the issue completely and retain indentation in all cases,
you can always obtain the code examples from the text files in the book
examples distribution package, available at
this location.
Since these are simple text, they're generally immune to the HTML and PDF
formatting issues of the prior two bullets.
- [Jul-7-10] Page 139: more on implementation of the bool type
[No fix required]
A reader wrote to ask how the bool type is actually implemented in
Python. I mentioned in Chapters 5 and 31 that bool is really just a
subclass of int with two predefined instances, True and False. This is
true, but the implementation is actually a bit more subtle. For example,
the following almost works, but not quite--the initialization-time value
passed is consumed by the int type's __new__ method to set integer state,
and is used in later math regardless of the self.val of this class:
>>> class myBool(int):
...     def __init__(self, value):
...         self.val = 1 if value else 0      # val goes to int.__new__ first!
...     def __repr__(self):
...         return 'True' if self.val else 'False'
...     __str__ = __repr__
...
>>> myTrue = myBool(1)
>>> myFalse = myBool(0)
>>> myTrue
True
>>> myFalse
False
>>> myTrue + 8 # really uses int's state, not self.val
9
>>> myFalse - 3 # really uses int's state, not self.val
-3
>>> myOther = myBool(9) # but doesn't use self.val==1 here!
>>> myOther
True
>>> myOther + 3 # really an int(9) with a __repr__
12
>>> myOther.val
1
To see how bool really works, you need to study its C source code in Python's
boolobject.c file. One possible emulation in Python code is the following --
define an int subclass whose __new__ operation always returns True or False,
which are really just int objects but have a __class__ pointer referring
to bool in order to obtain its __repr__ behavior:
(Footnote: as mentioned on Page 707 and in Chapter 39, __new__ is a
rarely used overloading method called to create an instance, before __init__
is run to initialize the new instance which __new__ returns. Most classes define
just __init__ and allow __new__ to default to built-in code which creates and
returns a new instance, but __new__ has some advanced roles in metaclasses, and
can be used for some coding patterns such as singletons: classes that make at
most one instance, and return it on later construction calls.)
>>> class myBool(int):
...     def __new__(self, value):
...         return myTrue if value else myFalse
...     def __repr__(self):
...         return 'True' if self else 'False'
...     __str__ = __repr__
...     # plus __and__, __or__, __xor__ redefines here to retain type
...
>>> myTrue = int(1)
>>> myTrue.__class__ = myBool
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __class__ assignment: only for heap types
As you can see, this doesn't work in pure Python code in
3.X, though; the C implementation gets away with this in
lower-level terms. A pure Python solution might look like
the following, but requires overriding the obscure __new__
method, and rebinding the class's name to a factory function
to ensure that at most two instances are ever created:
>>> class myBool(int):
...     def __new__(self, value):
...         return int.__new__(self, 1 if value else 0)
...     def __repr__(self):
...         return 'True' if int(self) == 1 else 'False'
...
>>> myTrue = myBool(1)
>>> myFalse = myBool(0)
>>>
>>> myTrue, myFalse
(True, False)
>>> myTrue + 3, myFalse + 3
(4, 3)
>>>
>>> def myBool(value):                       # factory
...     return myTrue if value else myFalse  # at most these two
...
>>> myBool(1), myBool(0)
(True, False)
>>> myBool('spam'), myBool('')
(True, False)
>>>
>>> myBool('spam') == myTrue, myBool('spam') is myTrue
(True, True)
Of course, this still isn't the same because Python uses its own True
and False internally in its C-language code for operations like
the last line here. Experimenting further (e.g., see the builtins
module) is left as a suggested exercise.
- [Jul-7-10] Page 238 and exceptions part: more on files and "with"
[No fix required]
The book mentions that the "with" context manager statement can save 3 lines
of code compared to the more generally applicable "try/finally" when you need to
guarantee file closures in the face of possible exceptions. It's also true that
"with" can even save 1 line of code when no exceptions are expected at all
(albeit at the expense of further nesting and indenting file processing logic):
myfile = open(filename, 'w')           # traditional form
...process myfile...
myfile.close()

with open(filename, 'w') as myfile:    # context manager form
    ...process myfile...
If you really need to close your file, though, you should generally allow
for an exception for unexpected system conditions with the longer
try/finally alternative to the first of these as shown in the book:
myfile = open(r'C:\misc\data.txt')
try:
    ...process myfile...
finally:
    myfile.close()
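For instance, here's a minimal concrete rendering of both forms (a sketch only,
assuming a writable data.txt in the current directory; the processing step is
just a single write here):
with open('data.txt', 'w') as myfile:      # close is automatic, even on errors
    myfile.write('spam\n')

myfile = open('data.txt', 'w')             # same guarantee, one line longer
try:
    myfile.write('spam\n')
finally:
    myfile.close()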
- [Sep-1-10] Page 333: code is deliberately abstract and partial
[No fix required]
A reader wrote:
> I looked on your website for corrections to the book Learning Python 4th
> edition for page 333 but did not find any. I am working through your
> book on my own and found the program example on page 333 unclear and
> broken, i.e. the "match(x[0])" is undefined. Can you explain this
> example a bit more and give me an example definition for "x" and "match"
> that will make this sample code run?
Thanks for your note. You have a valid point: this code snippet
is intended to be abstract and partial, but it's not explicitly
described as such. In fact, the prior page's primes code is
similarly abstract, though this fact is better noted there.
The abstract code snippet in the book strips off successive items
at the front of "x" and passes each into this function in turn.
To make it work as real, live code, "x" would have to be a sequence
such as a list or string, and "match()" would have to be a function
which checks for a match against an object passed in. As a
completely artificial example:
x = list(range(100))

def match(I):
    return I > 50
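Pasted after these definitions, a while/else loop of the sort the book's
snippet implies (a rough reconstruction, not the book's exact code) then
runs live and prints 'Ni':
while x:                           # strip successive items off the front
    if match(x[0]):
        print('Ni')                # found: break skips the else
        break
    x = x[1:]
else:
    print('Not found')             # x ran out without a match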
A better example might make x a list of dictionary "records",
and match() a test against a dictionary key's "field" value
(e.g., looking in a database list for a record with a name
value matching one selected by match()).
I couldn't show a function like match() at this point in the
book, though, without yet another forward dependency (functions
are not covered until the next part). The goal here was to
illustrate the loop else by itself. I also chose not to elaborate
here because in practice a "for" loop is probably better than a
"while" for this code, and iteration tools such as filter() and
comprehensions might be better than both:
x = [....]

for item in x:
    if match(item):
        print('Ni')
        break
else:
    print('Not found')

print('Ni' if [item for item in x if match(item)] else 'Not found')
print('Ni' if list(filter(match, x)) else 'Not found')
print('Ni' if any(item for item in x if match(item)) else 'Not found')
print('Ni' if any(filter(match, x)) else 'Not found')
Try running these on your own to see what I mean. Despite their
conciseness, the downside of some of these latter forms is that
they may wind up calling match() more times than required (for items
after a match)--possibly costly if match() is expensive to run.
- [Jul-7-10] Page 355/357 footnote/text: popen iteration, call iter() first
[No fix required]
These sections describe the way that popen iterators fail in 3.X for certain
use cases. The discussion is correct, but not complete. Technically, popen
objects support I.__next__() but not next(I) directly, unless I = iter(I)
is called first. Automatic iterations work because they do call iter() first,
not simply because they run I.__next__() instead of next(I).
In effect, the initial iter() call triggers the wrapper's own __iter__, which
returns the wrapped object that actually has a __next__ itself. Without the
initial iter(), clients instead rely on the wrapper's __getattr__ to intercept
the fetch run by next() and delegate to the wrapped object, which no longer works in 3.X.
Regardless, this is still a change, and an arguable regression, from Python 2.6.
This is a subtle issue which is described in more detail in Programming Python 4th
Edition, and can be studied in Python's os.py file, but as a quick summary:
>>> import os
>>> for line in os.popen('dir /B *.py'): print(line, end='')
...
helloshell.py
more.py
>>> I = os.popen('dir /B *.py')
>>> I
<os._wrap_close object at 0x0148C750>
>>> I = os.popen('dir /B *.py')
>>> I.__next__()
'helloshell.py\n'
>>> next(I)
TypeError: _wrap_close object is not an iterator
>>> I = os.popen('dir /B *.py')
>>> I = iter(I)
>>> I.__next__()
'helloshell.py\n'
>>> next(I)
'more.py\n'
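To see why, here's a minimal emulation of the delegation structure involved
(a hypothetical class, not os.py's actual code). In 3.X the built-in next()
looks for __next__ on the wrapper's type and so bypasses __getattr__, but an
explicit __next__ fetch and an initial iter() both still reach the wrapped object:
>>> class Wrapper:
...     def __init__(self, wrapped):
...         self.wrapped = wrapped
...     def __iter__(self):
...         return self.wrapped                   # iter() unwraps
...     def __getattr__(self, name):
...         return getattr(self.wrapped, name)    # explicit fetches only
...
>>> W = Wrapper(iter(['a\n', 'b\n']))
>>> W.__next__()                                  # __getattr__ delegates
'a\n'
>>> next(iter(W))                                 # works after iter()
'b\n'
>>> next(W)                                       # bypasses __getattr__ in 3.X
TypeError: 'Wrapper' object is not an iterator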
- [Jul-7-10] Page 475: enclosing scopes lambda, common confusion
[No fix required]
A reader wrote the following about an example which has been asked about enough
times to warrant posting the interchange here:
> The definition of knights() is shown as
>
> def knights():
>
> However, I think that it should be
>
> def knights(x):
>
> Because 2 lines below refers to x in
>
> action = (lambda x: title + ' ' + x)
>
> I am not sure how the value of x is defined without being passed in.
No, this isn't an error (try running this example's code yourself--it works
exactly as shown in the book). This example does confuse, though; I believe
I answered the same question for the 3rd Edition.
The critical point here is that the lambda makes a new function which is
returned (and not called) by knights. The function created by the lambda
has the required "x" argument; when the knights return value is later called
(by the code "act('robin')"), knights is not called again--instead, the argument
is passed to the "x" in the lambda function. The name "title" is fetched from
the lambda function's enclosing scope, but "x" is the lambda function's own
argument.
If that's difficult to grasp, remember that lambdas can always be replaced by
the name of a function previously defined with a def; here's the original and
a def-based equivalent:
def knights():
    title = 'Sir'
    action = (lambda x: title + ' ' + x)
    return action

act = knights()
print(act('robin'))                # both print 'Sir robin'

def knights():
    title = 'Sir'
    def action(x): return title + ' ' + x
    return action

act = knights()
print(act('robin'))
- [Jul-7-10] Page 594: exec() in functions requires eval() or ns['x']
[No fix required]
The bottom part of this page describes how to import a module given
its name as a string. It uses exec() to import, and then uses the
module's name as a simple variable; this works because the code is
typed at the interactive prompt, and the module's name thus becomes
a global variable on the fly.
Note, however, that if you use exec() to import a module by name string
within a function, you must also use eval() to reference the
imported module, since its name is not recognized as an assigned
local when Python creates the function. Passing an explicit namespace
dictionary to exec() and later indexing it can have the same effect:
>>> def f():
...     exec("import string")
...     print(string)
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in f
NameError: global name 'string' is not defined
>>> def f():
...     exec("import string")
...     print(eval("string"))
...
>>> f()
<module 'string' from 'c:\python31\lib\string.py'>
>>> def f():
...     ns = {}
...     exec("import string", ns)
...     print(ns["string"])
...
>>> f()
<module 'string' from 'C:\Python31\lib\string.py'>
- [Sep-27-10] Page 638, middle of page: clarification on object attribute path semantics
[No fix required]
A reader wrote with a question about the externally defined method on this page:
> I have a question/request for clarification for self.name.upper() in the context
> of the text below:
>
> "Even methods, normally created by a def nested in a class, can be created completely
> independently of any class object. The following, for example, defines a simple
> function outside of any class that takes one argument:
>
> >>> def upperName(self):
> ... return self.name.upper() # Still needs a self
>
> There is nothing about a class here yet it's a simple function, and it can be called
> as such at this point, provided we pass in an object with a name attribute (the name
> self does not make this special in any way)"
>
> My question: I am lost about self.name.upper(). Why is this self.name.upper() instead
> of simply self.upper()?
>
> From the context, 'name' is an attribute of object x and also an attribute of class rec.
> How can this 'name' attribute have an attribute (the upper() function) of its own? Is
> it a "nested attribute"? Is there even such a thing in Python?
Well, the code is correct as shown, but the "self" in it might be a bit confusing (it's
just a simple variable name here). I'd call this nested objects, not nested attributes.
To understand it fully, you must evaluate it the way Python does--from left to right,
and one expression/operation at a time. Given "self.name.upper()", and adding parentheses
to emphasize the order of operations:
- (self.name) fetches the value of a "name" attribute from whatever object variable "self" happens to reference.
- ((self.name).upper) then fetches the value of an "upper" attribute from whatever object was returned by step 1.
- ((self.name).upper)() finally calls the function object that "upper" is assumed to reference, with no arguments.
The net effect is that "self" references an object, whose "name" attribute references a
(string) object, whose "upper" attribute references a (callable) object. It's object
nesting; in general, that's what class instance state information always is--nested
objects assigned to instance attributes.
And that's why it works to pass "x" to this function directly: "x" is a class instance
object, whose "name" attribute references a string object with an "upper"; "x" has no
"upper" attribute itself. The "self" function argument is just a reference to the same
object referenced by variable "x", whether the function is attached to a class or not.
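To watch this in action, here's a minimal sketch (an illustrative stand-in for
the book's rec class, not its exact code):
>>> class rec: pass
...
>>> x = rec()
>>> x.name = 'Bob'                 # x.name references a string object
>>> def upperName(self):
...     return self.name.upper()   # (self.name), then (.upper), then the call
...
>>> upperName(x)                   # works: x has a name, and name has an upper
'BOB'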
- [Sep-27-10] Page 974 1st paragraph: more on property use case example
[No fix required]
A reader wrote with two questions about one of the property examples in the
advanced managed attributes coverage of Chapter 37:
> re: "To understand this code, it's crucial to notice that the attribute
> assignments inside the __init__ constructor method trigger property
> setter methods too."
>
> (Using python 2.6.5, linux) Stepping with pydev debugger through
> Attribute_validation_w_properties it appears instance attribute
> assignments are only intercepted by properties for re-assignments,
> eg. bob.name = 'Bob Q. Smith' but not during instantiation since
> self._name remains 'Bob Smith' not 'bob_smith' as setter implies.
> Correct ???
>
> Also: inside __init__ "name mangling" missing out and perhaps just
> one leading underscore ?
>
> self._acct
> self._name
> self._age
> self.addr
No, the example does work as shown and described, and the "__name" attribute format
is intended. However, this is arguably one of the most subtle examples in the book,
so I'll try to clarify a bit here.
To see that the setter is indeed called for assignments in __init__ at instance creation
time, try adding a print() in the setter methods, and either run the self-test code or
import the class and create an instance interactively:
class CardHolder...
    def setName(self, value):
        print('in setName')
>>> CardHolder('11111111', '25', 3, '44')
in setName
<test.CardHolder object at 0x01410830>
The setter is called from __init__ when the instance is first created and the attribute
is assigned, under both Python 3.X and 2.X. Also make sure that you derive the class
from "object" under 2.X to make it a new-style class. As explained earlier in this chapter
(and in Chapter 31), property setters don't quite work under 2.X without including "object"
in the superclass list; once an attribute name is mistakenly assigned directly on an instance,
it hides the property getter in the class too (perhaps this was the entire issue here?):
class CardHolder(object): # required in 2.X
With this change, results under 2.6 and 3.1 are identical. You'll also need to use 2.X-style
print statements or a from __future__ import for 3.X-style print calls, of course; see earlier
in the book for print() in 2.X:
from __future__ import print_function
The other oddness in this example (which is covered earlier in the book but perhaps not
explained as explicitly for this example itself as it could have been) is that names
beginning with 2 underscores like "__name" are pseudo-private attributes: Python expands
them to include the enclosing class's name, in order to localize them to the creating class.
They are used intentionally here to avoid clashing with the real attribute names such as
"name" that are part of the class's external client API. Python mangles each in-class
appearance of the attribute like this:
__name ...becomes... _CardHolder__name
The single underscore naming pattern "_name" used elsewhere in this chapter is a weaker
convention that informally attempts to avoid name collisions, but "__name" truly forces
the issue, and is especially useful for classes like this one which manage attribute
access but also need to record real state information in the instance. Clients use
"name" (the property), and the expanded version of "__name" (the data) where state is
actually stored is more or less hidden from them. Moreover, unlike "_name", it won't
clash with other normal instance attributes if this class is later extended by a subclass.
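Putting both points together, here's a stripped-down sketch (illustrative only,
not the book's full CardHolder class) that shows the setter firing from
__init__, and the mangled attribute where state is actually stored:
class CardHolder(object):                    # "object" required in 2.X
    def __init__(self, name):
        self.name = name                     # triggers setName, even here

    def getName(self):
        return self.__name                   # really _CardHolder__name

    def setName(self, value):
        print('in setName')
        self.__name = value.lower().replace(' ', '_')

    name = property(getName, setName)

bob = CardHolder('Bob Smith')                # prints: in setName
print(bob.name)                              # bob_smith
print(bob._CardHolder__name)                 # the mangled name where data lives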