Learning Python 4th Edition: Recent Notes

Below are book notes which either span printings or were posted after or just before the first reprint (January 2010).

Contents

  1. Can I learn programming? [Aug-17-10]
  2. What about prerequisites for beginners? [Sep-24-10]
  3. Python 2.7: new features, and the end of the 2.X line (?) [Jul-2-10]
  4. Python 3.2: the 3.X line continues its evolution [Sep-28-10]
  5. Reflections on Python 3.X after updating Programming Python [Jul-2-10]
  6. Recent book reviews [Apr-30-10]
  7. PSF newsletter author interview [Feb-11-10]
  8. Focus, "2.0" [Dec-28-09]
  9. What's been added in the 500 new pages of this edition? [Dec-16-09]
  10. What's up with the rat on the cover? [Dec-15-09]

Can I learn programming? [Aug-17-10]

I get a lot of email from people asking if they can learn to program, and whether or not Python is a good choice in this role. Since this is a frequently asked question, below is a recent response on this topic to a Sports Medicine physician in Uruguay, who also asked how Python compares to AppleScript.

Hello,

Some of what you're asking is subjective, of course, and I
can't give you any absolute answers.  Learning to program
well in any language requires a substantial amount of focus 
and effort, and is not for everyone.  It depends as much upon
the individual as the language and resources.

Having said that, I can say that Python still seems relatively 
simple when compared to larger languages such as C++ and Java,
and tends to be more broadly functional and applicable than 
most domain-specific languages such as AppleScript.

I think Python's a good language to get started in for these 
two reasons: you can pick up a useful subset more quickly, and can
move on to learn how to apply it in specific application domains
as required over time.  Python tends to be useful for just about 
anything you can do with computers; search the web for links 
to Python resources for the specific tasks you listed.  As an
added bonus, Python tends to promote a quality of code that 
becomes more valuable as your programs become larger.

Again, though, much depends on your goals, so your experience
might vary arbitrarily.  Software in general is a challenging 
domain, and over the years Python itself has grown to include
substantially advanced features which are not always entirely optional.
In fact, the sizes of the current Learning Python and the
upcoming Programming Python are testaments to both these points.  

Though not always obvious to those engaged in simple scripting
(or, regrettably, to publishers of books which promise expertise
overnight), full-blown software development is as tangible a field
as your own, and mastering it requires a similar investment in time.
That's not meant to discourage, of course: learning to program can 
be both fun and rewarding.  But you should scale your expectations 
accordingly.

--Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)

For a related post, see also the section below: Focus, "2.0".


What about prerequisites for beginners? [Sep-24-10]

On a point related to the prior note, people also often ask about prerequisites to learning Python, especially for those new to programming. While this can vary greatly by individual, the following is another recent reply to a potential reader on this topic.

Hello,

It helps to have some programming background, of course; but a
basic knowledge of math (especially algebra) or logic can often
suffice for those new to programming.  Programming, after all, 
is largely just algebra or symbolic logic, with procedural flow 
and data structures mixed in.  

Python does introduce some more advanced programming concepts 
such as object-oriented programming and functional tools which
you may or may not be familiar with (and which can sometimes seem 
daunting to beginners at first glance), but these can often be 
picked up gradually as your skill set evolves.

Technical features aside, motivation also tends to matter as much
as anything when it comes to learning Python, or programming in 
general.  In the end, a logical approach to problem solving and 
the ability to focus are the true keys to success in the software
field.

--Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)


Python 2.7: new features, and the end of the 2.X line (?) [Jul-2-10]

The fourth edition of Learning Python is based on Python 2.6 and 3.1, but applies to the entire 2.X and 3.X lines (as noted in the Preface, "3.0" in the book really means "3.X" in general). A new and most likely final release in the 2.X line, Python 2.7, is due to be released shortly.

Per current plans, at least, 2.7 will be the last major 2.X series release, but will have a long maintenance period in which it will continue to be used in production work. After 2.7, new development is to shift to the Python 3.X line. Besides being the last major release in the 2.X line, 2.7 incorporates as backports a handful of features described in the book that were formerly available only in the 3.X line. Among them:

The book presents these as 3.X topics, but they now apply equally to 2.7. For the full story on 2.7, check out its release notes online, at this page, as well as its "What's New" changes overview document at this page.
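As a rough illustration (not the book's own list), the following sketch exercises several 3.X features that 2.7 did pick up as backports: set literals, set and dictionary comprehensions, automatic field numbering in the string format method, and multiple context managers in a single with statement. It is written to run under both 2.7 and 3.1 or later; the __future__ import makes 2.7 treat string literals as Unicode, which io.StringIO requires there.

```python
from __future__ import print_function, unicode_literals  # already the default in 3.X
import io

colors = {"red", "green", "blue"}                # set literal
lengths = {c: len(c) for c in colors}            # dictionary comprehension
initials = {c[0] for c in colors}                # set comprehension
label = "{} has {} letters".format("green", lengths["green"])  # auto-numbered fields

# Two context managers in one with statement
with io.StringIO("alpha\n") as src, io.StringIO() as dst:
    dst.write(src.read().upper())
    copied = dst.getvalue()

print(" ".join(sorted(initials)))   # b g r
print(label)                        # green has 5 letters
print(copied.strip())               # ALPHA
```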

[Update Nov-3-2010] More on 2.X's future

When I say that 2.7 is the end of the 2.X line, I mean that it's the last 2.X version that will be officially developed by the current python-dev core developers group (a group that also originally promised a 2.8, and possibly more). Given Python's open source nature, it's not impossible that 2.X will be forked and managed by a different group of developers; in fact, this was recently discussed on the python-dev email list.

The reality today is that although 2.7 is the last 2.X release per the 3.X-focused python.org developers, some two years after 3.X's initial release, 2.X still constitutes the vast majority of the Python user and code bases. As one metric, at this writing 2.X is still downloaded from python.org 3 to 4 times more often than 3.X, at least based on current web download statistics for Windows installers (this ignores other platforms that typically ship with 2.X today, and is probably biased towards new users normally inclined to pick newer versions; other measures tend to put 2.X's dominance even higher).

Or, to borrow a Monty Python line: "I'm not dead yet..." Because it's impossible to know 2.X's future today, stay tuned to the web for developments.


Python 3.2: the 3.X line continues its evolution [Sep-28-10]

As mentioned in the prior note, the fourth edition of Learning Python is based on Python 2.6 and 3.1. The next version in the 3.X line, Python 3.2, is now in alpha form and should be released by early 2011. Although its feature set is still being hashed out, it introduces a handful of notable changes to (and potential incompatibilities with) both 2.X and earlier 3.X releases. Among these:

Threading implementation change: time slices

The internal implementation of threading has changed to reduce contention. In particular, the GIL has been changed to use absolute time slices instead of a virtual machine instruction counter. Since threading is not covered in any sort of depth in Learning Python, this has more impact on the book Programming Python 4th Edition.
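Threading isn't a Learning Python topic, but for the curious, the new time slice is visible from Python code. The following minimal sketch assumes CPython 3.2 or later, where sys.getswitchinterval and sys.setswitchinterval replace the older instruction-count-based sys.setcheckinterval (deprecated in 3.2 and since removed).

```python
import sys

# The GIL time slice, in seconds (CPython 3.2+).
interval = sys.getswitchinterval()
print("default switch interval:", interval)   # 0.005 (5 ms) unless changed

# A longer slice means fewer forced thread switches (less overhead),
# at some cost in responsiveness for threads waiting on the GIL.
sys.setswitchinterval(0.01)
print("new switch interval:", sys.getswitchinterval())

sys.setswitchinterval(interval)   # restore the original setting
```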

Byte-code files storage model change: __pycache__

The way that byte-code files are stored is being changed to avoid the potential for collisions when multiple Python interpreters are installed. In Python 2.6 and 3.1 (and hence in Learning Python), ".pyc" byte-code files are stored in the same directory as the corresponding ".py" source file. In Python 3.2, byte-code files will instead be stored in a "__pycache__" subdirectory created in the directory containing the source file, and have names that designate the creating interpreter (e.g., "mymodule.cpython-32.pyc"). This way, source directories are not cluttered with byte-code files, and different Python implementations do not step on each other's work.
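To see the new naming scheme for yourself, the following sketch asks the running interpreter where it would cache byte code for a module; pkg/mymodule.py is a hypothetical path and need not exist. It uses importlib.util, where these helpers have lived since Python 3.4; in 3.2 itself the same functions were spelled imp.cache_from_source and imp.source_from_cache (the imp module has since been removed).

```python
import importlib.util
import os

# Where would this interpreter cache byte code for pkg/mymodule.py?
src = os.path.join("pkg", "mymodule.py")
cached = importlib.util.cache_from_source(src)
print(cached)   # e.g. pkg/__pycache__/mymodule.cpython-312.pyc on CPython 3.12

# The mapping also runs the other way, from cache file back to source.
print(importlib.util.source_from_cache(cached))
```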

Technically, multiple Python interpreters already work in 3.1 and earlier, but this change will avoid requiring each different interpreter to recreate the byte-code file when the file's internal "magic" version number does not match that of the importing interpreter (this number is tested in addition to file timestamps to determine when a recompile is necessary). The new model allows a single source directory to record cached byte-code for multiple Python versions and implementations, and thus avoids extra start-up time for module recompiles. Note that this does not apply to Python's own standard library, as each version generally installs its own copy of the library.
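The magic-number check can be sketched as follows: byte-compile a throwaway module with the running interpreter, then compare the first four bytes of the resulting file against that interpreter's own magic number. This only illustrates the test described above and is not the import system's actual code path; importlib.util.MAGIC_NUMBER assumes Python 3.4 or later (imp.get_magic() played this role earlier).

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny module and byte-compile it with the running interpreter.
    src = os.path.join(tmp, "mymodule.py")
    with open(src, "w") as f:
        f.write("x = 42\n")
    pyc = py_compile.compile(src, doraise=True)

    # The first 4 bytes of a .pyc are the magic number of the
    # interpreter version that wrote it.
    with open(pyc, "rb") as f:
        magic = f.read(4)

    print(os.path.relpath(pyc, tmp))             # lands under __pycache__ in 3.2+
    print(magic == importlib.util.MAGIC_NUMBER)  # True: same interpreter
```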

This change seems generally good, but its downside is that some scripts which use a module's imported file name to infer its source file name may need to be changed to work under 3.2 if coded to be dependent on directory structure. Similarly, makefiles, custom importers, and programs that search for ".pyc" files and simply strip off the last letter to derive the source file's name are also prone to requiring updates for this change.

In terms of module-to-source mapping, though, this is mostly an issue for scripts written for 2.X, and then only if they are not robust. In all 3.X, a module's __file__ normally names its original ".py" source-code file, not its ".pyc" byte-code file, even if the byte-code file was loaded directly. Hence, most 3.1 programs that derive source names from __file__ should continue to work in 3.2, even if they allow for legacy ".pyc" __file__ names too. The __file__ may name the ".pyc" file's path in 2.X, but as long as the 2.X script allows for __file__ to name either a ".py" or ".pyc" as it should, it will likely work as is in 3.2; the 3.X ".py" __file__ will suffice.
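A script that wants to derive source names robustly across all these layouts might use something like the following hypothetical helper, source_for, which accepts a ".py" path, a legacy ".pyc" beside its source, or a 3.2-style file in __pycache__. It is a sketch under those assumptions, not code from either book.

```python
import importlib.util
import os

def source_for(path):
    """Hypothetical helper: map a module's __file__ value to its source file."""
    if not path.endswith(".pyc"):
        return path                      # 3.X __file__ normally names the .py already
    if os.path.basename(os.path.dirname(path)) == "__pycache__":
        # 3.2+ layout: pkg/__pycache__/mod.cpython-32.pyc -> pkg/mod.py
        return importlib.util.source_from_cache(path)
    # Legacy 2.X/3.1 layout: pkg/mod.pyc sits beside pkg/mod.py
    return path[:-1]

print(source_for(os.path.join("pkg", "mod.py")))
print(source_for(os.path.join("pkg", "mod.pyc")))
print(source_for(os.path.join("pkg", "__pycache__", "mod.cpython-32.pyc")))
```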

As a concrete example, at least one program in the upcoming book Programming Python 4th Edition, the source-code viewer option in PyMailGUI's non-HTML help window, uses a module's __file__ to infer its source file, assuming the pre-3.2 storage model. Since __file__ is still the ".py" in 3.2, this example should work as is (though this is to be verified; hopefully, this change will not have made the book out of date between its composition and its publication!). For more on this 3.2 change in general, see the PEP description at python.org.

3.X str/bytes split supported better by Python itself

The str/bytes (Unicode text/binary data) string types dichotomy in 3.X is now better supported by Python itself. Ironically, Python's own library had not fully resolved this split as of 3.1. Email content and many Internet support modules, for example, were especially affected (in fact, the email package and CGI uploads are still broken for some use cases in 3.1's library). Python 3.2 will include a number of fixes on this front, though a full rewrite of the email package is still pending.

Again, this change bears primarily upon the applications- and library-focused book Programming Python 4th Ed. In contrast, Learning Python deals primarily with the core language fundamentals of the 3.X string model, not library implications. There has also been a minor change to filesystem Unicode encoding support, which I'll omit here.

Other 3.2 changes

For more on 3.2, be sure to see its "What's New" document, available on Python's web site and packaged with the release itself.

[Update Dec-31-10] For additional notes on 3.2 changes, especially on the way its Unicode enhancements impact the applications book Programming Python 4E, see also that book's 3.2 updates note.


Reflections on Python 3.X after updating Programming Python [Jul-2-10]

I recently finished writing the new 4th Edition of Programming Python, the follow-up book to Learning Python which focuses on applying Python to real domain tasks (systems programming, GUIs, the Web, databases and text, and so on). It should be available sometime this fall. Like the Learning Python update, this was a six-month project that went through the book line by line to refresh and polish it in general. Among other things, this edition's examples include a Unicode-aware text editor, and a feature-rich 5K-line desktop email client which supports internationalized message content and headers. We also trimmed some material along the way, or made it external, to avoid new growth this time around. For instance, 4 prior chapters are gone in the new edition, having been absorbed in part elsewhere or cut altogether.

Unicode is more pervasive than you might expect

The main change in this book's new edition, though, is that it has been updated to cover and use Python 3.X only. After going through this update, the good news is that Python 3.X is a viable systems development language today. And as expected, core language changes were relatively straightforward to accommodate. However, the update also underscored that the 3.X Unicode model is more likely to impact programmers than I formerly assumed. Once you enter the realm of realistic programming and the standard libraries it employs, Unicode seems to pop up everywhere--in file content, file names, directory walkers, GUI text displays, network interfaces, Internet protocols, web content, and even in IPC and persistence interfaces such as pipes, DBM files, shelves, and pickles.

In many cases, this requires bringing a new mindset to bear on the notion of data. Essentially, anywhere that you're used to processing "text", you may now be required to amend your thinking to clarify "which kind". For example, any code that opens a text file must now also consider its Unicode encoding if it might ever contain text other than the platform's default. The opening application will have to arrange to know what that encoding may be, or ask its user. Moreover, many contexts and APIs which formerly dealt with "strings" now deal in terms of "byte strings", which cannot be treated as text directly--they print differently, do not support formatting, and don't mix with text strings at all.
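The following minimal sketch (Python 3.X) shows both points: byte strings print differently and refuse to mix with text, and text files should be opened with an explicit encoding when the platform default can't be trusted. Note that bytes did regain %-style formatting in Python 3.5, though they still lack the format method.

```python
import locale
import os
import tempfile

text = "caf\u00e9"                 # str: Unicode text ("café", 4 characters)
data = text.encode("utf-8")        # bytes: encoded binary data (5 bytes)
print(len(text), len(data))        # 4 5
print(data)                        # b'caf\xc3\xa9'

# Text and byte strings don't mix in 3.X.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)

# Without an explicit encoding, open() falls back on a platform default.
print("platform default:", locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")
    with open(path, "w", encoding="utf-8") as f:   # say which kind of text
        f.write(text)
    with open(path, "rb") as f:                      # binary: raw bytes, no decoding
        raw = f.read()
    with open(path, encoding="utf-8") as f:         # text: decoded str
        back = f.read()

print(raw == data, back == text)   # True True
```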

Unicode example: the PyEdit text editor

As a concrete example, the PyEdit text editor example in this book must take Unicode into account when opening and saving files; when displaying text in a GUI; and when searching files in directories. For opens, it must ask the user for an encoding (suggesting the platform default) if one is not provided by the client application. In addition, it must rely on the GUI toolkit's own support for Unicode text, and its new threaded "grep" directory search tool must also ask for an encoding to apply to all files in the tree before it begins, and skip files which fail to decode--it's not impossible that text files in the tree may be encoded in a variety of formats.
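The encoding-aware search described above might be sketched like this. grep_tree is a hypothetical function written for illustration, not PyEdit's actual code: it applies one encoding to every file in a tree and records the files that fail to decode instead of aborting the search.

```python
import os
import tempfile

def grep_tree(root, key, encoding):
    """Hypothetical sketch: search every file under root for key, decoding
    all files with one user-chosen encoding. Returns (matches, skipped)."""
    matches, skipped = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding=encoding) as f:
                    text = f.read()        # decode the whole file before matching
            except UnicodeDecodeError:
                skipped.append(path)       # wrong encoding for this file: skip it
                continue
            for lineno, line in enumerate(text.splitlines(), 1):
                if key in line:
                    matches.append((path, lineno, line))
    return matches, skipped

# Demo: one UTF-8 file and one UTF-16 file in the same tree.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "a.txt"), "w", encoding="utf-8") as f:
        f.write("eggs\nspam here\n")
    with open(os.path.join(root, "b.txt"), "wb") as f:
        f.write("spam\n".encode("utf-16"))
    found, skipped = grep_tree(root, "spam", "utf-8")
    for path, lineno, line in found:
        print(os.path.basename(path), lineno, line)             # a.txt 2 spam here
    print("skipped:", [os.path.basename(p) for p in skipped])  # skipped: ['b.txt']
```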

In fact, it's common on Windows to have files in ASCII, UTF-8, and UTF-16 mixed in the same tree (Notepad's "ANSI", "Utf-8", and "Unicode"), and even others in trees that contain content obtained from the Web or email. Opening all these with UTF-8 would trigger exceptions in Python 3.X, and opening all these in binary mode yields encoded text that will likely fail to match a search key string. Technically, to compare at all, we'd still have to decode the bytes read to text, or encode the search key string to bytes, and the two would only match if the encodings used both succeed and agree.
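The mismatch is easy to demonstrate with a UTF-16 file like those Notepad writes; the following sketch builds the file's bytes in memory instead of reading a real file.

```python
import codecs

# Roughly what Notepad's "Unicode" option writes: a UTF-16 little-endian
# byte-order mark followed by 2 bytes per character.
data = codecs.BOM_UTF16_LE + "spam and eggs\n".encode("utf-16-le")
key = "eggs"

# Searching the raw bytes with a UTF-8-encoded key fails: the encodings disagree.
print(key.encode("utf-8") in data)     # False

# Decoding the file as UTF-8 fails outright.
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UTF-8 decode failed:", exc.reason)

# Decoding with the right codec (the BOM tells "utf-16" the byte order) works.
print(key in data.decode("utf-16"))    # True
```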

Really, opening in binary mode to read raw byte strings in 3.X mimics the behavior of text files in 2.X, and underscores why forcing programs to deal with Unicode is sometimes a good thing--binary mode avoids decoding exceptions, but probably shouldn't, because the still-encoded text might not work as expected. The names of files searched might fail to decode as well, and the book's PyMailGUI email client is even more Unicode dependent (text parts and headers of messages may all be subject to both MIME and Unicode encodings, and full text must also be decoded for parsing today), but I'll leave this to the book to explain.

Unicode issues in the standard library

Paradoxically, Python 3.X's own libraries are still coming to grips with the implications of Unicode as well, especially in the Internet sector. For instance, the email package in Python 3.1 has significant issues (really, bugs) introduced by 3.X's str/bytes split. By proxy, these issues also impact CGI uploads and perhaps more (in short, the parser still lives in the land of str, but data streams can have mixed text/bytes content).

Some of this is being addressed--Python 3.2 will include a set of patches for the email package to fix many such issues, and a full email package rewrite is in the works, though it may not be entirely backward compatible with the package in 3.1. Other dark Unicode corners lurking in the standard library, however, may require some further honing with time.

The bottom line

There are still some domains in which Unicode can be safely ignored, of course. In numeric programming, for example, Python is typically used for lightweight scripting of nontext libraries, not for full-scale systems development or general programming tasks. However, especially if your work touches files, networks, the Internet, GUIs, databases, or text processing at large, Unicode is probably no longer an optional topic for you in Python 3.X. At least arguably, it shouldn't have been optional before either. Indeed, this book's own email client example was simpler in the prior edition largely because it was too narrow--it handled ASCII messages only, not the full spectrum of Internet message content.

Still, by forcing the issue, Python 3.X makes Unicode optional for far fewer programmers than Python 2.X did. If you're relatively new to I18N, you should be prepared for a learning curve when you take the leap to 3.X. Given the prominence of Unicode in the software world today, though, you'll probably be glad you did.


Recent book reviews [Apr-30-10]

A couple of recent reviews of this book that I've been told about appeared on Dr. Dobb's Journal and Slashdot.


PSF newsletter author interview [Feb-11-10]

I recently gave a written interview for the PSF's first newsletter, to be distributed at PyCon 2010. It discusses the new books, Python 3.X, and the software field at large. The newsletter included an abbreviated version, but the full interview is available online at this page.


Focus, "2.0" [Dec-28-09]

I trust that most readers will understand that the earlier notes about this book's size on this site are intended to be informational, not apologetic. Python is a complex subject, and has grown more so in recent years, and learning to program in Python well requires substantial time and effort. The size of this book reflects that requirement, as well as the common needs of the thousands of students I've taught over the last 15 years.

I don't want to discourage beginners, but programming is both more demanding and more rewarding than most newcomers expect. True competence is a multiyear process, not a two-hour skim. If you're not willing to put in the required time and effort to learn to program well, you probably won't be happy in the software development field in general. On the other hand, if you do invest the effort required to master a tool like Python with an in-depth book like this one, you'll likely find that you are better prepared than many of this field's current practitioners.

There is a broader perspective worth noting here too. Many computer book publishers seem to have been competing in recent years to dumb down their titles, in order to appeal to a perceived new market demographic. The examples are readily available: books for "Dummies"; books that promise mastery of complex topics "in 24 Hours"; a "Complete Idiot's" guide to just about everything under the sun; even "Manga Guides" to databases, statistics, and physics (no, really).

My own publisher is guilty as much as any with its "Head First" series: books that by design come with all the distraction of a web page, and sometimes seem heavier on cartoons than technical content. O'Reilly's marketing for Head First Programming, for example (a Python-based book which might be very good, despite some of its PR) tries to woo readers by conceding that "your time is too valuable to waste struggling with new concepts", and promising that you won't be asked to read text that "puts you to sleep." (See the full description here; these parts were also later cut and pasted into the Head First Python page.)

Personally, I find such statements to be horrifying. Books are supposed to elevate their readers, not pander to them. By lowering the intellectual bar this way, we leave readers ill-equipped for the real world, and virtually guarantee the demise of quality in the software development field at large. The broader dumbed-down trend that such statements reflect seems cynical at best; at worst, it underscores a systemic failure of the gatekeepers of human culture and knowledge. Significant achievements require significant effort. Implying otherwise does a great disservice to everyone involved, and just might help to imperil a generation that's already been oversold the idea that focus is optional to success.

Your mileage may vary, of course; there are many types of learners, and no one book can satisfy every possible audience. Indeed, some of the books mentioned earlier might seem a lot more palatable if they were marketed as books for children and teenagers -- a legitimate audience in its own right. Apart from such niches, though, there is a core philosophical value at stake here: to me, depth should never be sacrificed in the name of shortcuts to proficiency that serve neither readers nor disciplines. Both parties deserve better from authors of technical books, regardless of the impact on page-count.

Entertainment matters in technical books too (Python is named for a comedy group, after all), but it shouldn't be elevated to the extent that it trumps education. If spending 14 years teaching Python to thousands of students has taught me anything, it's that what beginners need is not necessarily the same as what they want. Complex subjects are complex. It helps no one to sugar-coat them so much as to hide their true nature. That position may not open vast new markets for technical books, of course, but maybe that shouldn't have been the goal in the first place.

[Update Jan-8-10]: Related article

For a related opinion, check out the great essay by Peter Norvig, Google Director of Research, titled "Teach Yourself Programming in Ten Years." Not to diminish the potential consequences of this trend, but it's also encouraging to note that recent reader feedback sides much more with depth than speed. Maybe publishers should give programmers a bit more credit than they sometimes do...

[Update Jul-2-10]: CP4E: Computer Programming is not for Everybody

For another case in point on this thread, see the documents pertaining to an initiative promoted by the early Python world in the 1990s known as CP4E (Computer Programming for Everybody). I stumbled onto this again just recently myself. It's a dated but interesting and relevant read, and captures some of the mindset of its time.

In retrospect, this initiative's entire premise seems flawed. Programming is not for everybody, and suggesting that it is does harm both to the field, and to those who hope to enter it. Depending on how you choose to define it, simple scripting tasks may be within the reach of many, but in most cases this qualifies as full-scale programming no more than inflating a car's tires qualifies as automotive mechanics or engineering. Software development is a professional technical field. It's not a field that can be opened up to novice masses any more than is medicine, law, or physics.

Again, I don't mean to discourage beginners here. Software is a very rewarding field for those who are willing to take time to learn it well. But there's a tangible misperception today that programming, unlike any other engineering discipline, is somehow simple enough to pick up overnight. This is arguably absurd, and seems embarrassingly naive in hindsight. As only those who have actually worked in this field could attest, it's also just plain wrong. Programming is challenging, time-consuming, and based upon complex underlying concepts that take years to master. It may indeed be within the grasp of many, but not without much more time, effort, and motivation than we somehow seem to have become comfortable implying.

And if you're still not buying this, the next time you're in your doctor's office, imagine how you'd react to find "Surgery for Dummies" or "Teach Yourself Medicine in 24 Hours" on the bookshelf. Software engineering may be a less meaty endeavor, but the same principles apply (despite what you may have heard).

[Update Nov-3-10]: Related books

From a broader perspective, it can be argued that the rise of the distraction ecosystem that is today's Internet has been a major factor behind the dumbed-down trend in technical publishing. If you're interested in considering the deeper social implications of the world that we computer folk have wrought, I suggest the following books (among many others), which address the topic in much better depth than I can attempt here:

As documented in these and elsewhere, and despite the entirely unfounded claims of some book marketeers, the actual research done to date shows definitively that distraction does not aid learning; it radically impairs it.


What's been added in the 500 new pages of this edition? [Dec-16-09]

A reader who had just purchased a copy of the 3rd Edition of this book wrote to ask what he would be missing in the 500 additional pages present in the 4th Edition (which grew from roughly 700 to 1200 pages). Given the prevalence of the 3rd Edition, this seems like a general question worth addressing here.

First off, if you're using Python 2.5 or older, and are sure you will be for some time to come, the 3rd Edition is probably okay for getting started. If you care about either Python 2.6 or 3.X, though, or might in the near future, the 4th Edition is a better choice.

The 4th Edition has been updated to cover both Python 2.X and 3.X (technically, 2.6 and 3.1). It is designed to serve readers using either Python line, and be a resource for those transitioning between them. As I mention in this Edition's Preface, the 2.X/3.X split could very well be permanent, given Python's large existing user base (in a world where a company like Google still uses Python 2.4 internally, 2.X could very well endure as long as Fortran77 has). I didn't want this book to leave either group out in the cold.

If you are too new to Python to know which version you must care about, you probably want to start with Python 3.X and the 4th edition of this book, unless you know that you'll be using third-party software that is available for 2.X only. Many such packages still support 2.X only today (including numeric programming libraries and popular web frameworks), but this is changing, and is expected to improve more over time. Python 3.X is the most likely future of Python, despite its current lack of dominance. Even if you are stuck in the 2.X world today, though, the 4th Edition's dual 2.X/3.X coverage allows you to hedge your bets for the future.

As for specific additions, some of this is difficult to reconstruct on a page count basis. Books evolve organically while they are being written, and there isn't a 1-to-1 relationship between pages in the 3rd and 4th Editions. But in terms of what I can quantify, the 500 extra pages in the 4th Edition break down this way:

New chapters: 226 pages

There are 5 new chapters, 4 of which are in a new "Advanced Topics" part at the end. Here are the chapters added:

The first of these appears in the classes/OOP part, and provides a new and much needed OO tutorial. The last 4 appear in the new advanced topics part which is 189 pages long. The decorators chapter's larger size owes to the fact that it includes some larger case studies that are more satisfying than many of the other examples in the book, which are generally small and narrowly focused (what some would call didactic).

All together, these 5 new chapters add 226 pages as printed. Among these, today I'd classify only the last 3 as truly "optional reading" for many people; the coverage of Unicode strings in Chapter 36 is more widely relevant, and the OO tutorial is core material. That means the 5 new chapters account for 82 pages of new fundamentals material, plus just 144 pages of potentially optional reading at the end.

Python 2.6 + 3.X changes and additions: 200 pages? (most of the rest)

As mentioned, the 5 new chapters account for 226 of the 500 new pages, roughly half. In other words, even without the new chapters, this book would still have grown to roughly 1,000 pages in the new 4th Edition (and a bit longer, if only 144 of the new pages are truly optional reading).

The remainder of the added size mostly stems from coverage of new or changed features in Python 2.6 and 3.X, along with the need to cover two incompatible versions of Python in one book. For an overview of what was addressed in the language changes/additions category, see the tables at the end of the draft Preface excerpt which I've posted here: http://www.rmi.net/~lutz/lp4e-preface-preview.html

Beyond specific language changes, covering two versions implies some extra size by itself. I discuss this further in relation to the book's size later on this page. In short, we opted to cover both Python lines instead of requiring readers to buy two books; until the Python world moves beyond its current 2.X/3.X split state, widely-used Python books like this one don't really have many other good options.

For instance, Python 3 comes with changed syntax (e.g., raise, except, and print); new tools (the nonlocal statement, set and dictionary comprehensions, and extended sequence unpacking); and subtly different semantics (the new string model, the wider role of generators and iteration, and new-style classes). This is in addition to new tools added to both Python 2.6 and 3.X (class decorators, the new string format method, and so on). All these imply book growth, of course, especially when they must be presented in tandem with 2.X variations to a user community divided between Python lines.
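For readers who haven't yet seen these in action, the following sketch packs several of the 3.X tools just listed into one small script (3.X only, because of nonlocal, extended unpacking, and the print function); it is an illustration, not an excerpt from the book.

```python
def make_counter():
    count = 0
    def bump():
        nonlocal count               # 3.X only: rebind a name in the enclosing def
        count += 1
        return count
    return bump

counter = make_counter()
counter()
counter()

first, *rest = [1, 2, 3, 4]              # extended sequence unpacking (3.X)
squares = {n: n * n for n in rest}       # dictionary comprehension (3.X, 2.7)
evens = {n for n in rest if n % 2 == 0}  # set comprehension (3.X, 2.7)

try:
    raise ValueError("bad value")        # raise Class(args) form
except ValueError as exc:                # "except ... as"; the 2.X comma form is gone
    message = "caught {0}".format(exc)   # string format method (2.6, 3.X)

print(counter(), first, rest)            # 3 1 [2, 3, 4]  (print is a function in 3.X)
print(squares, evens)
print(message)                           # caught bad value
```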

Extended coverage of some existing topics: negligible

Some minor growth owes to extended coverage for topics already present in the 3rd Edition. This includes new material and examples for tools such as operator overloading, new-style classes, decorators, and iterators and generators. However, most of the growth in this category stems either from new features in 2.6/3.X, or formerly obscure features that have grown to become common practice today.


What's up with the rat on the cover? [Dec-15-09]

Someone recently wrote to ask about the story behind the wood rat (no, it's not a mouse) on the cover of this book. I've been asked this many times over the years. Mostly, it's because O'Reilly didn't ask what our cover preference was when the first edition came out a decade ago.

Seriously, their rationale back in 1999 was that wood rats are a common food of pythons. This book would become a common tool of Python programmers, hence the tie-in was born. In retrospect, a parrot or bunny might have been better given the Monty Python namesake, but at the time this choice was left to professional graphic artists to decide, not to authors. I can't speak for other software developers, but given the state of my own drawing skills, this policy was probably for the best...


Older notes: see the first printing notes page
