When Pythons Attack: Common Mistakes of Python Programmers (2)

This is a reconstruction of a page originally located at oreilly.com. For the story behind its resurrection, see the note at the top of this article's first page.

Programming Mistakes

Finally, here are some of the problems you may come across when you start working with the larger features of the Python language — datatypes, functions, modules, classes, and the like. Because of space constraints, this section is abbreviated, especially with respect to advanced programming concepts; for the rest of the story, see the tips and "gotchas" sections of Learning Python, 2nd Edition.

File-Open Calls Do Not Use the Module Search Path

When you use the open() call in Python to access an external file, Python does not use the module search path to locate the target file. It uses an absolute path you give, or assumes the filename is relative to the current working directory. The module search path is consulted only for module imports.
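
As a rough illustration of the point above, the following sketch (Python 3 syntax) puts a data file in a directory that is on the module search path and then tries to open it by bare name from a different working directory. The function name, the filename spam_data.txt, and the temporary directories are made up for this example.

```python
import os
import shutil
import sys
import tempfile

def open_by_bare_name_works():
    """Return True if open() finds a file that lives only in a sys.path directory."""
    data_dir = tempfile.mkdtemp()      # hypothetical directory holding the data file
    work_dir = tempfile.mkdtemp()      # hypothetical current working directory
    with open(os.path.join(data_dir, "spam_data.txt"), "w") as f:
        f.write("spam\n")

    old_cwd = os.getcwd()
    sys.path.insert(0, data_dir)       # data_dir is now on the module search path
    os.chdir(work_dir)
    try:
        try:
            open("spam_data.txt").close()   # bare name: looked up in the cwd only
            return True
        except IOError:                     # the cwd has no such file
            return False
    finally:
        os.chdir(old_cwd)                   # restore state and clean up
        sys.path.remove(data_dir)
        shutil.rmtree(data_dir)
        shutil.rmtree(work_dir)

print(open_by_bare_name_works())            # False
```

Passing the full path built with os.path.join would succeed in the same situation, because open() then no longer depends on the current working directory.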

Methods Are Specific to Types

You can't use list methods on strings, and vice versa. In general, method calls are type-specific, but built-in functions may work on many types. For instance, the list reverse method only works on lists, but the len function works on any object with a length.
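
A small sketch of the difference (Python 3 syntax; the variable names are only for illustration): the reverse method exists on the list but not on the string, while len accepts both.

```python
L = [1, 2, 3]
S = "spam"

L.reverse()                   # list method: reverses L in place
lengths = (len(L), len(S))    # built-in function: works on both types

try:
    S.reverse()               # strings have no reverse method
    string_reversed = True
except AttributeError:
    string_reversed = False

print(L, lengths, string_reversed)
```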

Immutable Types Can't Be Changed in Place

Remember that you can't change an immutable object (e.g., tuple, string) in place:

T = (1, 2, 3)
T[2] = 4          # Error

Construct a new object with slicing, concatenation, and so on, and assign it back to the original variable if needed. Because Python automatically reclaims unused memory, this is not as wasteful as it may seem:

T = T[:2] + (4,)  # Okay: T becomes (1, 2, 4)
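
The same rule applies to strings. Here is a minimal sketch (Python 3 syntax, with illustrative names) that catches the error from an in-place change and then builds a new string instead:

```python
S = "spam"

try:
    S[1] = "l"                 # in-place change is not allowed
    changed_in_place = True
except TypeError:
    changed_in_place = False

S = S[:1] + "l" + S[2:]        # build a new string and rebind the name S
print(changed_in_place, S)
```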

Use Simple for Loops Instead of while or range

When you need to step over all items in a sequence object from left to right, a simple for loop (e.g., for x in seq:) is simpler to code, and usually quicker to run, than a while- or range-based counter loop. Avoid the temptation to use range in a for unless you really have to; let Python handle the indexing for you. All three of the following loops work, but the first is usually better; in Python, simple is good.

S = "lumberjack"

for c in S: print c                   # simplest

for i in range(len(S)): print S[i]    # too much

i = 0                                 # way too much
while i < len(S): print S[i]; i += 1
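
If you do need the position of each item as well as its value, one option not covered above is the built-in enumerate function (available since Python 2.3), which keeps the simple for loop and still lets Python do the indexing. A short sketch in Python 3 syntax:

```python
S = "lumberjack"

pairs = []
for i, c in enumerate(S):      # i is the offset, c is the character
    pairs.append((i, c))

print(pairs[0], pairs[-1])
```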

Don't Expect Results From Functions That Change Objects

In-place change operations such as the list.append() and list.sort() methods modify an object, but do not return the object that was modified (they return None); call them without assigning the result. It's not uncommon for beginners to say something like:

mylist = mylist.append(X)

to try to get the result of an append; instead, this sets mylist to None rather than to the modified list. A more devious example of this pops up when trying to step through dictionary items in sorted-key fashion:

D = {...}
for k in D.keys().sort(): print D[k]

This almost works — the keys method builds a keys list, and the sort method orders it — but since the sort method returns None, the loop fails because it is ultimately a loop over None (a nonsequence). To code this correctly, split the method calls out into statements:

Ks = D.keys()
Ks.sort()
for k in Ks: print D[k]
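
The sketch below (Python 3 syntax) shows both effects. It also uses the built-in sorted function, which arrived in Python 2.4 and so postdates the approach shown above; unlike the sort method, it returns a new list and can be used directly in a loop header. The dictionary contents are invented for illustration.

```python
mylist = [1, 2]
result = mylist.append(3)          # append changes mylist and returns None

D = {"b": 2, "a": 1, "c": 3}       # hypothetical dictionary for illustration
ordered = []
for k in sorted(D):                # sorted returns a new, ordered list of keys
    ordered.append((k, D[k]))

print(result, mylist, ordered)
```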

Conversions Happen Only Among Number Types

In Python, an expression like 123 + 3.145 works: Python automatically converts the integer to a floating-point number and uses floating-point math. On the other hand, the following fails:

S = "42"
I = 1
X = S + I        # A type error

This is deliberate, because the expression is ambiguous: should the string be converted to a number (for addition), or the number to a string (for concatenation)? In Python, we say that explicit is better than implicit (that is, EIBTI), so you must convert manually:

X = int(S) + I   # Do addition: 43
X = S + str(I)   # Do concatenation: "421"
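
Put together as a runnable sketch (Python 3 syntax; the result names are made up for this example), the failing expression and both manual conversions look like this:

```python
S = "42"
I = 1

try:
    X = S + I                  # mixing str and int is an error
    mixed_ok = True
except TypeError:
    mixed_ok = False

total = int(S) + I             # convert the string, then add
joined = S + str(I)            # convert the number, then concatenate
print(mixed_ok, total, joined)
```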

Cyclic Datastructures Can Cause Loops

A collection object that contains a reference to itself is called a cyclic object. Such objects are fairly rare in practice. Python prints a [...] whenever it detects a cycle in the object, rather than getting stuck in an infinite loop:

>>> L = ['grail']  # A one-item list
>>> L.append(L)    # Append reference back to L: makes a cycle
>>> L
['grail', [...]]

Besides understanding that the three dots represent a cycle in the object, this case is worth knowing about because cyclic structures may cause code of your own to fall into unexpected loops if you don't anticipate them. If needed, keep a list or dictionary of items already visited, and check it to know if you have reached a cycle.
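
One way to apply that advice is sketched below (Python 3 syntax). The function count_items is hypothetical, not something from the article: it walks nested lists and records the identity of each list it has entered in a dictionary, so a list that contains itself is visited only once.

```python
def count_items(obj, visited=None):
    """Count the non-list items reachable from obj, entering each list once."""
    if visited is None:
        visited = {}                   # ids of lists already entered
    if isinstance(obj, list):
        if id(obj) in visited:         # seen before: a cycle or a shared list
            return 0
        visited[id(obj)] = True
        return sum(count_items(item, visited) for item in obj)
    return 1                           # anything that is not a list counts as one item

L = ['grail']
L.append(L)                            # L now contains itself
print(count_items(L))                  # 1
```

Note that this sketch also skips lists that are merely shared rather than cyclic; whether that is what you want depends on the traversal.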

Assignment Creates References, Not Copies

This is a core Python concept, which can cause problems when its behavior isn't expected. In the following example, the list object assigned to the name L is referenced both from L and from inside of the list assigned to name M. Changing L in place changes what M references, too, because there are two references to the same object:

>>> L = [1, 2, 3]        # A shared list object
>>> M = ['X', L, 'Y']    # Embed a reference to L
>>> M
['X', [1, 2, 3], 'Y']

>>> L[1] = 0             # Changes M too
>>> M
['X', [1, 0, 3], 'Y']

This effect usually becomes important only in larger programs, and shared references are normally exactly what you want. If they're not, you can avoid sharing objects by copying them explicitly; for lists, you can make a top-level copy by using an empty-limits slice:

>>> L = [1, 2, 3]
>>> M = ['X', L[:], 'Y']   # Embed a copy of L

>>> L[1] = 0               # Change only L, not M
>>> L
[1, 0, 3]
>>> M
['X', [1, 2, 3], 'Y']

Slice limits default to 0 and the length of the sequence being sliced. If both are omitted, the slice extracts every item in the sequence, and so makes a top-level copy (a new, unshared object). For dictionaries, use the dict.copy() method.
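
The phrase "top-level" matters: a slice or dict.copy() copies only the outer object, so nested objects are still shared. The sketch below (Python 3 syntax) shows this for a dictionary. It also uses copy.deepcopy from the standard library, which the article does not mention, as one way to copy nested objects too. The names D, E, and F are just for illustration.

```python
import copy

D = {"a": [1, 2]}

E = D.copy()             # top-level copy: a new dictionary
E["b"] = 0               # adds a key to E only
E["a"].append(3)         # the nested list is still shared with D

F = copy.deepcopy(D)     # copies the nested list as well
F["a"].append(4)         # changes only F's list

print(D, E, F)
```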

Local Names Are Detected Statically

Python classifies names assigned in a function as locals by default; they live in the function's scope and exist only while the function is running. Technically, Python detects locals statically, when it compiles the def's code, rather than by noticing assignments as they happen at runtime. This can also lead to confusion if it's not understood. For example, watch what happens if you add an assignment to a variable after a reference:

>>> X = 99
>>> def func():
...     print X      # Does not yet exist
...     X = 88       # Makes X local in entire def
... 
>>> func()           # Error!

You get an undefined name error, but the reason is subtle. While compiling this code, Python sees the assignment to X and decides that X will be a local name everywhere in the function. But later, when the function is actually run, the assignment hasn't yet happened when the print executes, so Python raises an undefined name error (specifically, an UnboundLocalError in Python 2.0 and later).

Really, the previous example is ambiguous: did you mean to print the global X and then create a local X, or is this a genuine programming error? If you really mean to print global X, you need to declare it in a global statement, or reference it through the enclosing module name.

Defaults and Mutable Objects

Default argument values are evaluated and saved once, when the def statement is run, not each time the function is called. That's usually what you want, but since defaults retain the same object between calls, you have to be mindful about changing mutable defaults. For instance, the following function uses an empty list as a default value and then changes it in place each time the function is called:

>>> def saver(x=[]):   # Saves away a list object
...     x.append(1)    # and changes it each time
...     print x
...
>>> saver([2])         # Default not used
[2, 1]
>>> saver()            # Default used
[1]
>>> saver()            # Grows on each call!
[1, 1]
>>> saver()
[1, 1, 1]

Some see this behavior as a feature — because mutable default arguments retain their state between function calls, they can serve some of the same roles as static local function variables in the C language. However, this can seem odd the first time you run into it, and there are simpler ways to retain state between calls in Python (e.g., classes).
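
As a sketch of the class-based alternative mentioned above (Python 3 syntax; the class name Saver and its method are hypothetical), state can live in an instance attribute instead of in a default argument, which also gives each instance its own independent list:

```python
class Saver:
    def __init__(self):
        self.items = []            # state kept per instance, created on each construction

    def save(self):
        self.items.append(1)       # grows across calls, but only for this instance
        return self.items

first = Saver()
second = Saver()
first.save()
first.save()
print(first.items, second.items)
```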

To avoid this behavior, make copies of the default at the start of the function body with slices or methods, or move the default value expression into the function body; as long as the value resides in code that runs each time the function is called, you'll get a new object each time:

>>> def saver(x=None):
...     if x is None: x = []   # No arg passed?
...     x.append(1)            # Changes new list
...     print x
...
>>> saver([2])                 # Default not used
[2, 1]
>>> saver()                    # Doesn't grow now
[1]
>>> saver()
[1]
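
The copying approach mentioned before this listing might look like the sketch below (Python 3 syntax; saver_copy is a made-up name, and it returns its list instead of printing it). Because the function works on a slice copy, the saved default never changes. One difference from the None test is worth noticing: a list passed in by the caller is left untouched too.

```python
def saver_copy(x=[]):
    x = x[:]               # work on a top-level copy, not the shared default
    x.append(1)            # changes only the copy
    return x

arg = [2]
print(saver_copy(arg), arg)    # the caller's list is not modified
print(saver_copy())            # [1]
print(saver_copy())            # [1] again: the default did not grow
```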

Other Common Programming Traps

Here's a quick survey of other pitfalls we don't have space to cover in detail:

Statement order matters at the top level of a file: because running or importing a file runs its statements from top to bottom, make sure you put unnested calls to functions or classes below the definition of the function or class.
reload doesn't impact names copied with from: reload works much better with the import statement. If you use from statements, remember to rerun the from after the reload, or you'll still have old names.
The order of mixing matters in multiple inheritance: because superclasses are searched left to right, according to the order in the class header line, the leftmost class wins if the same name appears in multiple superclasses.
Empty except clauses in try statements may catch more than you expect: an except clause in a try that names no exception catches every exception, including genuine programming errors and the SystemExit exception raised by the sys.exit() call.
Bunnies can be more dangerous than they seem.
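
Two of the pitfalls in this survey are easy to see in a few lines. The sketch below (Python 3 syntax, with hypothetical function and class names) first shows an empty except clause swallowing the exit request that sys.exit() raises, while a clause naming Exception lets it through (true in Python 2.5 and later, where SystemExit does not inherit from Exception). It then shows the leftmost superclass winning a name clash.

```python
import sys

def swallow_exit():
    try:
        sys.exit(1)            # raises SystemExit
    except:                    # empty except: catches SystemExit too
        return "caught"

def narrow_except():
    try:
        sys.exit(1)
    except Exception:          # SystemExit is not an Exception subclass
        return "caught"

class A:
    def who(self):
        return "A"

class B:
    def who(self):
        return "B"

class C(A, B):                 # A is listed first, so A.who is found first
    pass

print(swallow_exit(), C().who())
```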

Mark Lutz is the world leader in Python training, the author of Python's earliest and best-selling texts, and a pioneering figure in the Python community since 1992.

O'Reilly & Associates recently (in December 2003) released Learning Python, 2nd Edition.