This material, based on live teaching experiences, became a new chapter in the book Learning Python. Its content here was revised for the 4th Edition, and further polished in the 5th Edition.
Spring 2009
We'll dig into more class syntax details in the next chapter. Before we do, though, I'd like to show you a more realistic and practical example of classes in action than we've seen so far. In this chapter, we're going to build a set of classes that do something more concrete—recording and processing information about people. As you'll see, what we call instances and classes in Python can often serve the same roles as records and programs in more traditional terms.
Specifically, in this chapter we're going to code two classes:
Person
—a class that creates and processes information about people
Manager
—a customization of Person that modifies inherited behavior
Along the way we'll make instances of both classes and test out
their functionality. When we're done, I'll show you a nice
example use-case for classes—we'll store our instances in a
shelve
object-oriented database, to make them permanent. That
way, you can use this code as a template for fleshing out a
full-blown personal database written entirely in Python.
Besides actual utility, though, our aim here is also educational: this chapter provides a tutorial on object-oriented programming in Python. Often, people grasp the last chapter's class syntax on paper, but have trouble seeing how to get started when confronted with having to code a new class from scratch. Towards this end, we'll take it one step at a time here, to help you learn the basics—building up the classes gradually, so you can see how their features come together in complete programs.
In the end, our classes will still be relatively small in terms of code, but will demonstrate all of the main ideas in Python's OOP model. Despite its syntax details, Python's class system really is largely just a matter of searching for an attribute in a tree of objects, along with a special first argument for functions.
Okay, so much for the design phase—let's move on to implementation.
Our first task is to start coding the main class, Person
. In your favorite
text editor, open a new file for the code we'll be writing. It's a fairly
strong convention in Python to begin module names with a lowercase letter,
and classes with uppercase; like the name of self
arguments, this is not
required by the language, but it's so common that deviating might be confusing
to people who later read your code. To conform, we'll call our new
module file person.py
, and our class within it Person
, like this:
    # file person.py (start)
    class Person:
We can code any number of functions and classes in a single module file
in Python, and this one's person.py
name might not make much sense if we
add unrelated components to it later. For now, we'll assume everything in it
will be Person
-related. It probably should be anyhow—as we've learned,
modules tend to work best when they have a single, cohesive purpose.
Now, the first thing we want to do with our Person
class is record basic
information about people—to fill out record fields, if you will.
Of course, these are known as instance object attributes in Python-speak,
and generally are created by assignment to self
attributes in class
method functions. And the normal way to give instance attributes their
first value is to assign them to self
in the __init__
constructor method—code run automatically by Python each time
an instance is created. Let's add one to our class:
    # add record field initialization
    class Person:
        def __init__(self, name, job, pay):    # constructor takes 3 arguments
            self.name = name                   # fill out fields when created
            self.job = job                     # self is the new instance object
            self.pay = pay
This is a very common coding pattern—we pass in the data to be attached
to an instance as arguments to the constructor method, and assign them
to self
to retain them permanently. In OO terms, self
is the newly created
instance, and name
, job
, and pay
become state information—descriptive
data saved on an object for later use. Although other techniques such as enclosing
scope references can save details too, instance attributes make this very explicit,
and easy to understand.
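To make the contrast concrete, here is a brief sketch (not part of this chapter's running example; make_person is a made-up name) comparing state kept in an enclosing scope with state kept as instance attributes:

```python
def make_person(name):
    # state retained implicitly, in the enclosing function's scope
    def last_name():
        return name.split()[-1]
    return last_name

class Person:
    # state retained explicitly, as attributes on the instance
    def __init__(self, name):
        self.name = name    # visible to clients as person.name

bob = make_person('Bob Smith')
print(bob())                        # closure-based state: Smith
print(Person('Bob Smith').name)     # attribute-based state: Bob Smith
```

Both forms retain the data between calls, but only the instance attribute is directly inspectable from outside.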
Notice how the argument names appear twice here. This code might seem a
bit redundant at first, but it's not. The job
argument, for example,
is a local variable in the scope of the __init__
function, but self.job
is an attribute of the instance that's the implied subject of the method
call. They are two different variables, which happen to have the same name.
By assigning the job
local to the self.job
attribute with self.job = job
,
we save the passed-in job on the instance for later use. As usual in Python,
where a name is assigned, or what object it is assigned to, determines what it means.
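A quick sketch (illustrative only, not part of the chapter's running code) shows that the local and the attribute really are two distinct variables:

```python
class Person:
    def __init__(self, name, job, pay):
        self.job = job       # attribute: saved on the instance, outlives the call
        job = 'rebound'      # rebinding the local does not touch self.job
        self.name = name
        self.pay = pay

rec = Person('Bob Smith', 'dev', 100000)
print(rec.job)    # prints: dev -- the attribute kept the passed-in value
```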
Speaking of arguments, there's really nothing magical about __init__
,
apart from the fact that it's called automatically when an instance is
made, and has a special first argument. Despite its weird name, it's a normal
function, and supports all the features of functions you've already learned.
We can, for example, provide defaults for some of its arguments, so
they need not be provided in cases where their values aren't available or useful.
To demonstrate, let's make the job
argument optional—it will default to None
,
meaning the person being created is not (currently) employed. If the job
defaults
to None
, we'll probably want to default the pay
to zero too for consistency
(unless some of the people you know manage to get paid without having a job!).
Really, we have to, because arguments in a function's header after the first default
must all have defaults too, according to Python's syntax rules:
    # add defaults for constructor arguments
    class Person:
        def __init__(self, name, job=None, pay=0):    # normal function args
            self.name = name
            self.job = job
            self.pay = pay
What this code means is that we'll need to pass in a name
when making Persons,
but job
and pay
are now optional; they'll default to None
and 0
if omitted, respectively.
The self
argument, as usual, is filled in by Python automatically to refer
to the instance object.
Okay so far; this class doesn't do much yet—it essentially just fills out the fields of a new record—but it's a real working class. At this point, we could add more code to it for more features, but we won't. As you've probably begun to appreciate already, programming in Python is really a matter of incremental prototyping—you write some code, test it, write more code, test again, and so on. Because Python provides both an interactive session and nearly immediate turnaround after code changes, it's more natural to test as you go, than to write a huge amount of code to test all at once.
Before adding more features, then, let's test what we've got so far, by making a few instances of our class and displaying their attributes as created by the constructor. We could do this interactively, but as you've also probably surmised by now, interactive testing has its limits—it gets tedious to have to reimport modules and retype test cases each time you start a new testing session. More commonly, Python programmers use the interactive prompt for simple one-off tests, but do more substantial testing by writing code at the bottom of the file which contains the objects to be tested, like this:
    # add incremental self-test code
    class Person:
        def __init__(self, name, job=None, pay=0):
            self.name = name
            self.job = job
            self.pay = pay

    bob = Person('Bob Smith')                           # test the class
    sue = Person('Sue Jones', job='dev', pay=100000)    # runs __init__ automatically
    print(bob.name, bob.pay)                            # fetch attached attributes
    print(sue.name, sue.pay)                            # sue's and bob's attrs differ
Notice how we make two instances here: bob
and sue
each have their own set
of self
instance attributes, so each is an independent record of information.
The bob
object accepts defaults for job
and pay
,
but sue
provides them explicitly.
Also note how we use keyword arguments when making sue
; we could pass by position
instead, but the keywords may help remind us later what the data is (and allow
us to pass the arguments in any left-to-right order we like). Again,
despite its unusual name, __init__
is a normal function, supporting everything
you already know about functions—including both defaults and pass-by-name
keyword arguments. When this file runs, the test code at the bottom prints
attributes of our two objects:
    C:\misc> person.py
    Bob Smith 0
    Sue Jones 100000
You can also type this file's test code at Python's interactive prompt
(assuming you import the Person
class there first), but coding canned tests
inside the module file like this makes it much easier to rerun them in the future.
As is, the test code at the bottom of the file works, but there's a big catch—its
top-level print statements run both when the file is executed as
a script and when it is imported as a module. As a result, if we ever decide to import the class in this file in order
a module. As a result, if we ever decide to import the class in this file in order
to use it somewhere else (and we will later in this chapter), we'll see the output
of its test code every time the file is imported. That's not very good software citizenship:
client programs probably don't care about our internal tests, and won't want to see our
output mixed in with their own.
Although we could split the test code off to a separate file, it's often more convenient
to code tests in the same file as the items to be tested. It would be better to arrange
to run the test statements at the bottom only when the file is run for testing,
not when the file is imported. That's exactly what the __name__
check is designed for,
as you learned earlier in the modules part of the book. Here's what this addition
looks like:
    # allow this file to be imported as well as run/tested
    class Person:
        def __init__(self, name, job=None, pay=0):
            self.name = name
            self.job = job
            self.pay = pay

    if __name__ == '__main__':    # when run for testing only
        # self-test code
        bob = Person('Bob Smith')
        sue = Person('Sue Jones', job='dev', pay=100000)
        print(bob.name, bob.pay)
        print(sue.name, sue.pay)
Now, running the file as a top-level script tests it, but importing it as a library of classes later does not—exactly what we're after:
    C:\misc> person.py
    Bob Smith 0
    Sue Jones 100000

    c:\misc> python
    Python 3.0.1 (r301:69561, Feb 13 2009, 20:04:18) ...
    >>> import person
    >>>
When imported, the file now defines the class, but does not use it.
When run, this file creates two instances of our class as before, and
prints two attributes of each. Because each instance is an independent
namespace object, the values of their attributes differ—bob
's name
is not sue
's, and sue
's pay is not bob
's.
Fine point: I'm running all the code in this chapter under Python 3.0,
and using the 3.0 print function call syntax. If you run under 2.6 the
code will work as is, but you'll notice parentheses around some output lines,
because the extra parentheses in prints turn multiple items into a tuple:
    c:\misc> c:\python26\python person.py
    ('Bob Smith', 0)
    ('Sue Jones', 100000)
If this difference is the sort of detail that might keep you awake at night,
simply remove the parentheses to use 2.6 print statements. You can also avoid
the extra parentheses portably by using formatting to yield a single object to
print; either of the following works in both 2.6 and 3.0, though the method
form is newer:
    print('{0} {1}'.format(bob.name, bob.pay))    # new format method
    print('%s %s' % (bob.name, bob.pay))          # format expression
All looks good so far—at this point, our class is essentially a record factory; it creates and fills out fields of records (attributes of instances, in more Pythonic terms). Even as limited as it is, though, we can still run some operations on its objects. Although classes add an extra layer of structure, they ultimately do most of their work by embedding and processing basic core data types like lists and strings. In other words, if you already know how to use Python's simple core types, you already know much of the Python class story; classes are really just a minor structural extension.
For example, the name
field of our objects is a simple string; we can
extract last names of our objects by splitting on spaces and indexing—all
core data type operations, which work whether their subjects are embedded
in class instances or not:
    >>> name = 'Bob Smith'       # simple string, outside class
    >>> name.split()             # extract last name
    ['Bob', 'Smith']
    >>> name.split()[-1]         # or [1], if always just two parts
    'Smith'
Similarly, we can give an object a pay raise by updating its pay
field—that
is, by changing its state information in-place with an assignment. This also
uses basic operations which work on Python's core objects, whether they are
stand-alone, or embedded in a class structure:
    >>> pay = 100000             # simple variable, outside class
    >>> pay *= 1.10              # give a 10% raise
    >>> print(pay)               # or pay = pay * 1.10, if you like to type
    110000.0                     # or pay = pay + (pay * .10), if you _really_ do
To apply these operations to the Person objects created by our script, simply
do to bob.name
and sue.pay
what we just did to name
and pay
.
The operations are the same, but the subject objects are attached to attributes
in our class structure:
    # process embedded built-in types: strings, mutability
    class Person:
        def __init__(self, name, job=None, pay=0):
            self.name = name
            self.job = job
            self.pay = pay

    if __name__ == '__main__':
        bob = Person('Bob Smith')
        sue = Person('Sue Jones', job='dev', pay=100000)
        print(bob.name, bob.pay)
        print(sue.name, sue.pay)
        print(bob.name.split()[-1])    # extract object's last name
        sue.pay *= 1.10                # give this object a raise
        print(sue.pay)
We've added the last two lines here; when run, we extract bob
's last
name by using basic string and list operations, and give sue
a pay
raise by modifying her pay attribute in-place with basic number operations.
In a sense, sue
is also a mutable object—her state changes in-place
just like a list after an append()
call:
    Bob Smith 0
    Sue Jones 100000
    Smith
    110000.0
The preceding works as planned, but if you show its code to a veteran software developer, they will probably tell you that its general approach is not a great idea in practice. Hard-coding operations like these outside of the class can lead to maintenance problems in the future.
For example, what if you've hard-coded the last-name extraction formula at many different places in your program? If you ever need to change the way it works (to support a new name structure, for instance), you'll need to hunt down and update every occurrence. Similarly, if the pay-raise code ever changes (e.g., to require approval or database updates), you may have multiple copies to modify. Just finding all the appearances of such code may be problematic in larger programs—they may be scattered across many files, split into individual steps, and so on.
What we really want to do here is employ a software design concept known as encapsulation. Encapsulation means that we wrap up operation logic behind interfaces, such that each operation is coded only once in our program. That way, there is just one copy to update in the future as our needs change. Moreover, we're free to change the single copy's internals almost arbitrarily, without breaking the code that uses it.
In Python terms, we want to code operations on objects in class methods, instead of littering them throughout our program. In fact, this is one of the things that classes are very good at—factoring code to remove redundancy, and thus optimize maintainability. As an added bonus, by turning operations into methods, they can be applied to any instance of the class, not just those that they've been hard-coded to process.
This is all simpler in code than it may sound in theory. The following achieves encapsulation, by moving the two operations from code outside the class, into class methods. While we're at it, let's change our self-test code to use the new methods we're creating, instead of hard-coding operations:
    # add methods to encapsulate operations for maintainability
    class Person:
        def __init__(self, name, job=None, pay=0):
            self.name = name
            self.job = job
            self.pay = pay
        def lastName(self):                             # behavior methods
            return self.name.split()[-1]                # self is implied subject
        def giveRaise(self, percent):
            self.pay = int(self.pay * (1 + percent))    # must change here only

    if __name__ == '__main__':
        bob = Person('Bob Smith')
        sue = Person('Sue Jones', job='dev', pay=100000)
        print(bob.name, bob.pay)
        print(sue.name, sue.pay)
        print(bob.lastName(), sue.lastName())           # use the new methods
        sue.giveRaise(.10)                              # instead of hard-coding
        print(sue.pay)
As we've learned, methods are simply normal functions attached to classes,
and designed to process an instance of the class. The instance is the subject of
the method call, and is passed to the method's self
argument automatically.
The transformation to methods in this version is straightforward. The new lastName()
,
for example, simply does to self
what the previous version hard-coded for bob
,
because self
is the implied subject when the method is called. lastName()
also
returns the result, because this operation is a called function now; it computes
a value for its caller to use, even if it is just to be printed. Similarly, the
new giveRaise()
just does to self
what we had done to sue
before.
When run now, our file's output is similar to before—we've mostly just refactored the code for easier changes in the future, not altered its behavior:
    Bob Smith 0
    Sue Jones 100000
    Smith Jones
    110000
A few coding details are worth pointing out here. First, notice how sue's pay is still
an integer after a pay raise—we convert the math result back to an integer by calling int()
in the method. Changing the value to floating point is probably not a significant concern for most purposes
(int and float objects have the same interfaces and can be mixed within expressions), and we may
need to address rounding issues in the future, but we'll simply truncate any cents here
for this example.
Second, notice how we're also printing sue
's last name this time—because the last-name
logic has been encapsulated in a method, we get to use it on any instance of the class.
As we've seen, Python arranges to tell a method which instance to process, by automatically
passing it in to the first argument, usually called self
. Specifically:
In bob.lastName(), bob is the implied subject passed to self.
In sue.lastName(), sue goes to self instead.
Trace through these calls to see how the instance winds up in self. The net effect
is that the method fetches the name of the implied subject each time. The same happens
for giveRaise(). We could give bob a raise the same way, by calling giveRaise() for his
instance too; but unfortunately, bob's zero pay will prevent him from getting a
raise as currently coded, because multiplying zero by any percentage still yields zero
(something we may want to address in a future 2.0 release of our software).
Finally, notice how the giveRaise()
method assumes that percent
is passed in as a
floating-point number between 0 and 1. That may be too radical an assumption in the real
world (a 1000% raise would probably be a bug for most of us!), and we might want to test or at
least document this in a future iteration of this code; we'll let it pass for this prototype.
Also stay tuned for a rehash of this idea in a later chapter in this book, where we'll code
something called function decorators, and explore Python's assert
statement—alternatives
which can do the validity test for us automatically during development.
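As a preview, here's one way that validity test might look with an assert statement (a hedged sketch only; the chapter's actual code deliberately omits the check):

```python
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def giveRaise(self, percent):
        # fail loudly during development if percent isn't a 0..1 fraction
        assert 0 <= percent <= 1, 'percent invalid: %s' % percent
        self.pay = int(self.pay * (1 + percent))

sue = Person('Sue Jones', 'dev', 100000)
sue.giveRaise(.10)     # fine: a 10% raise
print(sue.pay)         # 110000
```

Calling sue.giveRaise(10) here would raise an AssertionError instead of silently granting a 1000% raise.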
Let's review where we are at: we now have a fairly full-featured class, which generates and initializes instances, along with two new bits of behavior for processing instances, in the form of methods. So far so good.
As is, though, testing is still a bit less convenient than it need be—to
trace our objects, we have to manually fetch and print individual attributes
(e.g., bob.name
, sue.pay
). It would be nice if displaying an instance
all at once actually gave us some useful information. Unfortunately, the
default display format for an instance object isn't very good—it displays
the object's class name, and its address in memory (which is essentially useless
in Python, except as a unique identifier).
To see this, change the last line in our script to print(sue)
so it displays
the object as a whole; here's what you'll get (the output says sue
is an "object"
in 3.0, and an "instance" in 2.6):
    Bob Smith 0
    Sue Jones 100000
    Smith Jones
    <__main__.Person object at 0x02614430>
Fortunately, it's easy to do better by employing operator overloading—coding
methods in a class which intercept and process built-in operations when run on the
class's instances. Specifically, we can make use of what is probably the second most
commonly used operator overloading method in Python behind __init__
: the __str__
method
introduced in the prior chapter. __str__
is run automatically every time an instance is
converted to its print string. Because that's what printing an object does, the net
effect is that printing an object displays whatever is returned by the object's
__str__
method, if it either has one itself, or inherits one from a superclass
(double-underscore names are inherited just like any other).
Technically speaking, the __init__
constructor method we've already coded is operator
overloading too—it is run automatically at construction time, to initialize a newly created
instance. Constructors are so common, though, that they almost seem like a special case.
More focused methods like __str__
allow us to tap into specific operations, and
provide specialized behavior when our objects are used in those contexts.
Let's put this into code: The following extends our class to give a custom display that lists attributes when our class's instances are displayed as a whole, instead of relying on the less useful default display:
    # add __str__ overload method for printing objects
    class Person:
        def __init__(self, name, job=None, pay=0):
            self.name = name
            self.job = job
            self.pay = pay
        def lastName(self):
            return self.name.split()[-1]
        def giveRaise(self, percent):
            self.pay = int(self.pay * (1 + percent))
        def __str__(self):                                       # added method
            return '[Person: %s, %s]' % (self.name, self.pay)    # string to print

    if __name__ == '__main__':
        bob = Person('Bob Smith')
        sue = Person('Sue Jones', job='dev', pay=100000)
        print(bob)
        print(sue)
        print(bob.lastName(), sue.lastName())
        sue.giveRaise(.10)
        print(sue)
Notice that we're doing string %
formatting to build the display
string in __str__
here; at the bottom, classes use built-in type objects
and operations like these to get their work done. Again, everything
you've already learned about both built-in types and functions applies
to class-based code. Classes largely just add an additional layer of
structure that packages functions and data together, and supports
extensions.
We've also changed our self-test code to print objects directly, instead
of printing individual attributes. When run, the output is more coherent
and meaningful now; the "[...]" lines are returned by our new __str__
,
run automatically by print operations:
    [Person: Bob Smith, 0]
    [Person: Sue Jones, 100000]
    Smith Jones
    [Person: Sue Jones, 110000]
Subtle point: as we learned in the prior chapter, a related overloading method,
__repr__
, provides an as-code low level display of an object when present.
Sometimes classes provide both a __str__
for user-friendly displays, and a __repr__
with extra details for developers to view. Because printing runs __str__
and the
interactive prompt echoes results with __repr__
, this can provide both target
audiences with an appropriate display. Since we're not interested in displaying
an as-code format, __str__
is sufficient for our class.
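For illustration, a toy class providing both methods might look like the following (Coin is a made-up example, unrelated to our Person code):

```python
class Coin:
    def __init__(self, cents):
        self.cents = cents
    def __str__(self):                # user-friendly: used by print()
        return '%d cents' % self.cents
    def __repr__(self):               # as-code: used by interactive echoes
        return 'Coin(%d)' % self.cents

c = Coin(25)
print(str(c))     # 25 cents
print(repr(c))    # Coin(25)
```

Printing c runs __str__, while echoing c at the interactive prompt runs __repr__, so each audience sees an appropriate display.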
At this point, our class captures much of the OOP machinery in Python:
it makes instances, provides behavior in methods, and even does a bit of
operator overloading now to intercept print operations in __str__
.
It effectively packages our data and logic together into a
single and self-contained software component,
making it easy to locate code, and straightforward to change it in the future.
By allowing us to encapsulate behavior, it also has allowed us to factor that
code to avoid redundancy, and its associated maintenance headaches.
The only major OOP concept it does not yet capture is customization by inheritance. In some sense, we're already doing inheritance, because instances inherited methods from their class. To demonstrate the real power of OOP, though, we need to define a superclass/subclass relationship that allows us to extend our software and replace bits of inherited behavior. That's the main idea behind OOP, after all; by fostering a coding model based upon customization of work already done, it can dramatically cut development time.
As a next step, then, let's put OOP's methodology to use, and customize our
Person
class in some way by extending our software hierarchy. For the purpose
of this tutorial, let's define a subclass of Person
called Manager
that replaces the inherited giveRaise()
method with a more specialized version.
Our new class begins as follows:
class Manager(Person): # define a subclass of Person
This code means that we're defining a new class named Manager
, which
inherits from, and may add customizations to, superclass Person
. In
plain terms, a Manager
is almost like a Person
(admittedly, a very long
journey for a very small joke...), but Manager
has a custom way to give
raises.
For the sake of argument, let's assume that when a manager gets a raise,
they receive the passed-in percentage as usual, but also get an extra bonus
which defaults to 10%. For instance, if a Manager's raise is specified as
10%, they will really get 20%. (Any relation to Persons living or dead is,
of course, strictly coincidental.) Our new method begins as follows;
because this redefinition of giveRaise() will be closer to Manager instances
than Person's version in the class tree, it effectively replaces, and thereby customizes,
the operation. That is, the lowest version of the name wins, by the
inheritance search rule:
    class Manager(Person):                          # inherit Person attrs
        def giveRaise(self, percent, bonus=.10):    # redefine to customize
Now, there are two ways we might code this Manager
customization:
a good way and a bad way. Let's start with the bad way, since it
might be a bit easier to understand. The bad way is to cut and paste
the code of giveRaise()
in Person
and modify it for Manager
, like this:
    class Manager(Person):
        def giveRaise(self, percent, bonus=.10):
            self.pay = int(self.pay * (1 + percent + bonus))    # bad: cut-and-paste
This works as advertised—when we later call the giveRaise()
method of
a Manager
instance, it will run this custom version, which tacks on
the extra bonus. So what's wrong with something that runs correctly?
The problem here is a very general one: any time you copy code with cut and paste, you essentially double your maintenance effort in the future. Think about it: Because we copied the original version, if we ever have to change the way raises are given (and we probably will), we now have to change it in two different places, not one. Although this is a small and artificial example, it's also representative of a universal issue—any time you're tempted to program by copying code this way, you probably want to look for a better approach.
What we really want to do here is somehow augment the
original giveRaise(), instead of replacing it altogether. And the
good way to do that in Python is by calling the original
version directly, with augmented arguments, like this:
    class Manager(Person):
        def giveRaise(self, percent, bonus=.10):
            Person.giveRaise(self, percent + bonus)    # good: augment original
This code leverages the fact that a class method can always be called either
through an instance—the usual way, where Python sends the instance
to the self
argument automatically; or through the class—the less
common scheme, where you must pass the instance manually. In more symbolic
terms, recall that a normal method call like this:
instance.method(args...)
is automatically translated by Python into the equivalent form:
class.method(instance, args...)
where the class containing the method to be run is determined by the
inheritance search rule applied to the method's name. You can code
either form in your script, but there is a slight asymmetry between the
two—you must remember to pass along the instance manually if you call
through the class directly. The method always needs a subject instance one
way or another, and Python provides it automatically only for calls made through
an instance. For calls through the class name, you need to send an
instance to self
yourself; for code inside a method like giveRaise()
,
self
is the instance to pass along.
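The equivalence is easy to verify; a minimal version of our Person class is repeated here so the snippet stands alone:

```python
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))

sue = Person('Sue Jones', 'dev', 100000)
sue.giveRaise(.10)              # through the instance: self filled in for us
print(sue.pay)                  # 110000
Person.giveRaise(sue, .10)      # through the class: pass the instance manually
print(sue.pay)                  # 121000
```

Both calls run the same method code; only the way the instance reaches self differs.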
Calling through the class directly effectively subverts inheritance, and
kicks the call higher up the class tree to run a specific version. In our
case, we can use this technique to invoke the default giveRaise() in Person,
even though it's been redefined at the Manager level. In some sense, we
must call through Person this way, because a self.giveRaise() inside
Manager's giveRaise() code would loop—since self already is a Manager,
self.giveRaise() would resolve again to Manager.giveRaise(), and so on,
recursively, until Python's recursion limit halts the program with an error!
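For completeness, Python's built-in super() offers another way to route the call one level up without naming Person explicitly. This is shown as an alternative sketch only; this chapter sticks with the explicit class-name form, and the zero-argument super() call works in 3.X but not 2.6:

```python
class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))

class Manager(Person):
    def giveRaise(self, percent, bonus=.10):
        super().giveRaise(percent + bonus)    # 3.X: find the superclass version

tom = Manager('Tom Jones', 'mgr', 50000)
tom.giveRaise(.10)
print(tom.pay)    # 60000
```

The explicit Person.giveRaise(self, ...) form used in this chapter makes the target class obvious; super() becomes more useful in multiple-inheritance trees.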
This "good" version may seem like a small difference in code, but it can
make a huge difference for future code maintenance—because the giveRaise()
logic lives in just one place now (Person
's method), we have only one version
to change in the future as needs evolve. And really, this form captures
our intent more directly anyhow—performing the standard giveRaise()
operation, but simply tacking on an extra bonus. Here's our entire module
with this step applied:
    # add customization of one behavior in a subclass
    class Person:
        def __init__(self, name, job=None, pay=0):
            self.name = name
            self.job = job
            self.pay = pay
        def lastName(self):
            return self.name.split()[-1]
        def giveRaise(self, percent):
            self.pay = int(self.pay * (1 + percent))
        def __str__(self):
            return '[Person: %s, %s]' % (self.name, self.pay)

    class Manager(Person):
        def giveRaise(self, percent, bonus=.10):       # redefine at this level
            Person.giveRaise(self, percent + bonus)    # call Person's version

    if __name__ == '__main__':
        bob = Person('Bob Smith')
        sue = Person('Sue Jones', job='dev', pay=100000)
        print(bob)
        print(sue)
        print(bob.lastName(), sue.lastName())
        sue.giveRaise(.10)
        print(sue)
        tom = Manager('Tom Jones', 'mgr', 50000)       # make a Manager
        tom.giveRaise(.10)                             # runs custom version
        print(tom.lastName())                          # runs inherited method
        print(tom)                                     # runs inherited __str__
To test our Manager
subclass customization, we've also added self-test code that
makes a Manager
, calls its methods, and prints it. Here's the new version's
output:
    [Person: Bob Smith, 0]
    [Person: Sue Jones, 100000]
    Smith Jones
    [Person: Sue Jones, 110000]
    Jones
    [Person: Tom Jones, 60000]
All looks good here: bob
and sue
are as before, and when tom
the Manager
is given a 10% raise, he really gets 20% (his pay changes from $50K to $60K), because
the customized giveRaise()
in Manager
is run for him only. Also notice how printing
tom
as a whole at the end of the test code displays the nice format defined in
Person
's __str__
: Manager
objects get this, lastName()
, and the __init__
constructor
method's code "for free" from Person
, by inheritance.
If you want to make this acquisition of inherited behavior even more striking, add the following sort of code at the end of our file:
    if __name__ == '__main__':
        ...
        print('--All three--')
        for object in (bob, sue, tom):    # process objects generically
            object.giveRaise(.10)         # run this object's giveRaise
            print(object)                 # run the common __str__
When added, here's the new output:
[Person: Bob Smith, 0]
[Person: Sue Jones, 100000]
Smith Jones
[Person: Sue Jones, 110000]
Jones
[Person: Tom Jones, 60000]
--All three--
[Person: Bob Smith, 0]
[Person: Sue Jones, 121000]
[Person: Tom Jones, 72000]
In the added code, object
is either a Person
or a Manager
, and Python runs
the appropriate giveRaise()
automatically—our original version in Person
for
bob
and sue
, and our customized version in Manager
for tom
. Trace the method
calls yourself to see how Python selects the right giveRaise()
for each object.
This is just Python's notion of polymorphism we met earlier in the book at
work again—what giveRaise()
does depends on what you do it to. Here it's made
all the more obvious when it selects from code we've written ourselves in classes.
The practical effect in this code is that sue
gets another 10% but tom
another 20%,
because giveRaise()
is dispatched based upon the object's type. As we've learned,
polymorphism is at the heart of Python's flexibility. Passing any of our three objects
to a function that calls a giveRaise()
method, for example, would have the same effect:
the appropriate version would be run automatically, depending on which type of object
was passed.
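The point about passing objects to a function can be sketched concretely. The giveRaiseTo function name below is hypothetical, not part of the chapter's code, but the classes mirror those coded above:

```python
# sketch of type-based dispatch: giveRaiseTo is a hypothetical
# function, but it runs the right giveRaise for any object passed
class Person:
    def __init__(self, name, pay=0):
        self.name = name
        self.pay = pay
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))

class Manager(Person):
    def giveRaise(self, percent, bonus=.10):
        Person.giveRaise(self, percent + bonus)    # 10% extra

def giveRaiseTo(obj, percent):
    obj.giveRaise(percent)    # runs whichever version obj's class defines

sue = Person('Sue Jones', 100000)
tom = Manager('Tom Jones', 50000)
giveRaiseTo(sue, .10)         # Person's version: +10%
giveRaiseTo(tom, .10)         # Manager's version: +20%
print(sue.pay, tom.pay)       # 110000 60000
```

Because the dispatch happens inside the method call itself, the function never needs to test which type of object it was given.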
On the other hand, printing runs the same __str__
for all three objects, because it's
coded just once in Person
. Manager
both specializes and applies the code we originally
wrote in Person
. Although this example is small, it's already leveraging OOP's talent
for code customization and reuse. With classes, this almost seems automatic at times.
In fact, classes can be even more flexible than our example implies. In general,
classes can inherit, customize, or extend existing code in superclasses.
For example, although we're focused on customization here, we can also add unique methods to
Manager
that are not present in Person
, if Manager
s require something completely different
(Python namesake reference intended). The following snippet illustrates: giveRaise()
redefines a superclass method to customize, but someThingElse()
defines something
new to extend:
class Person:
    def lastName(self): ...
    def giveRaise(self): ...
    def __str__(self): ...

class Manager(Person):                     # inherit
    def giveRaise(self, ...): ...          # customize
    def someThingElse(self, ...): ...      # extend

tom = Manager()
tom.lastName()        # <= inherited verbatim
tom.giveRaise()       # <= customized version
tom.someThingElse()   # <= extension here
print(tom)            # <= inherited overload method
Extra methods like this code's someThingElse()
would extend the existing software,
and be available on Manager
objects only, not on Person
s. For the purposes of
this tutorial, however, we'll limit our scope to customizing some of Person
's
behavior by redefining it, not adding to it.
As is, our code may be small, but it's fairly functional. And really, it already illustrates the main point behind OOP in general: in OOP, we program by customizing what has already been done, rather than copying or changing existing code. This isn't always an obvious win to newcomers at first glance, especially given the extra coding requirements of classes. But on balance, the programming style implied by classes can cut development time radically compared to other approaches.
For instance, in our example we could theoretically have implemented a
custom giveRaise()
operation without subclassing, but none of the other
options yield code as optimal as ours:
- Although we could have coded Manager from scratch as new, independent code, we would have had to reimplement all the behaviors in Person that are the same for Managers.

- Although we could have simply changed the existing Person class in place to meet the requirements of Manager's giveRaise(), doing so would probably break the places where we still need the original Person behavior.

- Although we could have simply copied the Person class code in its entirety, renamed the copy to Manager, and changed its giveRaise(), doing so would introduce code redundancy that would double our work in the future: changes made to Person would not be picked up automatically, but would have to be manually propagated to Manager's code. As usual, cut-and-paste may seem quick now, but it doubles your work down the road.
Instead, the customizable hierarchies we can build with classes provide a much better solution for software that will evolve over time. No other tools in Python support this development mode. Because we can tailor and extend our prior work by coding new subclasses, we can leverage what we've already done, rather than starting from scratch each time, breaking what already works, or introducing multiple copies of code that may all have to be updated in the future. When done right, OOP is a powerful programmer's ally.
Our code works as is, but if you study the current version closely, you may be struck
by something a bit odd—it seems a bit pointless to have to provide a mgr
job name
for Manager
objects when we create them: this is already implied by the class itself.
It would be better if we could somehow fill this in automatically when a Manager
is made.
The trick we need to improve on this turns out to be the same as the one we
employed in the prior section: we want to customize the constructor logic for
Managers
in such a way as to provide a job name automatically. In terms of code,
we want to redefine an __init__
method at Manager
that provides the mgr
string
to job
for us. And just like the giveRaise()
customization, we also want to run the original
__init__
in Person
by calling through the class name, so it still initializes our
objects' state information attributes.
The following extension will do the job—we've coded the new Manager
constructor,
and changed the call that creates tom
to not pass in the mgr
job name:
# add customization of constructor in a subclass

class Person:
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay = int(self.pay * (1 + percent))
    def __str__(self):
        return '[Person: %s, %s]' % (self.name, self.pay)

class Manager(Person):
    def __init__(self, name, pay):                 # redefine constructor
        Person.__init__(self, name, 'mgr', pay)    # run original with 'mgr'
    def giveRaise(self, percent, bonus=.10):
        Person.giveRaise(self, percent + bonus)

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob)
    print(sue)
    print(bob.lastName(), sue.lastName())
    sue.giveRaise(.10)
    print(sue)
    tom = Manager('Tom Jones', 50000)    # job name not needed:
    tom.giveRaise(.10)                   # implied/set by class
    print(tom.lastName())
    print(tom)
Again, we're using the same technique to augment the __init__
constructor here
that we used for giveRaise()
earlier—running the superclass version by calling
through the class name directly, and passing the self
instance along explicitly.
Although the constructor has a strange name, the effect is identical. Because
we need Person
's construction logic to run too (to initialize instance attributes),
we really have to call it this way; otherwise, instances would not have any
attributes attached.
Calling superclass constructors from redefinitions this way turns out to be a
very common coding pattern in Python. By itself, Python uses inheritance to
look for and call only one __init__
method at construction time—the lowest
one in the class tree. If you need higher __init__
methods to be run at construction
time (and you usually do), you must call them manually through the superclass's name.
The upside to this is that you can be explicit about which arguments to pass up to the
superclass's constructor, and can choose not to call it at all: by skipping the
superclass constructor call, you can replace its logic altogether, rather than augmenting it.
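As a minimal sketch of this choice, using two hypothetical classes (not part of the chapter's code): calling the superclass constructor augments its logic, while omitting the call replaces it entirely:

```python
# sketch: the superclass __init__ runs only if called explicitly;
# Base, Augments, and Replaces are illustrative names
class Base:
    def __init__(self):
        self.common = 'set by Base'

class Augments(Base):
    def __init__(self):
        Base.__init__(self)            # run Base's logic too
        self.extra = 'set by Augments'

class Replaces(Base):
    def __init__(self):                # Base.__init__ never runs here
        self.extra = 'set by Replaces'

a, r = Augments(), Replaces()
print(hasattr(a, 'common'))    # True: Base's constructor ran
print(hasattr(r, 'common'))    # False: Base's logic was replaced
```

In our Person/Manager code we want augmentation, which is why Manager's __init__ calls Person.__init__ explicitly.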
The output of this file's self-test code is the same as before—we haven't changed what the code does, we've simply restructured it to remove some logical redundancy:
[Person: Bob Smith, 0]
[Person: Sue Jones, 100000]
Smith Jones
[Person: Sue Jones, 110000]
Jones
[Person: Tom Jones, 60000]
In this complete form, despite their small size, our classes capture nearly all the important concepts in Python's OOP machinery.
In the end, most of these concepts are based upon three simple ideas: the inheritance search for attributes in object trees, the special self argument in methods, and operator overloading's automatic dispatch to methods.
Along the way, we've also made our code easy to change in the future, by harnessing the class's propensity for factoring code to reduce redundancy. For example, we wrapped up logic in methods, and called back to superclass methods from extensions, to avoid having multiple copies of the same code. Most of these steps were a natural outgrowth of the structuring power of classes.
And by and large, that's all there is to OOP in Python. Classes certainly can become larger than this, and there are some more advanced class concepts such as decorators and metaclasses which we will meet in later chapters. In terms of the basics, though, our classes already do it all. In fact, if you grasp the workings of the classes we've written, most OOP Python code should now be within your reach.
Having said that, I should also tell you that although the basic mechanics of OOP are simple in Python, some of the art in larger programs lies in the way that classes are put together. We're focusing on inheritance in this tutorial because that's the mechanism the Python language provides, but programmers sometimes combine classes in other ways, too. For example, objects are also commonly nested inside each other to build up composites—a technique we'll explore in more detail in the next chapter, and one that is really more about design than about Python.
As a quick example, though, we could use this composition idea to code our Manager
extension by embedding a Person
, instead of inheriting from it. The following
alternative does so by using the __getattr__
operator overloading method we met in the
prior chapter to intercept undefined attribute fetches, and delegate them to the embedded
object with the getattr()
built-in. The giveRaise()
method here still achieves
customization, by changing the argument passed along to the embedded object. In
effect, Manager
becomes a controller layer, which passes calls down to the
embedded object, rather than up to superclass methods:
class Person:
    ...same...

class Manager:
    def __init__(self, name, pay):
        self.person = Person(name, 'mgr', pay)    # embed a person object
    def giveRaise(self, percent, bonus=.10):
        self.person.giveRaise(percent + bonus)    # intercept and delegate
    def __getattr__(self, attr):
        return getattr(self.person, attr)         # delegate all other attrs
    def __str__(self):
        return str(self.person)                   # must overload again here

if __name__ == '__main__':
    ...same...
This works, but requires about twice as much code, and is less well suited than
inheritance to the kinds of direct customizations we meant to express (in fact,
no reasonable Python programmer would code this example this way in practice).
Manager
isn't really a Person
here, so we need extra code to manually dispatch
method calls to the embedded object; operator overloading methods must be redefined;
and adding new Manager
behavior is less straightforward, since state information
is one level removed.
Still, object embedding, and design patterns based upon it, can be a very good
fit when embedded objects require more limited interaction with the container
than direct customization implies. A controller layer like this alternative
Manager
, for example, might come in handy if we want to trace or validate calls
to another object's methods (you'll see this first-hand when we study class decorators
later in the book). Moreover, a hypothetical Department
class like the following
could aggregate other objects in order to treat them as a set; add this to
the bottom of our person.py
file to try this on your own:
...
bob = Person(...)
sue = Person(...)
tom = Manager(...)

class Department:
    def __init__(self, *args):
        self.members = list(args)
    def addMember(self, person):
        self.members.append(person)
    def giveRaises(self, percent):
        for person in self.members:
            person.giveRaise(percent)
    def showAll(self):
        for person in self.members:
            print(person)

development = Department(bob, sue)    # embed objects in a composite
development.addMember(tom)
development.giveRaises(.10)           # runs embedded objects' giveRaise
development.showAll()                 # runs embedded objects' __str__s
Interestingly, this code uses both inheritance and
composition—Department
is a composite that embeds and controls other objects to aggregate,
but the embedded Person
and Manager
objects themselves use inheritance to customize.
As another example, a GUI might similarly use inheritance to customize the behavior
or appearance of labels and buttons, but also composition to build up larger
packages of embedded widgets, such as input forms, calculators and text editors.
The class structure to use depends on the objects you are trying to model.
Design issues like composition are explored in the next chapter, so we'll postpone
further investigations for now. But again, in terms of the basic mechanics of OOP
in Python, our Person
and Manager
classes already tell the entire story. Having
mastered the basics of OOP, though, developing general tools for using it more
easily in your scripts is often a natural next step—and the topic of the next
section.
One final tweak before we throw our objects on a database. Our classes are complete as is, and demonstrate most of the basics of OOP in Python. They still have two remaining issues we probably should iron out, though, before we go live with them:
- First, when we print tom the Manager, the display labels him as a "Person." That's not technically incorrect, since Manager is a kind of customized and specialized Person. Still, it would be more accurate to display an object with the most specific (that is, lowest) class we can.

- Second, the current __str__ displays only the attributes we chose to include in its format, and that might not account for future goals. For example, we can't yet verify that tom's job name has been set to mgr correctly by Manager's constructor, because the __str__ we coded for Person does not print this field. Worse, if we ever expand or otherwise change the set of attributes assigned to our objects in __init__, we'll have to remember to also update __str__ for new names to be displayed, or it will fall out of sync over time. Because changes to __str__ are reflected in the program's output, this redundancy may be more obvious than the other forms we addressed earlier; still, avoiding extra work in the future is a generally good thing.
We can address both issues with Python's introspection tools—special attributes and functions that give us access to some of the internals of objects' implementations. These tools are a bit advanced, and are generally used more by people writing tools for other programmers than by programmers developing applications. Even so, a basic knowledge of some of these tools is useful, because they allow us to write code that processes classes in generic ways. In our code, for example, there are two hooks that can help us out, both of which were introduced in the preceding chapter:
- The built-in instance.__class__ attribute provides a link from an instance to the class it was created from. Classes in turn have a __name__, just like modules, and a __bases__ sequence which provides access to higher superclasses. We can use these here to print the name of the class an instance is made from, instead of one we've hardcoded.

- The built-in object.__dict__ attribute provides a dictionary with one key/value pair for every attribute attached to a namespace object (including modules, classes, and instances). Because it is a dictionary, we can fetch its keys list, index by key, iterate over its keys, and so on, to process all attributes generically. We can use this here to print every attribute in any instance, not just those we hardcode in custom displays.
Here's what these tools look like in action at Python's interactive prompt. Notice how
we load Person
at the interactive prompt with a from
statement here—class names live
in and are imported from a module, exactly like function names and other variables:
>>> from person import Person
>>> bob = Person('Bob Smith')
>>> print(bob)                       # show bob's __str__
[Person: Bob Smith, 0]
>>> bob.__class__                    # show bob's class and its name
<class 'person.Person'>
>>> bob.__class__.__name__
'Person'

>>> list(bob.__dict__.keys())        # attributes are really dict keys
['pay', 'job', 'name']               # use list() to force list in 3.0

>>> for key in bob.__dict__:
        print(key, '=>', bob.__dict__[key])    # index manually
pay => 0
job => None
name => Bob Smith

>>> for key in bob.__dict__:
        print(key, '=>', getattr(bob, key))    # obj.attr, but attr is a var
pay => 0
job => None
name => Bob Smith
We can put these interfaces to work in a superclass that displays accurate class names
and formats all attributes of an instance of any class. Open a new file in your text editor
to code the following—it's a new, independent module that implements just such a class.
Because its __str__
print overload uses generic introspection tools, it will work on any
instance, regardless of its set of attributes. And because this is a class, it automatically
becomes a general formatting tool: by using inheritance, it can be mixed into any class that
wishes to use its display format. As an added bonus, if we ever wish to change how instances
are displayed, we need change this class only—every class that inherits its __str__
will
automatically pick up the new format when next run:
# file classtools.py
"Assorted class utilities and tools"

class AttrDisplay:
    """
    Provides an inheritable print overload method that displays
    instances with their class name, and a name=value pair for
    each attribute stored on the instance itself (but not attrs
    inherited from its classes); can be mixed into any class,
    and will work on any instance.
    """
    def gatherAttrs(self):
        attrs = []
        for key in sorted(self.__dict__):
            attrs.append('%s=%s' % (key, getattr(self, key)))
        return ', '.join(attrs)
    def __str__(self):
        return '[%s: %s]' % (self.__class__.__name__, self.gatherAttrs())

if __name__ == '__main__':
    class TopTest(AttrDisplay):
        count = 0
        def __init__(self):
            self.attr1 = TopTest.count
            self.attr2 = TopTest.count + 1
            TopTest.count += 2

    class SubTest(TopTest):
        pass

    X, Y = TopTest(), SubTest()
    print(X)    # show all instance attrs
    print(Y)    # show lowest class name
Notice the docstrings here—as a general purpose tool, we want to add some
functional documentation for potential users to read. Docstrings work at the
start of classes and their methods, exactly like we've learned they do at the
top of simple functions and modules; the help()
function and the PyDoc tool
extracts and displays these automatically.
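For instance, docstrings attach to classes and methods at runtime through the __doc__ attribute, which is what help() and PyDoc format. A small sketch, with hypothetical docstring text:

```python
# sketch: class and method docstrings are fetchable via __doc__,
# just like those of functions and modules
class AttrDisplay:
    """Mix-in that formats instances generically."""
    def gatherAttrs(self):
        """Collect name=value pairs from the instance."""
        return ', '.join('%s=%s' % (k, getattr(self, k))
                         for k in sorted(self.__dict__))

print(AttrDisplay.__doc__)               # what help() would display
print(AttrDisplay.gatherAttrs.__doc__)
```

Running help(AttrDisplay) at the interactive prompt would render these same strings in a formatted report.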
When run, this module's self-test makes two instances and prints them; the __str__
defined here shows the instance's class, and all its attribute names and values,
sorted by attribute name:
[TopTest: attr1=0, attr2=1]
[SubTest: attr1=2, attr2=3]
If you study the classtools module's self-test code long enough, you'll notice that its
class displays only instance attributes, attached to the self
object at the bottom of
the inheritance tree; that's what self
's __dict__
contains. As an
intended consequence, we don't see attributes inherited by the instance from classes
above it in the tree (e.g., count
in this file's self-test code). Inherited class
attributes are attached to the class only, not copied down to instances.
If you ever do wish to include inherited attributes too, you can climb the __class__
link
to the instance's class; use the __dict__
there to fetch class attributes; and then iterate
through the class's __bases__
attribute to climb to even higher superclasses, and repeat. If
you're a fan of simple code, running a dir() call on the instance instead of using __dict__
and climbing would have much the same effect, since dir() includes inherited names in its
sorted results:
>>> from person import Person
>>> bob = Person('Bob Smith')

# in Python 2.6:
>>> bob.__dict__.keys()              # instance attrs only
['pay', 'job', 'name']
>>> dir(bob)                         # + inherited attrs in classes
['__doc__', '__init__', '__module__', '__str__', 'giveRaise', 'job',
'lastName', 'name', 'pay']

# in Python 3.0:
>>> list(bob.__dict__.keys())        # in 3.0, keys() is a view, not list
['pay', 'job', 'name']
>>> dir(bob)                         # in 3.0, includes class type methods
['__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__',
...more lines omitted...
'__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__',
'giveRaise', 'job', 'lastName', 'name', 'pay']
The output here varies between Python 2.6 and 3.0, because 3.0's dict.keys()
is not a list,
and 3.0's dir()
returns extra class type implementation attributes. In fact, you would
probably want to filter out most of the __X__
names in the 3.0 dir()
result, since they
are internal implementation details, and not something you'd normally want to display.
In the interest of space, though, we'll leave optional display of inherited class attributes
with either tree climbs or dir()
as a suggested experiment (see also the inheritance tree
climber and attribute lister scripts in Chapters XXX and XXX for more hints).
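As a hint for that experiment, here is one possible form the tree climb might take; the allAttrs helper name is an illustrative assumption, not part of the chapter's code:

```python
# sketch of the suggested experiment: climb __class__ to the instance's
# class, then recur through __bases__ to collect inherited names too
def allAttrs(instance):
    names = set(instance.__dict__)       # instance attrs first
    def climb(cls):
        names.update(cls.__dict__)       # attrs attached to this class
        for super_ in cls.__bases__:     # then higher superclasses
            climb(super_)
    climb(instance.__class__)
    return sorted(names)

class Top:
    count = 0                            # class attr: not copied to instances

class Sub(Top):
    def __init__(self):
        self.attr1 = 1                   # instance attr

x = Sub()
print('count' in x.__dict__)     # False: inherited, lives on the class
print('count' in allAttrs(x))    # True: found by climbing the tree
```

The climb terminates naturally because object, at the top of every Python 3 class tree, has an empty __bases__ tuple.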
One last subtlety here: Because our AttrDisplay
class in the classtools
module is a
general tool designed to be mixed in to other arbitrary classes, we have to be aware
of the potential for unintended name collisions with client classes. As is,
I've assumed that client subclasses may want to use both its __str__
and gatherAttrs
,
but the latter of these may be more than a subclass expects—if a subclass innocently
defines a gatherAttrs
name of its own, it will likely break our class, because the
lower version in the subclass will be used instead of ours.
To see this for yourself, add a gatherAttrs
to TopTest
in the file's self-test code;
unless the new method is identical, or intentionally customizes the original, our
tool class will no longer work as planned:
class TopTest(AttrDisplay):
    ....
    def gatherAttrs(self):    # replaces method in AttrDisplay!
        return 'Spam'
This isn't necessarily bad—sometimes we want other methods to be available
to subclasses, either for direct calls, or for customization. If we really
meant to provide a __str__
only, though, this is less than ideal.
To minimize the chances of name collisions like this, Python programmers often
prefix methods not meant for external use with a single underscore: _gatherAttrs
in our case. This isn't foolproof (what if another class defines _gatherAttrs
too?), but is usually sufficient, and a common Python naming convention for methods
internal to a class.
A better and less commonly used solution would be to use two underscores at the
front of the method name: __gatherAttrs
for us. Python automatically expands
such names to include the enclosing class's name, which makes them truly unique.
This is a feature usually called pseudoprivate class attributes, which we'll
expand on later in this book. For now, we'll make both of our methods available.
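A minimal sketch of this name expansion, with hypothetical Tool and Client classes:

```python
# sketch: two leading underscores expand a name to include the
# enclosing class's name, so subclass names cannot clash with it
class Tool:
    def __method(self):             # becomes _Tool__method
        return 'Tool'
    def run(self):
        return self.__method()      # expands to self._Tool__method

class Client(Tool):
    def __method(self):             # becomes _Client__method: no clash
        return 'Client'

c = Client()
print(c.run())                      # Tool: Tool's version still used
print(hasattr(c, '_Tool__method'))  # True: the mangled name exists
```

Because the expansion happens at class compile time, Tool's internal call still reaches Tool's own method even when a subclass defines the same double-underscore name.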
Now, to use this generic tool in our classes, all we need to do
is import it from its module, mix it in by inheritance in our top-level
class, and get rid of the more specific __str__
we had coded before.
The new print overload method will be inherited by instances of Person
,
as well as Manager
—Manager
gets a __str__
from Person
, which now
obtains it from AttrDisplay
coded in another module. Here is the
final version of our person.py
file with these changes applied:
# file person.py (final)
from classtools import AttrDisplay      # use generic display tool

class Person(AttrDisplay):
    """
    create and process person records
    """
    def __init__(self, name, job=None, pay=0):
        self.name = name
        self.job = job
        self.pay = pay
    def lastName(self):                           # assumes last is last
        return self.name.split()[-1]
    def giveRaise(self, percent):                 # percent must be 0..1
        self.pay = int(self.pay * (1 + percent))

class Manager(Person):
    """
    a customized person with special requirements
    """
    def __init__(self, name, pay):
        Person.__init__(self, name, 'mgr', pay)
    def giveRaise(self, percent, bonus=.10):
        Person.giveRaise(self, percent + bonus)

if __name__ == '__main__':
    bob = Person('Bob Smith')
    sue = Person('Sue Jones', job='dev', pay=100000)
    print(bob)
    print(sue)
    print(bob.lastName(), sue.lastName())
    sue.giveRaise(.10)
    print(sue)
    tom = Manager('Tom Jones', 50000)
    tom.giveRaise(.10)
    print(tom.lastName())
    print(tom)
As this is the final revision, we've added a few comments here to document
our work—docstrings for functional descriptions, and #
for smaller notes,
per best-practice conventions. When run now, we see all the attributes of our
objects, not just the ones we hard-coded in the original __str__
. And our final
issue is resolved: because AttrDisplay
takes class names off the self
instance
directly, objects are shown with the name of their closest (lowest) class—tom
displays as a Manager
now, not a Person
, and we can finally verify that his job
name has been filled in by the Manager
constructor correctly:
[Person: job=None, name=Bob Smith, pay=0]
[Person: job=dev, name=Sue Jones, pay=100000]
Smith Jones
[Person: job=dev, name=Sue Jones, pay=110000]
Jones
[Manager: job=mgr, name=Tom Jones, pay=60000]
This is the more useful display we were after. From a larger perspective, though, our attribute display class has become a general tool, which we can mix in to any class by inheritance to leverage the display format it defines. Further, future changes in our tool will be automatically picked up by all its clients. Later in the book, we'll meet even more powerful class tool concepts such as decorators and metaclasses; along with Python's introspection tools, they allow us to write code that augments and manages classes in structured and maintainable ways.
At this point, our work is almost complete. We now have a two-module system that not only implements our original design goals for representing people, but also provides a general attribute display tool we can use in other programs in the future. By coding functions and classes in module files, they naturally support reuse. And by coding software as classes, they naturally support extension.
Although our classes work as planned, though, the objects they create are not real database records. That is, if we kill Python our instances will disappear—they're transient objects in memory, and not stored on a more permanent medium like a file, so they won't be available in future program runs. It turns out that it's easy to make instance objects more permanent, with a Python feature called object persistence—making objects live on after the program that creates them exits. As a final step in this tutorial, let's make our objects permanent.
Object persistence is implemented by three standard library modules, available in every Python installation:
- pickle: serializes arbitrary Python objects to and from a string of bytes

- anydbm: implements an access-by-key file system for storing strings

- shelve: uses the other two modules to store Python objects on a file by key
These modules provide powerful data storage options. The pickle
module is a sort
of super general object formatting and deformatting tool: given a nearly arbitrary
Python object in memory, it's clever enough to convert the object to a string of bytes,
which it can use later to reconstruct the original object in memory. pickle
can handle
almost any object you can create—lists, dictionaries, nested combinations thereof, and
class instances. The latter are especially useful things to pickle, because they provide
both data (attributes) and behavior (methods). Because pickle
is so general, it can
replace extra code you might otherwise write to create and parse custom text file
representations for your objects. By storing an object's pickle
string on a
file, you effectively make it permanent and persistent: simply load and unpickle
later to recreate the original object.
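A quick sketch of that round trip with an in-memory byte string (the record value here is an arbitrary illustration):

```python
# sketch: pickle.dumps serializes an object to bytes, and
# pickle.loads reconstructs an equal but distinct object
import pickle

record = {'name': 'Bob Smith', 'jobs': ['dev', 'mgr'], 'pay': (40000, 50000)}
data = pickle.dumps(record)      # object -> byte string
copy = pickle.loads(data)        # byte string -> new object

print(copy == record)            # True: same value...
print(copy is record)            # False: ...but a distinct object
```

Writing that byte string to a file instead of keeping it in memory is all it takes to make the object survive a program exit.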
Although it's easy to use pickle
by itself to store objects on simple flat files
and load them from there later, the shelve
module provides an extra layer
of structure that allows you to store pickled objects by key. When storing,
shelve
translates an object to its pickled string, and stores that string
under a key in an anydbm
file; when later loading, shelve
fetches the pickled
string by key, and recreates the original object in memory with pickle
.
This is all quite a trick, but to your script a shelve of pickled objects
looks just like a dictionary—you index by key to fetch, assign to key
to store, and use dictionary tools such as len()
, in
, and dict.keys()
to
get information. Shelves automatically map dictionary operations to objects
stored on a file.
In fact, to your script the only coding difference between a shelve and a normal dictionary is that you must open shelves initially, and close them after making changes. The net effect is that a shelve provides a simple database for storing and fetching native Python objects by keys, and thus makes them persistent across program runs. It does not support query tools such as SQL, and lacks some advanced features found in enterprise-level databases such as true transaction processing, but native Python objects stored on a shelve may be processed with the full power of the Python language once they are fetched back by key.
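A minimal sketch of those dictionary-style operations on a shelve, using a throwaway file path as an assumption:

```python
# sketch: a shelve supports len(), 'in', keys(), and indexing,
# just like a dictionary; the file path here is a throwaway
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demodb')
db = shelve.open(path)           # open (and here, create) the shelve
db['x'] = [1, 2, 3]              # assign to key to store
db['y'] = {'a': 1}

print(len(db))                   # dictionary-style length
print('x' in db)                 # membership test
print(sorted(db.keys()))         # key listing
print(db['x'])                   # index by key to fetch
db.close()                       # close after making changes
```

Every one of these operations is transparently mapped to pickled objects stored in the underlying keyed file.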
Pickling and shelves are somewhat advanced topics, and we won't go into all their details here; you can read more about them in the standard library manuals, as well as application-focused books such as Programming Python. This is all simpler in Python than in English, though, so let's jump into some code.
Let's write a new script that throws objects of our classes onto
a shelve. In your text editor, open a new file we'll call makedb.py
.
Since this is a new file, we'll need to import our classes in order to
create a few instances to store. We used from
to load a class at
the interactive prompt earlier; really, there are two ways to load a class
from a file, exactly like functions and other variables—class names are
variables like any other, and not at all magic in this context:
import person                  # load class with import
bob = person.Person(...)       # go through module name

from person import Person      # load class with from
bob = Person(...)              # use name directly
We'll use from
to load in our script, just because it's a bit less to type.
Now, copy or retype code to make instances of our classes in the new script,
so we have something to store (this is a simple demo, so we won't worry about
test code redundancy here). Once we have some instances, it's almost trivial
to store them on a shelve—import the shelve
module, open a new shelve with
an external file name, assign the objects to keys in the shelve, and close
the shelve when we're done because we're making changes:
# makedb.py: store Person objects on a shelve database
from person import Person, Manager    # load our classes
bob = Person('Bob Smith')             # recreate objects to be stored
sue = Person('Sue Jones', job='dev', pay=100000)
tom = Manager('Tom Jones', 50000)

import shelve
db = shelve.open('persondb')          # filename where objects stored
for object in (bob, sue, tom):        # use object's name attr as key
    db[object.name] = object          # store object on shelve by key
db.close()                            # close after making changes
Notice how we assign objects to the shelve using their own names as keys.
This is just for convenience; in a shelve, the key can be any string, including
one we might create to be unique using tools such as process IDs and time stamps
available in the os
and time
standard library modules. The only rule is that the
keys must be strings and should be unique, since we can store just one object per
key (though that object can be a list or dictionary containing many objects).
The values we store under keys, though, can be almost any Python object: built-in
types like strings, lists, and dictionaries, as well as user-defined class instances,
and nested combinations of all of these.
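For instance, a key-builder along those lines might combine a name with a process ID, timestamp, and serial number. The makeKey function here is an illustrative assumption, not part of the chapter's code:

```python
# sketch: building unique shelve keys from os and time tools,
# as suggested; makeKey and its format are hypothetical
import itertools
import os
import time

counter = itertools.count()

def makeKey(name):
    # name + pid + timestamp + serial: unique even for duplicate names
    return '%s:%d:%d:%d' % (name, os.getpid(), int(time.time()), next(counter))

key1 = makeKey('Bob Smith')
key2 = makeKey('Bob Smith')
print(key1 != key2)    # True: serial differs even for the same name
```

A scheme like this would let two people with the same name coexist in the shelve, at the cost of keys that are harder to look up by hand.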
That's all there is to it—if this script has no output when run, it means it worked: we're not printing anything, just creating and storing objects.
C:\misc> makedb.py
At this point, there are one or more real files in the current directory whose names
all start with persondb
. The actual files created can vary per platform, and just like
the built-in open()
, the file name in shelve.open()
is relative to the current working
directory unless it includes an absolute directory path. But these files implement a
keyed-access file that contains the pickled representation of our three Python objects.
Don't delete these files—they are your database, and are what you should copy or transfer
when you back up or move your storage.
You can look at the shelve's files if you want to, either from Windows Explorer or the
Python shell, but they are binary hash files, and most of their content makes little
sense outside the context of the shelve module. With Python 3.0 and no extra software
installed, our database is stored in three files (in 2.6, it's just one file, persondb
,
because the bsddb
extension module is preinstalled with Python for shelves):
# directory listing module: verify files are present
>>> import glob
>>> glob.glob('person*')
['person.py', 'person.pyc', 'persondb.bak', 'persondb.dat', 'persondb.dir']

# type the file: text mode for strings, binary mode for bytes
>>> print(open('persondb.dir').read())
'Tom Jones', (1024, 91)
...more omitted...

>>> print(open('persondb.dat', 'rb').read())
b'\x80\x03cperson\nPerson\nq\x00)\x81q\x01}q\x02(X\x03\x00\x00\x00payq\x03K...
...more omitted...
This content isn't impossible to decipher, but can vary on different platforms, and does not exactly qualify as a user-friendly database interface! To verify our work better, we can write another script, or poke around our shelve at the interactive prompt. Because shelves are Python objects containing Python objects, we can process them with normal Python syntax and development modes; the interactive prompt effectively becomes a database client:
>>> import shelve
>>> db = shelve.open('persondb')       # reopen the shelve
>>> len(db)                            # three 'records' stored
3
>>> list(db.keys())                    # keys() is the index; list() makes a list in 3.0
['Tom Jones', 'Sue Jones', 'Bob Smith']
>>> bob = db['Bob Smith']              # fetch bob by key
>>> print(bob)                         # runs __str__ from AttrDisplay
[Person: job=None, name=Bob Smith, pay=0]
>>> bob.lastName()                     # runs lastName from Person
'Smith'
>>> for key in db:                     # iterate, fetch, print
...     print(key, '=>', db[key])
...
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=100000]
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
>>> for key in sorted(db):             # iterate by sorted keys
...     print(key, '=>', db[key])
...
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=100000]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]
Notice that we don't have to import our Person
or Manager
classes here
in order to load or use our stored objects. For example, we can call
bob
's lastName()
method freely, and get his custom print display format
automatically, even though we don't have his Person
class in our
scope here. This works because when Python pickles a class
instance, it records its self
instance attributes, along with the name
of the class it was created from, and the module where the class lives.
When bob
is later fetched from the shelve and unpickled, Python will
automatically reimport the class, and link bob
to it.
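This relinking can be demonstrated in miniature with the pickle module directly; the throwaway Person class here is a simplified stand-in, defined inline rather than imported from the chapter's person module:

```python
import pickle

class Person:
    def __init__(self, name):
        self.name = name
    def lastName(self):
        return self.name.split()[-1]

bob = Person('Bob Smith')
data = pickle.dumps(bob)        # serialize: instance attrs + class/module names
clone = pickle.loads(data)      # deserialize: reimport class, relink instance
print(b'Person' in data)        # the class *name* appears in the byte stream
print(clone.lastName())         # methods come from the relinked class
```

Note that pickle stores only the class's name and module, not its code, which is why the class must still be importable when instances are loaded later.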
The upshot of this scheme is that class instances automatically acquire all their class behavior when they are loaded in the future. We have to import our classes only to make new instances, not to process existing ones. Although this is a deliberate feature, it has somewhat mixed consequences: the class and its module must be importable whenever an instance is later loaded, but changes to the class's methods are automatically picked up by instances stored earlier.
Shelves also have well-known limitations (the database suggestions at the end of this chapter mention a few of these). For simple object storage, though, shelves and pickles are remarkably easy-to-use tools.
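One such limitation is worth showing directly: by default, mutating a fetched object in place does not update the shelve; you must reassign to the key (as our update script does), or open the shelve with writeback=True. A minimal sketch, using a throwaway file name:

```python
import shelve

db = shelve.open('tempdb')      # 'tempdb' is a throwaway demo file
db['nums'] = [1, 2, 3]
db['nums'].append(4)            # mutates a fetched temporary only: change is lost!
print(db['nums'])               # still [1, 2, 3]

nums = db['nums']               # fetch, change, and reassign to the key instead
nums.append(4)
db['nums'] = nums
print(db['nums'])               # now [1, 2, 3, 4]
db.close()
```

Passing writeback=True to shelve.open makes in-place changes persist automatically, at the cost of extra memory for caching and a slower close.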
One last script: Let's write a program that updates an instance (record)
each time it runs, to prove the point that our objects really are
persistent—their current values are available every time a Python
program runs. The following prints the database so we can trace,
and gives a raise to one of our stored objects each time. If you trace
through what's going on here, you'll notice that we're getting a lot of
utility "for free"—printing our objects automatically employs
the general __str__
overloading method, and we give raises by calling
the giveRaise()
method we wrote earlier. This all "just works" for
objects based on OOP's inheritance model, even when they live on a file:
# updatedb.py: update Person object on database

import shelve
db = shelve.open('persondb')           # reopen shelve with same filename

for key in sorted(db):                 # iterate to display database objects
    print(key, '\t=>', db[key])        # prints with custom format

sue = db['Sue Jones']                  # index by key to fetch
sue.giveRaise(.10)                     # update in memory using class method
db['Sue Jones'] = sue                  # assign to key to update in shelve
db.close()                             # close after making changes
Because this script prints the database when it starts up, we have
to run it a few times to see our object change. Here it is in action,
displaying all records and increasing sue
's pay each time it's run
(it's a pretty good script for sue
...):
c:\misc> updatedb.py
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=100000]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]

c:\misc> updatedb.py
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=110000]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]

c:\misc> updatedb.py
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=121000]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]

c:\misc> updatedb.py
Bob Smith => [Person: job=None, name=Bob Smith, pay=0]
Sue Jones => [Person: job=dev, name=Sue Jones, pay=133100]
Tom Jones => [Manager: job=mgr, name=Tom Jones, pay=50000]
Again, what we see here is a product of both the shelve
and pickle
tools
we get from Python, and the behavior we coded in our classes ourselves.
And once again, we can verify at the interactive prompt—the shelve's
equivalent of a database client:
c:\misc> python
>>> import shelve
>>> db = shelve.open('persondb')       # reopen database
>>> rec = db['Sue Jones']              # fetch object by key
>>> print(rec)
[Person: job=dev, name=Sue Jones, pay=146410]
>>> rec.lastName()
'Jones'
>>> rec.pay
146410
And that's a wrap for this tutorial. At this point, you've seen all the basics of Python's OOP machinery in action, and learned ways to avoid redundancy and its associated maintenance issues in your code. You've built full-featured classes that do real work. As an added bonus, you've made them real database records, by storing them in a Python shelve, so their information lives on persistently.
There is much more we could explore here, of course. For example, we could extend our classes to become more realistic, add new kinds of behavior to them, and so on. Giving a raise, for instance, should in practice verify that pay increase rates are between zero and one—an extension we'll add when we meet decorators later in this book. You might also mutate this example into a personal contacts database, by changing the state information stored on objects, as well as the class methods used to process it. We'll leave this as a suggested exercise, open to your imagination.
We could also expand our scope to use tools that either come with Python or are freely available in the open source world. For instance, we could add a graphical user interface for browsing and updating records, using either the Tkinter
(tkinter
in 3.X)
standard library support, or third-party toolkits such as
WxPython and PyQt. Tkinter
ships with Python, lets you build simple GUIs quickly, and
is ideal for learning GUI programming techniques; WxPython and PyQt tend to be more
complex to use, but often produce higher-grade GUIs in the end.
While I hope this whets your appetite for future exploration, all of these topics are of course far beyond the scope of this tutorial and this book at large. If you want to explore any of them on your own, see the web, Python's standard library manuals, and application-focused books such as Programming Python. In the latter of these, for example, we'll pick up with our example where we're stopping here, to add both a GUI and a website on top of our database for browsing and updating instance records. I hope to see you there eventually, but first, let's return to class fundamentals, and finish up the rest of the core Python language story.
Since I did most of the work in this chapter, we'll close with just a few questions designed to make you trace through some of the code, and ponder some of the bigger ideas behind it.
When we fetch a Manager
object from the shelve and print it, where
does the display format logic come from?
When we fetch a Person
object from a shelve, how does it know that
it has a giveRaise()
method that we can call?
Why is it better to use tools like __dict__
that allow objects to be
processed generically, than to write more custom code for each type of class?
As for the answers: Manager
ultimately inherits its __str__
printing method from AttrDisplay
in the separate classtools
module. Manager
doesn't
have one itself, so inheritance climbs to its Person
superclass; because there is no
__str__
there either, inheritance climbs higher to find it in AttrDisplay
. The
class names listed in parenthesis in a class statement's header line provide
the links to higher superclasses.
Shelves (and the pickle
module they use) automatically relink an instance
to the class it was created from when that instance is later loaded back into
memory. Python reimports the class from its module internally, creates an
instance with its stored attributes, and sets the instance's __class__
link
to point to its original class. This way, loaded instances automatically
obtain all their original methods (like lastName()
, giveRaise()
, and __str__
),
even if we have not imported the instance's class into our scope.
Generic tools like __dict__
avoid per-class redundancy: the generic __str__
print method, for example, need not be updated each time a new
attribute is added to instances in __init__
. In addition, a generic print method
inherited by all classes only appears, and need be modified, in one
place—changes in the generic version are picked up by all classes that inherit
from the generic class. Again, eliminating code redundancy cuts future
development effort; that's one of the primary assets classes bring to the table.
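The generic approach can be sketched in a few lines; this display mixin is a simplified standalone variant of the chapter's AttrDisplay (it uses __repr__ so it applies to both interactive echoes and prints, and omits the original's extras):

```python
class AttrDisplay:
    # generic display: works for any subclass and any attribute set,
    # because it inspects the instance's __dict__ at display time
    def __repr__(self):
        attrs = ', '.join('%s=%s' % (key, self.__dict__[key])
                          for key in sorted(self.__dict__))
        return '[%s: %s]' % (self.__class__.__name__, attrs)

class Point(AttrDisplay):
    def __init__(self, x, y):
        self.x = x
        self.y = y

print(Point(1, 2))          # [Point: x=1, y=2]
```

Because nothing here names Point or its attributes, the same mixin serves every client class unchanged.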
To turn this example into a personal contacts database, you might record attributes such as name
, address
, birthday
, phone
,
email
, and so on, and provide methods appropriate for this domain.
A method named sendmail
, for example, might use Python's smtplib
module
to send an email to one of the contacts automatically, when called (see
Python's manuals or application-level books for more details on such tools).
The AttrDisplay
tool we wrote here could be used verbatim to print your
objects, because it is intentionally generic. Most of the shelve
database
code here can be used to store your objects too, with minor changes.
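As a rough sketch of the sendmail idea (the Contact class, its method names, and the addresses below are hypothetical illustrations; smtplib and email.message are real standard library modules):

```python
import smtplib
from email.message import EmailMessage

class Contact:
    def __init__(self, name, email):
        self.name = name
        self.email = email
    def makeMessage(self, subject, body, sender):
        # build a standard email message addressed to this contact
        msg = EmailMessage()
        msg['From'] = sender
        msg['To'] = self.email
        msg['Subject'] = subject
        msg.set_content(body)
        return msg
    def sendMail(self, subject, body, sender, server='localhost'):
        # hand the message to an SMTP server for delivery;
        # 'localhost' is a placeholder: substitute your own mail host
        with smtplib.SMTP(server) as smtp:
            smtp.send_message(self.makeMessage(subject, body, sender))

tom = Contact('Tom Jones', 'tom@example.com')
msg = tom.makeMessage('Hello', 'Just testing.', 'me@example.com')
print(msg['To'])
```

Splitting message construction from delivery keeps the class testable without a live mail server.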
This article's material originally appeared in the book Learning Python,
and was later expanded and republished in the book's later editions.
Along the way, its examples were updated to work in Python 3.X too, by using print()
,
list(dict.keys())
, new class repr
and instance dir()
results,
and new external files and sorted(db)
for shelves. See the books for the final
enhanced versions.
For additional reading, try the other articles popular at learning-python.com; these and more are available on the blog page.