12. Databases and Persistence

Topics

♦ Persistent object shelves

♦ Storing class instances

♦ Pickling without shelves

♦ DBM-style files

♦ Shelve gotchas

♦ ZODB object-oriented database

♦ Python SQL database API

♦ Odds and ends

Object persistence: shelves

● Shelve = dbm file + object pickling (serialization)

● Stores arbitrary Python objects by string key

● Shelves are processed with normal Python code

● Pickler handles nested and circular objects

Basic usage

►Shelves are dictionaries that must be opened

import shelve

dbase = shelve.open("mydbase")

►Assigning to a shelve key stores an object

dbase['key'] = object

►Indexing a shelve fetches a stored object

value = dbase['key']

►Most dictionary operations supported

len(dbase) # number items stored

dbase.keys() # stored item key index

■ Creating a new shelve

● Creates dbm file(s) to store objects

● Manual ‘close’ needed for some file systems

% python

>>> import shelve

>>> dbase = shelve.open("mydbase")

>>> object1 = ['The', 'bright', ('side', 'of'), ['life']]

>>> object2 = {'name': 'Brian', 'age': 33, 'motto': object1}

>>> dbase['brian'] = object2

>>> dbase['knight'] = {'name': 'Knight', 'motto': 'Ni!'}

>>> dbase.close()

■ Using an existing shelve

% python

>>> import shelve

>>> dbase = shelve.open("mydbase")

>>> len(dbase) # entries

>>> dbase.keys() # index

['knight', 'brian']

>>> dbase['knight'] # fetch

{'motto': 'Ni!', 'name': 'Knight'}

>>> for row in dbase.keys():

... print row, '=>'

... for field in dbase[row].keys():

... print ' ', field, '=', dbase[row][field]

...

knight =>

motto = Ni!

name = Knight

brian =>

motto = ['The', 'bright', ('side', 'of'), ['life']]

age = 33

name = Brian

Storing class instances

● Classes defined at top-level of module

● No non-default constructor arguments (until 1.5.1!)

● Python stores instance __dict__, not class

● Changing class changes stored object behavior

file: person.py

# a person object: fields + behavior

class Person:

def __init__(self, name='', job='', pay=0):

self.name = name

self.job = job

self.pay = pay # real instance data

def tax(self):

return self.pay * 0.25 # computed on call

def info(self):

return self.name, self.job, self.pay, self.tax()

% python

>>> from person import Person

>>> bob = Person('bob', 'psychologist', 70000)

>>> emily = Person('emily', 'teacher', 40000)

>>>

>>> import shelve

>>> dbase = shelve.open('cast') # make new shelve

>>> for obj in (bob, emily): # store objects

>>> dbase[obj.name] = obj

>>> dbase.close() # need for bsddb

% python

>>> import shelve

>>> dbase = shelve.open('cast') # reopen shelve

>>> print dbase['bob'].tax() # call: bob's tax

17500.0

Changing classes changes behavior

● Classes = records + processing programs

● Example: changing the Person.tax method

file: person.py

# a person object: fields + behavior

# change: the tax method is now a virtual member

class Person:

def __init__(self, name='', job='', pay=0):

self.name = name

self.job = job

self.pay = pay # real instance data

def __getattr__(self, attr): # on person.attr

if attr == 'tax':

return self.pay * 0.30 # computed on access

else:

raise AttributeError # other unknown names

def info(self):

return self.name, self.job, self.pay, self.tax

% python

>>> import shelve

>>> dbase = shelve.open('cast') # reopen shelve

>>> print dbase['bob'].tax # no need to call tax()

21000.0

Pickling objects without shelves

● Shelve = dbm file + object pickler

● Pickler serializes Python objects into text streams

● Streams may be sent to flat file, dbm file, socket, etc.

● Also see: ‘marshal’ module (object limitations)

file: testpickle.py

import pickle

def saveDbase(filename, table):

file = open(filename, 'w')

pickle.dump(table, file) # pickle to file

file.close()

def loadDbase(filename):

file = open(filename, 'r')

table = pickle.load(file) # unpickle from file

file.close()

return table

% python

>>> from testpickle import *

>>> L = [0]

>>> D = {'x':0, 'y':L}

>>> table = {'A':L, 'B':D} # L appears twice (D)

>>> saveDbase('myfile', table) # serialize to file

% python

>>> from testpickle import *

>>> table = loadDbase('myfile') # reload/unpickle

>>> print table

{'B': {'x': 0, 'y': [0]}, 'A': [0]}

>>> table['A'][0] = 1 # change shared list

>>> saveDbase('myfile', table)

% python

>>> from testpickle import *

>>> print loadDbase('myfile') # both L’s updated!

{'B': {'x': 0, 'y': [1]}, 'A': [1]}

Using simple dbm files

● Shelve = dbm file + object pickler

● Dbm files can only store strings (not any object)

● ‘anydbm’ selects from dbhash, gdbm, dbm or dumbdbm

● Dictionary-like interface like shelve (shelve uses dbm)

% python

>>> import anydbm

>>> file = anydbm.open('languages', 'c') # create, in+out

>>> file['perl'] = "Text processing" # store

>>> file['tcl'] = "Simple glue"

>>> file['python'] = "OO scripting"

>>> file.close()

% python

>>> import anydbm

>>> file = anydbm.open('languages', 'c') # existing file

>>> file['python'] # fetch

'OO scripting'

>>> for lang in file.keys(): # index

... print lang + ':', file[lang]

...

perl: Text processing

tcl: Simple glue

python: OO scripting

>>> del file['tcl'] # delete

% python

>>> import anydbm

>>> anydbm.open('languages', 'c').keys() # sorry, tcl!

['python', 'perl']

Shelve gotchas

►Keys must still be strings

dbase[42] = value # fails

►Shared objects are only noticed within a given slot

dbase[key] = object # store parts just once

dbase[key] = object # two copies of object!

►Updates must treat as fetch-modify-store mappings

dbase[key].attr = value # shelve unchanged!

object = dbase[key] # fetch it

object.attr = value # modify it

dbase[key] = object # store back

►Doesn't support simultaneous updates by itself (e.g. CGI scripts)

Plus class pickling rules

►At the top-level of a module (importable)

►No non-default constructor arguments (until 1.5.1!)

►Class changes must be backward-compatible

►Classes can use special methods to break constructor rule

ZODB object-oriented database

● Full-blown OODB, 3^rd party open source add-on

● ZODB – Zope’s advanced OODB, available by itself

● Adds object identifiers to Python pickling to do write-thru on change (via a mixed-in superclass)

● Supports concurrent update, transaction commit/rollback, write-thru on changes

● Like shelve, but extra boilerplate code

● Concurrent updates for multi-threaded

● ZEO distributed objects server for multi-process

● Can map ZODB db to flat files, relation dbase, other

● Legacy from the web: faster at reads than writes

→ See also: newer SQLObject and SQLAlchemy Python relational mappers (more ahead)

Creating a ZODB database

…\PP3E\Database\ZODBscripts\>python

>>> from ZODB import FileStorage, DB

>>> storage = FileStorage.FileStorage(r'C:\Mark\temp\mydb.fs')

>>> db = DB(storage)

>>> connection = db.open()

>>> root = connection.root()

>>> object1 = (1, 'spam', 4, 'YOU')

>>> object2 = [[1, 2, 3], [4, 5, 6]]

>>> object2.append([7, 8, 9])

>>> object2

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

>>>

>>> object3 = {'name': ['Bob', 'Doe'],

'age': 42,

'job': ('dev', 'mgr')}

>>> root['mystr'] = 'spam' * 3

>>> root['mytuple'] = object1

>>> root['mylist'] = object2

>>> root['mydict'] = object3

>>> root['mylist']

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

>>> import transaction

>>> transaction.commit()

>>> storage.close()

Using a ZODB database

…\PP3E\Database\ZODBscripts\>python

>>> from ZODB import FileStorage, DB

>>> storage = FileStorage.FileStorage(r'C:\Mark\temp\mydb.fs')

>>> db = DB(storage)

>>> connection = db.open()

>>> root = connection.root() # connect

>>> print len(root), root.keys() # size, ix

4 ['mylist', 'mystr', 'mytuple', 'mydict']

>>>

>>> print root['mylist'] # fetch obs

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

>>> print root['mydict']

{'job': ('dev', 'mgr'), 'age': 42, 'name': ['Bob', 'Doe']}

>>> root['mydict']['name'][-1] # last name

'Doe'

>>> for key in root.keys():

print key.ljust(10), '=>', root[key]

mylist => [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

mystr => spamspamspam

mytuple => (1, 'spam', 4, 'YOU')

mydict => {'job': ('dev', 'mgr'), 'age': 42, 'name': ['Bob', 'Doe']}

Classes under ZODB

# zodb_class_make

import time

mydbfile = 'data/class.fs' # dbase file

from persistent import Persistent

class BookMark(Persistent): # use zodb

def __init__(self, title, url):

self.hits = 0

self.updateBookmark(self, url)

def updateBookmark(self, title, url):

self.title = title # updates db

self.url = url

self.modtime = time.asctime()

def connectdb(dbfile):

from ZODB import FileStorage, DB

storage = FileStorage.FileStorage(dbfile)

db = DB(storage)

connection = db.open()

root = connection.root()

return root, storage

def addobjects():

root, storage = connectdb(mydbfile)

root['ora'] = BookMark('Oreilly', 'http://www.oreilly.com')

root['pp3e'] = BookMark('PP3E',

'http://www.rmi.net/~lutz/about-pp.html')

import transaction

transaction.commit()

storage.close()

# zodb_class_read

mydbfile = 'data/class.fs'

from zodb_class_make import connectdb

root, storage = connectdb(mydbfile)

# this updates db: attrs changed in method

print 'pp3e url:', root['pp3e'].url

print 'pp3e mod:', root['pp3e'].modtime

root['pp3e'].updateBookmark('PP3E',

'www.rmi.net/~lutz/about-pp3e.html')

# this updates too: attr changed here

ora = root['ora']

print 'ora hits:', ora.hits

ora.hits += 1

# commit changes made

import transaction

transaction.commit()

storage.close()

Python SQL database API

● Support for Oracle, Sybase, Informix, ODBC, Postgres, mySql,…

● Portable API Implemented by vendor-specific extension modules

Connection objects

represent a connection to a database

are the interface to rollback and commit operations

generate cursor objects

Cursor objects

represent a single SQL statement submitted as a string

can be used to step through SQL statement results

can execute DDL (create), DML (insert), and DQL (select) statements

Query results

SQL select results returned to scripts as Python data

Tables of rows are Python lists of tuples

Field values within rows are normal Python objects

An example query result: [('bob',38), ('emily',37)]

########################################

# example of portable database API usage

########################################

from cx_Oracle import Connect

connobj = Connect("user/password@system")

cursobj = connobj.cursor()

value1,value2 ='developer', 40

query = 'SELECT name, shoesize FROM empl WHERE job = ? AND age = ?'

cursobj.execute(query, (value1, value2))

results = cursobj.fetchall()

for (name, size) in results:

print name, size

Other DB API concepts

● Select results: fetchall(), fetchone(), fetchmany()

● Transactions: commit, rollback on connection object

● update, insert, delete statements: rowcount

● Stored procedure methods

● Blobs

● Thread Safety check

● Description field: column names, types

Other SQL DB concepts

● SQLObject 3^rd party extension object relational mapper: table/row/column → class/instance/attr

● SQLAlchemy: similar mapper system

● Full API at python.org: search Google for “Python Database API” -- PEP, 2.0

Sqlite – standard library in Python 2.5

● In-process C lib based SQL engine, no server

● Useful for internal program data, prototyping

import sqlite3

conn = sqlite3.connect('/tmp/example') # or “:memory:”

c = conn.cursor()

# Create table

c.execute('''create table stocks

(date text, trans text, symbol text,

qty real, price real)''')

# Insert a row of data

c.execute("""insert into stocks

values ('2006-01-05','BUY','RHAT',100,35.14)""")

# run a query

t = (symbol,)

c.execute('select * from stocks where symbol=?', t)

for t in (('2006-03-28', 'BUY', 'IBM', 1000, 45.00),

('2006-04-05', 'BUY', 'MSOFT', 1000, 72.00),

('2006-04-06', 'SELL', 'IBM', 500, 53.00),

c.execute('insert into stocks values (?,?,?,?,?)', t)

MySQL

● Server-based database engine

● Install MySQL, msql-python

from MySQLdb import Connect

conn = Connect(host='localhost', user='root', passwd='python')

curs = conn.cursor()

try:

curs.execute('drop database testpeopledb')

except:

pass # did not exist

curs.execute('create database testpeopledb')

curs.execute('use testpeopledb')

curs.execute('create table people (name char(30), job char(10), pay int(4))')

curs.execute('insert people values (%s, %s, %s)',

('Bob', 'dev', 50000))

curs.execute('insert people values (%s, %s, %s)',

('Sue', 'dev', 60000))

curs.execute('select * from people')

for row in curs.fetchall():

print row

curs.execute('select * from people')

colnames = [desc[0] for desc in curs.description]

while True:

print '-' * 30

row = curs.fetchone()

if not row: break

for (name, value) in zip(colnames, row):

print '%s => %s' % (name, value)

conn.commit() # save inserted records

Informix

>>> import informixdb

>>> conn = informixdb.connect('mydatabase')

>>> cursor = conn.cursor()

>>> cursor.execute("SELECT * FROM names")

>>> cursor.fetchall()

[('donald', 'duck', 34), ('mickey', 'mouse', 23)]

>>> import informixdb

>>> conn = informixdb.connect('db@daniel', user='me', password='something')

>>> cursor.execute('INSERT INTO names VALUES(:1,:2,:3)', ('some', 'body', 56))

>>> cursor.execute('INSERT INTO names VALUES(?,?,?)', ('dagobert', 'duck', 77))

PostgreSQL: see interface, docs online

Persistence odds and ends

● Flat files

■ Objects and descriptors

● ‘marshal’ module

■ Serializes objects like pickler, but only simple types

● Performance

■ ‘cPickle’ module is 1 to 3 orders of magnitude faster than ‘pickle’ (auto selected by shelve)

● Database interfaces available

■ At python.org or search: Oracle, Sybase, PostgreSQL Informix, ODBC, MySQL, etc.

■ (See above) Portable database API: works on many systems

► Alternatives: JSON and MongoDB (structures)

>>> import json

>>> rec = {'name': 'Bob', 'job': ['dev', 'mgr'], 'age': 40.5}

>>> text = json.dumps(rec)

>>> text

'{"age": 40.5, "name": "Bob", "job": ["dev", "mgr"]}'

>>> rec2 = json.loads(text)

>>> rec2

{'age': 40.5, 'name': 'Bob', 'job': ['dev', 'mgr']}

► Alternatives: AppEngine, AWS, Azure (cloud)

► Alternative: Pandas (big data)

Lab Session 9

Click here to go to lab exercises

Click here to go to exercise solutions

Click here to go to solution source files

Click here to go to lecture example files