Introduction
In March of 2000, Digital Creations announced that they'd be releasing ZEO, Zope Enterprise Objects, as free software. ZEO builds on top of the Z Object Database (ZODB), which has been available as part of the Zope Web publishing system for a long time. While they were developed for use with Zope, neither the ZODB nor ZEO is tied to Zope, and I think Python programmers should be aware of the powerful capabilities that they offer. This article is a brief tutorial introduction to ZODB and ZEO, and shows an example application.
The Z Object Database
We'll begin by looking at the Z Object Database, since ZEO builds on the foundation laid by the ZODB. ZODB lets you add persistence to Python objects in an almost completely transparent way, giving Python programmers an object database that allows making Python objects persistent with very little effort. (The "very little effort" part should not be understated. Commercial object databases for C++ or Java often require that you jump through some hoops and avoid certain data types or code styles. In comparison the naturalness of the ZODB is astonishing, and testimony to both Python's flexibility and Jim Fulton's skills.)
There are 3 main interfaces in the ZODB: Storage, DB, and Connection classes.
- Storage classes are the lowest layer, and handle storing and retrieving objects from some form of long-term storage. A few different types of Storage have been written, such as FileStorage, which uses regular files, and BerkeleyStorage, which uses Sleepycat Software's BerkeleyDB 2.7. You could write a new Storage that stored objects in a relational database or Metakit file, for example, if you needed to ensure some property useful to your application. Two example storages, DemoStorage and MappingStorage, are available to use as models if you want to write a new Storage.
- The DB class sits on top of a storage, and mediates the interaction between several connections. One DB instance is created per process.
- Finally, the Connection class caches objects, and moves them into and out of object storage. A multi-threaded program can open a separate Connection instance for each thread.
Preparing to use a ZODB requires 3 steps: you have to open the Storage, then create a DB instance that uses the Storage, and then get a Connection from the DB instance. All this is only a few lines of code:
from ZODB import FileStorage, DB
storage = FileStorage.FileStorage('/tmp/test-filestorage.fs')
db = DB( storage )
conn = db.open()
Note that you can use a completely different data storage
mechanism by changing the first line that opens a Storage; the
above example uses a FileStorage. Soon you'll see how
ZEO uses this flexibility to good effect.
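To make the layering concrete, here is a toy model of the three layers in plain Python. It is not the ZODB's code: DictStorage, ToyDB, and ToyConnection are hypothetical names invented for this sketch, and the real classes do far more (object IDs, pickling, transactions). It only aims to show why swapping the storage leaves the rest of the program untouched.

```python
# Toy model of the Storage / DB / Connection layering (NOT the real ZODB classes).

class DictStorage:
    """Lowest layer: stores and retrieves raw records by object ID."""
    def __init__(self):
        self._records = {}

    def load(self, oid):
        return self._records[oid]

    def store(self, oid, data):
        self._records[oid] = data


class ToyDB:
    """Middle layer: one per process, hands out connections to one storage."""
    def __init__(self, storage):
        self.storage = storage

    def open(self):
        return ToyConnection(self)


class ToyConnection:
    """Top layer: caches objects and moves them to and from the storage."""
    def __init__(self, db):
        self._db = db
        self._cache = {}

    def get(self, oid):
        if oid not in self._cache:
            self._cache[oid] = self._db.storage.load(oid)
        return self._cache[oid]

    def put(self, oid, data):
        self._cache[oid] = data
        self._db.storage.store(oid, data)


storage = DictStorage()      # only this line changes if you pick another storage
db = ToyDB(storage)
conn = db.open()
conn.put('greeting', 'hello')

other = db.open()            # a second connection, e.g. for another thread
print(other.get('greeting'))
```

Both connections see the same record because they share one storage through the DB object, while each keeps its own cache.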
Using a ZODB
Making a Python class persistent is quite simple; it just
needs to subclass the Persistent class, as shown
in this example:
import ZODB
from Persistence import Persistent
class User(Persistent):
    pass
(The apparently unnecessary import ZODB statement
is needed for the following from...import statement to
work correctly, since the ZODB code is doing some magical tricks
with importing.)
For simplicity, in the examples the User class will
simply be used as a holder for a bunch of attributes. Normally the
class would define various methods that add functionality, but that
has no impact on the ZODB's treatment of the class.
The ZODB uses persistence by reachability; starting from a set of root objects, all the attributes of those objects are made persistent, whether they're simple Python data types or class instances.
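Reachability is easy to picture as a graph walk. The sketch below is my own illustration, not how the ZODB implements it: the reachable() helper and the Node class are hypothetical, and it only follows dictionaries, lists, tuples, and instance attributes. Anything it can get to from the root would be stored; anything it can't would not.

```python
def reachable(root):
    """Return every object reachable from root via attributes and containers."""
    seen = {}                      # id(obj) -> obj, so each object is visited once
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen[id(obj)] = obj
        if isinstance(obj, dict):
            stack.extend(obj.values())
        elif isinstance(obj, (list, tuple)):
            stack.extend(obj)
        elif hasattr(obj, '__dict__'):
            stack.extend(vars(obj).values())
    return list(seen.values())


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []


root = {'app': Node('top')}        # plays the part of the database root
child = Node('child')
root['app'].children.append(child)
orphan = Node('orphan')            # never linked from the root

names = sorted(o.name for o in reachable(root) if isinstance(o, Node))
print(names)                       # the orphan is not in the list
```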
As an example, we'll create a simple database of users that
allows retrieving a User object given the user's ID.
First, we retrieve the primary root object of the ZODB; this object
behaves like a Python dictionary, so you can just add a new
key/value pair for your application's root object. We'll insert a
BTree object that will contain all the
User objects. (The BTree module is also
included as part of Zope.)
dbroot = conn.root()
# Ensure that a 'userdb' key is present
if not dbroot.has_key('userdb'):
    import BTree
    dbroot['userdb'] = BTree.BTree()
userdb = dbroot['userdb']
Inserting a new user is simple: create the User
object, fill it with data, insert it into the BTree, and commit
this transaction.
# Create new User instance
newuser = User()
# Add whatever attributes you want to track
newuser.id = 'amk'
newuser.first_name = 'Andrew' ; newuser.last_name = 'Kuchling'
...
# Add object to the BTree, keyed on the ID
userdb[ newuser.id ] = newuser
# Commit the change
get_transaction().commit()
When you import the ZODB package, it adds a new function,
get_transaction(), to Python's collection of built-in
functions. get_transaction() returns a transaction
object, which has two important methods: commit() and
abort(). commit() writes out any objects
that have been modified to disk, making the changes permanent,
while abort() rolls back any changes that have been
made, restoring the original state of the objects. If you're
familiar with database transactional semantics, this is just what
you'd expect.
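If transactions are new to you, the following toy may help. It is only a sketch of the commit/abort idea using snapshots of an object's attributes; ToyTransaction and its register() method are hypothetical names, and the ZODB's real transaction machinery works quite differently under the hood.

```python
import copy


class ToyTransaction:
    """Remembers the last committed state of the objects it tracks."""
    def __init__(self):
        self._saved = {}           # id(obj) -> (obj, copy of its committed attributes)

    def register(self, obj):
        self._saved[id(obj)] = (obj, copy.deepcopy(vars(obj)))

    def commit(self):
        # Make the current state of every tracked object the new baseline.
        for obj, _ in list(self._saved.values()):
            self.register(obj)

    def abort(self):
        # Throw away changes made since the last commit.
        for obj, state in self._saved.values():
            vars(obj).clear()
            vars(obj).update(copy.deepcopy(state))


class User:
    pass


user = User()
user.first_name = 'Andrew'
txn = ToyTransaction()
txn.register(user)

user.first_name = 'Bob'
txn.abort()
after_abort = user.first_name      # rolled back to the committed value

user.first_name = 'Bob'
txn.commit()
user.first_name = 'Carol'
txn.abort()
after_commit = user.first_name     # rolled back to the value committed last

print(after_abort, after_commit)
```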
Because the integration with Python is so complete, it's a lot like having transactional semantics for your program's variables, and you can experiment with transactions at the Python interpreter's prompt:
>>> newuser
<User instance at 81b1f40>
>>> newuser.first_name # Print initial value
'Andrew'
>>> newuser.first_name = 'Bob' # Change first name
>>> newuser.first_name # Verify the change
'Bob'
>>> get_transaction().abort() # Abort transaction
>>> newuser.first_name # The value has changed back
'Andrew'
The ZODB uses various Python hooks to catch attribute accesses,
which cover most of the ways of modifying an object, but not all of
them. If you modify a User object by assigning to one
of its attributes, as in userobj.first_name =
'Andrew', the ZODB will mark the object as having been
changed, and it'll be written out on the following
commit().
The most common idiom that isn't caught by the ZODB is
mutating a list or dictionary. If User objects have an
attribute named friends containing a list, calling
userobj.friends.append( otherUser ) doesn't mark
userobj as modified; from the ZODB's point of view,
userobj.friends was only read, and its value, which
happened to be an ordinary Python list, was returned. The ZODB
isn't aware that the returned value was later modified.
This is one of the few quirks you'll have to remember when using
the ZODB; if you modify an attribute of an object in place, you
have to manually mark the object as having been modified, by
setting the _p_changed attribute to true:
userobj.friends.append( otherUser )
userobj._p_changed = 1
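You can reproduce the quirk without the ZODB at all. The Tracked class below is a hypothetical stand-in I wrote for illustration; it uses a __setattr__ hook to flag changes, which is only an approximation of what Persistent does, but it is enough to show why an in-place append goes unnoticed.

```python
class Tracked:
    """Toy stand-in for Persistent: flags itself when an attribute is assigned."""
    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name != '_p_changed':
            object.__setattr__(self, '_p_changed', True)


user = Tracked()
user.friends = []
user._p_changed = False              # pretend the object was just committed

user.friends.append('otherUser')     # friends is only read; the list mutates itself
after_append = user._p_changed       # the hook never fired

user.friends = user.friends + ['third']   # an assignment goes through the hook
after_assign = user._p_changed

print(after_append, after_assign)
```

The append is a method call on the list, not an assignment on the user object, so nothing tells the object that it changed.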
You can hide this implementation detail by not designing your
class's API to use direct attribute access; instead, you can use
the Java-style approach of accessor methods for everything, and
then set _p_changed within the accessor method. For
example, you might forbid accessing the friends
attribute directly, and add a get_friend_list()
accessor and an add_friend() modifier method to the
class. Alternatively, you could use a ZODB-aware list or mapping
type that sets _p_changed for you; the ZODB includes a
PersistentMapping class, and I've contributed a
PersistentList class that may make it into a future
release.
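Here is a rough sketch of the accessor-method approach. To keep it runnable on its own, this User is a plain class and _p_changed is just an ordinary attribute; in a real application the class would subclass Persistent and the ZODB would act on the flag. The _friends attribute name is my own choice for the example.

```python
class User:
    """Toy class: _p_changed is an ordinary attribute here, not the ZODB's flag."""
    def __init__(self):
        self._friends = []
        self._p_changed = False

    def get_friend_list(self):
        # Return a copy so callers cannot mutate the stored list in place.
        return list(self._friends)

    def add_friend(self, friend):
        self._friends.append(friend)
        self._p_changed = True       # the one place that has to remember


userobj = User()
userobj.add_friend('otherUser')

friends = userobj.get_friend_list()
friends.append('stranger')           # changes only the caller's copy

print(userobj.get_friend_list(), userobj._p_changed)
```

Callers never touch the list directly, so the bookkeeping lives in one method instead of being scattered through the program.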
Introducing ZEO
The ZODB, as I've described it so far, can only be used within a single Python process running on one machine. ZEO, Zope Enterprise Objects, extends the ZODB machinery to provide access to objects over a network. The name "Zope Enterprise Objects" is a bit misleading. ZEO can be used to store Python objects, and access them in a distributed fashion, without Zope ever entering the picture; essentially the combination of ZEO and ZODB is a Python-specific object database.
ZEO consists of about 1400 lines of Python code. The code is
relatively small because it contains only code for a TCP/IP server,
and for a new type of Storage, ClientStorage.
ClientStorage doesn't use disk files at all; it simply makes
remote procedure calls to the server, which then passes them on to a
regular Storage class such as
FileStorage. The following diagram lays out the system:
[ XXX insert diagram here ]
Any number of processes can create a ClientStorage
instance, and any number of threads in each process can be using
that instance. ClientStorage aggressively caches
objects locally, so, in order to avoid using stale data, the ZEO
server sends an invalidate message to all the connected
ClientStorage instances on every write operation. The
invalidate message contains the object ID for each object that's
been modified, so the ClientStorage instances can
delete the old data for the given object from their caches.
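The caching and invalidation scheme can be modelled in a few lines. This is a single-process toy, with no networking and with hypothetical ToyServer and ToyClient classes standing in for the ZEO server and ClientStorage; it only illustrates the flow of invalidate messages described above.

```python
class ToyServer:
    """Holds the authoritative data and notifies clients about writes."""
    def __init__(self):
        self.data = {}
        self.clients = []

    def connect(self, client):
        self.clients.append(client)

    def load(self, oid):
        return self.data[oid]

    def store(self, oid, value):
        self.data[oid] = value
        # Tell every connected client which object IDs are now stale.
        for client in self.clients:
            client.invalidate([oid])


class ToyClient:
    """Caches objects locally and drops them when the server says so."""
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.server_reads = 0
        server.connect(self)

    def load(self, oid):
        if oid not in self.cache:
            self.cache[oid] = self.server.load(oid)
            self.server_reads += 1
        return self.cache[oid]

    def store(self, oid, value):
        self.server.store(oid, value)

    def invalidate(self, oids):
        for oid in oids:
            self.cache.pop(oid, None)


server = ToyServer()
a = ToyClient(server)
b = ToyClient(server)

a.store('post-1', 'first draft')
b.load('post-1')
b.load('post-1')                     # served from b's cache
reads_before_write = b.server_reads

a.store('post-1', 'edited')          # b's cached copy is invalidated
latest = b.load('post-1')            # b has to ask the server again

print(reads_before_write, b.server_reads, latest)
```

Repeated reads cost the server nothing, while every write costs one message per connected client.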
This design decision has some consequences you should be aware
of. First, while ZEO isn't tied to Zope, it was first written for
use with Zope, which stores HTML, images, and program code in the
database. As a result, reads from the database are far
more frequent than writes, and ZEO is therefore better suited for
read-intensive applications. If every ClientStorage is
writing to the database very frequently, this will result in a
storm of invalidate messages being sent, and this might take up
more processing time than the actual database operations
themselves.
On the other hand, for applications that have few writes in
comparison to the number of read accesses, this aggressive caching
can be a major win. Consider the job of writing a Slashdot-like
discussion forum, where you want to divide the load among several
Web servers. If news items and postings are represented by objects,
and accessed through ZEO, then the most heavily accessed objects --
the most recent or most popular postings -- will very quickly wind
up in the caches of the ClientStorage instances on the
front-end servers. The back-end ZEO server will do relatively
little work, only being called upon to return the occasional older
posting that's requested, and to send the occasional invalidate
message when a new posting is added. The ZEO server isn't going to
be contacted for every single request, so its workload will remain
manageable.
Running a ZEO Server
The first step is to download and install the ZEO code. The
latest version will eventually be made available from the ZEO product page; untar
it while in the top directory of your Zope installation. The files
will be unpacked into lib/python/ZEO. Consult the
README for further installation steps that may be required; you may
have to create a few symlinks.
Once the code has been unpacked, the next step is to run a ZEO server. Go to the Zope installation directory and run this command:
python lib/python/ZEO/start.py -p 9672 /tmp/storage.fs
This starts a ZEO server listening on TCP port 9672, and using a
FileStorage on top of the file
/tmp/storage.fs. If you want to use a storage other than
FileStorage, you'll have to manually hack the code in
start.py to create an instance of a different
class.
Connecting to a ZEO Server
Once a ZEO server is up and running, using it is just like using
ZODB with a more conventional disk-based storage. The only
difference is that you'll create a ClientStorage
instance instead of a FileStorage instance:
from ZEO import ClientStorage
from ZODB import DB
storage = ClientStorage.ClientStorage( ('localhost', 9672) )
db = DB( storage )
conn = db.open()
From this point onward, your ZODB-based code is happily unaware that objects are being retrieved from a ZEO server, and not from the local disk.
Sample Application: chatter.py
For an example application, we'll build a little chat application. What's interesting is that none of the application's code deals with network programming at all; instead, an object will hold chat messages, and be magically shared between all the clients through ZEO. I won't present the complete script here; you can download it or read the colourised source code. Only the interesting portions of the code will be covered here.
The basic data structure is the ChatSession object,
which provides an add_message() method that adds a
message, and a new_messages() method that returns a
list of new messages that have accumulated since the last call to
new_messages(). Internally, ChatSession
maintains a B-tree that uses the time as the key, and stores the
message as the corresponding value.
The constructor for ChatSession is pretty simple;
it simply creates an attribute containing a B-tree:
class ChatSession(Persistent):
    def __init__(self, name):
        # Internal attribute: _messages holds all the chat messages.
        self._messages = BTree.BTree()
add_message() has to add a message to the
_messages B-tree. A complication is that it's possible that
some other client is trying to add a message at the same time; when
this happens, the client that commits first wins, and the second
client will get a ConflictError exception when it
tries to commit. For this application, ConflictError
isn't serious but simply means that the operation has to be
retried; other applications might treat it as a fatal error. The
code uses try...except...else inside a
while loop, breaking out of the loop when the commit works
without raising an exception.
    def add_message(self, message):
        """Add a message to the channel.
        message -- text of the message to be added
        """
        while 1:
            try:
                now = time.time()
                self._messages[ now ] = message
                get_transaction().commit()
            except ConflictError:
                # Conflict occurred; this process should pause and
                # wait for a little bit, then try again.
                time.sleep(.2)
                pass
            else:
                # No ConflictError exception raised, so break
                # out of the enclosing while loop.
                break
        # end while
new_messages() introduces the use of
volatile attributes. Attributes of a persistent object that
begin with _v_ are considered volatile and are never
stored in the database. new_messages() needs to store
the last time the method was called, but if the time was stored as
a regular attribute, its value would be committed to the database
and shared with all the other clients. new_messages()
would then return the new messages accumulated since any other
client called new_messages(), which isn't what we
want.
    def new_messages(self):
        "Return new messages."
        # self._v_last_time is the time of the most recent message
        # returned to the user of this class.
        if not hasattr(self, '_v_last_time'):
            self._v_last_time = 0
        new = []
        T = self._v_last_time
        for T2, message in self._messages.items():
            if T2 > T:
                new.append( message )
                self._v_last_time = T2
        return new
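The effect of the _v_ prefix can be imitated in plain Python. In this sketch, ToyPersistent is a hypothetical stand-in that filters volatile attributes out of the state it would save; it is meant to illustrate the rule, not to describe the ZODB's actual implementation.

```python
class ToyPersistent:
    """Toy stand-in: leaves _v_ attributes out of the state that gets saved."""
    def __getstate__(self):
        return {k: v for k, v in vars(self).items()
                if not k.startswith('_v_')}


class Session(ToyPersistent):
    pass


session = Session()
session.name = 'lobby'
session._v_last_time = 1234.5        # per-process bookkeeping, never saved

saved = session.__getstate__()       # what would be written to the database

# Another client "loading" the object gets only the saved state.
restored = Session.__new__(Session)
vars(restored).update(saved)

print(saved, hasattr(restored, '_v_last_time'))
```

Each client therefore starts with no _v_last_time of its own, which is why new_messages() checks for the attribute with hasattr() before using it.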
This application is interesting because it uses ZEO to easily share a data structure, more like a networking tool than a database. I can foresee many interesting applications using ZEO in this way:
- With a Tkinter front-end, and a cleverer, more scalable data structure, you could build a shared whiteboard using the same technique.
- A shared chessboard object would make writing a networked chess game easy.
- You could create a Python class containing a CD's title and track information, and make a CD database containing many objects available through a read-only ZEO server.
- A program like Quicken could use a ZODB on the local disk to store its data. This avoids the need to write and maintain specialized I/O code that reads in your objects and writes them out; instead you can concentrate on the problem domain, writing objects that represent cheques, stock portfolios, or whatever.
Conclusion
The Z Object Database is a powerful tool for Python programmers, and one that deserves to be better known outside the Zope community. It makes it possible to write applications that share objects across a number of different machines, and, once you've learned the few simple rules for programming with the ZODB, the job is relatively easy.
References
- ZODB HOWTO, by Michel Pelletier
Goes into slightly more detail about the rules for writing applications using the ZODB. (http://www.zope.org/Members/michel/HowTos/ZODB-How-To)
- Introduction to the Zope Object Database, by Jim Fulton
Goes into much greater detail, explaining advanced uses of the ZODB and how it's actually implemented. A definitive reference, and highly recommended. (http://www.python.org/workshops/2000-01/proceedings/papers/fulton/zodb3.html)
- ZEO home page
(http://www.zope.org/Products/ZEO/)
- Zope-ZEO mailing list
Mailing list for discussion of ZEO and ZODB. (http://lists.zope.org/mailman/listinfo/zope-zeo/)
- chatter.py (the colourised source code)