If you've heard of the database engine SQLite and would like to contribute... good luck. Its author Richard Hipp is very, very picky. He tells us why.
(From issue 73, December 2005)
Learning SQL and database
normalisation are difficult
topics for computer
students, but how much
harder is it to learn how to
develop your own database
management system then release
it to the close inspection of the
Dr Richard Hipp did
just that when he created SQLite
(pronounced Ess-cue-ell-ite), which
is one of the most successful public
domain software projects around.
We caught up with Richard at
OSCon and had a poolside chat
about sponsorship from AOL,
coding standards and life in the
LXF: Let's start at the beginning:
Dr Hipp before SQLite. When did
you start getting interested in
RH: When I came out of graduate
school with my fresh PhD in 1992 -
LXF: That was in computer science,
RH: Yes, I took a Master's degree in
electrical engineering and computer
science. So I came out of graduate
school and at that time there was this
enormous glut of people applying for
jobs as faculty, university. And so
there would be an opening for one
position and there would literally be
500 applications. And of those about
475 were more qualified than me, so
I quickly recognised where this was
going and I just decided to go in
business myself doing consulting
work. I developed a good practice
where I would work with a client for
years on end... they would have a
hard problem and I would develop
solutions to problems that really
hadn't been solved before, not your
IT kind of jobs, but engineering types
I was working on one of these
[problems], and you know, businesses
that buy services are never a
monolithic entity, there are always
factions within the customer
organisation, and one of them just
insisted that this thing be driven by a
database. And because of legacy
constraints it was an Informix
database, running on legacy
hardware and an HP-UX system.
Informix ran fine on HP-US, and did
everything it needed to do, but
sometimes they would power the
system down and bring it back up,
and the Informix database would not
come up correctly. Then the program
that I was writing, together with
another group, would get error
messages and couldn't connect to
Because I was doing
the user interface I got the blame
even though it was the Informix
server that didn't come up! We talked
about it and said, "What we really
need is an embedded SQL database
engine that just reads data out of the
file and still handles the SQL."
That project ended and we didn't
take action on it in the end, but
during a period when I didn't have
any work later on I said, "I'm going to
get busy and write that. I'm curious
how an SQL database engine works
so I think I'll just write up my own
database engine cause that's always
a great way to learn how things
work." And that's how SQLite started.
LXF: So that's how you got into it...
RH: Yes, I did not study database
theory, I had zero courses in
database. When I was at university I
was browsing the library once and
came across a copy of... the SQL-89
Standard I guess it was, and I read
over it and I thought, "Well, that's
nice and all, but I don't see any
application to what I'm doing," but
then I came back to it ten years later
and what I was doing had shifted and
it made some sense.
LXF: Was it always planned to
throw the first release away? Was
it pretty dire, or something, is that
what you're saying?
RH: No, it wasn't, I mean, version 1.0
used GDB for the storage so it had
multiple files, you had one file for
each table and index. And it didn't
have transactions because GDB
doesn't support that. Also there were
a lot of things I couldn't do because
GDB is a hash-based thing, it doesn't
keep things in sorted order, so I
couldn't take advantage of indices
using any quality constraints, there
were a lot of permutations there. I
really needed to go to a btree-based
system. I looked at BerkelyDB, and I
spent a day or two days just reading
the documentation, which went on
and on. After a couple of days of
reading that stuff I thought, "This is
really complicated"- you know, the
API - "I think I'll just write my own."
So I pulled Cummings off the shelf
and read what he had to say about it,
sat down and designed my own btree
layer. And that led to version 2.0.
You know, I'm still learning while
I'm doing all this, so version 2.0 was...
well, I did a lot of things wrong with it
initially, but it eventually got to be
pretty good - by about version 2.7,
2.8 it was pretty nice. Everything was
still a string. But it was good enough at
that point for people to notice it. PHP
picked up on it -
LXF: 2.8 it was then, wasn't it?
RH: Yeah, they got 2.8. Apple first
noticed when it when it was at version
2.7 or 2.8. A lot of people started
using it. But then AOL came along and
they wanted to use it for some of their
products, and they said, "Hey Richard,
we need to support international
languages and we need some other
features." I'd been planning to do
version 3.0 and fix all the problems
that I had, so I told AOL, "I'm gonna
do that, it's just a matter of time - it's
gonna be about a year, I'll have
something by June of this year"- this
was 2004. And they said, "Can we pay
you to expedite your schedule?" So I
said, "Sure," and gave them a number.
LXF: And did you hit the deadline?
RH: Well, Dan Kennedy worked with
me closely on it. Dan used to work for
a company in Brisbane that used
SQLite, and he had sent me some
patches, which were really, really good
- he does really good work - so I got
him to help me and together in a
couple of months we went from
nothing to version 3.0. We wrote the
btree back-end and rewrote all kinds
of things and worked around the clock
for two months pretty much but we
made the schedule. And that was
"People send me patches and things,
and I appreciate the idea, I really
do, but I think, 'You haven't
really thought this through'."
LXF: SQLite 3.0 was very, very
different. The file format changed,
which is probably a result of you
writing it from scratch, and AOL
obviously provided some funding -
RH: AOL permitted me to devote full
time to it, because I didn't have to do
other things to make a living, and it
also permitted me to pay Dan to work
with me on it.
LXF: Is that help still ongoing
RH: No, that was a one-time thing. In
fact AOL doesn't even pay for support,
though because they did give us that
funding, whenever they call I jump
through hoops to help them. But
technically that contract has long since
expired. I'm led to understand that
they are very happy with the product
and continue to use it, though I do not
know exactly what they're using it in.
LXF: So in terms of the file format
changing it's obviously quite a bit
different from version 2.0. Was it
really necessary to rewrite the
RH: Well, no. I just incorporated all the
lessons learned, is what it came down
to, and I wanted to relax the
constraints. In the old version I had a
hard limit of 16MB, and now there is a
practical limit of 264 [MB] or
something like that. There were a lot of
lessons learned and I had been
accumulating these over the previous
couple of years: "When I redo the
btree layer I'm gonna do this
differently." But I couldn't just do it
incrementally because each one of
those changes would have been
incompatible and I wanted to preserve
compatibility; you know, I thought,
"When I go to 3.0 and make that
clean break I might as well get this all
in here at once." And that was the plan.
LXF: It almost seems a shame that
Apple and PHP picked it up before
RH: Apple started working with it at
2.8 but it's three-point-something
that's in Tiger. They made the change
because they could do it in time. But it
didn't come out in time for PHP.
But version 3.0 is in the PDO stuff of PHP.
LXF: Tell us about releasing SQLite
into the public domain.
RH: Well, version 1.0 was GPL. And it
had to be because it depended on
GDB, which was GPL. So we were
coming up on version 2.0, and I
thought, "What licence should I use? I
could do LGPL, but it's kind of
restrictive, or I could do a BSD-style
licence." But I looked at that and
thought, "What's the point of the BSD
licence? What does it really give you?
Why not just put it in the public
domain and be done with it?" Because
with the BSD, anybody can take it and
use it for whatever they want, so just
give it to them, make it explicit...
renounce this whole copyright thing
altogether. So that's what I decided to
do. Now, since that time a lot of
lawyers have come to me and said,
"Well, there's liability, someone could
come and sue you," but I'm sceptical of
LXF: Well, it's in the public domain!
RH: Yes, that's not going to play well
to a jury in my mind. Someone pulls in
some public domain source code and
incorporates it in a larger application,
and that larger application fails and
somebody gets hurt and they're gonna
blame me for that? That's a real
stretch. Even in an American court
that's a real stretch. So I'm not real
worried about it.
LXF: So how did you find going
from GPL to public domain? Or
were you the only person really
contributing to the code?
RH: I was the only contributor, so I
could do what I liked. At that point
when I made the transition every
single line of code came from me.
LXF: How big was it at the time?
RH: Maybe 20,000 lines of code.
LXF: So people who submit code
for SQLite now sign something?
RH: I've got a form that they have to
fill out, dedicating it to the public
domain. It just goes on and on about
how they renounce all future rights to
this and all of that. It's based on the
Creative Commons licence.
LXF: Do you get many contributors
to the project?
RH: I went in to the CVS tree two
weeks ago and did a study to see how
much code had been contributed by
whom. I counted lines of code, and
69% of the code came from me.
Thirty per cent came from Dan
Kennedy. The other 1% was divided
among I think six other contributors,
the largest of which was half a per
cent. So it's really mostly me there.
LXF: Would you say that perhaps
the public domain licence turns
RH: From contributing? I don't know. I
can see how the BSD licence seems
to really hurt OpenBSD. My theory is
that Linux took off when OpenBSD did
not simply because Linux is GPL and
people feel free to contribute to it with
the certainty that no one was going to
abuse their work.
But I think the real reason that not
many people have contributed to
SQLite is because I'm so incredibly
fussy about the code that goes into
the core. And that's more to do with
me than anything.
People send me
code all the time and I really do
appreciate it, I really do, but I have
very, very specific ideas of what the
code should look like, how it should be
commented, how it should be
indented... And it's more than how it
looks, it's how it's structured. A lot of
people send me patches and things,
and I appreciate the idea, I really do,
but I think, "You haven't really thought
this through, you haven't simplified the
code to the extreme, you haven't
refactored it three or four times..." I like
to make code as simple as possible.
LXF: That's one of the things that
people always talk about with
SQLite. It's one of the few open
source projects that has very
good documentation and that is
actually very well formatted and
commented. You can actually
jump into the code and understand
it, which is a real surprise in
RH: That is my goal...
LXF: And does that have anything
to do with it being public domain?
I mean, if your code wasn't
commented, there'd be a lot more
reliance on you, but by making the
code completely stand by itself -
RH: People should not and do not
have to call on me. Yes, that's true. I
don't know that it was a conscious
choice when I started doing it, but I'm
aware of that now.
"Putting the code into the
public domain, that was a big
step. It was sort of like
marrying off your firstborn."
LXF: You're just a picky
programmer, are you?
RH: The reason I'm so fussy about the
code and insist on commenting on it
so well and making it so simple is to
compensate for my own mental deficit
- I have a notoriously bad memory
and if I don't structure the code very
carefully and make it very easy to
read, I'll forget what it does in a couple
But also, when I was first looking at
databases I was curious about how
databases and database engines work.
I went reading about things like
PostgreSQL on the internet, and it
takes six or nine months of study to
really understand what this does. I
wanted my program to be something
that somebody could just pick up, who
was just curious about how these
things work, read through it and say,
"Now I understand better how a
database engine is supposed to work."
LXF: Which is I think one of the
goals of free software: to be able
to learn from other source code.
RH: Yes, I try to keep the comments
helpful. If the comment exists as a
huge hack I'm thinking, "Well, why is it
in here? Take it out."
LXF: What kind of new features
are you working on for SQLite 4.0?
RH: Actually I'm really hoping that I
never get to 4.0. The only reason I'd go
to 4.0 is if we had an incompatible file
LXF: So in ten years' time it will
RH: Or whatever, yes! Right now I'm
making some changes to the query
optimiser. It's picky about choosing
indices; it needs to do a better job of
making good choices. Traditionally
SQLite has been pretty weak in that
area. If you write the query correctly
you'll do a real good job with it but it
doesn't spend a lot of time analysing
the query and trying to find the best
way to do it. Whereas I hope that the
big commercial client-server enterprise
database systems spend a lot of time
trying to optimise these queries.
I'm actually getting a bit of funding
from a company and that helps me to
work on things without having to worry
about making a living otherwise.
Once I get [the query optimisers]
going well, I want to get in some
semblance of referential integrity and
support. That's just stuff I'm doing, I'm
not getting paid for it.
LXF: Do you ever see yourself
supporting SQL 92 queries; heck,
99 - let's go the whole hog!
RH: I'm not trying to make a product
out of this. I sell services, not products.
And the people who are coming to me
for services are using SQLite in
embedded devices. Their primary
concerns are low memory usage and
low power consumption. So I'm really,
really focused on maintaining both of
these because they're good customers
- they keep coming back to me for
consulting help and I want to make
sure they stay happy. A lot of the SQL
99, 2003 - whatever the latest entity
is - would cause [the device] to bloat
up and run slower so I'm really going
to resist that as much as I can.
LXF: SQLite is really, really fast,
and it's also really, really small. If
it came down to it, which would
RH: I don't know. The embedded
customers want both. Fast in the sense
of low power consumption. It does its
work quickly and then it finishes and
stops draining the battery. Small,
because they've got small memories,
both Flash and RAM, and they're
accounting for every byte. "Oh Richard,
can you save us 12 bytes here?" "Sure,
LXF: Presumably, memory capacity
is going up a lot faster than
RH: Probably so. But as memory
capacity goes up they start putting
[SQLite] into smaller and smaller
devices. So now it's no problem to put
it in a set-top box, but to go into your
cellphone, wristwatch or smartcard,
which is what people are doing with it
now, you're back to square one again.
LXF: That simplicity comes across
in the way that SQLite handles
stuff. For example, you say, "This is
an integer, this is a big integer, this
is a string", and it says, "Yeah,
whatever." As far as I'm aware it
ignores the lot. Is that a feature or
something you want to change?
RH: I think of it as a feature. I like it
very much because I do most of my
development work for clients in
scripting languages, mostly Tcl/Tk, and
it does the same kind of thing.
LXF: Just to clarify: it treats
everything as a string?
RH: Version 2.0 treats everything as a
string. Version 3.0 makes the
distinction between integers, floating
point numbers, strings and blobs
[binary data objects]. But it allows you
to put any of those four things in
different rows of the same column. In
a traditional SQL database you declare
a column to be 'short int' and can only
put a short int there. With SQLite you
can declare a column 'short int' and
put anything you like in there.
LXF: If you can put a string into an
integer field, is it really an integer
field? That's the question.
RH: Well, to my mind the whole
business of all the strong type in SQL
is a bug in the design of the language.
I contend that it was put there initially
so that implementers could determine
in advance how much disk space they
needed for a single row. Because they
were unwilling to go to the extra
trouble of writing disk access routines
that could have variable record sizes;
it's much harder to do. So it's a part of
the implementation showing through -
LXF: Outdated, really.
RH: It's a database: it's designed to
store data, not to tell you how to
format your data. That's my view. There
are opposing points of view, to put it
nicely. But I come from a scripting
language background and it made a
lot of sense to me just to allow you to
put anything you want into any
variable, because that's what you do in
a scripting language.
LXF: So that 'feature' is by design.
RH: People keep coming to me and
saying, "When are you going to fix
that?" Well, it's not broken! I think a lot
of it is just institutional resistance. SQL
has always been done this other way,
and now I've sort of extended the
design to allow you to do these other
things and it's "We've never done it
that way, this isn't really SQL".
LXF: Where do you see your role in
RH: I want to keep doing what I'm
doing right now - just kind of
maintaining it. SQLite doesn't take a lot
of maintenance work because it
doesn't give a lot of problems.
I'm trying to build up a lot of other
people to support it so that if I get hit
by a bus on the way home someone
else can take it over. Dan Kennedy is
more than capable of doing anything
that needs to be done, he's sort of my
backup at the moment, but it would
be nice if there were a bunch of
people that understood the code well
enough that they're really comfortable
just going in and fixing problems.
LXF: Is that what your goal is?
RH: I'd love to find a bunch of Dan
Kennedys. I guess I'm just really fussy
and I'm looking for the very best talent
that's out there, and the very best
people are already working on other
things. It would be great if I could
develop some more people, and get
some people to contribute. You know,
my name appears nowhere in the
LXF: "If you have complaints, just
RH: Putting the code in the public
domain, that was a big step. It was sort
of like marrying off your firstborn. It
was really hard to do because I put so
much time into that and I really did
release it, I intentionally went through
and removed my name. It's more than
just a legal thing: I really let it go. And
I'm trying to continue to do that, to let
it go. And that sounds incompatible
with this pickyness I have in terms of
who can contribute, but it's not,
because I'm not trying to protect my
code, I'm trying to protect the public's
(C) Future Publishing. Not to be copied or redistributed without permission.