The website of the UK's best-selling Linux magazine

Forums | Newsletters | Archives | Tutorials | LXF Blog
[ Add to Firefox ]
RSS feeds
News stories
Forum posts
Blog entries

http://www.linuxformat.co.uk/modules.php?op=modload&name=Sections&file=index&req=viewarticle&artid=21

Main Menu

Things to see
· Visit the forums
· LXF Team Blog
· LXF Newsletters
· Edit the Wiki

· LXF interviews
· Mag archives

· Search this website
· Submit news story
· Top 10 lists
· Other sites to visit

Linux Format magazine
· Subscribe today
· DVD listings
· Back issues
· International LXF
· Contact the mag
· Write for us!

Advertisement



On the forums
LATEST POSTS
goto Postlive cds on colins brothe...
 by Dutch_Mast...
 
goto PostThis week in Retail Humor...
 by wyliecoyot...
 
goto PostRE: Hardware venders bein...
 by pootman
 
goto PostTrouble with PPTPD login
 by nerdmaster
 
goto PostRe: RE: Re: RE: Update ph...
 by Rhakios
 

[Access Forum]


TotalDaily
Topics:6441  6.99
Users:7224  7.84
Posts:49288  53.52

Online
There are 614 unregistered users and 20 registered users on-line.

You are an anonymous user. You can register for free by clicking here


Random quote

"To err is human, but for a real disaster you need a computer."

-- Anon

   

If you've heard of the database engine SQLite and would like to contribute... good luck. Its author Richard Hipp is very, very picky. He tells us why.

(From issue 73, December 2005)

Learning SQL and database normalisation are difficult topics for computer students, but how much harder is it to learn how to develop your own database management system then release it to the close inspection of the wider world?

Dr Richard Hipp did just that when he created SQLite (pronounced Ess-cue-ell-ite), which is one of the most successful public domain software projects around. We caught up with Richard at OSCon and had a poolside chat about sponsorship from AOL, coding standards and life in the public domain...

 (Shortcuts: first release, public domain)



LXF: Let's start at the beginning: Dr Hipp before SQLite. When did you start getting interested in databases?

RH: When I came out of graduate school with my fresh PhD in 1992 -


LXF: That was in computer science, wasn't it?

RH: Yes, I took a Master's degree in electrical engineering and computer science. So I came out of graduate school and at that time there was this enormous glut of people applying for jobs as faculty, university. And so there would be an opening for one position and there would literally be 500 applications. And of those about 475 were more qualified than me, so I quickly recognised where this was going and I just decided to go in business myself doing consulting work. I developed a good practice where I would work with a client for years on end... they would have a hard problem and I would develop solutions to problems that really hadn't been solved before, not your IT kind of jobs, but engineering types of computation.

I was working on one of these [problems], and you know, businesses that buy services are never a monolithic entity, there are always factions within the customer organisation, and one of them just insisted that this thing be driven by a database. And because of legacy constraints it was an Informix database, running on legacy hardware and an HP-UX system. Informix ran fine on HP-US, and did everything it needed to do, but sometimes they would power the system down and bring it back up, and the Informix database would not come up correctly. Then the program that I was writing, together with another group, would get error messages and couldn't connect to the database.

Because I was doing the user interface I got the blame even though it was the Informix server that didn't come up! We talked about it and said, "What we really need is an embedded SQL database engine that just reads data out of the file and still handles the SQL." That project ended and we didn't take action on it in the end, but during a period when I didn't have any work later on I said, "I'm going to get busy and write that. I'm curious how an SQL database engine works so I think I'll just write up my own database engine cause that's always a great way to learn how things work." And that's how SQLite started.


LXF: So that's how you got into it...

RH: Yes, I did not study database theory, I had zero courses in database. When I was at university I was browsing the library once and came across a copy of... the SQL-89 Standard I guess it was, and I read over it and I thought, "Well, that's nice and all, but I don't see any application to what I'm doing," but then I came back to it ten years later and what I was doing had shifted and it made some sense.


 The first release


LXF: Was it always planned to throw the first release away? Was it pretty dire, or something, is that what you're saying?

RH: No, it wasn't, I mean, version 1.0 used GDB for the storage so it had multiple files, you had one file for each table and index. And it didn't have transactions because GDB doesn't support that. Also there were a lot of things I couldn't do because GDB is a hash-based thing, it doesn't keep things in sorted order, so I couldn't take advantage of indices using any quality constraints, there were a lot of permutations there. I really needed to go to a btree-based system. I looked at BerkelyDB, and I spent a day or two days just reading the documentation, which went on and on. After a couple of days of reading that stuff I thought, "This is really complicated"- you know, the API - "I think I'll just write my own."

So I pulled Cummings off the shelf and read what he had to say about it, sat down and designed my own btree layer. And that led to version 2.0. You know, I'm still learning while I'm doing all this, so version 2.0 was... well, I did a lot of things wrong with it initially, but it eventually got to be pretty good - by about version 2.7, 2.8 it was pretty nice. Everything was still a string. But it was good enough at that point for people to notice it. PHP picked up on it -


LXF: 2.8 it was then, wasn't it?

RH: Yeah, they got 2.8. Apple first noticed when it when it was at version 2.7 or 2.8. A lot of people started using it. But then AOL came along and they wanted to use it for some of their products, and they said, "Hey Richard, we need to support international languages and we need some other features." I'd been planning to do version 3.0 and fix all the problems that I had, so I told AOL, "I'm gonna do that, it's just a matter of time - it's gonna be about a year, I'll have something by June of this year"- this was 2004. And they said, "Can we pay you to expedite your schedule?" So I said, "Sure," and gave them a number.


LXF: And did you hit the deadline?

RH: Well, Dan Kennedy worked with me closely on it. Dan used to work for a company in Brisbane that used SQLite, and he had sent me some patches, which were really, really good - he does really good work - so I got him to help me and together in a couple of months we went from nothing to version 3.0. We wrote the btree back-end and rewrote all kinds of things and worked around the clock for two months pretty much but we made the schedule. And that was version 3.0.

"People send me patches and things,
and I appreciate the idea, I really
do, but I think, 'You haven't
really thought this through'."


LXF: SQLite 3.0 was very, very different. The file format changed, which is probably a result of you writing it from scratch, and AOL obviously provided some funding -

RH: AOL permitted me to devote full time to it, because I didn't have to do other things to make a living, and it also permitted me to pay Dan to work with me on it.


LXF: Is that help still ongoing from AOL?

RH: No, that was a one-time thing. In fact AOL doesn't even pay for support, though because they did give us that funding, whenever they call I jump through hoops to help them. But technically that contract has long since expired. I'm led to understand that they are very happy with the product and continue to use it, though I do not know exactly what they're using it in.


LXF: So in terms of the file format changing it's obviously quite a bit different from version 2.0. Was it really necessary to rewrite the btree layer?

RH: Well, no. I just incorporated all the lessons learned, is what it came down to, and I wanted to relax the constraints. In the old version I had a hard limit of 16MB, and now there is a practical limit of 264 [MB] or something like that. There were a lot of lessons learned and I had been accumulating these over the previous couple of years: "When I redo the btree layer I'm gonna do this differently." But I couldn't just do it incrementally because each one of those changes would have been incompatible and I wanted to preserve compatibility; you know, I thought, "When I go to 3.0 and make that clean break I might as well get this all in here at once." And that was the plan.


LXF: It almost seems a shame that Apple and PHP picked it up before version 3.0.

RH: Apple started working with it at 2.8 but it's three-point-something that's in Tiger. They made the change because they could do it in time. But it didn't come out in time for PHP. But version 3.0 is in the PDO stuff of PHP.


 Public domain


LXF: Tell us about releasing SQLite into the public domain.

RH: Well, version 1.0 was GPL. And it had to be because it depended on GDB, which was GPL. So we were coming up on version 2.0, and I thought, "What licence should I use? I could do LGPL, but it's kind of restrictive, or I could do a BSD-style licence." But I looked at that and thought, "What's the point of the BSD licence? What does it really give you? Why not just put it in the public domain and be done with it?" Because with the BSD, anybody can take it and use it for whatever they want, so just give it to them, make it explicit... renounce this whole copyright thing altogether. So that's what I decided to do. Now, since that time a lot of lawyers have come to me and said, "Well, there's liability, someone could come and sue you," but I'm sceptical of these arguments.


LXF: Well, it's in the public domain!

RH: Yes, that's not going to play well to a jury in my mind. Someone pulls in some public domain source code and incorporates it in a larger application, and that larger application fails and somebody gets hurt and they're gonna blame me for that? That's a real stretch. Even in an American court that's a real stretch. So I'm not real worried about it.


LXF: So how did you find going from GPL to public domain? Or were you the only person really contributing to the code?

RH: I was the only contributor, so I could do what I liked. At that point when I made the transition every single line of code came from me.


LXF: How big was it at the time?

RH: Maybe 20,000 lines of code.


LXF: So people who submit code for SQLite now sign something?

RH: I've got a form that they have to fill out, dedicating it to the public domain. It just goes on and on about how they renounce all future rights to this and all of that. It's based on the Creative Commons licence.


LXF: Do you get many contributors to the project?

RH: I went in to the CVS tree two weeks ago and did a study to see how much code had been contributed by whom. I counted lines of code, and 69% of the code came from me. Thirty per cent came from Dan Kennedy. The other 1% was divided among I think six other contributors, the largest of which was half a per cent. So it's really mostly me there.


LXF: Would you say that perhaps the public domain licence turns people off?

RH: From contributing? I don't know. I can see how the BSD licence seems to really hurt OpenBSD. My theory is that Linux took off when OpenBSD did not simply because Linux is GPL and people feel free to contribute to it with the certainty that no one was going to abuse their work.

But I think the real reason that not many people have contributed to SQLite is because I'm so incredibly fussy about the code that goes into the core. And that's more to do with me than anything.

People send me code all the time and I really do appreciate it, I really do, but I have very, very specific ideas of what the code should look like, how it should be commented, how it should be indented... And it's more than how it looks, it's how it's structured. A lot of people send me patches and things, and I appreciate the idea, I really do, but I think, "You haven't really thought this through, you haven't simplified the code to the extreme, you haven't refactored it three or four times..." I like to make code as simple as possible.


LXF: That's one of the things that people always talk about with SQLite. It's one of the few open source projects that has very good documentation and that is actually very well formatted and commented. You can actually jump into the code and understand it, which is a real surprise in today's world.

RH: That is my goal...


LXF: And does that have anything to do with it being public domain? I mean, if your code wasn't commented, there'd be a lot more reliance on you, but by making the code completely stand by itself -

RH: People should not and do not have to call on me. Yes, that's true. I don't know that it was a conscious choice when I started doing it, but I'm aware of that now.

"Putting the code into the
public domain, that was a big
step. It was sort of like
marrying off your firstborn."


LXF: You're just a picky programmer, are you?

RH: The reason I'm so fussy about the code and insist on commenting on it so well and making it so simple is to compensate for my own mental deficit - I have a notoriously bad memory and if I don't structure the code very carefully and make it very easy to read, I'll forget what it does in a couple of weeks!

But also, when I was first looking at databases I was curious about how databases and database engines work. I went reading about things like PostgreSQL on the internet, and it takes six or nine months of study to really understand what this does. I wanted my program to be something that somebody could just pick up, who was just curious about how these things work, read through it and say, "Now I understand better how a database engine is supposed to work."


LXF: Which is I think one of the goals of free software: to be able to learn from other source code.

RH: Yes, I try to keep the comments helpful. If the comment exists as a huge hack I'm thinking, "Well, why is it in here? Take it out."


LXF: What kind of new features are you working on for SQLite 4.0?

RH: Actually I'm really hoping that I never get to 4.0. The only reason I'd go to 4.0 is if we had an incompatible file format change.


LXF: So in ten years' time it will be 3.9.9.9.6.3...

RH: Or whatever, yes! Right now I'm making some changes to the query optimiser. It's picky about choosing indices; it needs to do a better job of making good choices. Traditionally SQLite has been pretty weak in that area. If you write the query correctly you'll do a real good job with it but it doesn't spend a lot of time analysing the query and trying to find the best way to do it. Whereas I hope that the big commercial client-server enterprise database systems spend a lot of time trying to optimise these queries.

I'm actually getting a bit of funding from a company and that helps me to work on things without having to worry about making a living otherwise. Once I get [the query optimisers] going well, I want to get in some semblance of referential integrity and support. That's just stuff I'm doing, I'm not getting paid for it.


LXF: Do you ever see yourself supporting SQL 92 queries; heck, 99 - let's go the whole hog!

RH: I'm not trying to make a product out of this. I sell services, not products. And the people who are coming to me for services are using SQLite in embedded devices. Their primary concerns are low memory usage and low power consumption. So I'm really, really focused on maintaining both of these because they're good customers - they keep coming back to me for consulting help and I want to make sure they stay happy. A lot of the SQL 99, 2003 - whatever the latest entity is - would cause [the device] to bloat up and run slower so I'm really going to resist that as much as I can.


LXF: SQLite is really, really fast, and it's also really, really small. If it came down to it, which would win out?

RH: I don't know. The embedded customers want both. Fast in the sense of low power consumption. It does its work quickly and then it finishes and stops draining the battery. Small, because they've got small memories, both Flash and RAM, and they're accounting for every byte. "Oh Richard, can you save us 12 bytes here?" "Sure, why not?"


LXF: Presumably, memory capacity is going up a lot faster than battery power.

RH: Probably so. But as memory capacity goes up they start putting [SQLite] into smaller and smaller devices. So now it's no problem to put it in a set-top box, but to go into your cellphone, wristwatch or smartcard, which is what people are doing with it now, you're back to square one again.


LXF: That simplicity comes across in the way that SQLite handles stuff. For example, you say, "This is an integer, this is a big integer, this is a string", and it says, "Yeah, whatever." As far as I'm aware it ignores the lot. Is that a feature or something you want to change?

RH: I think of it as a feature. I like it very much because I do most of my development work for clients in scripting languages, mostly Tcl/Tk, and it does the same kind of thing.


LXF: Just to clarify: it treats everything as a string?

RH: Version 2.0 treats everything as a string. Version 3.0 makes the distinction between integers, floating point numbers, strings and blobs [binary data objects]. But it allows you to put any of those four things in different rows of the same column. In a traditional SQL database you declare a column to be 'short int' and can only put a short int there. With SQLite you can declare a column 'short int' and put anything you like in there.


LXF: If you can put a string into an integer field, is it really an integer field? That's the question.

RH: Well, to my mind the whole business of all the strong type in SQL is a bug in the design of the language. I contend that it was put there initially so that implementers could determine in advance how much disk space they needed for a single row. Because they were unwilling to go to the extra trouble of writing disk access routines that could have variable record sizes; it's much harder to do. So it's a part of the implementation showing through -


LXF: Outdated, really.

RH: It's a database: it's designed to store data, not to tell you how to format your data. That's my view. There are opposing points of view, to put it nicely. But I come from a scripting language background and it made a lot of sense to me just to allow you to put anything you want into any variable, because that's what you do in a scripting language.


LXF: So that 'feature' is by design.

RH: People keep coming to me and saying, "When are you going to fix that?" Well, it's not broken! I think a lot of it is just institutional resistance. SQL has always been done this other way, and now I've sort of extended the design to allow you to do these other things and it's "We've never done it that way, this isn't really SQL".


LXF: Where do you see your role in SQLite being?

RH: I want to keep doing what I'm doing right now - just kind of maintaining it. SQLite doesn't take a lot of maintenance work because it doesn't give a lot of problems. I'm trying to build up a lot of other people to support it so that if I get hit by a bus on the way home someone else can take it over. Dan Kennedy is more than capable of doing anything that needs to be done, he's sort of my backup at the moment, but it would be nice if there were a bunch of people that understood the code well enough that they're really comfortable just going in and fixing problems.


LXF: Is that what your goal is?

RH: I'd love to find a bunch of Dan Kennedys. I guess I'm just really fussy and I'm looking for the very best talent that's out there, and the very best people are already working on other things. It would be great if I could develop some more people, and get some people to contribute. You know, my name appears nowhere in the source code.


LXF: "If you have complaints, just email dan.kennedy@..."

RH: Putting the code in the public domain, that was a big step. It was sort of like marrying off your firstborn. It was really hard to do because I put so much time into that and I really did release it, I intentionally went through and removed my name. It's more than just a legal thing: I really let it go. And I'm trying to continue to do that, to let it go. And that sounds incompatible with this pickyness I have in terms of who can contribute, but it's not, because I'm not trying to protect my code, I'm trying to protect the public's code.


(C) Future Publishing. Not to be copied or redistributed without permission.

  

[ Back to Interviews | Sections index ]

    http://www.linuxformat.co.uk/modules.php?op=modload&name=Sections&file=index&req=viewarticle&artid=3&page=1

Latest issue

Cover image

LXF 98, NOVEMBER (ON SALE 20 SEPTEMBER)

Click here for the contents


Login
 



 


 Log in Problems?
 New User? Sign Up!

On the Blog
» The week so far (in photos)
» Announcing MikeOS 1.0.0!
» WMBC
» Stupidity: A new kind of spam
» “Back once again with the ill behaviour”

Current poll
Your current favourite distro?

· (K/X)Ubuntu
· (Open)SUSE
· Fedora
· Mandriva
· PCLinuxOS
· Debian
· Mepis
· Slackware
· Gentoo
· Linux Mint
· Sabayon
· Other (comment)

[ Results | Polls ]

Votes: 1
Comments: 0


Advertisement

Latest news
· The upcoming Mandriva 2008.0 (Sep 26, 2007)
· SCO blames Linux for Chapter 11 (Sep 24, 2007)
· OpenOffice.org 2.3 and Gnome 2.20 released (Sep 20, 2007)
· SCO files Chapter 11; Microsoft loses EC appeal (Sep 17, 2007)
· Debian win32-loader added to Lenny (Sep 13, 2007)
· AMD to open source ATI graphics drivers (Sep 10, 2007)
· Chrooted SSH/SFTP tutorial (Sep 10, 2007)

Older news
Wednesday, August 15
·SCO's statement on court ruling (0)
·Ubuntu servers cracked? (0)
·Using an iPod in Linux with Amarok or Rhythmbox (1)
Saturday, August 11
·Bad news for SCO (1)
Friday, July 27
·Microsoft launches open source website (0)
·Thunderbird to leave Mozilla Foundation? (0)
Monday, July 23
·Linspire/MS agreement picked apart (0)
 Older articles



Web hosting provided by UKFast.net
This web site was made with PostNuke, a web portal system written in PHP. PostNuke is Free Software released under the GNU/GPL license.
All contributions made to this site, as comments or forum posts, reviews, links or other words and images are assumed to be the property of the person submitting it. All submissions are accepted on the basis of a non-exclusive worldwide right to publish and to license others to do so.
© Future Publishing Limited. Reg No. 2008885 England. 30 Monmouth Street, Bath, BA1 2BW.