Friday, December 09, 2005

Let's get a real database

This story is a bit of a tangent, but I think it's worth telling because there are some useful lessons to be learned.

AdWords was built using the MySQL database, which is open-source and therefore available for free. It is by now also nearly as full-featured as the best commercial databases, but back in 2000 this was not the case. MySQL was quite a capable system, but missing a few (what some would consider basic) features. These missing features were obviously not a show-stopper, as we managed to get AdWords to work without them, but in a few cases it did take some extra programming to work around them. On the plus side, MySQL was fast and reliable and, as I have already noted, free.

After AdWords launched, Jane, the ads group manager, decided that now would be a good time to switch over to a "real" database. "Real" is one of those words that Doug ought to add to his list of words. It means "expensive". Many managers seem to have this idea that it is invariably true that you get what you pay for, and that therefore nothing that is available for free can possibly be any good. Using MySQL was acceptable as an expedient to get things up and running quickly and with a minimum of capital outlay, but now that things were settling down it was time to recognize that this was really, fundamentally, a mistake, and it should be fixed sooner rather than later.

The flip side of this philosophy is the one more commonly espoused by engineers, which is nicely summed up by the old aphorism, "If it ain't broke, don't fix it." Yes, MySQL was missing some features, but it wasn't broke(n). We had spirited debates in ads group meetings over what to do.

We finally decided to go with a commercial database (I won't say which one) over the objections of a number of engineers, including myself. To ease the transition it was decided to convert AdWords over to the new system first, and to do the main ads system later. It was a project on a par with the internationalization effort in terms of the tedious work required to comb over nearly all of the AdWords code and change all of the database queries. (Databases are supposed to all be compatible with one another, but in reality they pretty much aren't.)

To make a long story short, it was an unmitigated disaster. The new system was slower than molasses in February. Some heroic optimization efforts eventually produced acceptable performance, but it was never as good as the old MySQL-based system had been. For a long time we were stuck with the worst of all possible worlds, with the two ads systems running on two different databases. It was still that way when I left Google in October of 2001, but I have heard through the grapevine that they eventually went back to MySQL. (Since then, MySQL has added many of the features that had been missing at the time.)

The moral of the story is that sometimes, and in particular with free software, you get more than what you pay for. There are a lot of companies out there paying dearly for commercial databases (and operating systems for that matter). As far as I'm concerned they might as well be flushing that money down the toilet. Actually, they might be better off. We certainly would have been.

As an aside, there is a raging debate in the hacker community about the overall economic merit of the open source model. (Making money producing free software is quite a challenge.) I am not taking sides in that debate here. All I am saying is that from the end user's point of view free software is often much better than the producers of commercial software would like people to think.

69 Comments:

James A. Warholic said...

I am actually somewhat surprised that there was a push by some at Google to pay for another program when the original was working quite well to begin with. After all, I thought the main search servers were built with Open Source Software such as Python. Anyways, I have thoroughly enjoyed reading the posts back from the beginning. I look forward to the next segments.

5:36 PM  
Jonathan Ellis said...

holy crap, you ran adwords on a transactionless database?

you have balls. and you're lucky as hell nothing (else) went wrong.

9:48 PM  
Ron said...

holy crap, you ran adwords on a transactionless database?

Yep.

you have balls. and you're lucky as hell nothing (else) went wrong.

Nope. Luck had nothing to do with it. If you don't have transactions you just roll your own. It's actually not hard at all.

10:21 PM  
Anonymous said...

Aren't there issues with the GPL? PostgreSQL for example offers a BSD license, which is why many commercial users prefer it to MySQL.

1:45 AM  
Rob said...

This neatly encapsulates my employer's experiences (and I was one of the tech guys who encouraged the company to move to *cough* Oracle *cough*!). You can get an elephant to tapdance, but the results will never be graceful. Asking one to do so in your living room is twice a mistake.

6:39 AM  
Anonymous said...

For the life of me I don't understand why more folks don't use Postgres. The BSD license seems to be the way to go. Similarly, I can't comprehend why Google uses a Linux kernel when the BSD kernel is completely free of any GPL b.s.

6:47 AM  
Ron said...

I can't comprehend why Google uses a Linux kernel when the BSD kernel is completely free of any GPL b.s.

Google isn't distributing software based on GPL code, so the GPL "b.s." doesn't come into play.

9:06 AM  
seo black & white said...

Btw, MySQL has supported transactions rather well since they added the InnoDB storage engine. For the last three years it has worked for me like a charm...

9:23 AM  
Steve said...

This story says a lot about the culture at Google--especially in those early days. While LAMP (er, non-commercial-platform-based) architectures are today given the respect they deserve, it was a much more aggressive thing to do back then (and still be taken seriously). I recently left a moldy old company doing bank automation systems which collapsed under the weight of WebSphere, DB/2 and Oracle. Unfortunately it takes an engineering culture to cut through the hype and distill the issues to the essence of a particular tool or system's purpose--then to select the right tool for the job. Google's culture embraces this and is a fundamental reason for their success.

10:18 AM  
Anonymous said...

In 1999 I gave a presentation to the bankers taking MP3.com public (July 1999, $370m offering). We used MySQL (wish I'd used Postgres, and today that's all I use) and the bankers were aghast when I told them we did not use Oracle. Today your bankers would be aghast if you did use Oracle in your startup. MP3.com also funded Hans Reiser to add journaling to ReiserFS so as to reduce dependencies on VxFS. An estimated savings of $8m over five years. Open source definitely has its place and you've got to give back to the community to keep it fresh and moving forward.

6:36 PM  
Anonymous said...

People who don't like the GPL must really like the reach around Microsoft and Apple give you when they are fucking you for your code.

6:51 PM  
Anonymous said...

A former employer of mine got the "let's get a real language" fever. They decided to ditch Python for Java because of a whole bunch of pointy-haired boss reasons. The mentality seems the same as that of the "real database" story here...why does management so often disregard the opinions of the people that have actually been in the trenches doing the work? Why do they decide to throw away or overhaul working systems because of some marketing gobbledygook?

This is an aspect of business I really don't understand. Even when the decision precipitates horrible consequences (unbearably slow systems, unmanageable hulking codebases), some folks just can't recognize that they made a mistake by not listening to the engineers.

7:02 PM  
Jonathan Ellis said...

"If you don't have transactions you just roll your own. It's actually not hard at all."

You're scaring me.

7:05 PM  
Jonathan Ellis said...

To elaborate: some transaction benefits can be simulated "easily," such as logging each update to make sure Bad Things don't happen when you, say, charge a customer and the db goes down before the ad gets inserted. Although if single updates aren't even atomic -- and if your db doesn't support transactions, they probably aren't, which is the case for MySQL's non-InnoDB tables -- you're still in trouble if you're running something like "update users set balance = balance - delta" when it goes down.

The harder ones, though, such as isolation, I've never seen done outside the database, and I wouldn't trust the code of anyone who claimed to do it. Much safer to use a database that's already debugged their code many times over.
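
For concreteness, the "log each update" approach I mean might look something like this. A rough sketch only, assuming a made-up update_log table and the old Python MySQLdb driver; it narrows the failure window but doesn't close it:

    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="ads")
    cur = conn.cursor()

    def debit_balance(user_id, delta):
        # 1. Record the intent first. A recovery pass can later look for
        #    log rows that never got marked as applied.
        cur.execute(
            "INSERT INTO update_log (user_id, delta, applied) VALUES (%s, %s, 0)",
            (user_id, delta))
        log_id = cur.lastrowid
        # 2. Apply the change as one statement; a single UPDATE is atomic
        #    in normal operation (and on MyISAM it takes effect immediately,
        #    no commit needed).
        cur.execute(
            "UPDATE users SET balance = balance - %s WHERE id = %s",
            (delta, user_id))
        # 3. Mark the log entry applied. Die between steps 2 and 3 and you
        #    still reconcile by hand -- that's the window real transactions close.
        cur.execute("UPDATE update_log SET applied = 1 WHERE id = %s", (log_id,))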

7:14 PM  
Mike Hillyer said...

Jonathan Ellis said...

holy crap, you ran adwords on a transactionless database?

you have balls. and you're lucky as hell nothing (else) went wrong.


Actually, MySQL has had ACID transactions for many years now, including savepoints and four levels of transaction isolation. Best get current before criticizing.

7:24 PM  
Anonymous said...

Erm, the article is about yr 2000. MySQL got transactions about 2001 if I'm correct, but don't really count on that :) However, we can presume that MySQL had no transactions back then :D

7:47 PM  
Anonymous said...

Single MySQL updates have been atomic for a looong time (1999 at worst), maybe always. With those you can roll your own transactions.

It's funny when people trot out disaster scenarios that derive from premises that are incorrect. "Well, yeah, if not for Gravity, things might be different like you said..."

7:48 PM  
Jonathan Ellis said...

MySQL does claim that updates are atomic even on MyISAM tables, but I've seen many people report that this is only true as long as nothing goes wrong, i.e., no power loss, no crashes, etc. Two such people have commented on the relevant doc page: http://dev.mysql.com/doc/refman/5.0/en/ansi-diff-transactions.html

I'm no expert on MySQL source but I've seen enough non-idiots report on MySQL corruption that I find toeing-the-party-line claims to the contrary difficult to believe.

In any case, my main point stands: atomic updates alone aren't going to help you improvise isolation support, etc., if your db doesn't have a concept of transactions.

8:22 PM  
Hank said...

Actually Ron, "rolling your own" is not a smart way to go. Trying to emulate transactions in your app means that you don't actually have the guarantee of a transaction, since your app and the DB are separate, and one or the other can go down in the middle of a pseudo-transaction, leaving you with inconsistent data.

Using such a crippled and broken database is even more crazy when you consider that it corrupts data (intentionally) and that there was a free alternative without these problems, one that handles many concurrent queries better.

And anon, they can't use BSD, they're from Stanford.

8:33 PM  
Anonymous said...

MySQL is rock solid.

10:16 PM  
Online Degree said...

We have been using MySQL on our site (www.degreeme.com) since its inception and have never had a single problem. Impressed? Just kidding... I know that isn't at all impressive :) But I am still a huge supporter of MySQL although I also have heard of instances where MySQL has had corruption problems. Hopefully the new version 5 has fixed that.

11:35 PM  
Anonymous said...

i love this story... you do get what you pay for... of course how you pay is relative... because some of the best things in life are free... fun stuff

11:38 PM  
Anonymous said...

I'd dare to guess that the reason they had such a big problem porting over to some other database was that their code base was full of "hand rolled transactions", including a huge number of calls to "lock table".

If they had gone with a database with transaction support from the get go, like PostgreSQL, the porting work would have been much simpler.

The fact that MySQL makes you code a lot of non-standard stuff which makes it difficult to port the code over to other databases is being presented as a good thing in this article. If the author had opened his eyes a bit he'd learn to use something else from the get go the next time.

And yes, PostgreSQL is a "real database", and it's not expensive, it's free.

11:53 PM  
Ron said...

Actually Ron, "rolling your own" is not a smart way to go.

What can I say? It worked for us. Worked for this guy too.

12:33 AM  
Anonymous said...

As some previous people said, I would be surprised if the problems porting to the 'other' database from MySQL were really any different from the problems developers generally get when porting from database A to B. Without a really good understanding of how things work in a given DBMS, especially locking and transaction logging, it is very difficult to get something to work as well when ported. I would wager that even the high calibre googleplex folk would find it a challenge to get that experience quickly. Tom Kyte's Oracle books contain numerous stories which illustrate this point from his experiences of going from Sybase to SQL Server to Oracle.

3:08 AM  
sf said...

It's not "free" vs "proprietary". It's some features against other features and tradeoffs against other tradeoffs. MySQL has the single unique feature of being a speed demon amongst databases. The tradeoffs are quite dire in the domain of functionality, and in the need to use specific MySQLisms instead of the "high way" to have things running fast, not slower than all the rest. Other databases are more feature-packed, but they are simply slower. Sometimes WAY slower. The change of the database should have considered this.

3:48 AM  
Samuel Cochran said...

It is quite possible to make money from Open Source software. Rather than charging for software, one must charge for services. Installation and configuration of software, customisation of software and solutions, etc. can all be marketed rather than software. This is my approach.

By doing this one can contribute any changes and improvements to the Open Source community whilst providing solutions for companies, and make a living, too. :)

5:38 AM  
Anonymous said...

I went through similar issues at a biotech startup. I was the only programmer there for the first 1.5 years and had built everything with a postgres back end. I caught a lot of heat resisting management's wish to convert to Oracle, but persist I did, because there was no technical justification for the effort. I left for a much better opportunity six months later and my successor apparently got the job by promising the conversion. After 1.5 man-years, they still did not have the system working as well as it had been. Eventually the company was sold for pennies on the dollar.

8:09 AM  
Anonymous said...

You have a team of engineers that are resisting the move to another database system pushed by management.
These engineers are all happy with their current database, which is probably the one they use any chance they get, so they are really in tune with how it works.
The new database has to be used and tuned in a completely different way, requiring its own years of experience to master.
Of course the second solution using the new database is going to be worse, how could it not? It says nothing about the database products themselves.

9:01 AM  
Anonymous said...

postgres advocates are delusional -- millions of mysql users have no time for crashgreslow.

Back in 2000, postgres didn't support the LIMIT keyword. I got hoodwinked by the FUD they spread about MySQL and tried using it for three projects: each time I got crashes and data corruption with small data sets. You could find very few sites that were using Postgres on the web, and the operators of many of those sites regretted that choice.

Somewhere around postgres 7.4, postgres became stable enough that you actually ~could~ use it in a production system. Still, it's pretty common for people to try a few queries on Postgres and mysql, notice that some very common queries run 60 times faster on mysql, and come to an obvious conclusion.

People who push the "no transactions" FUD also forget that transaction support often reduces the reliability of applications -- yes, it's the stupidity of the application programmers to blame, but an organization I've worked for spent $250,000 on some commercial junkware which doesn't do correct error handling with transactions, and this leads to strange, glitchy, sporadic problems... And these guys aren't using an "off brand database", but they're using Oracle.

The truth is that it's not the end of the world if you mess up a row or two in most databases: if you consider all the other ways an e-commerce site can screw up because of user error, human error in fulfillment, problems in shipping, other programming errors, losing a row because of the lack of transaction support is a minor problem... As the CEO of one company told me, "if we mess up one order in six months, we can just let the customer have it for free" -- that's a lot cheaper than the cost of a DBA and the extra hardware to run Oracle, never mind the cost to license Oracle.

9:22 AM  
Anonymous said...

What I don't understand is why modern huge database applications are written using database engines that run on top of some universal OS. I mean, what else can be on that server except the database? So the ideal modern database should be the OS, possibly even without the file system as we know it. The hard drive itself can be structured to suit the database's needs, so one would never have to rewrite any files, just the segment of the disk that was assigned to the particular data. The improvement in speed and reliability would be enormous.

9:26 AM  
Lauri said...

Ron, how much need did you have for transactions when working on AdSense anyway?

It looks to me like it is mostly INSERT operations and lots of them. They get written to separate servers based on the user hash and the balance changes get calculated at X intervals offline.

Am I close? Or did you have some other setup.

10:42 AM  
Anonymous said...

I'm not a developer, I'm a financier, so please excuse my ignorance. I found this post quite interesting -- where would you suggest I go to read the best open source debates? I'm interested in what the technology community pitches as the pros and cons. Thanks.

4:58 PM  
Ron said...

Ron, how much need did you have for transactions when working on AdSense anyway?

Well, that depends on what you mean. In some sense we had no need for them at all, obviously, because we built the system without them. If you want to know how many times we had to write extra code that we wouldn't have had to write had we had transactions, it's hard for me to recall exactly (it's been five years) but it wasn't very much. It certainly was not a major issue. And, like I said, in those situations where you need real transactions (fairly rare in most real world applications I believe), emulating them yourself isn't particularly hard.

Of course, it's all a moot point now that MySQL has transactions.

5:11 PM  
Anonymous said...

The MySQL doc page says an update might not update ALL of the rows if the thread is killed during the update. That still does not stop single-row updates from working, as mentioned. Furthermore, if you are using your own transactions, you are writing some extra code to make sure multi-row updates work as well.

6:15 PM  
Matisse Enzer said...

One way of setting up MySQL these days is to do all the INSERT/UPDATE/DELETE to an InnoDB table and then to have a pool of "query slaves" where the table is MyISAM.

The query-slaves are behind a load-balancer and are used by applications that need to do SELECTs, and this way they never have to wait for the locks that the INSERT/UPDATE/DELETE create.

The master can use MyISAM tables if transaction-safety isn't as important as write-speed - INSERT into a MyISAM table is very, very fast.
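
A rough sketch of how an application might sit on top of that split (the hostnames, table, and Python MySQLdb driver details here are just illustrative assumptions):

    import MySQLdb

    # Writes go to the InnoDB master; reads go to the load balancer in front
    # of the MyISAM query slaves, so they never wait on the master's locks,
    # at the cost of possibly seeing slightly stale data (replication lag).
    write_conn = MySQLdb.connect(host="db-master", user="app", passwd="secret", db="ads")
    read_conn = MySQLdb.connect(host="db-slaves", user="app", passwd="secret", db="ads")

    def record_click(ad_id):
        cur = write_conn.cursor()
        cur.execute("INSERT INTO clicks (ad_id, ts) VALUES (%s, NOW())", (ad_id,))
        write_conn.commit()  # the master table is InnoDB, so commit the write

    def click_count(ad_id):
        cur = read_conn.cursor()
        cur.execute("SELECT COUNT(*) FROM clicks WHERE ad_id = %s", (ad_id,))
        return cur.fetchone()[0]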

10:32 PM  
Sugree Phatanapherom said...

Did they develop any lock server to simulate transactions at global scale?

1:53 AM  
Anonymous said...

All I am saying is that from the end user's point of view free software is often much better than the producers of commercial software would like people to think.

"End user" being the developer or the manager who actually gets to pick and pay the product?

2:55 AM  
Anonymous said...

People prefer MySQL over PostgreSQL for the same reason people prefer GNOME over KDE: Because the developers are arrogant bastards, and the supporters act like dicks.

5:16 AM  
Jayson Vantuyl said...

In all seriousness, I don't see what the GPL vs BSD comments are about.

Google developed an app on MySQL on Linux. The GPL didn't come into play unless they tweaked and then distributed a modified Linux kernel. Even then, they CAN STILL LEGALLY SELL IT!

The only thing that someone else can't do is improve the source without providing a little quid pro quo. This is not unreasonable, it's a LICENSE choice. For the small developer, it's a damn sensible one as well.

That said, I prefer Postgres for reasons absolutely unassociated with the license. I feel it implements much "purer" SQL than MySQL. I like its administration better. I like how it plans queries better.

As for rolling your own transactions, you should have covered this if you took a CS degree. Between lockless techniques and a simple test_and_set routine, rolling your own locks (and thus Atomicity, Consistency, and Isolation) is really child's play. As for Durability, a log table (or even an fsync'd file) is really not that difficult either.
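
For example, a test-and-set style lock can be built on nothing more than a single-row UPDATE, which MySQL executes atomically (crash scenarios aside, as debated above). A rough sketch, assuming a made-up locks table with one row per lock name and the Python MySQLdb driver:

    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="ads")

    def acquire(name, owner):
        # The UPDATE only matches while owner IS NULL, and a single-row
        # UPDATE is atomic, so at most one caller wins the race.
        cur = conn.cursor()
        cur.execute(
            "UPDATE locks SET owner = %s, acquired_at = NOW() "
            "WHERE name = %s AND owner IS NULL",
            (owner, name))
        return cur.rowcount == 1   # one changed row means we got the lock

    def release(name, owner):
        cur = conn.cursor()
        cur.execute(
            "UPDATE locks SET owner = NULL WHERE name = %s AND owner = %s",
            (name, owner))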

7:15 AM  
Anonymous said...

Ron, I don't see anything at all about "that guy" trying to create safe transactions on top of year 2000 mysql. He says very clearly that he didn't use a database at all, he used files. Because he is not using tables and rows accessed through SQL, he can ensure that his updates are completely atomic. You can't do that with year 2000 mysql. There is a very big difference.

And Jayson, I realize it's easy to say you can make safe transactions, but it's not easy to do. The only way to make it safe is to rely on a log, and if a query fails, take the database down and restore it from a known good point, then replay the log. This is not transactions, it's recovery. There is simply no way with year 2000 mysql to run a bunch of queries in a single atomic transaction and guarantee that they either all succeed, or they all do nothing.

10:56 AM  
Arqueaopterix said...

They were right in that MySQL is a toy database. It only got popular because Slashdot uses it.

However, unless you're using Oracle, PostgreSQL is better than your database.

12:07 PM  
Ron said...

There is simply no way with year 2000 mysql to run a bunch of queries in a single atomic transaction and guarantee that they either all succeed, or they all do nothing.

That is true, but irrelevant. What is important is not to guarantee that they do nothing, but merely to guarantee that whatever they do, they do not affect the results of subsequent queries unless they all succeed. That is possible, even with y2k MySQL. Figuring out how is left as an exercise.
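
One way to do it (a sketch only, with made-up table names and the Python MySQLdb driver; I'm not claiming this is exactly what we did): tag every row written by a multi-step change with a batch id, keep the batch marked pending until the last step, and flip it to committed with a single atomic UPDATE. Readers only look at committed batches, so a failure part-way through leaves some orphan rows but never changes what anyone sees.

    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="ads")
    cur = conn.cursor()

    def apply_charges(charges):
        # 1. Open a batch in the 'pending' state.
        cur.execute("INSERT INTO batches (state) VALUES ('pending')")
        batch_id = cur.lastrowid
        # 2. Write every row of the multi-step change tagged with the batch id.
        for ad_id, amount in charges:
            cur.execute(
                "INSERT INTO charges (batch_id, ad_id, amount) VALUES (%s, %s, %s)",
                (batch_id, ad_id, amount))
        # 3. The "commit": one single-row UPDATE. Queries join against
        #    batches WHERE state = 'committed', so a crash before this point
        #    leaves invisible garbage rows rather than partial results.
        cur.execute("UPDATE batches SET state = 'committed' WHERE id = %s",
                    (batch_id,))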

11:17 PM  
Rob said...

What I don't understand is why modern huge database applications are written using database engines that run on top of some universal OS. I mean, what else can be on that server except the database? So the ideal modern database should be the OS, possibly even without the file system as we know it. The hard drive itself can be structured to suit the database's needs, so one would never have to rewrite any files, just the segment of the disk that was assigned to the particular data. The improvement in speed and reliability would be enormous.

This exists presently. It is called Microsoft SQL Server (presumably, as usual for MS products, its hooks dig deep beneath the OS into the kernel). It is of necessity a security nightmare, as are all MS products.

You have a team of engineers that are resisting the move to another database system pushed by management.
These engineers are all happy with their current database, which is probably the one they use any chance they get, so they are really in tune with how it works.
The new database has to be used and tuned in a completely different way, requiring its own years of experience to master.
Of course the second solution using the new database is going to be worse, how could it not? It says nothing about the database products themselves.


Certainly there are elements of this, but then, if the DB requires an army of expensive high priests to keep it running smoothly, why bother? The design flaw of failing to self-tune becomes a business feature in that the customer now has to sign up for hours and hours of expensive consulting. I don't see this as a positive.

6:31 PM  
r_wolfcastle said...

I have worked at 2 companies that rolled their own transaction logic in the application and used MySQL MyISAM tables -- one starting in 1999 and the other starting in 2001. Anyone remember AvantGo? That entire thing was a single MySQL instance with a load balancer and multiple query servers.

At the second job, the one starting in 2001, as jack-of-all-trades I got put in charge of database stuff. We wrote our own layer between the application logic and the database for ease of porting. Our app was essentially 100% database-centric, but we were able to port from MySQL to MS SQL Server in a couple of weeks because of the way we had written the layer. It turned out that none of our customers wanted MS SQL Server (actually, they didn't care what we used as long as it worked and it was cheap), so we let support for that lapse and just stuck with MySQL.

If you are selling a product that is substantially smaller in scope than, say, a comprehensive SAP solution or e-commerce infrastructure for eBay or Amazon, you cannot use Oracle for economic reasons. You can't use the customer's existing Oracle installation for your app, because that installation is already highly-tuned by expensive folks for SAP or CRM or whatever. They won't let you and your app and its processing and bandwidth requirements anywhere near it. So to use Oracle the customer has to have an Oracle instance just for your app, and it is pretty damned difficult to sell a customer an $80,000 product when you have to say, "Oh, and by the way, you'll need to fork over about $30,000 for Oracle and hire another admin to tune and babysit it." That is where open source databases like MySQL really shine.

Finally, someone said that INSERT into MyISAM tables is very fast, and that is true. But there are special circumstances (and they're not that rare) in which you can go even faster using LOAD DATA INFILE. MySQL claims it is "up to 20 times faster" (last time I looked) than using MyISAM INSERTs, though in practice it is actually 5-8 times faster in apps I've measured.
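
For example (a rough sketch, with a made-up file path and table, assuming a MySQLdb build that allows LOCAL INFILE):

    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="app", passwd="secret",
                           db="ads", local_infile=1)
    cur = conn.cursor()

    # Bulk-load a tab-separated dump instead of issuing one INSERT per row.
    cur.execute(
        "LOAD DATA LOCAL INFILE '/tmp/clicks.tsv' "
        "INTO TABLE clicks "
        "FIELDS TERMINATED BY '\\t' "
        "(ad_id, ts, cost)")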

3:18 PM  
Anonymous said...

This has just got to be a wind-up, right?

BTW Oracle bought up InnoDB a while back. You stick to your, um, MySQL disaster-in-waiting. I'll stick with Postgres or Firebird rather than a toy.

6:55 AM  
Tyler said...

Just wanted to chime in and comment on some of the "Oracle is expensive" comments. There are really 4 versions of Oracle to consider:
- Enterprise Edition = $1x
- Standard Edition = $.38x
- Standard Edition One = $.13x
- Express Edition = FREE! (just released this month)

The .38 and .13 numbers are from the current price list.

I use x, since government pricing is typically MUCH lower than what you see on our price list.

Most people have seen the price for EE, but don't know about the other versions. MS has a VERY similar pricing structure.

Full Disclosure: I am an Oracle employee.

6:05 AM  
Anonymous said...

What a sad commentary about how little so many in the IT industry know about data.

I've read that mysql was free, but commercial databases are expensive. Then the author admits that they had to write their own transaction logic. Where did this free labor come from?

I've read that "it's not the end of the world if you mess up a row or two in most databases". The issue is that the results are *unpredictable*. You don't know what will break when you violate your contract between components.

I've read that mysql is a speed demon. Ah, right. With no parallelism, no partitioning capability, inability to properly optimize queries with 4 joins, no memory tuning. Please. The only queries that it runs fast fall into a very small niche of read-only retrievals of highly selective data (<1%) via btree index. That's good - but only covers a subset of possibilities that you should be looking at.

I've read that mysql has all the features you really need. Back in 2000 it didn't support transactions, pk/fk constraints, views, unions, subselects, triggers, stored procedures, etc. And it still doesn't support query parallelism or partitioning. If you're writing database apps without any of these features you might as well stick to MS Access.

I've read that they wrote their own transaction logic (since mysql didn't support transactions well) and it worked fine. Really? How do you really know? How did you empirically measure data accuracy? Hmmm, you didn't? Really, I wonder what kinds of problems (loss of accuracy of adsense) were caused by data quality problems. Note: I've never encountered a database that didn't have data quality problems; the lack of constraints and transactions almost guarantees their widespread existence - whether or not it was shoved in the face of a programmer.

I've read that conversion from mysql to the other database was tough. As though that was caused by the other database. FYI, mysql is the least ANSI-SQL compatible database in the market (especially so in 2000). Having to roll your own transactions, and getting by with mysql deliberately (and arbitrarily) converting invalid data, and then going to a standard solution would, of course, be a nightmare.

Undeniably, oracle was (and still is) overpriced. Though it isn't tough to get discounts if you know what you're doing. DB2 and Informix are both cheaper and easier to work with, though they weren't VC darlings in 2000. And undeniably, you can write working software with mysql. But there are compatibility problems, data quality problems, and scalability problems. Maybe it's unfair to assume that this is obvious to junior programmers unfamiliar with database technology. Especially given the current hype around mysql.

9:35 PM  
Anonymous said...

I've been using MySQL since 2000 to implement a community website (bicycle touring journals, now about 800 journals and over 60,000 pictures). The site gets somewhere in the region of 20-50,000 page requests (150,000 - 300,000 http reqs) per day. So it's not exactly a super busy site, but it does have quite a few active users. I've had the site up and running 24/7 (with some breaks for moves and hardware failure) for about six years now. Not once have I had any integrity or data loss problems caused by MySQL. It's been really rock solid.

All these people who say that using MySQL for anything significant is a "disaster in waiting" are simply blindly repeating religious dogma. That's what it is - religious. They believe that MySQL is crap, so no matter what other people demonstrate to the contrary (that it, uh, actually works, rather well in fact) they will still continue to twist reality around to suit their existing viewpoint. I don't really understand the outright hatred that exists for MySQL in the minds of some people. It's been working for me for six years now... how much evidence do you need before you accept that it really does work well?

And as for the data integrity issues, lack of transactions etc... all I can say is this: Despite all the wailing and gnashing of teeth, the fact is that fixing data problems is really not all that hard. It's very occasional, and actually I haven't even seen any problems that were caused by MySQL itself - the biggest problems were caused by user error (mine, generally). This involves stuff like dropping the wrong table or field, or mistakenly deleting entries wholesale. Sorry, but arguing that "the database should stop you from making that kind of mistake" isn't valid at all. Any database that lets you change data will have the potential for user error, and that's the biggest source of data loss in my experience. Only on a couple of occasions has MySQL itself had a problem - and then it was just a corrupted index file, which was easily fixed. I'll take that kind of rock solid stability any day.

Sure, MySQL could be better - I have wished for a while that it was able to utilize more than one index in a query. I think that is starting to happen. I'll be using MySQL for a while yet, I think.

9:23 AM  
Neil Gunton said...

Sorry, for some reason the post preview system reverted my identity on the previous msg to "anonymous" even though I had previously selected "other" along with my home page. Anyway, the community website I was referring to is crazyguyonabike.

9:26 AM  
Anonymous said...

Rob said:
What I don't understand is why modern huge database applications are written using database engines that run on top of some universal OS. I mean, what else can be on that server except the database? So the ideal modern database should be the OS, possibly even without the file system as we know it. The hard drive itself can be structured to suit the database's needs, so one would never have to rewrite any files, just the segment of the disk that was assigned to the particular data. The improvement in speed and reliability would be enormous.


The ability to create database "files" on raw disk slices/partitions has existed in Oracle for as long as I can remember (at least 13 years). This was a highly recommended way of installing Oracle some years back but modern Oracle documentation now downplays this method in favour of cooked file systems.

7:00 AM  
Aminorex said...

I've been running dozens of websites offering commercial services, all backed by mysql, since 2001. I can attest that it is good enough and fast enough and plenty plenty cheap enough for everything I've ever needed in a database system. It scales on demand, with no incremental cost. Unless you have a very specialized need, anything else should draw the intense ire of your CFO, and maybe a shareholder lawsuit.

4:46 PM  
Anonymous said...

Someone asked where the hatred of MySQL comes from. It started with the... I want to call them lies... that they continually repeated in their presentations, community websites, and documentation; like how transactions are not important, that ACID doesn't really matter, and that being strict about data integrity and features like foreign keys and views were only things that "purists" cared about. And the hatred grew as people watched them implement all these things over the last 5 years.

Sure you can make MySQL work, you can make GOTO work as well, but that doesn't change the fact that 90%+ of MySQL users would be better served by choosing a different database.

12:35 PM  
Paul Reilly said...

At the end of the day, MySQL WAS adopted by Google. OK, so additional transactional development was required, but it worked, and more importantly, it worked more efficiently than the commercial counterpart! I often see this attitude in the adoption of commercial software vs. the adoption of Open Source software.

Here within the corporate culture where I work, I often see the same pro-commercial attitude. The key point here is that adopting the commercial alternative provides a scapegoat: every corporate top director who answers to shareholders surely needs to protect himself by having someone to sue in the event of a mission-critical failure, a fine motivation to adopt a commercial database.

8:26 AM  
Anonymous said...

A few thoughts about recent posts:

1. Why the hatred towards mysql? I don't think there is hatred, I think a few issues have really raised people's hackles:
- hype vs reality
- history of the company spreading worst practices (don't use transactions or foreign keys, etc.)
As proof of the above - note that you almost never see so many criticisms of Postgresql - a database that is both freer and more standard than mysql.

2. It works great on my hobby website, it should be good enough for everyone: Ah, no. Go do a dozen business critical database projects - and for god's sake, use a few other products besides mysql to get a perspective. *Then* you're in a much better position to talk about mysql vs the rest of the options. In the meanwhile, you really have zero credibility on the subject of databases.

3. I've used it for years on my site and have never seen any data quality problems, therefore that just isn't a big deal: Really? How do you know you don't have data quality problems? Are you explicitly testing for them? Almost nobody does - they take it for granted that things are working right. The fact that they are only looking at a report with 95% of the data instead of 100% is quite difficult for them to know!

4. mysql is good enough for everyone, it's just the out of touch purists that don't like it: the people who say this are typically those with very little experience with databases, perhaps only with mysql. And they are complaining about the people with 10+ years of experience across a half-dozen products. These aren't purists - they're just experienced professionals.

5. I've been running dozens of mysql websites offering commercial services. Anything else is just a specialized application: if you're just supporting e-commerce packages for smallish companies at your isp you've touched on about 1% of what people use databases for. You haven't yet handled large volumes, 100% failover environments, reporting, complex queries, etc. It's great if it worked for you, but you really probably wouldn't notice the data corruption caused by silent data truncation/conversion, orphaned data allowed by lack of fk constraints, etc.

Sigh. The proponents of mysql really need to broaden their perspective by trying out a few other products and developing a few applications that have higher standards (not just performance, but also reliability & quality). Additional recommendation: work with people who have been doing this for a while and who can show you the many ways that data quality problems occur, how they can have unpredictable impacts on systems, and how to best prevent them.

10:36 AM  
Anonymous said...

Sigh. The ignorance and lack of objectivity from commercial hucksters is disheartening. The commercial bigotry persists. The truth is many of those closed minds have their careers tied to a commercial product and have little or no knowledge of alternative products or solutions. The folks pushing commercial products need to ask themselves what features of a commercial product are 'really' needed in the majority of real world applications and STOP preaching vendor product marketing jargon and inventing FUD about open source.

5:53 PM  
Neil Gunton said...

I thought I'd come back here to see what other people had posted since my prior comment. And, of course, there it is... the typical attitude of the PostgreSQL pusher, which can roughly be summarized as this: "You don't know what a database is, even if you think you've been using it for years to do non-trivial tasks, you don't really know what a real database is. There are certain things that any database MUST do, and if it doesn't do them then it is NOT a database and you should run along now and run your little hobby website. Leave the REAL database stuff to the professionals who know what they're doing". Wow! Talk about arrogance, not to mention obnoxious. It really shouldn't surprise anybody if I actually avoid PostgreSQL because of people like this.

Also, I do hear from the odd person that actually tried shifting over to something like PostgreSQL that it was *very* slow. That little fact gets conveniently swept under the carpet by the fundamentalists. If you look at any discussion involving MySQL vs PostgreSQL over on slashdot (and it seems that any discussion that has to do with MySQL will bring out the PostgreSQL fanboys), then you'll see comments tucked away in the oblivion of "unmoderated hell" which talk about how difficult or slow PostgreSQL is in reality. I have learned to take the Open Source zealots with a huge pinch of salt these days.

What you tend to see are comments that just keep repeating the same thing over and over - PostgreSQL is the only thing anybody should be using (even though it's actually still really slow)... or Mozilla/Firefox is the best thing since sliced bread (even though it was basically unusable on my old 450 MHz workstation, it was so slow, and moreover has had major bugs unfixed in the rendering engine for literally years)... seems like the whole Open Source crowd has a tendency toward the religious in terms of closed thinking. Something to do with a mixture of "us vs them" tribal attitude, with an added dose of dogma and hatred of anything that threatens their cosy little world view.

You'll notice that these people won't acknowledge that MySQL is pretty good for many applications (the majority of database applications, actually). In fact, many of them won't even acknowledge that MySQL deserves to be called an RDBMS. Any post praising or supporting MySQL is rebuffed with a link to a "Gotchas" page listing irrelevant or out of date bugs or anomalies. Any post defending MySQL is picked apart, point by point - you can tell a zealot when they think you are *totally* wrong on *every* point, each of which has to be individually addressed, simply to show you just how *wrong* you are. No middle ground, no no no. Just "mysql totally sucks, PostgreSQL is reasonable and sane, anybody who says otherwise is delusional". Whatever.

I'll just keep developing my "hobby site" quite happily, scaling it up as needed using MySQL. It works, thanks MySQL! Oh, and for the record: I'm sure PostgreSQL works just fine too. I'm sure you can tune it just so, so that it is ok speedwise. But a note for the PG fanboys: You don't win people over with overbearing, arrogant posts that say anybody who thinks that you can do quite a lot with MySQL is somehow less professional or just a hobbyist. I'm sorry to go on, but this kinda gets on my nerves after a while. It's like swatting annoying flies - every time someone mentions MySQL, these fanatics show up claiming that it can't possibly be used for anything. To which I reply: GET A LIFE MATE!
Thanks.

10:41 PM  
GoodBytes said...

I am coming a little late to the debate...

I wanted to congratulate you on this wonderful article and two excellent points:

"with free software, you get more than what you pay for", and

"from the end user's point of view free software is often much better than the producers of commercial software would like people to think".

I have used both MySQL and PostgreSQL, and have good opinion about them both. They have their good and bad sides, but you get to learn them in time and find workarounds. What's best, they are free and they are constantly improving with new versions.

Also, it is good to mention that not everyone needs a database for heavy-lifting applications. For simpler tasks, having one of these databases is priceless.

Thanks.

3:28 PM  
Anonymous said...

I think someone made a comment a bit like that one. I've used both MySQL and Oracle and I must say that I'd rather work with an Oracle db than with MySQL. It has many advantages, like having the "data integrity logic" directly in the database. This is priceless when two or more apps are accessing/writing to the database.

But I would never use Oracle for a high number of concurrent sessions. By high number I mean as low as 1000. IMHO, Oracle for a complex and/or high-data-volume database where transactions matter; MySQL for a simple but heavily queried database. Like the frontend-backend thing.

I still believe that you can achieve the same performance with Oracle as with MySQL. But with Oracle you need the knowledge and the resources to do so. And why would you need to put your hand in your pocket when MySQL is there to do just that (the performance thing)? But I fear that, on the road of adding more and more "basic" features, MySQL will release a light version of itself.

js

8:22 PM  
uri said...

Just wondering, when you talk about the "AdWords program" is this the one external customers work with, or something used for internal processing of ads etc.?

Thanks for the insightful story :)

6:06 AM  
Andrei Lopatenko said...

Hard to believe.

I certainly agree that MySQL may outperform untuned Oracle (sorry, I talk ONLY about the database I am truly experienced with, for many years) for a transaction-free database under 10 million records. I believe that AdWords and AdSense need a much larger database to operate.

Tuning Oracle to the real environment, which depends on both hardware and software (RAM, HDD, network, operating system, RAID), is hard and time consuming, but the results may be surprising. Performance may increase many times over compared to the "default" untuned configuration. Proper programming, like moving more functionality into PL/SQL rather than Python, may improve performance up to 10 times in data-intensive applications, even for plain select statements without any database update functionality.

I do understand that select processing is 99.9999% of the functionality required from the Google AdSense database, but the remaining 0.0001% is no less important for such applications.

Besides performance issues, what about fault recovery, which includes transaction logging, backup and restore? If hardware or software fails, could Google restore to the point just before the failure, or will some users and customers lose their money, settings, etc.?

What about updates? MySQL is pretty efficient for SELECT, but not for updates, especially updates of indexed tables. How many update transactions (new AdWords customers? users? settings changes? clicks?) per day are performed? Can MySQL handle them? If it can handle them now, will it handle them in one year? Are updates done in parallel?

Again, let me emphasize a point about performance. It is hard to believe that a good Oracle engineer cannot tune Oracle to be faster than MySQL if you have a database of more than 10 million tuples.

I am not against Open Source and I do not sell Oracle Database :). I like Open Source, but my roughly 12 years of experience prove that for large databases of mission-critical data, Open Source databases cannot compete with Oracle.

I do not say anything about MS SQL or IBM DB2, only because I do not know how to tune them or how efficient they are for large databases.

12:04 PM  
Andrei Lopatenko said...

But I would never use Oracle for a high number of concurrent sessions. By high number I mean as low as 1000.

What do you mean by "concurrent sessions"?
User connections? You definitely do not need 1000 connections to run applications like AdSense.
1000 parallel queries? My experience: Oracle is pretty efficient for parallel queries, especially if you try to apply a cluster architecture.

About PostgreSQL: it is really slow.

12:08 PM  
Anonymous said...

"To elaborate: some transaction benefits can be simulated "easily," such as logging each update to make sure Bad Things don't happen when you, say, charge a customer and the db goes down before the ad gets inserted. Although if single updates aren't even atomic -- and if your db doesn't support transactions, they probably aren't, which is the case for MySQL's in non-innodb tables -- you're still in trouble if you're running something like "update users set balance = balance - delta" when it goes down.

The harder ones, though, such as isolation, I've never seen done outside the database, and I wouldn't trust the code of anyone who claimed to do it. Much safer to use a database that's already debugged their code many times over."

....

It's all code that makes it all work in the database. If the database goes down at the wrong time, no matter how many hundreds of safeguards happen, you're going to have a loss of data. All this advanced junk to prevent it is pretty meaningless, because all it does is move the point where a failure will cause a loss to a different spot in the program.

9:32 PM  
Anonymous said...

Hmm,

After reading the mixed opinions and very stern beliefs, I found it rather difficult not to post a response. All in all, the article touts the success of MySQL over a commercial database. That in itself is a benchmark for success. I have been involved in a large number of commercial projects, including one of the world's largest databases, which happens to run on MySQL ("EdgeHealth").

I have worked side by side with developers and DBAs who have scratched their heads at how well MySQL performs. Oracle, DB2, and Postgres all seem to perform rather well until you compare them to MySQL. Although I do often enjoy the ability to extend Postgres with Perl modules, and to create objects for Oracle in Java, the speed and reliability of MySQL cannot be denied.

Consider the following: Cox, Google, IBM, Sun, Oracle, Microsoft, and a horde of other companies deploy MySQL as their database of choice for web and e-commerce applications. It speaks volumes when a vendor does not trust its own products for production environments.

MySQL, Postgres and Oracle will be around for years to come because they are good products. As for which database is the best, that's just a matter of opinion. For the moment MySQL is the king of speed, and the stepchild of features. My experience says that of the three, MySQL wins out in deployment for these reasons: 1. performance, 2. scalability, 3. cost.

Don't forget, products evolve or die, so if all three of these products are still around in the next couple of years, I would expect them to evolve in the direction of what they don't have now.

Oracle and Postgres simply lack throughput; MySQL lacks external scripting languages. All of these products offer transactions, clustering, and high availability.

If MySQL ever adopts external scripting languages, will it still keep its speed? Only time will tell.

2:04 PM  
Anonymous said...

Re: Database as OS

You are absolutely correct, the database server typically runs nothing else, and using the file system to store tables rather than a raw partition is inappropriate. The file system is tuned for small files; in the 1-8kB range. That's just what Unix normally holds.

OTOH, by keeping the databases as normal files, backup and manipulation don't require new tools; you just use mv and cp, like you would any other file (with the database turned off, natch).

5:58 PM  
Anonymous said...

Speaking of SQL, I heard that no database actually implements the full feature set that was defined back in the 1970s.

I personally tried to write a python layer that implemented views for MySQL. It was quite a bit of work and I never finished it.

Does anyone else have their sense of aesthetics offended when they see SQL embedded in another language? Reminds me of inline assembler or something. There has to be a better way. I tried to invent an OO layer over MySQL in the views project, but it was very ambitious.

7:04 PM  
Anonymous said...

"Real" simply doesn't mean expensive in the database case. PostgreSQL and Firebird are both just as free as MySQL, but are actually robust and reliable.

7:42 AM  
