December 24, 2009

Sphinx and Gearman: A Distributed Computing AH-HA! Moment

A week ago I decided to finally get serious about putting gearman to use for search indexing. I had been batting the idea around in my head for a long time (too long, really) but figured I should just write the code and see what happens. It took less than a day to get a prototype working in our development environment, and the end result made me very happy.

Today, in our production deployment, when a sphinx cluster pulls new content to index, the master does all the work. It fetches the new and changed postings, massages them into the XML format that sphinx expects (and makes a lot of small changes along the way), invokes the indexer, and makes the new indexes available for the slaves. The second step is usually the most CPU-intensive. Processing the raw data into XML involves a lot of other tweaks and changes that are very specific to Craigslist.

What I did was turn that into a gearman client/worker pair. The client (or master) simply submits processing tasks and then waits for each of them to complete. The workers fetch the data from the master, transform it, and make the transformed data available. When each task completes, the master grabs the transformed data and informs the worker that it can delete the file.
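
To make the shape of that concrete, here is a rough sketch of the submit/transform/collect/delete cycle. It is not the real code: a standard-library thread pool stands in for gearman, and the function names (transform_batch, collect, run_master), the posting fields (id, title), and the simplified XML are all made up for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from xml.sax.saxutils import escape


def transform_batch(postings):
    """Worker side (hypothetical): turn raw postings into XML in a temp file."""
    parts = []
    for posting in postings:
        # Simplified stand-in for the XML that sphinx would consume.
        parts.append(
            '<sphinx:document id="%d"><title>%s</title></sphinx:document>'
            % (posting["id"], escape(posting["title"]))
        )
    fd, path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write("\n".join(parts))
    return path  # the task result is just the location of the output


def collect(path):
    """Master side (hypothetical): grab the output, then let the file go."""
    with open(path, encoding="utf-8") as fh:
        data = fh.read()
    os.remove(path)  # stands in for telling the worker it can delete the file
    return data


def run_master(batches, workers=4):
    """Submit one task per batch, wait for each, and collect results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(transform_batch, batch) for batch in batches]
        return [collect(future.result()) for future in futures]
```

Calling run_master with a list of batches (each a list of dicts with id and title) returns one XML string per batch. In a real gearman setup the pool would be replaced by workers registered with a job server, possibly on other machines, but the master's loop keeps the same shape.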

So instead of being stuck at using only the 4 CPU cores on a single box, I can run 4 workers on each of 3 machines and get 12 CPU cores involved. The end result is that I have a solid foundation for a system that can easily scale to many machines. AH-HA! Linear scaling rocks! So does relatively seamless distributed computing.

As time allows I'll have to work on deploying this in production.

Posted by jzawodn at 10:02 AM

December 23, 2009

On the MyBlogLog Shutdown

Marshall Kirkpatrick is reporting that Yahoo! will shut down MyBlogLog next year. Well, color me unsurprised. The service has languished for years. I removed it from my site a long time ago. It made me a little sad to do so, but it was just slowing things down and not really "adding value" as they like to say.

It's sad because I was involved in the MyBlogLog acquisition at Yahoo! and believed in what they were doing. I worked to help get the team on board, nagged the right people to make sure they got reasonable hardware on which they could grow, interviewed their first post-Yahoo engineer, and made the trek up to the Berkeley office a few times a week to help them transition into Yahoo and work on plans.

I genuinely had high hopes for what MyBlogLog could do both inside and outside of Yahoo. But as I wrote in Watching Yahoo's Transformation:

MyBlogLog has all but died on the vine, right? Is there anyone left of the original team of 5 or 6 engineers still working on it? No, I think it fell victim to Yahoo's larger social strategy. FAIL.

On the one hand, it's sad that our collective time was wasted, but the members of the MyBlogLog team have all gone on to bigger and better things outside of Yahoo. And I suspect everyone involved learned some important lessons along the way.

Posted by jzawodn at 07:34 AM

December 14, 2009

Trust Oracle? Why?

In a 10-point press release issued today, Oracle has listed a series of "commitments" regarding their acquisition of MySQL by way of acquiring Sun.

I am not impressed.

As a former employee of a large Internet company (the largest at the time, in fact) that used both Oracle and MySQL, I'm utterly puzzled by this. I can't think of why we should trust Oracle to do right by the users of MySQL--especially the non-paying users.

You see, for years Oracle worked aggressively behind the scenes to discredit MySQL and tried hard to understand how their customers could ever consider using such a "toy" instead of their flagship product. In fact, it was so important to Oracle that they offered some very substantial discounts to customers who were using MySQL and Oracle. In some cases the discounts were so impressive that their motivation was clear: cut off the opportunity for MySQL to grow and spread in such organizations. (Remember what happened to Netscape when Microsoft gave away Internet Explorer for free?)

The funny thing is that it really didn't work. MySQL was already a fast moving train with lots of momentum. And it was still accelerating.

It was clear that Oracle saw MySQL as a threat to their business. When they eventually bought Innobase (the company that makes the InnoDB storage engine), many of us got more than a bit nervous. That put Oracle in a position of having a choke hold on the one component that was critical to MySQL's future success. They could have just shut down development entirely. But that may have made their motives a bit too clear.

Since then they've continued to develop InnoDB. However, the pace hasn't exactly been aggressive and their openness around that has left me (and others) wondering what their longer-term plans really are. The few tidbits we get seem to be overly vague. Could they have been throttling development of InnoDB? Or not providing the same resources that MySQL (and now Sun) would have? It's hard to say.

But here's the thing that continues to bug me...

A few years ago, when Oracle was dismissing MySQL in public while working hard against it in private, I realized that they were simply trying everything they could to protect their crown jewels: public denials and classic FUD paired with hush-hush backroom deals.

Nobody has managed to explain, in even a mildly convincing way, what has changed since then. Why should we suddenly trust Oracle? Their crown jewels are still threatened by MySQL.

Convince me.

See Also: Monty's appeal is selfless!

Posted by jzawodn at 06:27 PM

December 13, 2009

Weird Email People Send Me

I really have no idea what prompted this:

This is Rev.Willson and am interested in some of your Barrel. I need the Barrel in the specifications of 55 Gallon Barrel. Blue Plastic.So what will the total cost of the Barrel with the dimensions i gave and i need 100 quantities of the Barrel.So i will like you to go ahead and quote me the total charges of 100 quantities of the Barrel + tax without shipping charges included.Because i will be picking this up as soon as it is ready from your location.Please if you don't have the type of barrel that am interesting kindly email me back with the types that you have.And i will be very glad if you can also email me back with the types of credit card that you will accept for payment. Please advice. Thank You.

It's so random that it's funny.

Posted by jzawodn at 07:28 PM