Try our RSS feed

Unix Review > Archives > 2006 > December 2006
Print-Friendly Version

UnixReview.com
December 2006

Book Review: SQL Hacks

Reviewed by Cameron Laird

SQL Hacks
Andrew Cumming and Gordon Russell
O'Reilly, November 2006 
ISBN 10: 0-596-52799-3
410 pages

So there you are, alone on a desert island, and all you have is a generic SQL engine. Is there a declarative way to define the greater of two expressions?

Yes, in fact. Notice we're not talking about the MAX() function; that acts through the values of a column (in relational database management system vernacular). We want to compute, for example, the greater of the two fields SECURITY_SCORE and PERFORMANCE_SCORE, for each row of some table.

You can do that. If you're a grizzled SQL coder, or you look at #30 of SQL Hacks (SH), you know to compute:

	 (SECURITY_SCORE + PERFORMANCE_SCORE +
	     ABS(SECURITY_SCORE - PERFORMANCE_SCORE) ) / 2
The nature of hacking

The joke doesn't stop there, either. Authors Andrew Cumming and Gordon Russell show an entirely different approach, based on an auxiliary table, that generalizes nicely to maxima over many columns.

There's more to this cleverness, though, than the intellectual equivalent of fixing a model train by cannibalizing common kitchen utensils, however much such hacks entertain me. Suppose the overnight financial report that your company currently takes seven hours to generate could be finished in just under six minutes; that is the kind of change that gets the attention of people with checkbooks, wins Certificates of Appreciation, is described in magazine articles with $ figures, and generally elevates the perpetrator from the coding bullpen to ... well, to a happier place. Databases are big, serious matters, with crews of handlers, elaborate maintenance procedures, executive visibility, and significant attention available to those who can squeeze a little more performance and efficiency from them. Good SQL hackers don't just get a little more performance; they get a LOT more.

Here's the punch line: many of the important improvements that boost database operation by orders of magnitude depend crucially on superficially cute little tricks like the one that headed this review. That means that SH is simultaneously a programming romp and deadly serious to the people who think about a million customers or sales or measurements at a time.

Cumming and Russell meet the challenge of balancing these two aspects — the exuberant and pragmatic — well.

RDBMS programming architecture

Why is it so characteristic of database programming that different programmers achieve the same functionality with run-time speeds that differ by such large factors — a hundred times or more? As I've discussed several times before for UnixReview, most recently in reviewing Refactoring Databases, data management applications typically involve several programming models simultaneously. Common Web sites are built with database-backed PHP, say, and perform the few computations they need — simple sums and ratios, mostly — in PHP. When the same architecture is applied to problems like reporting on aggregations of all worldwide daily sales of a restaurant chain, though, the results are hilariously suboptimal. As the authors advise in their Preface, "With a bit more understanding of SQL you can do more processing at the database, and as a result have less traffic between the database and your application. ... It's all about letting your program do the things it's good at, and letting the database do the things it's good at."

That's a worthy goal, and SH meets it. It's a three-hundred page book, organized as one hundred annotated "hacks". They're all eminently practical, and I think even the most jaded SQLer will find at least a few novel ones. The authors are the good kind of full-time academics: rigorous but engaging, and sensitive to the importance of clarity in explaining such basics as command-line operations, performance concepts, security practices, and accounting calculations. It's evident they have plenty of real-life computing experience themselves. Examples are all tersely simple but appropriate.

If you were still on that desert island, you could even use SH as a tutorial, especially if, like most database programmers, you do more reading than writing. SH is not a reference, but it's methodical enough that an alert and experienced programmer with no prior knowledge of SQL could probably learn all of the language he needs for most purposes. They even tackle the preliminaries of auditing, deployment, and administration, topics some programming introductions leave out.

The book targets Jet, PostGreSQL, MySQL, Oracle, and SQL Server, but not DB2, SQLite, Sybase, or Firebird (among others). Users of the latter engines shouldn't feel discouraged; less than 15% of the content, by the most generous interpretation, is specific to a particular database server. A few of the tips, though, have bonuses for programmers working with the market-leading engines.

I've found a few typographical errors, mostly with a single character out of place or mistaken. SH reads easily; while the authors claim they tackle only simple or occasionally intermediate-level topics, I think they simply write with sufficient care and precision to make any SQL subject understandable. Each hack is tagged with a difficulty icon; I find a couple of their judgments bizarre (are ordinal reports based on correlated subqueries truly harder than calendar arithmetic or protection against runaway queries or unexpected redos?). The main point, though, is that SH reaches plenty of noteworthy peaks: optimistic locking, "Choose Any Three of Five", disk-space monitoring, a humane treatment of navigation features appropriate to Web applications, cooperation between SQL and Excel, and more. Most unexpected to me was #42, which delightfully demonstrates SVG generated from SQL.

Unless you're a very unusual programmer, a significant portion of your daily work involves conversations with database management systems. Unless you're even more unusual, you'll find at least a few SQL tips in SH you hadn't generated on your own.

Cameron is vice president of the Phaseit, Inc., consultancy, specializing in high-reliability and high-performance applications managed by high-level languages. His experience with commercial data management systems extends embarrassingly far into the past.

Sys Admin Spotlight

New Products
April 2007 New Products

CMP DevNet Spotlight

Regular Expressions: Two Easy Steps Better Than One Hard One
Complicated regular expressions and yacc are powerful parsing tools, but they can cause trouble in inexperienced hands. One helpful alternative is "partial evaluation" (PE) or "active data" parsing

In the News

Nokia Signs On To Help With Sprint Nextel's WiMax Deployment
Working with Sprint, Nokia should have WiMax up and running in four Texas cities during the first half of 2008.


Price Drops Boost LCD-TV Sales
Analyst firm iSuppli suggests that LCD-TVs sales will exceed 100 million units in 2008, and reach 171.6 million unites by 2011.


Sony PlayStation Web Traffic Down; Nintendo Wii Soars
Nielsen/NetRatings suggests price and a focus on average customers is helping Nintendo's numbers.


SanDisk Launches PC Solid-State Drive
The hard drive maker's expansion of its product line, as well as Intel's first SSD launch, is an indication that the market is sure to heat up.


Hitachi-LG First With Blu-Ray/HD DVD Combo PC Drive
TThe drive can record up to 50 Gbytes of data and will cost about $1,200.


Wal-Mart Asks Electronics Suppliers To Go Green
The retailer's Packaging Sustainability Network has set a target of 5% reduction in packaging by 2013.


Intel Gets Approval To Build $2.5 Billion China Fabrication Plant
Intel gets OK for $2.5 billion China fab


Newsletter

Subscribe to the UnixReview Newsletter

Subscribe to Sys Admin

Subscribe to THE journal for UNIX systems administrators. Receive 45% off your subscription by following the link below:

CD-ROM

Sys Admin and The Perl Journal CD-ROM version 11.0

Version 11.0 delivers every issue of Sys Admin from 1992 through 2005 and every issue of The Perl Journal from 1996-2002 in one convenient CD-ROM!

Order now!




MarketPlace

UNIX and Linux Performance Tuning Simplified
SarCheck is a UNIX performance analysis and tuning tool for most UNIX and Linux systems. It produces recommendations and explanations, complete with supporting graphs and tables. Get the most from your hardware by keeping your systems tuned.

Used IBM pSeries Servers
Did you know that clients worldwide are making their budgets work by purchasing refurbished and used IBM pSeries servers and parts at up to 80% off list pricing? Many of the latest models and features are available now used at Vibrant Technologies!

Timesheet + time tracking for payroll and projects
Clockware is the first timesheet and time tracking software that is 100% J2EE-compliant. Clockware's Payroll Timesheet integrates with all major Payroll systems. Clockware also supports Time and Attendance, and Project Timesheets in one system.

We Buy & Sell Used Cisco
Hula Networks is overstocked on many items including, used Cisco, Juniper, Foundry and Extreme networking equipment and can therefore offer outstanding pricing. We buy Cisco and sell Cisco networking equipment

Wanna see your ad here?