Book Review: SQL Hacks
Andrew Cumming and Gordon Russell
O'Reilly, November 2006
ISBN 10: 0-596-52799-3
So there you are, alone on a desert island, and all you
have is a generic SQL engine. Is there a declarative way
to define the greater of two expressions?
Yes, in fact. Notice we're not talking about the MAX()
function; that acts through the values of a column (in
relational database management system vernacular). We
want to compute, for example, the greater of the two
fields SECURITY_SCORE and PERFORMANCE_SCORE, for each
row of some table.
You can do that. If you're a grizzled SQL coder, or
you look at #30 of SQL Hacks (SH), you know to compute:
(SECURITY_SCORE + PERFORMANCE_SCORE +
ABS(SECURITY_SCORE - PERFORMANCE_SCORE) ) / 2
The nature of hacking
The joke doesn't stop there, either. Authors Andrew Cumming and Gordon Russell show an entirely different approach, based on an auxiliary
table, that generalizes nicely to maxima over many columns.
There's more to this cleverness, though, than the intellectual
equivalent of fixing a model train by cannibalizing common
kitchen utensils, however much such hacks entertain me. Suppose
the overnight financial report that your company currently takes
seven hours to generate could be finished in just under six
minutes; that is the kind of change that gets the
attention of people with checkbooks, wins Certificates of
Appreciation, is described in magazine articles with $ figures,
and generally elevates the perpetrator from
the coding bullpen to ... well, to a happier place. Databases
are big, serious matters, with crews of handlers, elaborate
maintenance procedures, executive visibility, and significant
attention available to those who can squeeze a little more
performance and efficiency from them. Good SQL hackers don't
just get a little more performance; they get a LOT more.
Here's the punch line: many of the important improvements
that boost database operation by orders of magnitude depend
crucially on superficially cute little tricks like the one
that headed this review. That means that SH is
simultaneously a programming romp and deadly serious to the
people who think about a million customers or sales or
measurements at a time.
Cumming and Russell meet the challenge of balancing these
two aspects — the exuberant and pragmatic — well.
RDBMS programming architecture
Why is it so characteristic of database programming that
different programmers achieve the same functionality with
run-time speeds that differ by such large factors — a hundred
times or more? As I've discussed several times before for
UnixReview, most recently in reviewing Refactoring Databases,
data management applications typically involve several
programming models simultaneously. Common Web sites are
built with database-backed PHP, say, and perform the few
computations they need — simple sums and ratios, mostly — in
PHP. When the same architecture is applied to problems
like reporting on aggregations of all worldwide daily sales
of a restaurant chain, though, the results are hilariously
suboptimal. As the authors advise in their Preface, "With a
bit more understanding of SQL you can do more processing at
the database, and as a result have less traffic between the
database and your application. ... It's all about letting
your program do the things it's good at, and letting the
database do the things it's good at."
That's a worthy goal, and SH meets it. It's a
three-hundred page book, organized as one hundred annotated
"hacks". They're all eminently practical, and I think even
the most jaded SQLer will find at least a few novel ones. The authors are the good kind of full-time academics:
rigorous but engaging, and sensitive to the importance of
clarity in explaining such basics as command-line operations,
performance concepts, security practices, and accounting
calculations. It's evident
they have plenty of real-life computing experience themselves.
Examples are all tersely simple but appropriate.
If you were still on that desert island, you could even use SH as a tutorial, especially if, like most database
programmers, you do more reading than writing. SH is not a reference, but it's methodical enough that an
alert and experienced programmer with no prior knowledge of SQL
could probably learn all of the language he needs for most
purposes. They even tackle the preliminaries of auditing,
deployment, and administration, topics some programming introductions
The book targets Jet, PostGreSQL, MySQL, Oracle, and SQL
Server, but not DB2, SQLite, Sybase, or Firebird (among others). Users of the latter
engines shouldn't feel discouraged; less than 15% of the content,
by the most generous interpretation, is specific to a particular
database server. A few of the tips, though, have bonuses for
programmers working with the market-leading engines.
I've found a few typographical errors, mostly with a single character
out of place or mistaken. SH reads easily; while the
authors claim they tackle only simple or occasionally
intermediate-level topics, I think they simply write with
sufficient care and precision to make any SQL subject understandable. Each hack is tagged with a difficulty icon; I find
a couple of their judgments bizarre (are ordinal reports
based on correlated subqueries truly harder than calendar
arithmetic or protection
against runaway queries or unexpected redos?). The main point,
though, is that SH reaches plenty of noteworthy
peaks: optimistic locking, "Choose Any Three of Five", disk-space
monitoring, a humane treatment of navigation features
appropriate to Web applications, cooperation between SQL and
Excel, and more. Most unexpected to me was #42, which
delightfully demonstrates SVG generated from SQL.
Unless you're a very unusual programmer, a significant portion
of your daily work involves conversations with database
management systems. Unless you're even more unusual, you'll
find at least a few SQL tips in SH you hadn't generated on your own.
Cameron is vice president of the
Phaseit, Inc., consultancy, specializing in high-reliability and high-performance
applications managed by high-level languages.
His experience with commercial data management systems
extends embarrassingly far into the past.