The Dangers of Benchmarks

Jack and Jill run a race. They’re both the same age, wearing the same track shoes and similar running attire. The weather is favorable: a pleasant, windless summer afternoon, and they’re running on that fancy red rubberized track. Jill soundly beats Jack by a margin of about half a second. Therefore, girls are faster than boys.

Obviously there’s a problem here. The race was as fair and controlled as it could be, and the results are indisputable. The conclusion drawn from those results, however, is clearly bogus. That seems painfully apparent in this case, yet strangely, when you change “Jack” and “Jill” to “C++” and “C#” (or any two arbitrary languages), people happily accept the conclusions drawn, even when they are just as bogus as the claim about the superiority of the female form on the racetrack.

Let’s make one thing clear here: performance benchmarking actual languages is not possible. Languages themselves are little more than a grammar and a set of semantic rules, nicely wrapped up in the thick document that comprises the language’s standard. They don’t have an intrinsic speed, except in the form of any complexity and efficiency guarantees made for algorithms in the standard library. When people (including myself) discuss the relative performance of languages, the topic at hand is really particular implementations of those languages being applied to particular solutions to particular problems within a particular context. In most cases this distinction is implicitly understood by all parties involved, but it’s worth mentioning here because, well, you never know.
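To make that distinction concrete, here’s a minimal sketch of what a “language benchmark” actually measures. The workload (sorting a million pseudo-random integers), the timer, and the fixed seed are all illustrative assumptions of mine; the point is that the number this prints characterizes one compiler, one standard library implementation, one set of optimization flags, and one machine, not “C++” itself.

```cpp
// Minimal sketch: timing one workload under one implementation.
// The result reflects a specific compiler, standard library, set of
// optimization flags, and machine, not "C++" the language.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    std::mt19937 rng(42);                 // fixed seed, so the input is repeatable
    std::vector<int> data(1'000'000);
    for (auto& x : data) x = static_cast<int>(rng());

    auto start = std::chrono::steady_clock::now();
    std::sort(data.begin(), data.end());  // the "benchmark"
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::milli> elapsed = stop - start;
    std::printf("sorted 1M ints in %.2f ms\n", elapsed.count());
}
```

Rebuild the same file with a different compiler, a different standard library, or different flags, and the number changes, even though the language hasn’t.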

Keep in mind as well that performance is very fickle, influenced by a vast array of subtle factors. It is therefore very difficult, and in some cases downright impossible, to obtain results that are consistent and repeatable.
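One partial defense, sketched below under assumptions of my own (the placeholder work() function and the sample counts are made up for illustration), is to never trust a single timing: warm up first so caches and branch predictors settle, take many samples, and look at the distribution rather than any one number.

```cpp
// Sketch of a repeated-measurement loop. A single timing is at the
// mercy of caches, frequency scaling, the scheduler, and more, so we
// warm up, take many samples, and report the spread. work() is a
// stand-in workload, not anything from the discussion above.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <vector>

static volatile unsigned sink;            // defeat dead-code elimination

static void work() {                      // placeholder workload
    unsigned acc = 0;
    for (unsigned i = 0; i < 1'000'000; ++i) acc += i * i;
    sink = acc;
}

int main() {
    using clock = std::chrono::steady_clock;

    for (int i = 0; i < 10; ++i) work();  // warm-up runs, discarded

    std::vector<double> samples;
    for (int i = 0; i < 100; ++i) {
        auto t0 = clock::now();
        work();
        auto t1 = clock::now();
        samples.push_back(
            std::chrono::duration<double, std::micro>(t1 - t0).count());
    }

    std::sort(samples.begin(), samples.end());
    std::printf("min %.1f us, median %.1f us, max %.1f us\n",
                samples.front(), samples[samples.size() / 2], samples.back());
}
```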