TDD, Straw Men, and Rhetoric

Posted on 2014-04-30

In a blog post called "Slow database test fallacy", David Heinemeier Hansson, the creator of Rails, begins:

"The classical definition of a unit test in TDD lore is one that doesn't touch the database."

First, you can immediately tell that this piece of writing is going to be heavily rhetorical. He refers to "TDD lore" as opposed to, say, "TDD practice". By using the word "lore", he positions it as subjective, unreliable, mythological.

Second, that sentence is false. Isolation from the database, or anything else, is generally done with mocks, but mocks didn't even exist when TDD was rediscovered by Kent Beck in 1994-1995. They were introduced at XP2000 in a paper called "Endo-Testing: Unit Testing with Mock Objects", and it took a long time after that for them to gain popularity. Their role in software development was still being fleshed out in 2004 when "Mock Roles, Not Objects" was published.

By definition, classical TDD does not involve mocking or other forms of synthetic isolation. We even use the term "classical TDD" to mean "TDD without isolation".

David used the word "classical" not because it's correct, but because it implies "old". This is the beginning of a series of rhetorical techniques that he uses to incorrectly associate isolated unit testing with "oldness". He continues:

"Connecting to external services like that would be too slow to get the feedback cycle you need. That was probably true in 1997 when you were connecting to a mainframe database or some Oracle abomination."

In 1997, TDD was only known to a small number of people. Mocks did not exist. Certainly no one was isolating unit tests. Why would David invoke that year? It's a disingenuous rhetorical technique: by implying that a modern movement actually occurred far in the past, he positions it as outdated, thereby discrediting it.

The reality is that interest in sub-millisecond tests, and spending a lot of effort to get them, is fairly new. In 2011, when I published the Fast Tests With and Without Rails screencast, almost no one was talking about doing it in Rails. At that point, if my memory serves me, only Corey Haines and I were talking about it loudly in public. The history that David implies is completely false.

I'll stop quoting in a moment, but we need one more. He now repeats a claim from another post: that TDD leads to worse design. He extends that idea, tying it to test speed:

"Inflicting such damage may well had been worth it back in the old days when a full suite of tests hitting the database would have taken hours to run. It so much certainly is not when we're talking about a few minutes."

Again, you see reinforcement of the falsehood with "the old days": that was "then", but today is "now"! However, with this passage, you finally get to see what's really going on. David's tests run in a few minutes, and he's fine with that.

I'm not fine with that. A lot of other people are not fine with that.

This is the fundamental issue. It's possible that I'm the most impatient programmer on earth; I want my feedback to be so fast that I can't think before it shows up. If I can think, then I'll sometimes lose attention, and I don't want to lose attention.

I aim for test feedback in 300 ms, from the time I press enter to the time I have a result. That's not an abstract desire; I frequently achieve it. Here are the tests for Destroy All Software's catalog class:

$ time rspec spec/lib/catalog_spec.rb

Finished in 0.00723 seconds
8 examples, 0 failures, 2 pending
0.24s elapsed

That's 105 lines of test covering 59 lines of production code running in 240 milliseconds end to end, with only 7 of those milliseconds being actual tests. The test:code ratio here is slightly higher than DAS' overall ratio, which is about 1.4:1, including both the unit and acceptance test suites. Most of those tests are fully isolated, meaning that they only interact with a single class. Some tests use mocks to achieve that; many don't need to use them.
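The arithmetic behind those figures can be checked with a few lines of Ruby. This is only a sketch: the constants are copied from the output and paragraph above, and the helper names are made up for illustration.

```ruby
# Figures copied from the rspec run and the paragraph above.
TEST_SECONDS  = 0.00723  # "Finished in" time reported by RSpec
TOTAL_SECONDS = 0.24     # end-to-end time reported by `time`
TEST_LINES    = 105
CODE_LINES    = 59

# Time spent outside the examples themselves (interpreter and RSpec startup).
def overhead_seconds(total, tests)
  total - tests
end

# Percentage of the end-to-end time spent running examples.
def test_share_percent(total, tests)
  (tests / total * 100).round(1)
end

# Lines of test per line of production code.
def test_to_code_ratio(test_lines, code_lines)
  (test_lines.to_f / code_lines).round(2)
end

overhead_ms = (overhead_seconds(TOTAL_SECONDS, TEST_SECONDS) * 1000).round
puts "startup overhead: #{overhead_ms} ms"
puts "time spent in tests: #{test_share_percent(TOTAL_SECONDS, TEST_SECONDS)}%"
puts "test:code ratio: #{test_to_code_ratio(TEST_LINES, CODE_LINES)}:1"
```

Roughly 233 of the 240 milliseconds are startup, so the examples themselves account for about 3% of the wait, and the ratio for this file works out to about 1.78:1.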

These tests are fast enough that I can hit enter (my test-running keystroke) and have a response before I have time to think. It means that the flow of my thoughts never breaks. If you've watched Destroy All Software screencasts, you know that I'll sometimes run tests ten times per minute. All of my screencasts are recorded live after doing many takes to smooth the presentation out, so you're seeing my actual speed, not the result of editing.

Walking through part of my TDD process will make this more concrete. I first write a new test and run it by hitting enter. This runs only the current test file. It also sets that file as "active", meaning that hitting enter from anywhere in the production code will re-run it.
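The "active" file behavior can be sketched in a few lines of Ruby. This is a hypothetical illustration of the idea, not the actual editor configuration: the TestRunner class, its method names, the "_spec.rb" suffix check, and the lib/catalog.rb path are all assumptions.

```ruby
# Hypothetical sketch of an "active test file" runner.
class TestRunner
  attr_reader :active_file

  def initialize(test_suffix: "_spec.rb")
    @test_suffix = test_suffix
    @active_file = nil
  end

  def test_file?(path)
    path.end_with?(@test_suffix)
  end

  # Returns the test file to run when enter is pressed while editing
  # `current_path`. A test file becomes the active file; any other file
  # re-runs whichever test file was active last (nil if there is none yet).
  def file_to_run(current_path)
    @active_file = current_path if test_file?(current_path)
    @active_file
  end

  # Builds the command as an argument list; the caller decides how to
  # execute it (for example, by sending it to a tmux pane).
  def command_for(current_path)
    file = file_to_run(current_path)
    file && ["rspec", file]
  end
end

runner = TestRunner.new
p runner.command_for("spec/lib/catalog_spec.rb")  # pressed enter in the spec
p runner.command_for("lib/catalog.rb")            # pressed enter in production code
```

Both calls produce a command for the same spec file, which is what lets a single keystroke work from either side of the test/production boundary.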

The tests run in a tmux pane to the right of my editor, so they stick around even while I'm editing code. I expect them to fail, since I just wrote a new hopefully-failing test. A quarter of a second after I hit enter, the tests have finished running. I flick my eyes to the output for less than a second; I know where the exception name will appear in the traceback. Usually, I'll see the exception name that I expect. Because the tests run so fast, I literally do not have a chance to do anything before they finish, so I can't get distracted. By the time my eyes have moved to the test pane, the output is there.

While my eyes were moving to the test pane to confirm that the correct exception was raised, my fingers were switching to the production code file. I make whatever change will cause the tests to pass. It's usually a small change, which is the nature of TDD when done well. I then kick off another test by hitting enter. This runs the same test file as before because it was implicitly set as the active test. I expect it to pass, and it does, but I only know this because I see green in my peripheral vision; there's no reason to actually focus my eyes on the test output.

I've seen my tests fail, made a production code change, and seen the tests pass. A few seconds have elapsed since I ran the first failing test. You've spent an order of magnitude more time reading these few paragraphs about it. Unlike David, I'm not exaggerating here for rhetorical purposes: I'm literally talking about a single-digit number of seconds between running the failing test and seeing it pass.

My TDD loop isn't always that tight, of course, but most of the second-to-second steps in building a software system are mundane and don't require careful thought. If there's a design question, I'll stop and think, possibly aided by pen and paper. If the tests fail in an unexpected way, I'll stop and analyze the failure. For many tests, I don't have to do either of these.

Staying under the 300 ms mark necessarily requires isolation. I don't have time to load Rails, or to load fixtures, or to connect to databases. The test runtime is dominated by RSpec startup, not the application. In fact, many of these tests load only a single source file.

That file is going to be a plain old Ruby class: not a Rails model, controller, etc. Isolating those would lead to pain, as anyone who's tried to do it knows. I keep models very simple and allow them to integrate with the database; I keep controllers very simple and generally don't unit test them at all, although I do integration test them.
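As a rough illustration of what an isolated test of a plain old Ruby class can look like, here is a minimal sketch. ScreencastList and FakeStore are hypothetical names invented for this example, not classes from Destroy All Software, and the check uses a bare raise instead of RSpec so that it runs with nothing but the standard library.

```ruby
require "date"

# Hypothetical plain Ruby class: no Rails, no database, nothing to load
# beyond this file and the standard library.
class ScreencastList
  def initialize(store)
    @store = store  # any object that responds to #all
  end

  def titles_published_before(date)
    @store.all
          .select { |record| record[:published_on] < date }
          .map { |record| record[:title] }
  end
end

# Hand-written stand-in for a database-backed collaborator, so the test
# touches only ScreencastList.
class FakeStore
  def initialize(records)
    @records = records
  end

  def all
    @records
  end
end

store = FakeStore.new([
  { title: "First",  published_on: Date.new(2011, 1, 1) },
  { title: "Second", published_on: Date.new(2013, 6, 1) },
])
result = ScreencastList.new(store).titles_published_before(Date.new(2012, 1, 1))
raise "expected only the first title, got #{result.inspect}" unless result == ["First"]
puts "ok"
```

Because the collaborator is passed in, the test never needs a connection, fixtures, or a framework boot, and that is where the time savings come from.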

When I'm at full speed, as I was in that story, I'm typing at 120 words per minute. At the conventional five keystrokes per word, that's 600 keystrokes per minute, or 10 per second, with most of those keystrokes being Vim commands. I don't leave home row; I don't touch a mouse; I don't stop and read; I don't wait for tests to run. This process allows me to do the trivial parts of programming as quickly as physically possible. Some people don't believe this, but there are 90 screencasts in the Destroy All Software catalog that document it.

David misses all of this. He's never done it. It takes a lot of practice with TDD and a seriously fast editor that you can use with your eyes closed, but he has neither. (Again, no rhetorical exaggeration: when I'm at full speed, I sometimes close my eyes because removing visual stimulus aids my concentration.)

Unsurprisingly, there are other parts of David's post that I disagree with. Most notably, application preloaders like Spring are band-aids that introduce tremendous complexity. They cause confusing failures that are fundamental to their nature. I also disagree with his straw man attacks on the design feedback of TDD, and especially of isolated TDD. I've focused on test speed here because it provides a nice microcosm of our disagreement and of his tactics. He hasn't actually done the thing that he's decrying, and he's arguing against it by making things up.

He's been at this for a couple of weeks: if you watch his RailsConf keynote, you'll hear him say that he has very little theoretical computer science knowledge. Then he pooh-poohs the value of that same theoretical knowledge: knowledge which he just said that he doesn't have! It's normal human behavior to form opinions before understanding something. I certainly wouldn't want to deny anyone their human fallibility. However, to stand in front of thousands of eager followers who will believe anything you say, and to proclaim radical, outlandish positions, using made-up history as justification, on topics that you admit to not understanding... that just seems bad.

Finally, I should say that despite being one of the early proponents of sub-millisecond tests in Rails, I don't have any kind of capital-B Belief in isolation, in mocking, or in TDD. I use TDD perhaps 75% of the time on web apps, and probably less than 50% of the time on console tools like Selecta. I suspect that test doubles, including mocks, are a stop-gap solution: an intermediate evolutionary form. I've talked about this in a conference talk and in a screencast. Both of them contain a lot of code. The screencast is a tour through an actual software system designed using the scheme in question. Selecta is also designed using the same principles (although its design is imperfect in many ways, as designs tend to be).

The straw man that David has been propagating may apply to people who picked up TDD a year ago, and by whom it's viewed as a silver bullet that should be applied in an extreme way. I was one of those people around 2006, so I know that they exist. There may be other, more experienced people who talk about it in an extreme way for rhetorical purposes. (I don't know; I don't read intro-to-TDD material, having done it for almost ten years now.)

If (if!) those things are true, they say nothing about TDD as a technical practice. Despite writing prose so dripping with rhetoric, David doesn't seem to separate the rhetorical presentation of a topic from the facts of its use.

TDD is useful and test isolation is useful, but they both involve making trade-offs. Unfortunately, doing them 100% of the time seems to be the best way to learn what those trade-offs are, and that can temporarily lead beginners toward extremism. TDD and isolation both break down in some situations, and learning to detect those situations in advance takes a lot of time. This is true of advanced techniques in any discipline, programming or otherwise. That is the honest, non-exaggerated, no-lies-involved truth.