Duct tape context: A tale of two rewrites

Last Thursday Joel Spolsky posted an article on his blog, The Duct Tape Programmer, based on my interview with Jamie Zawinski in Coders at Work. It—as Joel’s posts often do—sparked quite a bit of commentary on the programming web, eliciting responses from Uncle Bob Martin and Tim Bray and hundreds of comments on sites like the programming reddit and hackernews.

It was probably fortunate for me that Spolsky used the title he did—praising someone while calling them a “duct-tape programmer” is provocative and the provocation probably helped drive interest which, no doubt, led to a few people buying the book. So, that’s awesome. Thanks, Joel!

On the other hand, being provocative sometimes leaves little room for nuance. Since I dragged Zawinski into this by asking him to be in the book, I thought maybe I could try to unpack some of the context for folks who haven’t yet read the interview.

The first thing to remember is that the first version of Netscape was written in 1994. When Zawinski talks about the choice to not use C++ and the problems with templates, he’s not talking about C++ and templates in 2009; he’s talking about 15 years ago, four years before the language was officially standardized and a time when different compilers didn’t necessarily implement all the nooks and crannies of the language the same way. And Netscape had to run on Windows, Unix, and Mac, so reliable portability was especially important.

The second thing to keep in mind is that the Netscape team in 1994 was under tremendous, if self-imposed, time pressure. I asked Zawinski how it was that the original Netscape team, many of whom had worked on Mosaic at the NSCA and were thus getting a second crack at the same problem, avoided falling into second-system syndrome:

Zawinski: We were so focused on deadline it was like religion. We were shipping a finished product in six months or we were going to die trying.

Seibel: How did you come up with that deadline?

Zawinski: Well, we looked around at the rest of the world and decided, if we’re not done in six months, someone’s going to beat us to it so we’re going to be done in six months.

Could they have made a different trade-off in the old “fast, good, cheap—pick two” space? Possibly. But history certainly testifies that they produced a product quickly enough that no one beat them to the punch and good enough that they changed the history of the web.

But wait! Didn’t Netscape eventually collapse under the burden of accumulated cruft? Didn’t the Mozilla project have to do a massive, ground-up rewrite because the old code base was so covered in duct tape that it was impossible to work with? Doesn’t that completely disprove Spolsky’s point?

Not clear. I haven’t done an extensive investigation of the history of Netscape’s code but I do have Zawinski’s version of events and some corroboration from Brendan Eich, who was also there at the time.

Certainly they paid a price for their rapid development:

Seibel: After this relentless pace, at some point that has to start to catch up with you in terms of the quality of the code. How did you guys deal with that?

Zawinski: Well, the way we dealt with that was badly. There’s never a time to start over and rewrite it. And it’s never a good idea to start over and rewrite it.

Yet we all know that Netscape famously was rewritten after it was spun out into the Mozilla project. So case closed, right? Not quite. What many people don’t know is that—at least according to Zawinski—the Mozilla rewrite was the second rewrite. Here’s Zawinski’s version, from the part of his Coders interview where we were talking about his work with Terry Weissman on the mail reader that was added to Netscape in version 2.0:

Zawinski: So basically [Netscape] acquired this company, Collabra, and hired this whole management structure above me and Terry. Collabra has a product that they had shipped that was similar to what we had done in a lot of ways except it was Windows-only and it had utterly failed in the marketplace.

Then they won the start-up lottery and they got acquired by Netscape. And, basically, Netscape turned over the reins of the company to this company. So rather than just taking over the mail reader they ended up taking over the entire client division. Terry and I had been working on Netscape 2.1 when the Collabra acquisition happened and then the rewrite started. Then clearly their Netscape 3.0 was going to be extremely late and our 2.1 turned into 3.0 because it was time to ship something and we needed it to be a major version.

So the 3.0 that they had begun working on became 4.0 which, as you know, is one of the biggest software disasters there has ever been. It basically killed the company. It took a long time to die, but that was it: the rewrite helmed by this company we’d acquired, who’d never accomplished much of anything, who disregarded all of our work and all of our success, went straight into second-system syndrome and brought us down.

They thought just by virtue of being here, they were bound for glory doing it their way. But when they were doing it their way, at their company, they failed. So when the people who had been successful said to them, “Look, really, don’t use C++; don’t use threads,” they said, “What are you talking about? You don’t know anything.”

Well, it was decisions like not using C++ and not using threads that made us ship the product on time. The other big thing was we always shipped all platforms simultaneously; that was another thing they thought was just stupid. “Oh, 90 percent of people are using Windows, so we’ll focus on the Windows side of things and then we’ll port it later.” Which is what many other failed companies have done. If you’re trying to ship a cross-platform product, history really shows that’s how you don’t do it. If you want it to really be cross-platform, you have to do them simultaneously. The porting thing results in a crappy product on the second platform.

Seibel: Was the 4.0 rewrite from scratch?

Zawinski: They didn’t start from scratch with a blank disk but they eventually replaced every line of code. And they used C++ from the beginning. Which I fought against so hard and, dammit, I was right. It bloated everything; it introduced all these compatibility problems because when you’re programming C++ no one can ever agree on which ten percent of the language is safe to use. There’s going to be one guy who decides, “I have to use templates.” And then you discover that there are no two compilers that implement templates the same way.

And when your background, your entire background, is writing code where multiplatform means both Windows 3.1 and Windows 95, you have no concept how big a deal that is. So it made the Unix side of things—which thankfully was no longer my problem—a disaster. It made the Mac side of things a disaster. It meant it was no longer possible to ship on low-end Windows boxes like Win16. We had to start cutting platforms out. Maybe it was time to do that, but it was a bad reason. It was unnecessary.

Brendan Eich, who was also at Netscape at this time, confirms at least part of Zawinski’s account:

There was an imperative from Netscape to make the acquisition that waved the Design Patterns book around [i.e. Collabra] feel like they were winners by using their new rendering engine, which was like My First Object-Oriented Rendering Engine. From a high level it sounded good; it used C++ and design patterns. But it had a lot of problems.

He does, however, go on to say:

But the second reason we did the big rewrite—I was in mozilla.org and I really was kind of pissed at Netscape, like Jamie, who was getting ready to quit. I thought, you know, we need to open up homesteading space to new contributors. We can’t do it with this old hairball of student code from 1994. Or my fine Unix kernel-style interpreter code.

So Zawinski and Eich differ a bit on the extent to which the Collabra folks had completely replaced the old code in 4.0. If Zawinski’s recollection is accurate and the 4.0 rewrite had really replaced all the old code, then the fact the Mozilla project felt compelled to do a big rewrite is only evidence of what a mess the first rewrite had made of things and says nothing about the quality of the pre-4.0 code. Or maybe they hadn’t quite replaced everything but had still messed things up enough that it was easier to start over. Or maybe it really was the legacy of the 1994 code—too much duct tape—that was the real problem. Ironically, if Zawinski had had his way it’d be much easier to know what really happened—he tells me in a recent email that he tried, in 1998, to get Netscape to open source both the 4.0 and 3.0 code bases but they wouldn’t go for it.

5 Responses to “Duct tape context: A tale of two rewrites”

  1. Roy Wilson Says:

    I am very much enjoying your Coders book (and still reading). Maybe I’m jumping the gun based only on a look at the index, but what I’ve read has just spawned the following.

    I wonder if you *decided* not to interview developers who work (more or less exclusively) in Eclipse/Netbeans/etc or it just turned out that way. I ask because the build system capability that Fitzpatrick (p. 70) desires is almost there using Maven and the Eclipse export capability.

  2. Peter Seibel Says:

    I didn’t choose people based on the tools they used or didn’t. However I did, as you can see, mostly choose folks who have been working for some time. I guess all I can say is that those folks, for whatever reason, haven’t moved onto tools like Eclipse. Perhaps in another couple decades a similar book would be full of folks who grew up using IDEs. (And perhaps at that point, those tools will seem primitive compared to the shiny new thing.)

  3. The Geek Peek (An insight into Blogosphere) « DreamXtream’s Blog Says:

    [...] in Computer Science and Programming, the world has seen. Some parts of the interview (quoted here, http://gigamonkeys.wordpress.com/2009/09/28/a-tale-of-two-rewrites/ ) with Jamie Zawinski gave a stir to the blogging [...]

  4. Matt Rose Says:

    jwz’s diary entries, along with anecdotes from some other people who were at Netscape at the time confirms the absolute insanity of their deadlines. From what I heard. 18 – 20 hour days, 7 days a week was the norm. Pushes were into the multi-day stretches at work.

    http://www.jwz.org/gruntle/nscpdorm.html

  5. Matt Rose Says:

    Laura Lemay’s take:
    http://www.lauralemay.com/essays/startuprant.html

Leave a Reply

Fill in your details below or click an icon to log in:

Gravatar
WordPress.com Logo

Please log in to WordPress.com to post a comment to your blog.

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s


Follow

Get every new post delivered to your Inbox.