10 April 2006

Tab Versus Spaces

There are several famous holy wars in the history of computing. There's emacs versus vi, Mac versus PC, and tab versus spaces. These are commonly referred to as "religious issues", meaning that they are a matter of belief only. Now, the first I couldn't care less about, since I use neither text editor and preferences for keyboard mappings seem a relic of the days when dinosaurs walked the punch-card-strewn floors of computing. Mac versus PC is an easy one too: use a Mac if you like spending more money for other people to make up your mind for you, otherwise use a PC. (Actually, Macs are becoming PCs [1] so what's the difference?)

But on the tab versus spaces issue I must take a stand and make a few declarations. First, the common wisdom, as espoused most emblematically by Jamie Zawinski, is dead wrong. And second, it is not a religious issue at all, because there is a correct answer.

This relates to Python in a very direct way. Python is unique among programming languages in mandating indentation as part of the language syntax: every code block must be indented.

Some slack-brains take one look at this and run away screaming "Whitespace as syntax! Whitespace as syntax! Garrrrrrr!" But anyone who actually uses Python for longer than 0.3 seconds realises that this rule simply enforces what any good programmer already practices. This correct indentation becomes a guide to the compiler, so curly braces and other excess syntactic cruft are not needed. It's a neat, simple solution that results in code of greater readability -- a real win-win scenario as managers say on television. (And possibly elsewhere. I wouldn't know; I avoid managers [2].)
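For anyone who hasn't seen the rule in action, here is a minimal sketch. The helper name `compiles` and the two sample strings are my own illustration; the only thing borrowed from Python itself is the built-in `compile()`, which rejects a block that isn't indented.

```python
def compiles(source):
    """Return True if Python accepts `source`, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


# The body of the `if` is marked by indentation alone (a single tab here).
indented = "if True:\n\tanswer = 42\n"

# The same statements with the body left unindented: there is no block.
flat = "if True:\nanswer = 42\n"

print(compiles(indented))
print(compiles(flat))
```

The first string is accepted and the second is refused, with no braces anywhere in sight.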

But back to the issue at hand. First, why is tab versus spaces not a trivial issue? I suppose that for any given programmer it is, but if you work on a team you must come to some sort of agreement or be constantly converting back and forth. This is possibly time-consuming, possibly error-prone, and possibly not even that easy depending on your tools. So it's best avoided by agreeing on a standard.

Before going any further we should acknowledge the most important article on the subject and read jwz's "Tabs versus Spaces: An Eternal Holy War."

Jamie breaks the issue down into three parts. He notes that hitting the tab key does different things on different systems, but as you can configure this behaviour it's not the main problem at hand. Furthermore, what an editor does on encountering a tab character in a file is equally configurable: you can have it do what you wish.

The third part of the issue (actually it's his #1) must then be the real problem: people care about how many columns a tab/indent represents. He calls this a "religious war" because he has misidentified the problem. His solution is to have tabs expand to spaces before writing a file to disk, so tabs never exist in interchanged files.

This assumes one will never have to open a file containing spaces and intuit what tabs are supposed to be there, or how the spaces should be interpreted. Perhaps jwz never has bugs or has to re-edit his code. Or perhaps he's used to write-only languages like C++ and Perl where tabs are the least of your problems, because you are already committed to spending inordinate amounts of your precious life wading through code trying to find the crucial lines that actually do something.

Then he spends the second half of the rant talking about specific settings in arcane software.

Where did jwz go wrong?

First, his three points are not separable, at least not in the way he would like. Discussing the tab key he says "this is an editor user interface issue" in order to dismiss the problem. Editors treat tabs here as "indent to a column position", there as "add so many spaces", and in a third case as a single character of value "ASCII 9". So, according to jwz, it's not an important part of the problem.

But how is this different from the point he says is the most important, that when reading and writing code, people "care about how many screen columns by which the code tends to indent when a new scope (or sexpr, or whatever) opens"? (Strangulated syntax in original.)

Answer: it is not different. There are not three points; there are two.

A tab has both syntax (a representation) and semantics (a meaning).

With this plain and simple approach it is dead obvious that using spaces to represent tabs at the level of encoding is inherently wrong because we are throwing away meaning. A tab no longer has its own representation but is instead subsumed into how we represent spaces.

An example illustrates the problem. If you save all tabs as two spaces and the next person who opens the file instead wants to see tabs as indenting to a given column position, how are they going to do that? First they'll have to assume that all two-space sequences are tabs, and then they can interpret those tabs. But the assumption is dangerous and wrong. Everywhere that two spaces do not in fact represent a tab there will be an error of interpretation. Meaning will have been lost.
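The same example can be run as a small sketch. The function names `tabs_to_spaces` and `guess_tabs` are hypothetical, and the naive "every two spaces was a tab" rule is the dangerous assumption described above, not how any particular editor behaves.

```python
def tabs_to_spaces(text, width=2):
    """Save step: replace every tab character with `width` spaces."""
    return text.replace("\t", " " * width)


def guess_tabs(text, width=2):
    """Naive inverse: assume every run of `width` spaces was once a tab."""
    return text.replace(" " * width, "\t")


# One real tab for indentation, plus a string literal that happens to
# contain two ordinary spaces.
original = "\tprint('a  b')"

saved = tabs_to_spaces(original)   # what gets written to disk
recovered = guess_tabs(saved)      # what the next reader reconstructs

print(repr(saved))
print(repr(recovered))
print(recovered == original)
```

The round trip does not return the original: the two spaces inside the string literal come back as a tab, so the program now prints something different.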

Spaces cannot represent tabs in the encoding. They can in a display, but that is the choice of the viewer at that instant and not something to be persisted eternally in the file.

Preserving tab characters allows them to mean something different to each user. This makes no assumptions about the capabilities of the tools used. It does not require everyone to use the same editor with certain macros running to convert automatically, or any such nonsense.
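A minimal sketch of that separation between storage and display, assuming a hypothetical `render` helper built on Python's standard `str.expandtabs`. The stored text keeps its tabs; each viewer picks a width only when looking at it.

```python
# What lives on disk: one tab per level of nesting.
stored = "if ready:\n\tif armed:\n\t\tfire()\n"


def render(text, tab_width):
    """Display step only: expand tabs to the viewer's preferred width."""
    return text.expandtabs(tab_width)


narrow = render(stored, 2)   # one reader likes two columns per level
wide = render(stored, 8)     # another prefers eight

print(narrow)
print(wide)

# Rendering never touches the stored text, so nothing is lost.
print(stored.count("\t"))
```

Both readers see the structure they want, and the file itself still records exactly three tab characters.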

Furthermore tabs have the following advantages:
* one key to hit instead of up to 8 [3]
* not open to error when 7 or 6 spaces are hit instead of 8
* makes diffing files easier (because of previous two points)
* smallest file size
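The file size point is simple arithmetic, sketched here under the assumption of a line nested three levels deep and an eight-space indent. The `indent` helper is hypothetical.

```python
def indent(line, depth, unit):
    """Prefix `line` with `depth` copies of the indentation `unit`."""
    return unit * depth + line


with_tabs = indent("fire()", 3, "\t")
with_spaces = indent("fire()", 3, " " * 8)

# Bytes needed to store the same line each way.
tab_bytes = len(with_tabs.encode("ascii"))
space_bytes = len(with_spaces.encode("ascii"))

print(tab_bytes, space_bytes)
```

Each level of nesting costs one byte with tabs and eight with spaces, on every indented line in the file.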

Use tabs not spaces. Why would anyone be so foolish as to suggest any different?

[1] Not only do they use Intel processors and ATI video cards but also boot Windows.

[2] For that matter I avoid television as well.

[3] You think you won't ever have to do this because you have your tabs set to 4 characters and inserted automatically? Just wait until you look at someone else's code and all you have are spaces as guides. Sucker.

7 comment(s) follow:

  Anonymous nasorenga wrote at 12 April, 2006 23:30...

The root of the problem is that tabs are visually indistinguishable from spaces.

This fact, and the obvious futility of the war cry "use tabs, not spaces" goes to show why it is a mistake to design a language syntax where indentation is significant.

Contrary to what is stated in the article, Python is not the first language to use indentation in this way. Occam, the Transputer programming language, pioneered the idea in the 1980s.

  Anonymous wx wrote at 13 April, 2006 13:38...

your cause is futile. reality does not work like you say it should.

  Blogger robin wrote at 17 April, 2006 18:40...

Since I deny the "obvious futility" I deny your conclusion. Further, I see that Python is not a mistake so your hypothesis is wrong.

Thanks for spotting Occam, though. Besides that, many languages that used punch-cards needed correct column positioning of certain code elements, though this was not a matter of code block indentation.

wx: I await with bated breath your attempt to explain how in fact "reality" "works". I was under the impression that the reality principle stopped working some time ago.

  Blogger robin wrote at 17 April, 2006 18:44...

This article has been picked up by the programming reddit with the description: "Nothing as great as a guy kicking down a door holding a machine gun in his hand yelling 'This isn't a religious issue!' and then proceeding to rampage through the room."

  Anonymous Anonymous wrote at 20 March, 2007 11:42...

Pretty much right on. I really don't understand the hatred of tabs to represent tabs.

The only problem is horrible programmers that mix tabs and spaces.

"Uh, let me figure out what weird tab stop they were editing this file with" *fiddles with tabstops for 15 seconds until it looks right*

The solution is to get rid of the spaces, and let the user display it at whatever tabstop he wants.

  Anonymous Anonymous wrote at 07 July, 2007 00:30...

"use a Mac if you like spending more money for other people to make up your mind for you"

Use Python if you like learning a completely different syntax [1] and if you like having other people [2] make up your mind as to how indentation should be done.

[1] Perl's syntax is wonderful because it intelligently builds upon other fine programming languages like C, sh, sed, and awk. Python went a different way.

[2] The designers of Python.

The thing that irks me about your blog entry is that I simply do not believe that Python is a magic wand for producing wonderful products comprised of readable code.

I am an enthusiastic Perl programmer. Perl is not a write-only programming language. All of my Perl code is very readable and useful, even after more than a decade. You obviously decry Perl as a write-only programming language because of all of the cute hacks that can be accomplished in Perl, but (1) do you understand that such hacks are done for expediency and/or to show how clever the author is and are not written by *good programmers* in *production code* and (2) where are all of the clever Python one-liners? Oh, that's right, due to language design restrictions, there AREN'T ANY...

  Blogger robin wrote at 07 July, 2007 12:17...

The entry already addresses your indentation phobia. If you are writing Perl code with no indentation, your code is crap. BTW, I have written production Perl code and am certainly not unaware of its virtues and shortcomings. I simply don't view encrypted syntax as a virtue. The lack of "clever Python one-liners" is an entirely good thing, as I espouse clarity over clever dick programmer tricks.

Python is not a magic wand but it is a magician's toolkit.
