If you regularly follow comments on this blog, you'll know that I've been having a back-and-forth with a guy who doesn't know much about information theory, and who's been using his ignorance to try to assemble arguments against the Kolmogorov-Chaitin information-theoretic measure of information in a string. In the course of doing that, I came up with what I think are some interesting ways of explaining bits of it, so I thought I'd promote it to the top-level to share with more readers.
To be absolutely clear up front: I'm far from an expert on K-C theory. It's something that I find incredibly fascinating, and I've read a lot of Chaitin's work. I've been fortunate enough to meet Greg Chaitin a bunch of times when we both worked at IBM, and I've attended a bunch of his lectures. But this isn't my main area of expertise, not by a long-shot.
If you don't know any K-C information theory, then you can look at my introduction here. The rest is beneath the fold.
Information vs. Meaning
Kolmogorov-Chaitin information theory is focused on quantification - that is, on being able to measure the quantity of information in a string of characters. It cares about how much information is in a string; it doesn't care what the string means. In fact, you can say something much stronger about information theory: if you consider meaning at all, then you're not doing information theory. A truly abstract string could mean many different things, so information theory's measurement of its information content must include everything that could be used, relative to any possible system of meaning, to determine the information content of that string.
I'll borrow a tiny bit of formality from denotational semantics. In denotational semantics, we take a domain of objects and functions that range over those objects, and we map statements - sentences that have some syntactic structure - onto the objects and functions in the domain. To assign meaning, what you're basically doing is creating a system where you can map from syntax - that is, from strings of symbols - to objects and functions. The string of characters "Mark Chu-Carroll" has absolutely no intrinsic meaning. But by interpreting it in a domain of meaning, we can say that the domain maps that string of symbols in a way that makes it a representation of a reference to me. But there's nothing about that string that makes it necessarily be a reference to me. It could mean many different things, depending on the domain of meaning which we use to interpret it.
To be a bit more specific, consider the string "0000000000000000000000000000000000000000". That's a string of 40 "0"s. If you interpret that as a number - that is, you understand the string of digits as having a meaning as a number - it's the number 0. But a string of 40 zeros is not equivalent to the number 0 in terms of information theory. That string could also mean the number 40, understood as a unary number. Or it could mean a sequence of 5 bytes containing nothing but zeros in a computer memory. Or it could mean a sequence of 41 bytes, the first 40 containing the ASCII code for the character "0", and the last containing a terminator-mark. The string contains enough information to allow you to interpret it as any of those things. But if you choose any of those as the meaning of it, and work out a representation of that meaning, you'll wind up with a different answer for the quantity of information contained in that string than you would for the string considered as an abstract. Choosing a framework in which to assign it meaning can make a string appear to have either more or less information than the string-as-abstract.
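Those different interpretations can be checked concretely. Here's a small Python sketch (purely illustrative) of the same abstract 40-character string being read in each of the frameworks above:

```python
# One abstract string, several "meanings": the same 40 characters
# interpreted in four different frameworks of meaning.
s = "0" * 40

# As a decimal numeral, it denotes the number zero.
assert int(s, 10) == 0

# As a unary numeral (a count of marks), it denotes forty.
assert len(s) == 40

# As 40 binary digits, it denotes five all-zero bytes.
assert int(s, 2).to_bytes(5, "big") == bytes(5)

# As ASCII text with a terminator, it occupies 41 bytes in memory.
assert len(s.encode("ascii") + b"\x00") == 41
```

Each interpretation is a different mapping from the same syntax; none of them is intrinsic to the string itself.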
You can make it appear to have more information by choosing a system of meaning in which the system itself contains extra information that it assigns to the string; for example, when I say "the complete table of printable unicode code-points in order", if I interpret that string in the right framework of meaning, it means a sequence of over 1100 characters. But those 1100 characters aren't contained in the string; they're part of the framework of meaning that's used to interpret the string.
You can also choose a framework in which to assign meaning which makes a string appear to have less information content than the string-as-abstract. For example, the string of 0s above. If we interpret a string of 40 "0"s as meaning zero, then in terms of meaning, that string has one character of information.
What's information?
Intuitively, it's hard for us to separate information from meaning. Informally, we consider them to be the same thing. But in information theory, they're very different. As I said above, meaning is what happens when you interpret a piece of information in some context. But absent any context, how can you quantify its information content?
To be strict, you can't. Without any context of any kind, you've got no way to quantify anything. But you can create a sort of minimal context, which gives you a really simple way to quantify the amount of information in the string.
The basic, fundamental idea is description. The amount of information contained in a string is the length of the shortest description that uniquely identifies that string.
And that's the essence of information theory: description. The amount of information in something is the shortest description that can uniquely reproduce that thing. It doesn't matter what the thing means. You don't need to understand it. You don't need to interpret it. All you need is a description that allows you to reproduce it. To produce the string 10101101, you don't need to know that it means the number ten million, one hundred and one thousand, one hundred and one interpreted as a number written in decimal notation. You don't need to know that it means the number 173, interpreted as a binary number. All you need to know is that it's a sequence of symbols.
That definition might seem like a trick - it just replaces "quantity of information" with "length of the shortest description". How can we describe something in a way that doesn't involve meaning?
The answer is: by computation. We can define a simple, minimal universal computing device. A description is a program that generates a string. The shortest description of a string is the shortest input-free program that can generate that string.
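To make "description as program" a little more concrete, here's a toy sketch in Python. This is only an illustration - the "machine" here is Python's own evaluator, not a minimal universal device, and the helper function is something I'm inventing for the example:

```python
# A toy "description language": a description is a Python expression,
# and the machine that runs descriptions is Python's evaluator.
# (Illustrative only - a real treatment fixes a minimal universal machine.)
def shortest_known_description(candidates, target):
    """Return the shortest candidate expression that evaluates to target."""
    matches = [d for d in candidates if eval(d) == target]
    return min(matches, key=len)

# A highly regular 40-character string has an 8-character description...
d = shortest_known_description(["'0' * 40", "'" + "0" * 40 + "'"], "0" * 40)
assert d == "'0' * 40"

# ...while a patternless string may have no description much shorter
# than quoting it verbatim (string plus two quote characters).
d = shortest_known_description(["'kq3x9zt2'"], "kq3x9zt2")
assert len(d) == len("kq3x9zt2") + 2
```

The point of the sketch: a description reproduces the string without any reference to what the string means; the information content is just the length of the shortest such description.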
Of course, there's another cop-out there. What's minimal? And won't the information content of a string be different using different computing devices?
Minimality is a tricky topic to tackle formally. You could easily write a PhD dissertation (and someone probably already has) on a precise definition of a minimal computing device. But informally, the idea that we're getting at is very simple: no magic instructions. That is, each instruction performs a single action - so you can't output more than one symbol with a single instruction.
For the second question: yes, that's true - to a limited extent. But given any pair of universal computing machines, the lengths of the shortest programs to generate a specific string will differ by no more than a fixed constant - one that depends on the pair of machines, but not on the string.
That last point is easy (and fun) to prove. First, we need to formalize the statement. Take two universal computing machines, P and Q. Let the information content of a string S relative to P be written as I(S,P), and relative to Q as I(S,Q). Then for all strings S, the absolute value of I(S,P) - I(S,Q) is less than or equal to a constant C that depends only on P and Q, not on S.
Here's the proof:
- if P and Q are both universal computing devices, then there is a program for P which allows it to emulate Q; and there is a program for Q which allows it to emulate P.
- Let the program that allows P to emulate Q be PQ, and the program that allows Q to emulate P be QP.
- Consider a string S. Let the shortest program for P to generate S be PS, and the shortest program for Q to generate S be QS. Then the shortest program for Q to generate S cannot possibly be longer than len(PS) + len(QP), because Q can always run QP to emulate P executing PS.
- Reverse the above statement, swapping the roles of P and Q.
- Now, take the longer of PQ and QP. For any string S, it should be obvious that the information content of S relative to P and relative to Q cannot possibly differ by more than max(len(PQ), len(QP)).
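Written symbolically - using K_P(S) as shorthand for I(S,P), the length of the shortest P-program that generates S - the steps above amount to:

```latex
K_Q(S) \le K_P(S) + \mathrm{len}(Q_P)
  % Q can run its P-emulator Q_P on P's shortest program for S
K_P(S) \le K_Q(S) + \mathrm{len}(P_Q)
  % and symmetrically, P can run its Q-emulator
% Taking C = \max(\mathrm{len}(P_Q), \mathrm{len}(Q_P)):
|K_P(S) - K_Q(S)| \le C \qquad \text{for every string } S
```

Note that C comes entirely from the two emulator programs, so it's fixed once P and Q are chosen, no matter which string S you measure.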
No magic there. It's very straightforward.






Comments
Mark C. Chu-Carroll: You could easily write a PhD dissertation (and someone probably already has) on a precise definition of a minimal computing device.
Turing's "On Computable Numbers, with an Application to the Entscheidungsproblem" wasn't his dissertation, but definitively covers the material. There are a few papers on the simplest possible universal Turing Machines; (doi:10.1016/j.tcs.2006.06.002) isn't bad.
Posted by: abb3w | September 21, 2009 1:42 PM
Thanks for that short and clear introduction. I was familiar with the basics of K-C theory, so most of it wasn't new to me, but I wasn't aware of the proof that minimal descriptions all have to be within a fixed constant of each other. As soon as I read point 1 of the proof, though, I knew exactly where you were headed. I love computing. :)
Posted by: Nick Johnson | September 21, 2009 4:23 PM
MCC:We can define a simple, minimal universal computing device
There is no actual requirement that the device be minimal, other than aesthetics. Any two devices will still have their answers differ by at most a constant (albeit a very large constant).
A device with the command "print_complete_works_of_Shakespeare" is just unsatisfying, but technically legal.
Posted by: Jeremy H | September 22, 2009 12:15 AM
Quarto or Folio? :)
Posted by: Stephen Wells | September 22, 2009 5:17 AM
Thanks for the intro/summary - I'd gathered as much from bits and pieces of your other posts and comments on information theory.
Re information vs. meaning, of course this is bound up in Dembski's fruitless and rather desperate pursuit of some definition of information that is on one hand scientifically valid, and on the other would require external (preferably divine) input in the origin and evolution of the nucleic acid instructions that regulate basic microbiological functions.
One could also cast the comparison as information vs. purpose, where the K-C information content doesn't depend on what the string is going to be used for, while for Dembski et al., it's of course vital that the string (i.e., the DNA or RNA sequence) be fit for use as a microcellular instruction set.
Posted by: Jud | September 22, 2009 7:37 AM
Just a quick question -- how do you write the number 0 in unary numbers?
Posted by: psweet | September 23, 2009 9:10 AM
Re #6:
In unary, 0 is an empty string.
Posted by: Mark C. Chu-Carroll | September 23, 2009 9:44 AM
Mark, thanks for this very clear summary. This is one of my favorite blogs on SB.
Posted by: David | September 23, 2009 11:50 AM
Thanks, Mark. I'm not sure how useful that would be in practice, but then that's why the Roman system disappeared -- it simply wasn't very useful.
Posted by: psweet | September 23, 2009 2:36 PM
Re #8:
No, it's not all that useful in practice. The main use of unary is in highly formalized explorations in automata theory. If you want to do things like play with a Turing machine, and you're concerned with capability, not speed, then unary is frequently the easiest notation to work with. But on a Turing tape, the fact that unary zero is invisible isn't a problem, because you use delimiters, and two side-by-side delimiters clearly indicate a zero.
Posted by: Mark C. Chu-Carroll | September 23, 2009 2:42 PM
Mark, do you agree then with the statement that this page (for example) does not contain "knowledge"? Lots of people talk about documents as if they contain knowledge. ("knowledge management") However, it can only become knowledge by attributing meaning to the information, which is done by an interpreter (usually a human).
Posted by: Leo | September 23, 2009 3:23 PM
Re #11:
I don't know. I'm honestly not sure of how to define "knowledge". Speaking intuitively, I don't think that "knowledge" can really be said to exist independently of someone who knows it - so a passive document can't contain "knowledge", since it can't "know" anything.
I would argue that this page contains both information and intended meaning, which I think might be what you're getting at. I wouldn't be comfortable saying that it contains meaning, because meaning only exists in the mind of a reader. But I wouldn't be comfortable saying that it's meaningless either, because it was written with the intention of producing a specific meaning in the mind of the reader.
Posted by: Mark C. Chu-Carroll | September 23, 2009 3:51 PM
Well, knowledge is "a belief justified by evidence" according to a friend of mine who's doing AI research. Of course, then you have to define "belief", "evidence" and "justified"...
Posted by: Snoof | September 23, 2009 8:06 PM
But given any pair of computing machines, the difference between the length of the shortest programs to generate a specific string will vary by no more than a single fixed constant.
One interesting thing is that this constant is computable and represents the boundary between the programs which can be proved minimal and those which cannot be proved to be minimal.
Chaitin explicitly computes the constant for a LISP interpreter in one of his books.
Posted by: p | September 24, 2009 3:15 PM
Seems like Dembski would be ahead to try something more along the lines of Crutchfield/Shalizi computational mechanics, which provides a natural measure of complexity.
As it is, he's really just making a huge and basic category error.
Posted by: p | September 24, 2009 3:21 PM
"Well, knowledge is "a belief justified by evidence" according a friend of mine who's doing AI research. Of course, then you have to define "belief", "evidence" and "justified"..."
This isn't right, since it doesn't rule out "knowing" false claims. I can't know that p if p isn't true--even if I have some (misleading) evidence for p. So the better approximation is: knowledge is justified *true* belief.
But even this won't work: there appear to be justified true beliefs that don't amount to knowledge. Suppose, for example, that I look at my watch; it says '6:00', and I come to believe on that basis that it is six o'clock. Suppose further that it *is* 6:00. (So I have a justified true belief that it is 6:00.) Now the twist: unbeknownst to me, my watch stopped working exactly 24 hours ago. Do I know that it is 6:00? Most people have the intuition that I don't--after all, had I looked at my watch fifteen minutes earlier, I *still* would have believed it was 6:00. Thus we have a case of justified true belief that isn't knowledge.
There's a very large philosophical literature trying to work out exactly what the necessary and sufficient conditions for knowledge are. (As well as literatures on belief, justification, and evidence.)
Posted by: jdkbrown | September 24, 2009 6:58 PM