The storage and transmission of digital data lies at the heart of modern computers and telecommunications. If data is corrupted in storage or transmission, the consequences can range from mildly annoying to disastrous. How can we code the data to make errors considerably less likely?

Suppose you want to send a message in the form of a binary string,
i.e., a string of 0's and 1's. Due to noise in the medium over which
you are transmitting the message, be it a telephone line, a radio link
or the Internet, it is possible that some of the 0's will be received
as 1's and vice versa. Let us denote the probability that any one bit
is garbled in transmission by *p*. Typically, *p* will be
small, say about one in a million, but that may still be too large for
our purposes. If we are transferring a file that has 10 million bits,
the chance that the file is incorrectly received is nearly 90%.

How can we improve the reliability of the transmission? The simplest
method is to repeat ourselves three times. For example, if we want to
send the string 010, we transmit 000111000. If a single error occurs
in transmission, this may be received for instance as 001111000. At
the receiver, we use what is called majority decoding. We look at each
set of 3 bits and decode it as the bit which occurs most often. Thus,
the received message above is correctly decoded as 010 even though an
error occured in transmission. In the above example, it could have
happened that two errors occured and we received the string
101111000. This would be decoded as 110, which is different from the
original message. This shows that we cannot eliminate errors
altogether, we can only make them less likely. To see how much less
likely, let us work out the probability that a single bit is
incorrectly received. Since each bit is transmitted three times, it
has to be incorrectly received at least twice in order for the decoder
to get it wrong. The probability of this is
3*p*²+*p*³, which, for *p* equal to one in a
million works out to about one in 330 billion. Thus, using the
repetition code, a 10 Mbit file can be sent correctly with probability
99.997%. While this coding scheme dramatically reduces the likelihood
of errors, it is rather inefficient. Can we do better?

Here's one way, using the famous (7,4) Hamming code. This works with a block of 4 bits at a time rather than with a single bit. Suppose we want to transmit the block 0011. Look at the picture below. We write these bits in the intersections of the circles. Next, we fill in the part of each circle lying outside the intersection using the rule that the total number of 1's contained in each circle should be even. The extra bits are called parity bits -- they ensure even parity for the number of 1's in each circle. The message we transmit will consist of all seven bits, in a pre-arranged order so that the receiver can reconstruct the diagram. Suppose now that a single error occurs, so that the receiver gets a 1 in the intersection of the top two circles. The receiver proceeds to work out the parity for each of the circles, and finds that the top two circles both have odd parity (odd number of 1's), while the bottom circle has even parity. This enables her to conclude that an error has occurred, and that the error is in the bit position common to the top two circles. Therefore, the intended message can be decoded correctly. It is not hard to see that a similar approach works if the error is in some other bit position. However, if there are two errors, then the decoding procedure fails in that it yields a message different from the intended one.

Historically, a major impetus for the development of coding theory came from the need to communicate with planetary probes. The fact that these can't afford to carry large power supplies, combined with the great distances from which they transmit, make the received signals very weak. Thanks to error-correction coding, it is nonetheless possible to get high-quality pictures like this one of Saturn sent by Voyager 2. Today, error-correcting codes maintain the integrity of data stored on computer tapes and disks and in semiconductor memories; they ensure that compact discs continue to provide flawless sound even if they are slightly scratched; they enable long-distance communications using satellites; they are even used in the bar-code readers in supermarkets.