Differential Cryptanalysis


In the modern era, it is often desirable to protect the privacy of information in some manner. Probably the most popular method to do this is by using something called encryption to render the message, or plaintext, unreadable to any but the intended receiver. However, as desirable as it is to keep one's information private, it is also desirable to break through this encryption to read a message even if one is not the intended receiver. This paper will attempt to introduce some concepts of cryptography, and especially some ideas pertaining to cryptanalysis, the breaking of encryption.

Before the actual ideas are introduced, some basics of encryption must be understood. In cryptography, one refers to the message that is not encrypted as the plaintext, or P. One refers to the message that is encrypted as ciphertext. Although different letters are used to indicate ciphertext, this paper will follow the convention that uses T to indicate the ciphertext. Also vitally important to cryptography is the key K. It is by this set of information that one is supposed to decrypt T. The idea is that unless one has K, one cannot transform T into P (decryption). Cryptanalysis is the attempt to violate this. Either one attempts to determine K without prior knowledge of K, or to determine P from T regardless of K.

The basic idea of cryptography is as follows. The plaintext P is permuted or encrypted in some manner by some method called a cryptoscheme to convert it into an unreadable form. However, the cryptoscheme uses some key K in order to convert P into this unreadable form, called T. Part of the cryptoscheme includes a related method to convert T back into P using K with about the same amount of work as was done to create T from P. This second process, called decryption is usually considered the inverse operation of the encryption scheme. However, without the key K, it is very difficult to decrypt T, and the goal is to make it as difficult as possible. Cryptography works on vectors of bits (0's or 1's). The length of the vector varies depending on the part of the cryptoscheme in question. However, the XOR of two vectors would be the resultant vector when each bit of the two input vectors is XORed (0 if the inputs are the same, 1 if they are different). The notation A .xor. B will mean the XOR of A and B.

Just as there are those who seek to create cryptoschemes to protect the privacy of information, there are those who seek to "break" a cryptoscheme. Breaking a scheme involves determining what the plaintext P was from T, without prior knowledge of the key K. It may involve figuring out what the key K used was, or it may involve creating an inverse operation that for a given cryptoscheme, will reliably produce P from T without knowledge of K. Those who seek to break a cryptoscheme are referred to as cryptanalysts.

The cryptanalyst is always assumed to be familiar with the cryptoscheme that they are attempting to break. Thus security through obfuscation (keeping your security by not letting anyone know the details of your encryption scheme) is not considered a good method in general, though it does make the practical task of breaking your scheme more difficult. There are two standard types of encryption, private key and public key. Although public key is far easier to analyze, private key is far more widely used. In this paper, one of the most famous examples of a private key system will be looked at, and a method to break it will be introduced.

A private key system has only one key that is kept completely secret. The example to be described here is DES, or the Data Encryption Standard. Introduced in the early to mid 1970s, this scheme has been adopted as the official US standard for sensitive but non-classified information. DES also is important for it describes a class of encryption schemes called iterative schemes. An iterative scheme takes a relatively simple and nonsecure function, and repeats or iterates it many times to increase security. For the diagrams and tables needed to understand DES, see figures 1 and 2, and the associated tables. DES works on a block of 64 bits and encrypts each block at a time. (A block is a vector of bits, in this case containing 64 entries). The way it is done is that the data is split into the two halves. The right half of the data is worked upon, while the left half is left alone (mostly). The right half enters the F function, or F box, which also takes as input a subkey, and then the result of the F box is XORed with the left half. Then the new left half becomes the right half, while the old right half (before going into the F function), becomes the new left half. That process is called a round. Full DES calls for 16 rounds. Now, the subkey is a subset of the key (a vector with 56 bits of information, and 8 checksum or parity bits). The particular subset used is determined in a regular method determined by a permutation and rotation. As the diagram shows, the key is split into halves (28 bits per half, the checksum bits are ignored), and each half is permuted cyclically (rotated left). The halves are then sent as the input to another permutation function which spits out the 48 bit subkey. The result of the rotation however is used to determine the next subkey (is rotated again, and again). The fact that the full key K determines the subkey k is important. The subkeys are said to be dependent upon each other. If they were not, then they would be called independent. However, because the key halves are rotated again and again, a different subkey is used each round, furthering the security of DES, though not as much as if the keys were independent.

DES has one curious feature in it called the Initial Permutation (IP) and Final Permutation (FP). These permutations come before and after all the rounds, and have no cryptographic value as they work independently of the key. For this reason, their existence is ignored in most discussions of DES. The two permutations will merely permute the bits in a fixed fashion, with the IP being the inverse operation of the FP.

Finally, one comes to the F function. The F function takes a 32 bit vector as input, as well as a 48 bit subkey, and has as output 32 bits. The input is expanded through an expansion function E (see tables) which repeats certain bits of the vector in a fixed manner to turn the input into a 48 bit vector. This 48 bit expanded vector is XORed with the subkey k (which is also 48 bits). The resultant of the XOR is sent through S boxes which translate 6 bit vectors into 4 bit vectors. The first 6 bits of the 48 bit vector would go through S1, the next 6 to S2, and so on, up to S8. For the full description of the S boxes, see the tables. The value of the first and last bit of the six bit vector is used to determine the line to read for the S box. So if the input to S1 were 101110, it would read line 10 (decimal 2, but 0 based numbering is used, so the third line). The tables are in decimal. The middle 4 bits determine which column to read. So, in our example, 0111 translates to decimal 7, but it is 0 based numbering so it would mean the 8th column. Looking up in the S1 table, line 3, column 8 has a 11. So the output of this function would be 1011 (which is binary for 11). The other S boxes work in a similar manner, but with different look up tables. The output of the 8 S boxes (a 32 bit vector) is sent through a permutation P (not to be confused with the plaintext P), which merely twists the bits around in some set fashion.

Decryption of DES is done by feeding T into DES backwards. Since all of the operations are reversible if you have K, decryption is essentially as fast as encryption.

Several things become apparent when one examines DES. First of all, it is very easy to implement in hardware. Permutations and rotations of a vector of bits are quite easy to do in hardware, but much harder to do in software. Second of all, cryptanalysis is not a trivial task. For this reason, no method was introduced to break DES faster than a brute force search until 1991.

Before this method is mentioned, one should be introduced to the basic idea of cryptanalysis. In any cryptoscheme, one can always try what is known as a brute force or exhaustive search. This method attempts to decrypt the data by trying each possible key. Needless to say, this method takes a very long time. In fact, it would take one up to 2 to the |K| tries to break an encryption scheme where |K| is the size of the key K. In DES, |K| is 56. Until very recently, this key size was sufficiently large to make exhaustive searches unreasonable even with the fastest computers in the world. Today, some very specialized computer hardware could break DES by such a search, but not cheaply, and only the simpler modes of DES. One is aided however by a special property of DES. In DES, if one encrypts P, one can test both P and the compliment of P (the same vector where all the bits are flipped). This property allows one to reduce the number of tries in a brute force search by a factor of 2, meaning one only needs to encrypt 2 to the 55th times to find the key. Today, DES is considered bad primarily because of its small key size. Given this fact in mind, and the difficulty of a brute force search on DES, which has less than half the size key of modern cryptosystems, the cryptanalyst tries to find some shortcut.

In this paper, the simplest mode of DES only will be examined. This mode involves applying DES on each vector of 64 bits, and then applying DES again on the next 64 bits. Other modes of DES include doing such things as XORing the previously encrypted block with the just encrypted block to produce the output block. These other modes are considered more secure, and are harder to analyze.

Ignoring the obvious shortcuts coming from a bad key choice, this task is where the job of the cryptanalyst becomes difficult. In 1991, Biham and Shamir published a paper that was the first method in open literature that would break the full DES in less than 2 to the 55th complexity. (Guaranteed to break DES in less than that number of encryptions). This method, called Differential Cryptanalysis also proved surprisingly effective against other, related cryptosystems that seemed quite similar to DES. It turns out that the people at IBM who designed DES were aware of this method in the 1970's, and designed DES to be optimally resistant to this method. However, differential cryptanalysis is still a viable method to use for reduced versions of DES (using less than the prescribed 16 rounds), and is still effective in making easier an attack on the full DES.

Differential cryptanalysis works by what is called a known plaintext attack. It takes plaintext blocks that the cryptanalyst knows, and attempts to determine the nature of the encryption algorithm by encrypting them with a random key that the cryptanalyst pretends that they do not know. This method is useful because if for a large number of plaintexts, you can determine a certain behavior of the algorithm, you can help determine how changing the key affects the ciphertext, and in fact begin to guess bits of the key. In fact, that is how differential cryptanalysis works--by guessing bits of the key through various methods, until a brute force search on the remaining bits of the key becomes feasible and faster than other, more complex methods. One important note about differential cryptanalysis is that it always works using a pair of plaintexts or inputs, so that they may be compared. The inputs are always known.

The basic method of differential cryptanalysis uses several definitions and concepts. However, before these concepts are introduced, several known formulas should be pointed out that justify the focusing on the S boxes. Because the F function uses an expansion function, if we wish to use pairs of plaintext, we wish to have a function to relate the output of two different inputs to the Expansion E function. This formula is given by:

E(X) .xor. E(X*) = E(X .xor. X*).

In this and later functions, if X is some plaintext or input, then X* is some other plaintext or input. We will later denote X XOR X* as the difference between X and X* or the input XOR (or plaintext XOR). Also, the subkey is XORed with the output of the E function. We wish to have some function to relate two different inputs to the XOR function that are being XORed with the plaintext. This formula is given by:

(X .xor. K) .xor. (X* .xor K) = X .xor. X*.

Further, the F box also has a P permutation. The formula which relates two inputs into the P permutation is given as:

P(X) .xor. P(X*) = P(X .xor. X*).

Finally, a function to relate the XOR function that connects the different rounds is needed. This function is:

(X .xor Y) .xor. (X* .xor. Y*) = (X .xor. X*) .xor. (Y .xor. Y*).

Y and Y* are some pair of inputs just like X and X*, but for the other input coming into the XOR function. The reason for these formulas is so that we may focus on the most important and most controversial part of the DES cryptoscheme, the S boxes. The S boxes are made controversial by their seeming arbitrariness in how they were chosen. However, it is the very choice of the S boxes that provides the security to DES. As the above formulas show, the rest of the DES function is not overly hard to analyze, it merely comes down to several simple XOR functions.

Thus any cryptanalysis of DES must quickly come down to attempting to analyze the S boxes. As was indicated above, the S boxes are basically bit lookup tables. The vector of length 6 determines some output that is a number from 0 to 15 (which can be expressed as a vector of length 4).

Differential cryptanalysis begins by making what is called a difference distribution table for each S box. What this table is, is a chart of input XORs and output XORs, and how many possible pairs exist with that status. This idea is very important for cryptanalysis, so some time will be spent here in explaining it. An input XOR is the result of the XOR of two inputs into the S box (known). So if we take two known plaintexts, say A and A*, their difference is defined to be A .xor. A*. Similarly, if we examine a pair of inputs into an S box, we define the input XOR to be the difference between those two inputs. Similarly, if we examine two outputs of the S box, we define the output XOR to be the XOR of the two outputs of that S box. Because of the design of DES, we can determine given some input XOR how many possible pairs could produce a given output XOR. As an example, given that the input XOR is 0 (the two inputs are identical), a difference distribution table of S1 would show that there are 64 possible output XORs for that input XOR and an output XOR of 0. Recall that since the input to an S box is 6 bits, there are 64 possible inputs, thus 64 x 64 possible input pairs. However, going back to our example, we see that if the input XOR and the output XOR is 0 (the inputs are identical and the outputs are identical) there are 64 possible such pairs. This result should make sense since in this case, we know that A = A* (they have no difference), so it makes sense that there are 64 possible such pairs (one for each possible input to S1). Similarly, we should note that if the input XOR is 0 (the two inputs are identical), all other output XORs have no possible pairs. (The outputs must be identical in that case.)

If we denote some input XOR as X, and some output XOR as Y, we say that X may cause Y by the S box if there is some pair of inputs which the input XOR = X and the output XOR = Y for that particular S box.(18) If however, there is no such pair inputs to the S box which the input XOR = X and the output XOR = Y, we say that X may not cause Y by the S box. These definitions allow us to talk easier about entries in the difference distribution table.

In the table, there are 16 columns, for each possible output XOR. The 64 rows are each possible input XOR. The entries in the table are the number of input pairs with the corresponding input XOR and output XOR. Thus if X may cause Y by the S box, then in the X,Y location of the table, you should find some number greater than 0. If however, X may not cause Y by the S box, you should find 0 in the X,Y location of the difference distribution table of the S box.

Finally in dealing with difference distribution tables, we say that X may cause Y with probability p by an S box "if for a fraction p of the pairs in which the input XOR of the S box equals X, the output XOR equals Y." (18) Thus, in our previous example where the input XOR is 0 and the output XOR is 0, X may cause Y with probability 1 by S1. This is because all entries for the input XOR of 0 are under the output XOR of 0. However, for an input XOR of 1 (the first entry in the vector is different, all others are the same), there is a probability of 10/64 that the output XOR will be 9. (That the outputs will be different in the first and fourth entries of the vector will be different, all others will be the same.) One should note that for any given row in the table, there are a total of 64 possible pairs, thus the sum of all entries in a row will be 64 for any given row of any given S box.

The reason that the difference distribution tables for the S boxes is important is that it allows us to find "possible input and output values of pairs given their input and output XORs." (18) Thus by only knowing the difference between two inputs and the difference in their outputs, we can help determine what those two pairs are. The table is most useful if the entry for some input XOR and output XOR is 2, since the 2 pairs are duals (the order simply being reversed, say X, X* and X*, X).

Since the S boxes can now be analyzed through a pair of plaintexts, it makes sense to expand this to the entire F function. In fact, Biham and Shamir defined the following: "X may cause Y with probability p by the F function if for a fraction p of all the possible input pairs encrypted by all the possible subkey values in which the input XOR of the F function equals X, the output XOR equals Y."(21) Again, remember that X and Y are just the result of the XOR of two inputs, read as a hexadecimal number for shorthand, but are really just a vector of bits. Also, the subkey is some other vector of bits. This definition is quite similar to the previous definition, only expanded out for the entire F function instead of just being applied to one of the S boxes.

This now gives us a tool to determine some of the bits of the subkey used for the F function. As certain entries in the table correspond to only a pair and its dual of inputs, we could determine some of the entries to the subkey used in that F function. Once we know a bit of the subkey, we can determine a bit of the entire 56 bit key by simply going backwards in the key selection algorithm. It is because of this ability to determine bits of the key used that differential cryptanalysis is a powerful technique, especially for a small number of rounds.

There is an important lemma that shows just how important the S boxes are to the security of DES. While one can use the difference distribution tables to determine the probability p that X may cause Y by an S box, one needs some way to determine the probability p that X may cause Y by the F function. This lemma is:

In DES, if X may cause Y with probability p by the F function then every fixed input pair Z, Z* with

X = Z .xor. Z*

causes the F function output XOR to be Y by the same fraction p of the possible subkey values. (21)

In other words, the probability p that X may cause Y in by the F function is the product of the p's for each S box. So given the p's for each S box which can be looked up on the difference distribution table, one can compute the probability that a given input XOR may produce a given output XOR under the F function, and even use this knowledge to determine bits of the key.

This startling result of finding the key is done by a several step method. The first step is to choose an appropriate plaintext XOR. Or, in other words, choose what the difference of two plaintexts will be, without necessarily deciding upon the two plaintexts. From the difference chosen, create an appropriate number of plaintext pairs with the chosen plaintext XOR. Encrypt the plaintexts, and keep the ciphertext pairs. For each of the ciphertext pairs kept, compute the expected output XOR of as many S boxes in the last round of DES as possible from the plaintext XOR. Remember that the input pair of the last round is known because it is part of the ciphertext pair. For each possible key value, count the number of pairs that result in the expected output XOR using this key value in the last round. The correct key value is the key value suggested by all the pairs. (21-22)

As one can probably tell, the amount of information one obtains is dependent on how many rounds are used for DES. For a many round version of DES, less information can be obtained, making analysis more difficult. For reduced round variations of DES, Biham and Shamir claim that their method can break DES in under two minutes for 8 round variations, and far faster for fewer rounds.(7) From these results, it becomes worthwhile to study their method, and the cryptographer is once again reminded that one cannot rely on a scheme to remain secure.

Posted 28 August 1996
Formatting updates 2004 February 15