Schneier on Security

A weblog covering security and security technology.

May 17, 2005

AES Timing Attack

Nice timing attack against AES.

For those of you who don't know, timing attacks are an example of side-channel cryptanalysis: cryptanalysis using additional information about the inner workings of the cryptographic algorithm. I wrote about them here.

What's the big idea here?

There are two ways to look at a cryptographic primitive: block cipher, digital signature function, whatever. The first is as a chunk of math. The second is as a physical (or software) implementation of that math.

Traditionally, cryptanalysis has been directed solely against the math. Differential and linear cryptanalysis are good examples of this: high-powered mathematical tools that can be used to break different block ciphers.

On the other hand, timing attacks, power analysis, and fault analysis all make assumptions about the implementation, and use additional information garnered from attacking those implementations. Fault analysis assumes one bit of feedback from the implementation -- was the message successfully decrypted? -- in order to break the underlying cryptographic primitive. Timing attacks assume that an attacker knows how long a particular encryption operation takes.
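
As a minimal sketch of what "knowing how long an encryption operation takes" means in practice (illustrative only, x86-specific, and not code from Bernstein's paper or any particular AES library -- the encrypt function here is a stand-in):

    /* Time a single block encryption with the CPU's cycle counter (RDTSC).
       block_cipher_fn is a stand-in for whatever AES implementation is under test. */
    #include <stdint.h>

    typedef void (*block_cipher_fn)(const uint8_t key[16], const uint8_t in[16], uint8_t out[16]);

    uint64_t cycles(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
        return ((uint64_t)hi << 32) | lo;
    }

    uint64_t time_one_block(block_cipher_fn encrypt, const uint8_t key[16], const uint8_t in[16])
    {
        uint8_t out[16];
        uint64_t before = cycles();
        encrypt(key, in, out);
        return cycles() - before;   /* this cycle count is the side channel */
    }

An attacker who can collect many such measurements for known plaintexts can correlate them with guesses about the key.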

Posted on May 17, 2005 at 10:05 AM

Trackback Pings

Listed below are links to weblogs that reference AES Timing Attack:

» AES Timing Attack from The Lazy Genius
There was a recent paper released on using a timing attack on the AES implementation. This is not a bug in the AES algorithm, but in ... [Read More]

Tracked on May 17, 2005 01:16 PM

» DJB's AES attack from *scottstuff*
Bruce Schneier posted a link to an AES timing attack from Dan Bernstein. This is similar to last week's hyperthreading vulnerability, in that it exploits cache timing variances to extract sensitive information, but DJB's attack is demo... [Read More]

Tracked on May 17, 2005 03:17 PM

» Crypto: AES and those whacky side-channel attacks from optionsScalper
It would appear that there has been a successful side-channel attack on AES. For the uninitiated, let's... [Read More]

Tracked on May 17, 2005 11:27 PM

» AES timing attack published from eyequeue
Well, it looks like AES is subject to a side-channel timing attack.

Daniel J. Bernstein (yes, that DJB) has published ... [Read More]

Tracked on May 18, 2005 04:38 AM

» crypto timing attacks from Michael Silk's Blog
Great paper: http://cr.yp.to/antiforgery/cachetiming-20050414.pdf. Targeted at AES (Rijndael), it provides code for an actual attack that reveals the key through interactions with the server alone. [Read More]

Tracked on May 19, 2005 12:35 AM

» Cache-timing attacks on AES. from Oleg Zabluda's blog
Daniel J. Bernstein published a profound article on the AES timing attack....it is extremely difficult to write constant-time high-speed AES software for common general-purpose CPUs. The underlying problem is that it is extremely difficult to loa... [Read More]

Tracked on May 19, 2005 08:32 PM

» AES cache-timing attack from Whitehouse.org.nz
A paper by Daniel J. Bernstein has shown (from my very quick glance) how popular software implementations of AES are vulnerable to attack. It seems that he blames AES for making it so hard to design an implementation which doesn't have this flaw. The pa [Read More]

Tracked on June 16, 2005 05:48 PM

Comments

Doesn't look complicated to me.....
I wonder why it wasn't discovered before, say, during the AES process?

Posted by: Anonymous at May 17, 2005 10:50 AM


The author suggests that the algorithm is at fault for making it difficult for implementors to write fast but constant-time implementations, and then lists a few algorithms that don't have this limitation. The paper mentions Helix, but not Blowfish, so how good/bad is Blowfish in this area? (Blowfish was Mr. Schneier's entry into the AES competition, wasn't it?)

Posted by: Andrew at May 17, 2005 10:55 AM


Bruce Schneier's AES candidate was Twofish, not Blowfish; Blowfish was an earlier Schneier cipher.

Posted by: g at May 17, 2005 11:06 AM


One of the author's opening statements is "This attack should be blamed on the AES design, not on the particular AES library used by the server;..." This seems like a somewhat harsh statement: is AES a poor design because it is susceptible to this attack? As a previous poster asked, how do other ciphers such as the AES finalists (Twofish, Serpent, MARS, RC6) fare in this area?

In addition, the author makes comparisons between AES and Helix. Since Helix is a stream cipher, is this an 'apples & oranges' situation? The original Helix paper remarked that AES vs. Helix is a somewhat unfair comparison.

Any thoughts?

Posted by: Simmoril at May 17, 2005 11:08 AM


Twofish uses S-boxes; higher-performance implementations use larger S-boxes, which (I suspect, though I haven't thought about it hard) are more vulnerable to the sort of attacks Bernstein describes. I don't think Twofish would have done any better than Rijndael against attacks like this.

Posted by: g at May 17, 2005 11:12 AM


The paper does mention that all 15 AES candidates could be vulnerable to this attack.

Maybe we need to have crypto co-processors as standard (as floating point co-pros are, though now built into the CPU). As part of the CPU would be ideal. Give the crypto co-pro dedicated cache, and let the hardware designers learn from crypto co-processors in the smart card world.

Crypto co-processors in the smart card world work in a potentially very hostile environment where the attacker can control power, timing signal ...

Posted by: hamish at May 17, 2005 11:16 AM


Can anyone explain why a table lookup T[i] leaks information about the index i ?! Isn't it constant? Ok, it may not be constant for values fetched from the main memory, but in that case the time should be one of either quick (cache) or slow (memory) ?!

Posted by: Andy at May 17, 2005 11:33 AM


And hasn't it been found that many such coprocessors *are* in fact vulnerable to timing and power attacks? :-) Perhaps I'm out of date and this is no longer true.

Posted by: g at May 17, 2005 11:34 AM


Andy: Because whether T[i] is quick or slow leaks information about the cache state, which depends on the history of previous accesses.
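
A minimal illustration of that (not from DJB's paper; cycles() is an assumed cycle-counter helper such as the RDTSC wrapper sketched earlier, and T is a stand-in for one of the AES lookup tables). In AES's first round the table index is a plaintext byte XORed with a key byte, so whether the load hits or misses the cache says something about the key:

    #include <stdint.h>

    extern uint64_t cycles(void);      /* assumed: cycle counter, e.g. an RDTSC wrapper */
    extern const uint32_t T[256];      /* stand-in for an AES lookup table */

    uint64_t time_lookup(uint8_t plaintext_byte, uint8_t key_byte)
    {
        volatile uint32_t sink;
        uint8_t i = plaintext_byte ^ key_byte;   /* first-round index is secret-dependent */
        uint64_t before = cycles();
        sink = T[i];        /* a few cycles on an L1 hit, tens of cycles on a miss */
        (void)sink;
        return cycles() - before;
    }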


MARS is a nightmare and should not have made it into the final round for implementation reasons. And it would possibly be vulnerable because of the large S-box.

Serpent is probably vulnerable because a fast software implementation ends up using several significant S-boxes.

Twofish is probably vulnerable, because it uses large, key-dependent S-boxes.

RC6 is probably NOT vulnerable: most multipliers on current CPUs are constant-time, and so are shifts (although a shift by 0 might be short-circuited on some architectures; e.g., the P4 has a 4-cycle shift operation, so short-circuiting a ROT/SHIFT by 0 would be a performance win).

Posted by: Nicholas weaver at May 17, 2005 11:58 AM


Timing attacks are just one of the TEMPEST tricks that have been known about for many years....

The first timing attack that I know of was against a UK one-time-pad machine used to superencrypt traffic to the UK. The US was snooping on the line and could, by examining the pulse widths, tell which way the relays in the unit used for the XOR function were going (open or closed)...

All physical realisations of encryption systems are subject to TEMPEST attacks, and they are extremely difficult to guard against. The reason for this is that it is an energy/bandwidth trade-off. To ensure that no inadvertent information escapes, the communications bandwidth has to be so low that it is usually not practical for all but the most secure systems.

Oh, one thing to note: TEMPEST is two-way, i.e. even if you prevent all harmful emissions, you also have to protect against energy being sent into the unit, which can then carry secure information out. And energy is not limited to electrical/radio frequency; it includes mechanical (sound/vibration), etc. (I guess even gravity as well ;)

However from what has been said in the paper, this particular attack could have been easily predicted and dealt with in the first round of the AES selection process.

Posted by: Clive Robinson at May 17, 2005 11:58 AM


What are the implications of this, or do we know yet? How bad is it? It looks pretty bad, to my untrained eye.

Am I right in thinking it's not as big a deal in public-key cryptography, because all you'd recover (assuming you could find a session that ran long enough) would be the session key?

Posted by: Daedala at May 17, 2005 12:02 PM


Wow, what a stupid thing to say.... :) I'm better now.

Posted by: Daedala at May 17, 2005 01:05 PM


The author’s whole attack hinges on the fact that fetch time is affected by the caching done from prior memory fetches.

The caching is what induces the time variation for fetches. Or am I missing something here? It is the compilation steps (from C or assembler code) and, more importantly, the optimization algorithms (in the CPU or in the compiler) that produce native machine code in which table lookups become time-variable.

If caching and optimizations can be disabled (for the table lookup) and the address calculations can be forced into a uniform base (e.g., all address calculation done in 32 bits or 64 bits), then the table lookup would be slower, but constant-time.

The slower but constant-time table lookup would still be orders of magnitude faster than performing the GF(2^8) mathematics ab initio for each use of an S-box. Constant-time lookups would thwart this attack.

Would this implementation change (essentially software de-optimization) work?
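
For concreteness, one software-only way to get a constant-time lookup (a different approach from disabling the cache, and only a sketch, not something from the paper) is to read every table entry and select the wanted one with masks, so the memory access pattern no longer depends on the secret index:

    #include <stdint.h>

    uint32_t ct_lookup(const uint32_t table[256], uint8_t secret_index)
    {
        uint32_t result = 0;
        for (uint32_t i = 0; i < 256; i++) {
            /* mask is all ones when i == secret_index, all zeros otherwise,
               computed without a secret-dependent branch */
            uint32_t diff = i ^ secret_index;
            uint32_t mask = (uint32_t)0 - (uint32_t)(((uint64_t)diff - 1) >> 63);
            result |= table[i] & mask;
        }
        return result;
    }

This trades a lot of performance (256 reads per lookup) for a uniform access pattern, which is exactly the kind of cost being discussed.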

Posted by: John Washburn at May 17, 2005 02:31 PM


If you have questions on this, please read the referenced document; it is very well written. Specifically, most of the questions above are answered in it.

This paper appears to describe an attack not merely on AES, but on a large group of algorithms. The paper is not primarily about the fact that timing attacks are an issue, but about how to solve the problem.

Posted by: Zimbel42 at May 17, 2005 02:37 PM


This once again shows how hard getting security right actually is. This paper is basically saying that all the experts who evaluated those NIST AES candidates missed this one, and as a result complexity and performance problems have to be added to AES implementations to get around it (however, I don't agree that the CPU would be the right place for any fixes, and I think we generally have enough bloat in kernels already).

Posted by: Ari Heikkinen at May 17, 2005 02:42 PM


I believe that disabling the L1 cache, as John Washburn suggests, can only be done by the kernel, as it requires accessing the CPU configuration registers. (My reading is that the cache is the primary effect, not software table-lookup optimizations.)

Taking an 11-cycle hit to the L2 cache, or an even larger hit to main memory, would start to make calculating the S-box look attractive, particularly if the values can be calculated in parallel.

Posted by: Mark Gritter at May 17, 2005 03:13 PM


Actually, disabling L1 or even L2 might not be enough, because DRAM paging might also reveal information...

I don't think it's a crisis: you can actually use the performance counters to your advantage. Forget all the fancy bits about avoiding the issue; just use the performance counter to time-check the encryption cycle and ensure that it always takes the same "time", and busy-wait if you finish early. True, it makes everything worst-case, but you could assume most-caching and most-hits and cut off at, say, 4 sigma above average; if a run is more than 4 sigma above average, oh well, that's a minor leak which gets averaged out over encrypting a larger block of data anyway.
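
A minimal sketch of that busy-wait idea (the cycle budget is made up, cycles() is an assumed cycle-counter helper, and aes_encrypt is a stand-in for whatever library is in use):

    #include <stdint.h>

    extern uint64_t cycles(void);   /* assumed: cycle counter, e.g. an RDTSC wrapper */
    extern void aes_encrypt(const uint8_t key[16], const uint8_t in[16], uint8_t out[16]);  /* stand-in */

    #define CYCLE_BUDGET 4000ULL    /* hypothetical figure, e.g. ~4 sigma above the measured mean */

    void encrypt_fixed_time(const uint8_t key[16], const uint8_t in[16], uint8_t out[16])
    {
        uint64_t start = cycles();
        aes_encrypt(key, in, out);
        while (cycles() - start < CYCLE_BUDGET)
            ;   /* pad runs that finish early; the rare over-budget run still leaks a little */
    }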

Posted by: Nicholas Weaver at May 17, 2005 04:22 PM


Ari, what would you suggest? According to this paper, due to the problem of interrupts, you need to place the algorithm in the kernel to keep constant-time lookups (ignoring the interrupt time itself), or you need to modify the CPU (see section 13). Are you suggesting, for example, using only CPUs that have constant lookup times regardless of the address? That seems less generally useful to me than putting the algorithm in the kernel.

Posted by: Zimbel42 at May 17, 2005 04:43 PM


"This once again shows how hard getting security right actually is. This paper is basicly saying that all the experts who evaluated those NIST AES candidates missed this one and as the result complexity and performance problems have to be added to AES implementations to get around it (however, I don't agree that the CPU would the right place for any fixes and I think we generally have enough bloat in kernels already)."

Almost. We didn't miss side-channel cryptanalysis when we looked at AES candidates. We talked about them explicitly in our Twofish design, for example.

The problem is that side-channel attacks are practical against pretty much anything, so it didn't really enter into consideration.

Posted by: Bruce Schneier at May 17, 2005 07:01 PM


djb never ceases to amaze me

Posted by: Anonymous at May 18, 2005 02:00 AM


Having had a little time to think about the implications (and read the paper a second time), the answer is an unpalatable:

"You cannot use a general-purpose shared CPU for online crypto activities."

If you read the paper, in section 13 DJB mentions similar work carried out by Osvik and Tromer with regard to hyperthreading. The question is what other technologies could leak information about the plaintext or keytext via data-dependent timing.

If you think back to your old computing course, you were taught about a couple of basic CPU architectures (that originated in the 1950s).

Modern CPUs bear little resemblance to these architectures now except in their very core. The need for speed and the limitations of physics have meant that all sorts of technologies have been developed (caching, hyperthreading, out-of-order execution, TTA...) to meet this need. I suspect that all of these technologies can be exploited in one way or another to perform timing attacks (either actively or passively).

I guess if you want to do online crypto you really should use an "atomic" process such as a crypto processor.

An atomic process might be possible with DJB's "constant time" implementations, provided they can lock the CPU, but I suspect that there will always be an unaccounted-for technology getting in the way...

So the choice boils down to:

1. Use a crypto processor.
2. Do your crypto offline.

Oh, and by offline I mean with no other user's processes running on the machine, which kind of takes you back to option 1 anyway...

Posted by: Clive Robinson at May 18, 2005 04:40 AM


Oh another thought,

If you remember, back a little while Bruce blogged about work that enabled a laptop or PC to be tracked by its CPU clock drift.

This relied on the little-known TCP timestamp information used for flow control.

DJB's example code used a timestamp function to make things easier to demonstrate; in real life you would not normally get a timestamp like that from high-level code (which might account for the comment from the Rijndael designers that DJB quoted).

However, it may be possible to use the information from the TCP timestamps to get the same effect...

Anyone want to try it? Just for fun of course ;)

Posted by: Clive Robinson at May 18, 2005 04:48 AM


The consensus seems to be that a general purpose CPU should not be used.

If that is the case, AES coprocessors (such as http://www.altera.com/products/ip/dsp/encryption_decryption/m-cas-aes.html) need to be tested to see whether such DSP implementations have the same vulnerability to this timing attack.

Posted by: John Washburn at May 18, 2005 09:26 AM


Aloha!

(Jumping into the fray here...)

The way I read the paper is that DJB tells us that:

(1) Fast implementations are hard to make constant-time, because the efficient way to implement the AES S-boxes is as a table lookup.

(2) Avoiding memory-latency side effects on a modern processor, with its multi-level memory system, paged SDRAM, etc., is very hard.

This means that to even try to get to (1) we will have to sacrifice performance, and due to (2) we might not succeed anyway.

One thought I had is to do basically the reverse of what DJB does. That is, just prior to performing a cipher operation (for example, something using table lookups) we record the timestamp. After the operation completes, we check the time again and keep track of the maximum time recorded. Operations that complete faster than the max time are delayed before returning the result to the user/caller.

Yes, this will also eat performance, but it would at least mitigate the problem. We should be able to measure and do this time adjustment at a resolution similar to DJB's attack.

Another thing: I first thought that having the contents of the S-boxes be constant (as they are in AES and DES/3DES) was an important factor, but it seems that the index is what matters. Blowfish and Twofish have been mentioned, but how about poor RC4? Here the S-box is permuted based on the key, but we still have a 256-byte array. Also, RC4 basically consists of 3 memory accesses per output byte, so one could at least get the idea that the timing attack would be easier to observe. Maybe...
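
For reference, here is the RC4 output (PRGA) step in question, just a sketch for illustration: each output byte involves three reads of the key-permuted 256-byte state, all at indices that depend on the secret state:

    #include <stdint.h>

    typedef struct { uint8_t S[256]; uint8_t i, j; } rc4_state;

    uint8_t rc4_next_byte(rc4_state *st)
    {
        st->i = (uint8_t)(st->i + 1);
        uint8_t a = st->S[st->i];                 /* read 1 */
        st->j = (uint8_t)(st->j + a);
        uint8_t b = st->S[st->j];                 /* read 2 */
        st->S[st->i] = b;                         /* swap */
        st->S[st->j] = a;
        return st->S[(uint8_t)(a + b)];           /* read 3: secret-dependent index */
    }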

Posted by: Joachim at May 18, 2005 03:53 PM


@Joachim

The problem with your scheme (as described) is where you get your maximum time from; it's keytext- and plaintext-dependent.

Therefore you would have to know in advance which combination of keytext and plaintext would produce the longest time for each CPU, and hard-code it into your software.

The problem is that it is not just memory/cache hits that cause problems, but hyperthreading as well on the Pentium CPU.

The real problem finally becomes:

1. I write my software today and it causes no problems on any CPU.

2. XXXY brings out a new whizbang CPU that has some new go-faster technology in it.

What do I do with my now-legacy software and the customers who are probably not going to upgrade just because I tell them to (even if the download is free)?

Posted by: Clive Robinson at May 19, 2005 03:05 AM


Re: Clive.

No, what I was trying to suggest was a scheme where the cipher SW continuously monitors the time it takes to perform the operation. (This means storing the time.)

The first MAX value is the time it takes to perform the first operation (with key k1 and data d1).

If the next operation (for example with key k2 and data d2) takes less time to complete than MAX1, the operation does not return to the user until MAX1 has expired. If the operation takes longer than MAX1, we now have a new MAX time, MAX2.

This implies that after n operations, the MAX time to complete an operation approaches the worst case possible for the operation on the given HW platform.
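
A minimal sketch of that running-MAX scheme (cycles() is an assumed cycle-counter helper and aes_encrypt a stand-in; starting MAX at zero is the simplest choice, at the cost of leaving the earliest calls unpadded):

    #include <stdint.h>

    extern uint64_t cycles(void);   /* assumed: cycle counter, e.g. an RDTSC wrapper */
    extern void aes_encrypt(const uint8_t key[16], const uint8_t in[16], uint8_t out[16]);  /* stand-in */

    static uint64_t max_cycles = 0;   /* could instead be seeded with a conservative guess */

    void encrypt_max_padded(const uint8_t key[16], const uint8_t in[16], uint8_t out[16])
    {
        uint64_t start = cycles();
        aes_encrypt(key, in, out);
        uint64_t elapsed = cycles() - start;
        if (elapsed > max_cycles)
            max_cycles = elapsed;             /* a new worst case becomes the new floor */
        while (cycles() - start < max_cycles)
            ;                                 /* pad faster operations up to the recorded MAX */
    }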

Posted by: Joachim at May 19, 2005 04:17 AM


@Joachim

Unfortunately, you would have to do this for every key change (or keep the last value).

The problem is that while the system is "training" you are vulnerable to an information leak.

It would be better to start with an artificially high value and work down towards some point above the longest time the software takes.

Either way has problems (i.e., a possible leak or excessive time).

Posted by: Clive Robinson at May 19, 2005 11:37 AM


i wonder ... would some sort of key-modifying protocol help? such that after each exchange the key is adjusted (on both sides) ... ?

Posted by: ms at May 19, 2005 07:43 PM


@Clive

Yes, keeping the value between key (and data) changes was what I had in mind. And you should probably have a good (i.e. conservative) guesstimate for the initial value of MAX too.

What we are talking about is a control loop for the MAX value. If it turns out that all values are lower than MAX you may reduce the limit, but any value above pushes MAX up.

To conclude: I see this as a way to mask execution-time variance (which the side-channel attack exploits). But this method will have an impact on performance.

And the way I read DJB's paper: unless you have some sort of HW support, any high-speed, high-performance SW implementation of an S-box/table-based cipher primitive running on a modern CPU will be hard pressed not to have variation in execution time. Agreed?

And all methods of fixing the current implementations will have some impact on performance. Agreed?

Now, the question is: how big a problem is this in real life? Even though DJB built a lab setup with a server and performed the attack remotely, what is the likelihood of doing this on a real server? Or, for that matter, as a normal user on a server?

Note that DJB's paper is extremo-kewl and impressive. And he rightly points out that an algorithm, however sound it seems on paper, needs to be implemented in a way that doesn't make it weak in terms of security. And next-gen cipher primitives (including hash functions) should also be evaluated in terms of resistance to side-channel attacks such as the one described in DJB's paper.

Posted by: Joachim at May 20, 2005 12:43 PM


So... should we be worried or not? To an admitted novice, the paper looks interesting, but it doesn't really seem to present a very practical remote attack. Even if it did, given the large number of samples needed to recover a single key, wouldn't it be easy to defeat with some form of periodic key change?

Posted by: Cryponot at May 25, 2005 06:34 AM


Schneier says "The problem is that side-channel attacks are practical against pretty much anything, so it didn't really enter into consideration".

Surely this is precisely the reason why they SHOULD be considered and eliminated as a possible cause of weakness?

Posted by: david Cornwell at June 17, 2005 12:40 PM


Why not simply add a small amount of random time instead of trying to find the max? Yes, it also has an impact on performance (hopefully less than the max), but would it not make timing-based attacks quite a bit more difficult?

Posted by: random at August 29, 2006 10:23 AM


Hi, please send me AES Matlab code and its advantages and disadvantages. Also send its features and details about implementing AES in hardware.

Posted by: rexlin at September 1, 2006 12:58 AM

