HASH COLLISION Q&A
Cryptography Research has received many inquiries about the hash collision attacks that were recently announced at the CRYPTO 2004 conference. This document attempts to address these questions.
(This document was updated on February 16, 2005 to reflect new collision results reported against the SHA-1 algorithm.)
Q: What hash functions are now broken?
A: Collisions were announced in SHA-0, MD4, MD5, HAVAL-128, and RIPEMD. Antoine Joux presented the collision in SHA-0. The collisions against MD4, MD5, HAVAL-128, and RIPEMD were found by the Chinese researcher Xiaoyun Wang with co-authors Dengguo Feng, Xuejia Lai, and Hongbo Yu. (See http://eprint.iacr.org/2004/199.pdf.) In February 2005, an (as-yet unimplemented) attack against SHA-1 was reported by Xiaoyun Wang, Lisa Yiqun Yin, and Hongbo Yu that can find collisions in SHA-1 with an estimated effort of 2^69 hash computations.
Q: What is a collision attack and a preimage attack?
A: A preimage attack would enable someone to find an input message that causes a hash function to produce a particular output. In contrast, a collision attack finds two messages with the same hash, but the attacker can't pick what the hash will be. The attacks announced at CRYPTO 2004 are collision attacks, not preimage attacks.
Q: What is the connection between digital signatures and hash functions?
A: All major digital signature signing techniques (including DSA and RSA) involve first hashing the data then signing the hash. Raw message data is not signed because of both performance and security reasons.
Q: How might an attacker exploit a collision attack?
A: To exploit a collision attack, an adversary would typically begin by constructing two messages with the same hash where one message appears legitimate or innocuous. For example, suppose the attacker (Charlie) discovers that the message "I, Bob, agree to pay Charlie $ 5000.00 on 4/12/2005." has the same hash as "I, Bob, agree to pay Charlie $18542841.54 on 9/27/2012." Charlie could then try to get Bob (the victim) to digitally sign the first message (e.g., by purchasing $5000 of goods). Charlie would then claim that Bob actually signed the second message, and "prove" this assertion by showing that Bob's signature matches the second message.
Q: What are the implications of collision attacks for code signing systems?
A: Collisions can be a problem for systems that involve signed code. In particular, a collision attack can enable adversaries to construct an innocuous program and a malicious program with the same hash. For example, a trusted compiler/verifier might accept and sign the innocuous program, which could then be substituted for the malicious one. Collision attacks do not allow tampering with arbitrary programs; this would require a preimage attack. (Note: Java accepts MD5 hashes in signatures on JAR files, e.g. see http://www.hmug.org/man/1/jarsigner.html.)
Q: What are the implications for certificate authorities, such as those issuing SSL web server certificates containing MD5 or SHA-1 hashes?
A: Collision attacks do not enable tampering with existing certificates. There is, however, a concern that an adversary might be able to construct a valid certificate request that had a corresponding hash collision with a certificate conferring greater or different powers. For example, a devastating attack would be one that enabled adversaries to obtain a legitimate server certificate with a collision to one containing a wildcard for the domain name and an expiration date far in the future. The use of unpredictable serial numbers early in the certificate data structure may prevent such attacks, but further research is required. From a cryptographic perspective, the best solution to this problem is to transition away from MD5, but this is difficult since many CAs and software programs currently support MD5.
Q: Are all hash functions broken?
A: No. The new attacks affect specific hash functions which happen to share a related class of vulnerabilities. In particular, these attacks are all based on the neutral bit technique of Biham and Chen (see http://eprint.iacr.org/2004/146.ps). There is no evidence suggesting that strong hash functions cannot be constructed.
Q: How hard would it be to find collisions in SHA-1?
A: The reported attacks require an estimated work factor of 2^69 (approximately 590 billion billion) hash computations. While this is well beyond what is currently feasible using a normal computer, this is potentially feasible for attackers who have specialized hardware. For example, with 10,000 custom ASICs that can each perform 2 billion hash operations per second, the attack would take about one year. Computing improvements predicted by Moore 's Law will make the attack more practical over time, e.g. making it possible for a wide-spread Internet virus to use compromised computers to mount such attacks as well. Once a collision has been found, additional collisions can be found trivially by concatenating data to the matching messages.
Q: Do these attacks break HMAC using MD5 or SHA-1?
A: No. Because of the way hash functions are used in the HMAC construction, the techniques used in these recent attacks do not apply.
Q: Do these attacks allow somebody to break tools that use MD5 or SHA-1 to check for malicious binaries?
A: Not usually, as this would require a preimage attack. It would, however, be possible for someone to construct an innocuous program and a malicious program with the same hash. If this adversary could get the innocuous version on the "good" list (e.g. by having a trusted authority sign the hash value), the malicious program would also be accepted.
Q: What is the difference between SHA-0 and SHA-1? Is SHA-0 widely used?
A: SHA-0 was initially proposed in FIPS 180 (May 1993) as hashing standard by the U.S. government, but was replaced by SHA-1 in FIPS 180-1 (April 1995). SHA-1 adds an additional circular shift operation that appears to have been specifically intended to address the weaknesses found in SHA-0. SHA-0 is not widely used and should not be used in new systems. In light of the new attacks, careful consideration should be made before using SHA-1 in new systems.
Q: Is SSL 3.0/TLS affected by these results?
A: The SSL 3.0 protocol (which was co-authored by Cryptography Research President & Chief Scientist Paul Kocher) uses MD5 and SHA-1 in a redundant fashion in the handshake protocol and also supports MD5 HMAC. Neither use is affected by these attacks. While there is also some concern that signing authorities could be affected (see the question above on certificate authorities), certificate formats and procedures are beyond the scope of the SSL/TLS protocol.
Q: Can the problem be solved by updating hash function implementations to detect the messages that produce collisions?
A: No. The attack methods are general and enable the construction of additional collisions.