BETA


About
OFAQ
Statistics
LOG IN
MagicButton
Download

WebBoard
MailingList
ChatRoom

Thanks to:
SourceForge provided us with Mailing Lists, Web Forums and Compile Farm

Attacking MD5

This project isn't live yet. We are still developing.

News
  • 31Jan2004 - Client version 0.8 released.
  • 01Feb2004 - Database re-construction not needed. Work-around found - yay!
  • 01Feb2004 - New web website coming shortly - Thanks to Kieran Jones.
  • 01Feb2004 - Looking for translators for the other U.N. languages: Spanish, Cantonese, Russian, and Arabic. Jean-Luc will do French. Other languages are lower priority, but still welcome.
  • About the Applet
    • If you were brought to this site by clicking on the MD5CRK button, the following information may be of interest to you.
      The button is in fact a micro-application commonly called a "Java applet". During your visit to any website carrying this button, your computer is part of world wide effort to demonstrate the weakness in a commonly used security technology called MD5.

      While you are visiting, the button will only operate if there is absolutely nothing else to be done on your computer (like playing music, movies, browse the internet, etc). Once you leave a website with the button, the button goes away and will not bother you.

      It is not a virus, worm, or anything of the kind. It is more of an advertisement for this project.

      Should you however want to disable this button - or enable it again - to so below.

      Disable MD5CRK
      Enable MD5CRK

    FAQ: Frequently Asked Questions about MD5CRK

    1. What is MD5?
    2. What is the MD5CRK project about?
    3. How will MD5CRK show MD5 is insecure?
    4. Why do people still use MD5?
    5. How does MD5CRK compare with SETI@Home or Distributed.net?
    6. How long before the project is a success?
    7. How do I help out?
    8. What will be done with the collected data?
    9. About the MD5 client
      • Intel Cores
      • PowerPC Cores
      • Other Cores

    What is MD5? [ top ]

    MD5 is an algorithm commonly used to verify the integrity of data on the internet. MD5 is a message digest or hash algorithm used to produce a fingerprint of a given piece of data. This fingerprint is often used to verify the integrity of programs, drivers, documents, passwords and digital signatures. It is vital to use a secure message digest when deploying such applications.

    Data MD5 Digest or Fingerprint
    "abc" 900150983cd24fb0d6963f7d28e17f72
    "My Password" 14ddb8585ddfc6c4670b9c18aed1fe8b
    "aaa ... aaa"
    (1,000,000 a's)
    7707d6ae4e027c70eea2a935c2296f21
    Table 1: Examples of MD5 Digests

    What is the MD5CRK project about? [ top ]

    The aim of MD5CRK is to raise awareness in the IT industry that MD5 is not secure for many applications. We aim to disprove one of the fundamental requirements of a secure message digest: No two inputs can be found which produce the same digest - this is also known as a collision.

    In cryptography, there can be no doubt in an algorithm's security. Demonstrating a counter-example to a required property in an algorithm effectivly renders the algorithm insecure.

    How will MD5CRK show MD5 is insecure? [ top ]

    This project does not directly attack any file, program or password program - that would be irresponsible. We intend to find at least one collision in the full MD5 function. In 1994, Paul van Oorschot wrote a paper describing how to build a computer to find two similar looking documents (or vastly different for that matter) with the same MD5 digest in a matter of days for $10,000,000 USD. With today's technology this computer would cost only $100,000 USD.

    The problem of attacking MD5 is no longer a theoretical matter - it is a business proposition.

    If someone were to invest $100,000 USD to build this machine, they could forge digital signatures, circumvent password systems and tamper with sensitive documents without detection.

    To raise awareness we will find at least two strings of printable text that produce an identical MD5 hash. After each MD5 transform, the 128 bit (16 byte) hash is translated into a 32 character string. This 32 letter string becomes the input to the next MD5 transform and so on. To map the 16 byte digest into a 32 letter string we use the 16 most common letters used in the English language as our radix alphabet in the likely chance we get a true English word out of this process.

    Why do people still use MD5? [ top ]

    MD5 is used primarily out of habit, and like all bad habits they are hard to break. The IT industry must move to better algorithms for secure applications.

    There is no reason to be using MD5 anymore. There are many drop-in replacements to MD5 which are more secure. These more secure alternatives (RIPEMD-160, Tiger, SHA-256, SHA-384, SHA-512) were created to match the security level of the encryption algorithms commonly used in security applications. Some may argue MD5 is faster then these more secure substitutes, yet there is a still faster digest algorithm known as MD4. The predecessor of MD5 - MD4 - also produces 128 bit digests but was retired because a collision was found analytically. [Ref] Clearly, MD5 is no longer needed.

    Table 2 illustrates recommended lifetimes for various algorithms. Note that industry should have abandoned MD5 before 1995, SHA-1 can be used to protect data no further than 2025, and SHA-256 will be safe for use for well into the second half of this century. All these algorithms are available without cost, royalty or copyright on the Internet.

    Digest  1995  2000  2005  2010  2015  2020  2025  2030  2035  2040+
    MD5 
     
     
    SHA‑1 
     
     
    SHA‑256 
     
     
    Table 2: Message digest algorithm security lifetimes.

    References:

  • Selecting Cryptographic Key Sizes by Arjen K. Lenstra, Eric R. Verheul - View as HTML using Google

    Everyone can replace their use of md5sum with sha256sum found at http://www.certainkey.com/resources/hashsum.php.

    How does MD5CRK compare with SETI@Home or Distributed.net? [ top ]

    Attacking MD5 is unlike other distributed computing projects. SETI@Home assigns work units that take hours to process. MD5CRK work units are expected to take anywhere from seconds to weeks - but the average expected time is 4.3 billion MD5 computations (8 minutes on a 2.0 GHz Pentium 4 computer).

    The distributed.net RC5-72 project is best described as looking for a needle in a haystack. MD5CRK is best described as looking for two identical snowflakes. While the RC5-72 project only has one winning answer, MD5CRK possibly has billions. It is also important to note that the probability of collision given n amount of effort is an en2 curve (Figure 1,2) while the RC5-72 project's probability of success is a n curve (Figure 3,4). That means the more data we collect, the more dangerous we become.

    Distributed.net
    RC5-72
    MD5CRK
    MD5
    Operations Required
    approx.
    2.36 x 1021 keys
    or 271 keys
    3.32 x 1019 digests
    or 1.2 x 264 digests
    Performance
    2.0 GHz Pentium 4
    2.8 MegaKeys/sec 8.6 MegaDigests/sec
    Possible Solutions 1 Likely billions
    Probability of Success n x 2 -72 Approx. 1 - e(n x (1-n) x 2 -129)
    Current Uses No known applications Secure websites
    Intrusion detection systems
    Software integrity (RPM,DEB,etc)
    Password systems
    Many other applications
    Table 3: Quantitative comparison of distributed projects

    Click to Enlarge
    Figure 1: MD5 Digests vs. Probability of Success

    Click to Enlarge
    Figure 2: MD5 Digests vs. Probability of Success (Logarithmic Graph)

    Click to Enlarge
    Figure 3: RC5-72 Keys vs. Probability of Success

    Click to Enlarge
    Figure 4: RC5-72 Keys vs. Probability of Success (Logarithmic Graph)

    How long before the project is a success? [ top ]

    When VeriSign, Tripwire, and other organizations stop using MD5 to protect information, this project will be a success.

    We expect to find our first collision after 25 million work units and our second relatively short time after. The estimated time to first collision is approximately 2 years. The estimated required disk space is approximately 200 Gigabytes.

    How do I help out? [ top ]

    There are several ways to participate in the project.
    The first is to download the client for your computer(s) and install it (more info).
    The second is to add the MD5CRK magic-button to your web pages so all your visitors will contribute to the project for the duration of their visit to your site (more info).

    Both the client and the applet run at the lowest possible program priority, in order not to interfere with the target computer - all other applications take priority.

    All work units submitted by your client or your web sites will be added to your statistics and your team's statistics. To attribute work units to yourself, use the stats tool.

    What will be done with the collected data? [ top ]

    The data from this project will be provided to academia and the public on a cost recovery basis (CD and DVD disks are too small; we will load the database onto a large hard disk and ship it by insured courier).

    No private information such as network, web, or email addresses will be given away.

    Our approach [ top ]

    Our approach, described below, is heavily influenced by the van Oorschot 1994 paper.

    • All discrete functions with finite range and domain will cycle if placed in feedback. This is obvious if you think about it. MD5 only has 128 bits of output so if we feed this output back into the input of another MD5 transform, eventually we will see the same output produced. After all, there are only 2128 possible MD5 outputs.
    • The number of finite discrete random values required before the probability of two values being the same is proportional to the square root of the total number of possible values. First year university students will know this as the Birthday Paradox, a great way to break the ice at a party!

      To understand the significance of this, consider this question: How many people do you need at a party before the odds to two people having the same birthday is better then 50/50?

      The answer is 23. This is significantly less then 182 (365/2) which most people jump to first time they're asked.

      If the number of possible values is n then you require about:

      1.17741 x n1/2
      values before you can expect to see two with the same value.

    MD5 has 2128 (or about 3.4 x 1038) possible values. This means about 2.16 x 1019 outputs are needed before a collision is expected to be found. It would be impossible to transmit or store all this information.

    So we'll use a technique often called Pollard's Rho cycle detecting algorithm (see figure 1). By only storing MD5 messages digests who's most significant 32 bits are zero - call these distinguished points - we have reduced the amount of data to be transmitted and stored by a factor of 4.3 billion.

    Should our server ever detect two identical distinguished points, we will begin searching from the previously submitted distinguished point of each of the colliding points. Nobody has ever found a collision (two input which produce the same output) in the MD5 hash. So finding a collision would be a very significant discovery.


    Figure 5: Pollard's Rho collision search for a single path.

    The central server will hand out unique start points to the world. These starting points will dictate exactly which path the client software should take. The client software will walk along this path noting all distinguished points it finds. Periodically, the client will report these points to our server.

    Unlike other distributed projects, once we have found our answer our database of distinguished points is of great value to researchers and academics. With the hundreds of gigabytes of data we will have gathered, new attacks on MD5 can be performed leveraging the work we have already done.

    About the MD5 client [ top ]

    • Intel cores
      Used by Windows and Linux operating systems.

      We have developed eight (8) core implementations of the MD5 transform on the Intel CPUs. At client start-up time, the best performing one is chosen after a benchmark is taken.

      1. ANSI-C: Single start point path is computed, taken almost verbatim from RFC-1321.
        Written by Jean-Luc Cooke.
      2. ALU: Single start point path is computed, written in optimized assembly.
        Written by Jean-Luc Cooke.
      3. MMX: Two simultaneous start point paths are computed using the MMX units, written in optimized assembly.
        Written by Tom St Denis and Jean-Luc Cooke.
      4. MMX-ALU: Three simultaneous start point paths are computed using the MMX and ALU units, written in optimized assembly.
        Written by Tom St Denis and Jean-Luc Cooke.
      5. MMX-DualALU: Four simultaneous start point paths are computed using the MMX and ALU units, written in optimized assembly.
        Written by Tom St Denis and Jean-Luc Cooke.
      6. SSE2: Four simultaneous start point paths are computed using the SSE2 units, written in optimized assembly.
        Written by Mark Larson of Dell Computer, and Jean-Luc Cooke.
      7. SSE2-DualALU: Six simultaneous start point paths are computed using the SSE2 and ALU units, written in optimized assembly.
        Written by Mark Larson of Dell Computer, and Jean-Luc Cooke.
      8. SSE2-QuadALU: Eight simultaneous start point paths are computed using the SSE2 and ALU units, written in optimized assembly.
        Written by Mark Larson of Dell Computer, and Jean-Luc Cooke.

       
    • PowerPC cores
      Used by Macintosh and Linux operating systems.

      We have developed three core implementations of the MD5 transform on the PowerPC. At client start-up time, the best performing one is chosen after a benchmark is taken.

      1. ANSI-C: Single start point path is computed, taken almost verbatim from RFC-1321.
        Written by Jean-Luc Cooke.
      2. ALU: Single start point path is computed, written in optimized assembly.
        Written by Kevin Springle.
      3. DualALU: Two simultaneous start point paths are computed, written in optimized assembly.
        Written by Kevin Springle.
      4. AltiVec: Four simultaneous start point paths are computed using the AltiVec unit, written in optimized assembly.
        Written by Kevin Springle.
      5. AltiVec-ALU: Five simultaneous start point paths are computed using the AltiVec unit and ALU, written in optimized assembly.
        Written by Kevin Springle.
      6. AltiVec-DualALU: Six simultaneous start point paths are computed using the AltiVec unit and ALU, written in optimized assembly.
        Written by Kevin Springle.

       
    Verify SHA-256 checksum with the sha256sum command line utility found at http://www.certainkey.com/resources/hashsum.php

    Windows 95/98/2k/ME/XP - version 0.8
    CPU SHA-256 Checksum Updated on
    pentium-mmxe722f91e9bc63444e864ab5a9254c407
    49097df2449ffffbd64c1086299b422c
    2004/01/31 13:50:50
    pentium2783977e0c33c50d038350a3bb575389b
    fc46a94636bee59a744d78490b391f24
    2004/01/31 13:50:55
    pentium3b8ab776d6bb022306c16d29133f2a0c6
    e7b88056e5941913bb1190a333b588ad
    2004/01/31 13:51:01
    pentium4ddb1f3b33ae2783c3f2851cabb6bd7eb
    f824be0c349ff80d41eaf3aa3fb7acf5
    2004/01/31 13:51:06
    k6d0d7c52af832b0a6fbb4ea2fbd55652f
    191283e4c25e8942ad5bfcd17fd15489
    2004/01/31 13:51:12
    athlon4225f4f1fbc04eaf91093dc4118f219c
    f2b287c6caa01dab701db249353303a6
    2004/01/31 13:51:17
    centrino7e294c1093718c3db0501cf37eff2012
    211dc71eedd4a92acf8f83608d6c02b3
    2004/01/31 13:51:23
    Linux x86 - version 0.8
    CPU SHA-256 Checksum Updated on
    pentium-mmx41e6b3832977a9bd7f109125c27184cf
    e2c9cd1af9afd3a36a8c3da3656bb90c
    2004/01/31 13:46:16
    pentium28604003162384e2c7d4b4da561b72d2a
    7096ffe8760f7225df2da5a8f00cd925
    2004/01/31 13:46:18
    pentium31ade3558436ba36476cf7a55d451ddb4
    1e3fb8f39ad84b16466de12389e17de9
    2004/01/31 13:46:20
    pentium424844a77f038365e767dcfbcb65b00d9
    1d99d03f58a2f48a885559a65ebc1df7
    2004/01/31 13:46:22
    k6d8608d01fd51ca3073b22119c2e8397a
    6ad1d655f0ea57c9e015bf1a03c39672
    2004/01/31 13:46:24
    athlonc96fb84ce139d4ef7ffd460e56801f6e
    8ba8a043ccf129bc87eb0e45e26ac8f0
    2004/01/31 13:46:25
    centrino84f696ccd402a7b32b89a5cf73032685
    2a17f2283d07bf0ae6a74c754fd7dd10
    2004/01/31 13:46:27
    Linux IA 64bit - version 0.8 Developers Wanted
    CPU SHA-256 Checksum Updated on
    Mac OS X PowerPC - version 0.8
    CPU SHA-256 Checksum Updated on
    G2f284712723cb695a37cd98c49ccfb8d9
    5dd304c3010119c277f24e266bee9ada
    2004/01/31 13:43:57
    G3dd48f0ba96dc81b98e8c9c4cd3db0260
    5e922408e7dae0dccc9223bb7c851268
    2004/01/31 13:44:07
    Linux PPC - version 0.8
    CPU SHA-256 Checksum Updated on
    G28e673c2562e10a3fa54be1f04bcecbef
    eb1a62a71a8b637e6f977d8f87c553ab
    2004/01/31 13:29:27
    G3cb2c6e5a45452d35fa5a88514d28bf2c
    de00d5630704de76f3f2bcbd676d3859
    2004/01/31 13:29:43
    G40eb3343ef09dcb7a189a86f045a98868
    bd5f9f8e6c9e6ccf91cc3ea5555dc27e
    2004/01/31 13:29:59
    AIX PPC - version 0.8 Developers Wanted
    CPU SHA-256 Checksum Updated on
    Linux Alpha - version 0.8 Developers Wanted
    CPU SHA-256 Checksum Updated on
    ev482a4b0a5afdd0ce056f70aed5d951a95
    ed70e2f6301a4be0b0307efa6718b323
    2004/01/31 13:40:54
    ev5346f704ddf40d08156a8435edba0705f
    dab572d2f19927a356250d0053a7ff79
    2004/01/31 13:41:04
    ev566ee2d69b7f76088eed141757ea1c41df
    369407de3b8b4815f453f32eecbe673e
    2004/01/31 13:41:12
    pca568c34b2849c6543043f9e2f8c32efba78
    840e71e3d07e68f5627e480c485f6285
    2004/01/31 13:41:19
    ev6069dc72bec9057be44d7cdc9d09d8318
    c1c910194df0a14470eadb13b3232287
    2004/01/31 13:41:28
    Tru64 Alpha - version 0.8 Developers Wanted
    CPU SHA-256 Checksum Updated on
    Linux SPARC - version 0.8 Developers Wanted
    CPU SHA-256 Checksum Updated on
    Sun SPARC - version 0.8 Developers Wanted
    CPU SHA-256 Checksum Updated on
    v73ed59aef4f2731f361495f0ca88c6f4e
    25de86cb53d664a06b14049dd2f5a08d
    2004/01/31 13:46:01
    v86e41b0955710f892fa3e84d4ebe30007
    df56175e7d21174e82096c6e1bc2c4a3
    2004/01/31 13:46:09
    v94d7fcac56fef8051edc0e60c7f254fe5
    e83d6a75ea574e2dc31196984a1a90c4
    2004/01/31 13:46:16
    sparclite4eedcbb6632892f65983187eba55c2a7
    3e671a724f7b1d5ce076786b09618b02
    2004/01/31 13:46:24

    Adding the MagicButton to your site [ top ]

    These three (3) lines of HTML code will place the applet on your website in a 100x32 pixel element. Updates to the applet are automatic, there is no need to download anything to your servers.


    <iframe src="http://web.archive.org/web/20040202200333if_/http://www.md5crk.com/applet.html" scrolling="NO" frameborder="0" width="100" height="32"> <layer src="http://www.md5crk.com/applet.html" width="100" height="32" clip="0,0,100,32"></layer> </iframe>

    For those who feel the need to be compliant with XHTML and other standards, the following will not break your site.


    <object data="http://web.archive.org/web/20040202200333im_/http://www.md5crk.com/applet.html" ; type="text/html" width="100" height="37"/>

    Privacy Policy
    The MD5CRK project collects user names, email addresses, and website URLs for the purpose of cryptographic and statistical analysis.

    You will not receive any email from this project unless absolutely necessary. Your private information will not be given away or sold.

     

     

  • Privacy Policy © 2003-2004 Jean-Luc Cooke