What to do in case of hash collision? #24
ipfs uses a sha256 hash for addressing content, meaning that there are 2^256 different possible hashes. Let's assume that the entire bitcoin mining economy decides to try and find an ipfs object hash collision. Checking hashes at a rate of 400 petahash (400,000,000,000,000,000 hashes per second), it would take them about 2.9 × 10^59 seconds, or about 9 × 10^51 years, to compute the entire space. Even factoring in the birthday paradox and Moore's law, we shouldn't need to worry about that happening for a few thousand years or so, assuming sha256 is a perfect hash function (which it might not be).
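To make the arithmetic above easy to check, here is a rough back-of-the-envelope sketch in Python. The 400 petahash/s rate is the figure from this comment. The `years_to_hash` helper and the use of 2^128 hashes as the birthday-bound estimate are my own illustration, and the calculation ignores Moore's law and the memory a real collision search would need.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_hash(num_hashes: int, rate_per_second: float) -> float:
    """Years needed to compute num_hashes hashes at a fixed rate (hypothetical helper)."""
    return num_hashes / rate_per_second / SECONDS_PER_YEAR

# 400 petahash per second, the rate assumed in the comment above.
HASH_RATE = 400e15

# Enumerating the whole sha256 output space.
full_space_years = years_to_hash(2**256, HASH_RATE)

# Birthday paradox: a collision becomes likely after roughly sqrt(2^256) = 2^128 hashes.
birthday_years = years_to_hash(2**128, HASH_RATE)

print(f"full 2^256 space:          {full_space_years:.1e} years")
print(f"birthday bound (~2^128):   {birthday_years:.1e} years")
```

Even the birthday-bound figure comes out on the order of 10^13 years at a constant hash rate, so it is the assumed growth in hash rate, not the raw numbers, that shortens the horizon.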
If, sometime down the road, we get worried about our very limited hash space, we can upgrade the hash function we use, since ipfs uses the multihash format for specifying hashes along with their type and length.
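For anyone who hasn't looked at multihash: the idea is a small prefix in front of the digest that says which function produced it and how long it is. A minimal sketch, assuming the function codes from the multihash table (0x12 for sha2-256, 0x13 for sha2-512). The `multihash` function and the `HASH_CODES` table here are illustrative names, not the API of any real multihash library.

```python
import hashlib

# Function codes as listed in the multihash table; this small dict is illustrative only.
HASH_CODES = {
    "sha2-256": (0x12, hashlib.sha256),
    "sha2-512": (0x13, hashlib.sha512),
}

def multihash(data: bytes, name: str = "sha2-256") -> bytes:
    """Return <function code><digest length><digest> for the given data."""
    code, hash_fn = HASH_CODES[name]
    digest = hash_fn(data).digest()
    # Code and length are varints in the real format; every value used here
    # is below 128, so each one encodes as a single byte.
    return bytes([code, len(digest)]) + digest

mh = multihash(b"hello ipfs")
print(mh.hex())  # begins with "1220": sha2-256, 32-byte digest
```

Because the type and length travel with the hash, a reader never has to guess which function was used.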
tldr, the world will likely end before we find a sha256 collision. but if we get worried, we can use a bigger hash at pretty much any time.
in general, what @whyrusleeping said is accurate. Consider: the bitcoin network has not seen sha256 collisions. (People may not be wasting their resources finding a pre-image for an ipfs object, when they could just steal all the bitcoin instead -- it's the same problem).
The one future-proofing caveat is that many cryptographic hash functions have been broken over time, and that's one reason -- as @whyrusleeping mentioned -- we use multihash: we can upgrade. (this is a costly thing though, as things will have to be rehashed and/or linked to, so we ought to think about how to upgrade well before we do... good thing we have a while :) ).
also, feel free to switch to sha3, you can recompile go-ipfs to do it. the rest of the ipfs network supports it already. (and we'll add blake2 support too)
and, cryptocurrency people may want to build upgradeability into their blockchains. see https://github.com/jbenet/multihash and related protocols.
Though, a word of warning for the ages. Even the best people do make mistakes thinking that certain cryptographic artifacts are safe for a long time:
(HT Zooko for these links)
So we should not rest on our laurels, and should improve the upgradeability paths of our cryptographic protocols, to make it easy to ratchet up the security of a system. Multihash and related protocols are a good start, but not the end by any means.
I'll close this, as the better question (to be asked elsewhere) is "how do we upgrade from a broken cryptographic hash function".
ipfs currently uses a 'multihash', where every hash value is tagged with the hash function used to generate it. As of now, the default hash function is sha256, and if sha256 is shown to be broken, it's apparently possible to switch to sha512 on the fly: lengthen our root address hash function to something like sha512 without breaking any existing data.
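A rough sketch of why tagged hashes let old and new data coexist: a verifier reads the tag and picks the matching function, so sha256-addressed content keeps validating after new content moves to sha512. This assumes the multihash codes 0x12 (sha2-256) and 0x13 (sha2-512). The `verify` and `tag` functions are hypothetical helpers for illustration, not how go-ipfs actually implements it.

```python
import hashlib

# Multihash function codes mapped to hash constructors (illustrative subset).
HASHERS = {0x12: hashlib.sha256, 0x13: hashlib.sha512}

def tag(data: bytes, code: int) -> bytes:
    """Build <code><length><digest>; single-byte prefix fields suffice for these sizes."""
    digest = HASHERS[code](data).digest()
    return bytes([code, len(digest)]) + digest

def verify(data: bytes, mh: bytes) -> bool:
    """Check data against a tagged hash, using whichever function the tag names."""
    code, length, digest = mh[0], mh[1], mh[2:]
    if code not in HASHERS:
        raise ValueError(f"unsupported hash function code: {code:#x}")
    if len(digest) != length:
        raise ValueError("digest length does not match the length field")
    return HASHERS[code](data).digest() == digest

old_block = tag(b"written before the upgrade", 0x12)  # sha2-256 address
new_block = tag(b"written after the upgrade", 0x13)   # sha2-512 address

# Both generations of addresses verify with the same code path.
print(verify(b"written before the upgrade", old_block))
print(verify(b"written after the upgrade", new_block))
```

Note that this only shows the two formats coexisting. If sha256 were actually broken, old sha256 addresses would still be weak until the content is rehashed or re-linked, which is the costly part mentioned earlier in the thread.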
This issue has been moved to https://discuss.ipfs.io/t/what-to-do-in-case-of-hash-collision/482.