A cryptographic hash function is a mathematical equation that enables many everyday forms of cryptography, such as digital signatures. Its uses include everything from the HTTPS protocol to payments made on e-commerce websites. Cryptographic hash functions are also used extensively in blockchain technology.
While the term itself may seem intimidating, cryptographic hash functions are relatively easy to understand. In this article, you'll learn exactly how a cryptographic hash function works.
A Brief Overview Of Cryptographic Hash Functions
A cryptographic hash function is just a mathematical equation. You may remember learning a few equations in high school, such as linear equations of the form y = mX + b or quadratic equations of the form y = aX² + bX + c.
A cryptographic hash function is more or less the same thing. It’s a formula with a set of specific properties that makes it extremely useful for encryption. Let’s learn more about these properties now.
Properties Of A Useful Cryptographic Hash Function
While there are several different classes of cryptographic hash functions, they all share the same five properties. Here are the 5 qualities a cryptographic hash function must have to be useful.
Property #1: Computationally Efficient
First and foremost, hash functions must be computationally efficient. This is just a fancy way of saying that computers must be able to perform a hash function’s mathematical labor in an extremely short period of time.
This property is probably somewhat obvious. If an ordinary computer needed several minutes to process a cryptographic hash function and receive the output, it would not be very practical. To be useful, hash functions must be computationally efficient.
In reality, this is not as large a concern as it was 40 or 50 years ago. Nowadays, an average home computer can compute a modern hash function over a typical input in just a small fraction of a second.
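To get a rough feel for this speed, the short Python sketch below times a single SHA-256 computation using the standard hashlib module. The one-megabyte input is an arbitrary choice made for illustration, and the measured time will vary from machine to machine.

```python
import hashlib
import time

# Hypothetical test data: one megabyte of zero bytes (an arbitrary choice).
data = bytes(1_000_000)

start = time.perf_counter()
digest = hashlib.sha256(data).hexdigest()
elapsed = time.perf_counter() - start

# The exact timing depends on the hardware, so no expected value is shown.
print(f"Hashed {len(data):,} bytes in {elapsed:.6f} seconds")
print(digest)
```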
Property #2: Deterministic
Cryptographic hash functions must be deterministic. In other words, for any given input, a hash function must always give the same result. If you put in the same input ten million times in a row, a hash function must produce the same exact output ten million times over.
This may also be rather obvious. If a cryptographic hash function produced a different output each time the same input was entered, its results would be effectively random and therefore useless. It would be impossible to verify a specific input, which is the whole point of hash functions: for example, verifying that a digital signature is authentic without ever having access to the signer's private key.
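Determinism is easy to check for yourself. The sketch below uses SHA-256 from Python's hashlib module as a stand-in for any cryptographic hash function; the helper name sha256_hex and the sample input are assumptions made for this example.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hash the same (arbitrary) input many times and keep only the distinct results.
distinct_digests = {sha256_hex("hello world") for _ in range(10_000)}

# A deterministic function yields exactly one distinct digest.
print(len(distinct_digests))  # prints 1
```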
Property #3: Pre-Image Resistant
Given only the output of a cryptographic hash function, it must be infeasible to work out an input that produces it, and the output must not reveal any information about the input. This is called pre-image resistance.
It’s important to note that cryptographic hashing algorithms can receive any kind of input. The input can be numbers, letters, words, or punctuation marks. It can be a single character, a sentence from a book, a page from a book, or an entire book.
However, a hash function will always produce a fixed-length output. Regardless of what the input is, the output will be an alphanumeric code of fixed length.
Consider why this is so important: if a longer input produced a longer output, then attackers would already have a seriously helpful clue when trying to discover someone’s private input.
For example, if an input always produced an output 1.5 times its length, then the hash function would be giving away valuable information to hackers. If hackers saw an output of, say, 36 characters, they would immediately know that the input was 24 characters.
Instead, a useful hash function must conceal any clues about what the input may have looked like. It needs to be impossible to determine whether the input was long or short, numbers or letters, even or odd, random characters or a string of recognizable words. In addition, changing one character in a long string of text must result in a radically different digest.
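Both ideas, the fixed-length output and the radically different digest after a one-character change, can be observed with a small Python sketch. It assumes SHA-256 via the hashlib module; the sample inputs and the helper names sha256_bytes and differing_bits are made up for illustration.

```python
import hashlib

def sha256_bytes(text: str) -> bytes:
    """Return the raw 32-byte SHA-256 digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).digest()

# Inputs of very different lengths (arbitrary examples).
inputs = ["a", "hello", "a much longer sentence, with punctuation!", "x" * 100_000]

# Every digest is 64 hexadecimal characters, regardless of input length.
lengths = [len(sha256_bytes(text).hex()) for text in inputs]
print(lengths)  # prints [64, 64, 64, 64]

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Change a single character and count how many of the 256 output bits flip.
changed = differing_bits(sha256_bytes("hello"), sha256_bytes("hellp"))
print(f"{changed} of 256 bits differ")
```

For a well-designed hash function, roughly half of the output bits are expected to change, which is why the two digests look unrelated.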
Property #4: Collision Resistant
The fourth property that all cryptographic hash functions must have is what’s known as collision resistance. This means that it must be practically impossible to find two different inputs that produce the same output.
As noted above, the inputs to a hash function can be of any length. This means there are infinite possible inputs that can be entered into a hash function.
However, outputs are of a fixed length. This means that there is a finite, albeit extremely large, number of outputs that a hash function can produce. A fixed length means a fixed number of possibilities.
Since the number of possible inputs is essentially infinite but the number of outputs is limited, it is a mathematical certainty that some outputs will be produced by more than one input.
The goal is to make finding two inputs that produce the same output so astronomically improbable that the possibility can be practically dismissed outright. It should not pose a risk.
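To see why the size of the output matters, consider a deliberately weakened toy hash that keeps only the first 16 bits of a SHA-256 digest. This truncation is an assumption made purely for demonstration and is not something a real system should do. With only 65,536 possible outputs, a collision turns up almost immediately.

```python
import hashlib

def tiny_hash(text: str) -> str:
    """Toy hash: only the first 2 bytes (16 bits) of SHA-256, as 4 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:4]

def find_collision() -> tuple:
    """Return two different inputs that share the same tiny_hash output."""
    seen = {}  # maps each output seen so far to the input that produced it
    # 65,537 distinct inputs cannot fit into 65,536 outputs without a repeat.
    for i in range(2**16 + 1):
        text = str(i)
        digest = tiny_hash(text)
        if digest in seen:
            return seen[digest], text
        seen[digest] = text
    raise AssertionError("unreachable: a repeat is guaranteed")

first, second = find_collision()
print(first, second, tiny_hash(first))
```

The same search against the full 256-bit digest would need an astronomically large number of attempts, which is what makes collisions a practical non-issue for a strong hash function.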
Property #5: Impossible To Reverse Engineer
The fifth and final property of a useful cryptographic hash function is that it must be impossible to reverse the mathematical process used to create the output. There is no inverse operation for a hash function.
At this point you might be wondering what kind of incredible equations possess all five of these properties. The answer is probably far simpler than you think.
An Intro To One-Way Hash Functions
Hash functions are often called one-way functions because, according to the properties listed above, they must not be reversible. If an attacker could easily reverse a hash function, it would be totally useless. Therefore, cryptography requires one-way hash functions.
The best way to demonstrate a one-way function is with a simple modular function, also called modular arithmetic. Modular functions are mathematical functions that, put simply, produce the remainder of a division problem.
So, for example, 10 mod 3 = 1. This is true because 10 divided by 3 is 3 with a remainder of 1. We ignore the number of times 3 goes into 10 (which is 3 in this case) and the only output is the remainder: 1.
Let’s use the equation X mod 5 = Y as our function. Here’s a table to help get the point across:
X: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
Y: 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1
You can probably spot the pattern. There are only five possible outputs for this function. They rotate in this order to infinity. That's also why modular functions are sometimes called "clock math": the results go around in circles forever.
This is significant because both the hash function and the output can be made public but no one will ever be able to learn your input. As long as you keep the number you chose to use as X a secret, it’s impossible for an attacker to figure it out.
Let’s say that your input is 27. This gives an output of 2. Now, imagine that you announce to the world that you’re using the hash function X mod 5 = Y and that your personal output is 2. Would anyone be able to guess your input?
Obviously not. There are literally an infinite number of possible inputs that you could have used to get a result of 2. For instance, your number could be 7, 52, 3492, or 23390787. Or, it could be any of the other infinite number of possible inputs.
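The numbers mentioned above can be checked with a few lines of Python. The function name mod5 is simply a label chosen for this sketch.

```python
def mod5(x: int) -> int:
    """The toy one-way function from the text: X mod 5 = Y."""
    return x % 5

# The secret input and the other candidates mentioned in the text.
candidates = [27, 7, 52, 3492, 23390787]
outputs = [mod5(x) for x in candidates]
print(outputs)  # prints [2, 2, 2, 2, 2]

# Every fifth integer is another possible input for the output 2.
preimages = [x for x in range(50) if mod5(x) == 2]
print(preimages)  # prints [2, 7, 12, 17, 22, 27, 32, 37, 42, 47]
```

Knowing the output narrows nothing down: the list of matching inputs never ends, so the original input cannot be singled out.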
The important point to understand here is that one-way hash functions are just that: one-way. They cannot be reversed.
When these same principles are applied to a much more sophisticated hash function, and much, much bigger numbers, it becomes impossible to determine the inputs. This is what makes a cryptographic hash function so secure and useful.
Classes Of Cryptographic Hash Functions
There are several different classes of hash functions. Here are some of the most commonly used:
- Secure Hash Algorithm (SHA-2 and SHA-3)
- RACE Integrity Primitives Evaluation Message Digest (RIPEMD)
- Message Digest Algorithm 5 (MD5)
Each of these classes of hash function may contain several different algorithms. For example, SHA-2 is a family of hash functions that includes SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256.
While all of these hash functions are similar, they differ slightly in the way the algorithm creates a digest, or output, from a given input. They also differ in the fixed length of the digest they produce.
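The difference in digest length is easy to inspect. The sketch below uses Python's hashlib module and a handful of SHA-2 and SHA-3 algorithms that the module always provides; the sample message is arbitrary, and the selection of algorithms is an assumption made for this example rather than a complete list.

```python
import hashlib

# A few SHA-2 and SHA-3 algorithms that hashlib guarantees on every platform.
algorithms = ["sha224", "sha256", "sha384", "sha512", "sha3_256", "sha3_512"]
message = b"hash functions"  # arbitrary sample input

digest_bits = {}
for name in algorithms:
    hasher = hashlib.new(name, message)
    digest_bits[name] = hasher.digest_size * 8  # digest_size is in bytes
    print(f"{name:>9}: {digest_bits[name]} bits")
```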
SHA-256 is perhaps the most famous of all cryptographic hash functions because it’s used extensively in blockchain technology. It is used in Satoshi Nakamoto’s original Bitcoin protocol.
An Example Of A Cryptographic Hash Function Output
Let’s see what the input and corresponding digest of a real hash function look like. Since SHA-256 is the preferred hash function of many blockchains, let’s use it for our example.
This is our first example input:
Komodo Platform strives to accelerate the global adoption of blockchain technology and to lead the world in blockchain integration.
When put through the SHA-256 hash function, this sentence creates the following digest:
You can see that the digest is a combination of letters and numbers. You can see that it is exactly 64 characters in length. But, apart from that, there’s really not much else you can learn from looking at this digest. There are no patterns or clues as to what the input was. It’s more or less complete nonsense.
Now, let’s see what happens when we make one subtle change to the first example input:
Komodo Platform strives to accelerate the global adoptoin of blockchain technology and to lead the world in blockchain integration.
Notice that the letters “i” and “o” have been switched in the word “adoption.” Here’s the new digest:
You can see that this is a radically different result from the first digest. Even though the inputs were practically identical, swapping just two adjacent letters generated a completely different output.
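The digests for both example inputs can be reproduced with Python's hashlib module. This sketch assumes the sentences are encoded as UTF-8 with no trailing newline; a tool that treats whitespace or encoding differently would produce different digests.

```python
import hashlib

original = (
    "Komodo Platform strives to accelerate the global adoption of "
    "blockchain technology and to lead the world in blockchain integration."
)
# Swap the "i" and "o" in "adoption", as in the second example input.
altered = original.replace("adoption", "adoptoin")

digest_original = hashlib.sha256(original.encode("utf-8")).hexdigest()
digest_altered = hashlib.sha256(altered.encode("utf-8")).hexdigest()

print(digest_original)
print(digest_altered)
print(len(digest_original), len(digest_altered))  # prints 64 64
```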
It’s worth emphasizing that literally any input can be put into the SHA-256 hash function. Regardless of the length of the input, the output will always be the same fixed length and it will always appear completely random. Play around with this tool to see for yourself.
This concludes our intro to cryptographic hash functions. If you’ve enjoyed learning about hash functions, you might be interested in learning more about cryptography and blockchain-related topics. Check out this post about Merkle Trees or this post on asymmetric encryption if you're interested in cryptography. If you're more inclined to learn about blockchain technology, check out Komodo's Blockchain Fundamental series to learn about a variety of topics, from blockchain consensus protocols and distributed ledger technology to Proof of Work and Proof of Stake.