The Basics of (Cryptographic) Hashing

What is Hashing?

Cryptographic hashing is a process that turns data of any size into a different piece of data of a fixed size. Basically, you put in, say, the string 'password123' and it spits out something like 'EF92B778BAFE771E89245B89ECBC08A44A4E166C06659911881F383D4473E94F'.

Also, it is important to note that hashing can be used for non-cryptographic purposes. That will not be discussed here.
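To make the fixed-size property concrete, here is a minimal sketch using Python's standard hashlib module. It assumes SHA-256, which produces 64 hex characters like the example above; the text does not name the hash function, and the helper name sha256_hex is made up for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as an uppercase hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest().upper()


# A short input and a much longer one both produce a digest of the same size.
short_digest = sha256_hex("password123")
long_digest = sha256_hex("password123" * 1000)

print(short_digest)
print(len(short_digest), len(long_digest))  # both are 64 hex characters (256 bits)
```

Running it twice prints the same digests, because a hash function always returns the same output for the same input.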


Why Hash?

Because of its properties, hashing has many uses. Two of the most common are password storage and verification.


Password Storage

When you create an account on a website and enter a password, what happens afterward? Usually, the data is sent to the site's backend, which hashes the password and stores the result in a database along with the rest of your account information. Why hash the password? For security. If the password were stored unhashed, in plaintext, the site operators or, even worse, hackers who gained access to the database could read every password. If it is hashed, they only see the hashes, and because hash functions are designed to be practically irreversible, they cannot simply compute the original passwords from them.

Importantly, verifying the password is still possible: hash the password the user entered, and check that the result matches the stored hash.
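The check described above can be sketched in a few lines of Python. This is a deliberately simplified illustration: the function names are hypothetical, and a single unsalted SHA-256 is used only to show the hash-and-compare idea, not as a recommendation for real password storage (see the Vulnerabilities section).

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Hash a password with SHA-256 (illustration only: unsalted and fast)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def verify_password(attempt: str, stored_hash: str) -> bool:
    """Hash the attempt and compare it with the stored hash."""
    # compare_digest compares in constant time, which avoids leaking
    # information through how long the comparison takes.
    return hmac.compare_digest(hash_password(attempt), stored_hash)


# What the database would hold instead of the plaintext password.
stored = hash_password("password123")

print(verify_password("password123", stored))  # correct password
print(verify_password("letmein", stored))      # wrong password
```

The site never needs to keep the plaintext password: it only ever compares hashes.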


Verification

Because it should be practically impossible to find two different inputs with the same output, hashing is well suited to verifying or comparing content.

A real-world example is software releases. When downloading software from the internet, it may be hard to tell whether it is legitimate. You could think you are downloading Firefox, for example, but if you are downloading it from a third-party website, it could be malware for all you know.

Software makers can publish a hash of their application. Then, people downloading it can verify it has not been tampered with by producing a hash of the downloaded version and making sure the resulting hashes are identical.
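A download check along these lines might look like the sketch below. It assumes the publisher lists a SHA-256 hash as a hex string; the function names are hypothetical, and the file is read in chunks so that large installers do not have to fit in memory.

```python
import hashlib
import hmac


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 digest of a file as a lowercase hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read the file piece by piece until read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Check a downloaded file against the hash published by the software maker."""
    # Normalize whitespace and letter case before comparing.
    return hmac.compare_digest(file_sha256(path), published_hex.strip().lower())
```

For example, matches_published_hash("installer.exe", published) would return False if even a single byte of the (hypothetical) installer.exe had been changed. The published hash itself has to come from a source you trust, such as the software maker's own website.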


Vulnerabilities

Although hashing is generally secure, there are some potential vulnerabilities. Luckily, most of them can be prevented.


Hash Collisions

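Hash functions are supposed to behave as if every input has a unique output. Since the output has a fixed size and the input does not, two inputs with the same output must exist, but finding them should be practically impossible.

Sometimes, someone finds a practical way to produce two inputs with the same output. This is called a collision, and it is very bad, at least for most purposes.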

This means that someone could fool a program into thinking they have the right password, or an authentic, unmodified file, when neither is actually true.

This happened to SHA-1 in 2017, although to be fair it took researchers from Google and CWI Amsterdam and "6,500 years of single-CPU computations and 110 years of single-GPU computations". The method they found was about 100,000 times faster than a brute force attack.

The best way to guard against this is to use modern, secure, and well-tested hash functions.

Argon2 is widely considered one of the best choices for password hashing. The SHA-2 family (which includes SHA-256) is popular for general-purpose hashing, such as file verification.


Rainbow Table Attacks

Since hash functions always return the same output for a given input, hackers can take a list of common passwords and their variations, then compute the hashes ahead of time. This is commonly called a rainbow table (strictly speaking, a rainbow table is a space-saving variant of such a precomputed lookup table). That way, should a hacker get hold of a database where the passwords are hashed, they can simply look for hashes that match entries in their table. If they find a match, they now know that user's password.
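The precomputation described above can be illustrated with a plain dictionary. This is a sketch under stated assumptions: the passwords are stored as unsalted SHA-256 hashes, the user names and password list are made up, and a simple lookup table stands in for a real rainbow table, which stores the same information in a more compact form.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Done ahead of time by the attacker: hash a list of common passwords once.
common_passwords = ["123456", "password", "password123", "qwerty"]
precomputed = {sha256_hex(p): p for p in common_passwords}

# A leaked database of unsalted hashes (hypothetical users).
leaked_hashes = {
    "alice": sha256_hex("password123"),        # a common password
    "bob": sha256_hex("x9!fQ2#longRandom"),    # not in the attacker's list
}

# Cracking is now just a dictionary lookup per user.
cracked = {
    user: precomputed[h] for user, h in leaked_hashes.items() if h in precomputed
}
print(cracked)  # {'alice': 'password123'}
```

Only alice's password is recovered, because only her hash appears in the precomputed table.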

A great way to prevent this is salting. Salting means adding some random characters (the salt), generated separately for each user, to the user's password before hashing, then storing the salt alongside the hashed password.

Verifying passwords stays simple: the program gets the stored salt, adds it to the password being verified, hashes the result, and compares it with the stored hash. With little added complexity, the hacker's rainbow tables are now useless, because the salted hashes no longer match the precomputed ones.
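Here is a minimal sketch of salted password storage and verification. Note the substitution: rather than literally joining the salt to the password and hashing once, it uses PBKDF2 from Python's standard library, which takes the salt as a separate argument and is deliberately slow. The function names, the 16-byte salt, and the iteration count are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor; tune it for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(attempt: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare with the stored digest."""
    _, attempt_digest = hash_password(attempt, salt)
    return hmac.compare_digest(attempt_digest, stored_digest)


# Two users with the same password get different salts, so different hashes.
salt_a, digest_a = hash_password("password123")
salt_b, digest_b = hash_password("password123")

print(digest_a != digest_b)
print(verify_password("password123", salt_a, digest_a))
print(verify_password("wrong guess", salt_a, digest_a))
```

Both the salt and the digest are stored for each user. A table precomputed without the salt no longer matches anything in the database.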

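Another idea is a pepper: an additional secret value that is mixed into the password before hashing, like a salt, but is kept separately from the database (for example, in the application's configuration) rather than stored alongside the hashed password. It may be a bit more secure, but it definitely complicates the procedure too.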