An often overlooked and misunderstood concept in application development is the one involving secure hashing of passwords. We have evolved from plain text password storage, to hashing a password, to appending salts and now even this is not considered adequate anymore. In this post I will discuss what hashing is, what salts and peppers are and which algorithms are to be used and which are to be avoided.

Hashing is an algorithm which takes any size of data and turns it into a fixed-length of data. This is often used to ease the retrieval of data as you can shorten large amounts of data to a shorter string (which is easier to compare). For instance let's say you have a DNA sample of a person, this would consist of a large amount of data (about 2.2 - 3.5 MB), and you would like to find out to who this DNA sample belongs to. You could take all samples and compare 2.2 MB of data to all DNA samples in the database, but comparing 2.2 MB against 2.2 MB of data cant take quite a while, especially when you need to traverse thousands of samples. This is where hashing can come in handy, instead of comparing the data, you calculate the hash of this data (in reality, several hashes will be calculated for the different locations on the chromosomes, but for the sake of the example let's assume it's one hash), which will return a fixed length value of, for instance, 128 bits. It will be easier and faster to query a database for 128-bits than for 2.2 MB of data. The main difference between hashing and encryption is that a hash is not reversible. When we are talking about cryptographic hash functions, we are referring to hash functions which have these properties:

- It is easy to compute the hash value for any given message.
- It is infeasible to generate a message that has a given hash.
- It is infeasible to modify a message without changing the hash.
- It is infeasible to find two different messages with the same hash.

- Collisions (two different messages generating the same hash)
- Pre-image resistance: Given a hash h it should be difficult to find any message m such that h = hash(m).
- Resistance to second-preimages: given m, it is infeasible to find m' distinct from m and such that MD-5(m) = MD-5(m').

- MD-5
- SHA-1
- SHA-2
- SHA-3

- SHA-1: A 160-bit hash function which resembles the earlier MD-5 algorithm. This was designed by the National Security Agency (NSA) to be part of the Digital Signature Algorithm. Cryptographic weaknesses were discovered in SHA-1, and the standard was no longer approved for most cryptographic uses after 2010.
- SHA-2: A family of two similar hash functions, with different block sizes, known as SHA-256 and SHA-512. They differ in the word size; SHA-256 uses 32-bit words where SHA-512 uses 64-bit words. There are also truncated versions of each standardized, known as SHA-224 and SHA-384. These were also designed by the NSA.
- SHA-3: SHA-3 is not yet defined. NIST is working on the exact parameters they will use; SHA-3 will be Keccak, or "close enough", but not necessarily the Keccak which was submitted (it is a configurable function, and they seem to want to tweak the parameters a bit differently than what was first proposed).

- Dictionary Attacks
- Bruteforce
- Rainbow Tables (generate everything upfront in a database and do a look up for each hash)

Moore's law is the observation that, over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years. The period often quoted as "18 months" is due to Intel executive David House, who predicted that period for a doubling in chip performance (being a combination of the effect of more transistors and their being faster).Computers have become faster and faster over the years, which is something we need to take into account. From a cryptographical point of view, 106 years is still a short period. We want infinity (something which will take several hundred-thousand to millions of years).

- Have a unique salt per password (salt may only be used once across the database containing all password hashes) to prevent a bruteforce attack of compromising the data in one run
- Fast on software (executing the function once must be relatively fast)
- Slow on hardware (executing the function concurrently should be slow, this to prevent brute forcing on distributed systems)

- MD-5
- SHA-1
- SHA-2
- SHA-3

- MD-5, SHA-1, SHA-256 (and SHA-224) use 32-bit operations, which are very fast on x86 CPU but also on ARM and, crucially, on GPU as they exist today.
- SHA-512 (and SHA-384) use 64-bit arithmetic operations which make life harder for GPU, but also for 32-bit architectures (ARM in particular).
- SHA-3 (Keccak) uses only boolean operations, which makes it fast everywhere but also quite faster on FPGA (no carry propagation).

- PBKDF2
- bcrypt
- scrypt

DK = PBKDF2(PRF, Password, Salt, c, dkLen)Where DK is the derived key, PRF is the preferred HMAC function (this can be a SHA-1/2 HMAC, the password is used as a key for the HMAC and the salt as text), c is the amount of iterations and dkLen is the length of the derived key. A salt should, by definition of the standard, be at least 64-bits of length and the minimum amount of iterations should be 1024. What the algorithm will do is SHA-1-HMAC(password+salt), and repeat the calculation 1024 times on the result. This means the hashing of a password will be 1024 times slower. Still this does not actually offer a lot of protection when bruteforcing on distributed systems or GPU (Graphic Processing Unit). There's also a caveat when the password exceeds 64 bytes, the password will be shortened by applying a hash to it by the PBKDF2 algorithm so it does not exceed the block size. For instance when using HMAC-SHA-1 a password longer than 64 bytes will be reduced to SHA-1(password), which is 20 bytes in size. This means passwords longer than 64 bytes do not provide additional security when it comes to breaking the key used to make the HMAC, but may even reduce security as the length of the key will be reduced (note that even when reduced to 20 bytes, currently our great-great-great-great-great-great-great-great-great-great grand children will be long dead before the key is brute forced).

I would also like to thank Thomas Pornin for supplying so much insights, reviewing my cryptography assumptions and for suggesting amendments. I would also like to thank CodesInChaos for answering my questions regarding PBKDF2 key generation.