Implementation of SHA512-crypt vs MD5-crypt

If you have a new installation, you’re probably using SHA512-based passwords instead of the older MD5-based passwords described in detail in the previous post, which I’ll assume you’ve read. sha512-crypt is very similar to md5-crypt, but with some interesting differences.

Since the implementation of sha512 is really less interesting than the comparison with md5-crypt, I’ll describe it by striking out the relevant parts of the md5-crypt description and writing in what sha512-crypt does instead.

Like md5-crypt, it can be divided into three phases. Initialization, loop, and finalization.

  1. Generate a simple md5 sha512 hash based on the salt and password
  2. Loop 1000 5000 times, calculating a new sha512 hash based on the previous hash concatenated with alternatingly the hash of the password and the salt. Additionally, sha512-crypt allows you to specify a custom number of rounds, from 1000 to 999999999
  3. Use a special base64 encoding on the final hash to create the password hash string

 

The main differences are the higher number of rounds, which can be user selected for better (or worse) security, the use of the hashed password and salt in each round, rather than the unhashed ones, and a few tweaks of the initialization step.

 

Here’s the real sha512-crypt initialization.

  1. Let “password” be the user’s ascii password, “salt” the ascii salt (truncated to 8 16 chars) , and “magic” the string “$1$”
  2. Start by computing the Alternate sum, sha512(password + salt + password)
  3. Compute the Intermediate0 sum by hashing the concatenation of the following strings:
    1. Password
    2. Magic
    3. Salt
    4. length(password) bytes of the Alternate sum, repeated as necessary
    5. For each bit in length(password), from low to high and stopping after the most significant set bit
      • If the bit is set, append a NUL byte the Alternate sum
      • If it’s unset, append the first byte of the password
  4. New: Let S_factor be 16 + the first byte of Intermediate0
  5. New: Compute the S bytes, length(salt) bytes of sha512(salt, concatenated S_factor times).
  6. New: Compute the P bytes, length(password) bytes of sha512(password), repeated as necessary

 

Step 3.5 — which was very strange in md5-crypt — now makes a little more sense. We also calculated the S bytes and P bytes, which from here on will be used just like salt and password was in md5-crypt.

From this point on, the calculations will only involve the password P bytes, salt S bytes, and the Intermediate0 sum. Now we loop 5000 times (by default), to stretch the algorithm.

  • For i = 0 to 4999 (inclusive), compute Intermediatei+1 by concatenating and hashing the following:
    1. If i is even, Intermediatei
    2. If i is odd, password P bytes
    3. If i is not divisible by 3, salt S bytes
    4. If i is not divisible by 7, password P bytes
    5. If i is even, password P bytes
    6. If i is odd, Intermediatei

    At this point you don’t need Intermediatei anymore.

You will now have ended up with Intermediate5000. Let’s call this the Final sum. Since sha512 is 512bit, this is 64 bytes long.

The bytes will be rearranged, and then encoded as 86 ascii characters using the same base64 encoding as md5-crypt.

  1. Output the magic, “$6$”
  2. New: If using a custom number of rounds, output “rounds=12345$”
  3. Output the salt
  4. Output a “$” to separate the salt from the encrypted section
  5. Pick out the 64 bytes in this order: 63 62 20 41 40 61 19 18 39 60 59 17 38 37 58 16 15 36 57 56 14 35 34 55 13 12 33 54 53 11 32 31 52 10 9 30 51 50 8 29 28 49 7 6 27 48 47 5 26 25 46 4 3 24 45 44 2 23 22 43 1 0 21 42
    • For each group of 6 bits (there’s 86 groups), starting with the least significant
      • Output the corresponding base64 character with this index

 

And yes, I do have a shell script for this as well: sha512crypt. This one takes about a minute to generate a hash, due to the higher number of rounds. However, it doesn’t support custom rounds.

I hope these two posts have provided an interesting look at two exceedingly common, but often overlooked, algorithms!

Password hashing with MD5-crypt in relation to MD5

If you haven’t reinstalled recently, chances are you’re using MD5-based passwords. However, the password hashes you find in /etc/shadow look nothing like what md5sum returns.

Here’s an example:

/etc/shadow:
$1$J7iYSKio$aEY4anysz.gtXxg7XlL6v1

md5sum:
7c6483ddcd99eb112c060ecbe0543e86 

What’s the difference in generating these hashes? Why are they different at all?

Just running md5sum on a password and storing that is just marginally more secure than storing the plaintext password.

Thanks to GPGPUs, a modern gaming rig can easily try 5 billion such passwords per second, or go over the entire 8-character alphanumeric space in a day. With rainbow tables, a beautiful time–space tradeoff, you can do pretty much the same in 15 minutes.

MD5-crypt employs salting to make precomputational attacks exponentially more difficult. Additionally, it uses stretching to make brute force attacks harder (but just linearly so).

As an aside, these techniques were used in the original crypt from 1979, so there’s really no excuse to do naive password hashing anymore. However, at that time the salt was 12 bits and the number of rounds 25 — quite adorable in comparison with today’s absolute minimum of 64 bits and 1000 rounds.

The original crypt was DES based, but used a modified algorithm to prevent people from using existing DES cracking hardware. MD5-crypt doesn’t do any such tricks, and can be implemented in terms of any MD5 library, or even the md5sum util.

As regular reads might suspect, I’ve written a shell script to demonstrate this: md5crypt. There are a lot of workarounds for Bash’s inability to handle NUL bytes in strings. It takes 10 seconds to generate a hash, and is generally awful..ly funny!

Let’s first disect a crypt hash. man 3 crypt has some details.

If salt is a character string starting with the characters
"$id$" followed by a string terminated by "$":

       $id$salt$encrypted

then instead of using the DES machine, id  identifies  the
encryption  method  used  and this then determines how the
rest of the password string is interpreted.  The following
values of id are supported:

       ID  | Method
       -------------------------------------------------
       1   | MD5
       2a  | Blowfish (on some Linux distributions)
       5   | SHA-256 (since glibc 2.7)
       6   | SHA-512 (since glibc 2.7)

Simple and easy. Split by $, and then your fields are Algorithm, Salt and Hash.

md5-crypt is a function that takes a plaintext password and a salt, and generate such a hash.

To set a password, you’d generate a random salt, input the user’s password, and write the hash to /etc/shadow. To check a password, you’d read the hash from /etc/shadow, extract the salt, run the algorithm on this salt and the candidate password, and then see if the resulting hash matches what you have.

md5-crypt can be divided into three phases. Initialization, loop, and finalization. Here’s a very high level description of what we’ll go through in detail:

  1. Generate a simple md5 hash based on the salt and password
  2. Loop 1000 times, calculating a new md5 hash based on the previous hash concatenated with alternatingly the password and the salt.
  3. Use a special base64 encoding on the final hash to create the password hash string

 

Put like this, it relatively elegant. However, there are a lot of details that turn this from elegant to eyerolling.

Here’s the real initialization.

  1. Let “password” be the user’s ascii password, “salt” the ascii salt (truncated to 8 chars), and “magic” the string “$1$”
  2. Start by computing the Alternate sum, md5(password + salt + password)
  3. Compute the Intermediate0 sum by hashing the concatenation of the following strings:
    1. Password
    2. Magic
    3. Salt
    4. length(password) bytes of the Alternate sum, repeated as necessary
    5. For each bit in length(password), from low to high and stopping after the most significant set bit
      • If the bit is set, append a NUL byte
      • If it’s unset, append the first byte of the password

 

I know what you’re thinking, and yes, it’s very arbitrary. The latter part was most likely a bug in the original implementation, carried along as UNIX issues often are. Remember to stay tuned next week, when we’ll compare this to SHA512-crypt as used on new installations!

From this point on, the calculations will only involve the password, salt, and Intermediate0 sum. Now we loop 1000 times, to stretch the algorithm.

  • For i = 0 to 999 (inclusive), compute Intermediatei+1 by concatenating and hashing the following:
    1. If i is even, Intermediatei
    2. If i is odd, password
    3. If i is not divisible by 3, salt
    4. If i is not divisible by 7, password
    5. If i is even, password
    6. If i is odd, Intermediatei

    At this point you don’t need Intermediatei anymore.

You will now have ended up with Intermediate1000. Let’s call this the Final sum. Since MD5 is 128bit, this is 16 bytes long.

The bytes will be rearranged, and then encoded as 22 ascii characters with a special base64-type encoding. This is not the same as regular base64:

Normal base64 set:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

Crypt base64 set:
./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

Additionally, there is no padding. The leftover byte will be encoded into 2 base64 ascii characters.

  1. Output the magic
  2. Output the salt
  3. Output a “$” to separate the salt from the encrypted section
  4. Pick out the 16 bytes in this order: 11 4 10 5 3 9 15 2 8 14 1 7 13 0 6 12.
    • For each group of 6 bits (there are 22 groups), starting with the least significant
      • Output the corresponding base64 character with this index

Congratulations, you now have a compatible md5-crypt hash!

As you can see, it’s quite far removed from a naive md5(password) attempt.

Fortunately, one will only ever need this algorithm for compatibility. New applications can use the standard PBKDF2 algorithm, implemented by most cryptography libraries, which does the same thing only in a standardized and parameterized way.

As if this wasn’t bad enough, the next post next week will be more of the same, but with SHA512-crypt!

Why Bash is like that: Signal propagation

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

How do I simulate pressing Ctrl-C when running this in a script:
while true; do echo sleeping; sleep 30; done

Are you thinking “SIGINT, duh!”? Hold your horses!

I tried kill -INT pid, but it doesn't work the same:

Ctrl-C    kills the sleep and the loop
SIGINTing the shell does nothing (but only in scripts: see Errata)
SIGINTing sleep makes the loop continue with the next iteration

HOWEVER, if I run the script in the background and kill -INT %1
instead of kill -INT pid, THEN it works :O

Why does Ctrl-C terminate the loop, while SIGINT doesn’t?

Additionally, if I run the same loop with ping or top instead of sleep,
Ctrl-C doesn't terminate the loop either!

Yeah. Well… Yeah…

This behaviour is due to an often overlooked feature in UNIX: process groups. These are important for getting terminals and shells to work the way they do.

A process group is exactly what it sounds like: a group of processes. They have a leader, which is the process that created it using setsid(2). The leader’s pid is also the process group id. Child processes are in the same group as their parent by default.

Terminals keep track of the foreground process group (set by the shell using tcsetpgrp(3)). When receiving a Ctrl-C, they send the SIGINT to the entire foreground group. This means that all members of the group will receive SIGINT, not just the immediate process.

kill -INT %1 sends the signal to the job’s process group, not the backgrounded pid! This explains why it works like Ctrl-C.

You can do the same thing with kill -INT -pgrpid. Since the process group id is the same as the process group leader, you can kill the group by killing the pid with a minus in front.

But why do you have to kill both?

When the shell is interrupted, it will wait for the running command to exit. If this child’s status indicates it exited abnormally due to that signal, the shell cleans up, removes its signal handler, and kills itself again to trigger the OS default action (abnormal exit). Alternatively, it runs the script’s signal handler as set with trap, and continues.

If the shell is interrupted and the child’s status says it exited normally, then Bash assumes the child handled the signal and did something useful, so it continues executing. Ping and top both trap SIGINT and exit normally, which is why Ctrl-C doesn’t kill the loop when calling them.

This also explains why interrupting just the shell does nothing: the child exits normally, so the shell thinks the child handled the signal, though in reality it was never received.

Finally, if the shell isn’t interrupted and a child exits, Bash just carries on regardless of whether the signal died abnormally or not. This is why interrupting the sleep just continues with the loop.

In case one would like to handle such cases, Bash sets the exit code to 128+signal when the process exits abnormally, so interrupting sleep with SIGINT would give the exit code 130 (kill -l lists the signal values).

Bonus problem:

I have this C app, testpg:
int main() {
    setsid();
    return sleep(10);
}

I run bash -c './testpg' and press Ctrl-C. The app is killed.
Shouldn't testpg be excluded from SIGINT, since it used setsid?

A quick strace unravels this mystery: with a single command to execute, bash execve’s it directly — a little optimization trick. Since the pid is the same and already had its own process group, creating a new one doesn’t have any effect.

This trick can’t be used if there are more commands, so bash -c './testpg; true' can’t be killed with Ctrl-C.

Errata:

Wait, I started a loop in one terminal and killed the shell in another. 
The loop exited!

Yes it did! This does not apply to interactive shells, which have different ways of handling signals. When job control is enabled (running interactively, or when running a script with bash -m), the shell will die when SIGINTed

Here’s the description from the bash source code, jobs.c:2429:

  /* Ignore interrupts while waiting for a job run without job control
     to finish.  We don't want the shell to exit if an interrupt is
     received, only if one of the jobs run is killed via SIGINT. 
   ...