### Archive

Archive for the ‘Advanced Linux-related things’ Category

## Technically correct: floating point calculations in bc

June 14th, 2015

Whenever someone asks how to do floating point math in a shell script, the answer is typically `bc`:

```\$  echo "scale=9; 22/7" | bc
3.142857142
```

However, this is technically wrong: `bc` does not support floating point at all! What you see above is arbitrary precision FIXED point arithmetic.

The user’s intention is obviously to do math with fractional numbers, regardless of the low level implementation, so the above is a good and pragmatic answer. However, technically correct is the best kind of correct, so let’s stop being helpful and start pedantically splitting hairs instead!

Fixed vs floating point

There are many important things that every programmer should know about floating point, but in one sentence, the larger they get, the less precise they are.

In fixed point you have a certain number of digits, and a decimal point fixed in place like on a tax form: `001234.56`. No matter how small or large the number, you can always write down increments of 0.01, whether it’s 000000.01 or 999999.99.

Floating point, meanwhile, is basically scientific notation. If you have 1.23e-4 (0.000123), you can increment by a millionth to get 1.24e-4. However, if you have 1.23e4 (12300), you can’t add less than 100 unless you reserve more space for more digits.

We can see this effect in practice in any language that supports floating point, such as Haskell:

```> truncate (16777216 - 1 :: Float)
16777215
> truncate (16777216 + 1 :: Float)
16777216
```

Subtracting 1 gives us the decremented number, but adding 1 had no effect with floating point math! bc, with its arbitrary precision fixed points, would instead correctly give us 16777217! This is clearly unacceptable!

Floating point in bc

The problem with the `bc` solution is, in other words, that the math is too correct. Floating point math always introduces and accumulates rounding errors in ways that are hard to predict. Fixed point doesn’t, and therefore we need to find a way to artificially introduce the same type of inaccuracies! We can do this by rounding a number to a N significant bits, where N = 24 for `float` and 52 for `double`. Here is some bc code for that:

```scale=30

define trunc(x) {
auto old, tmp
old=scale; scale=0; tmp=x/1; scale=old
return tmp
}
define fp(bits, x) {
auto i
if (x < 0) return -fp(bits, -x);
if (x == 0) return 0;
i=bits
while (x < 1) { x*=2; i+=1; }
while (x >= 2) { x/=2; i-=1; }
return trunc(x * 2^bits + 0.5) / 2^(i)
}

define float(x) { return fp(24, x); }
define double(x) { return fp(52, x); }
define test(x) {
print "Float:  ", float(x), "\n"
print "Double: ", double(x), "\n"
}
```

With this file named `fp`, we can try it out:

```\$ bc -ql fp <<< "22/7"
3.142857142857142857142857142857

\$ bc -ql fp <<< "float(22/7)"
3.142857193946838378906250000000
```

The first number is correct to 30 decimals. Yuck! However, with our floating point simulator applied, we get the desired floating point style errors after ~7 decimals!

Let's write a similar program for doing the same thing but with actual floating point, printing them out up to 30 decimals as well:

```{-# LANGUAGE RankNTypes #-}
import Data.Number.CReal
import System.Environment

main = do
putStrLn . ("Float:  " ++) \$ showNumber (read input :: Float)
putStrLn . ("Double: " ++) \$ showNumber (read input :: Double)
where
showNumber :: forall a. Real a => a -> String
showNumber = showCReal 30 . realToFrac
```

Here's a comparison of the two:

```\$ bc -ql fp <<< "x=test(1000000001.3)"
Float:  1000000000.000000000000000000000000000000
Double: 1000000001.299999952316284179687500000000

\$ ./fptest 1000000001.3
Float:  1000000000.0
Double: 1000000001.2999999523162841796875
```

Due to differences in rounding and/or off-by-one bugs, they're not always identical like here, but the error bars are similar.

Now we can finally start doing floating point math in bc!

## Basics of a Bash action game

February 24th, 2014

If you want to write an action game in bash, you need the ability to check for user input without actually waiting for it. While bash doesn’t let you poll the keyboard in a great way, it does let you wait for input for a miniscule amount of time with `read -t 0.0001`.

Here’s a snippet that demonstrates this by bouncing some text back and forth, and letting the user control position and color. It also sets (and unsets) the necessary terminal settings for this to look good:

``` #!/usr/bin/env bash # Reset terminal on exit trap 'tput cnorm; tput sgr0; clear' EXIT # invisible cursor, no echo tput civis stty -echo text="j/k to move, space to color" max_x=\$((\$(tput cols) - \${#text})) dir=1 x=1 y=\$((\$(tput lines)/2)) color=3 while sleep 0.05 # GNU specific! do # move and change direction when hitting walls (( x == 0 || x == max_x )) && \ ((dir *= -1)) (( x += dir )) # read all the characters that have been buffered up while IFS= read -rs -t 0.0001 -n 1 key do [[ \$key == j ]] && (( y++ )) [[ \$key == k ]] && (( y-- )) [[ \$key == " " ]] && color=\$((color%7+1)) done # batch up all terminal output for smoother action framebuffer=\$( clear tput cup "\$y" "\$x" tput setaf "\$color" printf "%s" "\$text" ) # dump to screen printf "%s" "\$framebuffer" done ```

## Technically correct: Inversed regex

April 2nd, 2012

How do I write a regex that matches lines that do not contain `hi`?

That’s a frequently asked question if I ever saw one.

Of course, the proper answer is: you don’t. You write a regex that does match `hi` and then invert the matching logic, ostensibly with `grep -v`. But where’s the fun in that?

One interesting theorem that pops up in any book or class on formal grammars is that regular languages are closed under complement: the inverse of a regular expression is also a regular expression. This means that writing inverted regular expressions is theoretically possible, though it turns out to be quite tricky

Just try writing a regex that matches strings that does not contain “hi”, and test it against “hi”, “hhi” and “ih”, “iih” and such variations. Some solutions are coming up.

A way to cheat is using PCRE negative lookahead: `^(?!.*foo)` matches all strings not containing the substring “foo”. However, lookahead assertions require a stack, and thus can’t be modelled as a finite state machine. In other words, they don’t fit the mathematical definition of a regular expression, and therefore disqualify.

There are simple, well-known algorithms for turning regular expressions into non-deterministic finite automata, and from there to deterministic FA. Less commonly used and known are algorithms for inverting a DFA and for generating familiar textual regex from it.

You can find these described in various lecture notes and slides, so I won’t recite them.

What I had a harder time finding was software that actually did this. So here is a Haskell program. It’s highly suboptimal but it does the job. When executed, it will ask for a regex and will then output a grep command that matches everything the regex does not (without `-v`, obviously).

The expressions it produces are quite horrific; it’s computer generated code, after all.

A regular expression for matching strings that do not match `.*hi.*` could be `grep -E '^([^h]|h+\$|h+[^hi])*\$'`.

This app, however, suggests `grep -E '^([^h]([^h]|)*||([^h][^h]*h|h)|([^h][^h]*(h(hh*[^hi]|[^hi]))|(h(hh*[^hi]|[^hi])))((hh*[^hi]|[^h])|)*|([^h][^h]*(h([^hi][^h]*h|h))|(h([^hi][^h]*h|h)))(([^hi][^h]*h|h)|)*)\$'`

It still works exactly as stated though!

The app just supports a small subset of regex, just enough to convince someone that it works, and as a party trick lets you answer the original question exactly as stated.

Technically correct is the best kind of correct.

## Implementation of SHA512-crypt vs MD5-crypt

August 16th, 2011

If you have a new installation, you’re probably using SHA512-based passwords instead of the older MD5-based passwords described in detail in the previous post, which I’ll assume you’ve read. sha512-crypt is very similar to md5-crypt, but with some interesting differences.

Since the implementation of sha512 is really less interesting than the comparison with md5-crypt, I’ll describe it by striking out the relevant parts of the md5-crypt description and writing in what sha512-crypt does instead.

Like md5-crypt, it can be divided into three phases. Initialization, loop, and finalization.

1. Generate a simple md5 sha512 hash based on the salt and password
2. Loop 1000 5000 times, calculating a new sha512 hash based on the previous hash concatenated with alternatingly the hash of the password and the salt. Additionally, sha512-crypt allows you to specify a custom number of rounds, from 1000 to 999999999
3. Use a special base64 encoding on the final hash to create the password hash string

The main differences are the higher number of rounds, which can be user selected for better (or worse) security, the use of the hashed password and salt in each round, rather than the unhashed ones, and a few tweaks of the initialization step.

Here’s the real sha512-crypt initialization.

1. Let “password” be the user’s ascii password, “salt” the ascii salt (truncated to 8 16 chars) , and “magic” the string “\$1\$”
2. Start by computing the Alternate sum, `sha512(password + salt + password)`
3. Compute the Intermediate0 sum by hashing the concatenation of the following strings:
2. Magic
3. Salt
4. length(password) bytes of the Alternate sum, repeated as necessary
5. For each bit in length(password), from low to high and stopping after the most significant set bit
• If the bit is set, append a NUL byte the Alternate sum
• If it’s unset, append the first byte of the password
4. New: Let S_factor be 16 + the first byte of Intermediate0
5. New: Compute the S bytes, length(salt) bytes of sha512(salt, concatenated S_factor times).
6. New: Compute the P bytes, length(password) bytes of sha512(password), repeated as necessary

Step 3.5 — which was very strange in md5-crypt — now makes a little more sense. We also calculated the S bytes and P bytes, which from here on will be used just like salt and password was in md5-crypt.

From this point on, the calculations will only involve the password P bytes, salt S bytes, and the Intermediate0 sum. Now we loop 5000 times (by default), to stretch the algorithm.

• For i = 0 to 4999 (inclusive), compute Intermediatei+1 by concatenating and hashing the following:
1. If i is even, Intermediatei
2. If i is odd, password P bytes
3. If i is not divisible by 3, salt S bytes
4. If i is not divisible by 7, password P bytes
5. If i is even, password P bytes
6. If i is odd, Intermediatei

At this point you don’t need Intermediatei anymore.

You will now have ended up with Intermediate5000. Let’s call this the Final sum. Since sha512 is 512bit, this is 64 bytes long.

The bytes will be rearranged, and then encoded as 86 ascii characters using the same base64 encoding as md5-crypt.

1. Output the magic, “\$6\$”
2. New: If using a custom number of rounds, output “rounds=12345\$”
3. Output the salt
4. Output a “\$” to separate the salt from the encrypted section
5. Pick out the 64 bytes in this order: 63 62 20 41 40 61 19 18 39 60 59 17 38 37 58 16 15 36 57 56 14 35 34 55 13 12 33 54 53 11 32 31 52 10 9 30 51 50 8 29 28 49 7 6 27 48 47 5 26 25 46 4 3 24 45 44 2 23 22 43 1 0 21 42
• For each group of 6 bits (there’s 86 groups), starting with the least significant
• Output the corresponding base64 character with this index

And yes, I do have a shell script for this as well: sha512crypt. This one takes about a minute to generate a hash, due to the higher number of rounds. However, it doesn’t support custom rounds.

I hope these two posts have provided an interesting look at two exceedingly common, but often overlooked, algorithms!

## Password hashing with MD5-crypt in relation to MD5

August 9th, 2011

If you haven’t reinstalled recently, chances are you’re using MD5-based passwords. However, the password hashes you find in /etc/shadow look nothing like what `md5sum` returns.

Here’s an example:

```/etc/shadow:
\$1\$J7iYSKio\$aEY4anysz.gtXxg7XlL6v1

md5sum:
7c6483ddcd99eb112c060ecbe0543e86
```

What’s the difference in generating these hashes? Why are they different at all?

Just running md5sum on a password and storing that is just marginally more secure than storing the plaintext password.

Thanks to GPGPUs, a modern gaming rig can easily try 5 billion such passwords per second, or go over the entire 8-character alphanumeric space in a day. With rainbow tables, a beautiful time–space tradeoff, you can do pretty much the same in 15 minutes.

MD5-crypt employs salting to make precomputational attacks exponentially more difficult. Additionally, it uses stretching to make brute force attacks harder (but just linearly so).

As an aside, these techniques were used in the original crypt from 1979, so there’s really no excuse to do naive password hashing anymore. However, at that time the salt was 12 bits and the number of rounds 25 — quite adorable in comparison with today’s absolute minimum of 64 bits and 1000 rounds.

The original crypt was DES based, but used a modified algorithm to prevent people from using existing DES cracking hardware. MD5-crypt doesn’t do any such tricks, and can be implemented in terms of any MD5 library, or even the md5sum util.

As regular reads might suspect, I’ve written a shell script to demonstrate this: md5crypt. There are a lot of workarounds for Bash’s inability to handle NUL bytes in strings. It takes 10 seconds to generate a hash, and is generally awful..ly funny!

Let’s first disect a crypt hash. man 3 crypt has some details.

```If salt is a character string starting with the characters
"\$id\$" followed by a string terminated by "\$":

\$id\$salt\$encrypted

then instead of using the DES machine, id  identifies  the
encryption  method  used  and this then determines how the
rest of the password string is interpreted.  The following
values of id are supported:

ID  | Method
-------------------------------------------------
1   | MD5
2a  | Blowfish (on some Linux distributions)
5   | SHA-256 (since glibc 2.7)
6   | SHA-512 (since glibc 2.7)

```

Simple and easy. Split by \$, and then your fields are Algorithm, Salt and Hash.

md5-crypt is a function that takes a plaintext password and a salt, and generate such a hash.

To set a password, you’d generate a random salt, input the user’s password, and write the hash to /etc/shadow. To check a password, you’d read the hash from /etc/shadow, extract the salt, run the algorithm on this salt and the candidate password, and then see if the resulting hash matches what you have.

md5-crypt can be divided into three phases. Initialization, loop, and finalization. Here’s a very high level description of what we’ll go through in detail:

1. Generate a simple md5 hash based on the salt and password
2. Loop 1000 times, calculating a new md5 hash based on the previous hash concatenated with alternatingly the password and the salt.
3. Use a special base64 encoding on the final hash to create the password hash string

Put like this, it relatively elegant. However, there are a lot of details that turn this from elegant to eyerolling.

Here’s the real initialization.

1. Let “password” be the user’s ascii password, “salt” the ascii salt (truncated to 8 chars), and “magic” the string “\$1\$”
2. Start by computing the Alternate sum, `md5(password + salt + password)`
3. Compute the Intermediate0 sum by hashing the concatenation of the following strings:
2. Magic
3. Salt
4. length(password) bytes of the Alternate sum, repeated as necessary
5. For each bit in length(password), from low to high and stopping after the most significant set bit
• If the bit is set, append a NUL byte
• If it’s unset, append the first byte of the password

I know what you’re thinking, and yes, it’s very arbitrary. The latter part was most likely a bug in the original implementation, carried along as UNIX issues often are. Remember to stay tuned next week, when we’ll compare this to SHA512-crypt as used on new installations!

From this point on, the calculations will only involve the password, salt, and Intermediate0 sum. Now we loop 1000 times, to stretch the algorithm.

• For i = 0 to 999 (inclusive), compute Intermediatei+1 by concatenating and hashing the following:
1. If i is even, Intermediatei
2. If i is odd, password
3. If i is not divisible by 3, salt
4. If i is not divisible by 7, password
5. If i is even, password
6. If i is odd, Intermediatei

At this point you don’t need Intermediatei anymore.

You will now have ended up with Intermediate1000. Let’s call this the Final sum. Since MD5 is 128bit, this is 16 bytes long.

The bytes will be rearranged, and then encoded as 22 ascii characters with a special base64-type encoding. This is not the same as regular base64:

```Normal base64 set:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

Crypt base64 set:
./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
```

Additionally, there is no padding. The leftover byte will be encoded into 2 base64 ascii characters.

1. Output the magic
2. Output the salt
3. Output a “\$” to separate the salt from the encrypted section
4. Pick out the 16 bytes in this order: 11 4 10 5 3 9 15 2 8 14 1 7 13 0 6 12.
• For each group of 6 bits (there are 22 groups), starting with the least significant
• Output the corresponding base64 character with this index

Congratulations, you now have a compatible md5-crypt hash!

As you can see, it’s quite far removed from a naive `md5(password)` attempt.

Fortunately, one will only ever need this algorithm for compatibility. New applications can use the standard PBKDF2 algorithm, implemented by most cryptography libraries, which does the same thing only in a standardized and parameterized way.

As if this wasn’t bad enough, the next post next week will be more of the same, but with SHA512-crypt!

## Why Bash is like that: Signal propagation

August 3rd, 2011

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

```How do I simulate pressing Ctrl-C when running this in a script:
while true; do echo sleeping; sleep 30; done```

Are you thinking “SIGINT, duh!”? Hold your horses!

```I tried kill -INT pid, but it doesn't work the same:

Ctrl-C    kills the sleep and the loop
SIGINTing the shell does nothing (but only in scripts: see Errata)
SIGINTing sleep makes the loop continue with the next iteration

HOWEVER, if I run the script in the background and `kill -INT %1`
instead of `kill -INT pid`, THEN it works :O```

Why does Ctrl-C terminate the loop, while SIGINT doesn’t?

```Additionally, if I run the same loop with ping or top instead of sleep,
Ctrl-C doesn't terminate the loop either!```

Yeah. Well… Yeah…

This behaviour is due to an often overlooked feature in UNIX: process groups. These are important for getting terminals and shells to work the way they do.

A process group is exactly what it sounds like: a group of processes. They have a leader, which is the process that created it using setsid(2). The leader’s pid is also the process group id. Child processes are in the same group as their parent by default.

Terminals keep track of the foreground process group (set by the shell using tcsetpgrp(3)). When receiving a Ctrl-C, they send the SIGINT to the entire foreground group. This means that all members of the group will receive SIGINT, not just the immediate process.

`kill -INT %1` sends the signal to the job’s process group, not the backgrounded pid! This explains why it works like Ctrl-C.

You can do the same thing with `kill -INT -pgrpid`. Since the process group id is the same as the process group leader, you can kill the group by killing the pid with a minus in front.

But why do you have to kill both?

When the shell is interrupted, it will wait for the running command to exit. If this child’s status indicates it exited abnormally due to that signal, the shell cleans up, removes its signal handler, and kills itself again to trigger the OS default action (abnormal exit). Alternatively, it runs the script’s signal handler as set with `trap`, and continues.

If the shell is interrupted and the child’s status says it exited normally, then Bash assumes the child handled the signal and did something useful, so it continues executing. Ping and top both trap SIGINT and exit normally, which is why Ctrl-C doesn’t kill the loop when calling them.

This also explains why interrupting just the shell does nothing: the child exits normally, so the shell thinks the child handled the signal, though in reality it was never received.

Finally, if the shell isn’t interrupted and a child exits, Bash just carries on regardless of whether the signal died abnormally or not. This is why interrupting the sleep just continues with the loop.

In case one would like to handle such cases, Bash sets the exit code to `128+signal` when the process exits abnormally, so interrupting sleep with SIGINT would give the exit code 130 (`kill -l` lists the signal values).

Bonus problem:

```I have this C app, testpg:
int main() {
setsid();
return sleep(10);
}

I run `bash -c './testpg'` and press Ctrl-C. The app is killed.
Shouldn't testpg be excluded from SIGINT, since it used setsid?```

A quick strace unravels this mystery: with a single command to execute, bash execve’s it directly — a little optimization trick. Since the pid is the same and already had its own process group, creating a new one doesn’t have any effect.

This trick can’t be used if there are more commands, so `bash -c './testpg; true'` can’t be killed with Ctrl-C.

Errata:

```Wait, I started a loop in one terminal and killed the shell in another.
The loop exited!
```

Yes it did! This does not apply to interactive shells, which have different ways of handling signals. When job control is enabled (running interactively, or when running a script with `bash -m`), the shell will die when SIGINTed

Here’s the description from the bash source code, `jobs.c:2429`:

```  /* Ignore interrupts while waiting for a job run without job control
to finish.  We don't want the shell to exit if an interrupt is
received, only if one of the jobs run is killed via SIGINT.
...
```

## Why Bash is like that: suid

July 26th, 2011

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

`Why can't bash scripts be SUID?`

Bash scripts can’t run with the suid bit set. First of all, Linux doesn’t allow any scripts to be setuid, though some other OS do. Second, bash will detect being run as setuid, and immediately drop the privileges.

This is because shell script security is extremely dependent on the environment, much more so than regular C apps.

Take this script, for example, addmaildomain:

```#!/bin/sh
[[ \$1 ]] || { man -P cat \$0; exit 1; }

if grep -q "^\$(whoami)\\$" /etc/accesslist
then
echo "\$1" > /etc/mail/local-host-names
else
echo "You don't have permissions to add hostnames"
fi```

The intention is to allow users in /etc/accesslist to run `addmaildomain example.com` to write new names to local-host-names, the file which defines which domains sendmail should accept mail for.

Let’s imagine it runs as suid root. What can we do to abuse it?

We can start by setting the path:

```echo "rm -rf /" > ~/hax/grep && chmod a+x ~/hax/grep

Now the script will run our grep instead of the system grep, and we have full root access.

Let’s assume the author was aware of that, had set `PATH=/bin:/usr/bin` as the first line in the script. What can we do now?

We can override a library used by grep

```gcc -shared -o libc.so.6 myEvilLib.c

When grep is invoked, it’ll link with our library and run our evil code.

Ok, so let’s say LD_LIBRARY_PATH is closed up.

If the shell is statically linked, we can set LD_TRACE_LOADED_OBJECTS=true. This will cause dynamically linked executables to print out a list of library dependencies and return true. This would cause our grep to always return true, subverting the test. The rest is builtin and wouldn’t be affected.

Even if the shell is statically compiled, all variables starting with LD_* will typically be stripped by the kernel for suid executables anyways.

There is a delay between the kernel starting the interpretter, and the interpretter opening the file. We can try to race it:

```while true
do
nice -n 20 ./foo &
echo 'rm -rf /' > foo
done```

But let’s assume the OS uses a /dev/fd/* mechanism for passing a fd, instead of passing the file name.

We can rename the script to confuse the interpretter:

```ln /usr/bin/addmaildomain ./-i
PATH=.
-i```

Now we’ve created a link, which retains suid, and named it “-i”. When running it, the interpretter will run as “/bin/sh -i”, giving us an interactive shell.

So let’s assume we actually had “#!/bin/sh –” to prevent the script from being interpretted as an option.

If we don’t know how to use the command, it helpfully lists info from the man page for us. We can compile a C app that executes the script with “\$0” containing “-P /hax/evil ls”, and then man will execute our evil program instead of cat.

So let’s say “\$0” is quoted. We can still set MANOPT=-H/hax/evil.

Several of these attacks were based on the inclusion of ‘man’. Is this a particularly vulnerable command?

Perhaps a bit, but a whole lot of apps can be affected by the environment in more and less dramatic ways.

• POSIXLY_CORRECT can make some apps fail or change their output
• LANG/LC_ALL can thwart interpretation of output
• LC_CTYPE and LC_COLLATE can modify string comparisons
• Some apps rely on HOME and USER
• Various runtimes have their own paths, like RUBY_PATH, LUA_PATH and PYTHONPATH
• Many utilities have variables for adding default options, like RUBYOPT and MANOPT
• Tools invoke EDITOR, VISUAL and PAGER under various circumstances

So yes, it’s better not to write suid shell scripts. Sudo is better than you at running commands safely.

Do remember that a script can invoke itself with sudo, if need be, for a simulated suid feel.

So wait, can’t perl scripts be suid?

They can indeed, but there the interpretter will run as the normal user and detect that the file is suid. It will then run a suid interpretter to deal with it.

## What’s in a SSH RSA key pair?

June 3rd, 2011

You probably have your own closely guarded ssh key pair. Chances are good that it’s based on RSA, the default choice in ssh-keygen.

RSA is a very simple and quite brilliant algorithm, and this article will show what a SSH RSA key pair contains, and how you can use those values to play around with and encrypt values using nothing but a calculator.

RSA is based on primes, and the difficulty of factoring large numbers. This post is not meant as an intro to RSA, but here’s a quick reminder. I’ll use mostly the same symbols as Wikipedia: you generate two large primes, p and q. Let φ = (p-1)(q-1). Pick a number e coprime to φ, and let d ≡ e^-1 mod φ.

The public key is then (e, n), while your private key is (d, n). To encrypt a number/message m, let the ciphertext c ≡ m^e mod n. Then m ≡ c^d mod n.

This is very simple modular arithmetic, but when you generate a key pair with ssh-keygen, you instead get a set of opaque and scary looking files, id_rsa and id_rsa.pub. Here’s a bit from the private key id_rsa (no passphrase):

``` -----BEGIN RSA PRIVATE KEY----- MIIBygIBAAJhANj3rl3FhzmOloVCXXesVPs1Wa++fIBX7BCZ5t4lmMh36KGzkQmn jDJcm+O9nYhoPx6Bf+a9yz0HfzbfA5OpqQAyC/vRTVDgHhGXY6HFP/lyWQ8DRzCh tsuP6eq9RYHnxwIBIwJhAKdf+4oqqiUWOZn//vXrV3/19LrGJYeU```

` `

```... -----END RSA PRIVATE KEY----- ```

How can we get our nice RSA parameters from this mess?

The easy way is with openssl: (I apologize in advance for all the data spam in the rest of the article).

``` vidar@vidarholen ~/.ssh \$ openssl rsa -text -noout < id_rsa Private-Key: (768 bit) modulus: 00:d8:f7:ae:5d:c5:87:39:8e:96:85:42:5d:77:ac: 54:fb:35:59:af:be:7c:80:57:ec:10:99:e6:de:25: ... publicExponent: 35 (0x23) privateExponent: 00:a7:5f:fb:8a:2a:aa:25:16:39:99:ff:fe:f5:eb: 57:7f:f5:f4:ba:c6:25:87:94:48:64:93:fb:3d:a7: ... prime1: ... prime2: ... exponent1: ... exponent2: ... coefficient: ... ```

Here, modulus is n, publicExponent is e, privateExponent is d, prime1 is p, prime2 is q, exponent1 is dP from the Wikipedia article, exponent2 is dQ and coefficient is qInv.

Only the first three are strictly required to perform encryption and decryption. The latter three are for optimization and the primes are for verification.

It’s interesting to note that even though the private key from RSA’s point of view is (d,n), the OpenSSH private key file includes e, p, q and the rest as well. This is how it can generate public keys given the private ones. Otherwise, finding e given (d,n) is just as hard as finding d given (e,n), except e is conventionally chosen to be small and easy to guess for efficiency purposes.

If we have one of these hex strings on one line, without colons, and in uppercase, then bc can work on them and optionally convert to decimal.

``` # If you don't want to do this yourself, see end for a script vidar@vidarholen ~/.ssh \$ { echo 'ibase=16'; cat | tr -d ':\n ' | tr a-f A-F; echo; } | bc 00:d8:f7:ae:5d:c5:87:39:8e:96:85:42:5d:77:ac: 54:fb:35:59:af:be:7c:80:57:ec:10:99:e6:de:25: 98:c8:77:e8:a1:b3:91:09:a7:8c:32:5c:9b:e3:bd: .... Ctrl-d to end input 13158045936463264355006370413708684112837853704660293756254884673628\ 63292... ```

We also need a power-modulo function, since b^e % m is unfeasibly slow if you go by way of b^e. Luckily, bc is programmable.

``` vidar@vidarholen ~/.ssh \$ bc bc 1.06.94 Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc. This is free software with ABSOLUTELY NO WARRANTY. For details type `warranty'.```

```# Our powermod function: define pmod(b,e,m) { if(e == 0 ) return 1; if(e == 1) return b%m; rest=pmod(b^2%m,e/2,m); if((e%2) == 1) return (b*rest)%m else return rest; } ```

``` #Define some variables (this time unabbreviated) n=13158045936463264355006370413708684112837853704660293756254884673628\ 63292777770859554071108633728590995985653161363101078779505801640963\ 48597350763180843221886116453606059623113097963206649790257715468881\ 4303031148479239044926138311 e=35 d=10150492579557375359576342890575270601332058572166512326253768176799\ 23111571423234513140569517447770196903218153051479115016036905320557\ 80231250287900874055062921398102953416891810163858645414303785372309\ 5688315939617076008144563059 # Encrypt the number 12345 c=pmod(12345, e, n) # Show the encrypted number c 15928992191730477535088375321366468550579140816267293144554503305092\ 03492035891240033089011563910196180080894311697511846432462334632873\ 53515625 ```

``` #Decrypt the number pmod(c, d, n) 12345 ```

Yay, we’ve successfully encrypted and decrypted a value using real life RSA parameters!

What’s in the public key file, then?

`ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAGEA2PeuXcWHOY6WhUJdd6xU+zVZr758gFfsEJnm3iWYyHfoobORCaeMMlyb472diGg/HoF/5r3LPQd/Nt8Dk6mpADIL+9FNUOAeEZdjocU/+XJZDwNHMKG2y4/p6r1FgefH vidar@vidarholen.spam`

This is a very simple file format, but I don’t know of any tools that will decode it. Simply base64-decode the middle string, and then read 4 bytes of length, followed by that many bytes of data. Repeat three times. You will then have key type, e and n, respectively.

Mine is 00 00 00 07, followed by 7 bytes “ssh-rsa”. Then 00 00 00 01, followed by one byte of 0x23 (35, our e). Finally, 00 00 00 61 followed by 0x61 = 97 bytes of our modulus n.

If you want to decode the private key by hand, base64-decode the middle bit. This gives you an ASN.1 encoded sequence of integers.

This is an annotated hex dump of parts of a base64-decoded private key

```30 82 01 ca   - Sequence, 0x01CA bytes
02 01: Integer, 1 byte
00
02 61:    - Integer, 0x61 bytes (n).
00 d8 f7 ae 5d c5 87 39 8e 96 ... Same as from openssl!
02 01:  - Integer, 1 byte, 0x23=35 (e)
23
02 61  - Integer, 0x61 bytes (d)
00 a7 5f fb 8a 2a aa 25 16 39 ...
...```

Here’s a bash script that will decode a private key and output variable definitions and functions for bc, so that you can play around with it without having to do the copy-paste work yourself. It decodes ASN.1, and only requires OpenSSL if the key has a passphrase.

When run, and its output pasted into bc, you will have the variables n, e, d, p, q and a few more, functions encrypt(m) and decrypt(c), plus a verify() that will return 1 if the key is valid. These functions are very simple and transparent.

Enjoy!

## MP3 to Video using GStreamer visualizations

January 25th, 2011

Everyone loves music visualization, but not all apps support it in a sensible way. Maybe you want to shuffle a random assortment of video and audio files in a player that doesn’t handle that well (VLC!), or not at all (mplayer!). Or maybe you want to upload something to youtube, with gorgeous HD visualizations instead of that lame static cover art image?

The few google results on the topic that weren’t spam suggested screencapping software. Yeah, that’s great… until you have more than two files.

Once again, everyone’s favourite multimedia swiss army knife – GStreamer – steps up to the plate.

Here’s an example of encoding an MP3 to an H.264 .mkv file using the gorgeous goom visualizer (requires the mp3 and x264 plugins for gstreamer):

``` gst-launch filesrc location=input.mp3 ! queue ! tee name=stream ! queue ! mp3parse ! matroskamux name=mux ! filesink location="output.mkv" stream. ! queue ! mp3parse ! mad ! audioconvert ! queue ! goom ! ffmpegcolorspace ! video/x-raw-yuv,width=1280,height=720 ! x264enc ! mux. ```

It’s beautiful – and the video is pretty sweet as well.

It’s worth noting that this approach does not re-encode the MP3, like some less awesome approaches would do (causing loss of quality). It simply muxes it together with the visualizer’s video stream. x264 even seems to distribute itself well across cores.

No, wait, what? MP3 and H.264? Of course, I meant Vorbis and Theora! Let me rephrase:

``` gst-launch filesrc location=input.ogg ! queue ! tee name=stream ! queue ! oggdemux ! vorbisparse ! oggmux name=mux ! filesink location="output.ogg" stream. ! queue ! oggdemux ! vorbisdec ! audioconvert ! queue ! goom ! ffmpegcolorspace ! video/x-raw-yuv,width=1920,height=1080 ! theoraenc ! mux. ```

The same goodness applies, except for the parallelism. If you have a multicore CPU, there’s massive speedup to be had through simple shell script based multithreading. (Why full HD this time? VLC on Windows crashes on 720p Theora!)

And there you have it. A simple, hack-free, modular and flexible way of encoding visualization videos for MP3 and Ogg Vorbis files. Thanks, GStreamer!

## Is it terminal?

November 9th, 2010

Applications often behave differently in subtle ways when stdout is not a terminal. Most of the time, this is done so smoothly that the user isn’t even aware of it.

When it works like magic

Consider `ls`:

```vidar@vidarholen ~/src \$ ls
PyYAML-3.09      bsd-games-2.17       nltk-2.0b9
alsa-lib-1.0.23  libsamplerate-0.1.7  pulseaudio-0.9.21
bash-4.0         linux                tmp
bitlbee-1.2.8    linux-2.6.32.8
vidar@vidarholen ~/src \$```

Now, say we want a list of projects in our ~/src dir, ignoring version numbers:

```# For novelty purposes only; parsing ls is a bad idea
vidar@vidarholen ~/src \$ ls | sed -n 's/-[^-]*\$//p'
PyYAML
alsa-lib
bash
bitlbee
bsd-games
libsamplerate
linux
nltk
pulseaudio
vidar@vidarholen ~/src \$```

Piece of cake, right?

But think about the magic that actually happened there: We started out with three lines of coloured text, ran it through sed to search&replace on each line, and ended up with nine lines of uncoloured text.

How did sed filter the colours? How did it put each filename a separate line, when the same does not happen for `echo "foo bar" | sed ..`?

The answer, of course, is that it didn’t. `ls` detected that output wasn’t a terminal and altered its output accordingly.

When outputting to a terminal, you can be fairly sure that the user will be reading it directly, so you can make it as pretty and unparsable as you want. When output is not a terminal, it’s likely going to some program or file where pretty output will just complicate things.

Life without magic

Try the previous example with `ls -C --color=always` instead of just `ls`, and see how different life would have been without this terminal detection. You can also try this with xargs, to see how colours could break things:

```vidar@vidarholen ~/src \$ ls -C --color=always | xargs ls -ld
ls: cannot access PyYAML-3.09: No such file or directory
ls: cannot access alsa-lib-1.0.23: No such file or directory
...```

The directories obviously exist, but the ANSI escape codes that give them that cute colour also prevents utilities from working with them. For additional fun, copy-pasting this error message from a terminal strips the colours, so anyone you reported it to would be quite stumped.

Magic efficiency tricks

It’s not all about making output pretty or parsable depending on the situation. Read/write syscalls are notoriously expensive; reading anything less than about 4k bytes at a time will make disk reads CPU bound.

glibc knows this, and will alter write buffering depending on the context. If the output is a terminal, a user is probably watching and waiting for it, so it will flush output immediately. If it’s a file, it’s better to buffer it up for efficiency:

``` vidar@kelvin ~ \$ strace -e write -o log grep God text/bible12.txt 01:001:001 In the beginning God created the heaven and the earth. ... vidar@kelvin ~ \$ wc -l log 3948 log ```

In other words, grep wrote about god 3948 times (insert your own bible forum jokes).

``` vidar@kelvin ~ \$ strace -e write -o log grep God text/bible12.txt > tmp vidar@kelvin ~ \$ wc -l log 64 log ```

This time, grep produced the exact same output, but wrote to a file instead. This resulted in 64 writes – about 1% of the more interactive mode!

Spells of confusion

Sometimes magic can confuse and astound. What if output is kinda like a terminal, only not?

`ls -l` gives the user pretty colours. `ls -l | more` does not. The reason is not at all obvious for users who just consider ” | more” a way to scroll in output. But it works, even if it’s not as pretty as we’d like.

Here’s a much more confusing example (just go along with the simplified grep):

```# Show apache traffic (works)
cat access.log

# Show 404 errors with line numbers (works)
cat access.log | grep 404 | nl```

Basic stuff.

```# Show apache traffic in realtime (works)
tail -f access.log

# Show 404 errors with line numbers in realtime (FAILS)
tail -f access.log | grep 404 | nl```

While the logic is the same as before, our realtime error log doesn’t show anything!

Why? Because grep’s output isn’t a terminal, so it will buffer up about 4k worth of data before writing it all in one go. In the mean time, the command will just seem to hang for no apparent reason!

(Observant readers might ask, “Isn’t tail buffering?”. And it might be or it might not. It depends on your version and distro patches.)

Mastering magic

Ok, so what can we do to take charge of these useful peculiarities?

Many apps have flags for this, though none of them are POSIX.

GNU `ls` lets you specify `-C` for columned mode, and `--color=always` for colours, regardless of the nature of stdout.

`sed` has `-u`, `grep` has a `--line-buffered`. `awk` has a `fflush` function. `tail`, if yours buffers at all, has a `-u` since about 2008 which as of now isn’t in debian stable.

If your app doesn’t have such an option, there’s always `unbuffer` from Expect, the interactive tool scripting package.

`unbuffer` starts applications within its own pseudo-tty, much like how xterm and sshd does it. This usually tricks the application into not buffering (and maybe to prettify its output).

Obviously, this depends on the app using standard C stdio, or that it checks for a terminal itself. Apps can unintentionally be written to avoid this, like when setting Java’s System.Out to a BufferedOutputStream.

And finally… how can you create such behaviour yourself?

```if [[ -t 1 ]] #if stdout is a terminal
then
tput setaf 3 #Set foreground to yellow
fi
echo "Pure gold"```