[ -z $var ] works unreasonably well

There is a subreddit /r/nononoyes for videos of things that look like they’ll go horribly wrong, but amazingly turn out ok.

[ -z $var ] would belong there.

It’s a bash statement that tries to check whether the variable is empty, but it’s missing quotes. Most of the time, when dealing with variables that can be empty, this is a disaster.

Consider its opposite, [ -n $var ], for checking whether the variable is non-empty. With the same quoting bug, it becomes completely unusable:

Input Expected [ -n $var ]
“” False True!
“foo” True True
“foo bar” True False!

These issues are due to a combination of word splitting and the fact that [ is not shell syntax but traditionally just an external binary with a funny name. See my previous post Why Bash is like that: Pseudo-syntax for more on that.

The evaluation of [ is defined in terms of the number of argument. The argument values have much less to do with it. Ignoring negation, here’s a simplified excerpt from POSIX test:

# Arguments Action Typical example
0 False [ ]
1 True if $1 is non-empty [ "$var" ]
2 Apply unary operator $1 to $2 [ -x "/bin/ls" ]
3 Apply binary operator $2 to $1 and $3 [ 1 -lt 2 ]

Now we can see why [ -n $var ] fails in two cases:

When the variable is empty and unquoted, it’s removed, and we pass 1 argument: the literal string “-n”. Since “-n” is not an empty string, it evaluates to true when it should be false.

When the variable contains foo bar and is unquoted, it’s split into two arguments, and so we pass 3: “-n”, “foo” and “bar”. Since “foo” is not a binary operator, it evaluates to false (with an error message) when it should be true.

Now let’s have a look at [ -z $var ]:

Input Expected [ -z $var ] Actual test
“” True: is empty True 1 arg: is “-z” non-empty
“foo” False: not empty False 2 args: apply -z to foo
“foo bar” False: not empty False (error) 3 args: apply “foo’ to -z and bar

It performs a completely wrong and unexpected action for both empty strings and multiple arguments. However, both cases fail in exactly the right way!

In other words, [ -z $var ] works way better than it has any possible business doing.

This is not to say you can skip quoting of course. For “foo bar”, [ -z $var ] in bash will return the correct exit code, but prints an ugly error in the process. For ” ” (a string with only spaces), it returns true when it should be false, because the argument is removed as if empty. Bash will also incorrectly pass var="foo -o x" because it ends up being a valid test through code injection.

The moral of the story? Same as always: quote, quote quote. Even when things appear to work.

ShellCheck is aware of this difference, and you can check the code used here online. [ -n $var ] gets an angry red message, while [ -z $var ] merely gets a generic green quoting warning.

Swearing in the Linux kernel: now interactive

Graphs showing a rise in "crap" and fall in "fuck" over time.
 

If you’ve followed discussions on Linux, you may at some point have bumped into a funny graph showing how many times frustrated Linux kernel developers have put four letter words into the source code.

Today, for the first time in 12 years, it’s gotten a major revamp!

You can now interactively plot any words of your choice with commit level granularity.

 

Did you find any interesting insights? Post a comment!

Technically correct: floating point calculations in bc

Whenever someone asks how to do floating point math in a shell script, the answer is typically bc:

$  echo "scale=9; 22/7" | bc
3.142857142

However, this is technically wrong: bc does not support floating point at all! What you see above is arbitrary precision FIXED point arithmetic.

The user’s intention is obviously to do math with fractional numbers, regardless of the low level implementation, so the above is a good and pragmatic answer. However, technically correct is the best kind of correct, so let’s stop being helpful and start pedantically splitting hairs instead!

Fixed vs floating point

There are many important things that every programmer should know about floating point, but in one sentence, the larger they get, the less precise they are.

In fixed point you have a certain number of digits, and a decimal point fixed in place like on a tax form: 001234.56. No matter how small or large the number, you can always write down increments of 0.01, whether it’s 000000.01 or 999999.99.

Floating point, meanwhile, is basically scientific notation. If you have 1.23e-4 (0.000123), you can increment by a millionth to get 1.24e-4. However, if you have 1.23e4 (12300), you can’t add less than 100 unless you reserve more space for more digits.

We can see this effect in practice in any language that supports floating point, such as Haskell:

> truncate (16777216 - 1 :: Float)
16777215
> truncate (16777216 + 1 :: Float)
16777216

Subtracting 1 gives us the decremented number, but adding 1 had no effect with floating point math! bc, with its arbitrary precision fixed points, would instead correctly give us 16777217! This is clearly unacceptable!

Floating point in bc

The problem with the bc solution is, in other words, that the math is too correct. Floating point math always introduces and accumulates rounding errors in ways that are hard to predict. Fixed point doesn’t, and therefore we need to find a way to artificially introduce the same type of inaccuracies! We can do this by rounding a number to a N significant bits, where N = 24 for float and 52 for double. Here is some bc code for that:

scale=30

define trunc(x) {
  auto old, tmp
  old=scale; scale=0; tmp=x/1; scale=old
  return tmp
}
define fp(bits, x) {
  auto i
  if (x < 0) return -fp(bits, -x);
  if (x == 0) return 0;
  i=bits
  while (x < 1) { x*=2; i+=1; }
  while (x >= 2) { x/=2; i-=1; }
  return trunc(x * 2^bits + 0.5) / 2^(i)
}

define float(x) { return fp(24, x); }
define double(x) { return fp(52, x); }
define test(x) {
  print "Float:  ", float(x), "\n"
  print "Double: ", double(x), "\n"
}

With this file named fp, we can try it out:

$ bc -ql fp <<< "22/7"
3.142857142857142857142857142857

$ bc -ql fp <<< "float(22/7)"
3.142857193946838378906250000000

The first number is correct to 30 decimals. Yuck! However, with our floating point simulator applied, we get the desired floating point style errors after ~7 decimals!

Let's write a similar program for doing the same thing but with actual floating point, printing them out up to 30 decimals as well:

{-# LANGUAGE RankNTypes #-}
import Control.Monad
import Data.Number.CReal
import System.Environment

main = do
    input <- liftM head getArgs
    putStrLn . ("Float:  " ++) $ showNumber (read input :: Float)
    putStrLn . ("Double: " ++) $ showNumber (read input :: Double)
  where
    showNumber :: forall a. Real a => a -> String
    showNumber = showCReal 30 . realToFrac

Here's a comparison of the two:

$ bc -ql fp <<< "x=test(1000000001.3)"
Float:  1000000000.000000000000000000000000000000
Double: 1000000001.299999952316284179687500000000

$ ./fptest 1000000001.3
Float:  1000000000.0
Double: 1000000001.2999999523162841796875

Due to differences in rounding and/or off-by-one bugs, they're not always identical like here, but the error bars are similar.

Now we can finally start doing floating point math in bc!

dd is not a disk writing tool

If you’ve ever used dd, you’ve probably used it to read or write disk images:

# Write myfile.iso to a USB drive
dd if=myfile.iso of=/dev/sdb bs=1M

Usage of dd in this context is so pervasive that it’s being hailed as the magic gatekeeper of raw devices. Want to read from a raw device? Use dd. Want to write to a raw device? Use dd.

This belief can make simple tasks complicated. How do you combine dd with gzip? How do you use pv if the source is raw device? How do you dd over ssh?

The fact of the matter is, dd is not a disk writing tool. Neither “d” is for “disk”, “drive” or “device”. It does not support “low level” reading or writing. It has no special dominion over any kind of device whatsoever.

dd just reads and writes file.

On UNIX, the adage goes, everything is a file. This includes raw disks. Since raw disks are files, and dd can copy files, dd be used to copy raw disks.

But do you know what else can read and write files? Everything:

# Write myfile.iso to a USB drive
cp myfile.iso /dev/sdb

# Rip a cdrom to a .iso file
cat /dev/cdrom > myfile.iso

# Create a gzipped image
gzip -9 < /dev/sdb > /tmp/myimage.gz

However, this does not mean that dd is useless! The reason why people started using it in the first place is that it does exactly what it’s told, no more and no less.

If an alias specifies -a, cp might try to create a new block device rather than a copy of the file data. If using gzip without redirection, it may try to be helpful and skip the file for not being regular. Neither of them will write out a reassuring status during or after a copy.

dd, meanwhile, has one job*: copy data from one place to another. It doesn’t care about files, safeguards or user convenience. It will not try to second guess your intent, based on trailing slashes or types of files. When this is no longer a convenience, like when combining it with other tools that already read and write files, one should not feel guilty for leaving dd out entirely.

This is not to say I think dd is overrated! Au contraire! It’s one of my favorite Unix tools!

dd is the swiss army knife of the open, read, write and seek syscalls. It’s unique in its ability to issue seeks and reads of specific lengths, which enables a whole world of shell scripts that have no business being shell scripts. Want to simulate a lseek+execve? Use dd! Want to open a file with O_SYNC? Use dd! Want to read groups of three byte pixels from a PPM file? Use dd!

It’s a flexible, unique and useful tool, and I love it. My only issue is that, far too often, this great tool is being relegated to and inappropriately hailed for its most generic and least interesting capability: simply copying a file from start to finish.


* dd actually has two jobs: Convert and Copy. A post on comp.unix.misc (incorrectly) claimed that the intended name “cc” was taken by the C compiler, so the letters were shifted in the same way we ended up with a Window system called X. A more likely explanation is given in that thread as pointed out by PaweĊ‚ and Bruce in the comments: the name, syntax and purpose is almost identical to the JCL “Dataset Definition” command found in 1960s IBM mainframes.

I’m not paranoid, you’re just foolish

Remember this dialog from when you installed your distro?

Fake dialog saying "In the event of physical theft, grant perpetrators access to" with options for "My browsing history, My email and social media, My photos and documents, and similar". All boxes are checked by default.

Most distros have a step like this. If you don’t immediately recognize it, you might have used a different installer with different wording. For example, the graphical Ubuntu installer calls it “Encrypt the new Ubuntu installation for security”, while the text installer even more opaquely calls it “Use entire disk and set up encrypted LVM”.

Somehow, some people have gotten it into their heads that not granting the new owner access to all your data after they steal your computer is a sign of paranoia. It’s 2015, and there actually exists people who have information based jobs and spend half their lives online, who not only think disk encryption is unnecessary but that it’s a sign you’re doing something illegal.

I have no idea what kind of poorly written crime dramas or irrational prime ministers they get this ridiculous notion from.

An office desk covered in broken glass after a break-in.
The last time my laptop was stolen from a locked office building.

Here’s a photo from 2012, when my company laptop was taken from a locked office with an alarm system.

Was this the inevitable FBI raid I was expecting and encrypted my drive to thwart?

Or was it a junkie stealing an office computer from a company and user who, thanks to encryption, didn’t have to worry about online accounts, design documents, or the source code for their unreleased product?

Hundreds of thousands of computers are lost or stolen every year. I’m not paranoid for using disk encryption, you’re just foolish if you don’t.

Why Bash Is Like That: Rewrite hacks

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

Let’s say you wanted to enforce a policy in which no files on the system could contain swearing. How would you write a script that checks it? Let’s use the word “damn”, and let’s write a script “checklanguage” that checks whether a file contains that word.

Our first version might be:

#!/usr/bin/env bash
grep -q "damn" "$@" 

The problem with this is that it triggers on itself: ./checklanguage checklanguage returns true. How can we write the script in such a way that it reliably detects the word, but doesn’t detect itself? (Think about it for a second).

There are many ways of doing this: a="da"; b="mn"; grep "$a$b", grep "da""mn", grep da\mn. All of these check for the four characters d-a-m-n in sequence, but doesn’t contain the sequence itself. These methods rely on two things being A. identical in one context (shell script) and B. different in another (plaintext).

This type of trick is the basis of three common command line hacks:

Finding processes from ps, while excluding the grep that does the filtering.

If we do a simple ps ax | grep processname, we might get output like this:

$ ps ax | grep processname
13003 pts/2    S      0:00 /bin/bash ./processname
13496 pts/4    R+     0:00 grep --color=auto processname

How do we get the same list, but without the grep process? You’ll see people wrapping the first character in square brackets:

$ ps ax | grep "[p]rocessname"
13003 pts/2    S      0:00 /bin/bash ./processname

In this case, the regex “[p]rocessname” is identical to the regex “processname”, but since they’re written differently, the latter matches itself while the former doesn’t. This means that the grep won’t match itself, and we only get the process we’re interested in (this job is better done by pgrep).

There is no syntax rule that says “if the first character is enclosed in square brackets, grep shall ignore itself in ps output”.

It’s just a logical side effect of rewriting the regex to work the same but not match itself. We could have used grep -E 'process()name' or grep -E 'proces{2}name' instead.

Running commands instead of aliases

Maybe you’re sick of Debian’s weird perl rename, and you aliased it to rename.ul instead.

$ rename -v .htm .html *
`foo.htm' -> `foo.html'

Yay, that’s way easier than writing regex! But what if we need to use the unaliased rename?

$ rename -v 's/([1-9])x([0-9]*)/S$1E$2/' *
rename.ul: not enough arguments

Instead, you’ll see people prefixing the command with a backslash:

$ \rename -v 's/([1-9])x([0-9]*)/S0$1E$2/' *
Foo_1x20.mkv renamed as Foo_S01E20.mkv

Shell aliases trigger when a command starts with a word. However, if the command starts with something that expands into a word, alias expansion does not apply. This allows us to use e.g. \ls or \git to run the command instead of the alias.

There is no syntax rule that says that “if a command is preceded by a backslash, alias expansion is ignored”.

It’s just a logical side effect of rewriting the command to work the same, but not start with a literal token that the shell will recognize as an alias. We could also have used l\s or 'ls'.

Deleting files starting with a dash

How would you go about deleting a file that starts with a dash?

$ rm -v -file
rm: invalid option -- 'l'

Instead, you’ll see people prefixing the filename with ./:

$ rm -v ./-file
removed `./-file'

A command will interpret anything that starts with a dash as a flag. However, to the file system, -file and ./-file mean exactly the same thing.

There is no syntax rule that says that “if an argument starts with ./, it shall be interpretted as a filename and not an option”.

It’s just a logical side effect of rewriting a filename to refer to the same file, but start with a different character. We could have used rm /home/me/-file or rm ../me/-file instead.


Homework: What do you tell someone who thinks that ./myscript is a perfect example of how weird UNIX is? Why would anyone design a system where the run command is “./” instead of “run”?