Buffers and windows: The mystery of ‘ssh’ and ‘while read’ in excessive detail

If you’ve ever tried to use ssh, and similarly ffmpeg or mplayer, in a while read loop, you’ll have stumbled upon a surprising interaction: the loop mysteriously aborts after the first iteration!

The solution, using ssh -n or ssh < /dev/null, is quickly spoiled by ShellCheck (shellcheck this code online), but why stop there? Let’s take a deep dive into the technical details surrounding this issue.

Note that all numbers given here are highly tool and platform specific. They apply to GNU coreutils 8.26, Linux 4.9 and OpenSSH 7.4, as found on Debian in July, 2017. If you use a different platform, or even just sufficiently newer versions, and wish to repeat the experiments, you may have to follow the reasoning and update the numbers accordingly.

Anyways, to first demonstrate the problem, here’s a while read loop that runs a command for each line in a file:

while IFS= read -r host
do
  echo ssh "$host" uptime
done < hostlist.txt

It works perfectly and iterates over all lines in the file:

ssh localhost uptime
ssh 10.0.0.4 uptime
ssh 10.0.0.7 uptime

However, if we remove the echo and actually run ssh, it will stop after the first iteration with no warnings or errors:

 16:12:41 up 21 days,  4:24, 12 users,  load average: 0.00, 0.00, 0.00

Even uptime itself works fine, but ssh localhost uptime will stop after the first one, even though it runs the same command on the same machine.

Of course, applying the aforementioned fix, ssh -n or ssh < /dev/null solves the problem and gives the expected result:

16:14:11 up 21 days,  4:24, 12 users,  load average: 0.00, 0.00, 0.00
16:14:11 up 14 days,  6:59, 15 users,  load average: 0.00, 0.00, 0.00
01:14:13 up 73 days, 13:17,  8 users,  load average: 0.08, 0.15, 0.11

If we merely wanted to fix the problem though, we'd just have followed ShellCheck's advice from the start. Let's keep digging.

You see similar behavior if you try to use ffmpeg to convert media or mplayer to play it. However, now it's even worse: not only does it stop after one iteration, it may abort in the middle of the first one!

All other commands work fine -- even other converters, players and ssh-based commands like sox, vlc and scp. Why do certain commands fail?

The root of the problem is that when you pipe or redirect to a while read loop, you're not just redirecting to read but to the entire loop body. Everything in both condition and body will share the same file descriptor for standard input. Consider this loop:

while IFS= read -r line
do
  cat > rest
done < file.txt

First read will successfully read a line and start the first iteration. Then cat will read from the same input source, where read left off. It reads until EOF and exits, and the loop iterates. read again tries to read from the same input, which remains at EOF. This terminates the loop. In effect, the loop only iterated once.

The question remains, though: why do our three commands in particular drain stdin?

ffmpeg and mplayer are simple cases. They both accept keyboard controls from stdin.

While ffmpeg encodes a video, you can use '+' to make the process more verbose or 'c' to input filter commands. While mplayer plays a video, you can use 'space' to pause or 'm' to mute. The commands drain stdin while processing these keystrokes.

They both also share a shortcut to quit: they will stop abruptly if any of the input they read is a "q".

But why ssh? Shouldn't it mirror the behavior of the remote command? If uptime doesn't read anything, why should ssh localhost uptime?

The Unix process model has no good way to detect when a process wants input. Instead, ssh has to preemptively read data, send it over the wire, and have sshd offer it on a pipe to the process. If the process doesn't want it, there's no way to return the data to the FD from whence it came.

We get a toy version of the same problem with cat | uptime. Output in this case is the same as when using ssh localhost uptime:

 16:25:51 up 21 days,  4:34, 12 users,  load average: 0.16, 0.03, 0.01

In this case, cat will read from stdin and write to the pipe until the pipe's buffer is full, at which time it'll block until something reads. Using strace, we can see that GNU cat from coreutils 8.26 uses a 128KiB buffer -- more than Linux's current 64KiB pipe buffer -- so one 128KiB buffer is the amount of data we can expect to lose.

This implies that the loop doesn't actually abort. It will continue if there is still data left after 128KiB has been read from it. Let's try that:

{
  echo first
  for ((i=0; i < 16384; i++)); do echo "garbage"; done
  echo "second"
} > file

while IFS= read -r line
do
  echo "Read $line"
  cat | uptime > /dev/null
done < file

Here, we write 16386 lines to the file. "first", 16384 lines of "garbage", followed by "second". "garbage" + linefeed is 8 bytes, so 16384 of them make up exactly 128KiB. The file prevents any race conditions between producer and consumer.

Here's what we get:

Read first
Read second

If we add a single line additional line of "garbage", we'll see that instead. If we write one less, "second" disappears. In other words, the expected 128KiB of data were lost between iterations.

ssh has to do the same thing, but more: it has to read input, encrypt it, and transmit it over the wire. On the other side, sshd receives it, decrypts it, and feeds it into the pipe. Both sides work asynchronously in duplex mode, and one side can shut down the channel at any time.

If we use ssh localhost uptime we're racing to see how much data we can push before sshd notifies us that the command has already exited. The faster the computer and slower the roundtrip time, the more we can write. To avoid this and ensure deterministic results, we'll use sleep 5 instead of uptime from now on.

Here's one way of measuring how much data we write:

$ tee >(wc -c >&2) < /dev/zero | { sleep 5; }
65536

Of course, by showing how much it writes, it doesn't directly show how much sleep reads: the 65536 bytes here is the Linux pipe buffer size.

This is also not a general way to get exact measurements because it relies on buffers aligning perfectly. If nothing is reading from the pipe, you can successfully write two blocks of 32768 bytes, but only one block of 32769.

Fortunately, GNU tee currently uses a buffer size of 8192, so given 8 full reads, it will perfectly fill the 65536 byte pipe buffer. strace also reveals that ssh (OpenSSH 7.4) uses a buffer size of 16384, which is exactly 2x of tee and 1/4x of the pipe buffer, so they will all align nicely and give an accurate count.

Let's try it with ssh:

$ tee >(wc -c >&2) < /dev/zero | ssh localhost sleep 5
2228224 

As discussed, we'll subtract the pipe buffer, so we can surmise that 2162688 bytes has been read by ssh. We can verify this manually with strace if we want. But why 2162688?

On the other side, sshd has to feed this data into sleep through a pipe with no readers. That's another 65536. We're now left with 2097152 bytes. How can we account for these?

This number is in fact the OpenSSH transport layer's default window size for non-interactive channels!

Here's an excerpt from channels.h in the OpenSSH source code:

/* default window/packet sizes for tcp/x11-fwd-channel */
#define CHAN_SES_PACKET_DEFAULT	(32*1024)
#define CHAN_SES_WINDOW_DEFAULT	(64*CHAN_SES_PACKET_DEFAULT)

There it is: 64*32*1024 = 2097152.

If we adapt our previous example to use ssh anyhost sleep 5 and write "garbage"
(64*32*1024+65536)/8 = 270336 times, we can again game the buffers and cause our iterator to get exactly the lines we want:

{
  echo first
  for ((i=0; i < $(( (64*32*1024 + 65536) / 8)); i++)); do echo "garbage"; done
  echo "second"
} > file

while IFS= read -r line
do
  echo "Read $line"
  ssh localhost sleep 5
done < file

Again, this results in:

Read first
Read second

An entirely useless experiment of course, but pretty nifty!

Compiling Haskell for Windows on Travis CI

or: How I finally came around and started appreciating Docker.

tl;dr: ShellCheck is now automatically compiled for Windows using Wine+GHC in Docker, without any need for additional Windows CI.

I don’t know what initially surprised me more: that people were building ShellCheck on Windows before and without WSL, or that it actually worked.

Unless you’re a fan of the language, chances are that if you run any Haskell software at all, it’s one of pandoc, xmonad, or shellcheck. Unlike GCC, Haskell build tools are not something you ever just happen to have lying around.

While Haskell is an amazing language, it doesn’t come cheap. At 550 MB, GHC, the Haskell Compiler, is the single largest package on my Debian system. On Windows, the Haskell Platform — GHC plus standard tools and libraries — weighs in at 4,200 MB.

If you need to build your own software from source, 4 GB of build dependencies unique to a single application is enough to make you reconsider. This is especially true on Windows where you can’t just close your eyes and hit “yes” in your package manager.

Thanks to the awesome individuals who package ShellCheck for various distros, most people have no idea. To them, ShellCheck is just a 5 MB download without external dependencies.

Starting today, this includes Windows users!

This is obviously great for them, but the more interesting story is how this happens.

There’s no parallel integration with yet another CI system like Appveyor, eating the cost of Windows licenses in the hopes of future business. There’s not been a rise of a much-needed de facto standard package manager, with generous individuals donating their time.

It’s also not me booting Windows at home to manually compile executables on every release, nor a series of patches trying to convince GHC to target Windows from GNU/Linux.

It’s a Docker container with GHC and Cabal running in Wine.

Ugly? Yes. Does it matter? No. The gory details are all hidden away by Docker.

Anyone, including Travis CI, can now easily and automatically compile ShellCheck (or any other Haskell project for that matter) for Windows in two lines, without a Windows license.

If you want ShellCheck binaries for Windows, they’re linked to on the ShellCheck github repo. If you want to take a look at the Docker image, there’s a repo for that too.

ShellCheck has had an official Docker build for quite a while, but it was contribution (thanks, Nikyle!). I never really had any feelings for Docker, one way or the other.

Consider me converted.

[ -z $var ] works unreasonably well

There is a subreddit /r/nononoyes for videos of things that look like they’ll go horribly wrong, but amazingly turn out ok.

[ -z $var ] would belong there.

It’s a bash statement that tries to check whether the variable is empty, but it’s missing quotes. Most of the time, when dealing with variables that can be empty, this is a disaster.

Consider its opposite, [ -n $var ], for checking whether the variable is non-empty. With the same quoting bug, it becomes completely unusable:

Input Expected [ -n $var ]
“” False True!
“foo” True True
“foo bar” True False!

These issues are due to a combination of word splitting and the fact that [ is not shell syntax but traditionally just an external binary with a funny name. See my previous post Why Bash is like that: Pseudo-syntax for more on that.

The evaluation of [ is defined in terms of the number of argument. The argument values have much less to do with it. Ignoring negation, here’s a simplified excerpt from POSIX test:

# Arguments Action Typical example
0 False [ ]
1 True if $1 is non-empty [ "$var" ]
2 Apply unary operator $1 to $2 [ -x "/bin/ls" ]
3 Apply binary operator $2 to $1 and $3 [ 1 -lt 2 ]

Now we can see why [ -n $var ] fails in two cases:

When the variable is empty and unquoted, it’s removed, and we pass 1 argument: the literal string “-n”. Since “-n” is not an empty string, it evaluates to true when it should be false.

When the variable contains foo bar and is unquoted, it’s split into two arguments, and so we pass 3: “-n”, “foo” and “bar”. Since “foo” is not a binary operator, it evaluates to false (with an error message) when it should be true.

Now let’s have a look at [ -z $var ]:

Input Expected [ -z $var ] Actual test
“” True: is empty True 1 arg: is “-z” non-empty
“foo” False: not empty False 2 args: apply -z to foo
“foo bar” False: not empty False (error) 3 args: apply “foo’ to -z and bar

It performs a completely wrong and unexpected action for both empty strings and multiple arguments. However, both cases fail in exactly the right way!

In other words, [ -z $var ] works way better than it has any possible business doing.

This is not to say you can skip quoting of course. For “foo bar”, [ -z $var ] in bash will return the correct exit code, but prints an ugly error in the process. For ” ” (a string with only spaces), it returns true when it should be false, because the argument is removed as if empty. Bash will also incorrectly pass var="foo -o x" because it ends up being a valid test through code injection.

The moral of the story? Same as always: quote, quote quote. Even when things appear to work.

ShellCheck is aware of this difference, and you can check the code used here online. [ -n $var ] gets an angry red message, while [ -z $var ] merely gets a generic green quoting warning.

Swearing in the Linux kernel: now interactive

Graphs showing a rise in "crap" and fall in "fuck" over time.
 

If you’ve followed discussions on Linux, you may at some point have bumped into a funny graph showing how many times frustrated Linux kernel developers have put four letter words into the source code.

Today, for the first time in 12 years, it’s gotten a major revamp!

You can now interactively plot any words of your choice with commit level granularity.

 

Did you find any interesting insights? Post a comment!

Technically correct: floating point calculations in bc

Whenever someone asks how to do floating point math in a shell script, the answer is typically bc:

$  echo "scale=9; 22/7" | bc
3.142857142

However, this is technically wrong: bc does not support floating point at all! What you see above is arbitrary precision FIXED point arithmetic.

The user’s intention is obviously to do math with fractional numbers, regardless of the low level implementation, so the above is a good and pragmatic answer. However, technically correct is the best kind of correct, so let’s stop being helpful and start pedantically splitting hairs instead!

Fixed vs floating point

There are many important things that every programmer should know about floating point, but in one sentence, the larger they get, the less precise they are.

In fixed point you have a certain number of digits, and a decimal point fixed in place like on a tax form: 001234.56. No matter how small or large the number, you can always write down increments of 0.01, whether it’s 000000.01 or 999999.99.

Floating point, meanwhile, is basically scientific notation. If you have 1.23e-4 (0.000123), you can increment by a millionth to get 1.24e-4. However, if you have 1.23e4 (12300), you can’t add less than 100 unless you reserve more space for more digits.

We can see this effect in practice in any language that supports floating point, such as Haskell:

> truncate (16777216 - 1 :: Float)
16777215
> truncate (16777216 + 1 :: Float)
16777216

Subtracting 1 gives us the decremented number, but adding 1 had no effect with floating point math! bc, with its arbitrary precision fixed points, would instead correctly give us 16777217! This is clearly unacceptable!

Floating point in bc

The problem with the bc solution is, in other words, that the math is too correct. Floating point math always introduces and accumulates rounding errors in ways that are hard to predict. Fixed point doesn’t, and therefore we need to find a way to artificially introduce the same type of inaccuracies! We can do this by rounding a number to a N significant bits, where N = 24 for float and 52 for double. Here is some bc code for that:

scale=30

define trunc(x) {
  auto old, tmp
  old=scale; scale=0; tmp=x/1; scale=old
  return tmp
}
define fp(bits, x) {
  auto i
  if (x < 0) return -fp(bits, -x);
  if (x == 0) return 0;
  i=bits
  while (x < 1) { x*=2; i+=1; }
  while (x >= 2) { x/=2; i-=1; }
  return trunc(x * 2^bits + 0.5) / 2^(i)
}

define float(x) { return fp(24, x); }
define double(x) { return fp(52, x); }
define test(x) {
  print "Float:  ", float(x), "\n"
  print "Double: ", double(x), "\n"
}

With this file named fp, we can try it out:

$ bc -ql fp <<< "22/7"
3.142857142857142857142857142857

$ bc -ql fp <<< "float(22/7)"
3.142857193946838378906250000000

The first number is correct to 30 decimals. Yuck! However, with our floating point simulator applied, we get the desired floating point style errors after ~7 decimals!

Let's write a similar program for doing the same thing but with actual floating point, printing them out up to 30 decimals as well:

{-# LANGUAGE RankNTypes #-}
import Control.Monad
import Data.Number.CReal
import System.Environment

main = do
    input <- liftM head getArgs
    putStrLn . ("Float:  " ++) $ showNumber (read input :: Float)
    putStrLn . ("Double: " ++) $ showNumber (read input :: Double)
  where
    showNumber :: forall a. Real a => a -> String
    showNumber = showCReal 30 . realToFrac

Here's a comparison of the two:

$ bc -ql fp <<< "x=test(1000000001.3)"
Float:  1000000000.000000000000000000000000000000
Double: 1000000001.299999952316284179687500000000

$ ./fptest 1000000001.3
Float:  1000000000.0
Double: 1000000001.2999999523162841796875

Due to differences in rounding and/or off-by-one bugs, they're not always identical like here, but the error bars are similar.

Now we can finally start doing floating point math in bc!

Useless Use Of dd

tl;dr: dd works for reading and writing disks, but it has no advantages and some disadvantages over just treating the disk as a regular file. In many commands it’s entirely pointless.

If you’ve ever used dd, you’ve probably used it to read or write disk images:

# Write myfile.iso to a USB drive
dd if=myfile.iso of=/dev/sdb bs=1M

Usage of dd in this context is so pervasive that it’s being hailed as the magic gatekeeper of raw devices. Want to read from a raw device? Use dd. Want to write to a raw device? Use dd.

This belief adds complexity to simple commands. How do you combine dd with gzip? How do you use pv if the source is raw device? How do you dd over ssh?

People cleverly find ways to insert dd at the front and end of pipelines. dd if=/dev/sda | gzip > image.gz, they say. dd if=/dev/sda | pv | dd of=/dev/sdb.

In both these cases, dd serves no purpose. It’s purely a superstitious charm trying to ensure safe passage of the data. It’s like cat /dev/sda | pv | cat > /dev/sdb except not as efficient.

The fact of the matter is, dd is not a disk writing tool. Neither “d” is for “disk”, “drive” or “device”. It does not support “low level” reading or writing. It has no special dominion over any kind of device whatsoever.

dd just reads and writes file.

On UNIX, the adage goes, everything is a file. This includes raw disks. Since raw disks are files, and dd can be used to copy files, dd be used to copy raw disks.

But do you know what else can read and write files? Everything:

# Write myfile.iso to a USB drive
cp myfile.iso /dev/sdb

# Rip a cdrom to a .iso file
cat /dev/cdrom > myfile.iso

# Create a gzipped image
gzip -9 < /dev/sdb > /tmp/myimage.gz

dd can even end up doing a worse job. By specification, its default 512 block size has had to remain unchanged for decades. Today, this tiny size makes it CPU bound by default. A script that doesn’t specify a block size is very inefficient, and any script that picks the current optimal value may slowly become obsolete — or start obsolete if it’s copied from

Meanwhile, cat is free to choose its buffer size that best serves a modern system, and the GNU cat buffer size has grown steadily over the years from 512 bytes in 1991 to 131072 bytes in 2014. ./src/ioblksize.h in the coreutils source code has benchmarks backing up this decision.

However, this does not mean that dd should necessarily be categorically shunned! The reason why people started using it in the first place is that it does exactly what it’s told: no more and no less.

If an alias specifies -a, cp might try to create a new block device rather than a copy of the file data. If using gzip without redirection, it may try to be helpful and skip the file for not being regular. Neither of them will write out a reassuring status during or after a copy.

dd, meanwhile, has one job*: copy data from one place to another. It doesn’t care about files, safeguards or user convenience. It will not try to second guess your intent, based on trailing slashes or types of files.

However, when this is no longer a convenience, like when combining it with other tools that already read and write files, one should not feel guilty for leaving dd out entirely.

This is not to say I think dd is overrated! Au contraire! It’s one of my favorite Unix tools!

dd is the swiss army knife of the open, read, write and seek syscalls. It’s unique in its ability to issue seeks and reads of specific lengths, which enables a whole world of shell scripts that have no business being shell scripts. Want to simulate a lseek+execve? Use dd! Want to open a file with O_SYNC? Use dd! Want to read groups of three byte pixels from a PPM file? Use dd!

It’s a flexible, unique and useful tool, and I love it. My only issue is that, far too often, this great tool is being relegated to and inappropriately hailed for its most generic and least interesting capability: simply copying a file from start to finish.


* dd actually has two jobs: Convert and Copy. A post on comp.unix.misc (incorrectly) claimed that the intended name “cc” was taken by the C compiler, so the letters were shifted in the same way we ended up with a Window system called X. A more likely explanation is given in that thread as pointed out by PaweĊ‚ and Bruce in the comments: the name, syntax and purpose is almost identical to the JCL “Dataset Definition” command found in 1960s IBM mainframes.