Buffers and windows: The mystery of ‘ssh’ and ‘while read’ in excessive detail

If you’ve ever tried to use ssh, and similarly ffmpeg or mplayer, in a while read loop, you’ll have stumbled upon a surprising interaction: the loop mysteriously aborts after the first iteration!

The solution, using ssh -n or ssh < /dev/null, is quickly spoiled by ShellCheck (shellcheck this code online), but why stop there? Let’s take a deep dive into the technical details surrounding this issue.

Note that all numbers given here are highly tool and platform specific. They apply to GNU coreutils 8.26, Linux 4.9 and OpenSSH 7.4, as found on Debian in July, 2017. If you use a different platform, or even just sufficiently newer versions, and wish to repeat the experiments, you may have to follow the reasoning and update the numbers accordingly.

Anyways, to first demonstrate the problem, here’s a while read loop that runs a command for each line in a file:

while IFS= read -r host
  echo ssh "$host" uptime
done < hostlist.txt

It works perfectly and iterates over all lines in the file:

ssh localhost uptime
ssh uptime
ssh uptime

However, if we remove the echo and actually run ssh, it will stop after the first iteration with no warnings or errors:

 16:12:41 up 21 days,  4:24, 12 users,  load average: 0.00, 0.00, 0.00

Even uptime itself works fine, but ssh localhost uptime will stop after the first one, even though it runs the same command on the same machine.

Of course, applying the aforementioned fix, ssh -n or ssh < /dev/null solves the problem and gives the expected result:

16:14:11 up 21 days,  4:24, 12 users,  load average: 0.00, 0.00, 0.00
16:14:11 up 14 days,  6:59, 15 users,  load average: 0.00, 0.00, 0.00
01:14:13 up 73 days, 13:17,  8 users,  load average: 0.08, 0.15, 0.11

If we merely wanted to fix the problem though, we'd just have followed ShellCheck's advice from the start. Let's keep digging.

You see similar behavior if you try to use ffmpeg to convert media or mplayer to play it. However, now it's even worse: not only does it stop after one iteration, it may abort in the middle of the first one!

All other commands work fine -- even other converters, players and ssh-based commands like sox, vlc and scp. Why do certain commands fail?

The root of the problem is that when you pipe or redirect to a while read loop, you're not just redirecting to read but to the entire loop body. Everything in both condition and body will share the same file descriptor for standard input. Consider this loop:

while IFS= read -r line
  cat > rest
done < file.txt

First read will successfully read a line and start the first iteration. Then cat will read from the same input source, where read left off. It reads until EOF and exits, and the loop iterates. read again tries to read from the same input, which remains at EOF. This terminates the loop. In effect, the loop only iterated once.

The question remains, though: why do our three commands in particular drain stdin?

ffmpeg and mplayer are simple cases. They both accept keyboard controls from stdin.

While ffmpeg encodes a video, you can use '+' to make the process more verbose or 'c' to input filter commands. While mplayer plays a video, you can use 'space' to pause or 'm' to mute. The commands drain stdin while processing these keystrokes.

They both also share a shortcut to quit: they will stop abruptly if any of the input they read is a "q".

But why ssh? Shouldn't it mirror the behavior of the remote command? If uptime doesn't read anything, why should ssh localhost uptime?

The Unix process model has no good way to detect when a process wants input. Instead, ssh has to preemptively read data, send it over the wire, and have sshd offer it on a pipe to the process. If the process doesn't want it, there's no way to return the data to the FD from whence it came.

We get a toy version of the same problem with cat | uptime. Output in this case is the same as when using ssh localhost uptime:

 16:25:51 up 21 days,  4:34, 12 users,  load average: 0.16, 0.03, 0.01

In this case, cat will read from stdin and write to the pipe until the pipe's buffer is full, at which time it'll block until something reads. Using strace, we can see that GNU cat from coreutils 8.26 uses a 128KiB buffer -- more than Linux's current 64KiB pipe buffer -- so one 128KiB buffer is the amount of data we can expect to lose.

This implies that the loop doesn't actually abort. It will continue if there is still data left after 128KiB has been read from it. Let's try that:

  echo first
  for ((i=0; i < 16384; i++)); do echo "garbage"; done
  echo "second"
} > file

while IFS= read -r line
  echo "Read $line"
  cat | uptime > /dev/null
done < file

Here, we write 16386 lines to the file. "first", 16384 lines of "garbage", followed by "second". "garbage" + linefeed is 8 bytes, so 16384 of them make up exactly 128KiB. The file prevents any race conditions between producer and consumer.

Here's what we get:

Read first
Read second

If we add a single line additional line of "garbage", we'll see that instead. If we write one less, "second" disappears. In other words, the expected 128KiB of data were lost between iterations.

ssh has to do the same thing, but more: it has to read input, encrypt it, and transmit it over the wire. On the other side, sshd receives it, decrypts it, and feeds it into the pipe. Both sides work asynchronously in duplex mode, and one side can shut down the channel at any time.

If we use ssh localhost uptime we're racing to see how much data we can push before sshd notifies us that the command has already exited. The faster the computer and slower the roundtrip time, the more we can write. To avoid this and ensure deterministic results, we'll use sleep 5 instead of uptime from now on.

Here's one way of measuring how much data we write:

$ tee >(wc -c >&2) < /dev/zero | { sleep 5; }

Of course, by showing how much it writes, it doesn't directly show how much sleep reads: the 65536 bytes here is the Linux pipe buffer size.

This is also not a general way to get exact measurements because it relies on buffers aligning perfectly. If nothing is reading from the pipe, you can successfully write two blocks of 32768 bytes, but only one block of 32769.

Fortunately, GNU tee currently uses a buffer size of 8192, so given 8 full reads, it will perfectly fill the 65536 byte pipe buffer. strace also reveals that ssh (OpenSSH 7.4) uses a buffer size of 16384, which is exactly 2x of tee and 1/4x of the pipe buffer, so they will all align nicely and give an accurate count.

Let's try it with ssh:

$ tee >(wc -c >&2) < /dev/zero | ssh localhost sleep 5

As discussed, we'll subtract the pipe buffer, so we can surmise that 2162688 bytes has been read by ssh. We can verify this manually with strace if we want. But why 2162688?

On the other side, sshd has to feed this data into sleep through a pipe with no readers. That's another 65536. We're now left with 2097152 bytes. How can we account for these?

This number is in fact the OpenSSH transport layer's default window size for non-interactive channels!

Here's an excerpt from channels.h in the OpenSSH source code:

/* default window/packet sizes for tcp/x11-fwd-channel */
#define CHAN_SES_PACKET_DEFAULT	(32*1024)

There it is: 64*32*1024 = 2097152.

If we adapt our previous example to use ssh anyhost sleep 5 and write "garbage"
(64*32*1024+65536)/8 = 270336 times, we can again game the buffers and cause our iterator to get exactly the lines we want:

  echo first
  for ((i=0; i < $(( (64*32*1024 + 65536) / 8)); i++)); do echo "garbage"; done
  echo "second"
} > file

while IFS= read -r line
  echo "Read $line"
  ssh localhost sleep 5
done < file

Again, this results in:

Read first
Read second

An entirely useless experiment of course, but pretty nifty!

Swearing in the Linux kernel: now interactive

Graphs showing a rise in "crap" and fall in "fuck" over time.

If you’ve followed discussions on Linux, you may at some point have bumped into a funny graph showing how many times frustrated Linux kernel developers have put four letter words into the source code.

Today, for the first time in 12 years, it’s gotten a major revamp!

You can now interactively plot any words of your choice with commit level granularity.


Did you find any interesting insights? Post a comment!

Why Bash Is Like That: Rewrite hacks

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

Let’s say you wanted to enforce a policy in which no files on the system could contain swearing. How would you write a script that checks it? Let’s use the word “damn”, and let’s write a script “checklanguage” that checks whether a file contains that word.

Our first version might be:

#!/usr/bin/env bash
grep -q "damn" "$@" 

The problem with this is that it triggers on itself: ./checklanguage checklanguage returns true. How can we write the script in such a way that it reliably detects the word, but doesn’t detect itself? (Think about it for a second).

There are many ways of doing this: a="da"; b="mn"; grep "$a$b", grep "da""mn", grep da\mn. All of these check for the four characters d-a-m-n in sequence, but doesn’t contain the sequence itself. These methods rely on two things being A. identical in one context (shell script) and B. different in another (plaintext).

This type of trick is the basis of three common command line hacks:

Finding processes from ps, while excluding the grep that does the filtering.

If we do a simple ps ax | grep processname, we might get output like this:

$ ps ax | grep processname
13003 pts/2    S      0:00 /bin/bash ./processname
13496 pts/4    R+     0:00 grep --color=auto processname

How do we get the same list, but without the grep process? You’ll see people wrapping the first character in square brackets:

$ ps ax | grep "[p]rocessname"
13003 pts/2    S      0:00 /bin/bash ./processname

In this case, the regex “[p]rocessname” is identical to the regex “processname”, but since they’re written differently, the latter matches itself while the former doesn’t. This means that the grep won’t match itself, and we only get the process we’re interested in (this job is better done by pgrep).

There is no syntax rule that says “if the first character is enclosed in square brackets, grep shall ignore itself in ps output”.

It’s just a logical side effect of rewriting the regex to work the same but not match itself. We could have used grep -E 'process()name' or grep -E 'proces{2}name' instead.

Running commands instead of aliases

Maybe you’re sick of Debian’s weird perl rename, and you aliased it to rename.ul instead.

$ rename -v .htm .html *
`foo.htm' -> `foo.html'

Yay, that’s way easier than writing regex! But what if we need to use the unaliased rename?

$ rename -v 's/([1-9])x([0-9]*)/S$1E$2/' *
rename.ul: not enough arguments

Instead, you’ll see people prefixing the command with a backslash:

$ \rename -v 's/([1-9])x([0-9]*)/S0$1E$2/' *
Foo_1x20.mkv renamed as Foo_S01E20.mkv

Shell aliases trigger when a command starts with a word. However, if the command starts with something that expands into a word, alias expansion does not apply. This allows us to use e.g. \ls or \git to run the command instead of the alias.

There is no syntax rule that says that “if a command is preceded by a backslash, alias expansion is ignored”.

It’s just a logical side effect of rewriting the command to work the same, but not start with a literal token that the shell will recognize as an alias. We could also have used l\s or 'ls'.

Deleting files starting with a dash

How would you go about deleting a file that starts with a dash?

$ rm -v -file
rm: invalid option -- 'l'

Instead, you’ll see people prefixing the filename with ./:

$ rm -v ./-file
removed `./-file'

A command will interpret anything that starts with a dash as a flag. However, to the file system, -file and ./-file mean exactly the same thing.

There is no syntax rule that says that “if an argument starts with ./, it shall be interpretted as a filename and not an option”.

It’s just a logical side effect of rewriting a filename to refer to the same file, but start with a different character. We could have used rm /home/me/-file or rm ../me/-file instead.

Homework: What do you tell someone who thinks that ./myscript is a perfect example of how weird UNIX is? Why would anyone design a system where the run command is “./” instead of “run”?

Basics of a Bash action game

If you want to write an action game in bash, you need the ability to check for user input without actually waiting for it. While bash doesn’t let you poll the keyboard in a great way, it does let you wait for input for a miniscule amount of time with read -t 0.0001.

Here’s a snippet that demonstrates this by bouncing some text back and forth, and letting the user control position and color. It also sets (and unsets) the necessary terminal settings for this to look good:

#!/usr/bin/env bash

# Reset terminal on exit
trap 'tput cnorm; tput sgr0; clear' EXIT

# invisible cursor, no echo
tput civis
stty -echo

text="j/k to move, space to color"
max_x=$(($(tput cols) - ${#text}))
dir=1 x=1 y=$(($(tput lines)/2))

while sleep 0.05 # GNU specific!
    # move and change direction when hitting walls
    (( x == 0 || x == max_x )) && \
        ((dir *= -1))
    (( x += dir ))

    # read all the characters that have been buffered up
    while IFS= read -rs -t 0.0001 -n 1 key
        [[ $key == j ]] && (( y++ ))
        [[ $key == k ]] && (( y-- ))
        [[ $key == " " ]] && color=$((color%7+1))

    # batch up all terminal output for smoother action
        tput cup "$y" "$x"
        tput setaf "$color"
        printf "%s" "$text"

    # dump to screen
    printf "%s" "$framebuffer"

Making bash run DOS/Windows CRLF EOL scripts

If you for any reason use a Windows editor to write scripts, it can be annoying to remember to convert them and bash fails in mysterious ways when you don’t. Let’s just get rid of that problem once and for all:

cat > $'/bin/bash\r' << "EOF"
#!/usr/bin/env bash
exec bash <(tr -d '\r' < "$script") "$@"

This allows you to execute scripts with DOS/Windows \r\n line endings with ./yourscript (but it will fail if the script specifies parameters on the shebang, or if you run it with bash yourscript). It works because from a UNIX point of view, DOS/Windows files specify the interpretter as "bash^M", and we override that to clean the script and run bash on the result.

Of course, you can also replace the helpful exec bash part with echo "Run dos2unix on your file!" >&2 if you'd rather give your users a helpful reminder rather than compatibility or a crazy error.

Approaches to data recovery

There are a lot of howtos and tutorials for using data recovery tools in Linux, but far less on how to choose a recovery tool or approach in the first place. Here’s an overview with suggestions for which route to go or tool to use:

Cause Outlook Tools
Forgotten login password Fantastic Any livecd
This barely qualifies as data recovery, but is included for completeness. If you forget the login password, you can just boot a livecd and mount the drive to access the files. You can also chroot into it and reset the password. Google “linux forgot password”.
Accidentally deleting files in use Excellent lsof, cp
When accidentally deleting a file that is still in use by some process — like an active log file or the source of a video you’re encoding — make sure the process doesn’t exit (sigstop if necessary) and copy the file from the /proc file handle. Google “lsof recover deleted files”
Accidentally deleting other files Fair for harddisks, bad for SSDs testdisk, ext3grep, extundelete
When deleting a file that’s not currently being held open, stop as much disk activity as you can to prevent the data from being overwritten. If you’re using an SSD, the data was probably irrevocably cleared within seconds, so bad luck there. Proceed with an fs specific undeletion tool: Testdisk can undelete NTFS, VFAT and ext2, extundelete/ext3grep can help with ext3 and ext4. Google “YourFS undeletion”. If you can’t find an undeletion tool for your file systems, or if it fails, try PhotoRec.
Trashing the MBR or deleting partitions Excellent gpart (note: not gparted), testdisk
If you delete a partition with fdisk or recover the MBR from a backup while forgetting that it also contains a partition table, gpart or testdisk will usually easily recover them. If you overwrite any more than the first couple of kilobytes though, it’s a different ballgame. Just don’t confuse gpart (guess partitions) with gparted (gtk/graphical partition editor). Google “recover partition table”.
Reformatting a file system Depends on fs e2fsck, photorec, testdisk
If you format the wrong partition, recovery depends on the old and new file system. Try finding unformat/recovery tools for your old fs. Accidentally formatting a ext3 fs to ntfs (like Windows helpfully suggests when it detects a Linux fs) can often be almost completely reverted by running fsck with an alternate superblock. Google “ext3 alternate superblock recovery” or somesuch.

Reformatting ext2/3/4 with ext2/3/4 will tend to overwrite the superblocks, making this harder. Consider PhotoRec.

Repartition and reinstall Depends on progress
If you ran a distro installer and accidentally repartitioned and reformatted a disk, try treating it as a case of deleted partitions plus reformatted partitions as described above. Chances of recovery are smaller the more files the installer copied to the partitions. If all else fails, PhotoRec.
Bad sectors and drive errors Ok, depending on extent ddrescue
If the drive has errors, use ddrescue to get as much of the data as possible onto another drive, then treat it as a corrupted file system. Try the fs’ fsck tool, or if the drive is highly corrupted, PhotoRec.
Lost encryption key Very bad bash, cryptsetup
I don’t know of any tools made for attempting to crack a LUKS password, though you can generate permutations and script a simple cracker if you have limited number of permutations (“it was Swordfish with some l33t, and a few numbers at the end”). If you have no idea, or if your encryption software uses TPM (rare for Linux), you’re screwed.
Reformatted or partially overwritten LUKS partition Horrible
LUKS uses your passphrase to encrypt a master key, and stores this info at the start of the partition. If this gets overwritten, you’re screwed even if you know the passphrase.
Other kinds of corruptions or unknown FS Indeterminable PhotoRec, strings, grep
PhotoRec searches by file signature, and can therefore recover files from a boatload of FS and scenarios, though you’ll often lose filenames and hierarchies. If you have important ASCII data, strings can dump ASCII text regardless of FS, and you can grep that as a last resort.

If you have other suggestions for scenarios, tools or approaches, leave a commment. Otherwise, I’ll wish you a speedy recovery!