Why Bash is like that: Builtin or not

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

# Why don't the options in "man time" work?
time -f %w myapp

Short answer: ‘time’ runs the builtin version, ‘man time’ shows the external version

time is built into the shell, as well as being an external command (the same goes for kill, pwd, and test). man time shows info about the external command, while help time shows the shell's own version. (Strictly speaking, bash implements time as a keyword rather than an ordinary builtin, which is what lets it see a whole pipeline.)

To run the external version, one can use command time or /usr/bin/time or just \time.

The reason why time is part of the shell is so that timing pipelines works properly. time true | sleep 10 would say 0 seconds with an external command (which can't know what it's being piped into), while the shell's version can say 10 seconds since it knows about the whole pipeline.
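
A quick way to see the difference (a sketch; the exact output format differs between the shell version and GNU time):

time true | sleep 10      # shell version: reports ~10 seconds, the whole pipeline
\time true | sleep 10     # external version: reports ~0 seconds, it only sees 'true'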

POSIX leaves the behaviour of time a | b undefined.

# This finds the full path to ls. Why isn't there a 'man type'?
type -P ls

type is a bash builtin, not an external command. This allows it to take shell functions and aliases into account, something whereis can’t.

Builtins are documented in man bash, or more conveniently, "help type" (help is also a builtin).
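
For example, in an interactive shell:

alias ll='ls -l'
type ll          # reports that ll is an alias; whereis finds nothing for ll
type -P ls       # the path to the external binary, e.g. /bin/ls
help type        # documentation for the type builtin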

Why Bash is like that: Order of expansion


# Why can't you use variables in {}?
for i in {0..$n}; do ..

Short answer: {..} is expanded before $n

Shell execution is based on successive expansions. {..} is evaluated early, before variable expansion, so you can't use variables in it.

This also implies that you can use {..} in variable names, like a1=foo; a2=bar; echo $a{1,2}.

Instead, use for ((i=0; i<n; i++)); do ....
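
Here's how the two behave (note the <=, since {0..$n} was meant to be inclusive):

n=5
for i in {0..$n}; do echo "$i"; done          # one iteration: prints {0..5}
for ((i=0; i<=n; i++)); do echo "$i"; done    # six iterations: prints 0 through 5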

# Why aren't any of the Linux rename tools like Windows 'ren'?
touch foo0001.jpg foo0002.jpg
ren foo*.jpg bar*.jpg        # Windows
rename foo bar foo*.jpg      # util-linux rename
rename 's/foo/bar/' foo*.jpg # Perl (debian) rename

Short answer: globs are expanded before the command sees them

Bash expands globs before running the command. This means that running rename foo*.jpg bar*.jpg is exactly the same as running rename foo0001.jpg foo0002.jpg bar*.jpg (bar*.jpg matches no files, so it's left as-is). Since rename can't know what pattern was originally used, it has to use an alternative syntax.
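
You can see exactly what a command will receive by printing the arguments:

touch foo0001.jpg foo0002.jpg
printf '<%s> ' rename foo*.jpg bar*.jpg; echo
# output: <rename> <foo0001.jpg> <foo0002.jpg> <bar*.jpg>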

Of course, you could write a rename where you quote the globs, like rename "foo*.jpg" "bar*.jpg", but that's not simpler than the util-linux version. It just adds complexity, edge cases and general confusion.

There have been proposals for environment variables that would let commands see the original, unexpanded globs, but they had problems of their own and never saw wide use.

Why Bash is like that: Command expansion


# Why does esc contain "x1B" instead of the escape char?
esc=`printf \\x1B`

Short answer: `..` requires another level of backslash escaping

To embed a backtick inside `..`, you escape it with a backslash. To embed a backslash, you escape it with another backslash. So `printf \\x1B` actually runs printf \x1B, and the shell interprets \x as a literal x with a superfluous escape. In other words, printf just sees "x1B", and that's what you get.

The problem grows exponentially as you try to nest ` `.

$(..) has distinct start and stop characters, so they can be used and nested without adding layers of backslashes. In this particular case you’d use esc=$'\x1B', and in general you could use esc=$(printf \\x1B).
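
A sketch of all the variants (od is just there to verify the result):

esc=`printf \\\\x1B`      # backticks: four backslashes to get \x1B to printf
esc=$(printf '\x1B')      # $(..): no extra layer of escaping needed
esc=$'\x1B'               # simplest: ANSI-C quoting, no command at all
printf '%s' "$esc" | od -An -tx1    # should print 1b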

# Why is newline empty instead of containing the line feed character?
newline=$(printf "\n")
echo "hello${newline}world"

Short answer: `..` and $(..) strip trailing line feeds

$(..) and `..` always strip trailing line feeds from command output. This is that special kind of magic that works so well you never think about it. echo "Hello $(whoami), how are you?" comes out as one line even though whoami (and basically all other commands) writes the username followed by a line feed.

This causes problems here because the output is only a single \n, i.e. the empty string followed by a trailing line feed. In this case, you'd again use newline=$'\n', but you could also have done newline=$(printf '\n'; printf x); newline=${newline%x} (append an x so the line feed isn't trailing, then strip the x from the end).
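
Spelled out:

newline=$'\n'                        # the easy way
newline=$(printf '\nx')              # output ends in x, so nothing is stripped
newline=${newline%x}                 # then remove the protective x from the end
printf '%s' "$newline" | od -An -c   # should show \n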

# Bonus: Exercise for the reader
# Try adding the appropriate quotes and
# escapes to this quote normalization example
normalized=`echo ``Go on'', he said | sed -e s/`/'/g`

The variable should contain ''Go on'', he said. Can you get it right on the first try?

Why Bash is like that: Pseudo-syntax


# Why doesn't || work in [ ] ?
if [ -f /etc/inetd.conf || -d /etc/xinetd.d ]; then ..

# And why does it work in [[ ]] ?
if [[ -f /etc/inetd.conf || -d /etc/xinetd.d ]]; then ..

Short answer: [ is a regular command, and can’t override ||. [[ is shell syntax.

[ is a pseudo-syntactical command. That is, it’s a regular command much like cp or grep, the name just happens to be a single opening square bracket (try ls -l /usr/bin/[).

Seeing as how it's a regular command, it can't affect shell grammar. In grep -q kittens file || echo kittens >> file, grep can't know or do anything about the fact that it's being used with ||. The same goes for [ in the example.

In Bash (but not necessarily other shells), [ is now a builtin command emulating /usr/bin/[ for efficiency. There’s no reason why Bash couldn’t make [ a || b ] work, but this would break compatibility.

Instead, we have [[ which is not bound by legacy, and does in fact alter shell syntax. [[ interprets ||, &&, globs and unquoted variable expansions in ways that an external [-command couldn’t, and an internal [ therefore can’t either.

To get around it without using [[, we’d do [ a ] || [ b ] or [ a -o b ].
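
Applied to the original example (POSIX marks -o as obsolescent, so the first form is the safer one):

if [ -f /etc/inetd.conf ] || [ -d /etc/xinetd.d ]; then ..
if [ -f /etc/inetd.conf -o -d /etc/xinetd.d ]; then ..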

# Why is this always true (should be false when foo is empty)?
if [ -n $foo ]

Short answer: $foo disappears and [ -n ] is shorthand for [ -n "-n" ], which is true

This comes back to the fact that [ is a regular command. If $foo is empty, the shell runs [ -n ], and there’s no way for [ to know that there was a variable that was expanded out.

[ -n x ] checks that x is not empty. [ x ] is shorthand for the same. [ -n ] therefore checks if dash-n is empty, which it isn’t, and thus the expression is always true.

Why doesn’t [ complain if you have -n without a parameter? Same thing again. It wouldn’t be able to tell that you said [ $1 ] rather than [ -n ], and then scripts would break whenever you work with your parameter list or any other strings starting with a dash.

Instead, you’d use [ -n "$foo" ], which will prevent bash from removing foo when it’s empty, and instead send in a zero-length argument. Or you could use [[ -n $foo ]], since [[ is built into the shell grammar and can tell that there is a variable there.
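
All three side by side, with foo empty:

foo=""
[ -n $foo ]   && echo one     # expands to [ -n ], so this always prints
[ -n "$foo" ] && echo two     # zero-length argument: correctly silent
[[ -n $foo ]] && echo three   # shell grammar sees the variable: correctly silent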

# Why doesn't this work?
if [ grep -q text myfile ]

Short answer: [ is a command, grep is a command. Choose one.

if [ -n "$foo" ] is the same as if test -n "$foo", which makes it more obvious how to use other commands (if grep -q text myfile).

Pseudo-syntactical elements make it harder to tell language from implementation, for better and for worse. It makes code look neat, but can be confusing, especially since basically all if-statements include [.

A similar effect is seen in here-documents (cmd << EOF), where many people only ever see the delimiter “EOF” and think it’s some kind of keyword. Next time, use cmd << KITTENS to tip off anyone reading your script.
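
For example:

cat << KITTENS
The delimiter is arbitrary; EOF is just a convention.
KITTENS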

MP3 to Video using GStreamer visualizations

[Image: VLC showing a sparkly, shiny visualization]
Everyone loves music visualization, but not all apps support it in a sensible way. Maybe you want to shuffle a random assortment of video and audio files in a player that doesn't handle that well (VLC!), or not at all (mplayer!). Or maybe you want to upload something to YouTube, with gorgeous HD visualizations instead of that lame static cover art image?

The few Google results on the topic that weren't spam suggested screencapping software. Yeah, that's great… until you have more than two files.

Once again, everyone’s favourite multimedia swiss army knife – GStreamer – steps up to the plate.

Here’s an example of encoding an MP3 to an H.264 .mkv file using the gorgeous goom visualizer (requires the mp3 and x264 plugins for gstreamer):


gst-launch filesrc location=input.mp3 ! queue ! tee name=stream \
    ! queue ! mp3parse ! matroskamux name=mux ! filesink location="output.mkv" \
    stream. ! queue ! mp3parse ! mad ! audioconvert ! queue ! goom \
    ! ffmpegcolorspace ! video/x-raw-yuv,width=1280,height=720 ! x264enc ! mux.

It’s beautiful – and the video is pretty sweet as well.

It’s worth noting that this approach does not re-encode the MP3, like some less awesome approaches would do (causing loss of quality). It simply muxes it together with the visualizer’s video stream. x264 even seems to distribute itself well across cores.

No, wait, what? MP3 and H.264? Of course, I meant Vorbis and Theora! Let me rephrase:


gst-launch filesrc location=input.ogg ! queue ! tee name=stream \
    ! queue ! oggdemux ! vorbisparse ! oggmux name=mux ! filesink location="output.ogg" \
    stream. ! queue ! oggdemux ! vorbisdec ! audioconvert ! queue ! goom \
    ! ffmpegcolorspace ! video/x-raw-yuv,width=1920,height=1080 ! theoraenc ! mux.

The same goodness applies, except for the parallelism. If you have a multicore CPU, there’s massive speedup to be had through simple shell script based multithreading. (Why full HD this time? VLC on Windows crashes on 720p Theora!)
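
That can be as simple as backgrounding one pipeline per file and waiting for the batch. A rough sketch, where encode_one is a hypothetical function wrapping the gst-launch command above:

# encode_one is a hypothetical wrapper around the pipeline above
for f in *.ogg; do
    encode_one "$f" &    # one encoder process per file
done
wait                     # block until they've all finished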

And there you have it. A simple, hack-free, modular and flexible way of encoding visualization videos for MP3 and Ogg Vorbis files. Thanks, GStreamer!

Is it terminal?

Applications often behave differently in subtle ways when stdout is not a terminal. Most of the time, this is done so smoothly that the user isn’t even aware of it.

When it works like magic

Consider ls:

vidar@vidarholen ~/src $ ls
PyYAML-3.09      bsd-games-2.17       nltk-2.0b9
alsa-lib-1.0.23  libsamplerate-0.1.7  pulseaudio-0.9.21
bash-4.0         linux                tmp
bitlbee-1.2.8    linux-2.6.32.8
vidar@vidarholen ~/src $

Now, say we want a list of projects in our ~/src dir, ignoring version numbers:

# For novelty purposes only; parsing ls is a bad idea
vidar@vidarholen ~/src $ ls | sed -n 's/-[^-]*$//p'
PyYAML
alsa-lib
bash
bitlbee
bsd-games
libsamplerate
linux
nltk
pulseaudio
vidar@vidarholen ~/src $

Piece of cake, right?

But think about the magic that actually happened there: We started out with four lines of coloured text, ran it through sed to search&replace on each line, and ended up with nine lines of uncoloured text.

How did sed filter the colours? How did it put each filename on a separate line, when the same does not happen for echo "foo bar" | sed ..?

The answer, of course, is that it didn’t. ls detected that output wasn’t a terminal and altered its output accordingly.

When outputting to a terminal, you can be fairly sure that the user will be reading it directly, so you can make it as pretty and unparsable as you want. When output is not a terminal, it’s likely going to some program or file where pretty output will just complicate things.

Life without magic

Try the previous example with ls -C --color=always instead of just ls, and see how different life would have been without this terminal detection. You can also try this with xargs, to see how colours could break things:

vidar@vidarholen ~/src $ ls -C --color=always | xargs ls -ld
ls: cannot access PyYAML-3.09: No such file or directory
ls: cannot access alsa-lib-1.0.23: No such file or directory
...

The directories obviously exist, but the ANSI escape codes that give them that cute colour also prevent utilities from working with them. For additional fun, copy-pasting this error message from a terminal strips the colours, so anyone you reported it to would be quite stumped.

Magic efficiency tricks

It’s not all about making output pretty or parsable depending on the situation. Read/write syscalls are notoriously expensive; reading anything less than about 4k bytes at a time will make disk reads CPU bound.

glibc knows this, and will alter write buffering depending on the context. If the output is a terminal, a user is probably watching and waiting for it, so it will flush output immediately. If it’s a file, it’s better to buffer it up for efficiency:


vidar@kelvin ~ $ strace -e write -o log grep God text/bible12.txt
01:001:001 In the beginning God created the heaven and the earth.
...
vidar@kelvin ~ $ wc -l log
3948 log

In other words, grep wrote about god 3948 times (insert your own bible forum jokes).


vidar@kelvin ~ $ strace -e write -o log grep God text/bible12.txt > tmp
vidar@kelvin ~ $ wc -l log
64 log

This time, grep produced the exact same output, but wrote to a file instead. This resulted in 64 writes – less than 2% of the interactive mode!

Spells of confusion

Sometimes magic can confuse and astound. What if output is kinda like a terminal, only not?

ls -l gives the user pretty colours. ls -l | more does not. The reason is not at all obvious for users who just consider " | more" a way to scroll in output. But it works, even if it's not as pretty as we'd like.

Here’s a much more confusing example (just go along with the simplified grep):

# Show apache traffic (works)
cat access.log

# Show 404 errors with line numbers (works)
cat access.log | grep 404 | nl

Basic stuff.

# Show apache traffic in realtime (works)
tail -f access.log

# Show 404 errors with line numbers in realtime (FAILS)
tail -f access.log | grep 404 | nl

While the logic is the same as before, our realtime error log doesn’t show anything!

Why? Because grep's output isn't a terminal, so it will buffer up about 4k worth of data before writing it all in one go. In the meantime, the command will just seem to hang for no apparent reason!

(Observant readers might ask, “Isn’t tail buffering?”. And it might be or it might not. It depends on your version and distro patches.)

Mastering magic

Ok, so what can we do to take charge of these useful peculiarities?

Many apps have flags for this, though none of them are POSIX.

GNU ls lets you specify -C for columned mode, and --color=always for colours, regardless of the nature of stdout.

sed has -u, grep has --line-buffered, and awk has an fflush function. tail, if yours buffers at all, has had a -u option since about 2008, though as of this writing it isn't in Debian stable.
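
With one of these, the realtime example works as intended:

tail -f access.log | grep --line-buffered 404 | nl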

If your app doesn’t have such an option, there’s always unbuffer from Expect, the interactive tool scripting package.

unbuffer starts applications within its own pseudo-tty, much like how xterm and sshd do it. This usually tricks the application into not buffering (and perhaps also into prettifying its output).
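
For example, assuming unbuffer is installed (-p is its flag for use in the middle of a pipeline):

tail -f access.log | unbuffer -p grep 404 | nl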

Obviously, this depends on the app using standard C stdio, or on it checking for a terminal itself. Apps can unintentionally defeat the trick, such as by pointing Java's System.out at a BufferedOutputStream.

And finally… how can you create such behaviour yourself?

if [[ -t 1 ]]              # if stdout is a terminal
then
    tput setaf 3           # set the foreground colour to yellow
fi
echo "Pure gold"
[[ -t 1 ]] && tput sgr0    # reset colours so the terminal isn't left yellow