Why Bash is like that: Signal propagation

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

How do I simulate pressing Ctrl-C when running this in a script:

while true; do echo sleeping; sleep 30; done

Are you thinking “SIGINT, duh!”? Hold your horses!

I tried kill -INT pid, but it doesn’t work the same:

  • Ctrl-C kills the sleep and the loop
  • SIGINTing the shell does nothing (but only in scripts: see Errata)
  • SIGINTing sleep makes the loop continue with the next iteration

HOWEVER, if I run the script in the background and kill -INT %1 instead of kill -INT pid, THEN it works :O

Why does Ctrl-C terminate the loop, while SIGINT doesn’t?

Additionally, if I run the same loop with ping or top instead of sleep, Ctrl-C doesn’t terminate the loop either!

Yeah. Well… Yeah…

This behaviour is due to an often overlooked feature in UNIX: process groups. These are important for getting terminals and shells to work the way they do.

A process group is exactly what it sounds like: a group of processes. They have a leader, which is the process that created it using setpgrp(2). The leader’s pid is also the process group id. Child processes are in the same group as their parent by default.

Terminals keep track of the foreground process group (set by the shell using tcsetpgrp(3)). When receiving a Ctrl-C, they send the SIGINT to the entire foreground group. This means that all members of the group will receive SIGINT, not just the immediate process.
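
You can watch this from another terminal while the loop is running. A sketch, assuming the loop is in the foreground on pts/0 (adjust to your own terminal device):

ps -o pid,pgid,comm -t pts/0
# The shell running the loop and the current sleep share the same PGID,
# which is exactly the group the terminal signals on Ctrl-C.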

kill -INT %1 sends the signal to the job’s process group, not the backgrounded pid! This explains why it works like Ctrl-C.

You can do the same thing with kill -INT -pgrpid. Since the process group id is the same as the pid of the process group leader, you can kill the group by using the leader’s pid with a minus in front.
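
For example, a sketch assuming the loop is saved as loop.sh and started from an interactive shell (so it gets its own process group):

./loop.sh &
kill -INT -"$!"    # the leading minus signals the whole process group, like Ctrl-C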

But why do you have to kill both?

When the shell is interrupted, it will wait for the running command to exit. If this child’s status indicates it exited abnormally due to that signal, the shell cleans up, removes its signal handler, and kills itself again to trigger the OS default action (abnormal exit). Alternatively, it runs the script’s signal handler as set with trap, and continues.

If the shell is interrupted and the child’s status says it exited normally, then Bash assumes the child handled the signal and did something useful, so it continues executing. Ping and top both trap SIGINT and exit normally, which is why Ctrl-C doesn’t kill the loop when calling them.
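
You can imitate ping and top with a stand-in. A sketch where the child traps SIGINT and exits normally:

while true
do
    echo sleeping
    bash -c 'trap "exit 0" INT; sleep 30'
done
# Ctrl-C now only ends the current iteration, just like with ping or top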

This also explains why interrupting just the shell does nothing: the child exits normally, so the shell thinks the child handled the signal, though in reality it was never received.

Finally, if the shell isn’t interrupted and a child exits, Bash just carries on regardless of whether the child died abnormally or not. This is why interrupting just the sleep simply continues with the loop.

If you want to handle such cases, note that Bash sets the exit code to 128+signal when a process exits abnormally, so interrupting sleep with SIGINT gives the exit code 130 (kill -l lists the signal values).
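
A sketch of how a script could react to its sleep being interrupted:

while true
do
    echo sleeping
    sleep 30
    if (( $? == 130 ))       # 130 = 128 + 2, and SIGINT is signal 2
    then
        echo "sleep was interrupted"
        break
    fi
done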

Bonus problem:

I have this C app, testpg:
#include <unistd.h>

int main() {
    setsid();          /* try to start a new session, detaching from the process group */
    return sleep(10);
}

I run bash -c './testpg' and press Ctrl-C. The app is killed. Shouldn't testpg be excluded from SIGINT, since it used setsid?

A quick strace unravels this mystery: with a single command to execute, bash execve’s it directly instead of forking first (a little optimization trick). The app therefore keeps the shell’s pid, which was already a process group leader, so setsid fails and the process stays in the terminal’s foreground group.

This trick can’t be used if there are more commands, so bash -c './testpg; true' can’t be killed with Ctrl-C.
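
You can see the difference yourself with strace (a sketch):

strace -f -e trace=execve bash -c './testpg'
# the execve of ./testpg happens in the same pid as bash itself
strace -f -e trace=execve bash -c './testpg; true'
# bash forks first, and a child pid execve's ./testpg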

Errata:

Wait, I started a loop in one terminal and killed the shell in another. The loop exited!

Yes it did! This does not apply to interactive shells, which have different ways of handling signals. When job control is enabled (when running interactively, or when running a script with bash -m), the shell will die when SIGINTed.

Here’s the description from the bash source code, jobs.c:2429:

  /* Ignore interrupts while waiting for a job run without job control
     to finish.  We don't want the shell to exit if an interrupt is
     received, only if one of the jobs run is killed via SIGINT. 
   ...

Why Bash is like that: suid

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

Why can't bash scripts be SUID?

Bash scripts can’t run with the suid bit set. First of all, Linux doesn’t allow any scripts to be setuid, though some other OSes do. Second, bash will detect being run as setuid, and immediately drop the privileges.

This is because shell script security is extremely dependent on the environment, much more so than regular C apps.

Take this script, for example, addmaildomain:

#!/bin/sh
[[ $1 ]] || { man -P cat $0; exit 1; } 

if grep -q "^$(whoami)\$" /etc/accesslist
then
    echo "$1" > /etc/mail/local-host-names
else
    echo "You don't have permissions to add hostnames"
fi

The intention is to allow users in /etc/accesslist to run addmaildomain example.com to write new names to local-host-names, the file which defines which domains sendmail should accept mail for.

Let’s imagine it runs as suid root. What can we do to abuse it?

We can start by setting the path:

echo "rm -rf /" > ~/hax/grep && chmod a+x ~/hax/grep
PATH=~/hax addmaildomain

Now the script will run our grep instead of the system grep, and we have full root access.

Let’s assume the author was aware of that and had set PATH=/bin:/usr/bin as the first line in the script. What can we do now?

We can override a library used by grep:

gcc -shared -o libc.so.6 myEvilLib.c
LD_LIBRARY_PATH=. addmaildomain

When grep is invoked, it’ll link with our library and run our evil code.

Ok, so let’s say LD_LIBRARY_PATH is closed up.

If the shell is statically linked, we can set LD_TRACE_LOADED_OBJECTS=true. This causes dynamically linked executables to print a list of their library dependencies and exit successfully instead of running. Our grep would then always return true, subverting the test (the rest of the script uses builtins and wouldn’t be affected).
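
You can see the effect on any dynamically linked binary; this is essentially how ldd works (a sketch; exact output varies):

LD_TRACE_LOADED_OBJECTS=true grep root /etc/passwd
echo "grep exited with $?"   # prints grep's library list instead of matching, and typically exits 0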

Even so, all variables starting with LD_* will typically be stripped or ignored by the loader for suid executables anyway.

There is a delay between the kernel starting the interpreter and the interpreter opening the script file. We can try to race it:

while true
do
    ln /usr/bin/addmaildomain foo
    nice -n 20 ./foo &
    echo 'rm -rf /' > foo
done

But let’s assume the OS uses a /dev/fd/* mechanism for passing a fd, instead of passing the file name.

We can rename the script to confuse the interpreter:

ln /usr/bin/addmaildomain ./-i
PATH=.
-i

Now we’ve created a link, which retains suid, and named it “-i”. When running it, the interpreter will run as “/bin/sh -i”, giving us an interactive shell.

So let’s assume we actually had “#!/bin/sh --” to prevent the script name from being interpreted as an option.

If we don’t know how to use the command, it helpfully shows us the man page (man -P cat $0). Since $0 is unquoted, we can compile a C app that executes the script with $0 set to “-P /hax/evil ls”, and man will then execute our evil program as the pager instead of cat.

So let’s say “$0” is quoted. We can still set MANOPT=-H/hax/evil.

Several of these attacks were based on the inclusion of ‘man’. Is this a particularly vulnerable command?

Perhaps a bit, but a whole lot of apps can be affected by the environment in more and less dramatic ways.

  • POSIXLY_CORRECT can make some apps fail or change their output
  • LANG/LC_ALL can change how output is formatted and interpreted
  • LC_CTYPE and LC_COLLATE can modify string comparisons
  • Some apps rely on HOME and USER
  • Various runtimes have their own paths, like RUBY_PATH, LUA_PATH and PYTHONPATH
  • Many utilities have variables for adding default options, like RUBYOPT and MANOPT
  • Tools invoke EDITOR, VISUAL and PAGER under various circumstances

So yes, it’s better not to write suid shell scripts. Sudo is better than you at running commands safely.

Do remember that a script can invoke itself with sudo, if need be, for a simulated suid feel.
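
A minimal sketch of that pattern, assuming the user is allowed to run the script under sudo:

#!/bin/sh
if [ "$(id -u)" -ne 0 ]
then
    exec sudo "$0" "$@"    # re-run this script as root, passing the arguments along
fi
echo "Now running as root"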

So wait, can’t perl scripts be suid?

They can indeed, but there the interpreter will run as the normal user and detect that the file is suid. It will then run a suid interpreter to deal with it.

Why Bash is like that: Builtin or not

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

# Why don't the options in "man time" work?
time -f %w myapp

Short answer: ‘time’ runs the builtin version, ‘man time’ shows the external version

time is a builtin in the shell, as well as an external command (this also goes for kill, pwd, and test). man time shows info about the external command, while help time shows the internal one.

To run the external version, one can use command time or /usr/bin/time or just \time.

The reason why time is built in is so that timing pipelines will work properly. time true | sleep 10 would say 0 seconds with an external command (which can’t know what it’s being piped into), while the internal version can say 10 seconds since it knows about the whole pipeline.
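
A quick way to see the difference, assuming an external time is installed (timings are approximate):

time true | sleep 3      # builtin: reports about 3 seconds, timing the whole pipeline
\time true | sleep 3     # external: reports about 0 seconds, timing only 'true'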

POSIX leaves the behaviour of time a | b undefined.

# This finds the full path to ls. Why isn't there a 'man type'?
type -P ls

type is a bash builtin, not an external command. This allows it to take shell functions and aliases into account, something whereis can’t.
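
For example, in an interactive shell (a sketch):

alias ll='ls -l'
greet() { echo hello; }
type ll greet       # reports the alias and the function
type -P ls          # prints the full path to the ls binary
whereis ll greet    # knows nothing about aliases or shell functions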

Builtins are documented in man bash, or more conveniently, “help type” (help is also a builtin).

Why Bash is like that: Order of expansion

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

# Why can't you use variables in {}?
for i in {0..$n}; do ..

Short answer: {..} is expanded before $n

Shell execution is based on successive expansions. {..} is evaluated early, before variable expansion, and thus you can’t use variables in them.

This also implies that you can use {..} in variable names, like a1=foo; a2=bar; echo $a{1,2}.

Instead, use for ((i=0; i<n; i++)); do ....
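
A sketch of both the failure and the fix:

n=3
echo {0..$n}                 # prints the literal {0..3}, since braces were expanded before $n
for ((i=0; i<=n; i++))       # the arithmetic loop reads the variable just fine
do
    echo "$i"                # 0 1 2 3
done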

# Why aren't any of the Linux rename tools like Windows 'ren'?
touch foo0001.jpg foo0002.jpg
ren foo*.jpg bar*.jpg        # Windows
rename foo bar foo*.jpg      # Coreutils rename
rename 's/foo/bar/' foo*.jpg # Perl (debian) rename

Short answer: globs are expanded before the command sees them

Bash expands globs before running the command. This means that running rename foo*.jpg bar*.jpg is exactly the same as running rename foo0001.jpg foo0002.jpg bar*.jpg (bar*.jpg matches nothing, so it’s passed along unexpanded). Since rename can’t know what pattern was originally used, it has to use an alternative syntax.
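
You can see exactly what the command receives by putting echo in front (a sketch):

touch foo0001.jpg foo0002.jpg
echo rename foo*.jpg bar*.jpg
# rename foo0001.jpg foo0002.jpg bar*.jpg
# (with default settings, bar*.jpg matches nothing and is passed along literally)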

Of course, you could write a rename where you quote the globs, like rename "foo*.jpg" "bar*.jpg", but that’s not simpler than the coreutils version. It just adds complexity, edge cases and general confusion.

There have been proposals for environment variables to set so that commands can see the shell arguments with globs intact, but that has its own problems so they weren’t widely used.

Why Bash is like that: Command expansion

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

# Why does esc contain "x1B" instead of the escape char?
esc=`printf \\x1B`

Short answer: `..` requires another level of backslash escaping

To embed a backtick inside `..`, you escape it with a backslash. To embed a backslash, you escape it with another backslash. So `printf \\x1B` actually runs printf \x1B, and the shell interprets \x as a literal x with a superfluous escape. In other words, printf just sees “x1B”, and that’s what you get.

The problem grows exponentially as you try to nest ` `.

$(..) has distinct start and stop characters, so they can be used and nested without adding layers of backslashes. In this particular case you’d use esc=$'\x1B', and in general you could use esc=$(printf \\x1B).
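
You can verify what actually ends up in the variable with od (a sketch):

esc=`printf \\x1B`
printf '%s' "$esc" | od -c     # shows   x   1   B
esc=$'\x1B'
printf '%s' "$esc" | od -c     # shows 033, the actual escape character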

# Why is newline empty instead of containing the line feed character?
newline=$(printf "\n")
echo "hello${newline}world"

Short answer: `..` and $(..) strip trailing line feeds

$(..) and `..` always strip trailing line feeds from command output. This is that special kind of magic that works so well you never think about it. echo "Hello $(whoami), how are you?" comes out as one line even though whoami (and basically all other commands) writes the username followed by a line feed.

This causes problems here because the output is only a single \n, i.e. the empty string followed by a trailing line feed. In this case, you’d again use newline=$'\n', but you could also have done newline=$(printf '\n'; printf x); newline=${newline%x} (append an x and then remove it), so that the line feeds in the output aren’t trailing.
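
Spelled out, the sentinel trick looks like this (a sketch):

newline=$(printf '\n'; printf x)    # the output ends in x, so no line feeds are trailing
newline=${newline%x}                # strip the sentinel x, keeping the line feed
printf '%s' "$newline" | od -c      # shows \n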

# Bonus: Exercise for the reader
# Try adding the appropriate quotes and
# escapes to this quote normalization example
normalized=`echo ``Go on'', he said | sed -e s/`/'/g`

The variable should contain ''Go on'', he said. Can you get it right on the first try?

Why Bash is like that: Pseudo-syntax

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

# Why doesn't || work in [ ] ?
if [ -f /etc/inetd.conf || -d /etc/xinetd.d ]; then ..

# And why does it work in [[ ]] ?
if [[ -f /etc/inetd.conf || -d /etc/xinetd.d ]]; then ..

Short answer: [ is a regular command, and can’t override ||. [[ is shell syntax.

[ is a pseudo-syntactical command. That is, it’s a regular command much like cp or grep, the name just happens to be a single opening square bracket (try ls -l /usr/bin/[).

Seeing as how it’s a regular command, it can’t affect shell grammar. In grep -q kittens file || echo kittens >> file, grep can’t know or do anything about the fact that it’s being used with “||”. The same goes for [ in the example.

In Bash (but not necessarily other shells), [ is now a builtin command emulating /usr/bin/[ for efficiency. There’s no reason why Bash couldn’t make [ a || b ] work, but this would break compatibility.

Instead, we have [[ which is not bound by legacy, and does in fact alter shell syntax. [[ interprets ||, &&, globs and unquoted variable expansions in ways that an external [-command couldn’t, and an internal [ therefore can’t either.

To get around it without using [[, we’d do [ a ] || [ b ] or [ a -o b ].
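
For example:

if [ -f /etc/inetd.conf ] || [ -d /etc/xinetd.d ]
then
    echo "found an inetd configuration"
fi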

# Why is this always true (should be false when foo is empty)?
if [ -n $foo ]

Short answer: $foo disappears and [ -n ] is shorthand for [ -n “-n” ], which is true

This comes back to the fact that [ is a regular command. If $foo is empty, the shell runs [ -n ], and there’s no way for [ to know that there was a variable that was expanded out.

[ -n x ] checks that x is not empty. [ x ] is shorthand for the same. [ -n ] therefore checks whether the string “-n” is non-empty, which it is, and thus the expression is always true.

Why doesn’t [ complain if you have -n without a parameter? Same thing again. It wouldn’t be able to tell that you said [ $1 ] rather than [ -n ], and then scripts would break whenever you work with your parameter list or any other strings starting with a dash.

Instead, you’d use [ -n "$foo" ], which will prevent bash from removing foo when it’s empty, and instead send in a zero-length argument. Or you could use [[ -n $foo ]], since [[ is built into the shell grammar and can tell that there is a variable there.
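
A sketch comparing the three forms with an empty variable:

foo=""
[ -n $foo ]   && echo "unquoted [ says non-empty"    # expands to [ -n ], always true
[ -n "$foo" ] && echo "quoted [ says non-empty"      # correctly prints nothing
[[ -n $foo ]] && echo "[[ says non-empty"            # also correctly prints nothing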

# Why doesn't this work?
if [ grep -q text myfile ]

Short answer: [ is a command, grep is a command. Choose one.

if [ -n "$foo" ] is the same as if test -n "$foo", which makes it more obvious how to use other commands (if grep -q text myfile).

Pseudo-syntactical elements make it harder to tell language from implementation, for better and for worse. They make code look neat, but can be confusing, especially when basically all if-statements include [.

A similar effect is seen in here-documents (cmd << EOF), where many people only ever see the delimiter “EOF” and think it’s some kind of keyword. Next time, use cmd << KITTENS to tip off anyone reading your script.
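
For example, this works exactly like the usual EOF:

cat << KITTENS
This text is fed to cat on standard input.
KITTENS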