Why Bash is like that: Subshells

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

# I run this script, but afterwards my PATH and current dir haven't changed!

#!/bin/bash
export PATH=$PATH:/opt/local/bin
cd /opt/games/

or more interestingly

# Why does this always say 0? 
n=0
cat file | while read line; do (( n++ )); done
echo $n

In the first case, you can add an echo "Path is now $PATH" and see the expected path. In the latter case, you can put an echo $n in the loop, and it will count up as you’d expect, but at the end you’ll still be left with 0.
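
To see this for yourself, here is a minimal sketch of the second example with an echo added inside the loop. The temporary file and its three lines are made up for the demonstration, and the behavior described in the comments assumes Bash with its default options.

```bash
#!/bin/bash
# Throwaway input file, created only for this demo.
tmpfile=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmpfile"

n=0
cat "$tmpfile" | while read -r line; do
    (( n++ ))
    echo "inside the loop: n=$n"    # counts 1, 2, 3
done
echo "after the loop: n=$n"         # Bash still reports 0 here

rm -f "$tmpfile"
```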

To make things even more interesting, here are the effects of running these two examples (or equivalents) in different shells:

            set in script   set in pipeline
Bash        No effect       No effect
Ksh/Zsh     No effect       Works
cmd.exe     Works           No effect

What we’re experiencing are subshells, and different shells have different policies on what runs in subshells.

Environment variables, as well as the current directory, are only inherited parent-to-child. Changes to a child’s environment are not reflected in the parent. Any time a shell forks, changes made in the forked process are confined to that process and its children.

In Unix, all normal shells will fork to execute other shell scripts, so setting PATH or cd’ing in a script will never have an effect after the command is done (instead, use "source file" aka ". file" to read and execute the commands without forking).
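
A minimal sketch of the difference between executing and sourcing. The throwaway script, its contents and the variable name demo_var are all made up for illustration.

```bash
#!/bin/bash
# Create a small throwaway script that sets a variable and changes directory.
script=$(mktemp)
cat > "$script" <<'EOF'
demo_var="set by script"
cd /
EOF

unset demo_var
bash "$script"     # executed: runs in a separate shell process
echo "after executing: demo_var=${demo_var-unset}, dir=$PWD"

. "$script"        # sourced: runs in the current shell ("source" is a synonym in Bash)
echo "after sourcing: demo_var=${demo_var-unset}, dir=$PWD"

rm -f "$script"
```

Only the sourced run leaves demo_var set and the working directory changed to /.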

However, shells can differ in when subshells are invoked. In Bash, all elements in a pipeline run in a subshell by default. In Ksh and Zsh, all except the last run in a subshell. POSIX permits either behavior.

This means that echo "2 + 3" | bc | read sum will work in Ksh and Zsh, but fail to set the variable sum in Bash.
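
The same effect can be reproduced without bc. A minimal sketch, assuming Bash with default options (under Ksh or Zsh the variable would be set instead):

```bash
#!/bin/bash
unset sum
echo 5 | read sum          # read runs in a subshell, which then exits
echo "sum is '${sum-}'"    # in Bash, sum is still unset at this point
```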

To work around this in Bash, you can usually use redirection and process substitution instead:

read sum < <(echo "2 + 3" | bc)
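
The same trick fixes the counting loop from the beginning of the post. This is a sketch, with printf standing in for the file (an assumption made to keep the example self-contained). The loop now runs in the current shell, and only the command producing the input runs in a subshell.

```bash
#!/bin/bash
n=0
while read -r line; do
    (( n++ ))
done < <(printf 'one\ntwo\nthree\n')
echo "$n"    # prints 3
```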

So, where do we find subshells? Here is a list of commands that in some way fail to set foo=bar for subsequent commands (note that all the examples set it in some subshell, and can use it until the subshell ends):

# Executing other programs or scripts
./setmyfoo
foo=bar ./something

# Anywhere in a pipeline in Bash
true | foo=bar | true

# In any command that executes new shells
awk '{ system("foo=bar") }'
find . -exec bash -c 'foo=bar' \;

# In backgrounded commands and coprocs:
foo=bar &
coproc foo=bar

# In command substitution
true "$(foo=bar)"

# In process substitution
true < <(foo=bar)

# In commands explicitly subshelled with ()
( foo=bar )

and probably some more that I'm forgetting.

Trying to set a variable, option or working dir in any of these contexts will result in the changes not being visible for following commands.
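
A quick way to check several of these contexts at once is sketched below. The helper function report is made up for this demo, and the script assumes Bash with default options.

```bash
#!/bin/bash
unset foo
# Hypothetical helper: print whether foo is visible in the current shell.
report() { echo "$1: foo is ${foo-unset}"; }

true | foo=bar | true;  report "pipeline"
true "$(foo=bar)";      report "command substitution"
true < <(foo=bar);      report "process substitution"
( foo=bar );            report "explicit subshell"
foo=bar & wait;         report "background command"
```

Every line reports foo as unset, because each assignment happened in a process that has already exited.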

Knowing this, we can use it to our advantage:

# cd to each dir and run make
for dir in */; do ( cd "$dir" && make ); done

# Compare to the more fragile
for dir in */; do cd "$dir"; make; cd ..; done

# mess with important variables
fields=(a b c); ( IFS=':'; echo "${fields[*]}" )

# Compare to the cumbersome
fields=(a b c); oldIFS=$IFS; IFS=':'; echo "${fields[*]}"; IFS=$oldIFS

# Limit scope of options
( set -e; foo; bar; baz; ) 
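
A small sketch showing that such changes really do stay inside the parentheses. It assumes IFS has its default value in the outer shell, and the variable name start is made up for the demo.

```bash
#!/bin/bash
start=$PWD
( cd / && pwd )                     # prints /
echo "$PWD"                         # still the starting directory

fields=(a b c)
( IFS=':'; echo "${fields[*]}" )    # prints a:b:c
echo "${fields[*]}"                 # prints a b c, IFS is untouched out here
```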

8 thoughts on “Why Bash is like that: Subshells”

  1. #!/bin/bash is broken. use #!/usr/bin/env bash – it respects the user’s wishes and doesn’t break when bash is not in /bin

    other than that, nice post!

  2. Which is why you need to consider the execution environment when writing your script.

  3. #!/usr/bin/env bash is broken – it doesn’t respect the user’s wishes and breaks when env is not in /usr/bin.

  4. The gist of this is right; but the explanation is, pedagogically, rather weak.

    To understand this it’s essential to understand the Unix fork()/exec() command execution model. All processes start, conceptually, as a copy of their parent. In a subshell the subprocess continues to run in that copy of the parent’s memory. For any external command the subprocess performs an execve() system call which replaces all of the memory *except for the environment* with a freshly prepared memory image as specified by the executable.

    The key point to be taught after this is that the shell conceptually has two regions of key/value memory, the local variable heap and the environment. Whenever the shell executes external commands *and* whenever it’s required to execute any commands (built-in or external) in a subshell then the child process will get a copy of the environment.

    Conceptually all key/value pairs in the shell (FOO=bar) start as local variables. Using the export command then moves that from the local heap into the environment. (Thus the name is actually intuitive if you think of the environment as being “outside” of the local memory).

    (Incidentally, for any reference to a shell variable the shell searches the environment key space first, and only acts on the local variables if there is no matching environment variable.)

    The first example, executing a shell script which changes its copy of the PATH, is conceptually no different from executing any other external command (such as ‘ls’ or ‘cat’). The command executes in its own process, with its own memory. Changes to that process’ memory are lost when the process exits. Sourcing a file (using the ‘source’ command or its more commonly used ‘.’ alias) is a means to force the shell to read and evaluate a series of commands (and shell expressions) in the current shell process. It’s similar to the “eval” command except that it reads the contents of a file (evaluating those). So the command: eval $(cat some_file) is, conceptually, the same as: source some_file.

    The key point to teach about the second example is that the | is an INTER-PROCESS COMMUNICATIONS operator. As that description implies there is more than one process involved. Whenever you use a | in a shell command you are, implicitly, creating a subshell on one side of that pipe or the other.

    As you describe, different shells and different versions of any given shell may create their subshells to evaluate the left or right expression surrounding the | operator. zsh and ksh (’88 and later?) create subshells to the left of the pipe, while bash, ash, dash and older Bourne shells create a subshell to evaluate the expression to the right of the pipe.

    So, for zsh and ksh (after ’88) the command echo foo | read bar is executing the echo statement in a subshell and reading into the current process. You can, of course, get a similar effect using: echo foo | { read bar; echo $bar; } … thus grouping the read and the second echo inside the same subshell. In bash and other shells a command like echo foo | read bar; echo $bar evaluates the read bar in a subshell, and that subshell exits before the echo $bar is evaluated, by the current shell.

  5. @JimD
    You’re quite right; the article doesn’t delve into the hows and whys of Unix processes, and knowing how the fork, execve and pipe syscalls work and how memory and file descriptors are inherited makes it more obvious why shells work as they do.

    I would like to point out, though, that modification and inheritance of the environment as you describe it is not part of the Unix process model. The inherited environment, at least in the Linux and FreeBSD ABIs, is passed on the stack and thus not amenable to expansion, and execve always replaces the process’ entire environment with the data explicitly passed to this function.

    Using libc functions like putenv, setenv and friends doesn’t modify the inherited environment. Instead, they modify the C library’s copy, which is kept on the process’ local heap. The execl/execlp/etc family of functions that don’t accept an explicit environment are just libc convenience functions that end up calling execve with this data. A shell doesn’t have to use any of it, and Bash even provides its own implementation of similar functionality.

    Therefore, rather than talking about special, inherited memory areas, it would be more correct to say that export marks the variables that should be passed along when executing external programs.

  6. You can always have a script run in the current shell’s context (and make changes to the current shell, such as updating the $PATH) by running the “source” command. It will run the contents of the file in the current shell without spawning a subshell. Very useful.
