Technically correct: floating point calculations in bc

Whenever someone asks how to do floating point math in a shell script, the answer is typically bc:

$  echo "scale=9; 22/7" | bc
3.142857142

However, this is technically wrong: bc does not support floating point at all! What you see above is arbitrary precision FIXED point arithmetic.

The user’s intention is obviously to do math with fractional numbers, regardless of the low level implementation, so the above is a good and pragmatic answer. However, technically correct is the best kind of correct, so let’s stop being helpful and start pedantically splitting hairs instead!

Fixed vs floating point

There are many important things that every programmer should know about floating point, but in one sentence, the larger they get, the less precise they are.

In fixed point you have a certain number of digits, and a decimal point fixed in place like on a tax form: 001234.56. No matter how small or large the number, you can always write down increments of 0.01, whether it’s 000000.01 or 999999.99.

Floating point, meanwhile, is basically scientific notation. If you have 1.23e-4 (0.000123), you can increment by a millionth to get 1.24e-4. However, if you have 1.23e4 (12300), you can’t add less than 100 unless you reserve more space for more digits.

We can see this effect in practice in any language that supports floating point, such as Haskell:

> truncate (16777216 - 1 :: Float)
16777215
> truncate (16777216 + 1 :: Float)
16777216

Subtracting 1 gives us the decremented number, but adding 1 had no effect with floating point math! bc, with its arbitrary precision fixed points, would instead correctly give us 16777217! This is clearly unacceptable!

Floating point in bc

The problem with the bc solution is, in other words, that the math is too correct. Floating point math always introduces and accumulates rounding errors in ways that are hard to predict. Fixed point doesn’t, and therefore we need to find a way to artificially introduce the same type of inaccuracies! We can do this by rounding a number to a N significant bits, where N = 24 for float and 52 for double. Here is some bc code for that:

scale=30

define trunc(x) {
  auto old, tmp
  old=scale; scale=0; tmp=x/1; scale=old
  return tmp
}
define fp(bits, x) {
  auto i
  if (x < 0) return -fp(bits, -x);
  if (x == 0) return 0;
  i=bits
  while (x < 1) { x*=2; i+=1; }
  while (x >= 2) { x/=2; i-=1; }
  return trunc(x * 2^bits + 0.5) / 2^(i)
}

define float(x) { return fp(24, x); }
define double(x) { return fp(52, x); }
define test(x) {
  print "Float:  ", float(x), "\n"
  print "Double: ", double(x), "\n"
}

With this file named fp, we can try it out:

$ bc -ql fp <<< "22/7"
3.142857142857142857142857142857

$ bc -ql fp <<< "float(22/7)"
3.142857193946838378906250000000

The first number is correct to 30 decimals. Yuck! However, with our floating point simulator applied, we get the desired floating point style errors after ~7 decimals!

Let's write a similar program for doing the same thing but with actual floating point, printing them out up to 30 decimals as well:

{-# LANGUAGE RankNTypes #-}
import Control.Monad
import Data.Number.CReal
import System.Environment

main = do
    input <- liftM head getArgs
    putStrLn . ("Float:  " ++) $ showNumber (read input :: Float)
    putStrLn . ("Double: " ++) $ showNumber (read input :: Double)
  where
    showNumber :: forall a. Real a => a -> String
    showNumber = showCReal 30 . realToFrac

Here's a comparison of the two:

$ bc -ql fp <<< "x=test(1000000001.3)"
Float:  1000000000.000000000000000000000000000000
Double: 1000000001.299999952316284179687500000000

$ ./fptest 1000000001.3
Float:  1000000000.0
Double: 1000000001.2999999523162841796875

Due to differences in rounding and/or off-by-one bugs, they're not always identical like here, but the error bars are similar.

Now we can finally start doing floating point math in bc!

Basics of a Bash action game

If you want to write an action game in bash, you need the ability to check for user input without actually waiting for it. While bash doesn’t let you poll the keyboard in a great way, it does let you wait for input for a miniscule amount of time with read -t 0.0001.

Here’s a snippet that demonstrates this by bouncing some text back and forth, and letting the user control position and color. It also sets (and unsets) the necessary terminal settings for this to look good:

#!/usr/bin/env bash

# Reset terminal on exit
trap 'tput cnorm; tput sgr0; clear' EXIT

# invisible cursor, no echo
tput civis
stty -echo

text="j/k to move, space to color"
max_x=$(($(tput cols) - ${#text}))
dir=1 x=1 y=$(($(tput lines)/2))
color=3

while sleep 0.05 # GNU specific!
do
    # move and change direction when hitting walls
    (( x == 0 || x == max_x )) && \
        ((dir *= -1))
    (( x += dir ))


    # read all the characters that have been buffered up
    while IFS= read -rs -t 0.0001 -n 1 key
    do
        [[ $key == j ]] && (( y++ ))
        [[ $key == k ]] && (( y-- ))
        [[ $key == " " ]] && color=$((color%7+1))
    done

    # batch up all terminal output for smoother action
    framebuffer=$(
        clear
        tput cup "$y" "$x"
        tput setaf "$color"
        printf "%s" "$text"
    )

    # dump to screen
    printf "%s" "$framebuffer"
done

Paste shell script, get feedback: ShellCheck project update

tl;dr: ShellCheck is a bash/sh static analysis and linting tool. Paste a shell command or script on ShellCheck.net and get feedback about many common issues, both in scripts that currently fail and scripts that appear to work just fine.

There’s been a lot of progress since I first posted about it seven months ago. It has a new home on ShellCheck.net with a simplified and improved interface, and the parser has been significantly bugfixed so that parsing errors for correct code are now fairly rare.

However, the best thing is that it can detect a heaping bunch of new problems! This post mentions merely a subset of them.

 

Quiz: ShellCheck is aware of many common usage problems. Are you?

  • find . -name *.mp3
  • sudo echo 3 > /proc/sys/vm/drop_caches
  • PS1='\e[0;32m\$\e[0m '
  • find . | grep "*.mp3"
  • [ $n > 7 ]
  • [[ $n > 7 ]]
  • tr 'A-Z' 'a-z'
  • cmd 2>&1 > log
  • array=(1, 2, 3)
  • echo $10
  • [[ $a=$b ]]
  • [[ $a = $b ]]
  • progress=$((i/total*100))
  • trap "echo \"Time used: $SECONDS\"" EXIT
  • find dir -exec cp {} /backup && rm {} \;
  • [[ $keep = [yY] ]] && mv file /backup || rm file

 
 
ShellCheck gives more helpful messages for many Bash syntax errors

Bash says ShellCheck points to the exact position and says
: command not found Literal carriage return. Run script through tr -d ‘\r’
unexpected token: `fi’ Can’t have empty then clauses (use ‘true’ as a no-op)
unexpected token `(‘ Shells are space sensitive. Use ‘< <(cmd)', not '<<(cmd)'
unexpected token `(‘ ‘(‘ is invalid here. Did you forget to escape it?
echo foo: command not found This is a &nbsp;. Delete it and retype as space.

 
 
ShellCheck suggests style improvements

Code ShellCheck suggestion
basename "$var" Use parameter expansion instead, such as ${var##*/}
ls | grep 'mp3$' Don’t use ls | grep. Use a glob or a for loop with a condition.
expr 3 + 2 Use $((..)), ${} or [[ ]] in place of antiquated expr.
cat foo | grep bar Useless cat. Consider ‘cmd < file | ..' or 'cmd file | ..' instead.
length=$(echo "$var" | wc -c") See if you can use ${#variable} instead

 
 
ShellCheck recognizes common but wrong attempts at doing things

Code ShellCheck tip
var$n=42 For indirection, use (associative) arrays or ‘read “var$n” <<< "value"'".
(Bash says “var3=42: command not found”)
${var$n} To expand via indirection, use name=”foo$n”; echo “${!name}”
(Bash says “bad substitution”. )
echo 'It\'s time' Are you trying to escape that single quote? echo ‘You’\”re doing it wrong’
(Bash says “unexpected end of file”)
[ grep a b ] Use ‘if cmd; then ..’ to check exit code, or ‘if [[ $(cmd) == .. ]]’ to check output
(Bash says “[: a: binary operator expected”)
var=grep a b To assign the output of a command, use var=$(cmd)
(Bash says “a: command not found”)

 
ShellCheck can help with POSIX sh compliance and bashisms

When a script is declared with #!/bin/sh, ShellCheck checks for POSIX sh compliance, much like checkbashisms.

 
ShellCheck is free software, and can be used online and locally

ShellCheck is of course Free Software, and has a cute cli frontend in addition to the primary online version.

 
ShellCheck wants your feedback and suggestions!
Does ShellCheck give you incorrect suggestions? Does it fail to parse your working code? Is there something it could have warned about, but didn’t? After pasting a script on ShellCheck.net, a tiny “submit feedback” link appears in the top right of the annotated script area. Click it to submit the code plus your comments, and I can take a look!

Making bash run DOS/Windows CRLF EOL scripts

If you for any reason use a Windows editor to write scripts, it can be annoying to remember to convert them and bash fails in mysterious ways when you don’t. Let’s just get rid of that problem once and for all:

cat > $'/bin/bash\r' << "EOF"
#!/usr/bin/env bash
script=$1
shift
exec bash <(tr -d '\r' < "$script") "$@"
EOF

This allows you to execute scripts with DOS/Windows \r\n line endings with ./yourscript (but it will fail if the script specifies parameters on the shebang, or if you run it with bash yourscript). It works because from a UNIX point of view, DOS/Windows files specify the interpretter as "bash^M", and we override that to clean the script and run bash on the result.

Of course, you can also replace the helpful exec bash part with echo "Run dos2unix on your file!" >&2 if you'd rather give your users a helpful reminder rather than compatibility or a crazy error.

ShellCheck: shell script analysis

Shell scripting is notoriously full of pitfalls, unintuitive behavior and poor error messages. Here are some things you might have experienced:

  • find -exec fails on commands that are perfectly valid
  • 0==1 is apparently true
  • Comparisons are always false, and write files while failing
  • Variable values are available inside loops, but reset afterwards
  • Looping over filenames with spaces fails, and quoting doesn’t help

 

ShellCheck is my latest project. It will check shell scripts for all of the above, and also tries to give helpful tips and suggestions for otherwise working ones. You can paste your script and have it checked it online, or you can downloaded it and run it locally.

Other things it checks for includes reading from and redirecting to a file in the same pipeline, useless uses of cat, apparent variable use that won’t expand, too much or too little quoting in [[ ]], not quoting globs passed to find, and instead of just saying “syntax error near unexpected token `fi'”, it points to the relevant if statement and suggests that you might be missing a ‘then’.

It’s still in the early stages, but has now reached the point where it can be useful. The online version has a feedback button (in the top right of your annotated script), so feel free to try it out and submit suggestions!

Why Bash is like that: Subshells

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

# I run this script, but afterwards my PATH and current dir hasn't changed!

#!/bin/bash
export PATH=$PATH:/opt/local/bin
cd /opt/games/

or more interestingly

# Why does this always say 0? 
n=0
cat file | while read line; do (( n++ )); done
echo $n

In the first case, you can add a echo "Path is now $PATH", and see the expected path. In the latter case, you can put a echo $n in the loop, and it will count up as you’d expect, but at the end you’ll still be left with 0.

To make things even more interesting, here are the effects of running these two examples (or equivalents) in different shells:

set in script set in pipeline
Bash No effect No effect
Ksh/Zsh No effect Works
cmd.exe Works No effect

What we’re experiencing are subshells, and different shells have different policies on what runs in subshells.

Environment variables, as well as the current directory, is only inherited parent-to-child. Changes to a child’s environment are not reflect in the parent. Any time a shell forks, changes done in the forked process are confined to that process and its children.

In Unix, all normal shells will fork to execute other shell scripts, so setting PATH or cd’ing in a script will never have an effect after the command is done (instead, use "source file" aka ". file" to read and execute the commands without forking).

However, shells can differ in when subshells are invoked. In Bash, all elements in a pipeline will run in a subshell. In Ksh and Zsh, all except the last will run in a subshell. POSIX leaves it undefined.

This means that echo "2 + 3" | bc | read sum will work in Ksh and Zsh, but fail to set the variable sum in Bash.

To work around this in Bash, you can usually use redirection and process substition instead:

read sum < <(echo "2 + 3" | bc)

So, where do we find subshells? Here are a list of commands that in some way fails to set foo=bar for subsequent commands (note that all the examples set it in some subshell, and can use it until the subshell ends):

# Executing other programs or scripts
./setmyfoo
foo=bar ./something

# Anywhere in a pipeline in Bash
true | foo=bar | true

# In any command that executes new shells
awk '{ system("foo=bar") }'h
find . -exec bash -c 'foo=bar' \;

# In backgrounded commands and coprocs:
foo=bar &
coproc foo=bar

# In command expansion
true "$(foo=bar)"

# In process substitution
true < <(foo=bar)

# In commands explicitly subshelled with ()
( foo=bar )

and probably some more that I'm forgetting.

Trying to set a variable, option or working dir in any of these contexts will result in the changes not being visible for following commands.

Knowing this, we can use it to our advantage:

# cd to each dir and run make
for dir in */; do ( cd "$dir" && make ); done

# Compare to the more fragile
for dir in */; do cd "$dir"; make; cd ..; done

# mess with important variables
fields=(a b c); ( IFS=':'; echo ${fields[*]})

# Compare to the cumbersome
fields=(a b c); oldIFS=$IFS; IFS=':'; echo ${fields[*]}; IFS=$oldIFS; 

# Limit scope of options
( set -e; foo; bar; baz; )