Technically correct: floating point calculations in bc

Whenever someone asks how to do floating point math in a shell script, the answer is typically bc:

$  echo "scale=9; 22/7" | bc
3.142857142

However, this is technically wrong: bc does not support floating point at all! What you see above is arbitrary precision FIXED point arithmetic.

The user’s intention is obviously to do math with fractional numbers, regardless of the low level implementation, so the above is a good and pragmatic answer. However, technically correct is the best kind of correct, so let’s stop being helpful and start pedantically splitting hairs instead!

Fixed vs floating point

There are many important things that every programmer should know about floating point, but in one sentence, the larger they get, the less precise they are.

In fixed point you have a certain number of digits, and a decimal point fixed in place like on a tax form: 001234.56. No matter how small or large the number, you can always write down increments of 0.01, whether it’s 000000.01 or 999999.99.

Floating point, meanwhile, is basically scientific notation. If you have 1.23e-4 (0.000123), you can increment by a millionth to get 1.24e-4. However, if you have 1.23e4 (12300), you can’t add less than 100 unless you reserve more space for more digits.

We can see this effect in practice in any language that supports floating point, such as Haskell:

> truncate (16777216 - 1 :: Float)
16777215
> truncate (16777216 + 1 :: Float)
16777216

Subtracting 1 gives us the decremented number, but adding 1 had no effect with floating point math! bc, with its arbitrary precision fixed points, would instead correctly give us 16777217! This is clearly unacceptable!

Floating point in bc

The problem with the bc solution is, in other words, that the math is too correct. Floating point math always introduces and accumulates rounding errors in ways that are hard to predict. Fixed point doesn’t, and therefore we need to find a way to artificially introduce the same type of inaccuracies! We can do this by rounding a number to a N significant bits, where N = 24 for float and 52 for double. Here is some bc code for that:

scale=30

define trunc(x) {
  auto old, tmp
  old=scale; scale=0; tmp=x/1; scale=old
  return tmp
}
define fp(bits, x) {
  auto i
  if (x < 0) return -fp(bits, -x);
  if (x == 0) return 0;
  i=bits
  while (x < 1) { x*=2; i+=1; }
  while (x >= 2) { x/=2; i-=1; }
  return trunc(x * 2^bits + 0.5) / 2^(i)
}

define float(x) { return fp(24, x); }
define double(x) { return fp(52, x); }
define test(x) {
  print "Float:  ", float(x), "\n"
  print "Double: ", double(x), "\n"
}

With this file named fp, we can try it out:

$ bc -ql fp <<< "22/7"
3.142857142857142857142857142857

$ bc -ql fp <<< "float(22/7)"
3.142857193946838378906250000000

The first number is correct to 30 decimals. Yuck! However, with our floating point simulator applied, we get the desired floating point style errors after ~7 decimals!

Let's write a similar program for doing the same thing but with actual floating point, printing them out up to 30 decimals as well:

{-# LANGUAGE RankNTypes #-}
import Control.Monad
import Data.Number.CReal
import System.Environment

main = do
    input <- liftM head getArgs
    putStrLn . ("Float:  " ++) $ showNumber (read input :: Float)
    putStrLn . ("Double: " ++) $ showNumber (read input :: Double)
  where
    showNumber :: forall a. Real a => a -> String
    showNumber = showCReal 30 . realToFrac

Here's a comparison of the two:

$ bc -ql fp <<< "x=test(1000000001.3)"
Float:  1000000000.000000000000000000000000000000
Double: 1000000001.299999952316284179687500000000

$ ./fptest 1000000001.3
Float:  1000000000.0
Double: 1000000001.2999999523162841796875

Due to differences in rounding and/or off-by-one bugs, they're not always identical like here, but the error bars are similar.

Now we can finally start doing floating point math in bc!

Leave a Reply

Your email address will not be published.