So what exactly is -ffunction-sections and how does it reduce binary size?

If you’d like a more up-to-date version of ShellCheck than what Raspbian provides, you can build your own on a Raspberry Pi Zero in a little over 21 hours.

Alternatively, as of last week, you can also download RPi compatible, statically linked armv6hf binaries of every new commit and stable release.

It’s statically linked — i.e. the executable has all its library dependencies built in — so you can expect it to be pretty big. However, I didn’t expect it to be 67MB:

build@d1044ff3bf67:/mnt/shellcheck# ls -l shellcheck
-rwxr-xr-x 1 build build 66658032 Jul 14 16:04 shellcheck

This is for a tool intended to run on devices with 512MiB RAM. strip helps shed a lot of that weight, and the post-stripped number is the one we’ll use from now on, but 36MB is still more than I expected, especially given that the x86_64 build is 23MB.

build@d1044ff3bf67:/mnt/shellcheck# strip --strip-all shellcheck
build@d1044ff3bf67:/mnt/shellcheck# ls -l shellcheck
-rwxr-xr-x 1 build build 35951068 Jul 14 16:22 shellcheck

So now what? Optimize for size? Here’s ghc -optlo-Os to enable LLVM opt size optimizations, including a complete three hour Qemu emulated rebuild of all dependencies:

build@31ef6588fdf1:/mnt/shellcheck# ls -l shellcheck
-rwxr-xr-x 1 build build 32051676 Jul 14 22:38 shellcheck

Welp, that’s not nearly enough.

The real problem is that we’re linking in both C and Haskell dependencies, from the JSON formatters and Regex libraries to bignum implemenations and the Haskell runtime itself. These have tons of functionality that ShellCheck doesn’t use, but which is still included as part of the package.

Fortunately, GCC and GHC allow eliminating this kind of dead code through function sections. Let’s look at how that works, and why dead code can’t just be eliminated as a matter of course:

An ELF binary contains a lot of different things, each stored in a section. It can have any number of these sections, each of which has a pile of attributes including a name:

  • .text stores executable code
  • .data stores global variable values
  • .symtab stores the symbol table
  • Ever wondered where compilers embed debug info? Sections.
  • Exception unwinding data, compiler version or build IDs? Sections.

This is how strip is able to safely and efficiently drop so much data: if a section has been deemed unnecessary, it’s simple and straight forward to drop it without affecting the rest of the executable.

Let’s have a look at some real data. Here’s a simple foo.c:

int foo() { return 42; }
int bar() { return foo(); }

We can compile it with gcc -c foo.c -o foo.o and examine the sections:

$ readelf -a foo.o
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:        ELF32
  Data:         2's complement, little endian
  Version:      1 (current)
  OS/ABI:       UNIX - System V
  ABI Version:  0
  Type:         REL (Relocatable file)
  Machine:      ARM

Section Headers:
  [Nr] Name       Type      Addr   Off    Size   ES Flg Lk Inf Al
  [ 0]            NULL      000000 000000 000000 00      0   0  0
  [ 1] .text      PROGBITS  000000 000034 000034 00  AX  0   0  4
  [ 2] .rel.text  REL       000000 000190 000008 08   I  8   1  4
  [ 3] .data      PROGBITS  000000 000068 000000 00  WA  0   0  1
  [ 4] .bss       NOBITS    000000 000068 000000 00  WA  0   0  1

Symbol table '.symtab' contains 11 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     9: 00000000    28 FUNC    GLOBAL DEFAULT    1 foo
    10: 0000001c    24 FUNC    GLOBAL DEFAULT    1 bar

There’s tons more info not included here, and it’s an interesting read in its own right. Anyways, both our functions live in the .text segment. We can see this from the symbol table’s Ndx column which says section 1, corresponding to .text. We can also see it in the disassembly:

$ objdump -d foo.o
foo.o:     file format elf32-littlearm

Disassembly of section .text:
00000000 <foo>:
   0:   e52db004   push    {fp}
   4:   e28db000   add     fp, sp, #0
   8:   e3a0302a   mov     r3, #42 ; 0x2a
   c:   e1a00003   mov     r0, r3
  10:   e28bd000   add     sp, fp, #0
  14:   e49db004   pop     {fp}
  18:   e12fff1e   bx      lr

0000001c <bar>:
  1c:   e92d4800   push    {fp, lr}
  20:   e28db004   add     fp, sp, #4
  24:   ebfffffe   bl      0 <foo>
  28:   e1a03000   mov     r3, r0
  2c:   e1a00003   mov     r0, r3
  30:   e8bd8800   pop     {fp, pc}

Now lets say that the only library function we use is foo, and we want bar removed from the final binary. This is tricky, because you can’t just modify a .text segment by slicing things out of it. There are offsets, addresses and cross-dependencies compiled into the code, and any shifts would mean trying to patch that all up. If only it was as easy as when strip removed whole sections…

This is where gcc -ffunction-sections and ghc -split-sections come in. Let’s recompile our file with gcc -ffunction-sections foo.c -c -o foo.o:

$ readelf -a foo.o
Section Headers:
  [Nr] Name          Type      Addr  Off  Size ES Flg Lk Inf Al
  [ 0]               NULL      00000 0000 0000 00      0   0  0
  [ 1] .text         PROGBITS  00000 0034 0000 00  AX  0   0  1
  [ 2] .data         PROGBITS  00000 0034 0000 00  WA  0   0  1
  [ 3] .bss          NOBITS    00000 0034 0000 00  WA  0   0  1
  [ 4]     PROGBITS  00000 0034 001c 00  AX  0   0  4
  [ 5]     PROGBITS  00000 0050 001c 00  AX  0   0  4
  [ 6] REL       00000 01c0 0008 08   I 10   5  4

Symbol table '.symtab' contains 14 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
12: 00000000    28 FUNC    GLOBAL DEFAULT    4 foo
13: 00000000    28 FUNC    GLOBAL DEFAULT    5 bar

Look at that! Each function now has its very own section.

This means that a linker can go through and find all the sections that contain symbols we need, and drop the rest. We can enable it with the aptly named ld flag --gc-sections. You can pass that flag to ld via gcc using gcc -Wl,--gc-sections. And you can pass that whole thing to gcc via ghc using ghc -optc-Wl,--gc-sections

I enabled all of this in my builder’s .cabal/config:

  gcc-options: -Os -Wl,--gc-sections -ffunction-sections -fdata-sections
  ghc-options: -optc-Os -optlo-Os -split-sections

With this in place, the ShellCheck binary became a mere 14.5MB:

-rw-r--r-- 1 build build 14503356 Jul 15 10:01 shellcheck

That’s less than half the size we started out with. I’ve since applied the same flags to the x86_64 build, which brought it down from 23MB to 7MB. Snappier downloads and installs for all!

For anyone interested in compiling Haskell for armv6hf on x86_64, I spent weeks trying to get cross-compilation going, but in the end (and with many hacks) I was only able to cross-compile armv7. In the end I gave up and took the same approach as with the Windows build blog post: a Docker image runs the Raspbian armv6 userland in Qemu user emulation mode.

I didn’t even have to set up Qemu. There’s tooling from for building ARM Docker containers for IoT purposes. ShellCheck (ab)uses this to run emulated GHC and cabal. Everything Just Works, if slowly.

The Dockerfile is available on GitHub as koalaman/armv6hf-builder.

An ode to pack: gzip’s forgotten decompressor

The latest 4.13.9 source release of the Linux kernel is 780MiB, but thanks to xz compression, the download is a much more managable 96 MiB (an 88% reduction)

Before xz took over as the default compression format on in 2013, following the "latest" link would have gotten you a bzip2 compressed file. The tar.bz2 would have been 115 MiB (-85%), but there’s was no defending the extra 20 MiB after xz caught up in popularity. bzip2 is all but displaced today.

bzip2 became the default in 2003, though it had long been an option over the less efficient gzip. However, since every OS, browser, language core library, phone and IoT lightswitch has built-in support for gzip, a 148 MiB (-81%) tar.gz remains an option even today.

gzip itself started taking over in 1994, before, and before the World Wide Web went mainstream. It must have been a particularly easy sell for the fledgeling Linux kernel: it was made, used and endorsed by the mighty GNU project, it was Free Software, free of patent restrictions, and it provided powerful .zip style DEFLATE compression in a Unix friendly package.

Another nice benefit was that gzip could decompress other contemporary formats, thereby replacing contested and proprietary software.

Among the tools it could replace was compress, the de-facto Unix standard at the time. Created based on LZW in 1985, it was hampered by the same patent woes that plagued GIF files. The then-ubiquitous .Z suffix graced the first public Linux releases, but is now recognized only by the most long-bearded enthusiasts. The current release would have been 302 MiB (-61%) with compress.

Another even more obscure tool it could replace was compress‘s own predecessor, pack. This rather loosely defined collection of only partially compatible formats is why compress had to use a capital Z in its extension. pack came first, and offered straight Huffman coding with a .z extension.

With pack, our Linux release would have been 548 MiB (-30%). Compared to xz‘s 96 MiB, it’s obvious why no one has used it for decades.

Well, guess what: gzip never ended its support! Quoth the man page,

gunzip can currently decompress files created by gzip, zip,
compress, compress -H or pack.

While multiple implementations existed, these were common peculiarities:

  • They could not be used in pipes.
  • They could not represent empty files.
  • They could not compress a file with only one byte value, e.g. "aaaaaa…"
  • They could fail on "large" files. "can’t occur unless [file size] >= [16MB]", a comment said dismissively, from the time when a 10MB hard drive was a luxury few could afford.

These issues stemmed directly from the Huffman coding used. Huffman coding, developed in 1952, is basically an improvement on Morse code, where common characters like "e" get a short code like "011", while uncommon "z" gets a longer one like "111010".

  • Since you have to count the characters to figure out which are common, you can not compress in a single pass in a pipe. Now that memory is cheap, you could mostly get around that by keeping the data in RAM.

  • Empty files and single-valued files hit an edge case: if you only have a single value, the shortest code for it is the empty string. Decompressors that didn’t account for it would get stuck reading 0 bits forever. You can get around it by adding unused dummy symbols to ensure a minimum bit length of 1.

  • A file over 16MB could cause a single character to be so rare that its bit code was 25+ bits long. A decompressor storing the bits to be decoded in a 32bit value (a trick even gzip uses) would be unable to append a new 8bit byte to the buffer without displacing part of the current bit code. You can get around that by using "package merge" length restricted prefix codes over naive Huffman codes.

I wrote a Haskell implementation with all these fixes in place: koalaman/pack is available on GitHub.

During development, I found that pack support in gzip had been buggy since 2012 (version 1.6), but no one had noticed in the five years since. I tracked down the problem and I’m happy to say that version 1.9 will again restore full pack support!

Anyways, what could possibly be the point of using pack today?

There is actually one modern use case: code golfing.

This post came about because I was trying to implement the shortest possible program that would output a piece of simple ASCII art. A common trick is variations of a self-extracting shell script:

sed 1d $0|gunzip;exit
<compressed binary data here>

You can use any available compressor, including xz and bzip2, but these were meant for bigger files and have game ruining overheads. Here’s the result of compressing the ASCII art in question:

  • raw: 269 bytes
  • xz: 216 bytes
  • bzip2: 183 bytes
  • gzip: 163 bytes
  • compress: 165 bytes
  • and finally, pack: 148 bytes!

I was able to save 15 bytes by leveraging gzip‘s forgotten legacy support. This is huge in a sport where winning entries are bytes apart.

Let’s have a look at this simple file format. Here’s an example pack file header for the word "banana":

1f 1e        -- Two byte magic header
00 00 00 06  -- Original compressed length (6 bytes)

Next comes the Huffman tree. Building it is simple to do by hand, but too much for this post. It just needs to be complete, left-aligned, with eof on the right at the deepest level. Here’s the optimal tree for this string:

       /  a
     /  \
    /\   n
   b  eof

We start by encoding its depth (3), and the number of leaves on each level. The last level is encoded minus 2, because the lowest level will have between 2 and 257 leaves, while a byte can only store 0-255.

03  -- depth
01  -- level 1 only contains 'a'
01  -- level 2 only contains 'n'
00  -- level 3 contains 'b' and 'eof', -2 as mentioned

Next we encode the ASCII values of the leaves in the order from top to bottom, left to right. We can leave off the EOF (which is why it needs to be in the lower right):

61 6e 62  -- "a", "n" ,"b"

This is enough for the decompressor to rebuild the tree. Now we go on to encode the actual data.

Starting from the root, the Huffman codes are determined by adding a 0 for ever left branch and 1 for every right branch you have to take to get to your value:

a   -> right = 1
n   -> left+right = 01
b   -> left+left+left -> 000
eof -> left+left+right -> 001

banana<eof> would therefore be 000 1 01 1 01 1 001, or when grouped as bytes:

16  -- 0001 0110
C8  -- 1100 1   (000 as padding)

And that’s all we need:

$ printf '\x1f\x1e\x00\x00\x00\x06'\
'\x03\x01\x01\x00\x61\x6e\x62\x16\xc8' | gzip -d

Unfortunately, the mentioned gzip bug triggers due to failing to account for leading zeroes in bit code. eof and a have values 001 and 1, so an oversimplified equality check confuses one for the other, causing gzip to terminate early:

gzip: stdin: invalid compressed data--length error

However, if you’re stuck with an affected version, there’s another party trick you can do: the Huffman tree has to be canonical, but it does not have to be optimal!

What would happen if we skipped the count and instead claimed that each ASCII character is equally likely? Why, we’d get a tree of depth 8 where all the leaf nodes are on the deepest level.

It then follows that each 8 bit character will be encoded as 8 bits in the output file, with the bit patterns we choose by ordering the leaves.

Let’s add a header with a dummy length to a file:

$ printf '\x1F\x1E____' > myfile.z

Now let’s append the afforementioned tree structure, 8 levels with all nodes in the last one:

$ printf '\x08\0\0\0\0\0\0\0\xFE' >> myfile.z

And let’s populate the leaf nodes with 255 bytes in an order of our choice:

$ printf "$(printf '\\%o' {0..254})" |
    tr 'A-Za-z' 'N-ZA-Mn-za-m' >> myfile.z

Now we can run the following command, enter some text, and hit Ctrl-D to "decompress" it:

$ cat myfile.z - | gzip -d 2> /dev/null
Jr unir whfg pbaivaprq TMvc gb hafpenzoyr EBG13!
We have just convinced GZip to unscramble ROT13!

Can you think of any other fun ways to use or abuse gzip‘s legacy support? Post a comment.

ShellCheck and shadowed case branches

As of the the latest commit, ShellCheck will try to detect shadowed case branches.

Here’s an adaptation from an unnamed script on GitHub:

case $1 in
        exit 0
        die "Unknown option: $1"

The original case statement was significantly longer, so you’d be excused for not noticing the problem: -h is used for two different branches. Because of this, -h as a short option for --hub will not work.

If you run ShellCheck on this example now, you will get a pair of helpful warnings:

Line 4:
    ^-- SC2221: This pattern always overrides a later one.
Line 8:
    ^-- SC2222: This pattern never matches because of a previous pattern.

Very simple and probably somewhat useful in certain cases, right? Well, it gets slightly more interesting.

Here is another example adapted from the wild:

case $1 in
        die "Unknown option: $1"

Did you spot the same problem? ShellCheck did:

Line 4:
              ^-- SC2221: This pattern always overrides a later one.

Since an unescaped ? matches any character, it will match also match -v, so the short form of --verbose will not work.

Similarly, it recognizes two separate issues in this example:

    -*|--*) die "Invalid option: $1" ;;
    --) shift; break ;;

The end-of-option -- marker will never be recognized, and -*|--* is redundant because the first already covers the second.

These are all very simple cases, but this also works more generally. Here’s a fabricated music sorting script where the bug would be exceedingly hard to spot in a longer list of bands:

case "${filename,,}" in
  *"abba"*.mp3 ) rm "$filename" ;;
  *"black"*"sabbath"*.mp3 ) mv "$filename" "Music/Metal" ;;

So how does it work?

There are very clever ways of determining whether one regular language is a superset of another by intersecting it with the complement of the other, and checking the result for satisfiability.

ShellCheck uses none of them.

I’ve written a regex inverter before, and that level of complexity was not something I wanted to introduce.

Instead, ShellCheck’s pattern intersection and superset supports only basic DOS style wildcard patterns: ?, * and literals. It just does a simple recursive match on the two patterns.

Let’s call the patterns A and B, and we wish to check if A is a superset of B, i.e. if A matches everything that B does.

We have two arbitrary shell patterns that we want to turn into a simplified form, while ensuring we don’t simplify away any details that will cause a false positive. ShellCheck does this in two ways:

It creates A in such a way that it’s guaranteed to match a (non-strict) subset of the actual glob. This just means giving up on any pattern that uses features we don’t explicitly recognize. $(cmd)foo@(ab|c) is rejected, while *foo* is allowed.

It then creates B to guarantee that it matches a (non-strict) superset of the actual glob. This is done by replacing anything we don’t support with a *. $(cmd)foo@(ab|c) just becomes *foo*.

Now we can just match the two patterns against each other with an inefficient but simple recursive matcher. Matching two patterns is slightly trickier than matching a pattern against a string, but it’s still a first year level CS exercise.

It just involves breaking down the patterns by prefix, and matching until you reach a trivial base case:

  • superset(“”, “”) = True
  • superset(“”, cY) = False
  • superset(cX, cY) = superset(X, Y)
  • superset(*X, *Y) = superset(*X, Y)

The actual code calls the simplified patterns “PseudoGlobs”, inhabited by PGAny ?, PGMany *, and PGChar c:

pseudoGlobIsSuperSetof :: [PseudoGlob] -> [PseudoGlob] -> Bool
pseudoGlobIsSuperSetof = matchable
    matchable x@(xf:xs) y@(yf:ys) =
        case (xf, yf) of
            (PGMany, PGMany) -> matchable x ys
            (PGMany, _) -> matchable x ys || matchable xs y
            (_, PGMany) -> False
            (PGAny, _) -> matchable xs ys
            (_, PGAny) -> False
            (_, _) -> xf == yf && matchable xs ys

    matchable [] [] = True
    matchable (PGMany : rest) [] = matchable rest []
    matchable _ _ = False

That’s really all there is to it. ShellCheck just goes through each pattern, and flags the first pattern (if any) that it shadows. There’s also a pattern simplifier which rearranges c*?*?****d into c??*d to add some efficiency to obviously diseased patterns.

Future work could include supporting character sets/ranges since [yY] is at least occasionally used, but it’s rare to find any extglob to warrant full regex support.

Of course, 99% of the time, there are no duplicates. 99.9% of the time, you’d get the same result with simple string matches.

However, that 0.1% of cases where you get delightful insights like -? shadowing -v or Linux-3.1* shadowing Linux-3.12* makes it all worthwhile.

Compiling Haskell for Windows on Travis CI

or: How I finally came around and started appreciating Docker.

tl;dr: ShellCheck is now automatically compiled for Windows using Wine+GHC in Docker, without any need for additional Windows CI.

I don’t know what initially surprised me more: that people were building ShellCheck on Windows before and without WSL, or that it actually worked.

Unless you’re a fan of the language, chances are that if you run any Haskell software at all, it’s one of pandoc, xmonad, or shellcheck. Unlike GCC, Haskell build tools are not something you ever just happen to have lying around.

While Haskell is an amazing language, it doesn’t come cheap. At 550 MB, GHC, the Haskell Compiler, is the single largest package on my Debian system. On Windows, the Haskell Platform — GHC plus standard tools and libraries — weighs in at 4,200 MB.

If you need to build your own software from source, 4 GB of build dependencies unique to a single application is enough to make you reconsider. This is especially true on Windows where you can’t just close your eyes and hit “yes” in your package manager.

Thanks to the awesome individuals who package ShellCheck for various distros, most people have no idea. To them, ShellCheck is just a 5 MB download without external dependencies.

Starting today, this includes Windows users!

This is obviously great for them, but the more interesting story is how this happens.

There’s no parallel integration with yet another CI system like Appveyor, eating the cost of Windows licenses in the hopes of future business. There’s not been a rise of a much-needed de facto standard package manager, with generous individuals donating their time.

It’s also not me booting Windows at home to manually compile executables on every release, nor a series of patches trying to convince GHC to target Windows from GNU/Linux.

It’s a Docker container with GHC and Cabal running in Wine.

Ugly? Yes. Does it matter? No. The gory details are all hidden away by Docker.

Anyone, including Travis CI, can now easily and automatically compile ShellCheck (or any other Haskell project for that matter) for Windows in two lines, without a Windows license.

If you want ShellCheck binaries for Windows, they’re linked to on the ShellCheck github repo. If you want to take a look at the Docker image, there’s a repo for that too.

ShellCheck has had an official Docker build for quite a while, but it was contribution (thanks, Nikyle!). I never really had any feelings for Docker, one way or the other.

Consider me converted.

[ -z $var ] works unreasonably well

There is a subreddit /r/nononoyes for videos of things that look like they’ll go horribly wrong, but amazingly turn out ok.

[ -z $var ] would belong there.

It’s a bash statement that tries to check whether the variable is empty, but it’s missing quotes. Most of the time, when dealing with variables that can be empty, this is a disaster.

Consider its opposite, [ -n $var ], for checking whether the variable is non-empty. With the same quoting bug, it becomes completely unusable:

Input Expected [ -n $var ]
“” False True!
“foo” True True
“foo bar” True False!

These issues are due to a combination of word splitting and the fact that [ is not shell syntax but traditionally just an external binary with a funny name. See my previous post Why Bash is like that: Pseudo-syntax for more on that.

The evaluation of [ is defined in terms of the number of argument. The argument values have much less to do with it. Ignoring negation, here’s a simplified excerpt from POSIX test:

# Arguments Action Typical example
0 False [ ]
1 True if $1 is non-empty [ "$var" ]
2 Apply unary operator $1 to $2 [ -x "/bin/ls" ]
3 Apply binary operator $2 to $1 and $3 [ 1 -lt 2 ]

Now we can see why [ -n $var ] fails in two cases:

When the variable is empty and unquoted, it’s removed, and we pass 1 argument: the literal string “-n”. Since “-n” is not an empty string, it evaluates to true when it should be false.

When the variable contains foo bar and is unquoted, it’s split into two arguments, and so we pass 3: “-n”, “foo” and “bar”. Since “foo” is not a binary operator, it evaluates to false (with an error message) when it should be true.

Now let’s have a look at [ -z $var ]:

Input Expected [ -z $var ] Actual test
“” True: is empty True 1 arg: is “-z” non-empty
“foo” False: not empty False 2 args: apply -z to foo
“foo bar” False: not empty False (error) 3 args: apply “foo’ to -z and bar

It performs a completely wrong and unexpected action for both empty strings and multiple arguments. However, both cases fail in exactly the right way!

In other words, [ -z $var ] works way better than it has any possible business doing.

This is not to say you can skip quoting of course. For “foo bar”, [ -z $var ] in bash will return the correct exit code, but prints an ugly error in the process. For ” ” (a string with only spaces), it returns true when it should be false, because the argument is removed as if empty. Bash will also incorrectly pass var="foo -o x" because it ends up being a valid test through code injection.

The moral of the story? Same as always: quote, quote quote. Even when things appear to work.

ShellCheck is aware of this difference, and you can check the code used here online. [ -n $var ] gets an angry red message, while [ -z $var ] merely gets a generic green quoting warning.

Technically correct: floating point calculations in bc

Whenever someone asks how to do floating point math in a shell script, the answer is typically bc:

$  echo "scale=9; 22/7" | bc

However, this is technically wrong: bc does not support floating point at all! What you see above is arbitrary precision FIXED point arithmetic.

The user’s intention is obviously to do math with fractional numbers, regardless of the low level implementation, so the above is a good and pragmatic answer. However, technically correct is the best kind of correct, so let’s stop being helpful and start pedantically splitting hairs instead!

Fixed vs floating point

There are many important things that every programmer should know about floating point, but in one sentence, the larger they get, the less precise they are.

In fixed point you have a certain number of digits, and a decimal point fixed in place like on a tax form: 001234.56. No matter how small or large the number, you can always write down increments of 0.01, whether it’s 000000.01 or 999999.99.

Floating point, meanwhile, is basically scientific notation. If you have 1.23e-4 (0.000123), you can increment by a millionth to get 1.24e-4. However, if you have 1.23e4 (12300), you can’t add less than 100 unless you reserve more space for more digits.

We can see this effect in practice in any language that supports floating point, such as Haskell:

> truncate (16777216 - 1 :: Float)
> truncate (16777216 + 1 :: Float)

Subtracting 1 gives us the decremented number, but adding 1 had no effect with floating point math! bc, with its arbitrary precision fixed points, would instead correctly give us 16777217! This is clearly unacceptable!

Floating point in bc

The problem with the bc solution is, in other words, that the math is too correct. Floating point math always introduces and accumulates rounding errors in ways that are hard to predict. Fixed point doesn’t, and therefore we need to find a way to artificially introduce the same type of inaccuracies! We can do this by rounding a number to a N significant bits, where N = 24 for float and 52 for double. Here is some bc code for that:


define trunc(x) {
  auto old, tmp
  old=scale; scale=0; tmp=x/1; scale=old
  return tmp
define fp(bits, x) {
  auto i
  if (x < 0) return -fp(bits, -x);
  if (x == 0) return 0;
  while (x < 1) { x*=2; i+=1; }
  while (x >= 2) { x/=2; i-=1; }
  return trunc(x * 2^bits + 0.5) / 2^(i)

define float(x) { return fp(24, x); }
define double(x) { return fp(52, x); }
define test(x) {
  print "Float:  ", float(x), "\n"
  print "Double: ", double(x), "\n"

With this file named fp, we can try it out:

$ bc -ql fp <<< "22/7"

$ bc -ql fp <<< "float(22/7)"

The first number is correct to 30 decimals. Yuck! However, with our floating point simulator applied, we get the desired floating point style errors after ~7 decimals!

Let's write a similar program for doing the same thing but with actual floating point, printing them out up to 30 decimals as well:

{-# LANGUAGE RankNTypes #-}
import Control.Monad
import Data.Number.CReal
import System.Environment

main = do
    input <- liftM head getArgs
    putStrLn . ("Float:  " ++) $ showNumber (read input :: Float)
    putStrLn . ("Double: " ++) $ showNumber (read input :: Double)
    showNumber :: forall a. Real a => a -> String
    showNumber = showCReal 30 . realToFrac

Here's a comparison of the two:

$ bc -ql fp <<< "x=test(1000000001.3)"
Float:  1000000000.000000000000000000000000000000
Double: 1000000001.299999952316284179687500000000

$ ./fptest 1000000001.3
Float:  1000000000.0
Double: 1000000001.2999999523162841796875

Due to differences in rounding and/or off-by-one bugs, they're not always identical like here, but the error bars are similar.

Now we can finally start doing floating point math in bc!