{"id":729,"date":"2018-07-15T18:25:34","date_gmt":"2018-07-15T18:25:34","guid":{"rendered":"http:\/\/www.vidarholen.net\/contents\/blog\/?p=729"},"modified":"2018-07-17T17:34:03","modified_gmt":"2018-07-17T17:34:03","slug":"shellcheck-on-armv6hf-crossless-compilers-and-the-value-of-split-sections","status":"publish","type":"post","link":"https:\/\/www.vidarholen.net\/contents\/blog\/?p=729","title":{"rendered":"So what exactly is -ffunction-sections and how does it reduce binary size?"},"content":{"rendered":"<p>If you&#8217;d like a more up-to-date version of <a href=\"https:\/\/www.shellcheck.net\">ShellCheck<\/a> than what Raspbian provides, you can build your own on a Raspberry Pi Zero in a little over 21 hours.<\/p>\n<p>Alternatively, as of last week, you can also download RPi compatible, statically linked <a href=\"https:\/\/github.com\/koalaman\/shellcheck#installing\">armv6hf binaries<\/a> of every new commit and stable release.<\/p>\n<p>It&#8217;s statically linked &#8212; i.e. the executable has all its library dependencies built in &#8212; so you can expect it to be pretty big. However, I didn&#8217;t expect it to be 67MB:<\/p>\n<pre>build@d1044ff3bf67:\/mnt\/shellcheck# ls -l shellcheck\r\n-rwxr-xr-x 1 build build 66658032 Jul 14 16:04 shellcheck<\/pre>\n<p>This is for a tool intended to run on devices with 512MiB RAM. <code>strip<\/code> helps shed a lot of that weight, and the post-stripped number is the one we&#8217;ll use from now on, but 36MB is still more than I expected, especially given that the x86_64 build is 23MB.<\/p>\n<pre>build@d1044ff3bf67:\/mnt\/shellcheck# strip --strip-all shellcheck\r\nbuild@d1044ff3bf67:\/mnt\/shellcheck# ls -l shellcheck\r\n-rwxr-xr-x 1 build build 35951068 Jul 14 16:22 shellcheck<\/pre>\n<p>So now what? Optimize for size? 
Here&#8217;s the result with <code>ghc -optlo-Os<\/code>, which enables LLVM <code>opt<\/code> size optimizations, after a complete three-hour Qemu-emulated rebuild of all dependencies:<\/p>\n<pre>build@31ef6588fdf1:\/mnt\/shellcheck# ls -l shellcheck\r\n-rwxr-xr-x 1 build build 32051676 Jul 14 22:38 shellcheck<\/pre>\n<p>Welp, that&#8217;s not nearly enough.<\/p>\n<p>The real problem is that we&#8217;re linking in both C and Haskell dependencies, from the JSON formatters and regex libraries to bignum implementations and the Haskell runtime itself. These have tons of functionality that ShellCheck doesn&#8217;t use, but which still ends up in the binary.<\/p>\n<p>Fortunately, GCC and GHC allow eliminating this kind of dead code through <em>function sections<\/em>. Let&#8217;s look at how that works, and why dead code can&#8217;t just be eliminated as a matter of course:<\/p>\n<p>An ELF binary contains a lot of different things, each stored in a <em>section<\/em>. It can have any number of these sections, each of which has a pile of attributes including a name:<\/p>\n<ul>\n<li><code>.text<\/code> stores executable code<\/li>\n<li><code>.data<\/code> stores initialized global and static variables<\/li>\n<li><code>.symtab<\/code> stores the symbol table<\/li>\n<li>Ever wondered where compilers embed debug info? Sections.<\/li>\n<li>Exception unwinding data, compiler version or build IDs? Sections.<\/li>\n<\/ul>\n<p>This is how <code>strip<\/code> is able to safely and efficiently drop so much data: if a section has been deemed unnecessary, it&#8217;s simple and straightforward to drop it without affecting the rest of the executable.<\/p>\n<p>Let&#8217;s have a look at some real data. 
Here&#8217;s a simple <code>foo.c<\/code>:<\/p>\n<pre>int foo() { return 42; }\r\nint bar() { return foo(); }<\/pre>\n<p>We can compile it with <code>gcc -c foo.c -o foo.o<\/code> and examine the sections:<\/p>\n<pre>$ readelf -a foo.o\r\nELF Header:\r\n  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00\r\n  Class:        ELF32\r\n  Data:         2&#39;s complement, little endian\r\n  Version:      1 (current)\r\n  OS\/ABI:       UNIX - System V\r\n  ABI Version:  0\r\n  Type:         REL (Relocatable file)\r\n  Machine:      ARM\r\n[..]\r\n\r\nSection Headers:\r\n  [Nr] Name       Type      Addr   Off    Size   ES Flg Lk Inf Al\r\n  [ 0]            NULL      000000 000000 000000 00      0   0  0\r\n  [ 1] .text      PROGBITS  000000 000034 000034 00  AX  0   0  4\r\n  [ 2] .rel.text  REL       000000 000190 000008 08   I  8   1  4\r\n  [ 3] .data      PROGBITS  000000 000068 000000 00  WA  0   0  1\r\n  [ 4] .bss       NOBITS    000000 000068 000000 00  WA  0   0  1\r\n  [..]\r\n\r\nSymbol table &#39;.symtab&#39; contains 11 entries:\r\n   Num:    Value  Size Type    Bind   Vis      Ndx Name\r\n   [..]\r\n     9: 00000000    28 FUNC    GLOBAL DEFAULT    1 foo\r\n    10: 0000001c    24 FUNC    GLOBAL DEFAULT    1 bar<\/pre>\n<p>There&#8217;s tons more info not included here, and it&#8217;s an interesting read in its own right. Anyway, both our functions live in the <code>.text<\/code> section. We can see this from the symbol table&#8217;s <code>Ndx<\/code> column, which says section <code>1<\/code>, corresponding to <code>.text<\/code>. 
We can also see it in the disassembly:<\/p>\n<pre>$ objdump -d foo.o\r\nfoo.o:     file format elf32-littlearm\r\n\r\nDisassembly of section .text:\r\n00000000 &lt;foo&gt;:\r\n   0:   e52db004   push    {fp}\r\n   4:   e28db000   add     fp, sp, #0\r\n   8:   e3a0302a   mov     r3, #42 ; 0x2a\r\n   c:   e1a00003   mov     r0, r3\r\n  10:   e28bd000   add     sp, fp, #0\r\n  14:   e49db004   pop     {fp}\r\n  18:   e12fff1e   bx      lr\r\n\r\n0000001c &lt;bar&gt;:\r\n  1c:   e92d4800   push    {fp, lr}\r\n  20:   e28db004   add     fp, sp, #4\r\n  24:   ebfffffe   bl      0 &lt;foo&gt;\r\n  28:   e1a03000   mov     r3, r0\r\n  2c:   e1a00003   mov     r0, r3\r\n  30:   e8bd8800   pop     {fp, pc}<\/pre>\n<p>Now let&#8217;s say that the only library function we use is <code>foo<\/code>, and we want <code>bar<\/code> removed from the final binary. This is tricky, because you can&#8217;t just modify a <code>.text<\/code> section by slicing things out of it. There are offsets, addresses and cross-dependencies compiled into the code, and any shifts would mean trying to patch that all up. If only it were as easy as when <code>strip<\/code> removed whole sections&#8230;<\/p>\n<p>This is where <code>gcc -ffunction-sections<\/code> and <code>ghc -split-sections<\/code> come in. 
Let&#8217;s recompile our file with <code>gcc -ffunction-sections foo.c -c -o foo.o<\/code>:<\/p>\n<pre>$ readelf -a foo.o\r\n[..]\r\nSection Headers:\r\n  [Nr] Name          Type      Addr  Off  Size ES Flg Lk Inf Al\r\n  [ 0]               NULL      00000 0000 0000 00      0   0  0\r\n  [ 1] .text         PROGBITS  00000 0034 0000 00  AX  0   0  1\r\n  [ 2] .data         PROGBITS  00000 0034 0000 00  WA  0   0  1\r\n  [ 3] .bss          NOBITS    00000 0034 0000 00  WA  0   0  1\r\n  [ 4] .text.foo     PROGBITS  00000 0034 001c 00  AX  0   0  4\r\n  [ 5] .text.bar     PROGBITS  00000 0050 001c 00  AX  0   0  4\r\n  [ 6] .rel.text.bar REL       00000 01c0 0008 08   I 10   5  4\r\n  [..]\r\n\r\nSymbol table &#39;.symtab&#39; contains 14 entries:\r\n   Num:    Value  Size Type    Bind   Vis      Ndx Name\r\n   [..]\r\n    12: 00000000    28 FUNC    GLOBAL DEFAULT    4 foo\r\n    13: 00000000    28 FUNC    GLOBAL DEFAULT    5 bar<\/pre>\n<p>Look at that! Each function now has its very own section.<\/p>\n<p>This means that a linker can start from the symbols the program actually uses, follow the references between sections, and drop every section it never reaches. We can enable this with the aptly named <code>ld<\/code> flag <code>--gc-sections<\/code>. You can pass that flag to <code>ld<\/code> via <code>gcc<\/code> using <code>gcc -Wl,--gc-sections<\/code>. And since GHC links through <code>gcc<\/code>, you can pass that whole thing via <code>ghc<\/code> using <code>ghc -optl-Wl,--gc-sections<\/code>.<\/p>\n<p>I enabled all of this in my builder&#8217;s <code>.cabal\/config<\/code>, along with <code>-fdata-sections<\/code>, which does the same thing for global and static variables:<\/p>\n<pre>program-default-options\r\n  gcc-options: -Os -Wl,--gc-sections -ffunction-sections -fdata-sections\r\n  ghc-options: -optc-Os -optlo-Os -split-sections<\/pre>\n<p>With this in place, the ShellCheck binary became a mere 14.5MB:<\/p>\n<pre>-rw-r--r-- 1 build build 14503356 Jul 15 10:01 shellcheck<\/pre>\n<p>That&#8217;s less than half the size we started out with. 
I&#8217;ve since applied the same flags to the x86_64 build, which brought it down from 23MB to 7MB. Snappier downloads and installs for all!<\/p>\n<hr \/>\n<p>For anyone interested in compiling Haskell for armv6hf on x86_64: I spent weeks trying to get cross-compilation going, but even with many hacks I was only able to cross-compile for armv7. Eventually I gave up and took the same approach as in the <a href=\"https:\/\/www.vidarholen.net\/contents\/blog\/?p=613\">Windows build<\/a> blog post: a Docker image that runs the Raspbian armv6 userland in Qemu user emulation mode.<\/p>\n<p>I didn&#8217;t even have to set up Qemu. There&#8217;s tooling from <a href=\"https:\/\/resin.io\/blog\/building-arm-containers-on-any-x86-machine-even-dockerhub\/\">Resin.io<\/a> for building ARM Docker containers for IoT purposes. ShellCheck (ab)uses this to run emulated GHC and cabal. Everything Just Works, if slowly.<\/p>\n<p>The Dockerfile is available on GitHub as <a href=\"https:\/\/github.com\/koalaman\/armv6hf-builder\">koalaman\/armv6hf-builder<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you&#8217;d like a more up-to-date version of ShellCheck than what Raspbian provides, you can build your own on a Raspberry Pi Zero in a little over 21 hours. Alternatively, as of last week, you can also download RPi compatible, statically linked armv6hf binaries of every new commit and stable release. 
It&#8217;s statically linked &#8212; &hellip; <a href=\"https:\/\/www.vidarholen.net\/contents\/blog\/?p=729\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;So what exactly is -ffunction-sections and how does it reduce binary size?&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[5,4,23],"tags":[57,46],"class_list":["post-729","post","type-post","status-publish","format-standard","hentry","category-advanced-linux","category-linux","category-programming","tag-haskell","tag-shellcheck"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/729","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=729"}],"version-history":[{"count":15,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/729\/revisions"}],"predecessor-version":[{"id":741,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/729\/revisions\/741"}],"wp:attachment":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=729"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp
%2Fv2%2Fcategories&post=729"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=729"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}