{"id":995,"date":"2021-04-05T01:19:05","date_gmt":"2021-04-05T01:19:05","guid":{"rendered":"http:\/\/www.vidarholen.net\/contents\/blog\/?p=995"},"modified":"2021-04-05T01:20:32","modified_gmt":"2021-04-05T01:20:32","slug":"why-is-usr-bin-test-4kib-smaller-than-usr-bin","status":"publish","type":"post","link":"https:\/\/www.vidarholen.net\/contents\/blog\/?p=995","title":{"rendered":"Why is \/usr\/bin\/test 4kiB smaller than \/usr\/bin\/[ ?"},"content":{"rendered":"\n<div class=\"wp-block-jetpack-markdown\"><p>Reddit user mathisweirdaf posted <a href=\"https:\/\/www.reddit.com\/r\/bash\/comments\/lbdfyi\/why_are_the_executables_and_test_a_4_kb_difference\/\">this interesting observation<\/a>:<\/p>\n<pre><code> $ ls -lh \/usr\/bin\/{test,[}\n-rwxr-xr-x 1 root root 59K  Sep  5  2019 '\/usr\/bin\/['\n-rwxr-xr-x 1 root root 55K  Sep  5  2019  \/usr\/bin\/test\n<\/code><\/pre>\n<p><code>[<\/code> and <code>test<\/code> are supposed to be aliases for each other, and yet there is a 4kiB difference between their GNU coreutils binaries. Why?<\/p>\n<p>First, for anyone surprised: yes, there is a <code>\/usr\/bin\/[<\/code>. I have a <a href=\"https:\/\/www.vidarholen.net\/contents\/blog\/?p=25\">previous post<\/a> on this subject, but here&#8217;s a quick recap:<\/p>\n<p>When you write <code>if [ -e \/etc\/passwd ]; then ..<\/code> that bracket is not shell syntax but just a regular command with a funny name. It&#8217;s serviced by <code>\/usr\/bin\/[<\/code>, or (more likely) a shell builtin. This explains a lot of its surprising behavior, e.g. why it&#8217;s notoriously space sensitive: <code>[1=2]<\/code> is no more valid than <code>ls-l\/tmp<\/code>.<\/p>\n<p>Anyways, why is there a size difference? We can compare <code>objdump<\/code> output to see where the data goes. 
Here&#8217;s an excerpt from <code>objdump -h \/usr\/bin\/[<\/code>:<\/p>\n<pre><code>                 size                                          offset\n15 .text         00006e82  0000000000002640  0000000000002640  00002640  2**4\n16 .fini         0000000d  00000000000094c4  00000000000094c4  000094c4  2**2\n17 .rodata       00001e4c  000000000000a000  000000000000a000  0000a000  2**5\n<\/code><\/pre>\n<p>and here&#8217;s <code>objdump -h \/usr\/bin\/test<\/code>:<\/p>\n<pre><code>15 .text         000068a2  0000000000002640  0000000000002640  00002640  2**4\n16 .fini         0000000d  0000000000008ee4  0000000000008ee4  00008ee4  2**2\n17 .rodata       00001aec  0000000000009000  0000000000009000  00009000  2**5\n<\/code><\/pre>\n<p>We can see that the <code>.text<\/code> section (compiled executable code) is 1504 bytes larger in <code>[<\/code>, while <code>.rodata<\/code> (constant values and strings) is 864 bytes larger.<\/p>\n<p>Most crucially, the larger <code>.text<\/code> section pushes its end from the 0x8000s into the 0x9000s, crossing a 0x1000 (4096-byte) page boundary and therefore shifting <code>.rodata<\/code> and everything after it up by 4096 bytes. This is the size difference we&#8217;re seeing.<\/p>\n<p>The only nominal difference between <code>[<\/code> and <code>test<\/code> is that <code>[<\/code> requires a <code>]<\/code> as a final argument. 
Checking for that would be a very minuscule amount of code, so what are those ~1500 bytes used for?<\/p>\n<p>Since it&#8217;s hard to inspect stripped binaries, I built my own copy of <code>coreutils<\/code> and compared the list of functions in each:<\/p>\n<pre><code>$ diff -u &lt;(nm -S --defined-only src\/[ | cut -d ' ' -f 2-) &lt;(nm -S --defined-only src\/test | cut -d ' ' -f 2-)\n--- \/dev\/fd\/63      2021-02-02 20:21:35.337942508 -0800\n+++ \/dev\/fd\/62      2021-02-02 20:21:35.341942491 -0800\n@@ -37,7 +37,6 @@\n D __dso_handle\n d _DYNAMIC\n D _edata\n-0000000000000099 T emit_bug_reporting_address\n B _end\n 0000000000000004 D exit_failure\n 0000000000000008 b file_name\n@@ -63,7 +62,7 @@\n 0000000000000022 T locale_charset\n 0000000000000014 T __lstat\n 0000000000000014 t lstat\n-0000000000000188 T main\n+00000000000000d1 T main\n 000000000000000b T make_timespec\n 0000000000000004 d nslots\n 0000000000000022 t one_argument\n@@ -142,16 +141,10 @@\n 0000000000000032 T umaxtostr\n 0000000000000013 t unary_advance\n 00000000000004e5 t unary_operator\n-00000000000003d2 T usage\n+0000000000000428 T usage\n 0000000000000d2d T vasnprintf\n 0000000000000013 T verror\n 00000000000000ae T verror_at_line\n-0000000000000008 D Version\n-00000000000000ab T version_etc\n-0000000000000018 T version_etc_ar\n-000000000000042b T version_etc_arn\n-000000000000002f R version_etc_copyright\n-000000000000007a T version_etc_va\n 000000000000001c r wide_null_string.2840\n 0000000000000078 T x2nrealloc\n 000000000000000e T x2realloc\n<\/code><\/pre>\n<p>The major contributors are the <code>version_etc*<\/code> functions. What do they do?<\/p>\n<p>Well, let&#8217;s <a href=\"https:\/\/github.com\/coreutils\/gnulib\/blob\/8aea50f4dbe02dcb286d3e89fd3a66c0d1e307bf\/lib\/version-etc.c#L42-L262\">have a look<\/a>:<\/p>\n<pre><code>\/* The three functions below display the --version information the\n   standard way. 
[...]\n<\/code><\/pre>\n<p>These are 260 lines of rather elaborate, internationalized, conditional ways of formatting the data that makes up <code>--version<\/code> output. Together they take about <code>bc &lt;&lt;&lt; &quot;ibase=16; 7A+2F+42B+18+AB+8+99&quot;<\/code> = 1592 bytes.<\/p>\n<p>What does this mean? Simple. This is what we&#8217;re paying an extra 4kiB for:<\/p>\n<pre><code>$ \/usr\/bin\/[ --version\n[ (GNU coreutils) 8.30\nCopyright (C) 2018 Free Software Foundation, Inc.\nLicense GPLv3+: GNU GPL version 3 or later &lt;https:\/\/gnu.org\/licenses\/gpl.html&gt;.\nThis is free software: you are free to change and redistribute it.\nThere is NO WARRANTY, to the extent permitted by law.\n\nWritten by Kevin Braunsdorf and Matthew Bradburn.\n<\/code><\/pre>\n<p><code>[ --version<\/code> is missing the final <code>]<\/code>, so the invocation is invalid and the result is therefore implementation-defined. GNU is free to let it output version info.<\/p>\n<p>Meanwhile, <code>\/usr\/bin\/test --version<\/code> is a valid invocation, and POSIX mandates that it simply returns success when its sole argument (<code>--version<\/code>) is a non-empty string.<\/p>\n<p>This difference is even mentioned in the documentation:<\/p>\n<pre><code>NOTE: [ honors the --help and --version options, but test does not.\ntest treats each of those as it treats any other nonempty STRING.\n<\/code><\/pre>\n<p>Mystery solved!<\/p>\n<p>(Exercise: what would have been the implications of having <code>test<\/code> support <code>--help<\/code> and <code>--version<\/code> in spite of 
POSIX?)<\/p>\n<\/div>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[5,4],"tags":[8,53],"class_list":["post-995","post","type-post","status-publish","format-standard","hentry","category-advanced-linux","category-linux","tag-gnu","tag-linux"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/995","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=995"}],"version-history":[{"count":22,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/995\/revisions"}],"predecessor-version":[{"id":1034,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/995\/revisions\/1034"}],"wp:attachment":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=995"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=995"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=995"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":t
rue}]}}