Why is /usr/bin/test 4kiB smaller than /usr/bin/[ ?

Reddit user mathisweirdaf posted this interesting observation:

 $ ls -lh /usr/bin/{test,[}
-rwxr-xr-x 1 root root 59K  Sep  5  2019 '/usr/bin/['
-rwxr-xr-x 1 root root 55K  Sep  5  2019  /usr/bin/test

[ and test are supposed to be aliases for each other, and yet there is a 4kiB difference between their GNU coreutils binaries. Why?

First, for anyone surprised: yes, there is a /usr/bin/[. I have a previous post on this subject, but here’s a quick recap:

When you write if [ -e /etc/passwd ]; then .. that bracket is not shell syntax but just a regular command with a funny name. It’s serviced by /usr/bin/[, or (more likely) a shell builtin. This explains a lot of its surprising behavior, e.g. why it’s notoriously space sensitive: [1=2] is no more valid than ls-l/tmp.

Anyways, why is there a size difference? We can compare objdump output to see where the data goes. Here’s an excerpt from objdump -h /usr/bin/[:

                 size                                          offset
15 .text         00006e82  0000000000002640  0000000000002640  00002640  2**4
16 .fini         0000000d  00000000000094c4  00000000000094c4  000094c4  2**2
17 .rodata       00001e4c  000000000000a000  000000000000a000  0000a000  2**5

and here’s objdump -h /usr/bin/test:

15 .text         000068a2  0000000000002640  0000000000002640  00002640  2**4
16 .fini         0000000d  0000000000008ee4  0000000000008ee4  00008ee4  2**2
17 .rodata       00001aec  0000000000009000  0000000000009000  00009000  2**5

We can see that the .text segment (compiled executable code) — is 1504 bytes larger, while .rodata (constant values and strings) is 864 bytes larger.

Most crucially, the increased size of the .text segment causes it to go from the 8000s to the 9000s, crossing a 0x1000 (4096) page size boundary, and therefore shifting all other segments up by 4096 bytes. This is the size difference we’re seeing.

The only nominal difference between [ and test is that [ requires a ] as a final argument. Checking for that would be a very minuscule amount of code, so what are those ~1500 bytes used for?

Since it’s hard to inspect stripped binaries, I built my own copy of coreutils and compared the list of functions in each:

$ diff -u <(nm -S --defined-only src/[ | cut -d ' ' -f 2-) <(nm -S --defined-only src/test | cut -d ' ' -f 2-)
--- /dev/fd/63      2021-02-02 20:21:35.337942508 -0800
+++ /dev/fd/62      2021-02-02 20:21:35.341942491 -0800
@@ -37,7 +37,6 @@
 D __dso_handle
 d _DYNAMIC
 D _edata
-0000000000000099 T emit_bug_reporting_address
 B _end
 0000000000000004 D exit_failure
 0000000000000008 b file_name
@@ -63,7 +62,7 @@
 0000000000000022 T locale_charset
 0000000000000014 T __lstat
 0000000000000014 t lstat
-0000000000000188 T main
+00000000000000d1 T main
 000000000000000b T make_timespec
 0000000000000004 d nslots
 0000000000000022 t one_argument
@@ -142,16 +141,10 @@
 0000000000000032 T umaxtostr
 0000000000000013 t unary_advance
 00000000000004e5 t unary_operator
-00000000000003d2 T usage
+0000000000000428 T usage
 0000000000000d2d T vasnprintf
 0000000000000013 T verror
 00000000000000ae T verror_at_line
-0000000000000008 D Version
-00000000000000ab T version_etc
-0000000000000018 T version_etc_ar
-000000000000042b T version_etc_arn
-000000000000002f R version_etc_copyright
-000000000000007a T version_etc_va
 000000000000001c r wide_null_string.2840
 0000000000000078 T x2nrealloc
 000000000000000e T x2realloc

The major contributors are the version_etc* functions. What do they do?

Well, let’s have a look:

/* The three functions below display the --version information the
   standard way. [...]

These are 260 lines of rather elaborate, internationalized, conditional ways of formatting data that makes up --version output. Together they take about bc <<< "ibase=16; 7A+2F+42B+18+AB+8+99" = 1592 bytes.

What does this mean? Simple. This is what we’re paying an extra 4kb for:

$ /usr/bin/[ --version
[ (GNU coreutils) 8.30
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Kevin Braunsdorf and Matthew Bradburn.

[ --version is missing the final ], so the invocation is invalid and the result is therefore implementation defined. GNU is free to let it output version info.

Meanwhile, /usr/bin/test --version is a valid invocation, and POSIX mandates that it simply returns success when the first parameter (--version) is a non-empty string.

This difference is even mentioned in the documentation:

NOTE: [ honors the --help and --version options, but test does not.
test treats each of those as it treats any other nonempty STRING.

Mystery solved!

(Exercise: what would have been the implications of having test support --help and --version in spite of POSIX?)