Reddit user mathisweirdaf posted this interesting observation:
$ ls -lh /usr/bin/{test,[}
-rwxr-xr-x 1 root root 59K Sep 5 2019 '/usr/bin/['
-rwxr-xr-x 1 root root 55K Sep 5 2019 /usr/bin/test
[ and test are supposed to be aliases for each other, and yet there is a 4 KiB difference between their GNU coreutils binaries. Why?
First, for anyone surprised: yes, there is a /usr/bin/[. I have a previous post on this subject, but here’s a quick recap:
When you write if [ -e /etc/passwd ]; then .. that bracket is not shell syntax but just a regular command with a funny name. It’s serviced by /usr/bin/[, or (more likely) a shell builtin. This explains a lot of its surprising behavior, e.g. why it’s notoriously space sensitive: [1=2] is no more valid than ls-l/tmp.
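The recap above can be tried directly. This is a minimal sketch assuming a POSIX-style shell; the variable names are only for illustration, and the unspaced command is quoted so the shell does not treat the brackets as a glob pattern.

```shell
# Ask the shell how it resolves each name (usually "shell builtin").
type [ test

# With spaces, [ is a command that receives the arguments: 1 = 2 ]
if [ 1 = 2 ]; then spaced=true; else spaced=false; fi
echo "[ 1 = 2 ] is $spaced"

# Without spaces, the shell looks for a command literally named "[1=2]".
# Exit status 127 is the shell's "command not found".
unspaced_status=0
'[1=2]' 2>/dev/null || unspaced_status=$?
echo "[1=2] exited with status $unspaced_status"
```

The second command fails for the same reason ls-l/tmp does: the whole word is taken as the command name.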
Anyways, why is there a size difference? We can compare objdump output to see where the data goes. Here’s an excerpt from objdump -h /usr/bin/[:
Idx Name Size VMA LMA File off Algn
15 .text 00006e82 0000000000002640 0000000000002640 00002640 2**4
16 .fini 0000000d 00000000000094c4 00000000000094c4 000094c4 2**2
17 .rodata 00001e4c 000000000000a000 000000000000a000 0000a000 2**5
and here’s objdump -h /usr/bin/test:
15 .text 000068a2 0000000000002640 0000000000002640 00002640 2**4
16 .fini 0000000d 0000000000008ee4 0000000000008ee4 00008ee4 2**2
17 .rodata 00001aec 0000000000009000 0000000000009000 00009000 2**5
We can see that the .text section (compiled executable code) is 1504 bytes larger, while .rodata (constant values and strings) is 864 bytes larger.
Most crucially, the increased size of the .text section pushes the end of the code from the 0x8000s into the 0x9000s, crossing a 0x1000 (4096-byte) page boundary and therefore shifting all the following sections up by 4096 bytes. This is the size difference we’re seeing.
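The arithmetic can be checked in the shell. This is a rough sketch using the numbers from the two objdump excerpts; it assumes that .rodata starts at the first page boundary after the end of .fini (which is what the listed offsets suggest), and round_up_to_page is a hypothetical helper, not part of any tool.

```shell
# Sizes and file offsets copied from the two objdump -h excerpts (hexadecimal).
bracket_text_size=0x6e82;   test_text_size=0x68a2
bracket_rodata_size=0x1e4c; test_rodata_size=0x1aec
bracket_fini_off=0x94c4;    test_fini_off=0x8ee4
fini_size=0xd
page_size=4096

text_diff=$(( $bracket_text_size - $test_text_size ))
rodata_diff=$(( $bracket_rodata_size - $test_rodata_size ))
echo ".text grows by $text_diff bytes, .rodata by $rodata_diff bytes"

# Hypothetical helper: round an offset up to the next multiple of page_size.
round_up_to_page() {
  echo $(( ($1 + $page_size - 1) / $page_size * $page_size ))
}

# Assumption: .rodata begins at the first page boundary after the end of .fini.
bracket_rodata_off=$(round_up_to_page $(( $bracket_fini_off + $fini_size )))
test_rodata_off=$(round_up_to_page $(( $test_fini_off + $fini_size )))

printf '.rodata offset in [:    0x%x\n' "$bracket_rodata_off"
printf '.rodata offset in test: 0x%x\n' "$test_rodata_off"
echo "shift: $(( $bracket_rodata_off - $test_rodata_off )) bytes"
```

The computed offsets, 0xa000 and 0x9000, match the .rodata offsets in the excerpts, and the difference is exactly one page.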
The only nominal difference between [ and test is that [ requires a ] as a final argument. Checking for that would take a minuscule amount of code, so what are those ~1500 bytes used for?
Since it’s hard to inspect stripped binaries, I built my own copy of coreutils and compared the list of functions in each:
$ diff -u <(nm -S --defined-only src/[ | cut -d ' ' -f 2-) <(nm -S --defined-only src/test | cut -d ' ' -f 2-)
--- /dev/fd/63 2021-02-02 20:21:35.337942508 -0800
+++ /dev/fd/62 2021-02-02 20:21:35.341942491 -0800
@@ -37,7 +37,6 @@
D __dso_handle
d _DYNAMIC
D _edata
-0000000000000099 T emit_bug_reporting_address
B _end
0000000000000004 D exit_failure
0000000000000008 b file_name
@@ -63,7 +62,7 @@
0000000000000022 T locale_charset
0000000000000014 T __lstat
0000000000000014 t lstat
-0000000000000188 T main
+00000000000000d1 T main
000000000000000b T make_timespec
0000000000000004 d nslots
0000000000000022 t one_argument
@@ -142,16 +141,10 @@
0000000000000032 T umaxtostr
0000000000000013 t unary_advance
00000000000004e5 t unary_operator
-00000000000003d2 T usage
+0000000000000428 T usage
0000000000000d2d T vasnprintf
0000000000000013 T verror
00000000000000ae T verror_at_line
-0000000000000008 D Version
-00000000000000ab T version_etc
-0000000000000018 T version_etc_ar
-000000000000042b T version_etc_arn
-000000000000002f R version_etc_copyright
-000000000000007a T version_etc_va
000000000000001c r wide_null_string.2840
0000000000000078 T x2nrealloc
000000000000000e T x2realloc
The major contributors are the version_etc* functions. What do they do? Well, let’s have a look:
/* The three functions below display the --version information the
standard way. [...]
These are 260 lines of rather elaborate, internationalized, conditional ways of formatting the data that makes up --version output. Together, the symbols present only in [ take about bc <<< "ibase=16; 7A+2F+42B+18+AB+8+99" = 1592 bytes.
What does this mean? Simple. This is what we’re paying an extra 4 KiB for:
$ /usr/bin/[ --version
[ (GNU coreutils) 8.30
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Kevin Braunsdorf and Matthew Bradburn.
[ --version is missing the final ], so the invocation is invalid and the result is therefore implementation-defined. GNU is free to let it output version info.
Meanwhile, /usr/bin/test --version is a valid invocation, and POSIX mandates that it simply return success when the only parameter (--version) is a non-empty string.
This difference is even mentioned in the documentation:
NOTE: [ honors the --help and --version options, but test does not.
test treats each of those as it treats any other nonempty STRING.
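The difference is easy to observe. This is a small sketch: the first three checks use whatever test and [ the shell provides and should behave the same in any POSIX shell, while the last part assumes GNU coreutils binaries at /usr/bin/test and /usr/bin/[ and is skipped if they are missing. The variable names are illustrative.

```shell
# A single non-empty argument makes test succeed, whatever the string says.
if test --version; then test_result=true; else test_result=false; fi
echo "test --version  -> $test_result"

# With the closing bracket present, [ follows the same one-argument rule.
if [ --version ]; then bracket_result=true; else bracket_result=false; fi
echo "[ --version ]   -> $bracket_result"

# The empty string is the one-argument case that fails.
if test ""; then empty_result=true; else empty_result=false; fi
echo "test \"\"         -> $empty_result"

# Assumption: GNU coreutils is installed at these paths.
if [ -x /usr/bin/test ] && [ -x '/usr/bin/[' ]; then
  # Prints nothing and succeeds.
  /usr/bin/test --version
  echo "/usr/bin/test --version exit status: $?"
  # Without the closing bracket, the GNU binary prints its version banner.
  '/usr/bin/[' --version | head -n 1
fi
```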
Mystery solved!
(Exercise: what would have been the implications of having test support --help and --version in spite of POSIX?)
GNU Coreutils is just confusing.
At this point, I don’t even wanna touch the code, I will just let the devs handle it and I won’t suffer myself.