Swearing in the Linux kernel: now interactive

September 22nd, 2016
Graphs showing a rise in "crap" and fall in "fuck" over time.


If you’ve followed discussions on Linux, you may at some point have bumped into a funny graph showing how many times frustrated Linux kernel developers have put four letter words into the source code.

Today, for the first time in 12 years, it’s gotten a major revamp!

You can now interactively plot any words of your choice with commit level granularity.


Did you find any interesting insights? Post a comment!

Basic Linux-related things, Linux

Technically correct: floating point calculations in bc

June 14th, 2015

Whenever someone asks how to do floating point math in a shell script, the answer is typically bc:

$  echo "scale=9; 22/7" | bc

However, this is technically wrong: bc does not support floating point at all! What you see above is arbitrary precision FIXED point arithmetic.

The user’s intention is obviously to do math with fractional numbers, regardless of the low level implementation, so the above is a good and pragmatic answer. However, technically correct is the best kind of correct, so let’s stop being helpful and start pedantically splitting hairs instead!

Fixed vs floating point

There are many important things that every programmer should know about floating point, but in one sentence, the larger they get, the less precise they are.

In fixed point you have a certain number of digits, and a decimal point fixed in place like on a tax form: 001234.56. No matter how small or large the number, you can always write down increments of 0.01, whether it’s 000000.01 or 999999.99.

Floating point, meanwhile, is basically scientific notation. If you have 1.23e-4 (0.000123), you can increment by a millionth to get 1.24e-4. However, if you have 1.23e4 (12300), you can’t add less than 100 unless you reserve more space for more digits.

We can see this effect in practice in any language that supports floating point, such as Haskell:

> truncate (16777216 - 1 :: Float)
> truncate (16777216 + 1 :: Float)

Subtracting 1 gives us the decremented number, but adding 1 had no effect with floating point math! bc, with its arbitrary precision fixed points, would instead correctly give us 16777217! This is clearly unacceptable!

Floating point in bc

The problem with the bc solution is, in other words, that the math is too correct. Floating point math always introduces and accumulates rounding errors in ways that are hard to predict. Fixed point doesn’t, and therefore we need to find a way to artificially introduce the same type of inaccuracies! We can do this by rounding a number to a N significant bits, where N = 24 for float and 52 for double. Here is some bc code for that:


define trunc(x) {
  auto old, tmp
  old=scale; scale=0; tmp=x/1; scale=old
  return tmp
define fp(bits, x) {
  auto i
  if (x < 0) return -fp(bits, -x);
  if (x == 0) return 0;
  while (x < 1) { x*=2; i+=1; }
  while (x >= 2) { x/=2; i-=1; }
  return trunc(x * 2^bits + 0.5) / 2^(i)

define float(x) { return fp(24, x); }
define double(x) { return fp(52, x); }
define test(x) {
  print "Float:  ", float(x), "\n"
  print "Double: ", double(x), "\n"

With this file named fp, we can try it out:

$ bc -ql fp <<< "22/7"

$ bc -ql fp <<< "float(22/7)"

The first number is correct to 30 decimals. Yuck! However, with our floating point simulator applied, we get the desired floating point style errors after ~7 decimals!

Let's write a similar program for doing the same thing but with actual floating point, printing them out up to 30 decimals as well:

{-# LANGUAGE RankNTypes #-}
import Control.Monad
import Data.Number.CReal
import System.Environment

main = do
    input <- liftM head getArgs
    putStrLn . ("Float:  " ++) $ showNumber (read input :: Float)
    putStrLn . ("Double: " ++) $ showNumber (read input :: Double)
    showNumber :: forall a. Real a => a -> String
    showNumber = showCReal 30 . realToFrac

Here's a comparison of the two:

$ bc -ql fp <<< "x=test(1000000001.3)"
Float:  1000000000.000000000000000000000000000000
Double: 1000000001.299999952316284179687500000000

$ ./fptest 1000000001.3
Float:  1000000000.0
Double: 1000000001.2999999523162841796875

Due to differences in rounding and/or off-by-one bugs, they're not always identical like here, but the error bars are similar.

Now we can finally start doing floating point math in bc!

Advanced Linux-related things, Linux, Programming , ,

dd is not a disk writing tool

April 20th, 2015

If you’ve ever used dd, you’ve probably used it to read or write disk images:

# Write myfile.iso to a USB drive
dd if=myfile.iso of=/dev/sdb bs=1M

Usage of dd in this context is so pervasive that it’s being hailed as the magic gatekeeper of raw devices. Want to read from a raw device? Use dd. Want to write to a raw device? Use dd.

This belief can make simple tasks complicated. How do you combine dd with gzip? How do you use pv if the source is raw device? How do you dd over ssh?

The fact of the matter is, dd is not a disk writing tool. Neither “d” is for “disk”, “drive” or “device”. It does not support “low level” reading or writing. It has no special dominion over any kind of device whatsoever.

dd just reads and writes file.

On UNIX, the adage goes, everything is a file. This includes raw disks. Since raw disks are files, and dd can copy files, dd be used to copy raw disks.

But do you know what else can read and write files? Everything:

# Write myfile.iso to a USB drive
cp myfile.iso /dev/sdb

# Rip a cdrom to a .iso file
cat /dev/cdrom > myfile.iso

# Create a gzipped image
gzip -9 < /dev/sdb > /tmp/myimage.gz

However, this does not mean that dd is useless! The reason why people started using it in the first place is that it does exactly what it’s told, no more and no less.

If an alias specifies -a, cp might try to create a new block device rather than a copy of the file data. If using gzip without redirection, it may try to be helpful and skip the file for not being regular. Neither of them will write out a reassuring status during or after a copy.

dd, meanwhile, has one job*: copy data from one place to another. It doesn’t care about files, safeguards or user convenience. It will not try to second guess your intent, based on trailing slashes or types of files. When this is no longer a convenience, like when combining it with other tools that already read and write files, one should not feel guilty for leaving dd out entirely.

This is not to say I think dd is overrated! Au contraire! It’s one of my favorite Unix tools!

dd is the swiss army knife of the open, read, write and seek syscalls. It’s unique in its ability to issue seeks and reads of specific lengths, which enables a whole world of shell scripts that have no business being shell scripts. Want to simulate a lseek+execve? Use dd! Want to open a file with O_SYNC? Use dd! Want to read groups of three byte pixels from a PPM file? Use dd!

It’s a flexible, unique and useful tool, and I love it. My only issue is that, far too often, this great tool is being relegated to and inappropriately hailed for its most generic and least interesting capability: simply copying a file from start to finish.

* dd actually has two jobs: Convert and Copy. Legend has it that the intended name, “cc”, was taken by the C compiler, so the letters were shifted by one to give “dd”. This is also why we ended up with a Window system called X.

Basic Linux-related things, Linux

I’m not paranoid, you’re just foolish

January 24th, 2015

Remember this dialog from when you installed your distro?

Fake dialog saying "In the event of physical theft, grant perpetrators access to" with options for "My browsing history, My email and social media, My photos and documents, and similar". All boxes are checked by default.

Most distros have a step like this. If you don’t immediately recognize it, you might have used a different installer with different wording. For example, the graphical Ubuntu installer calls it “Encrypt the new Ubuntu installation for security”, while the text installer even more opaquely calls it “Use entire disk and set up encrypted LVM”.

Somehow, some people have gotten it into their heads that not granting the new owner access to all your data after they steal your computer is a sign of paranoia. It’s 2015, and there actually exists people who have information based jobs and spend half their lives online, who not only think disk encryption is unnecessary but that it’s a sign you’re doing something illegal.

I have no idea what kind of poorly written crime dramas or irrational prime ministers they get this ridiculous notion from.

An office desk covered in broken glass after a break-in.

The last time my laptop was stolen from a locked office building.

Here’s a photo from 2012, when my company laptop was taken from a locked office with an alarm system.

Was this the inevitable FBI raid I was expecting and encrypted my drive to thwart?

Or was it a junkie stealing an office computer from a company and user who, thanks to encryption, didn’t have to worry about online accounts, design documents, or the source code for their unreleased product?

Hundreds of thousands of computers are lost or stolen every year. I’m not paranoid for using disk encryption, you’re just foolish if you don’t.

Basic Linux-related things, Security ,

Parameterized Color Cell Compression

August 24th, 2014

I came across a quaint and adorable paper from SIGGRAPH’86: Two bit/pixel Full Color Encoding. It describes Color Cell Compression, an early ancestor of Adaptive Scalable Texture Compression which is all the rage these days.

Like ASTC, it offers a fixed 2 bits/pixel encoding for color images. However, the first of many d’awwws in this paper comes as early as the second line of the abstract, when it suggests that a fixed rate is useful not for the random access we covet for rendering today, but simply for doing local image updates!

The algorithm can compress a 640×480 image in just 11 seconds on a 3MHz VAX 11/750, and decompress it basically in real time. This means that it may allow full color video, unlike these impractical, newfangled transform based algorithms people are researching.

CCC actually works astonishingly well. Here’s our politically correct Lenna substitute:


The left half of the image is 24bpp, while the right is is 2bpp. Really the only way to tell is in the eyes, and I’m sure there’s an interesting, evolutionary explanation for that.

If we zoom in, we can get a sense of what’s going on:


The image is divided into 4×4 cells, and each cell is composed of only two different colors. In other words, the image is composed of 4×4 bitmaps with a two color palette, each chosen from an image-wide 8bit palette. A 4×4 bitmap would take up 16 bits, and two 8bit palette indices would take up 16 bits, for a total of 32 bits per 16 pixels — or 2 bits per pixel.

The pixels in each cell are divided into two groups based on luminosity, and each group gets its own color based on the average color in the group. One of the reasons this works really well, the author says, is because video is always filmed so that a change in chromaticity has an associated change in luminosity — otherwise on-screen features would be invisible to the folks at home who still have black&white TVs!

We now know enough to implement this lovely algorithm: find an 8bit palette covering the image, then for each cell of 4×4 pixels, divide the pixels into two groups based on whether their luminosity is over or under the cell average. Find the average color of each part, and find its closest match in the palette.

However, let’s experiment! Why limit ourselves to 4×4 cells with 2 colors each from a palette of 256? What would happen if we used 8×8 cells with 3 colors each from a palette of 512? That also comes out to around 2 bpp.

Parameterizing palette and cell size is easy, but how do we group pixels into k colors based on luminosity? Simple: instead of using the mean, we use k-means!

Here’s a colorful parrot in original truecolor on the left, CCC with 4×4 cells in the middle, and 32×32 cells (1.01 bpp) on the right. Popartsy!


Here’s what we get if we only allow eight colors per horizontal line. The color averaging effect is really pronounced:


And here’s 3 colors per 90×90 cell:

The best part about this paper is the discussion of applications. For 24fps video interlaced at 320×480, they say, you would need a transfer rate of 470 kb/s. Current microcomputers have a transfer rate of 625 kb/s, so this is well within the realm of possibility. Today’s standard 30 megabyte hard drives could therefore store around 60 seconds of animation!

Apart from the obvious benefits of digital video like no copy degradation and multiple resolutions, you can save space when panning a scene by simply transmitting the edge in the direction of the pan!

You can even use CCC for electronic shopping. Since the images are so small and decoding so simple, you can make cheap terminals in great quantities, transmit images from a central location and provide accompanying audio commentary via cable!

In addition to one-to-many applications, you can have one-to-one electronic, image based communication. In just one minute on a 9600bps phone connection, a graphic arts shop can transmit a 640×480 image to clients for approval and comment.

You can even do many-to-many teleconferencing! Imagine the ability to show the speaker’s face, or a drawing they wish to present to the group on simple consumer hardware!

Truly, the future is amazing.

Here’s the JuicyPixel based Haskell implementation I used. It doesn’t actually encode the image, it just simulates the degradation. Ironically, this version is slower than the authors’ original, even though the hardware is five or six orders of magnitude faster!

Apart from the parameterization, I added two other improvements: Firstly, instead of the naive RGB based average suggestion in the paper, it uses the YCrCb average. Second, instead of choosing the palette from the original image, it chooses it from the averages. This doesn’t matter much for colorful photograph, but gives better results for images lacking gradients.

Programming ,

Why Bash Is Like That: Rewrite hacks

June 21st, 2014

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

Let’s say you wanted to enforce a policy in which no files on the system could contain swearing. How would you write a script that checks it? Let’s use the word “damn”, and let’s write a script “checklanguage” that checks whether a file contains that word.

Our first version might be:

#!/usr/bin/env bash
grep -q "damn" "$@" 

The problem with this is that it triggers on itself: ./checklanguage checklanguage returns true. How can we write the script in such a way that it reliably detects the word, but doesn’t detect itself? (Think about it for a second).

There are many ways of doing this: a="da"; b="mn"; grep "$a$b", grep "da""mn", grep da\mn. All of these check for the four characters d-a-m-n in sequence, but doesn’t contain the sequence itself. These methods rely on two things being A. identical in one context (shell script) and B. different in another (plaintext).

This type of trick is the basis of three common command line hacks:

Finding processes from ps, while excluding the grep that does the filtering.

If we do a simple ps ax | grep processname, we might get output like this:

$ ps ax | grep processname
13003 pts/2    S      0:00 /bin/bash ./processname
13496 pts/4    R+     0:00 grep --color=auto processname

How do we get the same list, but without the grep process? You’ll see people wrapping the first character in square brackets:

$ ps ax | grep "[p]rocessname"
13003 pts/2    S      0:00 /bin/bash ./processname

In this case, the regex “[p]rocessname” is identical to the regex “processname”, but since they’re written differently, the latter matches itself while the former doesn’t. This means that the grep won’t match itself, and we only get the process we’re interested in (this job is better done by pgrep).

There is no syntax rule that says “if the first character is enclosed in square brackets, grep shall ignore itself in ps output”.

It’s just a logical side effect of rewriting the regex to work the same but not match itself. We could have used grep -E 'process()name' or grep -E 'proces{2}name' instead.

Running commands instead of aliases

Maybe you’re sick of Debian’s weird perl rename, and you aliased it to rename.ul instead.

$ rename -v .htm .html *
`foo.htm' -> `foo.html'

Yay, that’s way easier than writing regex! But what if we need to use the unaliased rename?

$ rename -v 's/([1-9])x([0-9]*)/S$1E$2/' *
rename.ul: not enough arguments

Instead, you’ll see people prefixing the command with a backslash:

$ \rename -v 's/([1-9])x([0-9]*)/S0$1E$2/' *
Foo_1x20.mkv renamed as Foo_S01E20.mkv

Shell aliases trigger when a command starts with a word. However, if the command starts with something that expands into a word, alias expansion does not apply. This allows us to use e.g. \ls or \git to run the command instead of the alias.

There is no syntax rule that says that “if a command is preceded by a backslash, alias expansion is ignored”.

It’s just a logical side effect of rewriting the command to work the same, but not start with a literal token that the shell will recognize as an alias. We could also have used l\s or 'ls'.

Deleting files starting with a dash

How would you go about deleting a file that starts with a dash?

$ rm -v -file
rm: invalid option -- 'l'

Instead, you’ll see people prefixing the filename with ./:

$ rm -v ./-file
removed `./-file'

A command will interpret anything that starts with a dash as a flag. However, to the file system, -file and ./-file mean exactly the same thing.

There is no syntax rule that says that “if an argument starts with ./, it shall be interpretted as a filename and not an option”.

It’s just a logical side effect of rewriting a filename to refer to the same file, but start with a different character. We could have used rm /home/me/-file or rm ../me/-file instead.

Homework: What do you tell someone who thinks that ./myscript is a perfect example of how weird UNIX is? Why would anyone design a system where the run command is “./” instead of “run”?

Basic Linux-related things, Linux , ,

Basics of a Bash action game

February 24th, 2014

If you want to write an action game in bash, you need the ability to check for user input without actually waiting for it. While bash doesn’t let you poll the keyboard in a great way, it does let you wait for input for a miniscule amount of time with read -t 0.0001.

Here’s a snippet that demonstrates this by bouncing some text back and forth, and letting the user control position and color. It also sets (and unsets) the necessary terminal settings for this to look good:

#!/usr/bin/env bash

# Reset terminal on exit
trap 'tput cnorm; tput sgr0; clear' EXIT

# invisible cursor, no echo
tput civis
stty -echo

text="j/k to move, space to color"
max_x=$(($(tput cols) - ${#text}))
dir=1 x=1 y=$(($(tput lines)/2))

while sleep 0.05 # GNU specific!
    # move and change direction when hitting walls
    (( x == 0 || x == max_x )) && \
        ((dir *= -1))
    (( x += dir ))

    # read all the characters that have been buffered up
    while IFS= read -rs -t 0.0001 -n 1 key
        [[ $key == j ]] && (( y++ ))
        [[ $key == k ]] && (( y-- ))
        [[ $key == " " ]] && color=$((color%7+1))

    # batch up all terminal output for smoother action
        tput cup "$y" "$x"
        tput setaf "$color"
        printf "%s" "$text"

    # dump to screen
    printf "%s" "$framebuffer"

Advanced Linux-related things, Programming , , ,

Adventures in String Reversal

September 30th, 2013

Oh, string reversal! The bread and butter of Programming 101 exams. If I ask you to prove your hacker worth by implementing it in your favorite language, how long would it take you and how many tries will you need to get it right?

Five minutes with one or two tries? 30 seconds and nail it on the first try?

What if I say that this is 2013 and your software can’t just fail because a user inputs non-ASCII data?

Well… Java, C#, Python, Haskell and all other modern languages have native Unicode string types, so at most you’ll just need another minute to verify that it does indeed work, right?

No, you will in fact need several hours and hundreds of lines of code. Reversing a string is much harder than one would think.

The following are cases that a string reversal algorithm could reasonably be expected to handle, but which your initial, naive implementation most likely fails:

Byte order marks

Wikipedia says that “The byte order mark (BOM) is a Unicode character used to signal the endianness (byte order) of a text file or stream. It is encoded at U+FEFF byte order mark (BOM). BOM use is optional, and, if used, should appear at the start of the text stream.”

It’s obviously a bug if the BOM ends up at the end of the string when it’s reversed. At least that’s a simple fix, right?

Surrogate pairs

Environment based around 16-bit character types, like Java and C#’s char and some C/++ compilers’ wchar_t, had an awkward time when Unicode 2.0 came along, which expanded the number of characters from 65536 to 1114112. Characters in so-called supplementary planes will not fit in a 16-bit char, and will be encoded as a surrogate pair – two chars next to each other.

If two chars form a single code point (see e.g. Java’s String.codePointAt(int)), reversing them produces an invalid character.

Trashing characters in the string is not a property of correct string reversers. Please fix.

Composing characters

While there is a separate character for “ñ”, n with tilde, it can also be written as two characters: regular “n” (U+006E) plus composing tilde (U+0303), which I’ll write as a regular tilde for illustration.

In this way, you can encode “pin~a colada”, and it will render as “piña colada”. If the string is trivially reversed, it becomes “adaloc a~nip” which will render as “adaloc ãnip”. The tilde is now on the wrong character.

Please don’t shuffle diacritical marks in the input string. Just reverse it.

By the way, if you try to fix this by ensuring that composing characters stay behind their preceding character, you’ll introduce a regression. Double composing characters go between the characters they compose.

To put a ‘double macron below’ under the characters “ea” in “mean”, you’d encode “me_an” which renders as “mean”. If you try to reverse it while keeping the macron after the “e”, you end up with “nae_m” (“naem“) rather than the original, correct “na_em” (“naem”).

Directional overrides

What’s “hello world” backwards? It’s “hello world” if your implementation is to be believed.

It happens to be encoded with left-to-right and right-to-left overrides as “U+202D U+202E dlrow olleh”.

In this direction, everything from the second character onward is shown right-to-left as “hello world”. With trivial reversion, it becomes “hello world” followed by a RLO immediately cancelled by a LRO.

Your string reverser doesn’t actually reverse strings. Would you kindly sort that out?

Obviously, it also has to handle explicit directional embedding, U+202A and U+202B, which are similar but not identical to directional overrides.

RTL scripts

Reversal issues occur naturally in bidirectional text. A mix such as “hello עולם” will render “hello” LTR and “עולם” RTL (the “ם” is encoded last, but displays leftmost in that word). When the latin script is first, the string starts from the left margin, with the first encoded character to the left.

If we trivially reverse this string, we get “olleh םלוע” as it starts rendering from the right margin. The first encoded character appears rightmost in the right word, while the last encoded displays rightmost of the leftmost word, i.e. in the middle.

Obviously, “hello עולם” backwards is “םלוע olleh”. Please add this to your list.

Left-to-right and right-to-left markers

Similarly to the two above cases, the LRM (U+200E) and RLM (U+200F) codes allows changing the direction neutral characters (such as punctuation) are rendered.

“(RLM)O:” will be rendered as “:O” in the right margin. With trivial string reversal, it will still render as “:O”, starting at the left margin.

Didn’t we already file two bugs about this?

Pop directional formatting

Once your kludged and fragile directional string reversal support appears to work reasonably ok, along comes the U+202C Pop Directional Format character. It never ends!

This character undoes the previous explicit directional override, whatever it happened to be. You can no longer try to be clever by splitting the string up into linear sections based on directional markers; you have to go full stack parsing.

Here’s the ten thousand word specification of the Unicode directionality algorithm. Have fun.

Interlinear annotations

Even if you give up and add a TODO to handle directionality, you still have some cases to go. In logographic languages like Chinese and Japanese, you can have pronunciation guides, so called ruby characters, alongside the text.

If your browser supports it, here’s an example: kanji.

To support this in plain text, Unicode has U+FFF9 through U+FFFB, the Interlinear Annotation Anchor, Separator and Terminator characters respectively. The above could be encoded as “U+FFFF9 漢字 U+FFFA kan U+FFFA ji U+FFFB”.

Reversing the anchor and terminating characters is obviously a bug.

Your string reverser produces garbled output instead of a reversed string… Is it going to be much longer?

Note that reversing just the contents is still wrong. Instead of correctly annotating “字漢” with “ij nak”, you’d be annotating “ij” with “nak 字漢”.

Once you’ve correctly handled this case, try it again when you have an excess of separators at the end of the ruby text. Normally, these would just be ignored, but if you reversed them and put them in front, they’ll push all ruby characters away from where they were supposed to be.

For “U+FFFF9 漢字 U+FFFA kan U+FFFA ji U+FFFA U+FFFB”, instead of ijnak you’d get 字漢ijnak.

(Update: Commenter Jim convincingly argues that you’d want to reverse the ruby logograph groups but not the characters themselves, resulting in jikan )

Like with the composing characters, your string reversal shuffles ruby characters around. Please… oh, why bother.


Your implementation most likely had half a dozen bugs. Maybe string reversal is beyond your abilities? Join the club!

Hopefully you had fun anyways.


Paste shell script, get feedback: ShellCheck project update

June 30th, 2013

tl;dr: ShellCheck is a bash/sh static analysis and linting tool. Paste a shell command or script on ShellCheck.net and get feedback about many common issues, both in scripts that currently fail and scripts that appear to work just fine.

There’s been a lot of progress since I first posted about it seven months ago. It has a new home on ShellCheck.net with a simplified and improved interface, and the parser has been significantly bugfixed so that parsing errors for correct code are now fairly rare.

However, the best thing is that it can detect a heaping bunch of new problems! This post mentions merely a subset of them.


Quiz: ShellCheck is aware of many common usage problems. Are you?

  • find . -name *.mp3
  • sudo echo 3 > /proc/sys/vm/drop_caches
  • PS1='\e[0;32m\$\e[0m '
  • find . | grep "*.mp3"
  • [ $n > 7 ]
  • [[ $n > 7 ]]
  • tr 'A-Z' 'a-z'
  • cmd 2>&1 > log
  • array=(1, 2, 3)
  • echo $10
  • [[ $a=$b ]]
  • [[ $a = $b ]]
  • progress=$((i/total*100))
  • trap "echo \"Time used: $SECONDS\"" EXIT
  • find dir -exec cp {} /backup && rm {} \;
  • [[ $keep = [yY] ]] && mv file /backup || rm file

ShellCheck gives more helpful messages for many Bash syntax errors

Bash says ShellCheck points to the exact position and says
: command not found Literal carriage return. Run script through tr -d ‘\r’
unexpected token: `fi’ Can’t have empty then clauses (use ‘true’ as a no-op)
unexpected token `(‘ Shells are space sensitive. Use ‘< <(cmd)', not '<<(cmd)'
unexpected token `(‘ ‘(‘ is invalid here. Did you forget to escape it?
echo foo: command not found This is a &nbsp;. Delete it and retype as space.

ShellCheck suggests style improvements

Code ShellCheck suggestion
basename "$var" Use parameter expansion instead, such as ${var##*/}
ls | grep 'mp3$' Don’t use ls | grep. Use a glob or a for loop with a condition.
expr 3 + 2 Use $((..)), ${} or [[ ]] in place of antiquated expr.
cat foo | grep bar Useless cat. Consider ‘cmd < file | ..' or 'cmd file | ..' instead.
length=$(echo "$var" | wc -c") See if you can use ${#variable} instead

ShellCheck recognizes common but wrong attempts at doing things

Code ShellCheck tip
var$n=42 For indirection, use (associative) arrays or ‘read “var$n” <<< "value"'".
(Bash says “var3=42: command not found”)
${var$n} To expand via indirection, use name=”foo$n”; echo “${!name}”
(Bash says “bad substitution”. )
echo 'It\'s time' Are you trying to escape that single quote? echo ‘You’\”re doing it wrong’
(Bash says “unexpected end of file”)
[ grep a b ] Use ‘if cmd; then ..’ to check exit code, or ‘if [[ $(cmd) == .. ]]’ to check output
(Bash says “[: a: binary operator expected”)
var=grep a b To assign the output of a command, use var=$(cmd)
(Bash says “a: command not found”)

ShellCheck can help with POSIX sh compliance and bashisms

When a script is declared with #!/bin/sh, ShellCheck checks for POSIX sh compliance, much like checkbashisms.

ShellCheck is free software, and can be used online and locally

ShellCheck is of course Free Software, and has a cute cli frontend in addition to the primary online version.

ShellCheck wants your feedback and suggestions!
Does ShellCheck give you incorrect suggestions? Does it fail to parse your working code? Is there something it could have warned about, but didn’t? After pasting a script on ShellCheck.net, a tiny “submit feedback” link appears in the top right of the annotated script area. Click it to submit the code plus your comments, and I can take a look!

Basic Linux-related things, Linux , ,

Making bash run DOS/Windows CRLF EOL scripts

January 30th, 2013

If you for any reason use a Windows editor to write scripts, it can be annoying to remember to convert them and bash fails in mysterious ways when you don’t. Let’s just get rid of that problem once and for all:

cat > $'/bin/bash\r' << "EOF" #!/usr/bin/env bash script=$1 shift exec bash <(tr -d '\r' < "$script") "$@" EOF

This allows you to execute scripts with DOS/Windows \r\n line endings with ./yourscript (but it will fail if the script specifies parameters on the shebang, or if you run it with bash yourscript). It works because from a UNIX point of view, DOS/Windows files specify the interpretter as "bash^M", and we override that to clean the script and run bash on the result.

Of course, you can also replace the helpful exec bash part with echo "Run dos2unix on your file!" >&2 if you'd rather give your users a helpful reminder rather than compatibility or a crazy error.

Basic Linux-related things, Linux , ,