Archive for the ‘Basic Linux-related things’ Category

Swearing in the Linux kernel: now interactive

September 22nd, 2016
Graphs showing a rise in "crap" and fall in "fuck" over time.


If you’ve followed discussions on Linux, you may at some point have bumped into a funny graph showing how many times frustrated Linux kernel developers have put four letter words into the source code.

Today, for the first time in 12 years, it’s gotten a major revamp!

You can now interactively plot any words of your choice with commit level granularity.


Did you find any interesting insights? Post a comment!

Basic Linux-related things, Linux

dd is not a disk writing tool

April 20th, 2015

If you’ve ever used dd, you’ve probably used it to read or write disk images:

# Write myfile.iso to a USB drive
dd if=myfile.iso of=/dev/sdb bs=1M

Usage of dd in this context is so pervasive that it’s being hailed as the magic gatekeeper of raw devices. Want to read from a raw device? Use dd. Want to write to a raw device? Use dd.

This belief can make simple tasks complicated. How do you combine dd with gzip? How do you use pv if the source is raw device? How do you dd over ssh?

The fact of the matter is, dd is not a disk writing tool. Neither “d” is for “disk”, “drive” or “device”. It does not support “low level” reading or writing. It has no special dominion over any kind of device whatsoever.

dd just reads and writes file.

On UNIX, the adage goes, everything is a file. This includes raw disks. Since raw disks are files, and dd can copy files, dd be used to copy raw disks.

But do you know what else can read and write files? Everything:

# Write myfile.iso to a USB drive
cp myfile.iso /dev/sdb

# Rip a cdrom to a .iso file
cat /dev/cdrom > myfile.iso

# Create a gzipped image
gzip -9 < /dev/sdb > /tmp/myimage.gz

However, this does not mean that dd is useless! The reason why people started using it in the first place is that it does exactly what it’s told, no more and no less.

If an alias specifies -a, cp might try to create a new block device rather than a copy of the file data. If using gzip without redirection, it may try to be helpful and skip the file for not being regular. Neither of them will write out a reassuring status during or after a copy.

dd, meanwhile, has one job*: copy data from one place to another. It doesn’t care about files, safeguards or user convenience. It will not try to second guess your intent, based on trailing slashes or types of files. When this is no longer a convenience, like when combining it with other tools that already read and write files, one should not feel guilty for leaving dd out entirely.

This is not to say I think dd is overrated! Au contraire! It’s one of my favorite Unix tools!

dd is the swiss army knife of the open, read, write and seek syscalls. It’s unique in its ability to issue seeks and reads of specific lengths, which enables a whole world of shell scripts that have no business being shell scripts. Want to simulate a lseek+execve? Use dd! Want to open a file with O_SYNC? Use dd! Want to read groups of three byte pixels from a PPM file? Use dd!

It’s a flexible, unique and useful tool, and I love it. My only issue is that, far too often, this great tool is being relegated to and inappropriately hailed for its most generic and least interesting capability: simply copying a file from start to finish.

* dd actually has two jobs: Convert and Copy. Legend has it that the intended name, “cc”, was taken by the C compiler, so the letters were shifted by one to give “dd”. This is also why we ended up with a Window system called X.

Basic Linux-related things, Linux

I’m not paranoid, you’re just foolish

January 24th, 2015

Remember this dialog from when you installed your distro?

Fake dialog saying "In the event of physical theft, grant perpetrators access to" with options for "My browsing history, My email and social media, My photos and documents, and similar". All boxes are checked by default.

Most distros have a step like this. If you don’t immediately recognize it, you might have used a different installer with different wording. For example, the graphical Ubuntu installer calls it “Encrypt the new Ubuntu installation for security”, while the text installer even more opaquely calls it “Use entire disk and set up encrypted LVM”.

Somehow, some people have gotten it into their heads that not granting the new owner access to all your data after they steal your computer is a sign of paranoia. It’s 2015, and there actually exists people who have information based jobs and spend half their lives online, who not only think disk encryption is unnecessary but that it’s a sign you’re doing something illegal.

I have no idea what kind of poorly written crime dramas or irrational prime ministers they get this ridiculous notion from.

An office desk covered in broken glass after a break-in.

The last time my laptop was stolen from a locked office building.

Here’s a photo from 2012, when my company laptop was taken from a locked office with an alarm system.

Was this the inevitable FBI raid I was expecting and encrypted my drive to thwart?

Or was it a junkie stealing an office computer from a company and user who, thanks to encryption, didn’t have to worry about online accounts, design documents, or the source code for their unreleased product?

Hundreds of thousands of computers are lost or stolen every year. I’m not paranoid for using disk encryption, you’re just foolish if you don’t.

Basic Linux-related things, Security ,

Why Bash Is Like That: Rewrite hacks

June 21st, 2014

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

Let’s say you wanted to enforce a policy in which no files on the system could contain swearing. How would you write a script that checks it? Let’s use the word “damn”, and let’s write a script “checklanguage” that checks whether a file contains that word.

Our first version might be:

#!/usr/bin/env bash
grep -q "damn" "$@" 

The problem with this is that it triggers on itself: ./checklanguage checklanguage returns true. How can we write the script in such a way that it reliably detects the word, but doesn’t detect itself? (Think about it for a second).

There are many ways of doing this: a="da"; b="mn"; grep "$a$b", grep "da""mn", grep da\mn. All of these check for the four characters d-a-m-n in sequence, but doesn’t contain the sequence itself. These methods rely on two things being A. identical in one context (shell script) and B. different in another (plaintext).

This type of trick is the basis of three common command line hacks:

Finding processes from ps, while excluding the grep that does the filtering.

If we do a simple ps ax | grep processname, we might get output like this:

$ ps ax | grep processname
13003 pts/2    S      0:00 /bin/bash ./processname
13496 pts/4    R+     0:00 grep --color=auto processname

How do we get the same list, but without the grep process? You’ll see people wrapping the first character in square brackets:

$ ps ax | grep "[p]rocessname"
13003 pts/2    S      0:00 /bin/bash ./processname

In this case, the regex “[p]rocessname” is identical to the regex “processname”, but since they’re written differently, the latter matches itself while the former doesn’t. This means that the grep won’t match itself, and we only get the process we’re interested in (this job is better done by pgrep).

There is no syntax rule that says “if the first character is enclosed in square brackets, grep shall ignore itself in ps output”.

It’s just a logical side effect of rewriting the regex to work the same but not match itself. We could have used grep -E 'process()name' or grep -E 'proces{2}name' instead.

Running commands instead of aliases

Maybe you’re sick of Debian’s weird perl rename, and you aliased it to rename.ul instead.

$ rename -v .htm .html *
`foo.htm' -> `foo.html'

Yay, that’s way easier than writing regex! But what if we need to use the unaliased rename?

$ rename -v 's/([1-9])x([0-9]*)/S$1E$2/' *
rename.ul: not enough arguments

Instead, you’ll see people prefixing the command with a backslash:

$ \rename -v 's/([1-9])x([0-9]*)/S0$1E$2/' *
Foo_1x20.mkv renamed as Foo_S01E20.mkv

Shell aliases trigger when a command starts with a word. However, if the command starts with something that expands into a word, alias expansion does not apply. This allows us to use e.g. \ls or \git to run the command instead of the alias.

There is no syntax rule that says that “if a command is preceded by a backslash, alias expansion is ignored”.

It’s just a logical side effect of rewriting the command to work the same, but not start with a literal token that the shell will recognize as an alias. We could also have used l\s or 'ls'.

Deleting files starting with a dash

How would you go about deleting a file that starts with a dash?

$ rm -v -file
rm: invalid option -- 'l'

Instead, you’ll see people prefixing the filename with ./:

$ rm -v ./-file
removed `./-file'

A command will interpret anything that starts with a dash as a flag. However, to the file system, -file and ./-file mean exactly the same thing.

There is no syntax rule that says that “if an argument starts with ./, it shall be interpretted as a filename and not an option”.

It’s just a logical side effect of rewriting a filename to refer to the same file, but start with a different character. We could have used rm /home/me/-file or rm ../me/-file instead.

Homework: What do you tell someone who thinks that ./myscript is a perfect example of how weird UNIX is? Why would anyone design a system where the run command is “./” instead of “run”?

Basic Linux-related things, Linux , ,

Paste shell script, get feedback: ShellCheck project update

June 30th, 2013

tl;dr: ShellCheck is a bash/sh static analysis and linting tool. Paste a shell command or script on and get feedback about many common issues, both in scripts that currently fail and scripts that appear to work just fine.

There’s been a lot of progress since I first posted about it seven months ago. It has a new home on with a simplified and improved interface, and the parser has been significantly bugfixed so that parsing errors for correct code are now fairly rare.

However, the best thing is that it can detect a heaping bunch of new problems! This post mentions merely a subset of them.


Quiz: ShellCheck is aware of many common usage problems. Are you?

  • find . -name *.mp3
  • sudo echo 3 > /proc/sys/vm/drop_caches
  • PS1='\e[0;32m\$\e[0m '
  • find . | grep "*.mp3"
  • [ $n > 7 ]
  • [[ $n > 7 ]]
  • tr 'A-Z' 'a-z'
  • cmd 2>&1 > log
  • array=(1, 2, 3)
  • echo $10
  • [[ $a=$b ]]
  • [[ $a = $b ]]
  • progress=$((i/total*100))
  • trap "echo \"Time used: $SECONDS\"" EXIT
  • find dir -exec cp {} /backup && rm {} \;
  • [[ $keep = [yY] ]] && mv file /backup || rm file

ShellCheck gives more helpful messages for many Bash syntax errors

Bash says ShellCheck points to the exact position and says
: command not found Literal carriage return. Run script through tr -d ‘\r’
unexpected token: `fi’ Can’t have empty then clauses (use ‘true’ as a no-op)
unexpected token `(‘ Shells are space sensitive. Use ‘< <(cmd)', not '<<(cmd)'
unexpected token `(‘ ‘(‘ is invalid here. Did you forget to escape it?
echo foo: command not found This is a &nbsp;. Delete it and retype as space.

ShellCheck suggests style improvements

Code ShellCheck suggestion
basename "$var" Use parameter expansion instead, such as ${var##*/}
ls | grep 'mp3$' Don’t use ls | grep. Use a glob or a for loop with a condition.
expr 3 + 2 Use $((..)), ${} or [[ ]] in place of antiquated expr.
cat foo | grep bar Useless cat. Consider ‘cmd < file | ..' or 'cmd file | ..' instead.
length=$(echo "$var" | wc -c") See if you can use ${#variable} instead

ShellCheck recognizes common but wrong attempts at doing things

Code ShellCheck tip
var$n=42 For indirection, use (associative) arrays or ‘read “var$n” <<< "value"'".
(Bash says “var3=42: command not found”)
${var$n} To expand via indirection, use name=”foo$n”; echo “${!name}”
(Bash says “bad substitution”. )
echo 'It\'s time' Are you trying to escape that single quote? echo ‘You’\”re doing it wrong’
(Bash says “unexpected end of file”)
[ grep a b ] Use ‘if cmd; then ..’ to check exit code, or ‘if [[ $(cmd) == .. ]]’ to check output
(Bash says “[: a: binary operator expected”)
var=grep a b To assign the output of a command, use var=$(cmd)
(Bash says “a: command not found”)

ShellCheck can help with POSIX sh compliance and bashisms

When a script is declared with #!/bin/sh, ShellCheck checks for POSIX sh compliance, much like checkbashisms.

ShellCheck is free software, and can be used online and locally

ShellCheck is of course Free Software, and has a cute cli frontend in addition to the primary online version.

ShellCheck wants your feedback and suggestions!
Does ShellCheck give you incorrect suggestions? Does it fail to parse your working code? Is there something it could have warned about, but didn’t? After pasting a script on, a tiny “submit feedback” link appears in the top right of the annotated script area. Click it to submit the code plus your comments, and I can take a look!

Basic Linux-related things, Linux , ,

Making bash run DOS/Windows CRLF EOL scripts

January 30th, 2013

If you for any reason use a Windows editor to write scripts, it can be annoying to remember to convert them and bash fails in mysterious ways when you don’t. Let’s just get rid of that problem once and for all:

cat > $'/bin/bash\r' << "EOF" #!/usr/bin/env bash script=$1 shift exec bash <(tr -d '\r' < "$script") "$@" EOF

This allows you to execute scripts with DOS/Windows \r\n line endings with ./yourscript (but it will fail if the script specifies parameters on the shebang, or if you run it with bash yourscript). It works because from a UNIX point of view, DOS/Windows files specify the interpretter as "bash^M", and we override that to clean the script and run bash on the result.

Of course, you can also replace the helpful exec bash part with echo "Run dos2unix on your file!" >&2 if you'd rather give your users a helpful reminder rather than compatibility or a crazy error.

Basic Linux-related things, Linux , ,

ShellCheck: shell script analysis

December 8th, 2012

Shell scripting is notoriously full of pitfalls, unintuitive behavior and poor error messages. Here are some things you might have experienced:

  • find -exec fails on commands that are perfectly valid
  • 0==1 is apparently true
  • Comparisons are always false, and write files while failing
  • Variable values are available inside loops, but reset afterwards
  • Looping over filenames with spaces fails, and quoting doesn’t help


ShellCheck is my latest project. It will check shell scripts for all of the above, and also tries to give helpful tips and suggestions for otherwise working ones. You can paste your script and have it checked it online, or you can downloaded it and run it locally.

Other things it checks for includes reading from and redirecting to a file in the same pipeline, useless uses of cat, apparent variable use that won’t expand, too much or too little quoting in [[ ]], not quoting globs passed to find, and instead of just saying “syntax error near unexpected token `fi'”, it points to the relevant if statement and suggests that you might be missing a ‘then’.

It’s still in the early stages, but has now reached the point where it can be useful. The online version has a feedback button (in the top right of your annotated script), so feel free to try it out and submit suggestions!

Basic Linux-related things, Linux, Programming , ,

Approaches to data recovery

September 8th, 2012

There are a lot of howtos and tutorials for using data recovery tools in Linux, but far less on how to choose a recovery tool or approach in the first place. Here’s an overview with suggestions for which route to go or tool to use:

Cause Outlook Tools
Forgotten login password Fantastic Any livecd
This barely qualifies as data recovery, but is included for completeness. If you forget the login password, you can just boot a livecd and mount the drive to access the files. You can also chroot into it and reset the password. Google “linux forgot password”.
Accidentally deleting files in use Excellent lsof, cp
When accidentally deleting a file that is still in use by some process — like an active log file or the source of a video you’re encoding — make sure the process doesn’t exit (sigstop if necessary) and copy the file from the /proc file handle. Google “lsof recover deleted files”
Accidentally deleting other files Fair for harddisks, bad for SSDs testdisk, ext3grep, extundelete
When deleting a file that’s not currently being held open, stop as much disk activity as you can to prevent the data from being overwritten. If you’re using an SSD, the data was probably irrevocably cleared within seconds, so bad luck there. Proceed with an fs specific undeletion tool: Testdisk can undelete NTFS, VFAT and ext2, extundelete/ext3grep can help with ext3 and ext4. Google “YourFS undeletion”. If you can’t find an undeletion tool for your file systems, or if it fails, try PhotoRec.
Trashing the MBR or deleting partitions Excellent gpart (note: not gparted), testdisk
If you delete a partition with fdisk or recover the MBR from a backup while forgetting that it also contains a partition table, gpart or testdisk will usually easily recover them. If you overwrite any more than the first couple of kilobytes though, it’s a different ballgame. Just don’t confuse gpart (guess partitions) with gparted (gtk/graphical partition editor). Google “recover partition table”.
Reformatting a file system Depends on fs e2fsck, photorec, testdisk
If you format the wrong partition, recovery depends on the old and new file system. Try finding unformat/recovery tools for your old fs. Accidentally formatting a ext3 fs to ntfs (like Windows helpfully suggests when it detects a Linux fs) can often be almost completely reverted by running fsck with an alternate superblock. Google “ext3 alternate superblock recovery” or somesuch.

Reformatting ext2/3/4 with ext2/3/4 will tend to overwrite the superblocks, making this harder. Consider PhotoRec.

Repartition and reinstall Depends on progress
If you ran a distro installer and accidentally repartitioned and reformatted a disk, try treating it as a case of deleted partitions plus reformatted partitions as described above. Chances of recovery are smaller the more files the installer copied to the partitions. If all else fails, PhotoRec.
Bad sectors and drive errors Ok, depending on extent ddrescue
If the drive has errors, use ddrescue to get as much of the data as possible onto another drive, then treat it as a corrupted file system. Try the fs’ fsck tool, or if the drive is highly corrupted, PhotoRec.
Lost encryption key Very bad bash, cryptsetup
I don’t know of any tools made for attempting to crack a LUKS password, though you can generate permutations and script a simple cracker if you have limited number of permutations (“it was Swordfish with some l33t, and a few numbers at the end”). If you have no idea, or if your encryption software uses TPM (rare for Linux), you’re screwed.
Reformatted or partially overwritten LUKS partition Horrible
LUKS uses your passphrase to encrypt a master key, and stores this info at the start of the partition. If this gets overwritten, you’re screwed even if you know the passphrase.
Other kinds of corruptions or unknown FS Indeterminable PhotoRec, strings, grep
PhotoRec searches by file signature, and can therefore recover files from a boatload of FS and scenarios, though you’ll often lose filenames and hierarchies. If you have important ASCII data, strings can dump ASCII text regardless of FS, and you can grep that as a last resort.

If you have other suggestions for scenarios, tools or approaches, leave a commment. Otherwise, I’ll wish you a speedy recovery!

Basic Linux-related things, Linux , ,

Why Bash is like that: Subshells

August 22nd, 2012

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

# I run this script, but afterwards my PATH and current dir hasn't changed!

export PATH=$PATH:/opt/local/bin
cd /opt/games/

or more interestingly

# Why does this always say 0? 
cat file | while read line; do (( n++ )); done
echo $n

In the first case, you can add a echo "Path is now $PATH", and see the expected path. In the latter case, you can put a echo $n in the loop, and it will count up as you’d expect, but at the end you’ll still be left with 0.

To make things even more interesting, here are the effects of running these two examples (or equivalents) in different shells:

set in script set in pipeline
Bash No effect No effect
Ksh/Zsh No effect Works
cmd.exe Works No effect

What we’re experiencing are subshells, and different shells have different policies on what runs in subshells.

Environment variables, as well as the current directory, is only inherited parent-to-child. Changes to a child’s environment are not reflect in the parent. Any time a shell forks, changes done in the forked process are confined to that process and its children.

In Unix, all normal shells will fork to execute other shell scripts, so setting PATH or cd’ing in a script will never have an effect after the command is done (instead, use "source file" aka ". file" to read and execute the commands without forking).

However, shells can differ in when subshells are invoked. In Bash, all elements in a pipeline will run in a subshell. In Ksh and Zsh, all except the last will run in a subshell. POSIX leaves it undefined.

This means that echo "2 + 3" | bc | read sum will work in Ksh and Zsh, but fail to set the variable sum in Bash.

To work around this in Bash, you can usually use redirection and process substition instead:

read sum < <(echo "2 + 3" | bc)

So, where do we find subshells? Here are a list of commands that in some way fails to set foo=bar for subsequent commands (note that all the examples set it in some subshell, and can use it until the subshell ends):

# Executing other programs or scripts
foo=bar ./something

# Anywhere in a pipeline in Bash
true | foo=bar | true

# In any command that executes new shells
awk '{ system("foo=bar") }'h
find . -exec bash -c 'foo=bar' \;

# In backgrounded commands and coprocs:
foo=bar &
coproc foo=bar

# In command expansion
true "$(foo=bar)"

# In process substitution
true < <(foo=bar)

# In commands explicitly subshelled with ()
( foo=bar )

and probably some more that I'm forgetting.

Trying to set a variable, option or working dir in any of these contexts will result in the changes not being visible for following commands.

Knowing this, we can use it to our advantage:

# cd to each dir and run make
for dir in */; do ( cd "$dir" && make ); done

# Compare to the more fragile
for dir in */; do cd "$dir"; make; cd ..; done

# mess with important variables
fields=(a b c); ( IFS=':'; echo ${fields[*]})

# Compare to the cumbersome
fields=(a b c); oldIFS=$IFS; IFS=':'; echo ${fields[*]}; IFS=$oldIFS; 

# Limit scope of options
( set -e; foo; bar; baz; ) 

Basic Linux-related things, Linux , ,

Why Bash is like that: Builtin or not

July 19th, 2011

Bash can seem pretty random and weird at times, but most of what people see as quirks have very logical (if not very good) explanations behind them. This series of posts looks at some of them.

# Why don't the options in "man time" work?
time -f %w myapp

Short answer: ‘time’ runs the builtin version, ‘man time’ shows the external version

time is a builtin in the shell, as well as an external command (this also goes for kill, pwd, and test). The man time shows info about the external command, while help time shows the internal one.

To run the external version, one can use command time or /usr/bin/time or just \time.

The reason why time is built in is so timing pipelines will work properly. time true | sleep 10 would say 0 seconds with an external command (which can’t know what it’s being piped into), and while the internal version can say 10 seconds since it knows about the whole pipeline.

POSIX leaves the behaviour of time a | b undefined.

# This finds the full path to ls. Why isn't there a 'man type'?
type -P ls

type is a bash builtin, not an external command. This allows it to take shell functions and aliases into account, something whereis can’t.

Builtins are documented in man bash, or more conveniently, “help type” (help is also a builtin).

Basic Linux-related things , ,