dd works for reading and writing disks, but it has no advantages and some disadvantages over just treating the disk as a regular file. In many commands it’s entirely pointless.
If you’ve ever used dd, you’ve probably used it to read or write disk images:
# Write myfile.iso to a USB drive
dd if=myfile.iso of=/dev/sdb bs=1M
Using dd in this context is so pervasive that it’s being hailed as the magic gatekeeper of raw devices. Want to read from a raw device? Use dd. Want to write to a raw device? Use dd.
This belief adds complexity to simple commands. How do you combine dd with gzip? How do you use pv if the source is a raw device? How do you
People cleverly find ways to insert dd at the front and end of pipelines. dd if=/dev/sda | gzip > image.gz, they say. dd if=/dev/sda | pv | dd of=/dev/sdb.
In both these cases, dd serves no purpose. It’s purely a superstitious charm trying to ensure safe passage of the data. It’s like cat /dev/sda | pv | cat > /dev/sdb except not as efficient.
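If you want to convince yourself, here is a minimal sketch. It assumes a temporary regular file standing in for /dev/sda (so it is safe to run without root) and compares what comes out of the pipeline with and without dd in front. The variable names are made up for illustration.

```shell
# A regular file stands in for the raw device so this is safe to run.
src=$(mktemp)
head -c 1048576 /dev/urandom > "$src"

# The "charm" version: dd at the front of the pipeline.
with_dd=$(dd if="$src" 2>/dev/null | gzip -c | gunzip -c | cksum)

# The plain version: gzip reads the file itself through a redirection.
without_dd=$(gzip -c < "$src" | gunzip -c | cksum)

echo "with dd:    $with_dd"
echo "without dd: $without_dd"
rm -f "$src"
```

Both checksums should match, since dd only passes the bytes along unchanged.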
The fact of the matter is, dd is not a disk writing tool. Neither “d” is for “disk”, “drive” or “device”. It does not support “low level” reading or writing. It has no special dominion over any kind of device whatsoever. dd just reads and writes files.
On UNIX, the adage goes, everything is a file. This includes raw disks. Since raw disks are files, and dd can be used to copy files, dd can be used to copy raw disks.
But do you know what else can read and write files? Everything:
# Write myfile.iso to a USB drive
cp myfile.iso /dev/sdb
# Rip a cdrom to a .iso file
cat /dev/cdrom > myfile.iso
# Create a gzipped image
gzip -9 < /dev/sdb > /tmp/myimage.gz
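The same point can be checked without touching a real device. This is only a sketch: the files under a temporary directory are stand-ins for the image and the target disk, and their names are invented for the example.

```shell
# Regular files stand in for the image and the device.
work=$(mktemp -d)
head -c 1048576 /dev/urandom > "$work/image.iso"

# Three ways of copying the same bytes.
dd if="$work/image.iso" of="$work/via_dd" bs=1M 2>/dev/null
cp "$work/image.iso" "$work/via_cp"
cat "$work/image.iso" > "$work/via_cat"

# cmp exits 0 and prints nothing when two files are identical.
cmp "$work/via_dd" "$work/via_cp" && cmp "$work/via_dd" "$work/via_cat" && same=yes
echo "identical: ${same:-no}"
rm -rf "$work"
```

All three tools end up issuing reads and writes on a file, so the copies come out byte for byte the same.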
dd can even end up doing a worse job. By specification, its default block size of 512 bytes has had to remain unchanged for decades. Today, this tiny size makes it CPU bound by default. A script that doesn’t specify a block size is very inefficient, and any script that picks the current optimal value may slowly become obsolete, or start obsolete if it’s copied from
cat is free to choose the buffer size that best serves a modern system, and the GNU cat buffer size has grown steadily over the years from 512 bytes in 1991 to 131072 bytes in 2014. ./src/ioblksize.h in the coreutils source code has benchmarks backing up this decision.
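To see how small the default really is, you can count the records dd reports. A rough sketch, assuming a 1 MiB temporary file as the source; each record corresponds to one read and one write, so the record count is a fair proxy for the per-block overhead.

```shell
src=$(mktemp)
head -c 1048576 /dev/zero > "$src"   # exactly 1 MiB

# dd reports its record counts on stderr; keep only the "records in" line.
# LC_ALL=C keeps the message untranslated.
default_bs=$(LC_ALL=C dd if="$src" of=/dev/null 2>&1 | grep 'records in')
large_bs=$(LC_ALL=C dd if="$src" of=/dev/null bs=1M 2>&1 | grep 'records in')

echo "default: $default_bs"
echo "bs=1M:   $large_bs"
rm -f "$src"
```

With the default block size, the megabyte is moved in 2048 separate 512-byte records; with bs=1M it is a single record.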
However, this does not mean that dd should necessarily be categorically shunned! The reason why people started using it in the first place is that it does exactly what it’s told: no more and no less.
If an alias specifies certain options (-a, for instance), cp might try to create a new block device rather than a copy of the file data. If using gzip without redirection, it may try to be helpful and skip the file for not being regular. Neither of them will write out a reassuring status during or after a copy.
dd, meanwhile, has one job*: copy data from one place to another. It doesn’t care about files, safeguards or user convenience. It will not try to second guess your intent, based on trailing slashes or types of files.
However, when this is no longer a convenience, like when combining it with other tools that already read and write files, one should not feel guilty for leaving dd out entirely.
This is not to say I think dd is overrated! Au contraire! It’s one of my favorite Unix tools!
dd is the Swiss Army knife of the open, read, write and seek syscalls. It’s unique in its ability to issue seeks and reads of specific lengths, which enables a whole world of shell scripts that have no business being shell scripts. Want to simulate an lseek+execve? Use dd! Want to open a file with O_SYNC? Use dd! Want to read groups of three byte pixels from a PPM file? Use dd!
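As an illustration of the PPM case, here is a small sketch. It assumes a tiny hand-made 2x2 binary PPM whose header is known to be 11 bytes, and it uses printable letters as pixel bytes so the result is readable. A real script would have to parse the header length instead of hard-coding it.

```shell
ppm=$(mktemp)
# A 2x2 binary PPM (P6); the header "P6\n2 2\n255\n" is 11 bytes long.
printf 'P6\n2 2\n255\n' > "$ppm"
# Four pixels of three bytes each: AAA, BBB, CCC, DDD.
printf 'AAABBBCCCDDD' >> "$ppm"

header=11   # assumed header length for this particular file
n=2         # zero-based index of the pixel to read

# Skip the header plus n whole pixels, then read exactly three bytes.
pixel=$(dd if="$ppm" bs=1 skip=$((header + 3 * n)) count=3 2>/dev/null)
echo "pixel $n: $pixel"
rm -f "$ppm"
```

This is the kind of job where dd earns its keep: the skip and count operands say exactly where to start and how many bytes to read.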
It’s a flexible, unique and useful tool, and I love it. My only issue is that, far too often, this great tool is being relegated to and inappropriately hailed for its most generic and least interesting capability: simply copying a file from start to finish.
* dd actually has two jobs: Convert and Copy. A post on comp.unix.misc (incorrectly) claimed that the intended name “cc” was taken by the C compiler, so the letters were shifted in the same way we ended up with a Window system called X. A more likely explanation is given in that thread, as pointed out by Paweł and Bruce in the comments: the name, syntax and purpose are almost identical to the JCL “Data Definition” (DD) statement found in 1960s IBM mainframes.