{"id":746,"date":"2019-03-27T03:07:33","date_gmt":"2019-03-27T03:07:33","guid":{"rendered":"http:\/\/www.vidarholen.net\/contents\/blog\/?p=746"},"modified":"2019-03-27T03:12:05","modified_gmt":"2019-03-27T03:12:05","slug":"a-shell-script-that-deleted-a-database-and-how-shellcheck-could-have-helped","status":"publish","type":"post","link":"https:\/\/www.vidarholen.net\/contents\/blog\/?p=746","title":{"rendered":"A shell script that deleted a database, and how ShellCheck could have helped"},"content":{"rendered":"<p>Summary: We examine a real-world case of how an innocent shell scripting mistake caused the deletion of a production database, and how <a href=\"https:\/\/www.shellcheck.net\">ShellCheck<\/a> (a GPLv3 shell script linting and analysis tool) would have pointed out the errors and prevented the disaster.<\/p>\n<p>Disclosure: I am the ShellCheck author.<\/p>\n<h3 id=\"the-event\">The event<\/h3>\n<p>Here is the sad case, taken from a recent StackOverflow post:<\/p>\n<blockquote><p>My developer committed a huge mistake and we cannot find our mongo database anyone in the server. 
Rescue please!!!<\/p>\n<p>He logged into the server, and saved the following shell under <code>~\/crontab\/mongod_back.sh<\/code>:<\/p>\n<pre><code>#!\/bin\/sh\nDUMP=mongodump\nOUT_DIR=\/data\/backup\/mongod\/tmp     \/\/ Temporary directory for backup files\nTAR_DIR=\/data\/backup\/mongod         \/\/ Final directory for backup files\nDATE=`date +%Y_%m_%d_%H_%M_%S`      \/\/ Backup files are saved under the backup time\nDB_USER=Guitang                     \/\/ Database operator\nDB_PASS=qq____________              \/\/ Database operator password\nDAYS=14                             \/\/ Keep the latest 14 days of backups\nTAR_BAK=\"mongod_bak_$DATE.tar.gz\"   \/\/ Backup file naming format\ncd $OUT_DIR                         \/\/ Create folder\nrm -rf $OUT_DIR\/*                   \/\/ Empty the temporary directory\nmkdir -p $OUT_DIR\/$DATE             \/\/ Create the folder for this backup\n$DUMP -d wecard -u $DB_USER -p $DB_PASS -o $OUT_DIR\/$DATE  \/\/ Run the backup command\ntar -zcvf $TAR_DIR\/$TAR_BAK $OUT_DIR\/$DATE       \/\/ Pack the backup files into the final directory\nfind $TAR_DIR\/ -mtime +%DAYS -delete             \/\/ Delete old backups from more than 14 days ago<\/code><\/pre>\n<p>And then he run <code>.\/mongod_back.sh<\/code>, then there were lots of permission denied, then he did Ctrl+C. Then the server shut down automatically.<\/p>\n<p>He then contacted AliCloud, the engineer connected the disk to another working server, so that he could check the disk. 
Then, he realized that some folders have gone, including <code>\/data\/<\/code> where the mongodb is!!!<\/p>\n<p>PS: he did not take snapshot of the disk before.<\/p><\/blockquote>\n<p>Essentially, it\u2019s every engineer\u2019s nightmare.<\/p>\n<p>The post-mortem of this issue is an interesting puzzle that requires only basic shell scripting knowledge. If you\u2019d like to give it a try, now\u2019s the time. If you\u2019d like some hints, here\u2019s <a href=\"https:\/\/www.shellcheck.net\/?id=rescueplease\">ShellCheck\u2019s output<\/a> for the script.<\/p>\n<p>The rest of this post details what happened and how ShellCheck could have averted the disaster.<\/p>\n<h3 id=\"what-went-wrong\">What went wrong?<\/h3>\n<p>The <a href=\"https:\/\/stackoverflow.com\/help\/mcve\">MCVE<\/a> for how to ruin your week is this:<\/p>\n<pre><code>#!\/bin\/sh\nDIR=\/data\/tmp    \/\/ The directory to delete\nrm -rf $DIR\/*    \/\/ Now delete it<\/code><\/pre>\n<p>The fatal error here is that <code>\/\/<\/code> is not a comment in shell scripts. It\u2019s a path to the root directory, equivalent to <code>\/<\/code>.<\/p>\n<p>On some platforms, the <code>rm<\/code> line would have been fatal by itself, because it\u2019d boil down to <code>rm -rf \/<\/code> with a few other arguments. Implementations these days often don\u2019t allow this, though. The disaster in question happened on Ubuntu, whose GNU <code>rm<\/code> would have refused:<\/p>\n<pre><code>$ rm -rf \/\/\nrm: it is dangerous to operate recursively on '\/\/' (same as '\/')\nrm: use --no-preserve-root to override this failsafe<\/code><\/pre>\n<p>This is where the assignment comes in.<\/p>\n<p>The shell treats variable assignments and commands as two sides of the same coin. 
Here\u2019s the description <a href=\"http:\/\/pubs.opengroup.org\/onlinepubs\/9699919799\/utilities\/V3_chap02.html#tag_18_09_01\">from POSIX<\/a>:<\/p>\n<blockquote><p>A \u201csimple command\u201d is a sequence of optional variable assignments and redirections, in any sequence, optionally followed by words and redirections, terminated by a control operator.<\/p><\/blockquote>\n<p>(A \u201csimple command\u201d is in contrast to a \u201ccompound\u201d command: a structure like an <code>if<\/code> statement or a <code>for<\/code> loop that contains one or more simple or compound commands.)<\/p>\n<p>This means that <code>var=42<\/code> and <code>echo \"Hello\"<\/code> are both simple commands. The former has one optional assignment and zero optional words. The latter has zero optional assignments and two optional words.<\/p>\n<p>It also implies that a single simple command can contain both: <code>var=42 echo \"Hello\"<\/code>.<\/p>\n<p>To make a long spec short, assignments in a simple command apply only to the invoked command. If there is no command name, they apply to the current shell. The latter case explains <code>var=42<\/code> by itself, but when would you use the former?<\/p>\n<p>It\u2019s useful when you want to set a variable for a single command without affecting the rest of your shell:<\/p>\n<pre><code>$ echo \"$PAGER\"  # Show current pager\nless\n\n$ PAGER=\"head -n 5\" man ascii\nASCII(7)       Linux Programmer's Manual      ASCII(7)\n\nNAME\n       ascii  -  ASCII character set encoded in octal,\n       decimal, and hexadecimal\n\n$ echo \"$PAGER\"  # Current pager hasn't changed\nless<\/code><\/pre>\n<p>This is exactly what happened unintentionally in the fatal assignment. 
Just as the previous example scoped <code>PAGER<\/code> to <code>man<\/code> only, this one scoped <code>DIR<\/code> to <code>\/\/<\/code>:<\/p>\n<pre><code>$ DIR=\/data\/tmp    \/\/ The directory to delete\nbash: \/\/: Is a directory\n\n$ echo \"$DIR\"  # The variable is unset\n(no output)<\/code><\/pre>\n<p>This meant that <code>rm -rf $DIR\/*<\/code> became <code>rm -rf \/*<\/code>, and therefore bypassed the check that is in place for <code>rm -rf \/<\/code>.<\/p>\n<p>(Why can\u2019t or won\u2019t <code>rm<\/code> simply refuse to delete <code>\/*<\/code> too? Because it never sees <code>\/*<\/code>: the shell expands it first, so <code>rm<\/code> sees <code>\/bin \/boot \/dev \/data ...<\/code>. While <code>rm<\/code> could obviously refuse to remove first-level directories as well, this starts getting in the way of legitimate usage, a big sin in the Unix philosophy.)<\/p>\n<h3 id=\"how-shellcheck-could-have-helped\">How ShellCheck could have helped<\/h3>\n<p>Here\u2019s the output from this minimized snippet (<a href=\"https:\/\/www.shellcheck.net\/?id=rescuemcve\">see online<\/a>):<\/p>\n<pre><code>$ shellcheck myscript\n\nIn myscript line 2:\nDIR=\/data\/tmp    \/\/ The directory to delete\n                 ^-- SC1127: Was this intended as a comment? 
Use # in sh.\n\n\nIn myscript line 3:\nrm -rf $DIR\/*    \/\/ Now delete it\n       ^----^ SC2115: Use \"${var:?}\" to ensure this never expands to \/* .\n       ^--^ SC2086: Double quote to prevent globbing and word splitting.\n                 ^-- SC2114: Warning: deletes a system directory.<\/code><\/pre>\n<p>Two of these issues have already been discussed, and addressing them would have averted this disaster:<\/p>\n<ul>\n<li>ShellCheck noticed that the first <code>\/\/<\/code> was likely intended as a comment (wiki: <a href=\"https:\/\/www.shellcheck.net\/wiki\/SC1127\">SC1127<\/a>).<\/li>\n<li>ShellCheck pointed out that the second <code>\/\/<\/code> would target a system directory (wiki: <a href=\"https:\/\/www.shellcheck.net\/wiki\/SC2114\">SC2114<\/a>).<\/li>\n<\/ul>\n<p>The third is a general defensive technique which would also have prevented this catastrophic <code>rm<\/code> independently of the two other fixes:<\/p>\n<ul>\n<li>ShellCheck suggested using <code>rm -rf \"${DIR:?}\"\/*<\/code> to abort execution if the variable is empty or unset for any reason (wiki: <a href=\"https:\/\/www.shellcheck.net\/wiki\/SC2115\">SC2115<\/a>).<\/li>\n<\/ul>\n<p>This would mitigate the effect of a whole slew of pitfalls that can leave a variable empty, including <code>echo \/tmp | read DIR<\/code> (subshells), <code>DIR= \/tmp<\/code> (bad spacing), and <code>DIR=$(echo \/tmp)<\/code> (potential fork\/command failures).<\/p>\n<h3 id=\"conclusion\">Conclusion<\/h3>\n<p>Shell scripts are really convenient, but they also have a large number of potential pitfalls. Many issues that would be simple, fail-fast syntax errors in other languages would instead cause a script to misbehave in confusing, annoying, or catastrophic ways. 
Many examples can be found in the <a href=\"https:\/\/mywiki.wooledge.org\/BashPitfalls\">Wooledge Bash Pitfalls<\/a> list or in ShellCheck\u2019s own <a href=\"https:\/\/github.com\/koalaman\/shellcheck#gallery-of-bad-code\">gallery of bad code<\/a>.<\/p>\n<p>Since the tooling exists, why not take advantage of it? Even if (or especially when!) you rarely write shell scripts, you can install <a href=\"https:\/\/www.shellcheck.net\"><code>shellcheck<\/code><\/a> from your package manager, along with a <a href=\"https:\/\/github.com\/koalaman\/shellcheck#how-to-use\">suitable editor plugin<\/a> like <a href=\"https:\/\/www.flycheck.org\/en\/latest\/\">Flycheck<\/a> (Emacs) or <a href=\"https:\/\/github.com\/vim-syntastic\/syntastic\">Syntastic<\/a> (Vim), and just forget about it.<\/p>\n<p>The next time you\u2019re writing a script, your editor will show warnings and suggestions automatically. Whether or not you want to fix the more pedantic style issues, it may be worth looking at any unexpected errors and warnings. It might just save your database.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Summary: We examine a real-world case of how an innocent shell scripting mistake caused the deletion of a production database, and how ShellCheck (a GPLv3 shell script linting and analysis tool) would have pointed out the errors and prevented the disaster. Disclosure: I am the ShellCheck author. 
The event Here is the sad case, &hellip; <a href=\"https:\/\/www.vidarholen.net\/contents\/blog\/?p=746\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;A shell script that deleted a database, and how ShellCheck could have helped&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[6,4],"tags":[39],"class_list":["post-746","post","type-post","status-publish","format-standard","hentry","category-basic-linux","category-linux","tag-why-bash-is-like-that"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/746","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=746"}],"version-history":[{"count":18,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/746\/revisions"}],"predecessor-version":[{"id":764,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/746\/revisions\/764"}],"wp:attachment":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=746"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=746"},{"taxono
my":"post_tag","embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=746"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}