{"id":662,"date":"2017-09-17T22:38:23","date_gmt":"2017-09-17T22:38:23","guid":{"rendered":"http:\/\/www.vidarholen.net\/contents\/blog\/?p=662"},"modified":"2017-09-17T22:38:23","modified_gmt":"2017-09-17T22:38:23","slug":"shellcheck-and-shadowed-case-branches","status":"publish","type":"post","link":"https:\/\/www.vidarholen.net\/contents\/blog\/?p=662","title":{"rendered":"ShellCheck and shadowed case branches"},"content":{"rendered":"<p>As of the <a href=\"https:\/\/github.com\/koalaman\/shellcheck\/commit\/74c199b51ac28a6019716c7f18bb3955507fef4e\">the latest<\/a> commit, <a href=\"http:\/\/www.shellcheck.net\">ShellCheck<\/a> will try to detect shadowed case branches.<\/p>\n<p>Here&#8217;s an adaptation from an unnamed script on GitHub: <\/p>\n<pre>\r\ncase $1 in\r\n    -h|--help)\r\n        help\r\n        exit 0\r\n        ;;\r\n    -h|--hub)\r\n        hub=$2\r\n        shift\r\n        ;;\r\n    *)\r\n        die \"Unknown option: $1\"\r\n        ;;\r\nesac\r\n<\/pre>\n<p>The original <code>case<\/code> statement was significantly longer, so you&#8217;d be excused for not noticing the problem: <code>-h<\/code> is used for two different branches. Because of this, <code>-h<\/code> as a short option for <code>--hub<\/code> will not work. <\/p>\n<p>If you <a href=\"http:\/\/www.shellcheck.net\/?id=shadowcase\">run ShellCheck<\/a> on this example now, you will get a pair of helpful warnings:<\/p>\n<pre>\r\nLine 4:\r\n    -h|--help)\r\n    ^-- SC2221: This pattern always overrides a later one.\r\n \r\nLine 8:\r\n    -h|--hub)\r\n    ^-- SC2222: This pattern never matches because of a previous pattern.\r\n<\/pre>\n<p>Very simple and probably somewhat useful in certain cases, right? Well, it gets slightly more interesting.<\/p>\n<p>Here is another example adapted from the wild:<\/p>\n<pre>\r\ncase $1 in\r\n    -h|--help|-?)\r\n        usage\r\n        exit\r\n        ;;\r\n    -v|--verbose)\r\n        verbose=1\r\n        ;;\r\n    *)\r\n        die \"Unknown option: $1\"\r\n        ;;\r\nesac\r\n<\/pre>\n<p>Did you spot the same problem? ShellCheck did:<\/p>\n<pre>\r\nLine 4:\r\n    -h|--help|-?)\r\n              ^-- SC2221: This pattern always overrides a later one.\r\n<\/pre>\n<p>Since an unescaped <code>?<\/code> matches any character, it will match also match <code>-v<\/code>, so the short form of <code>--verbose<\/code> will not work.<\/p>\n<p>Similarly, it recognizes two separate issues in this example:<\/p>\n<pre>\r\n    -*|--*) die \"Invalid option: $1\" ;;\r\n    --) shift; break ;;\r\n<\/pre>\n<p>The end-of-option <code>--<\/code> marker will never be recognized, and <code>-*|--*<\/code> is redundant because the first already covers the second.<\/p>\n<p>These are all very simple cases, but this also works more generally. Here&#8217;s a fabricated music sorting script where the bug would be exceedingly hard to spot in a longer list of bands:<\/p>\n<pre>\r\ncase \"${filename,,}\" in\r\n  *\"abba\"*.mp3 ) rm \"$filename\" ;;\r\n  *\"black\"*\"sabbath\"*.mp3 ) mv \"$filename\" \"Music\/Metal\" ;;\r\nesac\r\n<\/pre>\n<p>So how does it work?<\/p>\n<p>There are very clever ways of determining whether one regular language is a superset of another by intersecting it with the complement of the other, and checking the result for satisfiability. <\/p>\n<p>ShellCheck uses none of them.<\/p>\n<p>I&#8217;ve written a <a href=\"https:\/\/www.vidarholen.net\/contents\/blog\/?p=36\">regex inverter<\/a> before, and that level of complexity was not something I wanted to introduce.<\/p>\n<p>Instead, ShellCheck&#8217;s pattern intersection and superset supports only basic DOS style wildcard patterns: <code>?<\/code>, <code>*<\/code> and literals. It just does a simple recursive match on the two patterns.<\/p>\n<p>Let&#8217;s call the patterns A and B, and we wish to check if A is a superset of B, i.e. if A matches everything that B does. <\/p>\n<p>We have two arbitrary shell patterns that we want to turn into a simplified form, while ensuring we don&#8217;t simplify away any details that will cause a false positive. ShellCheck does this in two ways:<\/p>\n<p>It creates A in such a way that it&#8217;s guaranteed to match <a href=\"https:\/\/github.com\/koalaman\/shellcheck\/blob\/74c199b51ac28a6019716c7f18bb3955507fef4e\/ShellCheck\/ASTLib.hs#L376\">a (non-strict) subset<\/a> of the actual glob. This just means giving up on any pattern that uses features we don&#8217;t explicitly recognize. <code>$(cmd)foo@(ab|c)<\/code> is rejected, while <code>*foo*<\/code> is allowed.<\/p>\n<p>It then creates B to guarantee that it matches <a href=\"https:\/\/github.com\/koalaman\/shellcheck\/blob\/74c199b51ac28a6019716c7f18bb3955507fef4e\/ShellCheck\/ASTLib.hs#L354\">a (non-strict) superset<\/a> of the actual glob. This is done by replacing anything we don&#8217;t support with a <code>*<\/code>. <code>$(cmd)foo@(ab|c)<\/code> just becomes <code>*foo*<\/code>.<\/p>\n<p>Now we can just <a href=\"https:\/\/github.com\/koalaman\/shellcheck\/blob\/74c199b51ac28a6019716c7f18bb3955507fef4e\/ShellCheck\/ASTLib.hs#L419\">match the two<\/a> patterns against each other with an inefficient but simple recursive matcher. Matching two patterns is slightly trickier than matching a pattern against a string, but it&#8217;s still a first year level CS exercise.<\/p>\n<p>It just involves breaking down the patterns by prefix, and matching until you reach a trivial base case:<\/p>\n<ul>\n<li>superset(&#8220;&#8221;, &#8220;&#8221;) = True<\/li>\n<li>superset(&#8220;&#8221;, cY) = False<\/li>\n<li>superset(cX, cY) = superset(X, Y)<\/li>\n<li>superset(*X, *Y) = superset(*X, Y)<\/li>\n<li>&#8230;<\/li>\n<\/ul>\n<p>The <a href=\"https:\/\/github.com\/koalaman\/shellcheck\/blob\/74c199b51ac28a6019716c7f18bb3955507fef4e\/ShellCheck\/ASTLib.hs#L419\">actual code<\/a> calls the simplified patterns &#8220;PseudoGlobs&#8221;, inhabited by PGAny ?, PGMany *, and PGChar c:<\/p>\n<pre>\r\npseudoGlobIsSuperSetof :: [PseudoGlob] -> [PseudoGlob] -> Bool\r\npseudoGlobIsSuperSetof = matchable\r\n  where\r\n    matchable x@(xf:xs) y@(yf:ys) =\r\n        case (xf, yf) of\r\n            (PGMany, PGMany) -> matchable x ys\r\n            (PGMany, _) -> matchable x ys || matchable xs y\r\n            (_, PGMany) -> False\r\n            (PGAny, _) -> matchable xs ys\r\n            (_, PGAny) -> False\r\n            (_, _) -> xf == yf && matchable xs ys\r\n\r\n    matchable [] [] = True\r\n    matchable (PGMany : rest) [] = matchable rest []\r\n    matchable _ _ = False\r\n<\/pre>\n<p>That&#8217;s really all there is to it. ShellCheck just goes through each pattern, and flags the first pattern (if any) that it shadows. There&#8217;s also a pattern simplifier which rearranges <code>c*?*?****d<\/code> into <code>c??*d<\/code> to add some efficiency to obviously diseased patterns.<\/p>\n<p>Future work could include supporting character sets\/ranges since <code>[yY]<\/code> is at least occasionally used, but it&#8217;s rare to find any extglob to warrant full regex support.<\/p>\n<p>Of course, 99% of the time, there are no duplicates. 99.9% of the time, you&#8217;d get the same result with simple string matches.<\/p>\n<p>However, that 0.1% of cases where you get delightful insights like <code>-?<\/code> shadowing <code>-v<\/code> or <code>Linux-3.1*<\/code> shadowing <code>Linux-3.12*<\/code> makes it all worthwhile. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>As of the the latest commit, ShellCheck will try to detect shadowed case branches. Here&#8217;s an adaptation from an unnamed script on GitHub: case $1 in -h|&#8211;help) help exit 0 ;; -h|&#8211;hub) hub=$2 shift ;; *) die &#8220;Unknown option: $1&#8221; ;; esac The original case statement was significantly longer, so you&#8217;d be excused for not &hellip; <a href=\"https:\/\/www.vidarholen.net\/contents\/blog\/?p=662\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;ShellCheck and shadowed case branches&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[23],"tags":[46],"class_list":["post-662","post","type-post","status-publish","format-standard","hentry","category-programming","tag-shellcheck"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/662","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=662"}],"version-history":[{"count":25,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/662\/revisions"}],"predecessor-version":[{"id":689,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/662\/revisions\/689"}],"wp:attachment":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=662"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=662"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=662"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}