As of the latest commit, ShellCheck will try to detect shadowed case branches.
Here’s an adaptation from an unnamed script on GitHub:
case $1 in
  -h|--help)
    help
    exit 0
    ;;
  -h|--hub)
    hub=$2
    shift
    ;;
  *)
    die "Unknown option: $1"
    ;;
esac
The original case statement was significantly longer, so you’d be excused for not noticing the problem: -h is used for two different branches. Because of this, -h as a short option for --hub will not work.
If you run ShellCheck on this example now, you will get a pair of helpful warnings:
Line 4:
-h|--help)
^-- SC2221: This pattern always overrides a later one.
Line 8:
-h|--hub)
^-- SC2222: This pattern never matches because of a previous pattern.
Very simple and probably somewhat useful in certain cases, right? Well, it gets slightly more interesting.
Here is another example adapted from the wild:
case $1 in
  -h|--help|-?)
    usage
    exit
    ;;
  -v|--verbose)
    verbose=1
    ;;
  *)
    die "Unknown option: $1"
    ;;
esac
Did you spot the same problem? ShellCheck did:
Line 4:
-h|--help|-?)
^-- SC2221: This pattern always overrides a later one.
Since an unescaped ? matches any single character, -? will also match -v, so the short form of --verbose will not work.
Similarly, it recognizes two separate issues in this example:
-*|--*) die "Invalid option: $1" ;;
--) shift; break ;;
The end-of-options marker -- will never be recognized because -* already matches it, and -*|--* is redundant because -* already covers --*.
These are all very simple cases, but this also works more generally. Here’s a fabricated music sorting script where the bug would be exceedingly hard to spot in a longer list of bands:
case "${filename,,}" in
  *"abba"*.mp3 ) rm "$filename" ;;
  *"black"*"sabbath"*.mp3 ) mv "$filename" "Music/Metal" ;;
esac
So how does it work?
There are very clever ways of determining whether one regular language is a superset of another: intersect the candidate subset with the complement of the candidate superset, and check whether the result is empty.
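Stated precisely, the equivalence those techniques rely on is shown below. L(P) is notation introduced here for the set of strings a pattern P matches, and the overline denotes the complement:

```latex
L(A) \supseteq L(B) \iff L(B) \cap \overline{L(A)} = \emptyset
```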
ShellCheck uses none of them.
I’ve written a regex inverter before, and that level of complexity was not something I wanted to introduce.
Instead, ShellCheck’s pattern intersection and superset checks support only basic DOS-style wildcard patterns: ?, * and literals. It just does a simple recursive match on the two patterns.
Let’s call the patterns A and B. We wish to check whether A is a superset of B, i.e. whether A matches everything that B does.
We have two arbitrary shell patterns that we want to turn into a simplified form, while ensuring we don’t simplify away any details that would cause a false positive. ShellCheck does this in two steps:
- It creates A in such a way that it’s guaranteed to match a (non-strict) subset of the actual glob. This just means giving up on any pattern that uses features we don’t explicitly recognize. $(cmd)foo@(ab|c) is rejected, while *foo* is allowed.
- It then creates B to guarantee that it matches a (non-strict) superset of the actual glob. This is done by replacing anything we don’t support with a *. $(cmd)foo@(ab|c) just becomes *foo*.
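Here is a rough sketch of those two conversions. It assumes the shell pattern has already been tokenized into a hypothetical GlobToken type, which is not ShellCheck’s actual AST. Unsupported stands for anything beyond ?, * and literals, such as $(cmd) or @(ab|c). The function names toSubsetGlob and toSupersetGlob are also made up for illustration:

```haskell
-- The simplified pattern language: ?, * and a literal character.
data PseudoGlob = PGAny | PGMany | PGChar Char
  deriving (Eq, Show)

-- Hypothetical, already-tokenized view of a shell pattern (an assumption,
-- not ShellCheck's real representation).
data GlobToken
  = Literal Char   -- a plain character
  | QuestionMark   -- an unescaped ?
  | Star           -- an unescaped *
  | Unsupported    -- anything else: $(cmd), @(ab|c), [yY], ...
  deriving (Eq, Show)

-- Pattern A must match a subset of the real glob, so any unsupported
-- token makes us give up on the whole pattern (Nothing).
toSubsetGlob :: [GlobToken] -> Maybe [PseudoGlob]
toSubsetGlob = mapM convert
  where
    convert (Literal c) = Just (PGChar c)
    convert QuestionMark = Just PGAny
    convert Star = Just PGMany
    convert Unsupported = Nothing

-- Pattern B must match a superset of the real glob, so any unsupported
-- token is widened to *.
toSupersetGlob :: [GlobToken] -> [PseudoGlob]
toSupersetGlob = map convert
  where
    convert (Literal c) = PGChar c
    convert QuestionMark = PGAny
    convert Star = PGMany
    convert Unsupported = PGMany
```

Treating $(cmd) and @(ab|c) each as a single Unsupported token, toSubsetGlob returns Nothing for $(cmd)foo@(ab|c), and toSupersetGlob returns the equivalent of *foo*.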
Now we can just match the two patterns against each other with an inefficient but simple recursive matcher. Matching two patterns is slightly trickier than matching a pattern against a string, but it’s still a first-year CS exercise.
It just involves breaking down the patterns by prefix, and matching until you reach a trivial base case:
- superset(“”, “”) = True
- superset(“”, cY) = False
- superset(cX, cY) = superset(X, Y)
- superset(*X, *Y) = superset(*X, Y)
- …
The actual code calls the simplified patterns “PseudoGlobs”, inhabited by PGAny ?, PGMany *, and PGChar c:
pseudoGlobIsSuperSetof :: [PseudoGlob] -> [PseudoGlob] -> Bool
pseudoGlobIsSuperSetof = matchable
  where
    matchable x@(xf:xs) y@(yf:ys) =
      case (xf, yf) of
        (PGMany, PGMany) -> matchable x ys
        (PGMany, _) -> matchable x ys || matchable xs y
        (_, PGMany) -> False
        (PGAny, _) -> matchable xs ys
        (_, PGAny) -> False
        (_, _) -> xf == yf && matchable xs ys
    matchable [] [] = True
    matchable (PGMany : rest) [] = matchable rest []
    matchable _ _ = False
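For experimenting, here is a self-contained version that repeats the matcher above. It adds two things that are assumptions rather than ShellCheck code. The first is a PseudoGlob declaration matching the three constructors just described. The second is a hypothetical parsePseudoGlob that treats ? and * as wildcards and every other character as a literal. It handles no escaping, brackets or | alternation, so each alternative of a case branch is checked on its own. The helper shadows is likewise hypothetical:

```haskell
-- Assumed declaration: ?, * and a literal character.
data PseudoGlob = PGAny | PGMany | PGChar Char
  deriving (Eq, Show)

-- Hypothetical parser for the simple patterns used in this post:
-- '?' and '*' are wildcards, everything else is a literal.
parsePseudoGlob :: String -> [PseudoGlob]
parsePseudoGlob = map toGlob
  where
    toGlob '?' = PGAny
    toGlob '*' = PGMany
    toGlob c = PGChar c

-- The matcher from the snippet above: does the first pattern match
-- everything the second one does?
pseudoGlobIsSuperSetof :: [PseudoGlob] -> [PseudoGlob] -> Bool
pseudoGlobIsSuperSetof = matchable
  where
    matchable x@(xf:xs) y@(yf:ys) =
      case (xf, yf) of
        (PGMany, PGMany) -> matchable x ys                  -- A's * absorbs B's *
        (PGMany, _) -> matchable x ys || matchable xs y     -- * eats one more char, or stops
        (_, PGMany) -> False                                -- only * can cover a *
        (PGAny, _) -> matchable xs ys                       -- ? covers any single char or ?
        (_, PGAny) -> False                                 -- a literal cannot cover ?
        (_, _) -> xf == yf && matchable xs ys               -- literals must be equal
    matchable [] [] = True
    matchable (PGMany : rest) [] = matchable rest []        -- trailing * can match nothing
    matchable _ _ = False

-- Hypothetical convenience wrapper: does pattern a shadow pattern b?
shadows :: String -> String -> Bool
shadows a b = pseudoGlobIsSuperSetof (parsePseudoGlob a) (parsePseudoGlob b)
```

With the examples from this post, shadows "-?" "-v" is True. shadows "Linux-3.1*" "Linux-3.12*" is True while the reverse is False. shadows "*abba*.mp3" "*black*sabbath*.mp3" is True, since every "sabbath" contains "abba".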
That’s really all there is to it. ShellCheck just goes through each pattern, and flags the first pattern (if any) that it shadows. There’s also a pattern simplifier which rearranges c*?*?****d into c??*d to add some efficiency to obviously diseased patterns.
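The shadowing pass and the simplifier can be sketched the same way. The names simplifyPseudoGlob, findShadowed and parsePseudoGlob are hypothetical. The type and matcher are repeated so the block compiles on its own. The sketch assumes the alternatives of all case branches are flattened into one list in source order. For each pattern, it reports the index of the first later pattern it shadows:

```haskell
data PseudoGlob = PGAny | PGMany | PGChar Char
  deriving (Eq, Show)

-- Hypothetical parser: '?' and '*' are wildcards, anything else is literal.
parsePseudoGlob :: String -> [PseudoGlob]
parsePseudoGlob = map toGlob
  where
    toGlob '?' = PGAny
    toGlob '*' = PGMany
    toGlob c = PGChar c

-- Same matcher as in the snippet above.
pseudoGlobIsSuperSetof :: [PseudoGlob] -> [PseudoGlob] -> Bool
pseudoGlobIsSuperSetof = matchable
  where
    matchable x@(xf:xs) y@(yf:ys) =
      case (xf, yf) of
        (PGMany, PGMany) -> matchable x ys
        (PGMany, _) -> matchable x ys || matchable xs y
        (_, PGMany) -> False
        (PGAny, _) -> matchable xs ys
        (_, PGAny) -> False
        (_, _) -> xf == yf && matchable xs ys
    matchable [] [] = True
    matchable (PGMany : rest) [] = matchable rest []
    matchable _ _ = False

-- Within each run of wildcards, emit all the ?s followed by at most one *.
-- A run with k ?s and at least one * matches "k or more characters",
-- which is exactly what ?...?* matches, so the meaning is unchanged.
simplifyPseudoGlob :: [PseudoGlob] -> [PseudoGlob]
simplifyPseudoGlob [] = []
simplifyPseudoGlob (PGChar c : rest) = PGChar c : simplifyPseudoGlob rest
simplifyPseudoGlob globs = anys ++ star ++ simplifyPseudoGlob rest
  where
    (run, rest) = span isWildcard globs
    anys = filter (== PGAny) run
    star = [PGMany | PGMany `elem` run]
    isWildcard (PGChar _) = False
    isWildcard _ = True

-- For each pattern index i, report (i, j) where j is the first later
-- pattern that pattern i shadows; patterns that shadow nothing are skipped.
findShadowed :: [[PseudoGlob]] -> [(Int, Int)]
findShadowed patterns =
  [ (i, j) | (i, p) <- indexed, (j : _) <- [laterShadowed i p] ]
  where
    indexed = zip [0 ..] (map simplifyPseudoGlob patterns)
    laterShadowed i p =
      [ j | (j, q) <- indexed, j > i, pseudoGlobIsSuperSetof p q ]
```

The flattened alternatives of the first example are -h, --help, -h, --hub and *. For these, findShadowed reports [(0, 2)]: the first -h shadows the second. Applied to the pattern c*?*?****d, simplifyPseudoGlob yields the same list as c??*d.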
Future work could include supporting character sets/ranges, since [yY] is at least occasionally used, but extglobs are too rare to warrant full regex support.
Of course, 99% of the time, there are no duplicates. 99.9% of the time, you’d get the same result with simple string matches.
However, that 0.1% of cases where you get delightful insights like -? shadowing -v or Linux-3.1* shadowing Linux-3.12* makes it all worthwhile.