Pattern matching with Bash (not grep)

Pattern matching, either on file names or variable contents, is something Bash can do faster and more accurately by itself than with grep. This post tersely describes some cases where bash’s own pattern matching can help, by being faster, easier or better.

Simple substring search on variables

# Check if a variable contains 'foo'. Just to warm up.

# Works
if echo "$var" | grep -q foo
if [[ "$(echo "$var" | grep foo)" != "" ]]

# Easier and faster 
if [[ $var == *foo* ]] 

The latter runs several hundred times faster by saving two forks (good to know when looping), and the code is cleaner and clearer.
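As a minimal sketch of how this looks inside a loop, the check can be wrapped in a small function. The name contains and the sample words are made up for illustration. Quoting the needle keeps it literal, so a needle such as * is not treated as a glob.

```shell
#!/usr/bin/env bash
# contains HAYSTACK NEEDLE (hypothetical helper, not from the post):
# succeed if NEEDLE occurs anywhere in HAYSTACK, without forking.
contains() {
    # The quoted "$2" is matched literally; only the bare * are wildcards.
    [[ $1 == *"$2"* ]]
}

count=0
for word in food barfoo bar 'f*o'; do
    if contains "$word" foo; then
        count=$((count + 1))
    fi
done
echo "$count of 4 words contain foo"   # prints "2 of 4 words contain foo"
```

No external process is started anywhere in that loop, which is where the speedup comes from.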

Mixed pattern/fixed string search on variables

This is a less common but more interesting case.

#Check if /usr/bin overrides our install dir

# Mostly works (Can fail if $installdir contains 
# regex characters like . * [ ] etc)
if echo "$PATH" | grep -q "/usr/bin:.*:$installdir"

# Quoted strings will not be interpreted as globs
if [[ $PATH == */usr/bin:*:"$installdir" ]] 

We want parts of our input to be interpreted as regex, and parts to be literal, so neither grep nor fgrep entirely fits. Manually trying to escape regex chars is icky at best. We end up chancing that people won’t use the script anywhere near weirdly named files (like, in their torrent directory). With globs, bash doesn’t have to reparse the variable contents as part of the pattern, and just knows to take quoted strings literally.
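To see the difference, here is a small sketch with a deliberately awkward directory name. The values of installdir and path_value are invented for the demonstration, and path_value stands in for $PATH so the real one is left alone.

```shell
#!/usr/bin/env bash
# Hypothetical values: the install dir contains regex/glob characters.
installdir='/opt/my.app[1]'
path_value="/usr/bin:/bin:$installdir"

# grep reads [1] in $installdir as a bracket expression, so it misses.
if echo "$path_value" | grep -q "/usr/bin:.*:$installdir"; then
    grep_result=match
else
    grep_result=miss
fi

# An unquoted expansion is treated as a glob, so [1] is a character class here too.
if [[ $path_value == */usr/bin:*:$installdir ]]; then
    unquoted_result=match
else
    unquoted_result=miss
fi

# The quoted expansion is taken literally, character for character.
if [[ $path_value == */usr/bin:*:"$installdir" ]]; then
    quoted_result=match
else
    quoted_result=miss
fi

echo "grep: $grep_result, unquoted: $unquoted_result, quoted: $quoted_result"
# prints "grep: miss, unquoted: miss, quoted: match"
```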

Of course, you see how both of the above fail to account for cases like /usr/bin:$installdir, where the two directories are adjacent. This is not something you can easily express in traditional globs, but bash does regex too, and the semantics of quotes remain the same (since 3.2 or so at least):

# Quoted strings will not be interpreted as regex either
if [[ $PATH =~ (^|.*:)/usr/bin(:|:.*:)"$installdir"(:.*|$) ]]
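A sketch of the same regex wrapped in a function, so it can be tried against a few sample values without touching the real $PATH. The function name usr_bin_overrides and the sample paths are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# usr_bin_overrides PATHVALUE DIR (hypothetical helper):
# succeed if /usr/bin appears before DIR in the colon-separated PATHVALUE.
usr_bin_overrides() {
    local path_value=$1 dir=$2
    # Unquoted parts are regex syntax; the quoted "$dir" is matched literally.
    [[ $path_value =~ (^|.*:)/usr/bin(:|:.*:)"$dir"(:.*|$) ]]
}

usr_bin_overrides "/usr/bin:/opt/app" /opt/app \
    && echo "adjacent: overrides"
usr_bin_overrides "/bin:/usr/bin:/sbin:/opt/app:/x" /opt/app \
    && echo "separated: overrides"
usr_bin_overrides "/opt/app:/usr/bin" /opt/app \
    || echo "install dir first: no override"
usr_bin_overrides "/usr/bin:/opt/app2" /opt/app \
    || echo "different dir: no override"
```

The groups at either end anchor the match on whole PATH entries, so /opt/app2 is not mistaken for /opt/app.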

Matching file names

I’ll skip the trivial examples for things like `ls | grep .avi$`. Here is a case where traditional globs don’t cut it:

# Copy non-BBC .avi files, and fail on half a dozen different cases
cp $(ls *.avi | grep -v BBC) /stuff

Bash has another form of regular expressions, extglobs (enable with shopt -s extglob). These are mathematically regular, but don’t follow the typical unix regex syntax:

 
# Copy non-BBC .avi files without making a mess 
# when files have spaces or other weird characters
cp !(*BBC*).avi /stuff
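For a script, a slightly more defensive sketch collects the matches in an array first, so nothing is copied when the glob matches nothing. The scratch directory and file names below are invented for the demonstration. Note that extglob has to be enabled on an earlier line than the one using !(...), because it changes how bash parses the pattern.

```shell
#!/usr/bin/env bash
# extglob must be on before bash parses a line that uses !(...).
# nullglob makes a non-matching glob expand to nothing instead of itself.
shopt -s extglob nullglob

# Scratch directory with hypothetical file names, including awkward ones.
workdir=$(mktemp -d)
mkdir "$workdir/stuff"
cd "$workdir" || exit 1
touch 'holiday video.avi' 'BBC news.avi' 'nature (BBC).avi' 'notes.txt'

# The glob expands to one word per file, so spaces in names are harmless.
files=( !(*BBC*).avi )
if (( ${#files[@]} > 0 )); then
    cp -- "${files[@]}" stuff/
fi

copied=( stuff/* )
printf 'copied: %s\n' "${copied[@]#stuff/}"   # prints "copied: holiday video.avi"

# Clean up the scratch directory.
cd / && rm -rf -- "$workdir"
```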

man bash contains enough on extglob, so I’d just like to point out one thing. Extglobs match the whole name, so a grep-style "contains" test needs asterisks. grep -v foo can be replaced by !(*foo*), which strives to reject “foo” (unlike [^f][^o][^o] and similar attempts, which strive to accept). egrep "foo|bar" can be replaced by *@(foo|bar)* to match either pattern. But how about grep foo | grep bar to match both?

That’s our old friend De Morgan: !(@(!(*foo*)|!(*bar*))). Don’t you just love that guy?
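A quick way to convince yourself is to try the pattern on a few strings. This sketch uses the form with asterisks, so that foo and bar may appear anywhere in the string; the helper name has_both and the sample names are made up.

```shell
#!/usr/bin/env bash
shopt -s extglob

# has_both NAME (hypothetical helper): succeed if NAME contains both
# foo and bar, like `grep foo | grep bar` would for a single line.
has_both() {
    # "not (lacks foo or lacks bar)" is the same as "has foo and has bar".
    [[ $1 == !(@(!(*foo*)|!(*bar*))) ]]
}

for name in foobar bar-then-foo foo bar baz; do
    if has_both "$name"; then
        echo "$name: both"
    else
        echo "$name: not both"
    fi
done
```

Only foobar and bar-then-foo are reported as containing both.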

PS: If you don’t already use parameter expansion to do simple trimming and replacement on variables, now could be a good time to look that up and probably save yourself a lot of sed calls too.
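A few of the more common expansions, as a rough sketch. The path in file is a made-up example; the expected values are noted in the comments.

```shell
#!/usr/bin/env bash
file='/home/user/videos/holiday video.avi'   # hypothetical path

name=${file##*/}        # strip longest prefix matching */   -> holiday video.avi
dir=${file%/*}          # strip shortest suffix matching /*  -> /home/user/videos
base=${name%.avi}       # strip the .avi suffix              -> holiday video
renamed=${name// /_}    # replace every space with _         -> holiday_video.avi
mkv=${name/%.avi/.mkv}  # replace .avi only at the end       -> holiday video.mkv

printf '%s\n' "$name" "$dir" "$base" "$renamed" "$mkv"
```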

4 thoughts on “Pattern matching with Bash (not grep)”

  3. For the De Morgan’s law example, did you mean `!(@(!(*foo*)|!(*bar*)))` (with asterisks added)?
