{"id":19,"date":"2010-02-22T20:51:17","date_gmt":"2010-02-22T20:51:17","guid":{"rendered":"http:\/\/www.vidarholen.net\/contents\/blog\/?p=19"},"modified":"2010-02-22T20:55:55","modified_gmt":"2010-02-22T20:55:55","slug":"pattern-matching-with-bash-not-grep","status":"publish","type":"post","link":"https:\/\/www.vidarholen.net\/contents\/blog\/?p=19","title":{"rendered":"Pattern matching with Bash (not grep)"},"content":{"rendered":"<p>Pattern matching, either on file names or variable contents, is something Bash can do faster and more accurately by itself than with grep. This post tersely describes some cases where bash&#8217;s own pattern matching can help, by being faster, easier or better. <\/p>\n<p><strong>Simple substring search on variables<\/strong><\/p>\n<pre>\r\n# Check if a variable contains 'foo'. Just to warm up.\r\n\r\n# Works\r\nif echo \"$var\" | grep -q foo\r\nif [[ \"$(echo \"$var\" | grep foo)\" != \"\" ]]\r\n\r\n# Easier and faster \r\nif [[ $var == *foo* ]] \r\n<\/pre>\n<p>The latter runs several hundred times faster by saving two forks (good to know when looping), and the code is cleaner and clearer.<\/p>\n<p><strong>Mixed pattern\/fixed string search on variables<\/strong><\/p>\n<p>This is a less common but more interesting case. <\/p>\n<pre>\r\n# Check if \/usr\/bin overrides our install dir\r\n\r\n# Mostly works (Can fail if $installdir contains \r\n# regex characters like . * [ ] etc)\r\nif echo \"$PATH\" | grep -q \"\/usr\/bin:.*:$installdir\"\r\n\r\n# Quoted strings will not be interpreted as globs\r\nif [[ $PATH == *\/usr\/bin:*:\"$installdir\" ]] \r\n<\/pre>\n<p>We want parts of our input to be interpreted as regex, and parts to be literal, so neither grep nor fgrep entirely fits. Manually trying to escape regex chars is icky at best. We end up chancing that people won&#8217;t use the script anywhere near weirdly named files (like, in their torrent directory). 
With globs, bash doesn&#8217;t have to reparse the variable contents as part of the pattern, and just knows to take quoted strings literally.<\/p>\n<p>Of course, you can see how both of the above fail to account for cases like \/usr\/bin:$installdir. This is not something you can easily express in traditional globs, but bash does regex too, and the semantics of quotes remain the same (since 3.2 or so at least):<\/p>\n<pre>\r\n# Quoted strings will not be interpreted as regex either\r\nif [[ $PATH =~ (^|.*:)\/usr\/bin(:|:.*:)\"$installdir\"(:.*|$) ]]\r\n<\/pre>\n<p><strong>Matching file names<\/strong><\/p>\n<p>I&#8217;ll skip the trivial examples for things like <code>`ls | grep .avi$`<\/code>. Here is a case where traditional globs don&#8217;t cut it:<\/p>\n<pre>\r\n# Copy non-BBC .avi files, and fail on half a dozen different cases\r\ncp $(ls *.avi | grep -v BBC) \/stuff\r\n<\/pre>\n<p>Bash has another form of regular expressions, extglobs (enable with <code>shopt -s extglob<\/code>). These are mathematically regular, but don&#8217;t follow the typical Unix regex syntax:<\/p>\n<pre> \r\n# Copy non-BBC .avi files without making a mess \r\n# when files have spaces or other weird characters\r\ncp !(*BBC*).avi \/stuff\r\n<\/pre>\n<p>man bash contains enough on extglob, so I&#8217;d just like to point out one thing. <code>grep -v foo<\/code> can be replaced by <code>!(foo)<\/code>, which strives to reject &#8220;foo&#8221; (unlike [^f][^o][^o] and similar attempts which strive to accept). <code>egrep \"foo|bar\"<\/code> can be replaced by <code>@(foo|bar)<\/code> to match one of the patterns. But how about <code>grep foo | grep bar<\/code> to match both? <\/p>\n<p>That&#8217;s our old friend De Morgan: !(@(!(foo)|!(bar))). Don&#8217;t you just love that guy? 
<\/p>\n<p>PS: If you don&#8217;t already use parameter expansion to do simple trimming and replacement on variables, now could be a good time to look up that and probably save yourself a lot of sed too.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Pattern matching, either on file names or variable contents, is something Bash can do faster and more accurately by itself than with grep. This post tersely describes some cases where bash&#8217;s own pattern matching can help, by being faster, easier or better. Simple substring search on variables # Check if a variable contains &#8216;foo&#8217;. Just &hellip; <a href=\"https:\/\/www.vidarholen.net\/contents\/blog\/?p=19\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Pattern matching with Bash (not grep)&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[5,4],"tags":[11,21],"class_list":["post-19","post","type-post","status-publish","format-standard","hentry","category-advanced-linux","category-linux","tag-bash","tag-shell-script"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/19","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=19"}],"version-history":[{"count":0,"href":"htt
ps:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/19\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=19"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=19"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=19"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}