{"id":9,"date":"2009-02-11T19:59:01","date_gmt":"2009-02-11T19:59:01","guid":{"rendered":"http:\/\/www.vidarholen.net\/contents\/blog\/?p=9"},"modified":"2017-10-22T23:41:57","modified_gmt":"2017-10-22T23:41:57","slug":"multithreading-for-performance-in-shell-scripts","status":"publish","type":"post","link":"https:\/\/www.vidarholen.net\/contents\/blog\/?p=9","title":{"rendered":"Multithreading for performance in shell scripts"},"content":{"rendered":"<p>Now that everyone and their grandmother have at least two cores, you can double the efficiency by distributing the workload. However, multithreading support in pure shell scripts is terrible, even though you often do things that can take a while, like encoding a bunch of chip tunes to ogg vorbis:<\/p>\n<pre>\r\nmkdir ogg\r\nfor file in *.mod\r\ndo\r\n\txmp -d wav -o - \"$file\" | oggenc -q 3 -o \"ogg\/$file.ogg\" -\r\ndone\r\n<\/pre>\n<p>This is exactly the kind of operation that is conceptually trivial to parallelize, but not obvious to implement in a shell script. Sure, you could run them all in the background and <code>wait<\/code> for them, but that will give you a load average equal to the number of files. Not fun when there are hundreds of files. <\/p>\n<p>You can run two (or however many) in the background, <code>wait<\/code> and then start two more, but that&#8217;ll give terrible performance when the jobs aren&#8217;t of roughly equal length, since at the end, the longest running job will be blocking the other eager cores.<\/p>\n<p>Instead of listing ways that won&#8217;t work, I&#8217;ll get to the point: GNU (and FreeBSD) <code>xargs<\/code> has a <code>-P<\/code> option for specifying the number of jobs to run in parallel! <\/p>\n<p>Let&#8217;s rewrite that conversion loop to run in parallel:<\/p>\n<pre>\r\nmod2ogg() { \r\n\tfor arg; do xmp -d wav -o - \"$arg\" | oggenc -q 3 -o \"ogg\/$arg.ogg\" -; done\r\n}\r\nexport -f mod2ogg\r\nfind . 
-name '*.mod' -print0 | xargs -0 -n 1 -P 2 bash -c 'mod2ogg \"$@\"' -- \r\n<\/pre>\n<p>And if we already had a mod2ogg script, similar to the function just defined, it would have been simpler:<\/p>\n<pre>\r\nfind . -name '*.mod' -print0 | xargs -0 -n 1 -P 2 mod2ogg\r\n<\/pre>\n<p>Voila. Twice as fast, and you can just increase the <code>-P<\/code> value on fancier hardware. <\/p>\n<p>I also added <code>-n 1<\/code> to xargs here, to ensure an even distribution of work. If the work units are so small that the overhead of starting the command becomes a sizable portion of the run time, you can increase <code>-n<\/code> to make xargs run mod2ogg with more files at a time (which is why the function loops over its arguments).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Now that everyone and their grandmother have at least two cores, you can double the efficiency by distributing the workload. However, multithreading support in pure shell scripts is terrible, even though you often do things that can take a while, like encoding a bunch of chip tunes to ogg vorbis: mkdir ogg for file in &hellip; <a href=\"https:\/\/www.vidarholen.net\/contents\/blog\/?p=9\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Multithreading for performance in shell 
scripts&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[5,4],"tags":[11,53,10,21],"class_list":["post-9","post","type-post","status-publish","format-standard","hentry","category-advanced-linux","category-linux","tag-bash","tag-linux","tag-multithreading","tag-shell-script"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/9","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=9"}],"version-history":[{"count":1,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/9\/revisions"}],"predecessor-version":[{"id":700,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=\/wp\/v2\/posts\/9\/revisions\/700"}],"wp:attachment":[{"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=9"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=9"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.vidarholen.net\/contents\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=9"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}