LXer: Optimizing BASH Scripts

LXer · 05-28-2013, 03:51 PM

Published at LXer:

Shell script writers may write a script that runs slowly but be unable to figure out which code is causing the problem. Even if they find it, they may have difficulty finding a better way to write the code. This article shows inefficient code and how to write efficient scripts, along with methods for making code execute quickly.


David the H. · 05-30-2013, 02:37 PM

A bit of good advice, but horrible examples. And it didn't go nearly deep enough into what's possible.

To start with, it would've been better to state the Useless Use Of Cat more clearly, and not to keep using it in the subsequent examples.
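A minimal illustration of the Useless Use Of Cat, assuming bash and a small throwaway sample file (its contents are hypothetical). The `cat | grep` form spawns an extra process and a pipe; handing the file to grep directly, or redirecting it into a loop, gives the same result without them.

```bash
#!/usr/bin/env bash
# Hypothetical sample file for the demonstration.
tmp=$(mktemp)
printf '%s\n' 'foo bar baz' 'qux quux' 'foo again' > "$tmp"

# Useless use of cat: an extra cat process plus a pipe.
count_cat=$(cat "$tmp" | grep -c foo)

# Same result: let grep open the file itself.
count_direct=$(grep -c foo "$tmp")

# Same idea for a shell loop: redirect the file instead of piping cat into it.
# (Piping into the loop would also run it in a subshell, so $n would be lost.)
n=0
while read -r line; do
  [[ $line == foo* ]] && n=$((n + 1))
done < "$tmp"

rm -f "$tmp"
printf 'cat|grep: %s  grep: %s  loop: %s\n' "$count_cat" "$count_direct" "$n"
```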

Even more, I would stress the importance of avoiding external process calls of most kinds whenever possible. All the sed examples at the end could have been replaced with parameter substitutions, saving a process or two each. And don't call something like date 20 times; run it once, save the output to a variable, and use that. You could even save multiple values in a single variable or array, and use the shell to print out only the ones you need at the time.
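A sketch of both suggestions, assuming bash. date runs once and its fields are split into an array that can be reused; then a few sed-style edits are done with parameter expansion instead of extra processes. The path value and the sed commands in the comments are hypothetical stand-ins, not the article's actual examples.

```bash
#!/usr/bin/env bash
# One call to date; the six fields land in the array "now".
read -r -a now <<< "$(date '+%Y %m %d %H %M %S')"
printf 'Date: %s-%s-%s  Time: %s:%s:%s\n' "${now[@]}"
echo "Year only: ${now[0]}"   # reuse a field without calling date again

# Parameter expansions standing in for common sed one-liners.
path=/home/user/docs/report.final.txt   # hypothetical example value
base=${path##*/}                 # like sed 's|.*/||'            -> report.final.txt
stem=${base%.*}                  # like sed 's/\.[^.]*$//'       -> report.final
renamed=${base/report/summary}   # like sed 's/report/summary/'  -> summary.final.txt
underscored=${base//./_}         # like sed 's/\./_/g'           -> report_final_txt
printf '%s\n' "$base" "$stem" "$renamed" "$underscored"
```

Note that bash expansions use glob patterns rather than regular expressions, so not every sed expression has a direct one-line equivalent.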

Interestingly, in my testing, looping through small files (up to about 0.5-1 KB, depending on the exact operation) in the shell and processing every line can often be faster than doing the same thing with sed or awk.

Code:

#on a file that contains 20 lines of "foo bar baz" (244 bytes)

$ time { while read -r line; do echo "${line/foo/FOO}"; done <file.txt ;}
FOO bar baz
FOO bar baz
....etc....

real    0m0.004s
user    0m0.004s
sys     0m0.000s

$ time sed 's/foo/FOO/' file.txt
FOO bar baz
FOO bar baz
....etc....

real    0m0.008s
user    0m0.000s
sys     0m0.000s

sed started to perform faster than the loop at about 50 lines on my system. But the following variation was still highly competitive even at 100 lines:

Code:

$ time { mapfile -t lines <file.txt ; printf '%s\n' "${lines[@]/foo/FOO}" ;} 
FOO bar baz
FOO bar baz
....etc....

real    0m0.009s
user    0m0.000s
sys     0m0.004s
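To reproduce the comparison, the three variants can be run against the same generated file and checked for identical output. This is a sketch assuming bash 4+ (for mapfile); the generated file is 240 bytes, so it only approximates the 244-byte file used above. The loop here uses `IFS= read -r` so leading whitespace would be preserved. Timings are left out because they vary by system; wrap any variant in `time { ...; }` as above to measure it.

```bash
#!/usr/bin/env bash
# Build a test file of 20 identical lines (12 bytes each, 240 bytes total).
file=$(mktemp)
for ((i = 0; i < 20; i++)); do echo 'foo bar baz'; done > "$file"

# Variant 1: read loop with parameter substitution.
loop_out=$(while IFS= read -r line; do echo "${line/foo/FOO}"; done < "$file")

# Variant 2: sed.
sed_out=$(sed 's/foo/FOO/' "$file")

# Variant 3: mapfile, with the substitution applied to every array element.
mapfile -t lines < "$file"
map_out=$(printf '%s\n' "${lines[@]/foo/FOO}")

rm -f "$file"
[[ $loop_out == "$sed_out" && $map_out == "$sed_out" ]] && echo "all three outputs match"
```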

H_TeXMeX_H · 05-31-2013, 03:23 AM

I agree with not using cat unless you need it. Checking my scripts, I only use it once to cat a file to be viewed on the terminal.

I tend to write bash scripts using as many external utilities as possible, because they are faster, especially for large amounts of data. For smaller amounts, bash can be faster, as you have shown. I would also consider the readability of the code, and there too I would go with the external utilities; the sed line is much easier to read.

David the H. · 05-31-2013, 12:22 PM

Compiled external programs may be faster in themselves, but this is balanced against the need for the shell to spawn extra processes for them.
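One rough way to see that spawning cost, assuming bash: time the `$(echo | sed)` pattern against the equivalent parameter expansion over a few hundred iterations. The iteration count is arbitrary and the absolute numbers depend on the system; only the relative difference matters.

```bash
#!/usr/bin/env bash
var=foobarbaz
iterations=500          # arbitrary; raise it for steadier numbers
TIMEFORMAT='%R s'       # make bash's time keyword print only elapsed real time

echo "external sed (subshell + pipe + sed process each iteration):"
time for ((i = 0; i < iterations; i++)); do
  out=$(echo "$var" | sed 's/baz$//')
done

echo "parameter expansion (no new process):"
time for ((i = 0; i < iterations; i++)); do
  out2=${var%baz}
done
```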

My usual advice is to mostly use text-processing tools like grep/sed/awk when you have to operate on large blocks of text en masse, or when you have complex operations that the shell can't do easily. Pre-filtering lines with grep before reading them into the shell is a good use of it, for example. But once a text string is stored in a variable, it's nearly always more efficient to use in-shell manipulations like parameter substitution. Remember that the shell interpreter is also a compiled program, and its individual built-in operations are generally just as efficient at what they do as the others.
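A minimal sketch of the grep pre-filter pattern, assuming bash and a hypothetical log format of `LEVEL user=NAME action=NAME`. grep selects the relevant lines in one process, and the shell then strips the `key=` prefixes with parameter expansion. Process substitution (`< <(...)`) keeps the loop in the current shell, so the `users` array is still available afterwards.

```bash
#!/usr/bin/env bash
# Hypothetical log file with mixed lines.
log=$(mktemp)
printf '%s\n' \
  'INFO user=alice action=login' \
  'ERROR user=bob action=upload' \
  'INFO user=carol action=logout' \
  'ERROR user=dave action=delete' > "$log"

# grep does the bulk filtering; the shell only sees the matching lines.
users=()
while read -r level user action; do
  users+=( "${user#user=}" )                       # strip "user=" in-shell
  echo "${level}: ${user#user=} tried ${action#action=}"
done < <(grep '^ERROR' "$log")

rm -f "$log"
```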

As for readability, or to be more precise comprehensibility, there is a rather large subjective component involved, and a lot of it depends on experience. I personally find shell code to be no more difficult to understand than the syntax of sed or awk, and it can often be easier to figure out than some of the more complex expressions of the latter.

I will admit that, given a choice between a sed one-liner and a 10-line shell function that does the same thing, it often makes sense to go with the former. But this kind of thing is not at all acceptable in my opinion:

Code:

# uggh!
var=foobarbaz
var=$( echo "$var" | sed 's/baz$//' )
echo "$var"

# yes!
var=foobarbaz
var=${var%baz}
echo "$var"

I find the in-shell version to be both cleaner and easier to read, and it's almost infinitely faster (it's hard to beat 0).

And of course, the topic at hand here is optimization, not readability. When milliseconds count, leave sed out, at least for operations like the one above.