When both sed and awk will do, which one to choose?
I'm inclined to think that sed is the more lightweight of the two and should therefore be preferred.
But my sed command uses a wildcard, whereas the awk one doesn't need to. (However, I don't know whether they're both doing regex behind the scenes, in which case the search pattern itself would be equally fast.) Here's what I'm doing (I want to return only the number):
Code:
echo ' index: 56' | sed 's/.* index: //'
Code:
echo ' index: 56' | awk '/index:/{print $2}'
|
I was just going to mention the speed and CPU load aspect. Most of the time the two are about the same, but your pattern is more problematic in sed this time. Here are the times for more repetitions on a very slow piece of hardware:
Code:
$ time for i in $(seq 1 10000); do echo ' index: 56' | sed 's/.* index: //'; done
Code:
$ time for i in $(seq 1 10000); do echo ' index: 56' | awk '/index:/{print $2}'; done
Code:
$ time for i in $(seq 1 10000); do echo ' index: 56' | awk '$1=="index:"{print $2}'; done
Code:
$ realpath $(which awk)
|
A poorly constructed regex that causes excessive backtracking is going to have a cost, no doubt about it. I prefer to search for what I want rather than what I don't want - especially when you can use anchors.
But more generally, if it's field-based data, use awk. KISS. |
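To make the "search for what you want, with anchors" idea concrete, here is a minimal sketch. The function names index_grep and index_sed are made up for illustration, grep -o is assumed to be available (GNU and BSD grep have it, but it is not POSIX), and the sed variant assumes that only spaces and an optional "*" come before "index:".

```shell
# Hypothetical helpers, not from the thread.

# Match only what we want: one or more digits, anchored at end of line.
index_grep() {
    printf '%s\n' "$1" | grep -o '[0-9][0-9]*$'
}

# Anchor at the start of the line and strip the known prefix,
# instead of letting a leading .* scan the whole line.
index_sed() {
    printf '%s\n' "$1" | sed -n 's/^[ *]*index: //p'
}
```

Calling index_grep ' index: 56' or index_sed ' index: 56' should print just the number. Whether either is measurably faster than the original commands is something to time on your own hardware.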
Right, but sed feels simpler, so I thought I was KISSing ;)
|
Depends on the awk flavor, actually. mawk, which is the default awk in everything Debian-based, is very fast and generally on par with sed. Heck, it even beats grep -o '\w*$' in this case!
OTOH, gawk (the default awk in Fedora-based distros), not so much. |
Don't use "*" in regex. Ever.
Wellll - maybe once I found a valid use. It introduces zero-length matches (and backtracking) you really don't want unless you really do need it. But you'd better be able to justify it. Steps off hobby-horse ... |
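A quick way to see those zero-length matches for yourself, as a sketch (star_demo and plus_demo are hypothetical names, and the sample output in the comment is what GNU sed prints):

```shell
# Hypothetical demo helpers, not from the thread.

# x* also matches the empty string, so it "matches" at every position,
# even in input that contains no x at all.
# With GNU sed, star_demo abc prints: -a-b-c-
star_demo() {
    printf '%s\n' "$1" | sed 's/x*/-/g'
}

# xx* (one or more x) cannot match the empty string,
# so input without an x passes through unchanged.
plus_demo() {
    printf '%s\n' "$1" | sed 's/xx*/-/g'
}
```

Running star_demo abc and then plus_demo abc side by side shows the difference: the second leaves abc alone.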
If you have performance-critical code, use real-world application behaviour to profile both options and make your decision that way. (I'd be surprised if choosing between awk and sed was your biggest issue.)
For specifically obtaining the number from that string, the simplest matching regex is "\d+", but it can't be expressed that cleanly in many command line tools - compare the readability of:
Code:
grep -oP '\d+'
|
@boughtonp. If you installed gawk from the repo, it would become the new default, because it has higher priority in update-alternatives. But what gets installed during installation of Debian is mawk.
|
Thanks for all the insight, everyone. This has come in especially handy because, I don't know why, the string is now this:
Code:
echo ' * index: 3'
Code:
echo ' * index: 3' | awk '/index:/{print $2}' # doesn't work
Code:
echo ' * index: 3' | sed 's/.* index: //' # works
(For those who are curious, it is the output of pulseaudio's `pacmd list-sinks` and `pacmd list-sink-inputs`.)
|
With awk '{print$2}' you would get index: because it is the second field.
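If the field position can shift like that, one option is to look for the "index:" field itself and print whatever follows it. A minimal sketch, with index_awk being a made-up helper name:

```shell
# Hypothetical helper, not from the thread.
# Scan the fields of each line for the literal "index:" and print the
# field right after it, wherever it sits in the line.
index_awk() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i < NF; i++)
            if ($i == "index:") { print $(i + 1); next }
    }'
}
```

index_awk ' index: 56' prints 56 and index_awk ' * index: 3' prints 3, so the leading "*" no longer matters. It is a plain string comparison, so no regex is involved.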
|
Anyway, you probably installed an application that pulled in gawk as a dependency. Looking through `apt rdepends gawk | grep Depends` might offer a clue. |
Previously I didn't go direct, so maybe the route I took somehow set a cookie (but if so, I can't replicate it now, nor see why Debian would do something like that).
I also wouldn't expect installing a dependency to change a default like this, but I'm not too bothered by it. |
Quote:
Code:
#!/bin/bash |