When both sed and awk will do, which one to choose?
My inclination is that sed is the more lightweight of the two and thus should be used.
But my sed command uses a wildcard, whereas the awk one doesn't need to. (However, I don't know whether, behind the scenes, they're both doing regex matching, in which case the search pattern itself would run at about the same speed.)
Here's what I'm doing (I want to return only the number):
Code:
echo ' index: 56' | sed 's/.* index: //'
Code:
echo ' index: 56' | awk '/index:/{print $2}'
Edit: when prefixing each command with time, awk seems to consistently beat sed by 0.001s.
I was just going to mention the speed and CPU load aspect. Most of the time the two are about the same, but your pattern is more problematic in sed this time. Here are the times for more repetitions on a very slow piece of hardware:
Code:
$ time for i in $(seq 1 10000); do echo ' index: 56' | sed 's/.* index: //'; done
...
real 5m19.441s
user 1m37.236s
sys 2m35.147s
versus
Code:
$ time for i in $(seq 1 10000); do echo ' index: 56' | awk '/index:/{print $2}'; done
...
real 4m14.199s
user 1m5.048s
sys 2m8.869s
versus
Code:
$ time for i in $(seq 1 10000); do echo ' index: 56' | awk '$1=="index:"{print $2}'; done
...
real 4m14.731s
user 1m4.882s
sys 2m8.165s
Edit: the above was with mawk on a Raspberry Pi Zero W. The slow processor amplifies the differences.
Code:
$ realpath $(which awk)
/usr/bin/mawk
Last edited by Turbocapitalist; 01-27-2021 at 05:17 AM.
Poorly constructed regex causing excessive backtracking is gunna have a cost, no doubt about it. I prefer to search for what I want rather than what I don't want - especially when you can use anchors.
But more generally, if it's field-based data, use awk. KISS.
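As a rough illustration of searching for what you want rather than what you don't, the sketch below anchors the sed pattern and captures only the digits, then does the same job with a field comparison in awk. The sample line is the one from the question; the variable names are made up for the example.

```shell
line=' index: 56'

# sed: anchor at the start of the line, capture the digits,
# and print only when the whole line matches.
sed_out=$(printf '%s\n' "$line" | sed -n 's/^ *index: *\([0-9][0-9]*\)$/\1/p')

# awk: split on whitespace and compare the first field exactly,
# so no regex is involved at all.
awk_out=$(printf '%s\n' "$line" | awk '$1 == "index:" { print $2 }')

echo "sed: $sed_out"
echo "awk: $awk_out"
```

Both variants print nothing for lines that don't fit the expected shape, which the unanchored `s/.* index: //` does not guarantee: it passes non-matching lines through unchanged.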
Depends on awk flavor, actually. mawk, which is the default awk in everything Debian-based, is very fast and generally on par with sed. Heck, it even beats grep -o '\w*$' in this case!
OTOH, gawk (default awk in Fedora-based distros) not so much.
Don't use "*" in regex. Ever.
Wellll - maybe once I found a valid use. It introduces zero-length matches (and backtracking) you really don't want unless you really do need it. But you'd better be able to justify it.
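To make the zero-length match point concrete, here is a small sketch, assuming GNU sed. The input string and variable names are only for illustration.

```shell
# "x*" can match the empty string, so it "matches" at every position in "abc",
# even though the input contains no x at all.
star_out=$(echo 'abc' | sed 's/x*/-/g')

# "xx*" requires at least one x, so nothing is replaced.
plus_out=$(echo 'abc' | sed 's/xx*/-/g')

echo "$star_out"   # -a-b-c-
echo "$plus_out"   # abc
```

Requiring at least one character (here `xx*`, or `x+` in extended regex) avoids the empty matches.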
When both sed and awk will do, which one to choose?
Whichever one you prefer!
If you have performance critical code, use real world application behaviour to profile both options and make your decision that way. (I'd be surprised if choosing between awk and sed was your biggest issue.)
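One way to profile closer to a bulk workload is sketched below. The loops earlier in the thread start a new process for every line, so their timings probably reflect process startup more than pattern matching. This version runs each tool once over a whole file. The sample file and its 10000 identical lines are an assumption for the sake of the example, not real application data.

```shell
#!/bin/bash
# Hypothetical sample data: 10000 copies of the line from the question.
sample=$(mktemp)
yes ' index: 56' | head -n 10000 > "$sample"

# Time a single invocation of each tool over the whole file.
# Output is discarded so terminal speed does not skew the numbers.
time sed 's/.* index: //' "$sample" > /dev/null
time awk '/index:/{print $2}' "$sample" > /dev/null

# Sanity check: both commands should produce the same distinct output.
sed_lines=$(sed 's/.* index: //' "$sample" | sort -u)
awk_lines=$(awk '/index:/{print $2}' "$sample" | sort -u)
echo "sed: $sed_lines, awk: $awk_lines"

rm -f "$sample"
```

Swap in a file from the real application to get numbers that mean something for your case.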
For specifically obtaining the number from that string, the simplest matching regex is "\d+", but it can't be expressed that cleanly in many command line tools - compare the readability of:
@boughtonp. If you installed gawk from the repo, it would become the new default, because it has higher priority in update-alternatives. But what gets installed during installation of Debian is mawk.
Anyway, you probably installed an application that pulled in gawk as a dependency. Looking through `apt rdepends gawk | grep Depends` might offer a clue.
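To see which awk implementations are registered and at what priority, something along these lines should work on a Debian-style system. The function name is made up for the example, and on systems without update-alternatives it only reports where awk resolves to.

```shell
# Assumes a Debian-style alternatives system; falls back gracefully elsewhere.
show_awk_alternatives() {
    if command -v update-alternatives >/dev/null 2>&1; then
        # Lists each registered awk implementation with its priority
        # and shows which one the awk link currently points to.
        update-alternatives --display awk || echo "no awk alternative registered"
    else
        echo "update-alternatives not found; awk resolves to: $(command -v awk)"
    fi
}

show_awk_alternatives
```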
Heh, weird - it's now doing that for me too, but it definitely went to gawk's page before.
Previously I didn't go direct, so maybe the route I took somehow set a cookie (but if so I can't replicate it now, nor see why Debian would do something like that).
Quote:
Anyway, you probably installed an application that pulled in gawk as a dependency. Looking through `apt rdepends gawk | grep Depends` might offer a clue.
There's nothing in the list that outputs which I've installed myself, but I guess there could be secondary or tertiary dependencies, and I don't feel like checking them all.
I also wouldn't expect installing a dependency to change a default like this, but not too bothered by it.
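A shorter route than checking every reverse dependency by hand might look like the sketch below. It assumes an apt-based system and uses gawk as the package, as in the discussion above.

```shell
# Assumes an apt-based system; gawk is the package discussed in the thread.
pkg=gawk
if command -v apt-mark >/dev/null 2>&1; then
    # Prints the package name only if it is marked as automatically installed,
    # i.e. it was pulled in as a dependency rather than requested explicitly.
    apt-mark showauto "$pkg"
    # Limit the reverse dependencies to packages that are installed here.
    apt-cache rdepends --installed "$pkg" || true
else
    echo "apt-mark not found; this sketch only applies to apt-based systems"
fi
```

If the first command prints the package name, some installed package from the second list is the likely reason it is there.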
Quote:
... There's nothing in the list that outputs which I've installed myself, but I guess there could be secondary or tertiary dependencies, and I don't feel like checking them all...
I've modified my logrotates to keep **all** my dpkg & apt logs. So, for me, code like this helps dig out when I installed something:
Code:
#!/bin/bash
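# Search the current and rotated dpkg logs for install/removal records of a package.
echo "enter the package name;"; echo "use .* as prefix/suffix if the exact package name is not known"
read -r the_string
[ "$the_string" ] || { echo "You forgot the search string!" >&2; exit 1; }
# zgrep reads both plain and gzip-compressed logs; sort by date and time, newest first.
zgrep -E "status (not-)?installed $the_string:" /var/log/dpkg.log* | sed 's/:/: /' | sort -k2,3 -r | column -t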