LinuxQuestions.org
Old 01-27-2021, 02:54 AM   #1
arifd86
LQ Newbie
 
Registered: Aug 2009
Distribution: custom ubuntu/lxde
Posts: 24

Rep: Reputation: 3
When both sed and awk will do, which one to choose?


My inclination is that sed is the more lightweight of the two and thus should be used.

But my sed command uses a wildcard, whereas the awk one doesn't need to. (However, I don't know whether they're both doing regex matching behind the scenes, in which case the pattern search itself would be equally fast.)

Here's what I'm doing (I want to return only the number):
Code:
echo '    index: 56' | sed 's/.* index: //'
Code:
echo '    index: 56' | awk '/index:/{print $2}'
Edit: when I prepend time to each command, awk seems to consistently beat sed by 0.001s.

Last edited by arifd86; 01-27-2021 at 02:59 AM.
 
Old 01-27-2021, 03:17 AM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 5,443
Blog Entries: 3

Rep: Reputation: 2711
I was just going to mention the speed and CPU load aspect. Most of the time the two are about the same, but your pattern is more problematic in sed this time. Here are the times for more repetitions on a very slow piece of hardware:

Code:
$ time for i in $(seq 1 10000); do echo '    index: 56' | sed 's/.* index: //';  done

...

real    5m19.441s
user    1m37.236s
sys     2m35.147s
versus

Code:
$ time for i in $(seq 1 10000); do echo '    index: 56' | awk '/index:/{print $2}';  done

...

real    4m14.199s
user    1m5.048s
sys     2m8.869s
versus

Code:
$ time for i in $(seq 1 10000); do echo '    index: 56' | awk '$1=="index:"{print $2}';  done

...

real    4m14.731s
user    1m4.882s
sys     2m8.165s
Edit: the above was with mawk on a Raspberry Pi Zero W. The slow processor amplifies the differences.

Code:
$ realpath $(which awk)
/usr/bin/mawk
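
A note on what the loops above measure: every iteration starts a fresh echo plus sed or awk, so a good share of the elapsed time (see the large sys figures) is probably process creation rather than pattern matching. A minimal sketch that pushes the same 10000 lines through one process per tool instead; the temporary file and the run_sed/run_awk helper names are made up for the illustration, and timing shell functions with time assumes bash:

```shell
# Build 10000 copies of the sample line once, in a scratch file.
sample=$(mktemp)
for i in $(seq 1 10000); do echo '    index: 56'; done > "$sample"

# One process per tool instead of one process per line.
run_sed() { sed 's/.* index: //' "$sample"; }
run_awk() { awk '/index:/{print $2}' "$sample"; }

# "time" is a bash keyword here, so it can time a shell function.
time run_sed > /dev/null
time run_awk > /dev/null
```

Measured this way, any difference that remains is closer to the cost of the pattern itself, though real workloads may still behave differently.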

Last edited by Turbocapitalist; 01-27-2021 at 05:17 AM.
 
Old 01-27-2021, 04:09 AM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 19,485

Rep: Reputation: 3455
Poorly constructed regex causing excessive backtracking is gunna have a cost, no doubt about it. I prefer to search for what I want rather than what I don't want - especially when you can use anchors.
But more generally, if it's field-based data, use awk. KISS.
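
To illustrate the "search for what you want" advice on the sample line from post #1, here is a sketch of two anchored patterns. They are only one possible reading of that advice, not necessarily what syg00 had in mind, and they assume a sed that accepts -E and the first input format (leading whitespace, then index:):

```shell
line='    index: 56'

# grep: print only the run of digits that ends the line.
printf '%s\n' "$line" | grep -oE '[0-9]+$'

# sed: anchor both ends, capture the digits, and print only when the line matches.
printf '%s\n' "$line" | sed -nE 's/^[[:space:]]+index: ([0-9]+)$/\1/p'
```

Because of -n and the p flag, the sed form prints nothing at all for lines that do not match, instead of passing them through unchanged.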
 
Old 01-27-2021, 04:41 AM   #4
arifd86
LQ Newbie
 
Registered: Aug 2009
Distribution: custom ubuntu/lxde
Posts: 24

Original Poster
Rep: Reputation: 3
Right, sed feels simpler though, thus I thought I was KISSING
 
Old 01-27-2021, 05:01 AM   #5
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 1,810

Rep: Reputation: Disabled
Depends on awk flavor, actually. mawk, which is the default awk in everything Debian-based, is very fast and generally on par with sed. Heck, it even beats grep -o '\w*$' in this case!

OTOH, gawk (default awk in Fedora-based distros) not so much.
 
Old 01-27-2021, 06:13 AM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 19,485

Rep: Reputation: 3455
Don't use "*" in regex. Ever.
Wellll - maybe once I found a valid use. It introduces zero-length matches (and backtracking) you really don't want unless you really do need it. But you'd better be able to justify it.

Steps off hobby-horse ...
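
A small illustration of the zero-length-match point, assuming GNU sed (other implementations may differ in detail): a pattern such as x* also matches the empty string, so a global substitution fires even where there is no x at all. Requiring at least one x avoids that.

```shell
# x* can match nothing, so it "matches" between every character; GNU sed prints -a-b-c-
printf 'abc\n' | sed 's/x*/-/g'

# xx* needs at least one x, so the line passes through unchanged: abc
printf 'abc\n' | sed 's/xx*/-/g'
```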
 
1 members found this post helpful.
Old 01-27-2021, 08:42 AM   #7
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 1,061

Rep: Reputation: 868
Quote:
Originally Posted by arifd86
When both sed and awk will do, which one to choose?
Whichever one you prefer!

If you have performance critical code, use real world application behaviour to profile both options and make your decision that way.
(I'd be surprised if choosing between awk and sed was your biggest issue.)



For specifically obtaining the number from that string, the simplest matching regex is "\d+", but it can't be expressed that cleanly in many command line tools - compare the readability of:
Code:
grep -oP '\d+'
grep -oE '[0-9]+'
grep -o '[0-9]\{1,\}'
sed 's/[^0-9]\{1,\}//'
Of course, to get just the last field from a string, "awk '{print $NF}'" might be the simplest option, if the undescribed data format allows that.
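
For comparison, a sketch that runs the more portable of these candidates against the sample line from post #1. The grep -P form is left out because PCRE support is not compiled into every grep. Each command should print just the number:

```shell
line='    index: 56'

printf '%s\n' "$line" | grep -oE '[0-9]+'          # ERE: one or more digits
printf '%s\n' "$line" | grep -o '[0-9]\{1,\}'      # BRE spelling of the same thing
printf '%s\n' "$line" | sed 's/[^0-9]\{1,\}//'     # delete the first run of non-digits
printf '%s\n' "$line" | awk '{print $NF}'          # last whitespace-separated field
```

Note that the sed form only removes the first run of non-digits, so it relies on the number being the only digits on the line.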


Last edited by boughtonp; 01-27-2021 at 08:47 AM.
 
Old 01-27-2021, 08:46 AM   #8
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 1,061

Rep: Reputation: 868
Quote:
Originally Posted by shruggy
Depends on awk flavor, actually. mawk which is default awk in everything Debian-based...
My Debian uses gawk, which I don't recall ever configuring, and entering "awk" at https://manpages.debian.org/ brings up the gawk manpage.

 
Old 01-27-2021, 09:21 AM   #9
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 1,810

Rep: Reputation: Disabled
@boughtonp. If you installed gawk from the repo, it would become the new default, because it has higher priority in update-alternatives. But what gets installed during installation of Debian is mawk.
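
A quick way to check which implementation currently sits behind the awk name, sketched as a small function (describe_awk is a made-up name). It assumes readlink -f is available and only calls update-alternatives where that tool exists, as on Debian-based systems:

```shell
describe_awk() {
    awk_path=$(command -v awk) || { echo "no awk on PATH"; return 0; }
    echo "awk on PATH:  $awk_path"
    # Follow the chain of symlinks to the real binary (mawk, gawk, ...).
    echo "resolves to:  $(readlink -f "$awk_path")"
    # On systems using the alternatives mechanism, list the candidates and their priorities.
    if command -v update-alternatives >/dev/null 2>&1; then
        update-alternatives --display awk 2>/dev/null ||
            echo "awk is not managed by update-alternatives here"
    fi
}

describe_awk
```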
 
Old 01-27-2021, 10:44 AM   #10
arifd86
LQ Newbie
 
Registered: Aug 2009
Distribution: custom ubuntu/lxde
Posts: 24

Original Poster
Rep: Reputation: 3
Thanks for all the insight everyone, this has come in especially handy because, I don't know why, the string is now this:
Code:
echo '  * index: 3'
And I have no idea why
Code:
echo '  * index: 3' | awk '/index:/{print $2}' # doesn't work
# use {print $3} if you want it to work.
but
Code:
echo '  * index: 3' | sed 's/.* index: //' # works
So going to use `awk '{print $NF}'`.
(For those who are curious, it is the output of PulseAudio's `pacmd list-sinks` and `pacmd list-sink-inputs`.)

Last edited by arifd86; 01-27-2021 at 10:51 AM.
 
Old 01-27-2021, 10:56 AM   #11
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 1,810

Rep: Reputation: Disabled
With awk '{print$2}' you would get index: because it is the second field.
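
To see that field numbering directly, a sketch that prints each field of the new sample line with its index, followed by one way to avoid hard-coding the position: scan for the field equal to index: and print the one after it. The index_value helper is a made-up illustration, not something proposed in the thread.

```shell
line='  * index: 3'

# Show how awk splits the line: field 1 is "*", field 2 is "index:", field 3 is "3".
printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'

# Print whatever follows the "index:" field, wherever it sits on the line.
index_value() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "index:") { print $(i + 1); next } }'
}

printf '%s\n' "$line" | index_value            # marked line: "  * index: 3"
printf '%s\n' '    index: 56' | index_value    # unmarked line from post #1
```

Unlike {print $NF}, this does not depend on the number being the last field on the line.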
 
Old 01-28-2021, 09:19 AM   #12
hish2021
LQ Newbie
 
Registered: Jan 2021
Posts: 25

Rep: Reputation: Disabled
Quote:
Originally Posted by boughtonp
My Debian uses gawk, which I don't recall ever configuring, and entering "awk" at https://manpages.debian.org/ brings up the gawk manpage.
Interesting! When I enter "awk" in https://manpages.debian.org/, I'm taken to https://manpages.debian.org/buster/o.../awk.1.en.html.

Anyway, you probably installed an application that pulled in gawk as a dependency. Looking through `apt rdepends gawk | grep Depends` might offer a clue.
 
Old 01-28-2021, 11:06 AM   #13
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 1,061

Rep: Reputation: 868
Quote:
Originally Posted by hish2021
Heh, weird - it's now doing that for me too, but it definitely went to gawk's page before.

Previously I didn't go direct, so maybe the route I took somehow set a cookie (but if so I can't replicate it now, nor see why Debian would do something like that).


Quote:
Anyway, you probably installed an application that pulled in gawk as a dependency. Looking through `apt rdepends gawk | grep Depends` might offer a clue.
There's nothing in the list it outputs that I've installed myself, but I guess there could be secondary or tertiary dependencies, and I don't feel like checking them all.

I also wouldn't expect installing a dependency to change a default like this, but not too bothered by it.

 
Old 01-28-2021, 07:32 PM   #14
hish2021
LQ Newbie
 
Registered: Jan 2021
Posts: 25

Rep: Reputation: Disabled
Quote:
Originally Posted by boughtonp
...
There's nothing in the list that outputs which I've installed myself, but I guess there could be secondary or tertiary dependencies, and I don't feel like checking them all...
I've modified my logrotate configuration to keep **all** my dpkg & apt logs. So, for me, code like this helps dig out when I installed something:
Code:
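#!/bin/bash

# Look up install/removal events for a package in the current and rotated dpkg logs.
echo "enter the package name;"; echo "use .* as prefix/suffix if the exact package name is not known"
read -r the_string
[ "$the_string" ] || { echo "You forgot the search string!" ; exit 1 ; }
# zgrep also reads the gzip-compressed rotations; sed splits off the file-name prefix
# so that fields 2-3 are date and time, and sort -r lists the newest entries first.
zgrep -E "status (not-)?installed $the_string:" /var/log/dpkg.log* | sed 's/:/: /' | sort -k2,3 -r | column -t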
 
1 members found this post helpful.
  


All times are GMT -5.
