LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
01-21-2010, 01:08 AM   #1
suresh.chola
LQ Newbie
 
Registered: Jan 2010
Posts: 8

Rep: Reputation: 0
Which one is more efficient: the cut command or awk?


Hi,

You may find this a bit silly, but I would like to understand it at the core level.

I have the following line:

Code:
line=this,forum,is,for,all,programming,questions
Now, if I want the 3rd field from the line, I can get it in either of the two ways below:

Code:
ThirdField=$(echo "$line" | cut -d',' -f3)
# or
ThirdField=$(echo "$line" | awk -F',' '{print $3}')
My question here is:
Which one is more efficient (cut or awk)? Which one works better in all situations?
Right now, I can't think of a situation where it matters. However, from your experience, have you found any difference?

Are there any basic guidelines for writing efficient shell scripts?

Thanks in advance.

Cheers,
Suresh

Last edited by suresh.chola; 01-21-2010 at 01:17 AM.
 
01-21-2010, 01:20 AM   #2
quanta
Member
 
Registered: Aug 2007
Location: Vietnam
Distribution: RedHat based, Debian based, Slackware, Gentoo
Posts: 724

Rep: Reputation: 101
In my opinion, in this example, they are the same. You can check by using the 'time' command. 'cut' is used when the separator is a single character such as a colon or comma, while 'awk' is used when columns are separated by a varying number of spaces.
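A small sketch of that distinction (the sample row is made up for illustration): cut treats every single delimiter character as a field boundary, so consecutive spaces produce empty fields, while awk's default splitting treats any run of blanks as one separator.

```shell
#!/usr/bin/env bash
# Hypothetical sample: columns separated by a varying number of spaces
row='alpha   beta gamma'

# cut splits on every single space, so the empty strings between the
# three spaces count as fields: field 2 is empty here.
cut_field=$(echo "$row" | cut -d' ' -f2)

# awk's default field splitting collapses runs of blanks, so $2 is "beta".
awk_field=$(echo "$row" | awk '{print $2}')

printf 'cut: [%s]\nawk: [%s]\n' "$cut_field" "$awk_field"
```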
 
01-21-2010, 01:21 AM   #3
paulsm4
LQ Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
You can use the "time" command to benchmark the results. For example:
Quote:
$ line=this,forum,is,for,all,programming,questions

$ time ThirdField=$(echo $line|cut -d',' -f3)
real 0m0.024s
user 0m0.000s
sys 0m0.030s

$ time ThirdField=$(echo $line|awk -F ',' '{print $3}')
real 0m0.021s
user 0m0.010s
sys 0m0.010s
I would have guessed "cut" would be a *lot* less expensive than "awk". Based on the above results ... it looks like I would have been wrong ;-)

'Hope that helps .. PSM
 
01-21-2010, 01:34 AM   #4
cantab
Member
 
Registered: Oct 2009
Location: England
Distribution: Kubuntu, Ubuntu, Debian, Proxmox.
Posts: 553

Rep: Reputation: 115
You cannot get reliable timing doing something once. I wrapped them in a for loop to repeat them 1000 times, and then did 3 tests for each (to get a handle on the consistency). Cut averaged 18.735s for the thousand, while awk averaged 20.079s, which is 7% longer. A small difference; if one is seeking to improve performance, there are likely to be bigger gains elsewhere. (Starting with not using shell scripting!)
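A repeated-timing test along those lines might look like the following bash sketch. Only the 1000-iteration count comes from the post; the loop shape is an assumption. Timings vary by machine and load, so run the script a few times and compare the "real" figures rather than trusting one run.

```shell
#!/usr/bin/env bash
# Repeat each extraction many times so one-off noise averages out
line=this,forum,is,for,all,programming,questions
iterations=1000

echo "cut:"
time for ((i = 0; i < iterations; i++)); do
    ThirdField=$(echo "$line" | cut -d',' -f3)
done

echo "awk:"
time for ((i = 0; i < iterations; i++)); do
    ThirdField=$(echo "$line" | awk -F',' '{print $3}')
done
```

Each iteration starts a subshell plus an external process, so this mostly measures process start-up cost, not the field-splitting itself.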
 
01-21-2010, 01:39 AM   #5
GrapefruiTgirl
LQ Guru
 
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594

Rep: Reputation: 556
Last time I did a similar comparison, I compared a 10,000 iteration loop operating on a long silly string.
cut was faster than tr; tr was faster than sed; sed was faster than awk. cut and tr were very close. sed was significantly faster than awk at substring replacements.
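For reference, here is one way to do the same substring replacement (every comma becomes a space) with each tool, plus a bash parameter expansion that needs no external process at all. The string and replacement used in that comparison aren't given, so this sample is illustrative only.

```shell
#!/usr/bin/env bash
line=this,forum,is,for,all,programming,questions

# The same substitution done four ways
with_tr=$(echo "$line" | tr ',' ' ')
with_sed=$(echo "$line" | sed 's/,/ /g')
with_awk=$(echo "$line" | awk '{ gsub(/,/, " "); print }')
with_bash=${line//,/ }   # bash pattern substitution: no fork, no exec

printf '%s\n' "$with_tr" "$with_sed" "$with_awk" "$with_bash"
```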

FWIW
 
01-21-2010, 01:42 AM   #6
tuxdev
Senior Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 2,012

Rep: Reputation: 115
My hunch is that most of the time goes to spawning subshells and external processes. A pure-bash way that avoids them:
Code:
line=this,forum,is,for,all,programming,questions
IFS=, read _ _ ThirdField _ <<< "$line"
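The same idea can be stretched a little further in bash, either by reading all fields into an array or by trimming the string with parameter expansion. Both are sketches that assume bash (arrays and here-strings are not POSIX sh), and neither starts an external process.

```shell
#!/usr/bin/env bash
line=this,forum,is,for,all,programming,questions

# IFS=, applies only to this read; -r keeps backslashes literal;
# -a stores every field in an indexed array starting at index 0
IFS=, read -r -a fields <<< "$line"
third_from_array=${fields[2]}

# Parameter expansion: strip the first two fields, then cut at the next comma
rest=${line#*,}                    # "forum,is,for,all,programming,questions"
rest=${rest#*,}                    # "is,for,all,programming,questions"
third_from_expansion=${rest%%,*}   # "is"

echo "$third_from_array $third_from_expansion"
```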
 
01-21-2010, 02:41 AM   #7
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by suresh.chola
My question here is:
Which one is more efficient (cut or awk)?
If you are acting on one string, there's not much difference. If your task is to get fields 2 to 5, cut's range syntax -f2-5 may be "cleaner", whereas in awk you have to use a loop (but that's not a big problem).
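A sketch of the fields 2 to 5 case with both tools; the awk loop that rejoins the fields with the separator is one way to write it, not the only one.

```shell
#!/usr/bin/env bash
line=this,forum,is,for,all,programming,questions

# cut accepts a field range directly
with_cut=$(echo "$line" | cut -d',' -f2-5)    # forum,is,for,all

# awk has no range syntax, so rejoin fields 2..5 with FS in a loop
with_awk=$(echo "$line" | awk -F',' '{
    out = $2
    for (i = 3; i <= 5; i++) out = out FS $i
    print out
}')                                           # forum,is,for,all

printf '%s\n%s\n' "$with_cut" "$with_awk"
```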

Quote:
Which one works better in all situations?
Awk, of course. It's a programming language; cut is just a small tool that does one task. You seldom need cut (or grep, sed, etc.) when you know awk.

Quote:
Are there any basic guidelines for writing efficient shell scripts?
If you are concerned with efficiency, always try to use the shell's built-in commands. If you want to cut a string up, make use of IFS, set, etc. That way, there's no need to use external tools. However, if you want to process BIG files, the shell is not the way; use awk, or languages such as Perl/Python, for that.
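A minimal sketch of the IFS/set approach. It is wrapped in a function so the IFS change and the new positional parameters stay local to it; the function name split_third is hypothetical, and the set +f at the end assumes globbing was enabled before the call.

```shell
#!/usr/bin/env bash
line=this,forum,is,for,all,programming,questions

# Hypothetical helper: stores field 3 of a comma-separated string in ThirdField
split_third() {
    local IFS=,     # split on commas, only inside this function
    set -f          # disable globbing so a field such as * stays literal
    set -- $1       # unquoted on purpose: word-split into $1, $2, ...
    set +f          # assumes globbing was on before the call
    ThirdField=$3
}

split_third "$line"
echo "$ThirdField"
```

Calling the function directly, rather than as ThirdField=$(split_third ...), avoids even the subshell that command substitution would create.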
 
01-21-2010, 02:48 AM   #8
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by cantab
Cut averaged 18.735s for the thousand, while awk averaged 20.079s, which is 7% longer.
You cannot compare it like that. awk is a bigger executable than cut. cut only performs a simple task, i.e. cutting up a string, while awk does a lot more and takes a little more time to "load" when executed. If you want to compare apples to apples, make cut do the same things awk does.
 
01-21-2010, 02:51 AM   #9
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by GrapefruiTgirl
Last time I did a similar comparison, I compared a 10,000 iteration loop operating on a long silly string.
cut was faster than tr; tr was faster than sed; sed was faster than awk. cut and tr were very close. sed was significantly faster than awk at substring replacements.

FWIW
A lot also depends on how well one knows one's tools, and how well one understands the problem to be solved.
 
01-21-2010, 03:01 AM   #10
cantab
Member
 
Registered: Oct 2009
Location: England
Distribution: Kubuntu, Ubuntu, Debian, Proxmox.
Posts: 553

Rep: Reputation: 115
Quote:
Originally Posted by ghostdog74
You cannot compare it like that. awk is a bigger executable than cut. cut only performs a simple task, i.e. cutting up a string, while awk does a lot more and takes a little more time to "load" when executed. If you want to compare apples to apples, make cut do the same things awk does.
One of the OP's questions was which is more efficient for the job of getting the third field from a comma separated line. I did the relevant test for that usage, taking 'efficient' as referring to computer time.
Obviously awk can do many things cut cannot. But for those things cut can do, I expect it will generally be a little faster than awk.
Of course, 'efficiency' can also refer to the programming/scripting stage. But that cannot be measured objectively, since as mentioned it depends on personal familiarity with the tools.
 
01-22-2010, 10:32 AM   #11
suresh.chola
LQ Newbie
 
Registered: Jan 2010
Posts: 8

Original Poster
Rep: Reputation: 0
Thank you, guys, for all your valuable input on efficiency.

So, can I conclude that:
"In my scenario above, where I need to get the 3rd field from a record, cut is a better option than awk.
However, awk is better in terms of options and flexibility."

It really helped me, because
right now I need to loop through a file which has over 1 million records, get the 3rd field from each line, and pass it to a routine.
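For a file like that, the main cost to avoid is starting a new process for every record. The bash sketch below assumes a hypothetical routine process_field and a small made-up sample file standing in for the real million-record one; it lets read do the splitting, so no external command runs per line.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the per-record routine: just collects values
results=()
process_field() {
    results+=("$1")
}

# Hypothetical sample input; the real file would have ~1 million lines
records=$(mktemp)
printf '%s\n' 'a,b,c1,d' 'e,f,c2,g' 'h,i,c3,j' > "$records"

# Pattern to avoid: one cut process per record (a million forks)
#   while IFS= read -r rec; do
#       process_field "$(echo "$rec" | cut -d',' -f3)"
#   done < "$records"

# read splits each line on commas itself; no external process per record
while IFS=, read -r _ _ third _; do
    process_field "$third"
done < "$records"

rm -f "$records"
printf '%s\n' "${results[@]}"
```

Running cut once over the whole file (cut -d',' -f3 file | while IFS= read -r third; do ...; done) also avoids per-line processes, but the while loop then runs in a subshell, so variables it changes, such as results here, are lost when it ends. For a million lines even a pure-bash read loop is slow; if the routine can be expressed in awk (or Perl), doing all the work in one such process is likely to be faster.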

Thanks once again to all of you.
 
01-24-2010, 11:17 PM   #12
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,360

Rep: Reputation: 2751
Look at it this way: 'cut' is a one-shot utility (like e.g. wc). To process 1 million recs, you'd have to invoke it 1 million times.
'awk' is a programming language, and you should (I'm not an awk man, I'd use Perl) be able to invoke awk once(!) and write the entire (1 million recs) process inside that one awk process (I believe...).
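A minimal sketch of that single-process pattern in awk: the main block runs once per record and END runs after the last one. Since the OP's routine isn't described, counting how often each third-field value occurs is used purely as a stand-in, and the sample file is made up.

```shell
#!/usr/bin/env bash
# Hypothetical sample file standing in for the real records
records=$(mktemp)
printf '%s\n' 'a,b,x,d' 'e,f,y,g' 'h,i,x,j' > "$records"

# One awk process reads every record; per-record work goes in the main
# block, the final report in END. sort only makes the output order stable.
summary=$(awk -F',' '
    { count[$3]++ }
    END { for (v in count) print v, count[v] }
' "$records" | sort)

rm -f "$records"
printf '%s\n' "$summary"
```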
 
01-25-2010, 01:42 AM   #13
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by chrism01
Look at it this way: 'cut' is a one-shot utility (like e.g. wc). To process 1 million recs, you'd have to invoke it 1 million times.
I don't understand what you mean.
Code:
cut -f1 -d"," file_million_records.txt
Those million records are all processed "inside" a single cut process.
 
01-25-2010, 09:00 AM   #14
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
Quote:
Originally Posted by chrism01
Look at it this way: 'cut' is a one-shot utility (like e.g. wc). To process 1 million recs, you'd have to invoke it 1 million times.
'awk' is a programming language, and you should (I'm not an awk man, I'd use Perl) be able to invoke awk once(!) and write the entire (1 million recs) process inside that one awk process (I believe...).
That is wrong: cut works on files too, and it is run only once. In fact, I'm still betting that it would be somewhat faster than awk on large files, but I don't have any benchmarks.
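A sketch of how such a benchmark could be run: generate a large test file, then time a single cut and a single awk over it. The line format and the one-million line count are assumptions, and no results are claimed here; run it on your own machine, ideally more than once so the file is in the page cache for both tools.

```shell
#!/usr/bin/env bash
# Hypothetical test data: one million comma-separated lines
n=1000000
data=$(mktemp)
awk -v n="$n" 'BEGIN { for (i = 1; i <= n; i++) print "this,forum," i ",for,all" }' > "$data"

# Each tool is started exactly once and streams the whole file
echo "cut:"
time cut -d',' -f3 "$data" > /dev/null

echo "awk:"
time awk -F',' '{print $3}' "$data" > /dev/null

# Sanity check: both tools must extract the same column
if cmp -s <(cut -d',' -f3 "$data") <(awk -F',' '{print $3}' "$data"); then
    same=yes
else
    same=no
fi
rm -f "$data"
echo "outputs identical: $same"
```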
 
  

