LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   Which one is efficient cut cmd or using awk (https://www.linuxquestions.org/questions/programming-9/which-one-is-efficient-cut-cmd-or-using-awk-783673/)

suresh.chola 01-21-2010 01:08 AM

Which one is efficient cut cmd or using awk
 
Hi,

You may find this a bit silly, but I would like to understand it at the core level.

I have the following line:

Code:

line=this,forum,is,for,all,programming,questions
Now if I want the 3rd field from the line, I can get it in either of the two ways below:

Code:

ThirdField=$(echo "$line" | cut -d',' -f3)
or
ThirdField=$(echo "$line" | awk -F',' '{print $3}')

My question here is:
Which one is more efficient (the cut command or awk)? Which one works better in all situations?
Right now, I can't think of any particular situation, but from your experience, have you found any difference?

Are there any basic guidelines for writing efficient shell scripts?

Thanks in advance.

Cheers,
Suresh

quanta 01-21-2010 01:20 AM

In my opinion, they are the same in this example; you can check with the 'time' command. 'cut' is used when the separator is a fixed character such as a colon or comma, while 'awk' is used when columns are separated by a varying number of spaces.
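A quick way to see that difference (the sample strings below are my own, not quanta's):

```shell
printf 'a:b:c\n' | cut -d':' -f2        # prints: b  (fixed single-char separator)

printf 'a   b  c\n' | awk '{print $2}'  # prints: b  (awk collapses runs of whitespace)
printf 'a   b  c\n' | cut -d' ' -f2     # prints an empty line: cut counts every space as a delimiter
```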

paulsm4 01-21-2010 01:21 AM

You can use the "time" command to benchmark the results. For example:
Quote:

$ line=this,forum,is,for,all,programming,questions

$ time ThirdField=$(echo $line|cut -d',' -f3)
real 0m0.024s
user 0m0.000s
sys 0m0.030s

$ time ThirdField=$(echo $line|awk -F ',' '{print $3}')
real 0m0.021s
user 0m0.010s
sys 0m0.010s
I would have guessed "cut" would be a *lot* less expensive than "awk". Based on the above results ... it looks like I would have been wrong ;-)

'Hope that helps .. PSM

cantab 01-21-2010 01:34 AM

You cannot get reliable timing doing something once. I wrapped them in a for loop to repeat them 1000 times, and then did 3 tests for each (to get a handle on the consistency). Cut averaged 18.735s for the thousand, while awk averaged 20.079s, which is 7% longer. A small difference; if one is seeking to improve performance, there are likely to be bigger gains elsewhere. (Starting with not using shell scripting!)
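A loop along these lines reproduces that kind of test (the iteration count and redirections are my assumptions, not cantab's exact script):

```shell
line=this,forum,is,for,all,programming,questions

# Repeat each pipeline many times so per-invocation startup cost adds up
# to something measurable
time for i in $(seq 1000); do
    echo "$line" | cut -d',' -f3 > /dev/null
done

time for i in $(seq 1000); do
    echo "$line" | awk -F',' '{print $3}' > /dev/null
done
```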

GrapefruiTgirl 01-21-2010 01:39 AM

Last time I did a similar comparison, I ran a 10,000-iteration loop operating on a long silly string.
cut was faster than tr; tr was faster than sed; sed was faster than awk. cut and tr were very close, and sed was significantly faster than awk at substring replacements.

FWIW

tuxdev 01-21-2010 01:42 AM

My hunch is that most of the time goes to spawning subshells:
Code:

line=this,forum,is,for,all,programming,questions
IFS=, read _ _ ThirdField _ <<< "$line"


ghostdog74 01-21-2010 02:41 AM

Quote:

Originally Posted by suresh.chola (Post 3834571)
My question here is,
Which one is efficient(using cut cmd or awk).

if you are acting on one string, there's not much difference. If your task is to get fields 2 to 5, cut's range syntax (-f2-5) is cleaner, whereas you have to use a loop in awk (but that's not a big problem)
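For the fields-2-to-5 case, the two approaches look like this (the awk loop is one possible way to keep the commas, not the only one):

```shell
line=this,forum,is,for,all,programming,questions

# cut's range syntax is compact:
echo "$line" | cut -d',' -f2-5    # prints: forum,is,for,all

# the awk equivalent needs an explicit loop over the field range:
echo "$line" | awk -F',' '{ for (i = 2; i <= 5; i++) printf "%s%s", $i, (i < 5 ? "," : "\n") }'
```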

Quote:

Which one works better in all situations.
awk, of course. It's a programming language, while cut is just a small tool that does one task. You seldom need cut (or grep/sed, etc.) once you know awk.

Quote:

Is there any basic guidelines to write a shell script on efficiency part.
if you are concerned with efficiency, always try to use the shell's internal commands. If you want to cut a string up, make use of IFS, set, etc. That way, there's no need to call external tools. However, if you want to process BIG files, the shell is not the way; use awk, or languages such as Perl/Python, for that.
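A minimal sketch of the IFS/set approach mentioned above (note that set -- overwrites the positional parameters):

```shell
line=this,forum,is,for,all,programming,questions

old_ifs=$IFS
IFS=,
set -- $line      # unquoted on purpose: split on commas, so $1..$7 hold the fields
IFS=$old_ifs

echo "$3"         # prints: is
```

No external process is spawned; everything happens inside the shell itself.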

ghostdog74 01-21-2010 02:48 AM

Quote:

Originally Posted by cantab (Post 3834596)
Cut averaged 18.735s for the thousand, while awk averaged 20.079s, which is 7% longer.

you cannot compare them like that. awk is a bigger executable than cut, and cut only performs a simple task, i.e. cutting up a string. awk does a lot more and takes a little more time to "load" when executed. If you want to compare apples to apples, make cut do the same things awk does.
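One rough way to isolate the startup cost ghostdog74 describes is to time many invocations that do essentially no work (the iteration count here is an arbitrary choice of mine):

```shell
# Both commands read /dev/null, so almost all of the measured time is
# process startup rather than actual field processing.
time for i in $(seq 200); do cut -d',' -f1 /dev/null; done
time for i in $(seq 200); do awk '{print $1}' /dev/null; done
```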

ghostdog74 01-21-2010 02:51 AM

Quote:

Originally Posted by GrapefruiTgirl (Post 3834599)
Last time I did a similar comparison, I compared a 10,000 iteration loop operating on a long silly string.
cut was faster than tr; tr was faster than sed; sed was faster than awk. cut and tr were very close. sed significantly faster than awk, doing substring replacements.

FWIW

a lot also depends on how well one knows one's tools, and how well one understands the problem to be solved.

cantab 01-21-2010 03:01 AM

Quote:

Originally Posted by ghostdog74 (Post 3834654)
you cannot compare it like that. awk is a bigger executable than cut. And cut only performs a simple task, ie to cut up a string. awk does a lot more, takes a little more time to "load" when executed. If you want to compare apple to apple, make cut do the same things awk does.

One of the OP's questions was which is more efficient for the job of getting the third field from a comma separated line. I did the relevant test for that usage, taking 'efficient' as referring to computer time.
Obviously awk can do many things cut cannot. But for those things cut can do, I expect it will generally be a little faster than awk.
Of course, 'efficiency' can also refer to the programming/scripting stage. But that cannot be measured objectively, since as mentioned it depends on personal familiarity with the tools.

suresh.chola 01-22-2010 10:32 AM

Thank you, guys, for all your valuable inputs on the efficiency part.

So, can I conclude that:
"In my scenario above, where I need to get the 3rd field from a record, cut is the better option,
while awk is better in terms of options and flexibility."

It really helped me because,
right now, I need to loop through a file with over 1 million records, get the 3rd field from each line, and pass it to a routine.
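For what it's worth, that million-record case can be sketched like this (records.csv and my_routine are placeholders, not the OP's actual names):

```shell
# Tiny stand-in data file and routine, just for illustration
printf '%s\n' 'a,b,c' 'x,y,z' > records.csv
my_routine() { echo "got: $1"; }

# Slow pattern: one echo|cut subshell pair spawned per record
while IFS= read -r rec; do
    my_routine "$(echo "$rec" | cut -d',' -f3)"
done < records.csv

# Faster: a single cut (or awk) pass over the whole file
cut -d',' -f3 records.csv | while IFS= read -r field; do
    my_routine "$field"
done
```

The second form starts cut once for the whole file instead of once per line, which is where the real savings are at this scale.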

Thanks once again to all of you.

chrism01 01-24-2010 11:17 PM

Look at it this way: 'cut' is a one-shot utility (like, e.g., wc). To process 1 million recs, you'd have to invoke it 1 million times.
'awk' is a programming language, and you should (I'm not an awk man; I'd use Perl) be able to invoke awk once(!) and write the entire 1-million-record process inside that one awk process (I believe...).

ghostdog74 01-25-2010 01:42 AM

Quote:

Originally Posted by chrism01 (Post 3839338)
Look at it this way, 'cut' is a one shot utility (Like eg wc). To process 1 million recs, you'd have to invoke it 1 million times.

I don't understand what you mean.
Code:

cut -f1 -d"," file_million_records.txt
Those million records are processed "inside" cut.

H_TeXMeX_H 01-25-2010 09:00 AM

Quote:

Originally Posted by chrism01 (Post 3839338)
Look at it this way, 'cut' is a one shot utility (Like eg wc). To process 1 million recs, you'd have to invoke it 1 million times.
'awk' is a programming lang and you should (I'm not an awk man, I'd use Perl) be able to invoke awk once(!) and write the entire (1 million recs) process inside that one awk process (I believe...).

That is wrong; cut works on files too, and it is run only once. In fact, I'd still bet that it would be somewhat faster than awk on large files, but I don't have any benchmarks.
