Which one is more efficient: cut or awk?

suresh.chola · 01-21-2010, 01:08 AM

Hi,

You may find this bit silly.
But I would like to understand it at the core level.

I have the following line.

Code:

line=this,forum,is,for,all,programming,questions

Now, if I want the 3rd field from the line, I can get it in either of the two ways below:

Code:

ThirdField=$(echo $line|cut -d',' -f3)
or
ThirdField=$(echo $line|awk -F ',' '{print $3}')

My question is:
Which one is more efficient (cut or awk)? Which one works better in all situations?
Right now, I can't think of any situation where they would differ. However, I would like to know from your experience whether you have found any difference.

Are there any basic guidelines for writing efficient shell scripts?

Thanks in advance.

Cheers,
Suresh

quanta · 01-21-2010, 01:20 AM

In my opinion, in this example, they are the same. You can check with the 'time' command. 'cut' is used when the separator is a single fixed character such as a colon or a comma, while 'awk' is used when columns are separated by a varying number of spaces.
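The difference with varying whitespace can be seen in a small sketch. The sample string and the variable names `spaced`, `cut_field` and `awk_field` are made up for illustration and do not come from the thread.

```shell
# Hypothetical sample line with irregular spacing (not from the thread).
spaced='alpha   beta  gamma'

# cut treats every single space as a delimiter, so runs of spaces
# produce empty fields and field 3 here is empty.
cut_field=$(printf '%s\n' "$spaced" | cut -d' ' -f3)

# awk's default field splitting collapses runs of blanks,
# so field 3 is the third word.
awk_field=$(printf '%s\n' "$spaced" | awk '{print $3}')

printf 'cut: [%s]\nawk: [%s]\n' "$cut_field" "$awk_field"
```

With a single-character separator such as a comma and no repeated delimiters, both tools return the same field.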

paulsm4 · 01-21-2010, 01:21 AM

You can use the "time" command to benchmark the results. For example:

Quote:

$ line=this,forum,is,for,all,programming,questions

$ time ThirdField=$(echo $line|cut -d',' -f3)
real 0m0.024s
user 0m0.000s
sys 0m0.030s

$ time ThirdField=$(echo $line|awk -F ',' '{print $3}')
real 0m0.021s
user 0m0.010s
sys 0m0.010s

I would have guessed "cut" would be a *lot* less expensive than "awk". Based on the above results ... it looks like I would have been wrong ;-)

'Hope that helps .. PSM

cantab · 01-21-2010, 01:34 AM

You cannot get reliable timing doing something once. I wrapped them in a for loop to repeat them 1000 times, and then did 3 tests for each (to get a handle on the consistency). Cut averaged 18.735s for the thousand, while awk averaged 20.079s, which is 7% longer. A small difference; if one is seeking to improve performance, there are likely to be bigger gains elsewhere. (Starting with not using shell scripting!)
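A repeated-run test of this kind could look roughly like the sketch below. It assumes bash (for the `time` keyword and the arithmetic `for` loop); the exact loop used in the post is not shown in the thread, and the `runs` variable and the `RUNS` override are hypothetical names.

```shell
line=this,forum,is,for,all,programming,questions
runs=${RUNS:-1000}   # 1000 iterations as in the post; RUNS is a hypothetical override

# Time 'runs' invocations of the cut pipeline.
time for ((i = 0; i < runs; i++)); do
    ThirdField=$(echo "$line" | cut -d',' -f3)
done

# Time the same number of invocations of the awk pipeline.
time for ((i = 0; i < runs; i++)); do
    ThirdField=$(echo "$line" | awk -F ',' '{print $3}')
done
```

Running the whole script several times and comparing the "real" figures gives a feel for how consistent the numbers are. Each iteration still pays for a subshell and an external process, so the figures mostly measure process start-up cost.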

GrapefruiTgirl · 01-21-2010, 01:39 AM

Last time I did a similar comparison, I compared a 10,000 iteration loop operating on a long silly string.
cut was faster than tr; tr was faster than sed; sed was faster than awk. cut and tr were very close. sed was significantly faster than awk when doing substring replacements.

FWIW

tuxdev · 01-21-2010, 01:42 AM

My hunch is that most of the time goes to spawning subshells and external processes. The read builtin avoids both:

Code:

line=this,forum,is,for,all,programming,questions
IFS=, read _ _ ThirdField _ <<< "$line"

ghostdog74 · 01-21-2010, 02:41 AM

Quote:

Originally Posted by suresh.chola

My question here is,
Which one is efficient(using cut cmd or awk).

If you are acting on one string, there is not much difference. If your task is to get fields 2 to 5, cut's range option -f2-5 may be "cleaner", whereas in awk you have to use a loop (but that's not a big problem).
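As a minimal sketch of that comparison, using the line from the first post (the variable names `cut_range` and `awk_range` are illustrative only):

```shell
line=this,forum,is,for,all,programming,questions

# cut accepts a field range directly.
cut_range=$(printf '%s\n' "$line" | cut -d',' -f2-5)

# awk needs an explicit loop that joins the wanted fields again.
awk_range=$(printf '%s\n' "$line" | awk -F ',' '{
    out = $2
    for (i = 3; i <= 5; i++) out = out "," $i
    print out
}')

printf '%s\n%s\n' "$cut_range" "$awk_range"
```

Both commands print forum,is,for,all for this input.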

Quote:

Which one works better in all situations.

awk, of course. It's a programming language. cut is just a small tool that does one task. You seldom need cut (or grep, sed, etc.) when you know awk.

Quote:

Is there any basic guidelines to write a shell script on efficiency part.

If you are concerned with efficiency, always try to use the shell's built-in commands. If you want to cut a string up, make use of IFS, set, etc. That way, there's no need for external tools. However, if you want to process BIG files, the shell is not the way. Use awk, or a language such as Perl or Python, for that.
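One way the IFS and set suggestion might look is sketched below. The function name `get_third_field` is hypothetical, and `local` is assumed to be available (it is in bash and most sh implementations, but it is not strictly POSIX).

```shell
line=this,forum,is,for,all,programming,questions

# Split the argument on commas into the positional parameters.
# Everything happens inside the current shell: no subshell, no external tool.
get_third_field() {
    local IFS=,      # comma as field separator, only inside this function
    set -f           # disable globbing so '*' in the data is not expanded
    set -- $1        # unquoted on purpose: word splitting does the cutting
    set +f           # re-enable globbing (assumes it was enabled before)
    ThirdField=$3    # result is returned through a global variable
}

get_third_field "$line"
echo "$ThirdField"
```

The result is passed back in a variable rather than through `$(...)`, since command substitution would reintroduce the subshell this approach is meant to avoid.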

ghostdog74 · 01-21-2010, 02:48 AM

Quote:

Originally Posted by cantab

Cut averaged 18.735s for the thousand, while awk averaged 20.079s, which is 7% longer.

You cannot compare them like that. awk is a bigger executable than cut. cut only performs a simple task, i.e. cutting up a string. awk does a lot more and takes a little more time to "load" when executed. If you want to compare apples to apples, make cut do the same things awk does.

ghostdog74 · 01-21-2010, 02:51 AM

Quote:

Originally Posted by GrapefruiTgirl

Last time I did a similar comparison, I compared a 10,000 iteration loop operating on a long silly string.
cut was faster than tr; tr was faster than sed; sed was faster than awk. cut and tr were very close. sed significantly faster than awk, doing substring replacements.

FWIW

A lot also depends on how well one knows one's tools, and how well one understands the problem to be solved.

cantab · 01-21-2010, 03:01 AM

Quote:

Originally Posted by ghostdog74

you cannot compare it like that. awk is a bigger executable than cut. And cut only performs a simple task, ie to cut up a string. awk does a lot more, takes a little more time to "load" when executed. If you want to compare apple to apple, make cut do the same things awk does.

One of the OP's questions was which is more efficient for the job of getting the third field from a comma separated line. I did the relevant test for that usage, taking 'efficient' as referring to computer time.
Obviously awk can do many things cut cannot. But for those things cut can do, I expect it will generally be a little faster than awk.
Of course, 'efficiency' can also refer to the programming/scripting stage. But that cannot be measured objectively, since as mentioned it depends on personal familiarity with the tools.

suresh.chola · 01-22-2010, 10:32 AM

Thank you all for your valuable input on efficiency.

So, can I conclude that:
"In my scenario above, where I need to get the 3rd field from a record, CUT is a better option than AWK.
However, AWK is better in terms of options and flexibility."

It really helped me, because
right now I need to loop through a file which has over 1 million records, get the 3rd field from each line, and pass it to a routine.

Thanks once again to all of you.

chrism01 · 01-24-2010, 11:17 PM

Look at it this way, 'cut' is a one shot utility (Like eg wc). To process 1 million recs, you'd have to invoke it 1 million times.
'awk' is a programming lang and you should (I'm not an awk man, I'd use Perl) be able to invoke awk once(!) and write the entire (1 million recs) process inside that one awk process (I believe...).

ghostdog74 · 01-25-2010, 01:42 AM

Quote:

Originally Posted by chrism01

Look at it this way, 'cut' is a one shot utility (Like eg wc). To process 1 million recs, you'd have to invoke it 1 million times.

I don't understand what you mean.

Code:

cut -f1 -d"," file_million_records.txt

Those million records are processed "inside" a single cut process.

H_TeXMeX_H · 01-25-2010, 09:00 AM

Quote:

Originally Posted by chrism01

Look at it this way, 'cut' is a one shot utility (Like eg wc). To process 1 million recs, you'd have to invoke it 1 million times.
'awk' is a programming lang and you should (I'm not an awk man, I'd use Perl) be able to invoke awk once(!) and write the entire (1 million recs) process inside that one awk process (I believe...).

That is wrong: cut works on files too, and it is run only once. In fact, I'm still betting that it would be somewhat faster than awk on large files, but I don't have any benchmarks.
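For the million-record case, the single-invocation idea could be sketched as follows. The sample file, the `process_field` routine and the variable names are all placeholders invented for illustration; the OP's real file and routine are not shown in the thread.

```shell
# Hypothetical three-line sample standing in for the million-record file.
records=$(mktemp)
printf '%s\n' 'a,b,c,d' 'this,forum,is,for' 'x,y,z' > "$records"

# Placeholder for the OP's routine.
process_field() {
    printf 'got: %s\n' "$1"
}

# cut runs once over the whole file; the loop only consumes its output.
cut_out=$(cut -d',' -f3 "$records" | while IFS= read -r field; do
    process_field "$field"
done)

# awk can do the extraction and the per-record work in one process.
awk_out=$(awk -F ',' '{print "got: " $3}' "$records")

printf '%s\n' "$cut_out"
rm -f "$records"
```

Either way, the extraction tool starts once per file rather than once per line. If the routine can be expressed in awk, the whole job fits in a single process.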