LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 10-07-2013, 06:54 PM   #16
cosminel
Member
 
Registered: Sep 2013
Posts: 31

Original Poster
Rep: Reputation: Disabled

OK, done some conclusive tests.

grep string file*
vs
fgrep string file*
vs
awk -F ";+" '$14 ~ "string" {print $0}' file*

grep: 5 seconds; fgrep: 3 seconds; awk: 16 seconds

So my assumption was correct. While awk is much more powerful, it is not wise to use it exclusively to search for patterns. Instead, use grep to quickly find the lines of interest then pipe the output to awk for further processing.
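A minimal sketch of that grep-then-awk pipeline, assuming semicolon-separated records. The sample file, its contents and the field number (3) are made up for illustration:

```shell
# Hypothetical sample data: field 3 is the field of interest.
tmp=$(mktemp -d)
printf '%s\n' 'a;b;string;d' 'a;string;c;d' 'a;b;c;d' > "$tmp/file1"

# grep -F (fgrep) cheaply discards lines that cannot match at all;
# -h stops grep from prefixing file names, which would corrupt field 1.
# awk then keeps only lines where the string is exactly field 3.
result=$(grep -hF 'string' "$tmp"/file* | awk -F';' '$3 == "string"')
echo "$result"

rm -rf "$tmp"
```

awk only sees the lines grep lets through, so the expensive field splitting is done on a fraction of the input.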

Last edited by cosminel; 10-07-2013 at 07:07 PM.
 
Old 10-07-2013, 07:59 PM   #17
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
what is the grep | awk ?
 
Old 10-07-2013, 11:53 PM   #18
cosminel
Member
 
Registered: Sep 2013
Posts: 31

Original Poster
Rep: Reputation: Disabled
You mean the speed of grep | awk? That would be harder to measure accurately. I would need to identify the same sent/received data over a larger number of file records and then do something like
grep sntdata file* | grep rcvdata
vs
grep sntdata file* | awk '/rcvdata/ {print $0}' (for the sake of measuring)

This would return the same number of results while comparing grep | grep vs grep | awk
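Before timing the two pipelines it may be worth checking that they really return the same number of results. A rough sketch, where the file contents are invented and sntdata / rcvdata stand in for the real values:

```shell
tmp=$(mktemp -d)
# Hypothetical records: two of the four contain both strings.
printf '%s\n' \
  'x;sntdata;rcvdata' \
  'x;sntdata;other' \
  'x;none;rcvdata' \
  'y;sntdata;rcvdata' > "$tmp/file1"

# Same first stage, different second stage; both count matching lines.
n_grep=$(grep -h 'sntdata' "$tmp"/file* | grep -c 'rcvdata')
n_awk=$(grep -h 'sntdata' "$tmp"/file* | awk '/rcvdata/ {n++} END {print n+0}')
echo "grep|grep: $n_grep   grep|awk: $n_awk"

rm -rf "$tmp"
```

In bash, putting `time` in front of either pipeline times the whole pipeline, not just the first command.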
 
Old 10-07-2013, 11:57 PM   #19
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
don't forget to flush cache if serious about benchmarking
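For reference, a sketch of flushing the page cache on Linux between runs. It needs root, and it assumes a test machine: dropping caches slows down everything else on the box until the cache refills, so it is probably not something to do on production equipment.

```shell
# Linux only. Skips quietly when not root or when /proc is not writable.
if [ "$(id -u)" -eq 0 ] && [ -w /proc/sys/vm/drop_caches ]; then
    sync    # write dirty pages to disk first so they can be dropped
    # 3 = drop the page cache plus dentries and inodes
    if { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null; then
        cache_status=dropped
    else
        cache_status=skipped
    fi
else
    cache_status=skipped
fi
echo "page cache: $cache_status"
```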
 
Old 10-08-2013, 12:14 AM   #20
cosminel
Member
 
Registered: Sep 2013
Posts: 31

Original Poster
Rep: Reputation: Disabled
I am not looking for highly accurate results, only for ones that would matter in real-world scenarios: results with obvious differences between them. For example, if one test takes 60 seconds and the other 62 seconds, I would consider them equal from a real-world operational point of view.

What I can do is run each command 3 times and measure. Then I could average the timed values or choose the best one. I may have more time next week, will try to do these tests before though.
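A rough sketch of the run-it-three-times idea, recording both the best and the average time. It assumes GNU date (for `%N`, nanoseconds); the sample file and search string are made up:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'a;string;c' 'a;b;c' > "$tmp/file1"   # hypothetical sample data

best_ms=
total_ms=0
for run in 1 2 3; do
    start=$(date +%s%N)                      # nanoseconds since the epoch (GNU date)
    grep -F 'string' "$tmp"/file* > /dev/null
    end=$(date +%s%N)
    ms=$(( (end - start) / 1000000 ))        # elapsed wall-clock time in milliseconds
    total_ms=$(( total_ms + ms ))
    # keep the smallest value seen so far
    if [ -z "$best_ms" ] || [ "$ms" -lt "$best_ms" ]; then
        best_ms=$ms
    fi
done
avg_ms=$(( total_ms / 3 ))
echo "best: ${best_ms} ms, average: ${avg_ms} ms"

rm -rf "$tmp"
```

The measured time includes the cost of starting `date` twice, so this is only useful for commands that run for seconds rather than milliseconds.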

Also keep in mind that I am doing these tests on real production equipment so there may be some deviations depending on the current load. I cannot control that kind of environment which is also considered highly sensitive.
 
Old 10-08-2013, 01:01 AM   #21
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
Quote:
Originally Posted by cosminel
I am not looking for highly accurate results, only for ones that would matter in real-world scenarios: results with obvious differences between them. For example, if one test takes 60 seconds and the other 62 seconds, I would consider them equal from a real-world operational point of view.
I would only worry if things took hours to run
Quote:
Originally Posted by cosminel
What I can do is run each command 3 times and measure. Then I could average the timed values or choose the best one. I may have more time next week, will try to do these tests before though.
after the first run you may have files in cache memory, so 2nd/3rd runs will be 'faster' but not in real world

Quote:
Originally Posted by cosminel
Also keep in mind that I am doing these tests on real production equipment so there may be some deviations depending on the current load. I cannot control that kind of environment which is also considered highly sensitive.
better to keep the number of processes down than have 'pure speed'


Anyway, I'm still confused ..

the field numbers keep changing, 7, 15, 13, 14 ?

if your field of interest is fixed (15), then GazL's grep is what you want
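GazL's expression is not shown in this part of the thread, so the following is only a guess at the general idea: a grep regexp that skips a fixed number of fields and then matches the whole of field 15. The sample lines are made up.

```shell
tmp=$(mktemp -d)
# First line has FooBar in field 15, second line has it in field 8.
printf '%s\n' \
  'd;d;d;d;;d;my_string;foo;d;d;d;d;;;FooBar;;;;etc' \
  'd;d;d;d;;d;my_string;FooBar;d;d;d;d;;;foo;;;;etc' > "$tmp/Input"

# ([^;]*;){14} consumes exactly 14 fields (empty ones included);
# FooBar must then be followed by ';' or end of line, i.e. be all of field 15.
hits=$(grep -E '^([^;]*;){14}FooBar(;|$)' "$tmp/Input")
echo "$hits"

rm -rf "$tmp"
```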
 
Old 10-08-2013, 01:11 AM   #22
cosminel
Member
 
Registered: Sep 2013
Posts: 31

Original Poster
Rep: Reputation: Disabled
The number of fields remains the same. I might throw GazL's sexy regexp into the benchmark mix. See how things go.
 
Old 10-08-2013, 01:17 AM   #23
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
yes, the number of fields is the same
but you mention that the position of the string the user enters is important

What I still don't quite understand is if you want to know which field the string is in, or only if it is in a specific field.
 
Old 10-08-2013, 01:29 AM   #24
cosminel
Member
 
Registered: Sep 2013
Posts: 31

Original Poster
Rep: Reputation: Disabled
The two strings I am interested in only occur at the same position within the lines. In the example I posted, that's the 8th and 13th field.

I still need to come up with the optimum conditions syntax for my full script. I have 3 user-generated variables that I must account for, each one being existent or non-existent. Fun times ahead. The previous version of the script had similar conditions, but they were easier to construct.
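One possible way to handle "existent or non-existent" without writing out every combination is to let an empty variable mean "do not filter on this field". A sketch only: the variable names are made up, fields 8 and 13 are the ones mentioned above, and putting the third value in field 15 is purely an assumption.

```shell
tmp=$(mktemp -d)
# Hypothetical 15-field records.
printf '%s\n' \
  'd;d;d;d;d;d;d;foo;d;d;d;d;bar;d;x' \
  'd;d;d;d;d;d;d;foo;d;d;d;d;baz;d;x' \
  'd;d;d;d;d;d;d;car;d;d;d;d;bar;d;y' > "$tmp/Input"

snt=foo     # user supplied a value for field 8
rcv=        # empty: user supplied nothing for field 13
extra=x     # hypothetical third value, assumed to be in field 15

# Each test passes automatically when its variable is empty.
matches=$(awk -F';' -v a="$snt" -v b="$rcv" -v c="$extra" '
    (a == "" || $8  == a) &&
    (b == "" || $13 == b) &&
    (c == "" || $15 == c)
' "$tmp/Input")
echo "$matches"

rm -rf "$tmp"
```

With these values the first two lines are printed; setting rcv=bar would narrow that to the first line only.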
 
Old 10-08-2013, 02:34 AM   #25
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
totally made up logic

Input
Code:
data;data;data;data;;data;my_string;foo;data;data;data;data;;;FooBar;;;;etc;etc
data;data;data;data;;data;my_string;bar;data;data;data;data;;;FooBar;;;;etc;etc
data;data;data;data;;data;my_string;car;data;data;data;data;;;FooBar;;;;etc;etc
data;data;data;data;;data;my_string;car;data;data;data;data;;;foo;;;;etc;etc
data;data;data;data;;data;my_string;bar;data;data;data;data;;;foo;;;;etc;etc
User vars
Code:
String1=foo
String2=bar
String3=FooBar
Code:
awk -v A="$String1" -v B="$String2" -v C="$String3" -F\; '{
   if (( $8 == A || $8 == B ) && $15 == C) {printf "%-30s%s\n","8 is A or B AND 15 is C ",$0};
   if ( $8 != A  && $15 == C) {printf "%-30s%s\n","8 is not A AND 15 is C ",$0};
   if (( $8 == A || $8 == B ) && $15 != C) {printf "%-30s%s\n","8 is A or B AND 15 is not C ",$0};
}' Input
gives
Code:
8 is A or B AND 15 is C       data;data;data;data;;data;my_string;foo;data;data;data;data;;;FooBar;;;;etc;etc
8 is A or B AND 15 is C       data;data;data;data;;data;my_string;bar;data;data;data;data;;;FooBar;;;;etc;etc
8 is not A AND 15 is C        data;data;data;data;;data;my_string;bar;data;data;data;data;;;FooBar;;;;etc;etc
8 is not A AND 15 is C        data;data;data;data;;data;my_string;car;data;data;data;data;;;FooBar;;;;etc;etc
8 is A or B AND 15 is not C   data;data;data;data;;data;my_string;bar;data;data;data;data;;;foo;;;;etc;etc
added a little to the beginning so only data which contains "FooBar" is considered ( any field )
Code:
awk -v A="$String1" -v B="$String2" -v C="$String3" -F\; '$0 ~ C{
   if (( $8 == A || $8 == B ) && $15 == C) {printf "%-30s%s\n","8 is A or B AND 15 is C ",$0};
   if ( $8 != A  && $15 == C) {printf "%-30s%s\n","8 is not A AND 15 is C ",$0};
   if (( $8 == A || $8 == B ) && $15 != C) {printf "%-30s%s\n","8 is A or B AND 15 is not C ",$0};
}' Input
Code:
8 is A or B AND 15 is C       data;data;data;data;;data;my_string;foo;data;data;data;data;;;FooBar;;;;etc;etc
8 is A or B AND 15 is C       data;data;data;data;;data;my_string;bar;data;data;data;data;;;FooBar;;;;etc;etc
8 is not A AND 15 is C        data;data;data;data;;data;my_string;bar;data;data;data;data;;;FooBar;;;;etc;etc
8 is not A AND 15 is C        data;data;data;data;;data;my_string;car;data;data;data;data;;;FooBar;;;;etc;etc
this time, only when no "FooBar" ( any field )
Code:
awk -v A="$String1" -v B="$String2" -v C="$String3" -F\; '$0 !~ C{
   if (( $8 == A || $8 == B ) && $15 == C) {printf "%-30s%s\n","8 is A or B AND 15 is C ",$0};
   if ( $8 != A  && $15 == C) {printf "%-30s%s\n","8 is not A AND 15 is C ",$0};
   if (( $8 == A || $8 == B ) && $15 != C) {printf "%-30s%s\n","8 is A or B AND 15 is not C ",$0};
}' Input
Code:
8 is A or B AND 15 is not C   data;data;data;data;;data;my_string;bar;data;data;data;data;;;foo;;;;etc;etc


Now, instead of just printing, things can be sent to different files based on which statements are true (or false )
you can even have awk execute system commands
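A small sketch of both ideas, splitting lines into two files and running a system command at the end. The file names match.txt and nomatch.txt and the sample data are made up for illustration:

```shell
tmp=$(mktemp -d)
printf '%s\n' \
  'd;d;d;d;;d;my_string;foo;d;d;d;d;;;FooBar' \
  'd;d;d;d;;d;my_string;car;d;d;d;d;;;foo' > "$tmp/Input"

awk -F';' -v C=FooBar -v m="$tmp/match.txt" -v n="$tmp/nomatch.txt" '
    $15 == C { print > m; next }     # field 15 matches: goes to one file
             { print > n }           # everything else: goes to the other
    END {
        close(m)                     # flush the file before another program reads it
        system("wc -l < \"" m "\"")  # run a shell command from inside awk
    }
' "$tmp/Input"

matched=$(cat "$tmp/match.txt")
unmatched=$(cat "$tmp/nomatch.txt")
rm -rf "$tmp"
```

Inside awk, `>` truncates the file once when it is first opened and then appends for the rest of the run, so it does not behave like the shell's `>` on every line.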

if you want to do different things based on what you find ( or don't find ) awk is very flexible

However, if I knew perl, python, ruby or some other languages, I might favour them

I guess it really depends on how complex your logic is, but I wager it is easier in awk than with a grep and bash 'combo'
I have a feeling this 'grep' business is just a small part of something bigger

Warning:
too much coffee..
I really didn't check the logic!



A Link

http://www.gnu.org/s/gawk/manual/

I should point out that other awks are available: nawk and mawk
mawk in particular is usually faster than gawk, but both are missing some 'features' of gawk
http://www.staff.science.uu.nl/~oost.../nawk_toc.html

Hopefully someone can give some insight into perl / python / ruby
I don't know much about them, they may be 'better' than awk for this

Last edited by Firerat; 10-08-2013 at 02:51 AM. Reason: grep and bash, not grep alone
 
  


All times are GMT -5.