LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 12-03-2012, 12:11 AM   #1
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,797
Blog Entries: 4

Reputation: 285
Awk: One more doubt. Please help!


Hello!
This is related to my previous thread, but that one was about understanding the awk code, and this one is about understanding the ranges.
Everything is working fine, but I got stuck on one point while testing the code.

Here is the sample file:
Code:
user{ubuntu}: cat sample.log
value=0
value=1
value=0.0
value=0.0
value=0.0
value=0.0
value=0.01
value=0.01
value=0.02
value=0.02
value=0.1
value=0.2
value=0.3
value=0.4
value=0.5
And here is the code:
Code:
user{ubuntu}: awk 'BEGIN{sum=0; cat1=0; cat2=0; cat3=0;}
{sum++}
/value=0\.[^0]/{cat2++;}
/value=0\.0/{cat3++;}
END{cat1=sum-(cat2+cat3); print sum, cat1, cat2, cat3;}' sample.log
Here is the output:
Code:
15 2 5 8
FYI, this code calculates values on the basis of some ranges, i.e. <0.1, >=0.1, etc. I am confused about which ranges it is finding. I tested the code with my sample file sample.log, but the results are a little confusing. I just want to understand: in which ranges does it count data in cat1, cat2 and cat3?

What I concluded:
(1) {sum++} ...... Sum up all values, i.e. 15. That's fine, no issues.
(2) /value=0\.[^0]/{cat2++;} ..... Sum up all values that are NOT 0.0<whatever>, that means, all values >=0.1.
(3) /value=0\.0/{cat3++;} ..... Sum up all values that are equal to or less than 0.0, which is 8. That's fine, no issues.
(4) cat1=sum-(cat2+cat3) ...... All values that are >=1.0.

In the sample.log file, which range does value=0 fall into? And did I conclude correctly? I only want to know the 3 ranges it is finding.

Last edited by shivaa; 12-03-2012 at 12:25 AM. Reason: Some info modified & typo
 
Old 12-03-2012, 04:20 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Reputation: 2373
Quote:
(2) /value=0\.[^0]/{cat2++;} ..... Sum up all values that are NOT 0.0<whatever>, that means, all values >=0.1.
That should be: count all the lines that contain value=0.<whatever> as long as it is _not_ value=0.0 (also explained in the threads Awk query and Help understanding "awk" code)

cat2 will be incremented for entries like value=0.1, value=0.2, value=0.3 ... value=0.8, value=0.9 (in your example: value=0.1 through value=0.5)

Don't think in terms of less/greater than (<= or >=); this is not arithmetic but pattern matching. The cat2 line would also be triggered if the following entry were present in your sample.log: value=0.foobar
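A minimal sketch of the pattern-versus-number point, assuming a POSIX shell and awk; the three test strings are made up for illustration. The cat2 regex is applied to each string purely as text:

```shell
#!/bin/sh
# Apply the cat2 regex from the code above to three made-up lines.
# It compares characters, not numbers: "value=0.foobar" matches,
# while "value=0.05" does not, because the character after "0." is a 0.
matches=$(printf '%s\n' 'value=0.5' 'value=0.foobar' 'value=0.05' |
    awk '/value=0\.[^0]/ { print "cat2 pattern matches: " $0 }')
printf '%s\n' "$matches"
```

Only value=0.5 and value=0.foobar are reported; numerically 0.05 is a perfectly good number, but the regex never looks at it that way.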

Quote:
(4) cat1=sum-(cat2+cat3) ...... All values that are >=1.0.
This calculates the entries that are not matched by the cat2 or cat3 line (these 2 entries in your example: value=0 and value=1)


You can expand your awk example above to print what it is doing:
Code:
$ awk 'BEGIN{sum=0; cat1=0; cat2=0; cat3=0;}
{sum++}
/value=0\.[^0]/{cat2++ ; print "-- cat2 incremented for :" $0 }
/value=0\.0/{cat3++ ; print "--- cat3 incremented for :" $0 }
END{cat1=sum-(cat2+cat3); print sum, cat1, cat2, cat3;}' sample.log
--- cat3 incremented for :value=0.0
--- cat3 incremented for :value=0.0
--- cat3 incremented for :value=0.0
--- cat3 incremented for :value=0.0
--- cat3 incremented for :value=0.01
--- cat3 incremented for :value=0.01
--- cat3 incremented for :value=0.02
--- cat3 incremented for :value=0.02
-- cat2 incremented for :value=0.1
-- cat2 incremented for :value=0.2
-- cat2 incremented for :value=0.3
-- cat2 incremented for :value=0.4
-- cat2 incremented for :value=0.5
15 2 5 8
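The expanded code above never shows the cat1 lines, because cat1 is only computed in END. A sketch that labels every line, including the ones cat1 ends up counting, assuming each line holds a single value= entry. The sample data is passed through a shell variable (the name data is just for illustration) instead of sample.log:

```shell
#!/bin/sh
# The 15 lines of sample.log from post #1.
data='value=0
value=1
value=0.0
value=0.0
value=0.0
value=0.0
value=0.01
value=0.01
value=0.02
value=0.02
value=0.1
value=0.2
value=0.3
value=0.4
value=0.5'

# "next" stops at the first matching rule, so every line gets exactly one label.
# Lines that match neither regex are the ones counted by cat1=sum-(cat2+cat3).
labels=$(printf '%s\n' "$data" | awk '
/value=0\.[^0]/ { print "cat2: " $0; next }
/value=0\.0/    { print "cat3: " $0; next }
                { print "cat1: " $0 }
')
printf '%s\n' "$labels"
```

The first two output lines are cat1: value=0 and cat1: value=1, followed by eight cat3 lines and five cat2 lines, which matches the 15 2 5 8 summary.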
 
1 member found this post helpful.
Old 12-03-2012, 05:45 AM   #3
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,797
Blog Entries: 4

Original Poster
Reputation: 285
Thanks for your response @druuna.
This code was actually developed to extract data on the basis of some ranges of values. I understood the code and what it's doing, but couldn't work out which ranges it is searching for.
It's definitely searching for some ranges of data and printing the count of matching values in cat1, cat2 and cat3. As I mentioned, I only want to know those ranges and the purpose of using such patterns. In other words: what is this code for, and why was it written? I am unable to find the ranges.

So could you help me once more to find out those ranges?
 
Old 12-03-2012, 07:59 AM   #4
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Reputation: 2373
Quote:
Originally Posted by shivaa View Post
This code was actually developed to extract data on the basis of some ranges of values.

I understood the code and what it's doing, but couldn't work out which ranges it is searching for.
I really doubt that you understand the code.

You keep talking about ranges of values, which is not what's being done. The code looks for specific patterns in a line and does something (increases counters, in this case) when that pattern matches anywhere in the line.

Ranges would look like this (simple examples):
[a-z] -> matches range a to z
[4-9] -> matches range 4 to 9
[abc][123] -> matches an a, b or c followed by a 1, 2 or 3

[^0] -> although written with square brackets like a range, it means: any single character that is not a zero
[0-9][^0] -> matches a digit (range 0 to 9) followed by one character that is not a zero
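To see the last point in action, a small sketch (the test strings are made up for illustration) that checks four strings against [0-9][^0]. A match needs a digit followed by some character that is not a zero, so a lone digit, or a digit followed only by 0, does not match:

```shell
#!/bin/sh
# [0-9][^0] needs two characters: a digit, then anything except 0.
result=$(printf '%s\n' '5' '50' '57' '5x' |
    awk '{ if ($0 ~ /[0-9][^0]/) print $0 ": match"; else print $0 ": no match" }')
printf '%s\n' "$result"
```

This prints "no match" for 5 and 50 and "match" for 57 and 5x; note that x satisfies [^0] just as well as 7 does.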

Code:
/value=0\.0/
The above snippet looks for value=0.0 (including the value= part!!). Examples of lines that would match (the matching part is value=0.0 in each):
Code:
abc value=0.0foo
value=0.09
value=0.0xyz
value=0.012345678
Code:
/value=0\.[^0]/
This snippet would match:
Code:
klm value=0.1
value=0.2
value=0.9bar
rst value=0.7777777
Run the extended awk code I posted (post #2) on this sample.log file:
Code:
foo value=0
bar value=0.0
foo value=0.02
abc value=0.0foo
rst value=0.7777777
value=0.01
value=0.012345678
value=0.0xyz
value=0.1
value=0.599999999
value=0.9bar
value=1
value=1.0
value=1.1
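If you'd rather not create sample.log, a minimal sketch (assuming a POSIX shell; the variable name data is only for illustration) that feeds these 14 lines straight into the extended code from post #2:

```shell
#!/bin/sh
# The 14-line sample from this post.
data='foo value=0
bar value=0.0
foo value=0.02
abc value=0.0foo
rst value=0.7777777
value=0.01
value=0.012345678
value=0.0xyz
value=0.1
value=0.599999999
value=0.9bar
value=1
value=1.0
value=1.1'

# The extended counting code from post #2, reading from the variable instead of sample.log.
out=$(printf '%s\n' "$data" | awk '
BEGIN { sum = 0; cat1 = 0; cat2 = 0; cat3 = 0 }
{ sum++ }
/value=0\.[^0]/ { cat2++; print "-- cat2 incremented for :" $0 }
/value=0\.0/    { cat3++; print "--- cat3 incremented for :" $0 }
END { cat1 = sum - (cat2 + cat3); print sum, cat1, cat2, cat3 }')
printf '%s\n' "$out"
```

The last line should read 14 4 4 6: cat2 picks up the four lines with value=0.7777777, 0.1, 0.599999999 and 0.9bar, cat3 picks up six, and the remaining four (foo value=0, value=1, value=1.0, value=1.1) end up in cat1.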
 
1 member found this post helpful.
Old 12-03-2012, 11:13 AM   #5
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,797
Blog Entries: 4

Original Poster
Reputation: 285
Please be assured that I've understood the code.

> The 11th field of the sample file (i.e. value=<something>) does not contain anything except numbers (up to 4 decimal places).
> For example, here is another sample.log file (please consider this as real data):
Code:
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0012 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0013 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0014 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0015 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0111 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0121 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0131 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0141 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.1131 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.1233 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.1233 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=1.0099 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=1.1099 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=1.2099 f12 f13
The purpose is that, based on these values, I have to fetch data in various ranges, for example: less than 0.01; greater than or equal to 0.01 but less than 0.10; and greater than or equal to 0.10. So I want to understand: if the awk code searches for patterns (0\.0 and 0\.[^0]) and then counts the occurrences of those patterns, it must have some purpose. For example, can we conclude that the values matching 0\.[^0] are all values greater than or equal to 0.1? I want to know which ranges are possible based on this data and the awk search patterns. I hope my question is clear now.

Last edited by shivaa; 12-03-2012 at 11:18 AM.
 
Old 12-03-2012, 12:34 PM   #6
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Reputation: 2373
I'm not sure I'm able to explain this much further. You keep talking about less/greater/equal than ("ranges" used in arithmetic) and ignoring the value= part. The awk lines in question use pattern matching.

I can point out which awk line matches each line of the posted example:
Code:
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0012 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0013 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0014 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0015 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0111 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0121 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0131 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0141 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.1131 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.1233 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.1233 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=1.0099 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=1.1099 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=1.2099 f12 f13
All lines are matched by: {sum++}
The 8 lines from value=0.0012 to value=0.0141 are matched by this line in awk: /value=0\.0/{cat3++}
The 3 lines value=0.1131, value=0.1233 and value=0.1233 are matched by this line: /value=0\.[^0]/{cat2++}
The last 3 lines (value=1.0099, value=1.1099 and value=1.2099) are not matched specifically; they are calculated by: cat1=sum-(cat2+cat3)
 
1 member found this post helpful.
Old 12-03-2012, 12:58 PM   #7
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,797
Blog Entries: 4

Original Poster
Reputation: 285
Hi druuna, really sorry to bother you again and again. I have already taken up too much of your time and effort.

I got this code from someone and I just wanted to know its purpose. Obviously it was written for some purpose, so I wanted to know why it was written, and why it counts lines into cat1, cat2 and cat3 using these patterns.


Anyway, let's forget everything we've discussed above. Suppose I have a file containing only the following lines (nothing else):
Code:
0.0012
0.0013
0.0014
0.0015
0.0111
0.0121
0.0131
0.0141
0.1131
0.1233
0.1233
1.0099
1.1099
1.2099
Out of these values, I want to extract the values that are (1) < 0.01; (2) >=0.01 && <0.1; (3) >=0.1. We can do this using:
Code:
awk '$1<0.01' newsample.log
awk '$1>=0.01 && $1<0.1' newsample.log
awk '$1>=0.1' newsample.log
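Each of the three commands above reads the whole file. If only the counts are needed, a single-pass sketch along the same lines could look like this (the counter names below, middle and above are made up, the values are held in a shell variable instead of newsample.log, and the third range uses >= 0.1 as described in the text):

```shell
#!/bin/sh
# The 14 values from newsample.log above.
data='0.0012
0.0013
0.0014
0.0015
0.0111
0.0121
0.0131
0.0141
0.1131
0.1233
0.1233
1.0099
1.1099
1.2099'

# Numeric comparisons on $1; "+ 0" prints 0 instead of an empty string for an unused counter.
counts=$(printf '%s\n' "$data" | awk '
$1 <  0.01             { below++ }
$1 >= 0.01 && $1 < 0.1 { middle++ }
$1 >= 0.1              { above++ }
END { print below + 0, middle + 0, above + 0 }')
echo "$counts"
```

On this data it prints 4 4 6.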

This is what I have been doing in my script, but my purpose was to compare my script with that awk code, so I could conclude whether there's any difference.

Please consider it again and take your time (as much as you need, of course), and if you find anything helpful, please let me know.

Last edited by shivaa; 12-03-2012 at 12:59 PM.
 
Old 12-03-2012, 01:35 PM   #8
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Reputation: 2373
The two approaches produce different output, so all I can say is that you cannot compare the outputs. I don't know which one is correct for your specific situation.

Code:
#!/bin/bash

echo "- other script -----------------------------------------------------"

awk '
/1\./     { print $0 " - cat1" }
/0\.0/    { print $0 " --- cat3" }
/0\.[^0]/ { print $0 " -- cat2" }
' newsample.log

echo "- your script ------------------------------------------------------"

awk '
$1 <  0.01 { print $0 " <0.01" }
$1 >= 0.01 && $1<0.1 { print $0 " >=0.01 && <0.1" }
$1 >  0.1 { print $0 " >0.1" }' newsample.log

echo "--------------------------------------------------------------------"
Both scripts are slightly adjusted to print the matching lines to the screen.

Example run on data provided in post #7:
Code:
./foo.sh
- other script -----------------------------------------------------
0.0012 --- cat3
0.0013 --- cat3
0.0014 --- cat3
0.0015 --- cat3
0.0111 --- cat3
0.0121 --- cat3
0.0131 --- cat3
0.0141 --- cat3
0.1131 -- cat2
0.1233 -- cat2
0.1233 -- cat2
1.0099 - cat1
1.1099 - cat1
1.2099 - cat1
- your script ------------------------------------------------------
0.0012 <0.01
0.0013 <0.01
0.0014 <0.01
0.0015 <0.01
0.0111 >=0.01 && <0.1
0.0121 >=0.01 && <0.1
0.0131 >=0.01 && <0.1
0.0141 >=0.01 && <0.1
0.1131 >0.1
0.1233 >0.1
0.1233 >0.1
1.0099 >0.1
1.1099 >0.1
1.2099 >0.1
--------------------------------------------------------------------
 
1 member found this post helpful.
Old 12-03-2012, 11:03 PM   #9
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,797
Blog Entries: 4

Original Poster
Reputation: 285
Thanks a lot @druuna!
Perhaps there is nothing more to explain about this code and these values. It's clear that the two scripts can't be compared: mine simply searches for values based on numeric ranges, whereas the other one only considers patterns.
But one last question remains in my mind: after all this discussion and explanation, do you have any clue what the purpose was of writing that awk code and applying it to those values? I mean, if it is searching for such patterns, there must be some purpose behind it...
 
Old 12-04-2012, 01:27 AM   #10
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Reputation: 2373
Both scripts show etime values based on specific criteria.

Maybe this will help to visualize it better:
Code:
awk 'BEGIN{sum=0; cat1=0; cat2=0; cat3=0;}
{sum++}
/value=0\.[^0]/{cat2++;}                                       
/value=0\.0/{cat3++;}                                           
END{cat1=sum-(cat2+cat3);                                          
print "no. lines     : " sum;
print "1.0 or larger : " cat1;
print "0.100 - 0.999 : " cat2;
print "0.000 - 0.099 : " cat3;
}' sample.log
(sample.log used as shown in post #5)
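These labels hold for values written with a decimal point. A sketch that compares, line by line, the bucket picked by the two regexes with the bucket picked by a numeric comparison using the boundaries from the labels (0.1 and 1.0). It assumes the 15-line sample.log from post #1, held in a shell variable, and that each line contains a single value= entry; it prints only the lines where the two disagree:

```shell
#!/bin/sh
# The 15 lines of sample.log from post #1.
data='value=0
value=1
value=0.0
value=0.0
value=0.0
value=0.0
value=0.01
value=0.01
value=0.02
value=0.02
value=0.1
value=0.2
value=0.3
value=0.4
value=0.5'

mismatches=$(printf '%s\n' "$data" | awk '
{
    # Bucket picked by the regexes of the original code.
    if ($0 ~ /value=0\.[^0]/)   pat = "cat2"
    else if ($0 ~ /value=0\.0/) pat = "cat3"
    else                        pat = "cat1"

    # Numeric value: everything after the last "=", converted to a number.
    v = $0
    sub(/.*=/, "", v)
    v = v + 0

    # Bucket picked by the numeric ranges from the labels above.
    if (v >= 1)        num = "cat1"
    else if (v >= 0.1) num = "cat2"
    else               num = "cat3"

    if (pat != num) print $0 ": pattern says " pat ", number says " num
}')
printf '%s\n' "$mismatches"
```

Only value=0 is reported: it has no decimal point, so neither regex matches and it is counted in cat1 (the "1.0 or larger" line), even though its numeric value falls in 0.000 - 0.099.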
 
1 member found this post helpful.
Old 12-31-2012, 04:54 AM   #11
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,797
Blog Entries: 4

Original Poster
Reputation: 285
Thanks everyone!
Happy new year to you!
 
Old 01-02-2013, 02:30 AM   #12
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Reputation: 1947
You know, after reading through both of these threads a few times, I'm still not entirely clear on what you are trying to calculate. What exactly do you need to sum up, under what conditions? It appears that for the most part you just want to count lines based on values, correct?

I'm thinking that the pattern matching used so far is just confusing the issue. If you want to do mathematical comparisons/calculations, you should be extracting the numbers from the string and comparing them directly with awk's built-in mathematics engine. That's what it's there for, after all.

To start with, we need to extract the numbers from the fields. There are two general ways to do that. If the line format is always stable, then just set the field separator to divide the line up by multiple characters and shift the field number accordingly. Alternatively, you can keep the default field separator and use a function like sub or split to extract the part you want from the desired field.

Some simple printing examples to demonstrate, using the data given above:
Code:
awk -F '[ =]' '{ print $12 }' infile.txt
awk '{ sub( /.*=/ , "" , $11 ) ; print $11 }' infile.txt
awk '{ split( $11 , a , "=" ) ; print a[2] }' infile.txt
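A quick way to check that the three commands print the same thing, assuming a few lines in the post #5 format held in a shell variable (data, a name chosen for illustration) instead of infile.txt:

```shell
#!/bin/sh
# Three lines in the format from post #5 (value=... is field 11).
data='f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0012 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.1131 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=1.2099 f12 f13'

# 1) Split on spaces and "=", so the number becomes field 12.
by_fs=$(printf '%s\n' "$data" | awk -F '[ =]' '{ print $12 }')
# 2) Keep the default separator and strip everything up to "=" from field 11.
by_sub=$(printf '%s\n' "$data" | awk '{ sub( /.*=/ , "" , $11 ) ; print $11 }')
# 3) Split field 11 on "=" and take the second piece.
by_split=$(printf '%s\n' "$data" | awk '{ split( $11 , a , "=" ) ; print a[2] }')

printf '%s\n' "$by_fs"
if [ "$by_fs" = "$by_sub" ] && [ "$by_fs" = "$by_split" ]; then
    echo "all three methods agree"
fi
```

The -F version is the shortest, but it relies on every line having exactly ten fields before value=; the sub and split versions only rely on value= being field 11.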
Then to count the number of entries based on various numerical conditions:

Code:
awk -F '[ =]' '{
sum++
if ( $12 >= 1 ) { cat1++ }
if ( $12 < 1 && $12 >= 0.1 ) { cat2++ }
if ( $12 < 0.1 ) { cat3++ }
}
END{
print "total number of lines   :" , sum
print "lines where N >= 1      :" , cat1+0
print "lines where 1 > N >= 0.1:" , cat2+0
print "lines where N < 0.1     :" , cat3+0
}' infile.txt
It's overall much cleaner, easier to read and understand, and less prone to error, don't you think?
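On data like the post #5 sample (every value >= 0, written with four decimals), this numeric version and the original pattern version should give the same counts, because the patterns effectively split at 0.1 and 1.0. A sketch that runs both side by side; data is just a shell variable standing in for infile.txt, and the numeric half adds + 0 so an empty category prints 0:

```shell
#!/bin/sh
# The 14 lines from post #5.
data='f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0012 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0013 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0014 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0015 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0111 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0121 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0131 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.0141 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.1131 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.1233 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=0.1233 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=1.0099 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=1.1099 f12 f13
f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 value=1.2099 f12 f13'

# Original pattern-matching counters, printed as: sum cat1 cat2 cat3
by_pattern=$(printf '%s\n' "$data" | awk '
{ sum++ }
/value=0\.[^0]/ { cat2++ }
/value=0\.0/    { cat3++ }
END { cat1 = sum - (cat2 + cat3); print sum, cat1, cat2, cat3 }')

# Numeric comparisons on the extracted number (field 12 with -F "[ =]")
by_number=$(printf '%s\n' "$data" | awk -F '[ =]' '
{ sum++ }
$12 >= 1              { cat1++ }
$12 < 1 && $12 >= 0.1 { cat2++ }
$12 < 0.1             { cat3++ }
END { print sum, cat1 + 0, cat2 + 0, cat3 + 0 }')

echo "pattern: $by_pattern"
echo "numeric: $by_number"
```

Both lines should read 14 3 3 8 (sum, cat1, cat2, cat3).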
 
1 member found this post helpful.
Old 01-02-2013, 11:07 PM   #13
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,797
Blog Entries: 4

Original Poster
Reputation: 285
Hi David,

Well, I was doing things my own way, as you said, just by comparing the values. But I had got a script from someone (and my team lead assumed that the script was a standard/perfect one), which was based on pattern matching and did the same thing I had been doing.

My team lead instructed me to compare the results of that script and mine, so we could verify that there is no difference in the data we're extracting.

I was a little confused by that script, so I started this thread. But anyway, both scripts worked fine and the results are also OK.

I am thankful to Mr. Druuna, who has been so helpful all the way.

Thanks for your time and response!

Last edited by shivaa; 01-02-2013 at 11:11 PM.
 
All times are GMT -5.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration