LinuxQuestions.org
Old 07-15-2011, 08:29 PM   #1
Alkass
Member
 
Registered: Mar 2010
Posts: 42

Rep: Reputation: 0
how to count specific entries inside a block


Hello

I have a file made up of blocks like this:
<event>
8 3 0.2685416E-02
2 -1 0
21 -1 0
23 2 1
12 1 3
-12 1 3
2 1 1
21 1 1
21 1 1
# 0.1471784E+03 0.1471784E+03
</event>
<event>
6 1 0.2685416E-02
21 -1 0
-1 -1 0
23 2 1
12 1 3
-12 1 3
-1 1 1
# 0.2108131E+03 0.2108131E+03
</event>
<event>
7 2 0.2685416E-02
21 -1 0
1 -1 0
23 2 1
12 1 3
-12 1 3
1 1 1
21 1 1
</event>

The first line after "<event>" is the process ID, so at the end I would like a summary of how many "event" blocks I have of each type, i.e. how many

6 1 0.2685416E-02

or how many

7 2 0.2685416E-02

etc etc

I do not know in advance how many different kinds of block there will be, so the script has to scan the file and create a new "summary" entry for each unique type.

I was using something like

awk '/<event>/,/<\/event>/{if ($3 -eq 0.2685416E-02 ) { print $1" "$2" "$3}}' file > out

and then

grep -c "$1" "$2" "$3", but with no success, since my awk command prints all lines of each block.

Suppose that I know $3 in advance, i.e. the 0.2685416E-02, which is a kind of weight.

Thanks in advance
 
Old 07-15-2011, 08:45 PM   #2
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
This will return the number of occurrences of each string (line) immediately following a line containing <event>:
Code:
awk '/<event>/ { getline key ; list[key]++ } END { for (key in list) printf("%7d: %s\n", list[key], key) }' file
Basically, for each line in file that contains <event>, the script reads the next line into the variable key and increments the corresponding entry in the associative array list. In awk, an undefined entry counts as 0 in a numeric context, so ++ sets it to 1 the first time a key is seen; it is perfectly okay to do that.

The END block loops over the defined entries (in unspecified order) and prints the number of occurrences of each string (line), followed by the string itself.
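
As a quick check of the one-liner above, here is a minimal sketch run on a made-up three-event input (the sample data is an assumption, shortened from the blocks in the first post, and is fed through a pipe instead of a file). The output goes through sort only because the for (key in list) order is unspecified.

```shell
# Hypothetical sample input: three events, two of them with the same header line.
sample='<event>
6 1 0.2685416E-02
21 -1 0
</event>
<event>
7 2 0.2685416E-02
21 -1 0
</event>
<event>
6 1 0.2685416E-02
-1 -1 0
</event>'

# Count each distinct line that immediately follows an <event> line.
result=$(printf '%s\n' "$sample" | awk '
    /<event>/ { getline key ; list[key]++ }
    END { for (key in list) printf("%7d: %s\n", list[key], key) }' | LC_ALL=C sort)

printf '%s\n' "$result"
```

With this input the "6 1 0.2685416E-02" line is reported with a count of 2 and the "7 2 0.2685416E-02" line with a count of 1. Note that /<event>/ does not match the closing </event> line, because of the slash after the "<".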

Last edited by Nominal Animal; 07-15-2011 at 08:46 PM.
 
Old 07-15-2011, 09:01 PM   #3
Alkass
Member
 
Registered: Mar 2010
Posts: 42

Original Poster
Rep: Reputation: 0
Thanks a lot!!
This indeed prints a list like

1: 6 1 0.2685416E-02 0.1411792E+03 0.7957747E-01 0.1215708E+00
1: 7 2 0.2685416E-02 0.1368533E+03 0.7957747E-01 0.1221346E+00
1: 6 1 0.2685416E-02 0.2416091E+03 0.7957747E-01 0.1125967E+00
1: 7 2 0.2685416E-02 0.1451408E+03 0.7957747E-01 0.1210738E+00
1: 8 3 0.2685416E-02 0.1560919E+03 0.7957747E-01 0.1197865E+00
1: 8 3 0.2685416E-02 0.2148430E+03 0.7957747E-01 0.1144428E+00
1: 8 3 0.2685416E-02 0.1469535E+03 0.7957747E-01 0.1208522E+00
1: 6 1 0.2685416E-02 0.1578388E+03 0.7957747E-01 0.1195920E+00
1: 7 2 0.2685416E-02 0.1191213E+03 0.7957747E-01 0.1247137E+00
1: 8 3 0.2685416E-02 0.1382279E+03 0.7957747E-01 0.1219530E+00
1: 8 3 0.2685416E-02 0.1648319E+03 0.7957747E-01 0.1188402E+00
1: 6 1 0.2685416E-02 0.1551985E+03 0.7957747E-01 0.1198871E+00
1: 7 2 0.2685416E-02 0.1886526E+03 0.7957747E-01 0.1165587E+00
1: 8 3 0.2685416E-02 0.2150698E+03 0.7957747E-01 0.1144259E+00
1: 6 1 0.2685416E-02 0.1702426E+03 0.7957747E-01 0.1182862E+00

so, how do I get the number of occurrences for each key?

Thanks in advance!
 
Old 07-15-2011, 09:41 PM   #4
Alkass
Member
 
Registered: Mar 2010
Posts: 42

Original Poster
Rep: Reputation: 0
Well, one more thing that I forgot to mention: I would like the "sum" to be based only on the first 2 fields, i.e.
<event>
8 3 0.2685416E-02
2 -1 0
21 -1 0
23 2 1
12 1 3
-12 1 3
2 1 1
21 1 1
21 1 1
# 0.1471784E+03 0.1471784E+03
</event>
<event>
8 3 0.425677
2 -1 0
21 -1 0
23 2 1
12 1 3
-12 1 3
2 1 1
21 1 1
21 1 1
# 0.1471784E+03 0.1471784E+03
</event>

should be counted under the same key, because the first two columns of the first line after <event> are the same. The script you sent works perfectly, but counts separately the
8 3 0.2685416E-02
and
8 3 0.425677

So, how can I restrict the key to only the first 2 values?

Thanks in advance!
 
Old 07-16-2011, 04:38 AM   #5
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,541

Rep: Reputation: 1919
Instead of placing the line into the variable 'key', just use getline on its own; it will then split the line into fields using the normal FS.
You can then use the first and second fields as the index in your array:
Code:
awk '/<event>/ { getline ; list[$1 $2]++ } END { for (key in list) printf("%7d: %s\n", list[key], key) }' file
 
Old 07-16-2011, 05:05 AM   #6
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
Um, now I'm confused. I cannot understand what output you'd like, exactly. In your initial post you said the entire line following a line containing <event> is the key, so I used that. Each key occurs only once in your example input. If a key were to occur twice, the line would start with '2:'.
Quote:
Originally Posted by Alkass
So, how can I constrain the key-listing based on the only first 2 values ?
Calling plain getline reads the next line into $0 and splits it into the fields $1, $2, and so on. There is no string concatenation function or explicit operator in awk; you just put the values next to each other, and they are joined into a single string. So, if you want to use only the first two fields of the line following the line containing <event>, use
Code:
awk '/<event>/ { getline ; list[$1 " " $2]++ } END { for (key in list) printf("%7d: %s\n", list[key], key) }' file
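
To make the behaviour of plain getline concrete, here is a small sketch on invented input. It also checks the return value of getline, which is an addition and not part of the command above: without the check, an <event> on the very last line of the file would leave $0 unchanged and get counted as a key itself.

```shell
# Hypothetical sample: two events with the same first two fields but a
# different third field, plus a truncated <event> at the end of input.
sample='<event>
8 3 0.2685416E-02
2 -1 0
</event>
<event>
8 3 0.425677
2 -1 0
</event>
<event>'

result=$(printf '%s\n' "$sample" | awk '
    /<event>/ {
        # Plain getline replaces $0 with the next line and re-splits $1, $2, ...
        # It returns 1 on success, 0 at end of input and -1 on error.
        if ((getline) > 0)
            list[$1 " " $2]++
    }
    END { for (key in list) printf("%7d: %s\n", list[key], key) }')

printf '%s\n' "$result"
```

Both complete events land under the single key "8 3", and the dangling <event> at the end is ignored.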
Note that you can count the occurrences of each value in the third field separately:
Code:
awk '/<event>/ { getline
                 ids[$1 " " $2]++
                 sums[$3]++
               }
           END { printf("Occurrences of the first two fields:\n")
                 for (key in ids)
                     printf("%7d: %s\n", ids[key], key)
                 printf("\nOccurrences of the third field:\n")
                 for (key in sums)
                     printf("%7d: %s\n", sums[key], key)
               }' file
Counting the third-field values separately for each pair of first two fields is more complicated; you need to collect the values for each key in a string and split it again at the end.
Code:
awk '/<event>/ { getline
                 key = $1 " " $2
                 val = $3

                 # Number of times key exists
                 counts[key]++

                 # All vals for this key, as a string
                 values[key] = values[key] val " "
               }
           END { for (key in counts) {
                     # Clear list array
                     split("", list)

                     # Split vals for this key into temp array
                     split(values[key], temp)

                     # Count the unique values in temp to list array
                     for (k in temp) list[temp[k]]++

                     # Print the key and key count,
                     printf("%s: %d times,\n", key, counts[key])

                     # then the vals and val counts for this key
                     for (k in list)
                         printf("\t%s: %d times\n", k, list[k])
                 }
               }' file
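
A shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: count every (key, value) pair in a single array using awk's comma subscript, which joins the parts with SUBSEP, and let sort group the output. The sample data and the array name pairs are invented for illustration. Unlike the script above, this prints one tab-separated line per pair and no per-key total.

```shell
# Hypothetical sample: "8 3" appears with two different third fields.
sample='<event>
8 3 0.2685416E-02
2 -1 0
</event>
<event>
8 3 0.425677
2 -1 0
</event>
<event>
8 3 0.425677
21 -1 0
</event>
<event>
6 1 0.2685416E-02
21 -1 0
</event>'

result=$(printf '%s\n' "$sample" | awk '
    /<event>/ {
        if ((getline) > 0)
            pairs[$1 " " $2, $3]++      # subscript is key SUBSEP value
    }
    END {
        for (p in pairs) {
            split(p, part, SUBSEP)      # part[1] = key, part[2] = third field
            printf("%s\t%s\t%d\n", part[1], part[2], pairs[p])
        }
    }' | LC_ALL=C sort)

printf '%s\n' "$result"
```

Each output line holds the two-field key, the third-field value, and how many events had that combination; sorting puts all lines for one key next to each other.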
@grail: You need the space string between $1 and $2, i.e. $1 " " $2, or you'll merge the two fields into one (without an intervening space).

Last edited by Nominal Animal; 07-16-2011 at 05:07 AM. Reason: Note to grail
 
Old 07-16-2011, 05:51 AM   #7
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,541

Rep: Reputation: 1919
Quote:
@grail: You need the space string between $1 and $2, i.e. $1 " " $2, or you'll merge the two fields into one (without an intervening space).
And the problem with that is? With or without the space the uniqueness is the same.
 
Old 07-16-2011, 06:16 AM   #8
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
Quote:
Originally Posted by grail
And the problem with that is? With or without the space the uniqueness is the same.
Are you sure? What about "1 23" and "12 3"? Without the space they both map to "123".
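
The collision is easy to reproduce. In this sketch (with two invented header lines, "1 23" and "12 3"), the key built without a separator counts both events as one type, while the key built with a space keeps them apart.

```shell
# Hypothetical sample: two events whose first two fields differ,
# but concatenate to the same string "123".
sample='<event>
1 23 0.5
</event>
<event>
12 3 0.5
</event>'

# Key built without a separator: both headers collapse into "123".
merged=$(printf '%s\n' "$sample" | awk '
    /<event>/ { getline ; list[$1 $2]++ }
    END { for (key in list) printf("%d: %s\n", list[key], key) }' | LC_ALL=C sort)

# Key built with a space: the two headers stay distinct.
spaced=$(printf '%s\n' "$sample" | awk '
    /<event>/ { getline ; list[$1 " " $2]++ }
    END { for (key in list) printf("%d: %s\n", list[key], key) }' | LC_ALL=C sort)

printf 'without space:\n%s\nwith space:\n%s\n' "$merged" "$spaced"
```

The first command reports a single key "123" with a count of 2; the second reports "1 23" and "12 3" with a count of 1 each.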
 
Old 07-16-2011, 07:03 AM   #9
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,541

Rep: Reputation: 1919
Fair call
 
  

