LinuxQuestions.org
Old 01-08-2013, 03:16 PM   #1
atjurhs
Member
 
Registered: Aug 2012
Posts: 183

Rep: Reputation: Disabled
counting number of occurrences in a column


Hi guys,

I've got something that kind of works, but not all the way, so I need a little help. I need to count the number of occurrences of each value in a specific column. So if I have a file called test.dat
Code:
100 t 200 p 300 400
101 b 201 s 300 401
102 o 202 s 302 402
103 k 203 a 302 403
104 t 204 p 300 404
105 m 205 r 305 405
and I specify column 5, I should get

Code:
300 3
302 2
305 1

So far I have this script, and it works, but I'd have to run it once for each value in column 5, and there are bunches of them.

Code:
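#!/bin/sh
# Count the lines of test.dat whose 5th field is 300 and write the total to out.dat.
awk '$5 == 300 {
         # Restart the count when the input moves on to a new file.
         if (FILENAME != last && last != "")
             count = 0
         count++
         last = FILENAME
     }
     END { print "300", count + 0 }' test.dat > out.dat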
So in pseudocode, I want to somehow replace the specific value in column 5 with all values:

Code:
$5==300 with $5==*  (but I know this doesn't work)
and then pass each count to the print statement as * increments down the file.

And again, for out.dat I want to get

Code:
300 3
302 2
305 1
Thanks for whatever help you can give, Tabitha

Last edited by atjurhs; 01-08-2013 at 03:17 PM.
 
Old 01-08-2013, 03:35 PM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1976
Hi Tabitha. I'd do something simpler, such as:
Code:
awk '{_[$5]++} END{for (i in _) print i,_[i]}' test.dat
 
Old 01-08-2013, 04:37 PM   #3
atjurhs
Member
 
Registered: Aug 2012
Posts: 183

Original Poster
Rep: Reputation: Disabled
No fair, colucix! Here I was using stuff I learned. I'll still use your script, and I'll try to figure it out too. I haven't seen the underscore used that way before.

I did add a sort command onto your script, just to make the output easier to read:

Code:
awk '{_[$5]++} END{for (i in _) print i,_[i]}' test.dat | sort -k2nr > out.dat
thanks sooooooo much,

Tabitha
 
Old 01-08-2013, 04:43 PM   #4
linosaurusroot
Member
 
Registered: Oct 2012
Distribution: OpenSuSE,RHEL,Fedora,OpenBSD
Posts: 981
Blog Entries: 2

Rep: Reputation: 235
If you wanted to do it outside awk (this works as written only when the file has a single column):
Code:
sort test.dat | uniq -c | sort -n
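
For a file with more than one column, a rough sketch of the same pipeline is to extract the column first. This assumes the fields are separated by single spaces, as in the sample test.dat from the first post; `cut` treats every delimiter as a field boundary, so data with runs of spaces or tabs would need something like `awk '{print $5}'` in place of `cut`. The output file name counts.txt is made up for the example.

```shell
# Recreate the sample data from the first post.
cat > test.dat <<'EOF'
100 t 200 p 300 400
101 b 201 s 300 401
102 o 202 s 302 402
103 k 203 a 302 403
104 t 204 p 300 404
105 m 205 r 305 405
EOF

# Pull out column 5, sort so identical values are adjacent,
# count each run of identical values, then sort by that count.
cut -d' ' -f5 test.dat | sort | uniq -c | sort -n > counts.txt

cat counts.txt
```

Note that `uniq -c` prints the count before the value, with leading padding, so the columns come out in the opposite order from the awk version.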
 
Old 01-08-2013, 04:54 PM   #5
atjurhs
Member
 
Registered: Aug 2012
Posts: 183

Original Poster
Rep: Reputation: Disabled
Actually there are 59 columns and thousands of lines, but thanks too.
 
Old 01-09-2013, 03:08 PM   #6
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1958
FWIW, the underscore is simply a valid variable (or in this case an array) name. You can replace it with something else if you want.

This is the usual technique for counting matches in this kind of code. Since awk arrays are associative (that is, their indexes are text strings rather than numbers), you simply use the value of the given field as the index. Then every time awk hits the same value in the input, it increments that array element by one. When you get to the end of the file, you just print out the array to get the totals.

This site has a whole series explaining in detail how common code snippets like this work:
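
Written out longhand, the same counter might look like the sketch below. The array name `count`, the loop variable `value`, and the `col` variable are just illustrative names picked for this example, and the trailing sort is an added convenience; the data is the test.dat from the first post.

```shell
# Recreate the sample data from the first post.
cat > test.dat <<'EOF'
100 t 200 p 300 400
101 b 201 s 300 401
102 o 202 s 302 402
103 k 203 a 302 403
104 t 204 p 300 404
105 m 205 r 305 405
EOF

# col is an assumed variable name; -v passes the column number into awk.
awk -v col=5 '
    # Runs for every input line: use the field value as the array index
    # and bump its counter (an unseen index starts out as 0).
    { count[$col]++ }

    # Runs once after the last line: print each distinct value and its total.
    # The order of a "for (value in count)" loop is unspecified, hence the sort.
    END { for (value in count) print value, count[value] }
' test.dat | sort -k1,1n > out.dat

cat out.dat
```

Changing `col=5` to another number counts a different column without touching the awk program itself.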

http://www.catonmat.net/blog/awk-one...ined-part-one/
 
All times are GMT -5. The time now is 06:32 AM.