awk distinct count
So, I've learned a great deal from the responses to my previous posting, particularly about arrays in awk (mighty thanks to all who responded). However, I'm now stuck on something I think needs a nested loop, although I suspect there's a much simpler solution.
Let's say I have a file like the following:
Joe Wolfhound
Joe Wolfhound
Joe Beagle
Mary Pug
Mary Dalmatian
Joe Chihuahua
Mary Boxer
Jane Husky
Jane Husky
Joe Bulldog
Jane Ridgeback
Mary Malamute
Mary Boxer
Joe Chow
Paul Doberman
Paul Doberman
Paul Bernese
How do I find the number of breeds each person owns? I'm able to list the unique breeds, together with the number of dogs of each breed:
{ breed[$2] += 1 } END { for (i in breed) print i, breed[i] }
and I can count the number of dogs per owner, but determining the number of distinct breeds per owner is giving me problems. Thanks all in advance! |
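For contrast, the dogs-per-owner count mentioned in the question needs only one array. A minimal sketch, wrapped in a hypothetical shell function `dogs_per_owner` and fed a short excerpt of the sample data, shows why it is not enough on its own: Joe's two Wolfhounds are counted twice.

```shell
# dogs_per_owner is a hypothetical helper name; it reads the files given as
# arguments, or standard input when there are none.
dogs_per_owner() {
  # Every line counts, duplicates included. The order of "for (o in dogs)"
  # is unspecified in awk, so the output is sorted for stable results.
  awk '{ dogs[$1]++ } END { for (o in dogs) print o, dogs[o] }' "$@" | sort
}

# Excerpt of the sample input: Joe owns three dogs but only two breeds.
dogs_per_owner <<'EOF'
Joe Wolfhound
Joe Wolfhound
Joe Beagle
Mary Pug
EOF
```

This prints `Joe 3` and `Mary 1`, whereas the distinct-breed count for Joe should be 2.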
Hi,
this worked for me Code:
awk ' |
Code:
awk '{a=$1;$1=""}(!(breeds[a]~$0 )){ Code:
from collections import defaultdict Code:
$ ./python.py |
I have been playing with the concept of embedding LISP-style commands in bash. LISP is an early programming language closely associated with artificial intelligence research. The concept is to write scripts that write their own scripts. So let's start by declaring the variables as integers with the value zero. Assume your list is in a file called "list".
Code:
awk '{print $1 $2 "=0"}' list
Code:
COMMAND[1]=$(awk '{print $1 $2 "=0"}' list; echo " ")
Code:
awk '{print $1 $2 "=$(($" $1 $2 "+1))"}' list
Code:
COMMAND[2]=$(awk '{print $1 $2 "=$(($" $1 $2 "+1))"}' list; echo " ")
Code:
# sort -u rather than uniq: uniq only drops adjacent duplicates (e.g. the two "Mary Boxer" lines are not adjacent)
awk '{print $1 $2}' list | sort -u | awk '{print "echo " $0 " $" $0}'
Code:
COMMAND[3]=$(awk '{print $1 $2}' list | sort -u | awk '{print "echo " $0 " $" $0}'; echo " ")
Code:
echo "${COMMAND[@]}"
Code:
unset COMMAND |
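The same scripts-that-write-scripts idea can be sketched in a single awk pass whose output the shell then runs with `eval`. This is a minimal sketch, assuming every owner+breed concatenation forms a valid shell variable name (letters only, no blanks); the function name `gen_counter_script` is hypothetical. Like the commands above, it counts dogs per owner/breed pair (JoeWolfhound ends up as 2), not distinct breeds per owner. Because `eval` executes whatever awk prints, only use this on trusted input.

```shell
# gen_counter_script is a hypothetical helper: awk writes shell assignments,
# and the caller executes them with eval.
gen_counter_script() {
  awk '{
    v = $1 $2                           # e.g. JoeWolfhound
    if (!(v in declared)) {             # first sighting: declare the variable as 0
      declared[v] = 1
      print v "=0"
    }
    print v "=$((" v " + 1))"           # every line: emit an increment
  }' "$@"
}

list='Joe Wolfhound
Joe Wolfhound
Joe Beagle'

# Run the generated script in the current shell, then inspect the variables.
eval "$(printf '%s\n' "$list" | gen_counter_script)"
echo "$JoeWolfhound $JoeBeagle"
```

Doing the declaration and the increments in one awk program avoids reading the list several times and keeps the generated assignments in input order.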
Better hope there aren't any blank-separated names, say "German Shepherd".
(referencing the awk offerings) |
Wow
You guys are fast... these are greatly helpful!
So I'm looking at the response provided by crts (you rock, BTW) to try and understand what is happening and I'm wrapped around the axle on this: Code:
Thanks again! |
Quote:
This creates an array element indexed by whatever is in $1 and $2, concatenated. So when the first line, Joe Wolfhound, is processed, it creates the index JoeWolfhound. You can access the array's value at this index with owner["JoeWolfhound"]. If this element does not exist yet, awk creates it with an uninitialized value, which counts as 0. In that case we increment the value and hence mark this combination as counted. The next time we see Joe and his Wolfhound, the test evaluates to false and the dog is not counted again. The counting itself happens in the array breed. Hope this clears things up a bit. |
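The mechanism described in that explanation can be sketched as follows. This is a reconstruction from the description, not the original posted code; it keys `owner` on `$1, $2` (joined with awk's `SUBSEP`) rather than plain `$1$2`, an assumption made so that pairs such as "Jo eWolfhound" and "Joe Wolfhound" cannot collide. The function name `distinct_breeds` is hypothetical.

```shell
# distinct_breeds is a hypothetical helper name.
distinct_breeds() {
  awk '
    {
      # For a new pair, owner[$1, $2] is created uninitialized and reads as 0.
      if (!owner[$1, $2]) {
        owner[$1, $2]++    # mark this owner/breed pair as counted
        breed[$1]++        # one more distinct breed for this owner
      }
    }
    END { for (o in breed) print o, "has", breed[o], "breeds" }
  ' "$@" | sort
}

distinct_breeds <<'EOF'
Joe Wolfhound
Joe Wolfhound
Joe Beagle
Mary Pug
Mary Dalmatian
Joe Chihuahua
Mary Boxer
Jane Husky
Jane Husky
Joe Bulldog
Jane Ridgeback
Mary Malamute
Mary Boxer
Joe Chow
Paul Doberman
Paul Doberman
Paul Bernese
EOF
```

On the sample input this should report 2 breeds for Jane, 5 for Joe, 4 for Mary and 2 for Paul.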
Quote:
this also copes with "German Shepherds" Code:
awk ' Code:
Joe Wolfhound |
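One way to cope with multi-word breeds such as "German Shepherd" is to treat the first field as the owner and the rest of the line as the breed. A minimal sketch under that assumption (owner names are single words); the function name `breeds_by_owner` and its "owner: count (breed list)" output format are hypothetical, and runs of blanks inside a breed name are collapsed so that "German  Shepherd" and "German Shepherd" count once.

```shell
# breeds_by_owner is a hypothetical helper name.
breeds_by_owner() {
  awk '
    NF >= 2 {
      owner = $1
      breed = $0
      sub(/^[ \t]*[^ \t]+[ \t]+/, "", breed)   # drop leading blanks, the owner and the gap after it
      gsub(/[ \t]+/, " ", breed)               # collapse internal runs of blanks
      sub(/ $/, "", breed)                     # drop a trailing blank, if any
      if (!seen[owner, breed]++) {
        n[owner]++
        # the first breed starts the list; later ones are appended after a comma
        list[owner] = (n[owner] == 1) ? breed : (list[owner] ", " breed)
      }
    }
    END { for (o in n) print o ": " n[o] " (" list[o] ")" }
  ' "$@" | sort
}

breeds_by_owner <<'EOF'
Joe German Shepherd
Joe Wolfhound
Joe German Shepherd
Mary Pug
EOF
```

This prints `Joe: 2 (German Shepherd, Wolfhound)` followed by `Mary: 1 (Pug)`.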
Awesome. That makes total sense to me now. I hadn't seen that trick in any of the online awk tutorials yet. Thanks again!
Just for the record (not that it makes a difference):
- I'm working mostly with CSV files, so the space isn't an issue (I just made up the example input file to make the question easier), but it's good to know anyway.
- Dobermans are actually a breed of German origin, I believe... nice coincidence! :) |
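For CSV input, a comma field separator sidesteps the space problem entirely. A minimal sketch, assuming simple CSV (no quoted fields containing commas), the owner in column 1 and the breed in column 2; the function name `distinct_breeds_csv` is hypothetical. It also strips a trailing carriage return, since CSV files are often saved with CRLF line endings, and trims blanks around each field. If your files start with a header row, adding `NR > 1 &&` to the `NF >= 2` pattern would skip it.

```shell
# distinct_breeds_csv is a hypothetical helper name.
distinct_breeds_csv() {
  awk -F',' '
    { sub(/\r$/, "") }                     # tolerate CRLF line endings (fields are re-split)
    NF >= 2 {
      owner = $1; breed = $2
      gsub(/^[ \t]+|[ \t]+$/, "", owner)   # trim blanks around each field
      gsub(/^[ \t]+|[ \t]+$/, "", breed)
      if (!seen[owner, breed]++) n[owner]++
    }
    END { for (o in n) print o "," n[o] }  # owner,count as CSV output
  ' "$@" | sort
}

distinct_breeds_csv <<'EOF'
Joe,Wolfhound
Joe, Wolfhound
Joe,German Shepherd
Mary,Pug
EOF
```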
Well, here is one that should satisfy even syg00 ;)
Code:
awk '!a[$0]++{b[$1]++}END{for(x in b) print x,"has",b[x],"breeds"}' infile |
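One caveat with this one-liner: it deduplicates on the whole line `$0`, so "Joe Wolfhound" with a trailing blank is a different key from "Joe Wolfhound", and a blank line produces an entry with an empty owner. A minimal variant, wrapped in a hypothetical function `distinct_breeds_norm`, normalizes each record first by assigning `$1 = $1`, which makes awk rebuild `$0` from its fields joined by OFS (a single space by default). The output is sorted because the order of `for (x in b)` is unspecified.

```shell
# distinct_breeds_norm is a hypothetical helper name.
distinct_breeds_norm() {
  awk '
    NF { $1 = $1 }                   # rebuild $0 with single spaces, dropping stray blanks
    NF && !a[$0]++ { b[$1]++ }       # same test as the one-liner, on the normalized line
    END { for (x in b) print x, "has", b[x], "breeds" }
  ' "$@" | sort
}

printf 'Joe Wolfhound\nJoe Wolfhound \nJoe  Beagle\n\nMary Pug\n' | distinct_breeds_norm
```

Here Joe is reported with 2 breeds and Mary with 1; the plain one-liner would count 3 for Joe on the same input and also print a line for the empty owner.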