uniq values in unsorted file
Hi, I am trying to count unique values in a file, but I'm having trouble because the same value occurs multiple times on non-consecutive lines.
For example, for this file:
Code:
word
Code:
2 word
Code:
4 word

Thanks
Upendra
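The problem comes from uniq itself: it only merges adjacent duplicate lines, so a value that shows up again later starts a new run with its own count. The Ruby sketch below imitates that behavior on an in-memory array. It illustrates the idea and does not reimplement the real uniq tool. The method name uniq_c and the sample lines are hypothetical, because the original file contents were not preserved.

```ruby
# Rough imitation of `uniq -c`: only *adjacent* equal lines are merged,
# so a value that reappears later is counted again as a separate run.
def uniq_c(lines)
  # Enumerable#chunk groups consecutive elements that map to the same key
  # (here the key is the line itself) and yields [key, run] pairs.
  lines.chunk { |line| line }.map { |line, run| [run.size, line] }
end

# Hypothetical sample input, not the poster's real file.
sample = %w[word other word word]
uniq_c(sample).each { |count, line| puts "#{count} #{line}" }
# prints:
# 1 word
# 1 other
# 2 word
```

To try it on an actual file, you could pass File.readlines("file.txt", chomp: true) instead of the sample. The chomp: keyword needs Ruby 2.4 or later.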
Code:
cat file.txt | sort | uniq -c
Why not sort the list? You don't have to save the sorted list anywhere.
Something like:
Code:
sort file | uniq -c
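Sorting first is what makes the counts come out whole. Once equal lines sit next to each other, counting adjacent runs gives one total per value. The following is a rough Ruby analogue of sort file | uniq -c. It also shows that the input array keeps its original order, just as the file on disk is left unsorted. The method name and sample are hypothetical. One more assumption: Ruby's Array#sort compares strings byte by byte, roughly like running sort with LC_ALL=C, while sort(1) may use locale collation. The output comes out in sorted order, not in first-appearance order.

```ruby
# Rough analogue of `sort file | uniq -c`: sort so equal lines become
# adjacent, then count each adjacent run.
def sort_uniq_c(lines)
  # Array#sort returns a new array; the caller's array is not modified.
  lines.sort.chunk { |line| line }.map { |line, run| [run.size, line] }
end

sample = %w[word other word word] # hypothetical sample input
sort_uniq_c(sample).each { |count, line| puts "#{count} #{line}" }
# prints:
# 1 other
# 3 word

p sample # original order unchanged: ["word", "other", "word", "word"]
```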
Quote:
Thanks
Upendra

Quote:
but what I want is this:
Code:
4 word

Which is what the code I gave (using sort) would give you. Just to be very clear, both code examples given do NOT modify the file; it remains unsorted.
Same here: the code I supplied (no need for cat) sorts on the fly. It does NOT change the original file or save the output to disk.
Quote:
Your command actually sorts the list like this:
Code:
cat linux_test.txt | sort | uniq -c
Thanks
Upendra
Quote:
And using gawk would be helpful.
Probably works with other awks as well. This prints the number of distinct lines:
Code:
gawk -- '{ !a[$0]++ && ++c; } END { print c; }' file.txt
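How the awk idiom works: a[$0]++ returns the count before the increment, so !a[$0]++ is true only the first time a line appears. Because of the &&, c is incremented only then. Here is a Ruby sketch of the same logic. The method name distinct_count and the sample are hypothetical.

```ruby
# Mirrors the awk idiom `!a[$0]++ && ++c`: bump the distinct-value counter c
# only when a line's previous count is zero, i.e. on its first appearance.
def distinct_count(lines)
  seen = Hash.new(0) # plays the role of awk's a[]
  c = 0
  lines.each do |line|
    c += 1 if seen[line].zero? # first occurrence of this line
    seen[line] += 1            # the "++" part of a[$0]++
  end
  c
end

puts distinct_count(%w[word other word word]) # prints 2
```

In everyday Ruby, lines.uniq.size gives the same number. The explicit loop is only there to mirror the awk logic step by step.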
Edit: You typed faster than I did... I defer to your post above.
So what you really want is for the totals to appear in order of first occurrence.
Code:
gawk -- '{ ++a[$0]; } END { for (i in a) { print a[i] " " i;} }' file.txt
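The order in which for (i in a) visits the keys is unspecified in awk, so gawk will not necessarily print them in first-occurrence order. We have to record each key in a second array the first time it appears: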
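Code:
gawk -- '{ if (!a[$0]++) b[c++] = $0; } END { for (i = 0; i < c; ++i) { k = b[i]; print a[k] " " k; } }' file.txt
Or, as a script file (script.awk):
Code:
#!/usr/bin/env gawk -f
{ if (!a[$0]++) b[c++] = $0; }
END { for (i = 0; i < c; ++i) { k = b[i]; print a[k] " " k; } }
Code:
gawk -f script.awk -- file.txt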
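The gawk program above uses two structures. a[] holds the count for each line, and b[] records each line the first time it is seen. That lets the END loop print in first-occurrence order no matter how the array is traversed. Below is a Ruby sketch of the same two-structure approach. The method name and the sample data are hypothetical.

```ruby
# Two-structure approach from the gawk script:
#   counts ~ a[]  (occurrences per line)
#   order  ~ b[]  (each line recorded once, when first seen)
def counts_in_first_seen_order(lines)
  counts = Hash.new(0)
  order = []
  lines.each do |line|
    order << line if counts[line].zero? # same test as !a[$0]++ before the increment
    counts[line] += 1
  end
  # Walk the order list, like `for (i = 0; i < c; ++i)` over b[].
  order.map { |line| [counts[line], line] }
end

counts_in_first_seen_order(%w[word other word word]).each do |count, line|
  puts "#{count} #{line}"
end
# prints:
# 3 word
# 1 other
```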
Here's a ruby variation on the theme:
Code:
ruby -ne 'BEGIN{a=Hash.new(0)}; a[$_]+=1; END{ a.each{|k,v| puts "#{v} #{k}" } }' file
Code:
ruby -e 'a = Hash.new(0); b = Array.new; c = 0; while gets(); k = $_.chomp; a[k] += 1; if a[k] == 1; b[c] = k; c += 1; end; end; b.each {|k| puts "#{a[k]} #{k}"}' file.txt
Hmm... I am running 2.0, so are you saying that in 1.8 my script would sort the data so that 'other' appears first?
Ahh... just looked this up.
Ruby 1.8:
Code:
The order in which you traverse a hash by either key or value may seem arbitrary, and will generally not be in the insertion order.
Ruby 1.9 and later:
Code:
Hashes enumerate their values in the order that the corresponding keys were inserted.
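The documentation quoted above implies that on Ruby 1.9 and later, counting into a Hash is enough to get first-occurrence order, because the keys come back in insertion order. On 1.8 that order was not guaranteed, which is why the explicit order array was needed there. The sketch below assumes Ruby 1.9 or later. The method name and sample are hypothetical.

```ruby
# Assumes Ruby 1.9+: Hash iteration follows key insertion order, so a plain
# counting Hash already lists lines in order of first appearance.
def counts_by_insertion_order(lines)
  counts = Hash.new(0)
  lines.each { |line| counts[line] += 1 }
  counts.map { |line, n| [n, line] }
end

sample = %w[word other word word] # hypothetical sample input
rows = counts_by_insertion_order(sample)
rows.each { |n, line| puts "#{n} #{line}" }
# prints:
# 3 word
# 1 other

# Cross-check: Array#uniq also keeps the first occurrence of each element.
puts rows.map { |_, line| line } == sample.uniq # prints true
```

On Ruby 2.7 or later, lines.tally builds the same insertion-ordered count Hash in a single call.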