Quote:
Originally Posted by magische_vogel
Your gawk script is not working well with this input (The original order is not kept):
41.200.103.2
105.97.167.43
105.97.167.43
41.110.184.109
41.110.184.109
105.99.68.250
105.99.68.250
41.110.184.109
41.200.103.220
41.110.184.109
41.110.184.109
41.110.184.109
41.110.184.109
105.99.68.250
41.110.184.109
105.99.68.250
105.101.173.112
Ahh yes, it's the old string sort problem. The array bb is indexed by the original line number, but array indices are strings, and "17" sorts as less than "9".
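To see the effect in isolation, here is a minimal sketch using plain awk from the shell. The function name compare_indices is made up for this example; it just reports whether its first argument sorts before or after the second, first as a string and then as a number.

```shell
# Hypothetical helper, not part of the original script.
compare_indices() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        # Appending "" forces a string comparison, which is how indices sort.
        as_string = ((a "") < (b "")) ? "before" : "after"
        # Adding 0 forces a numeric comparison.
        as_number = ((a + 0) < (b + 0)) ? "before" : "after"
        print as_string, as_number
    }'
}
```

Running compare_indices 17 9 prints "before after": as strings 17 comes first, as numbers it comes second.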
If you happen to be running version 4 of gawk, there is a trivial fix: add a third argument "@ind_num_asc" to the asorti() call. For earlier versions you have to make sure the index is a string that will sort properly.
Code:
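# Remember the line number of a line's first appearance; set it to 0 once the line repeats.
{ if($0 in aa) aa[$0] = 0; else aa[$0] = FNR }
END {
    # Re-index the lines seen only once by their space-padded line number.
    for(x in aa) if(aa[x] > 0) bb[sprintf("%15s", aa[x])] = x
    # Sort the padded indices into cc, then print the lines in input order.
    n_unique = asorti(bb, cc)
    for(n = 1; n <= n_unique; ++n) print bb[cc[n]]
}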
Padding that index to a 15-character string makes it work (at least until you have 10^15 lines in the input), since right-aligned numbers of equal width sort the same way as strings as they do as numbers.
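For the gawk 4 route mentioned above, the script might look something like this. It is only a sketch, assuming gawk 4.0 or later is installed as gawk; the shell wrapper name unique_in_order is made up for this example. With a numeric index sort the padding is no longer needed.

```shell
# Hypothetical wrapper name; needs gawk >= 4.0 for the third asorti() argument.
unique_in_order() {
    gawk '
    # Remember the line number of the first sighting; zero it on a repeat.
    { if ($0 in aa) aa[$0] = 0; else aa[$0] = FNR }
    END {
        # Re-index the lines that occurred only once by their line number.
        for (x in aa) if (aa[x] > 0) bb[aa[x]] = x
        # "@ind_num_asc" makes asorti() compare the indices as numbers.
        n_unique = asorti(bb, cc, "@ind_num_asc")
        for (n = 1; n <= n_unique; ++n) print bb[cc[n]]
    }' "$@"
}
```

Usage would be unique_in_order file, or piping the input into it. With the input from the quoted post, the single-occurrence lines are on lines 1, 9 and 17, so they should come out in that order.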