Speeding up a script to count number of repeat characters in each column
Hi!
Long-time lurker, first time I haven't been able to find my answer by searching! I have a text file in this format:
Code:
AAABCDBBCD...D
For each column (not row), I would like to calculate the highest number of repeated characters (A, B, C, D only). The output for the above example would be:
Code:
6
Could anyone suggest a faster way of doing this?
Code:
# begin loop here from 1 to RowLength
How about:
Code:
#!/usr/bin/awk -f
Thank you very much - that's much, much faster!
I get most of the code, but I don't understand these parts - any chance of an explanation?
Code:
count[i,$i]++
Code:
if( count[x,letters[y]] > out )
count[i,$i]++ - Arrays in awk are associative, so for the 'A' in the first line this is equivalent to count[1,"A"]++. The ++ increases the value stored under that index by 1.
if( count[x,letters[y]] > out ) - As per the explanation above, this retrieves the value stored under that array index and compares it with the value of 'out'. The 'letters' array is:
Code:
letters[1] = "A"
Code:
x=1
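To make the two pieces above concrete, here is a minimal, self-contained sketch of the counting idea. It is not the script posted earlier in the thread, just an illustration under some assumptions: each row is an unbroken string of characters, the output is one maximum per column on a single line, and the wrapper name max_repeats_per_column is made up for the example. It uses substr() to pick out the character in column i rather than $i, so it does not depend on how the fields are split.

```shell
# Hypothetical wrapper: prints, for each column, the highest count of any
# single letter (A, B, C or D) found in that column.
max_repeats_per_column() {
    awk '
    {
        # Remember the widest row so the END block knows how many columns exist.
        if (length($0) > ncols) ncols = length($0)

        # count[i, c] is really count[i SUBSEP c]: one counter per
        # (column, character) pair, created on first use.
        for (i = 1; i <= length($0); i++)
            count[i, substr($0, i, 1)]++
    }
    END {
        # letters[1] = "A", letters[2] = "B", and so on.
        n = split("A B C D", letters, " ")

        for (x = 1; x <= ncols; x++) {
            out = 0
            # Look up each letter counter for this column and keep the largest.
            for (y = 1; y <= n; y++)
                if (count[x, letters[y]] > out)
                    out = count[x, letters[y]]
            printf "%s%s", out, (x < ncols ? " " : "\n")
        }
    }' "$@"
}
```

For example, printf 'ABCD\nABDD\nACCD\n' | max_repeats_per_column prints 3 2 2 3: column 1 is all A (3), column 2 has two B, column 3 has two C, and column 4 is all D (3). The file is read only once, and the per-column work happens in the END block.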
Great - thank you again for help!