LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

03-19-2012, 01:58 PM #1
tweed08 (LQ Newbie; Registered: Mar 2012; Posts: 3)
Speeding up a script to count number of repeat characters in each column


Hi!

Long-time lurker, but this is the first time I haven't been able to find my answer by searching!

I have a text file in this format:

Code:
AAABCDBBCD...D
AAABDDBCCD...A
AAABCDACCD...B
AAABCCDBCD...C
AA--CCCBCD...-
AAA-CC---D...-
where each character can only be A, B, C, D or -.
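The scripts in this thread assume that layout: only A, B, C, D or - in every position, and every row the same length. A minimal sketch for checking a file before processing it; the function name `check_alignment` is hypothetical and not part of the original post:

```shell
# Hypothetical helper: succeed only if every line of file $1 uses just
# A, B, C, D or - and all lines are as long as the first one.
check_alignment() {
    # Any line with another character (or an empty line) is rejected.
    if grep -qv '^[ABCD-][ABCD-]*$' "$1"; then
        echo "unexpected character in $1" >&2
        return 1
    fi
    # Compare every line's length with the length of line 1.
    awk 'NR == 1 { len = length($0) }
         length($0) != len { bad = 1; exit }
         END { exit bad + 0 }' "$1" || { echo "rows differ in length" >&2; return 1; }
}
```

For example, `check_alignment data.txt && echo ok` prints `ok` only for a well-formed file.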

For each column (not row), I would like to find how many times the most frequent letter appears, counting only A, B, C and D (the - entries are ignored).

The output for the above example would be:

Code:
6
6
5
4
5
3
2
3
5
6
...
1
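To make the definition concrete, the answer for a single column can be checked by hand with standard tools: take the column with `cut`, drop the `-` entries, count each letter with `sort | uniq -c`, and keep the largest count. A sketch, assuming the sample rows are saved (with the `...` part left out) in a file called `sample.txt`; both that file name and the function name `column_max` are hypothetical:

```shell
# Hypothetical helper: print the highest count of any single letter A-D
# in column $1 of file $2 ('-' entries are ignored).
column_max() {
    result=$(cut -c "$1" "$2" | grep '[ABCD]' | sort | uniq -c |
             sort -rn | head -n 1 | awk '{ print $1 }')
    echo "${result:-0}"   # a column holding only '-' has no letters at all
}

# The first ten columns of the sample above, without the "..." part.
cat > sample.txt <<'EOF'
AAABCDBBCD
AAABDDBCCD
AAABCDACCD
AAABCCDBCD
AA--CCCBCD
AAA-CC---D
EOF

column_max 7 sample.txt   # column 7 holds B, B, A, D, C, - so this prints 2
```

Looping `column_max` over columns 1 to 10 reproduces the first ten numbers of the expected output, but it reruns a pipeline per column, so it is only a checking aid, not a faster method.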
I have written this very clunky script, but am unhappy with the speed.
Could anyone suggest a faster way of doing this?

Code:
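#!/bin/bash
# TargetFile and RowLength were set earlier in the original script; here
# they are assumed to come from the first argument and the first row.
TargetFile=$1
RowLength=$(head -n 1 "$TargetFile" | tr -d '\n' | wc -c)

# begin loop here from 1 to RowLength
for (( n=1; n<=RowLength; n++ ))
do
        INPUT=$(cut -c "$n" "$TargetFile")   # cut input to a single character, column n

        # count number of A,B,C,D in this column
        A=$(printf '%s' "$INPUT" | tr -dc 'A' | wc -c)
        B=$(printf '%s' "$INPUT" | tr -dc 'B' | wc -c)
        C=$(printf '%s' "$INPUT" | tr -dc 'C' | wc -c)
        D=$(printf '%s' "$INPUT" | tr -dc 'D' | wc -c)

        # keep the largest of the four counts
        ABCD=$(printf '%s\n' "$A" "$B" "$C" "$D" | sort -n | tail -n 1)

        echo "$ABCD"
done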
Many thanks for any help!
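Much of the cost in that script comes from process creation: for every column it starts `cut`, four `tr | wc` pipelines and a `sort | tail` pipeline (about a dozen external commands), and it rereads the whole file once per column. As an intermediate step that stays in the shell, the sketch below tallies the letters with a `case` statement, so each column costs a single external `cut`. The function names `column_counts_max` and `all_columns` are hypothetical; the sketch still rereads the file per column, which a single-pass tool such as awk avoids.

```shell
# Hypothetical helper: count A-D in column $1 of file $2 inside the shell
# and print the largest count. The braces keep the counters in the same
# subshell as the while loop that updates them.
column_counts_max() {
    cut -c "$1" "$2" | {
        a=0 b=0 c=0 d=0
        while IFS= read -r ch; do
            case $ch in
                A) a=$((a + 1)) ;;
                B) b=$((b + 1)) ;;
                C) c=$((c + 1)) ;;
                D) d=$((d + 1)) ;;
            esac                      # '-' is simply not counted
        done
        max=$a
        for v in "$b" "$c" "$d"; do
            [ "$v" -gt "$max" ] && max=$v
        done
        echo "$max"
    }
}

# Hypothetical driver with the same shape as the original loop.
all_columns() {
    rowlength=$(head -n 1 "$1" | tr -d '\n' | wc -c)
    n=1
    while [ "$n" -le "$rowlength" ]; do
        column_counts_max "$n" "$1"
        n=$((n + 1))
    done
}
```

For example, `all_columns data.txt` prints one number per column, in the same format as the original loop.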
 
03-20-2012, 04:29 AM #2
grail (LQ Guru; Registered: Sep 2009; Location: Perth; Distribution: Manjaro; Posts: 9,246)
How about:
Code:
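#!/usr/bin/awk -f

BEGIN{
    FS = ""                   # one field per character (gawk/mawk extension)
    split("ABCD",letters)     # no separator given, so split uses FS: A, B, C, D
}

# Tally every character by (column, character)
{   for( i=1; i<=NF; i++ )
        count[i,$i]++
}

# For each column, print the highest count among A, B, C and D.
# NF still holds the last line's field count, so rows must be equal length.
END{
    for( x=1; x<=NF; x++ )
    {
        out = 0
        for( y=1; y<=4; y++ )
            if( count[x,letters[y]] > out )
                out = count[x,letters[y]]
        print out
    }
}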
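The `END` block of that script loops up to `NF`, which at that point holds the field count of the last line read, so it relies on every row having the same length. A variant sketch that remembers the widest row instead and keeps a running maximum per column, so no lookup loop is needed at the end. It skips `-` directly rather than checking the four letters, which is equivalent only under the stated A/B/C/D/- input rule, and it keeps the original's `FS = ""` (gawk and mawk split such records into single characters; POSIX leaves an empty FS unspecified). The file names `colmax.awk` and `sample.txt` are hypothetical:

```shell
cat > colmax.awk <<'EOF'
BEGIN { FS = "" }                       # one field per character
{
    if (NF > width) width = NF          # remember the widest row seen
    for (i = 1; i <= NF; i++)
        if ($i != "-")                  # gaps are never counted
            if (++count[i, $i] > best[i])
                best[i] = count[i, $i]  # running maximum for column i
}
END {
    for (x = 1; x <= width; x++)
        print best[x] + 0               # +0 prints 0 for an all-gap column
}
EOF

# The first ten columns of the sample data, without the "..." part.
printf '%s\n' AAABCDBBCD AAABDDBCCD AAABCDACCD AAABCCDBCD AA--CCCBCD AAA-CC---D > sample.txt

awk -f colmax.awk sample.txt
```

On the sample this prints 6, 6, 5, 4, 5, 3, 2, 3, 5 and 6, one per line, and it still reads the file only once.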
 
03-20-2012, 05:19 PM #3
tweed08 (Original Poster; LQ Newbie; Registered: Mar 2012; Posts: 3)
Thank you very much - that's much, much faster!

I get most of the code, but I don't understand this part - any chance of an explanation?

Code:
count[i,$i]++
Code:
if( count[x,letters[y]] > out )
Thanks again!
 
03-21-2012, 02:30 AM #4
grail (LQ Guru; Registered: Sep 2009; Location: Perth; Distribution: Manjaro; Posts: 9,246)
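count[i,$i]++ - Arrays in awk are associative, so for the first character of the first line (the 'A') this becomes count[1,"A"]++. awk joins the two subscripts into a single key, 1 SUBSEP "A", where SUBSEP is a built-in separator string. The ++ increases the value stored at that index by 1; an index that does not exist yet starts out as 0.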
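To see what `count[i,$i]++` builds, the sketch below runs only that counting rule and then prints every index it created. Because each key is the string `i SUBSEP $i`, `split(key, parts, SUBSEP)` recovers the column and the letter. The helper name `show_counts` and the two-line input are illustrative only:

```shell
# Hypothetical helper: run only the counting rule, then list every
# (column, letter) index it created together with its count.
show_counts() {
    awk '
    BEGIN { FS = "" }                          # one field per character
    { for (i = 1; i <= NF; i++) count[i, $i]++ }
    END {
        for (key in count) {
            split(key, parts, SUBSEP)          # key is i SUBSEP $i
            print parts[1], parts[2], count[key]
        }
    }' | sort -n    # "for (key in count)" visits keys in no fixed order
}

printf 'AAB\nABB\n' | show_counts
```

This prints `1 A 2`, `2 A 1`, `2 B 1` and `3 B 2`, one per line: column 1 saw two A's, column 2 one A and one B, column 3 two B's.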

if( count[x,letters[y]] > out ) - as per the explanation above, this retrieves the value stored at that array index and compares it with the value of 'out'. The 'letters' array, filled by split("ABCD",letters) in the BEGIN block, is:
Code:
letters[1] = "A"
letters[2] = "B"
letters[3] = "C"
letters[4] = "D"
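With no separator argument, `split` falls back on the current `FS`, and the script has just set `FS` to the empty string, so "ABCD" is broken into single characters (gawk and mawk behave this way). A quick sketch that runs only that part of the `BEGIN` block and prints the result; the wrapper name `show_letters` is hypothetical:

```shell
# Hypothetical wrapper: reproduce the BEGIN block and dump 'letters'.
show_letters() {
    awk 'BEGIN {
        FS = ""                       # as in the script: one field per character
        n = split("ABCD", letters)    # no separator given, so split uses FS
        print n                       # number of elements created
        for (i = 1; i <= n; i++)
            print i, letters[i]
    }'
}

show_letters
```

The output is `4` followed by `1 A`, `2 B`, `3 C` and `4 D`, matching the table above.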
So each test is a comparison against 'out', which is reset to 0 at the start of every column, so the first iteration will be:
Code:
x=1
y=1
count[1, letters[1]] > 0

# which from above would be:

count[1, "A"] > 0
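Continuing the walk-through, the inner loop simply keeps the largest count seen so far. The sketch below replays that loop for one column and prints every step; the helper name `trace_column` is hypothetical, and column 8 is chosen because the sample data holds B, C, C, B, B, - there:

```shell
# Hypothetical helper: replay the END-block loop for column $1 only
# and print how 'out' changes at each step.
trace_column() {
    awk -v col="$1" '
    BEGIN { FS = ""; split("ABCD", letters) }
    { for (i = 1; i <= NF; i++) count[i, $i]++ }
    END {
        out = 0
        for (y = 1; y <= 4; y++) {
            c = count[col, letters[y]] + 0     # missing entries count as 0
            if (c > out)
                out = c
            printf "y=%d letter=%s count=%d out=%d\n", y, letters[y], c, out
        }
        print out
    }'
}

# The first ten columns of the sample data, without the "..." part.
printf '%s\n' AAABCDBBCD AAABDDBCCD AAABCDACCD AAABCCDBCD AA--CCCBCD AAA-CC---D | trace_column 8
```

Here 'out' stays 0 for A, jumps to 3 at B, and is not beaten by C (2) or D (0), so the final line is 3, the eighth value in the expected output.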
Finally, here is a good resource for awk that I use whenever stuck: http://www.gnu.org/software/gawk/man...ode/index.html
 
03-21-2012, 10:15 AM #5
tweed08 (Original Poster; LQ Newbie; Registered: Mar 2012; Posts: 3)
Great - thank you again for help!
 
  


All times are GMT -5.
