LinuxQuestions.org
Support LQ: Use code LQ3 and save $3 on Domain Registration
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Reply
 
Search this Thread
Old 05-08-2013, 05:31 PM   #1
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 103

Rep: Reputation: 15
Shell script to delete amino acid sequences that don't overlap with others


Hi all,

I'm working with alignments of amino acid sequences. My goal is to make a phylogenetic tree for each alignment showing the relationships among the sequences. Here's the problem: some of the sequences are incomplete and the software can't make a tree if two or more sequences don't overlap. I'd like to write a script that looks at each alignment and if two or more sequences don't overlap by at least X number (maybe 20?) A-Z characters, it deletes the shortest sequence in the alignment and re-runs the check.

Here's an example:
Code:
>BARE@m.16041
SSESEGEAGLRQKLDWEWTERLDVTPDNDFAREKMFYDLTMACSKEALARLSSLNYPVTRPDDYFAQMAKSDEHMKKVRQKLLSKQSSIEKSEQAKKIREMRKYGKKVQHEVLQKRQKEKKAV
>PARA@m.359
DSDEELREGLLSCLTLPWVERLDIVSDDDFKREMFIYRQAQAAVLQAIPKLRKVGVATRRPEDYLAQMAKSDQHMVKVRARLLSKQKETENREKAKKLREMRKRGKKIQQEVLAQRQKEKKEN
>PFUC@aug1.0_108.1_50634.t1
LSKRNLQAGMKQKLNLEWIERLDLTNANDFTRELRFYRQAQASVLEGIPRLQALDIQTKRPEDYFAEMAKSDDHMKRVREKLLEKQMSIDRREKVRKQRDLRKYGKKVQQEVLLKRQKEKKER
>LCAL@m.22729
NSDDELQEKMKEKLNLEWIERLDLTNENDFKREMRFYRQAQSSVIQGIERLHKLKILTKRPDDYFAQMAKSDEHMNKVREKLVAKQSSMEKSEKAKKLRELRKFGKKVQQDVLKKRQKDKKE-
>CTOR@m.12274
-------AGLQQCLDLPWVERLDVSSTDDFKREMKFYRQAQGAVLESLPRLQALRIKTSRPSDYLAEMAKSDQHMKKVREKLISKKIAIERSEKAKKLRELRK--------------------
>BARE@m.16042
-----------------------------LRPGKMFYDLTMACSKEALARLSSLNYPVTRPDDYFAQMAKSDEHMKKVRQKLLSKQSSIEKSEQAKKIREMRKYGKKVQHEVLQKRQKEKKAR
>LHYA@m.308
---------MKAKLGLDWVERLDLTSINDFKREMKFYRQAQSSVLEGISKLHALGIPTKRPEDYFAQMAKSDDHMKKIREKLLAKQIGMERSEKAKKFRELRKYGKKVQHEVIQ---------
>MGRO@m.26921
DSDFELQTSMKQCLHLPWVERLDQISEDDFKRELTFYRQAQGTVLKALYELDKLKILTRRPEDYFAQMAKTDIHMKKVREKLMGKKLSMERSEKAKKLRELRKYGKKVQQEVLMNRQKQKKER
>LRUG@m.27419
DSDKELQTGLKQKTTQNWIERLDMTNDNDFQRELKFYRQAQATVLEAIPKLHKLGVLTKRPEDYFAQMAKSDDHMKKVREKLLSKQTAIDRSEKAKKMRELRKYGKKVQRDVLQKRQKEKKDR
>LPEC@m.2341
-------RGLLQKLGLDWVERLDITCPDDFVREKMFYDLAQAAATEGLSRLTELRYPTKRPDDYFAQMAKSDEHMRKVRSKLLSKEAVAERSEQAKKLREMRKFGKQVQQD------------
>TPO2@m.8293
-----------------------------------------------------------------------------VREKLLSKEQALERSEKAKKLRNLRKYGKKVQTEVIQKRHKEKREK
>OVUL@m.15913
-----LQDDMENILNLNWIERLDYTSMNDFQRELCFYKQAQSAVLSLMPKLRTNGIPTKRPDDYFAEMAKSDLHM------------------------------------------------
Dashes represent missing data. In this example the sequences TPO2@m.8293 and OVUL@m.15913 don't overlap. I'd like the script to delete the TPO2@m.8293 sequence and then re-run the check keeping all other sequences because they all overlap with each other by 20+ non gap characters.

Does anyone have any suggestions as to how to go about doing this? It seems like awk is the tool for the job, at least in part, but I'm having a hard time getting started. Any help would be greatly appreciated.

Thanks!
Kevin
 
Old 05-09-2013, 12:40 AM   #2
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,311

Rep: Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040
It seems to me you want the String Fns here http://www.grymoire.com/Unix/Awk.html#uh-40 aka 'AWK Table 10'.
Something like gsub to remove all leading '-', then compare len() of what's left of each. If len() of both are >= 20, then they 'overlap' by at least 20 ?
 
Old 05-09-2013, 04:16 AM   #3
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,627

Rep: Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947
I think chrism01's suggestion is good, however, is it likely that you could have lines with missing data not at the ends?
Code:
>LRUG@m.27419
DSDKELQTGLKQKTTQNWIERLDMTNDNDFQRELKFYRQAQATVL-----LHKLGVLTKRPEDYFAQMAKSDDHMKKVREKLLSKQTAI----------ELRKYGKKVQRDVLQKRQKEKKDR
Is the above possible? This would make the overlapping harder to resolve.

I also assume that overlapping means that characters overlap between lines but are not required to be the same characters
 
Old 05-09-2013, 11:41 AM   #4
Dsw0002
LQ Newbie
 
Registered: Apr 2011
Posts: 4

Rep: Reputation: 0
I'm not sure how efficient this would be, but storing the beginning position and end position of amino acids in each sequence, then comparing those for each line could get you what you want. This would get tricky for sequences like grail mentioned.
 
Old 05-09-2013, 01:07 PM   #5
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269
I think the algorithm needed is more complicated. I'm thinking a character by character comparison of every sequence to every other other sequence, counting the matches, sorting them by matches and then removing the one with the least number of matches ... ?
 
Old 05-09-2013, 02:09 PM   #6
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 4,139

Rep: Reputation: 638Reputation: 638Reputation: 638Reputation: 638Reputation: 638Reputation: 638
seems more practical in c:
Code:
for(i=0; i < strlen(string1); i++)
{
 if(string1[i] != '-')
 {
  if(string1[i] == string2[i])
   match++;
 }
}
 
Old 05-10-2013, 05:23 AM   #7
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,627

Rep: Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947
hmmm ... I am not sure that logic would work schneidz as I do not believe the characters that are not dashes are required to match, just that at the point in the string where you have a character
you also have a character in the other string and this continues consecutively for the minimum (20 characters here).

OP ... Is this a correct analogy?
 
Old 05-10-2013, 12:13 PM   #8
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 103

Original Poster
Rep: Reputation: 15
grail, your are correct. Dsw and texmex are thinking along the same lines I was.

To "overlap" by my definition, there have to be A-Z characters in place but they don't have to be the same A-Z characters (the differences / what's in common among sequences is important for the phylogenetic tree reconstruction).

Also, it is possible that there would be gaps in the middle of the sequence here and there.

I'm still thinking about this and trying to source some help from a local friend.

Thanks,
Kevin
 
Old 05-11-2013, 04:53 AM   #9
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,627

Rep: Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947
ok ... its not pretty, but this seems to work with the current test set:
Code:
#!/usr/bin/awk -f

{
    new = $0
    getline current
}

NR > 2{

    j = split(previous,a,"")
    split(current,b,"")

    for(i=1;i<=j;i++)
        if(a[i] != "-" && b[i] != "-")
            if(a[i] == "-" || b[i] == "-")
                cnt=0
            else
                cnt++

    if(cnt > 20)
        print old RT previous

    cnt = 0
}

{
    old = new
    previous = current
}

END{    print old RT previous }
 
Old 05-11-2013, 05:49 AM   #10
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269
Here's my attempt in C, just as an exercise:

PHP Code:
#include <stdio.h>
#include <stdlib.h>

#define MAXID 255
#define MAXSEQ 1000

struct sequence
{
    
char id [MAXID];
    
char seq [MAXSEQ];
    
unsigned int length;
};

int main (int argcchar argv[])
{
    
// parse argc
    
if (!= argc)
    {
        
printf("Usage:\n%s data.txt min_overlaps\nExample:\n%s data.txt 20\n"argv[0], argv[0]);
        return 
1;
    }

    
// ! init
    
int j;
    
char string [MAXSEQ];

    
// init
    
unsigned int i 0;
    
unsigned int k 0;
    
unsigned int numrec 0;
    
unsigned int overlap 0;
    
int state 0;

    
// open file
    
FILE infile fopen (argv[1], "r");
    if (
NULL == infile)
    {
        
fprintf (stderr"ERROR: cannot open %s\n"argv[1]);
        return 
1;
    }

    
// get number of records
    
while (! feof (infile))
    {
        
fscanf (infile"%s"string);
        if (
'>' == string[0])
        {
            
numrec++;
        }
    }
    
rewind (infile);
    
struct sequence seqarray [numrec];

    
// parse
    
while (! feof (infile))
    {
        if (
== state)
        {
            
// looking for id
            
fscanf (infile"%s"seqarray[i].id);
            if (
'>' == seqarray[i].id[0])
            {
                
state 1;
            }
        }
        else if (
== state)
        {
            
// looking for seq
            
fscanf (infile"%s"seqarray[i].seq);
            if (
!= seqarray[i].seq[0])
            {
                
state 0;
                
// get length of current sequence
                
seqarray[i].length 0;
                while (
!= seqarray[i].seq[k])
                {
                    if (
'-' != seqarray[i].seq[k])
                    {
                        
seqarray[i].length++;
                    }
                    
k++;
                }
                
0;
                
// check current sequence to all those before
                
for (i-1>= 0j--)
                {
                    while ((
!= seqarray[i].seq[k]) && (!= seqarray[j].seq[k]))
                    {
                        if ((
seqarray[i].seq[k] == seqarray[j].seq[k]) && ('-' != seqarray[i].seq[k]))
                        {
                            
overlap++;
                        }
                        
k++;
                    }
                    
0;
                    
// delete sequences with overlaps below minimum
                    
if (overlap atoi (argv[2]))
                    {
                        if (
seqarray[i].length seqarray[j].length)
                        {
                            
/*printf ("delete = %s\n", seqarray[j].id);
                            printf ("keep = %s\n", seqarray[i].id);*/
                            
seqarray[j].id[0] = 0;
                        }
                        else if (
seqarray[i].length seqarray[j].length)
                        {
                            
/*printf ("delete = %s\n", seqarray[i].id);
                            printf ("keep = %s\n", seqarray[j].id);*/
                            
seqarray[i].id[0] = 0;
                        }
                    }
                    
overlap 0;
                }
                
i++;
            }
        }
    }

    
// output
    
for (0numreck++)
    {
        if (
!= seqarray[k].id[0])
        {
            
//printf ("%d\n%d\n", k, seqarray[k].length);
            
printf ("%s\n%s\n"seqarray[k].idseqarray[k].seq);
        }
    }

    
fclose (infile);

    return 
0;

The output of './parse data.txt 20' is:
Code:
>BARE@m.16041
SSESEGEAGLRQKLDWEWTERLDVTPDNDFAREKMFYDLTMACSKEALARLSSLNYPVTRPDDYFAQMAKSDEHMKKVRQKLLSKQSSIEKSEQAKKIREMRKYGKKVQHEVLQKRQKEKKAV
>PARA@m.359
DSDEELREGLLSCLTLPWVERLDIVSDDDFKREMFIYRQAQAAVLQAIPKLRKVGVATRRPEDYLAQMAKSDQHMVKVRARLLSKQKETENREKAKKLREMRKRGKKIQQEVLAQRQKEKKEN
>PFUC@aug1.0_108.1_50634.t1
LSKRNLQAGMKQKLNLEWIERLDLTNANDFTRELRFYRQAQASVLEGIPRLQALDIQTKRPEDYFAEMAKSDDHMKRVREKLLEKQMSIDRREKVRKQRDLRKYGKKVQQEVLLKRQKEKKER
>LCAL@m.22729
NSDDELQEKMKEKLNLEWIERLDLTNENDFKREMRFYRQAQSSVIQGIERLHKLKILTKRPDDYFAQMAKSDEHMNKVREKLVAKQSSMEKSEKAKKLRELRKFGKKVQQDVLKKRQKDKKE-
>CTOR@m.12274
-------AGLQQCLDLPWVERLDVSSTDDFKREMKFYRQAQGAVLESLPRLQALRIKTSRPSDYLAEMAKSDQHMKKVREKLISKKIAIERSEKAKKLRELRK--------------------
>BARE@m.16042
-----------------------------LRPGKMFYDLTMACSKEALARLSSLNYPVTRPDDYFAQMAKSDEHMKKVRQKLLSKQSSIEKSEQAKKIREMRKYGKKVQHEVLQKRQKEKKAR
>LHYA@m.308
---------MKAKLGLDWVERLDLTSINDFKREMKFYRQAQSSVLEGISKLHALGIPTKRPEDYFAQMAKSDDHMKKIREKLLAKQIGMERSEKAKKFRELRKYGKKVQHEVIQ---------
>MGRO@m.26921
DSDFELQTSMKQCLHLPWVERLDQISEDDFKRELTFYRQAQGTVLKALYELDKLKILTRRPEDYFAQMAKTDIHMKKVREKLMGKKLSMERSEKAKKLRELRKYGKKVQQEVLMNRQKQKKER
>LRUG@m.27419
DSDKELQTGLKQKTTQNWIERLDMTNDNDFQRELKFYRQAQATVLEAIPKLHKLGVLTKRPEDYFAQMAKSDDHMKKVREKLLSKQTAIDRSEKAKKMRELRKYGKKVQRDVLQKRQKEKKDR
>LPEC@m.2341
-------RGLLQKLGLDWVERLDITCPDDFVREKMFYDLAQAAATEGLSRLTELRYPTKRPDDYFAQMAKSDEHMRKVRSKLLSKEAVAERSEQAKKLREMRKFGKQVQQD------------
However, note that it deletes
>TPO2@m.8293 because of OVUL@m.15913
>OVUL@m.15913 because of BARE@m.16042

Hope that is correct. There may be bugs in the code.
 
Old 05-11-2013, 06:08 AM   #11
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,627

Rep: Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947Reputation: 1947
@ H_TeXMeX_H - can you explain the second exclusion? My understanding is that a line is only deleted if it has no overlap with the previous line.

@ OP - just thought of something else, which my code does not allow for, if there are consecutive non-matching lines to the last valid entry, would each line in turn be deleted?
 
Old 05-11-2013, 06:31 AM   #12
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269Reputation: 1269
Quote:
Originally Posted by grail View Post
@ H_TeXMeX_H - can you explain the second exclusion? My understanding is that a line is only deleted if it has no overlap with the previous line.
My take on it is that each line is compared to every other.

The OP should clarify.
 
Old 05-11-2013, 04:52 PM   #13
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 4,139

Rep: Reputation: 638Reputation: 638Reputation: 638Reputation: 638Reputation: 638Reputation: 638
still not understanding. am i rediculously underestimating the scope of this problem again ?:
Code:
for(i=0; i < strlen(string1); i++)
{
 if((string1[i] != '-') && (string2[i] != '-'))
   match++;
}
 
Old 05-13-2013, 03:16 PM   #14
Dsw0002
LQ Newbie
 
Registered: Apr 2011
Posts: 4

Rep: Reputation: 0
Here's my shot at it with Java. Deletes all lines that do not overlap with every other line by at least 20 amino acids, starting with lines with the most non-overlapping lines. If multiple lines have the same number of non-overlapping lines, deletes the line with the fewest amino acids. Not pretty or efficient, but it gets the job done.

Code:
import java.io.*;
import java.util.ArrayList;

public class AlignmentCompare {

	public static void main(String[] args) throws IOException {

		boolean notFinished = true;
		String line1;
		String line2;
		String writeLine;
		int overlaps;
		int badLine;
		int i;
		int j;
		
		//Dr. Steve Brule has an opinion on not providing input files.
		if (args[0] == null) {
			System.out.print("Ever wonder why ice cubes taste so boring? " +
					"It’s cuz you make ‘em outta stupid water, you bimbo! " +
					"Put some fruit juice in there and freeze it into ice " +
					"cubes, and put THAT in your milk. " +
					"Also, gimme a file to run this on, ya dummy!)");
			System.exit(0);
		}
		//Runs until each line overlaps with every other by at least 20.
		while (notFinished) {
			File alignmentFilename = new File(args[0]);
			File tempFilename = new File("myTempFile.txt");
			
			//Reader for iterating through lines to be compared.
			BufferedReader AlignmentReader1 = new BufferedReader(
					new FileReader(alignmentFilename));
			//Reader for going through remaining lines to compare to Reader1 lines.
			BufferedReader AlignmentReader2 = null;

			BufferedWriter AlignmentWriter = new BufferedWriter(new FileWriter(
					tempFilename));

			line1 = null;
			line2 = null;
			writeLine = null;
			overlaps = 0;
			badLine = 0;
			i = 0;
			j = 0;
			
			//ArrayList of ArrayList that holds the line numbers that don't overlaps
			// for each line.
			ArrayList<ArrayList<Integer>> noOverlaps = new ArrayList<ArrayList<Integer>>(
					0);
			//Stores the number of amino acids in each line.
			ArrayList<Integer> lineLengths = new ArrayList<Integer>(0);
			int lineToDelete;

			//While loop for comparisons of 1 line to each other line.
			while ((line1 = AlignmentReader1.readLine()) != null) {
				//ArrayList to keep track of the line numbers that don't overlap with
				// with the line currently being looked at.
				ArrayList<Integer> currentNoOverlaps = new ArrayList<Integer>(0);
				AlignmentReader2 = new BufferedReader(new FileReader(
						alignmentFilename));
				//If the current line is a header, skip it.
				if (line1.charAt(0) == '>') {
					line1 = AlignmentReader1.readLine();
					i++;
				}
				//Line 1 shouldn't be compared to line 1. Move down to the line after
				// the line that Reader1 just read.
				j = i;
				for (int inc = 0; inc < i; inc++) {
					@SuppressWarnings("unused")
					String temp = AlignmentReader2.readLine();

				}
				//Here we use Reader2 to start reading in all lines after Reader1's
				// current line for comparison.
				while ((line2 = AlignmentReader2.readLine()) != null) {
					//Skip headers
					if (line2.charAt(0) == '>') {
						line2 = AlignmentReader2.readLine();
						j++;
					}
					overlaps = 0;
					//Counting overlaps.
					for (int k = 0; k < line1.length() && k < line2.length(); k++) {
						//If both chars at position k aren't gaps, increment.
						if (checkForGaps(line1, k) == -1
								&& checkForGaps(line2, k) == -1) {
							overlaps++;
						}
					}
					//If we don't have more than 20 overlaps, add this line to our list.
					if (overlaps <= 20) {
						currentNoOverlaps.add(j);
					}
					j++;
				}
				//Reader2 has iterated through the whole file. Close reader2.
				AlignmentReader2.close();
				//Get the length of this line in amino acids.
				lineLengths.add(findLength(line1, i));
				//Add this line's results to the ArrayList of ArrayLists.
				noOverlaps.add(currentNoOverlaps);
				i++;
			}
			//Reader1 has iterated through the whole file. Close reader1.
			AlignmentReader1.close();
			
			//To determine which line is the worst and should be deleted, we
			// count the number of lines that don't overlap with it.
			int[] overlapCounts;
			overlapCounts = new int[noOverlaps.size()];

			for (int l = 0; l < noOverlaps.size(); l++) {
				for (int m = 0; m < noOverlaps.get(l).size(); m++) {
					badLine = noOverlaps.get(l).get(m);
					if (badLine != 0) {
						badLine = (badLine + 1) / 2;
						//Since comparisons don't go backwards(1 was recorded as not
						// overlapping with 2, but not vice versa) we need to account
						// for this.
						overlapCounts[badLine - 1]++;
						overlapCounts[l]++;
					}
				}
			}
			
			//Use findLargestAndSmallest to determine which line has the most lines that
			// do not overlaps with it.
			lineToDelete = findLargestAndSmallest(overlapCounts, lineLengths);
			
			//Stop when all lines overlap.
			if (lineToDelete == 0) {
				notFinished = false;
				break;
			}
			//Start readering in file for writing.
			AlignmentReader1 = new BufferedReader(new FileReader(
					alignmentFilename));

				// Find the lineToDelete and don't write it.
				i = 1;
				while ((writeLine = AlignmentReader1.readLine()) != null) {
					//Doubling lineToDelete to account for headers.
					if ((i != (lineToDelete * 2))
							&& ((i + 1) != (lineToDelete * 2))) {
						AlignmentWriter.write(writeLine);
						AlignmentWriter.newLine();
					}
					i++;
				}
			
			AlignmentReader1.close();
			AlignmentWriter.close();
			tempFilename.renameTo(alignmentFilename);
		}
	}

	//Method to check if the current char is a gap.
	public static int checkForGaps(String line, int inc) {
		String badChars = "-X?";
		int check = badChars.indexOf(line.charAt(inc));
		return check;
	}
	
	//Determines which line has the most non-overlapping lines and returns it.
	public static int findLargestAndSmallest(int[] integers,
			ArrayList<Integer> lengths) {
		boolean conflict = false;
		int largestNonOverlaps = 0;
		int smallestLength = 0;
		int position = 0;
		//If two lines have the same number of non-overlapping lines, this ArrayList
		// holds their positions.
		ArrayList<Integer> choices = new ArrayList<Integer>(1);
		
		//Checking which has the most non-overlaps.
		for (int i = 0; i < integers.length; i++) {
			if (integers[i] > largestNonOverlaps) {
				conflict = false;
				choices.clear();
				choices.add(i + 1);
				largestNonOverlaps = integers[i];
				//Save which line has the most non-overlaps.
				position = i + 1;
				//If equal to the current line with most non-overlaps,
				// add to choices without clearing it.
			} else if (integers[i] == largestNonOverlaps) {
				conflict = true;
				choices.add(i + 1);
			}
		}
		//If there are no overlaps, we're done.
		if (largestNonOverlaps == 0) 
			return 0;
		//If two lines have the same amount of non-overlapping lines,
		// find the one with the smallest number of amino acids.
		if (conflict) {
			smallestLength = lengths.get(choices.get(0));
			position = choices.get(0);
			for (int j = 1; j < choices.size(); j++) {
				if (lengths.get((choices.get(j)-1)) < smallestLength) {
					smallestLength = lengths.get((choices.get(j)-1));
					position = choices.get(j);
				}
			}
		}
		//Returns the line number of line with the most non-overlaps/smallest length
		// of amino acids.
		return position;
	}
	
	//Finds the length in amino acids of a line.
	public static int findLength(String line, int position) {
		int length = 0;
		for (int i = 0; i < line.length(); i++) {
			if (checkForGaps(line, i) == -1)
				length++;
		}
		return length;
	}
}

Last edited by Dsw0002; 05-13-2013 at 05:34 PM.
 
Old 05-14-2013, 08:49 PM   #15
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 103

Original Poster
Rep: Reputation: 15
Wow! That's amazing. I compiled it with javac and it does exactly what I wanted. Thanks!!!
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Shell script to delete some lines... cjagdish69 Linux - Newbie 2 09-12-2012 12:10 PM
Shell Scripting escape sequences cmillosh Linux - General 3 10-05-2011 10:29 AM
script to delete files base don size jonabyte Linux - General 8 10-24-2008 11:21 AM
delete mail via shell script kevin_obtiva Programming 1 12-10-2005 10:24 AM
listing sequences of files to the shell jimmyj90 Linux - Newbie 4 10-20-2004 06:18 PM


All times are GMT -5. The time now is 12:40 PM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration