LinuxQuestions.org

ShiGua 12-07-2012 09:05 AM

Need help searching for values in file then adding to line
 
Hello!

I'm currently trying to organize data for some bio research, but I'm not sure how to compare a value to values in a file. I have two arrays: one contains NM numbers and can be referenced as NM[#], and the other contains symbols, SYM[#]. I also have a file that contains an NM number on every other line, with irrelevant information on the lines in between (which I still need to keep). What I need to do is match every NM[#] in my array to the NM number in the file and add :SYM[#] to the end of that line. The problem is that each NM number in the file has a > symbol in front of it (which needs to stay there). For example, I have an array NM that looks like:

{NM_23948375 NM_03948274 NM_39482746 NM_20475839} #except there are about 2 thousand values

and SYM:

{fj48g9sk 2idjf8a0s ajsie9rt skdjie8t} #same amount of values as NM

and the file looks like:

>NM_########
AUGCGCUAGCUGAUGCUGAGCACGAUCGAUCGAAA
>NM_########
AUGUCGUAGCUAGCGUAGCUGUAUCGUGAC

I need to take the first NM number in my NM array and compare it to every other line in the file, ignoring the > in front. When the matching line is found, I need to add :SYM to it, where SYM is the symbol at the same index as the NM number in the array. So take the first NM number, find the line, add the first symbol. Then take the second NM number, match it, add the second symbol, and so on, for a final product that looks like:

>NM_########:SYM
AUGCAGUCGAUCGAUGCUAGUCUACAGCUAUCGGAAA
>NM_########:SYM
AUGCCGUAGCUAGCUACGUACGUGUAGCUGAC

I feel like the process should be relatively simple, but I'm completely new at this and am looking for any help. I'm not even sure how to start.

Here's what I have (forgive all syntax errors, everything I want to do is in there, I just need help translating it to code, file to be edited is called file.fa, I can also take it as an argument and refer to it as $1 if that's easier):

Code:

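#!/bin/bash
# NM and SYM are assumed to be filled in above; file.fa is edited in place (GNU sed).
mapfile -t fileline < file.fa        # fileline[0] holds line 1 of the file
n=$(( $(wc -l < file.fa) / 2 ))      # number of header/sequence pairs

for ((i=0; i<${#NM[@]}; i++)); do
  for ((j=0; j<n; j++)); do
    if [[ ${NM[i]} == "${fileline[2*j]#>}" ]]; then   # header without the >
      sed -i "$((2*j+1))s/.*/>${NM[i]}:${SYM[i]}/" file.fa
    fi
  done
done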

I also have access to perl if that makes things easier. Also, if this is all possible by just using the command line, that'd be simpler for me.

Sorry for the long post and any help is appreciated!

unSpawn 12-07-2012 10:44 AM

Couldn't you just step through both arrays with the same index, grep for the NM number line plus the one line after it, and delete array member 0 (the header) from the result?
Code:

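for ((n=0; n<${#NM[@]}; n++)); do
 # First header starting with this NM number, plus the sequence line after it
 SEQ=($(grep -m1 -A1 "^>${NM[$n]}" file.fa)); unset 'SEQ[0]'  # drop the header
 # Print the header with its symbol appended, then the sequence
 printf '>%s:%s\n%s\n' "${NM[$n]}" "${SYM[$n]}" "${SEQ[*]}"
done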
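
If the output needs to keep the file's own order and leave headers that are not in NM untouched, a single pass with a lookup table might be worth trying instead. This is only a sketch under a few assumptions: bash 4 or newer (for associative arrays), each header line being exactly > followed by the NM number, and NM and SYM being parallel arrays that are already filled in. The function name annotate_fasta is made up for illustration, and the array values are just the samples from the question.

```shell
#!/bin/bash
# Sketch: annotate headers in one pass, keeping the file's own line order.
# annotate_fasta is a hypothetical helper name. NM and SYM are assumed to be
# parallel arrays; the values below are the samples from the question.
# Requires bash 4+ for associative arrays.

NM=(NM_23948375 NM_03948274 NM_39482746 NM_20475839)
SYM=(fj48g9sk 2idjf8a0s ajsie9rt skdjie8t)

annotate_fasta() {
  local file=$1 line id i
  local -A sym_for                 # lookup table: NM number -> symbol
  for ((i = 0; i < ${#NM[@]}; i++)); do
    sym_for[${NM[i]}]=${SYM[i]}
  done

  # The "|| [[ -n $line ]]" part also handles a last line without a newline.
  while IFS= read -r line || [[ -n $line ]]; do
    id=${line#>}                   # header text without the leading >
    if [[ $line == '>'?* && -n ${sym_for[$id]+set} ]]; then
      printf '%s:%s\n' "$line" "${sym_for[$id]}"
    else
      printf '%s\n' "$line"        # sequences and unknown headers pass through
    fi
  done < "$file"
}

# Usage (writes to a new file rather than editing in place):
#   annotate_fasta file.fa > annotated.fa
```

Compared with running grep once per NM number, this reads the file only once, which may matter with a couple of thousand entries.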


