LinuxQuestions.org


kmkocot 12-23-2009 07:12 PM

Scripting question - feed an input file into an if statement line-by-line
 
Hi all,

I am trying to write a script that takes an input file ($FileName) and an intermediate file ($FileName.info) and removes lines from $FileName if the value in $2 of $FileName.info is <75. I can't figure out how to feed only one line of the .info file to the if statement at a time so that it will perceive it as an integer instead of a list. The error I am getting now is ./script.sh: line 6: [: : integer expression expected

Sample input $FileName
Code:

>KOG0001|ABCD|Contig4550
ABCDEFGHIJKLMNOPQRSTUVWXYZ
>KOG0001|EFGH|254409037_GR867771
ABCDEFGHIJKLMNOPQRSTUVWXYZ
>KOG0001|IJKL|233557573
ABCDEFGHIJKLMNOPQRSTUVWXYZ

Sample $FileName.info (values are fabricated)
Code:

Contig4550    97.440582
254409037_GR867771        98.499321
233557573    55.192300

Sample desired output $FileName
Code:

>KOG0001|ABCD|Contig4550
ABCDEFGHIJKLMNOPQRSTUVWXYZ
>KOG0001|EFGH|254409037_GR867771
ABCDEFGHIJKLMNOPQRSTUVWXYZ

Script so far:
Code:

for FileName in *.fa
do
infoalign  -sequence $FileName -outfile $FileName.info
percent_difference=`awk -F " " '{$2}' $FileName.info`
if [ "$percent_difference" -gt 75 ] ; then
seq_to_remove=`awk -F " " '{print $1}' $FileName.info`
sed -i "s/^>.............$seq_to_remove//g,+1n" $FileName
fi
done

Any suggestions on how to fix this would be greatly appreciated.

Thanks!
Kevin

kbp 12-23-2009 07:32 PM

Try removing the double quotes from here:

Code:

if [ "$percent_difference" -gt 75 ] ; then
becomes:

Code:

if [ $percent_difference -gt 75 ] ; then
<edit>Just had a thought... maybe your awk is grabbing something that can't be interpreted as a number, try checking the value of $percent_difference before the 'if' test </edit>

<edit2>Second thought... your values in filename.info appear to be 'real' numbers not integers ... </edit2>

cheers
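
To see what the test is actually receiving, it can help to print the awk output in brackets before the `if`. The sketch below uses a throwaway copy of the sample .info data (the temp file and the variable names `no_print` and `with_print` are illustrative, not from the original script). Without `print`, `awk '{$2}'` emits nothing, and an empty string is what makes `[` report `[: : integer expression expected`. With `print`, awk emits every value at once rather than one per test.

```shell
#!/bin/bash
# Throwaway copy of two lines of the sample .info data from the first post.
info=$(mktemp)
printf '%s\n' 'Contig4550    97.440582' '233557573    55.192300' > "$info"

# As posted: the action {$2} evaluates the field but never prints it.
no_print=$(awk '{$2}' "$info")
# With print: awk emits the second field of every line, newline-separated.
with_print=$(awk '{print $2}' "$info")

printf 'without print: [%s]\n' "$no_print"
printf 'with print: [%s]\n' "$with_print"
rm -f "$info"
```

The brackets make an empty or multi-line value easy to spot.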

gregorian 12-23-2009 07:48 PM

You'll need to use the bc utility for floating-point comparison.
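
As a rough sketch of that suggestion, the loop below reads the .info data one line at a time and hands each value to bc for the comparison. The function name `float_lt`, the awk fallback for systems without bc, and the inline sample data are assumptions added for illustration.

```shell
#!/bin/bash
# float_lt A B: succeed (exit status 0) when A < B, compared as decimals.
float_lt() {
    if command -v bc >/dev/null 2>&1; then
        # bc prints 1 when the relation holds and 0 otherwise.
        [ "$(echo "$1 < $2" | bc -l)" -eq 1 ]
    else
        # Fallback (assumption: awk is available where bc is not).
        awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 < b + 0) }'
    fi
}

below=""
# read splits each line on whitespace into the name and the value.
while read -r tag value
do
    if float_lt "$value" 75; then
        below="$below $tag"
    fi
done <<'EOF'
Contig4550    97.440582
254409037_GR867771        98.499321
233557573    55.192300
EOF

echo "below 75:$below"
```

Because the loop body sees a single value per iteration, the test never receives a list.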

ghostdog74 12-23-2009 08:59 PM

Code:

#!/bin/bash
declare -a arr
while read -r line
do
    set -- $line
    num=$2;tag=$1
    IFS="."; set -- $num
    whole=$1
    if [ "$whole" -lt 75 ];then
        arr+=("$tag")   # parentheses add a new element; without them the text is appended to arr[0]
    fi
    unset IFS
done <"file.info"
exec 4<"file"
while read -r line <&4
do
    case "$line" in
        ">"*)
            flag=0
            IFS="|"
            set -- $line
            for i in ${arr[@]}
            do
                if [ "$i" = "$3" ];then
                    flag=1
                fi
            done
    esac
    [ "$flag" -eq 1 ] && continue
    echo "==>$line"
    read NEXT <&4
    echo "--->$NEXT"
done
exec 4<&-

output
Code:

$ ./shell.sh
>KOG0001|ABCD|Contig4550
ABCDEFGHIJKLMNOPQRSTUVWXYZ
>KOG0001|EFGH|254409037_GR867771
ABCDEFGHIJKLMNOPQRSTUVWXYZ
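
When adapting this approach, the form of the array append matters. In bash, `arr+="$tag"` without parentheses appends text to the string in element 0, whereas `arr+=("$tag")` adds a new element. With a single tag the two behave the same, so a sample with only one match can hide the difference. A small sketch, using the made-up array names `joined` and `listed`:

```shell
#!/bin/bash
declare -a joined listed
for tag in Contig7391 Contig8274 Contig6495
do
    joined+="$tag"      # string append: everything piles up in joined[0]
    listed+=("$tag")    # array append: one element per tag
done

echo "joined has ${#joined[@]} element(s): ${joined[0]}"
echo "listed has ${#listed[@]} element(s): ${listed[*]}"
```

Only the `listed` form can be matched tag by tag in a later `for i in ${arr[@]}` loop.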


kmkocot 12-24-2009 03:09 PM

Thanks all!

ghostdog,

The script works perfectly on the sample input data I posted but when I tried it on real data, it didn't work. Any ideas? The organization of the file (including line breaks in the lines not containing greater-than symbols) is the same.

file
Code:

>KOG0022|NVEC|245141_1470034

---------------------------------MADLVNSERVKLQPATSPLNYSVPHVLSYVYRAATFALRSASRDIGMNIRNNTRLXKTSGQNNSINTNSYTPWPDYMFSGKLRPWPQSPKRKIPDGIDKPEYWETGIPEFEMKSKQSTQIQCLSAKEIEKMRETCKLAREVLDIGAKAVKVGATTDEIDRVVHEACIERKCYPSPLNYHGFPKSCCTSINEVICHGIPDKRPLEDGDIVNLDITVFYNGYHGDLNETFFVGNVADEYKQLVKVTYECLMQAIDIVKPGVRYREVGNVIQKHAQAHGYSVVRSYCGHGINQLFHTAPSVPHYAKNKAIGI-MKP-----GHTFTIEPMISQGTWRDETWPDQWTAVTQDGKRSAQFEQTLLVTETGCEILTIRPEENGAPXLPSPDVILFVSVHRQTIIVADXKIAYKVDFTDEAIENAVKSFFQEILESRLFSSMFWISCCPPICGILDAAERCQNGLLEGCPFFSSDIPRKEALMVPQNSVKPRSRKANVSEINKFSS

>KOG0022|MCAL|Contig7391

-------------------------------------MADTTGKTIRCKAAVMREHKKPMIIENIE-VAPPKAGEIRIKI-------MYSSICHSDENYL---GGSRPWIVDSILGHEGAGIVESVGEGVTDFKAGDHVIPSFMGQCNQC---------------RTCKSGKTNVCEVLKGEHYLKGGMLDGT-VRFSCNGNPIYHY-LNTSTFSQ--YTVASEWSCVKIDPAAPLDKACLLGCGIATGY-----------------------------------------------GSAINTAKVEPGSV-CAVW-GLGTIGLAVVMGC-RNAGASRIIGIDTNPAKFELGKKFGMTEGVNPKDF-KEPLQDVLLKM-TNGGLDYAFECIGNVKTMKVAFDSVHRCWGETLLIGVAPITDEFVTNPYSVTMGKQVIGSLYGDYKLK-TIS--NLVTEYMNKKLMVDEFVTHKMSLDKINDGFDLLRSGKSLRTVLDMW---------------------------------

>KOG0022|MCAL|Contig8274

--------------------------------QAYQYNGRHSGKVITCKAAVAWESGKPLSIETIE-VAPPKAKEVRVKV-------LYSGVCHSDLSILN---GVVRGRFPIILGHEGSGIVEGVGEGVTDFQAGDHVIPLYMPQCNAC---------------RSCKSGKTNICEEFLGKTHAFGLMTDGT-PRFTCDGKPVYHF-MACSAFSQ--YVVLPHMSVCKIDNTAPLEKVCLLGCGIATGY-----------------------------------------------GAALNTAKVESGST-CAVW-GLGPIGLSAVMGC-KKAGASRIIGVDINPEKFELGKKFGLTEGINPKDY-DKPIQEVLMGM-TNGGVDYTFECIGNVNAMRAAFDSCHKGWGKTIVLGIAPTAEEFSTNPFSFTLGKHILGSVYGEWKGKDDVP--KLIEGYNKKEILLDEFITHTMALERVNEAFDLMRERKSLRTVINLWPDTTVQKSX------------------------

>KOG0022|MCAL|Contig6495

-------------------------------------MDDTVGKSITCKAAVAWESGKPLSIETIE-VAPPKAKEVRVKV-------LYSGVCHSDLSILN---GSVRGRFPIILGHEGSGIVESVGEGVTDFQAGDHVIPLYMPQCNAC---------------RSCKSGKTNICEEFLGKTHAFGLMTDGT-PRFTCDGKPVYHF-MACSAFSQ--YVVLPHMSVCKIDNTAPLEKVCLLGCGIATGY-----------------------------------------------GAALNTAKVESGST-CAVW-GLGPIGLSAVMGC-KKAGASRIIGVDINPEKFELGKKFGLTEGINPKDY-DKPIQEVLMGK-TNGGVDYTFECIGNVNAMRAAFDSCHKGWGKTIVLGIAPTAEEFSTNPFSFTLGKHILGSIYGEWKGKDDVP--KLIEGYNKKEILLDEFITHTMALERVNEAFDLMREGKSLRTVINLWPDTTVQKSX------------------------

>KOG0022|MCAL|Contig7423

-------------------------------------MSETKGKVIQCKAAVCWEPKKPLTIETVE-VAPPRGGEVRVRI-------AYTGICHSDAHIIN---ACISAKFPVILGHEAAGIVESVGDGVTNFEEGDHVMAMFLPECNQC---------------RCCTSGKTGCCEVFMDKNYANGLLMDGT-SRFSIKGKTVYHF-FDTSTFSQ--YTVVPAISLVKINPAAPMEKVCILSCGIATGY-----------------------------------------------GTAVNTAPVTPGSV-CAVW-GCGCIGLACIMGC-KAAGAARIIGIDINPEKIKNAKKFGITEGVNPLDX----------------------------------------------------------------------------------------------------------------------------------------------------------------------

>KOG0022|HMED|Contig3238

--------------------------------HLWSVMEGDCGEVIKCRAAVAWSPKAPLKLETIF-VSPPKEGEVRIKV-------LYTGVCHTDAYTLE--GHDPEGVFPVILGHEGAGVVESVGDGVTEFQPGDHVIPLYIPQCRTC---------------KFCMSSKTNLCQKIR-ETQGKGVMPDGT-SRFKCDDKEIFHF-MGCSTFSE--YTVVAAISLCKVDKAADLQKVCLLGCGISTGY-----------------------------------------------GAVLNNAKVEPGST-CGVW-GMGAVGLAAVVGC-KKAGAKIIYAIDINPKKFELAKRLGATDVLNPNDF-DKPIQQVLIEK-TEGGFDYTFECIGNVQTMRAALESCHKGWGTSVVIGVAASGQEISTRPFQLVTGRTWKGSAFGGWKSKDSVP--KLVDEYLDNSLALDEFITHTMDLDDVNTAFDLMLSGESIRSVVTVAAVX------------------------------

>KOG0022|LGIG|228996_50180

------------------------------------MASDTLSKKIRCKAAVLWEVNTPLVIETIE-VEPPRAGEVRIKI-------LATGVCKTDAYLLD--RVDPSKNYPVILGHEGAGIVESVGEGVTNVAPGDHVVPLYYPQCYQC---------------KFCKNPKTNFCSKVR-ATQMKGVMPDGT-SRFRCNGKKLFHF-MGCSTFSE--YTVVADVSVCKVDSTAPSEKVCLLGCSISTGY-----------------------------------------------GAVVNTAQVESGST-CAVW-GLGAVGLAVIMGC-KIAGAKRIIGVDINSDKFKVAEDFGCTEFINPKDY-DKPIQEVIVEK-TDGGCDYTFECIGSVEAMRASLDACHKGWGVSTILGMTPPGAELTAKPYSIVTGCVWKGSVFGGWKSQDSLP--TLVEEYMQNKLKVDEFVSFTLPLAKINEGFDYMRKGVGIKSVVIFD---------------------------------

>KOG0022|CAP1|166096_100143

-------------------------------------MSDTVGKTITCRAAVAWEAKKPLSQETIE-VAPPKAGEVRIKI-------LHTGVCHTDAYTLE--GFDPEGLFPVVLGHEGAGIVESIGEGVTSVAVGDSVVPLYVPQCKEC---------------KFCKSPKTNLCSKIR-ATQGAGMMPDGT-SRFSCKGKQLFHF-MGCSTFSE--YTVVAEISVCKVNPEAAKDKICLLGCGISTGY-----------------------------------------------GAALNTAKVEKGST-CAVW-GLGAVGLAVIMGC-KKAGASRIIGVDINKDKFQCAKDFGATECINPKDH-ERPMQQVLVEM-TDGGLDYTFECIGNVATMRAALESCHKGWGESIIIGVAAAGQEISTRPFQLVTGRVWKGTAFGGWKSRDSVP--KLVEDYVSGSLNLDPFVTHHRSLDKINETFDMMHSGESIRSIVDF----------------------------------

>KOG0022|CGIG|164588407_AM869158

-------------------------------------MADTTGKVITCQAAVAWEAKKPLTLETVE-VEPPRAGEVRIKV-------LYTGVCHTDAYLLD--GFDPEGAFPIIMGHEGGGVVESVGEGVTSVQPGDHVIPLYIPQCNEC---------------KFCKSPKTNLCGKIR-ATQGKGVMPDGT-SRFKCKGKTLLHF-MGCSTFSQ--YTVVTEISVTKVNPAAPLDKVCLLGCGISTGY-----------------------------------------------GAALNTAKVEPGST-CAIX------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>KOG0022|CGIG|22598900_BQ427315

--------------------------------------ADTAGKVITCQAAVAWEAKKPLTLETVE-VEPPRAGEIRIKV-------LYTGVCHTDAYLLD--GFDPEGAFPIIMGHEGGGVVESVGEGVTSVQPGDHVIPLYIPQCNEC---------------KFCKSPKTNLCGKIR-ATQGKGVMPDGT-SRFKCKGKTLLHF-MGCSTFSQ--YTVVTEISVAKVNPAAPLDKVCLLGCGISTGY-----------------------------------------------GAALNTAKVEPGST-CAIW-GLGAVGLAXAMGC-KEAGAKRIIGVDINPDKFELGKKFGLTEGVNSKDY-NKPIQEVLVGXXQMGGLDYTFECIGKXGMXXNRPKXLSX------------------------------------------------------------------------------------------------------------------------------

>KOG0022|TTUB|160916819_EY444488

-------------------------------------MAETAGKTITCRAAVAWEAKKPLSIETIE-VAPPRAGEVRIKI-------LYSGVCHTDAYTLS--GCDPEGLFPVVLGHEGGGIVESVGEGVTSVQPGDHVIPLYVPQCGEC---------------KFCKNPKTNLCQKIR-VTQGQGLMPDGC-SRFTCNGKTLFHF-MGCSTFSE--YTVVAEISVCKVDVTAGLDKVCLLGCGISTGY-----------------------------------------------GAALNTAGVEAGST-CAVW-GLGAVGLATIMGC-QKAGASRIIGIDINPTKFDIAQQFGATECVNLTEH-SKPISQVLVDL-TX-------------------------------------------------------------------------------------------------------------------------------------------------------

>KOG0022|LGIG|228995_50179

-------------------------------------MSDTVGKTITCKAAVAWEAKKPLVTETIE-VEPPRAGEVRIKI-------LATGVCHTDAYTLD--GFDSEGAFPVVLGHEGGGIVESVGEGVTSVKTGDHVIPLYIPQCYDC---------------KFCTSKKTNLCSKIR-ATQGKGVMPDGT-SRFRCNGKQLFHF-MGCSTFSE--YTVVAEISVSKVDEKAPLEKVCLLGCGISTGY-----------------------------------------------GAALNSAGVESGST-CAVW-GLGAVGLAVIMGC-KKAGATRIIGVDINSAKFKVAEDFGCTEFINPKDY-DKPIQEVIVEK-TDGGCDYTFECIGNIGTMRAALESCHKGWGVSTIIGVAGAGQEISTKPFQLVTGRVWKGSAFGGWKSRDSVP--KLVDEYMRKELKVDEFVSFTLPLDKINEAFDYMHSGKSIRSVVKL----------------------------------

>KOG0022|HDIS|Contig350

------------------------------------MADTGGEDDNLXKQQVAWEAKKPLTMETIEGGTXPRQEKYELKV-------LATGVVPSDAYTLD--GFDPEGLFPVVLGHEGGGIVESVGEGVTTVKPGDHVIPCYIPQCYDC---------------KFCKSTKTNLCSKIR-STQGAGVMPDGT-SRFTCKGKQLYHF-MGTSTFSE--YTVVAEISVAKVDEQAPLDKVCLLGVWYIYRN-----------------------------------------------GAALNSANVEAGST-CAVW-GLGAVGLAVIMGC-KKAGASRIIGIDINP------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>KOG0022|HROB|95088

-------------------------------------MTSCVGKIIECLAAVAWQANEPLKIEKVS-VAPPKAGEVRIKI-------LHSGVCHTDAYTLG--GHDSEGVFPVILGHEGGGIVESVGEGVTEFEIGDHVIPLYIPQCKEC---------------KFCLNPKTNLCQKIR-VTQGRGVMPDGT-SRFTCNGKTIYHF-MGCSTFSE--YTVVASISLCKIPKEADLQKVCLLGCGISTGF-----------------------------------------------GAVFNTAKVEKGSV-CGVW-GIGAVGLAAIMGC-QKAGASKIYAVDINEKKFDLAKKFGATHVINPANY-NKPMQEVLVEM-TDGGFDYTFECIGNVQTMRSALEACHKGWGVSVIIGVAAAGQEISTKPFQLVTGRTWKGTAFGGYKSKDCVP--LLVNKYLGKELEIDDFVTHTMELNDINKAFHLMHTGESIRSVVKMAIDLSQFRYILVDII-------------------

>KOG0022|TTRA|Contig270

-------------------------------------MSQTAGKPIQCRAAVAWEPKKPLVIETIE-VAPPKAGEVRIKV-------LATGVCHTDAYTLD--GFDPEGKFPCVLGHEGGGVVESIGEGVTSLQPGDHVIPLYIPQCRSC---------------KMCNSPKTNLCSKIR-STQGAGVMPDGT-TRFTCKGQQLFHF-MGTSTFTQ--YTVVAEISVAKVSEKAPLDKVCLLGCGISTGY-----------------------------------------------GAALNTAKVEKGST-CAVW-GLGAVGLAVIMGC-KQAGAARIIGVDINADKFDIAKSFGCTEFVNPKEH-EKPIQNVLVEM-TDGGCDFTFECIGNTACMRAALESCHKGWGVSTIIGEQQQAGDF----YEAIPVSDWTGVRHSIX----------------------------------------------------------------------------------------

>KOG0022|ESCO|Contig1298

-------------------------------------CRTQLGKVITCKAAVAWEAKKPLSIETVE-VAPPKAGEVRIKI-------LHTGVCHTDAYTLD--GFDPEGIFPVILGHEGAGIVESIGENVTSVKPGDHVIPLYVPQCREC---------------KFCKNPKTNLCQKIR-VTQGKGVMPDGT-SRFKCQGKDIFHF-MGCSSFSE--YTVVAEISVAKVDMKAPLEKVCLLGCGISTGY-----------------------------------------------GAALNTAKVEKDSI-CAIW-GLGAVGLAVAMGC-KTAGASRVIGIDINNSKFEVAKKFGVNEFINPKEH-DKTIQQVLCEM-TDGGCDFTFECIGNVATMRAALESCHKGWGVSTIIGVAAAGQEISTRPFQLVTGRVWKGTAFGGYKSRDSVP--QLVLDYMSGKLLIDEFITHNMEMEKINEAFDLMHQGKSIRSIVTFEETRLFNQKMEIPPVFYWILX-------------

>KOG0022|MCAL|Contig5669

-------------------------------------MSGTVGQPIQCQAAVAWEAKKPLTMETIE-VMPPRAGEVRIKI-------LYTGVCHTDAYTLD--GHDPEGKFPCVLGHEGGGVVESVGEGVKSVQPGDHVIPLYTPQCYEC---------------KFCKNPKTNLCQKIR-LTQGQGVMPDGS-VRFTCKGKDLFHF-MGCSTFSQ--YTVVAEISVSKVNPKAALEKVCLLGCGISTGY-----------------------------------------------GAALNTAKVEPGSS-CAVW-GLGAVGLAVIMGC-KKAGASRIIGVDINPDKFAIAKEFGMTESFNPKDHPDKTSQQALVEL-TDGGLDYTFECIGNIHTMRAALEACHKGWGVSTIIGVAGSGQEISTRPFQLVTGRVWKGTAFGGWKSRESVP--KLVEEYMNKELKVDEFVSHNVKLDQINEAFDLMHSGKSIRAVVALF---------------------------------

>KOG0022|MCAL|Contig7268

-------------------------------------MSGTVGQPIQCQAAVAWEAKKPLTMETIE-VMPPRAGEVRIKI-------LYTGVCHTDAYTLD--GHDPEGKFPCVLGHEGGGVVESVGEGVKSVQPGDHVIPLYTPQCYEC---------------KFCKNPKTNLCQKIR-STQGQGVMPDGS-VRFTCKGKDLFHF-MGCSTFSQ--YTVVAEISVSKVNPKAALEKVCLLGCGISTGY-----------------------------------------------GAALNTAKVEPGSS-CAVW-GLGAVGLAVIMGC-KKAGASRIIGVDINPDKFAIAKEFGMTESFNPKDHPDKTSQQALVEL-TDGGLDYTFECIGNIHTMRAALEACHKGWGVSTIIGVAGSGQEIFTRPFQLVTGRVWKGTAFGGWKSRESVP--KLVEEYMKKELKVDEFVSHNVKLDQINEAFDLMHSGKSIRAVVAXVLRKSTRKCLIEGNVHSSX---------------

>KOG0022|MGAL|Contig605

-------------------------------------MSETAGKPIQCLAAVAWEPKKPMTMETIE-VMPPRAGEVRIKI-------LYTGVCHTDAYTLD--GHDPEGKFPCVLGHEGGGVVESVGEGVTSVQTGDHVIPLYTPQCYEC---------------KFCKNPKTNLCQKIR-STQGQGVMPDGT-VRFTCKGKDLFHF-MGCSTFSQ--YTVVAEISVSKVNPKAALEKVCLLGCGISTGY-----------------------------------------------GAALNTAGVEPGST-CAVW-GLGAVGLAVIMGC-KKAGASRIIGVDINPDKFAIAKEFGMTESFNPKDHPDKTSQQALVEL-TDGGLDYTFECIGNIHTMRAALEACHKGWGVSTIIGVAGGGQEISTRPFQLVTGRVWKGTAFGGWKSRESVP--KLVEEYLNKELKVDEFVSHNVKLDKINEAFDLMHS--------------------------------------------

>KOG0022|MGAL|Contig608

------------IIDNILPVARNSVHTSKSDKQIEKXMSETAGKPIQCLAAVAWEAKKPMTMETIE-VMPPRAGEVRIKI-------LYTGVCHTDAYTLD--GHDPEGKFPCVLGHEGGGVVESVGEGVTSVQAGDHVIPLYTPQCYEC---------------KFCKNPKTNLCQKIR-TTQGQGVMPDGT-VRFTCKGKDLFHF-MGCSTFSQ--YTVVAEISVSKVNPKAALEKVCLLGCGISTGY-----------------------------------------------GAALNTAGVEPGST-CAVW-GLGAVGLAVIMGR----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>KOG0022|MGAL|Contig36

-------------------------------------MSETAGKPIQCLAAVAWEAKKPMTMETIE-VMPPRAGEVRIKI-------LYTGVCHTDAYTLD--GHDPEGKFPCVLGHEGGGVVESVGEGVTSVQAGDHVIPLYTPQCYEC---------------KFCKNPKTNLCQKIR-TTQGQGVMPDGT-ARFTCKGKDLFHF-MGCSTFSQ--YTVVAEISVSKVNPKAALEKVCLLGCGISTGY-----------------------------------------------GAALNTAGVEPGST-CAVW-GLGAVGLAVIMGC-KKAGASRIIGVDINPDKFAIAKEFGMTESFNPKDHPDKTTQQALVEL-TDGGLDYTFECIGNVHTMRAALEACHKGWGVSTIIGVAGGGQEISTRPFQLVTGRVWKGTAFGGWKSRESVP--KLVEEYLNKELKVDEFVSHNVKLDKINEAFDLMHSGKSIRQLLHCSX--------------------------------

>KOG0022|ACAL|203643576_GD218340

------DSRVHQRSLPVAAGRFLISTSQRKLHQPRRETMSTAGKPISCKAAVAWEAKKPLSIETIE-VAPPRAGEVRIKI-------LATGVCHTDAYTLD--GHDPEGIFPVVLGHEGGGIVESVGEGVKSVAPGDHVIPLYTPQCYEC---------------KFCKNPKTNLCQKIR-VTQGQGLMPDGT-RRFTCKGKELYHF-MGCSTFAE--YTVCAEISVAKVDENAPLDKVCLLGCGISTGY-----------------------------------------------GAALNNAKVEPGSV-CGVW-GLGAVGLAVLMAX-KKAGASKILGIX---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>KOG0022|ACAL|203618754_GD202833

HSRSTPDSRVHQRSLPVAAGRFLISTSQRKLHQPRRETMSTAGKPISCKAAVAWEAKKPLSIETIE-VAPPRAGEVRIKI-------LATGVCHTDAYTLD--GHDPEGIFPVVLGHEGGGIVESVGEGVKSVAPGDHVIPLYTPQCYEC---------------KFCKNPKTNLCQKIR-VTQGQGLMPDGT-RRFTCKGKELYHF-MGCSTFAE--YTVCAEISVAKVDENAPLDKVCLLGCGISTGY-----------------------------------------------GAALNNAS-EPGSV-CGVW-GLGAVGLAXLNGLVRKAGX----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>KOG0022|ACAL|Contig5424

--------KFISGACRLLLELFDLNKXNENFTNLAEKPCLQLENPFHAKLLWHXEAKKPLSIETIE-VAPPRAGEVRIKI-------LATGVCHTDAYTLD--GHDPEGIFPVVLGHEGGGIVESVGEGVKSVAPGDHVIPLYTPQCYEC---------------KFCKNPKTNLCQKIR-VTQGQGLMPDGT-RRFTCKGKELYHF-MGCSTFAE--YTVCAEISVAKVDENAPLDKVCLLGCGISTGY-----------------------------------------------GAALNNAKVEPGSXGVEWW-GLELVGLAGLWAV-ESRG----------------VQDFX--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>KOG0022|IPAR|Contig228

---------------------------------AHVNYHVTQXETIHCLAAVAWGPKKPLSIEEVE-VAPPKAGEVRVKI-------LHTGVCHTDAYTLD--GFDPEGVFPVILGHEGAGIVESVGENVTSVQPGDHVIPLYVPQCYDC---------------KFCKNPKTNLCQKIR-VTQGKGVMPDGT-SRFKCKGQEIFHF-MGCSSFSE--YTVVAEISLAKVDMKAPLDKVCLLGCGISTGY-----------------------------------------------GAALNTAKVEKDSI-CAIW-GLGAAGLAVAMGC-KTGGASRIIGIDINNSKFEKARKFGVTEFVNPK------------------------------------------------------------------------------------------------------------------------------------------------------------------------

file.info
Code:

245141_1470034        91.382767
Contig7391    58.008659
Contig8274    54.621849
Contig6495    53.503185
Contig7423    55.015198
Contig3238    49.361702
228996_50180  48.380131
166096_100143 41.648590
164588407_AM869158        39.784946
22598900_BQ427315        40.489132
160916819_EY444488        36.918606
228995_50179  42.082428
Contig350    45.161289
95088        49.579830
Contig270    41.277641
Contig1298    45.228214
Contig5669    39.393940
Contig7268    41.250000
Contig605    38.137474
Contig608    40.251572
Contig36      38.876888
203643576_GD218340        43.026707
203618754_GD202833        45.238094
Contig5424    50.862068
Contig228    40.483383

Thanks again!
Kevin

Edit: for reasons unknown to me, the spacing in "file" looks different but I think this is a copy/paste error. The file I am using looks fine. I have checked to make sure that it has unix linebreaks (as did the test file).

ghostdog74 12-24-2009 06:58 PM

If you don't tell me what goes wrong, how am I going to know? I ran it with your real data, which you should have provided in the first place, and it runs fine. What errors did you get? Describe them in as much detail as you can.

kmkocot 12-25-2009 01:06 PM

Interesting. I thought the data I provided originally would be a good enough proxy for the real deal, but evidently not. The script did not return any errors, but it didn't remove the first sequence, which has a value >75 in the .info file.

Thanks,
Kevin

catkin 12-25-2009 01:12 PM

Quote:

Originally Posted by kbp (Post 3803008)
Try removing the double quotes from here:

Code:

if [ "$percent_difference" -gt 75 ] ; then
becomes:

Code:

if [ $percent_difference -gt 75 ] ; then

It will work either way but the error messages will be more helpful from the second form if $percent_difference is not a valid number.

kmkocot 01-17-2010 09:59 PM

Me again...

I tried to modify ghostdog74's script to be iterative for all files in a folder but I'm having some trouble. It works on the first file but it makes no changes to the following files. Any ideas what I'm doing wrong?

Code:

#!/bin/bash

for FileName in *.fa
do
infoalign -only -name -change -sequence $FileName -outfile $FileName.info
unset arr
declare -a arr
while read -r line
do
    set -- $line
    num=$2;tag=$1
    IFS="."; set -- $num
    whole=$1
    if [ "$whole" -gt 75 ];then
        arr+=($tag)
    fi
    unset IFS
done < $FileName.info
exec 4< $FileName
while read -r line <&4
do
    case "$line" in
        ">"*)
            flag=0
            IFS="|"
            set -- $line
            for i in ${arr[@]}
            do
                if [ "$i" = "$3" ];then
                    flag=1
                fi
            done
    esac
    [ "$flag" -eq 1 ] && continue
    echo "$line" >> $FileName.trimmed
    read NEXT <&4
    echo "$NEXT" >> $FileName.trimmed
done
exec 4<&-
done

Thanks!
Kevin

chrism01 01-17-2010 11:32 PM

As I understand it, you are comparing an int to a non-int. The -gt test in bash only handles integers; use bc as advised above.


Also
Code:

Using the [[ ... ]] test construct, rather than [ ... ] can prevent
many logic errors in scripts. For example, the &&, ||, <, and >  operators
work within a [[ ]] test, despite giving an error within a [ ] construct.

http://www.linuxtopia.org/online_boo...ml#DBLBRACKETS
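
A brief sketch of what the quoted passage describes, assuming bash (the `[[ ]]` construct is not part of plain POSIX sh). One caveat: inside `[[ ]]`, `<` and `>` compare strings, not numbers, so integer comparisons still need `-lt`/`-gt`, and decimals still need an external tool. The values below are illustrative.

```shell
#!/bin/bash
whole=91
tag="245141_1470034"

# && works inside [[ ]], so both conditions fit in one test.
if [[ -n $tag && $whole -gt 75 ]]; then
    combined=yes
else
    combined=no
fi

# < inside [[ ]] is a string comparison: "9" sorts after "75".
if [[ 9 < 75 ]]; then
    string_less=yes
else
    string_less=no
fi

# -lt compares the same values as integers.
if [[ 9 -lt 75 ]]; then
    integer_less=yes
else
    integer_less=no
fi

echo "combined=$combined string_less=$string_less integer_less=$integer_less"
```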

kmkocot 01-18-2010 11:49 AM

Thanks Chris. The script ghostdog wrote truncates the decimal number in the second field of the .info file to its integer part before it is passed to the if test, so I didn't think that was the problem. The double vs. single bracket issue is news to me, so I will look into it now. Also, I forgot to mention that I tried kbp and catkin's suggestion of removing the double quotes around the variable in the if statements, but this didn't seem to have an effect.

I played around with it a lot this morning and I seem to have it working and behaving properly now. I tried unsetting all of the variables before the declare statement and that seems to have done the trick.

Code:

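#!/bin/bash
for FileName in *.fa
do
    # Build the .info table for this file
    infoalign -only -name -change -sequence "$FileName" -outfile "$FileName.info"
    # Clear state left over from the previous file, including IFS,
    # which the second loop below leaves set to "|"
    unset arr line num tag whole IFS
    declare -a arr
    # Pass 1: collect the names of sequences whose integer part is greater than 75
    while read -r line
    do
        set -- $line
        num=$2;tag=$1
        # Split on "." so that $1 is the integer part of the value
        IFS="."; set -- $num
        whole=$1
        #echo $FileName
        #echo $whole
        if [ $whole -gt 75 ];then
            arr+=($tag)
        fi
        unset IFS
    done < "$FileName.info"
    # Pass 2: copy the alignment to $FileName.trimmed, skipping flagged sequences
    exec 4< "$FileName"
    while read -r line <&4
    do
        case "$line" in
            ">"*)
                # Header line: the third "|"-separated field is the sequence name
                flag=0
                IFS="|"
                set -- $line
                for i in ${arr[@]}
                do
                    if [ "$i" = "$3" ];then
                        flag=1
                    fi
                done
        esac
        [ $flag -eq 1 ] && continue
        # >> appends, so remove any old .trimmed file before re-running
        echo "$line" >> "$FileName.trimmed"
        read NEXT <&4
        echo "$NEXT" >> "$FileName.trimmed"
    done
    exec 4<&-
done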

Thanks all for your help!
Kevin
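
A likely reason the explicit `unset IFS` matters: the second loop sets `IFS="|"` and never restores it, so the setting carries over into the next file's pass through the .info loop. One way to avoid the leak is to prefix the assignment to `read`, so that it applies to that command only. The sketch below assumes header lines always have three `|`-separated fields. The variable names `kog`, `taxon`, and `seq_id` are made up for the example.

```shell
#!/bin/bash
header='>KOG0022|NVEC|245141_1470034'
info_line='245141_1470034        91.382767'

# IFS is changed for this one read only; the shell's own IFS is untouched.
IFS='|' read -r kog taxon seq_id <<EOF
$header
EOF

# Default whitespace splitting still works on the .info line.
read -r tag num <<EOF
$info_line
EOF

# Strip the fractional part with parameter expansion instead of IFS=".".
whole=${num%%.*}

echo "seq_id=$seq_id tag=$tag whole=$whole"
```

With this form, nothing needs to be unset between files, because the global IFS never changes.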

