LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
12-23-2009, 08:12 PM   #1
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 122

Rep: Reputation: 15
Scripting question - feed an input file into an if statement line-by-line


Hi all,

I am trying to write a script that takes an input file ($FileName) and an intermediate file ($FileName.info) and removes lines from $FileName if the value in $2 of $FileName.info is <75. I can't figure out how to feed only one line of the .info file to the if statement at a time so that it will perceive it as an integer instead of a list. The error I am getting now is ./script.sh: line 6: [: : integer expression expected

Sample input $FileName
Code:
>KOG0001|ABCD|Contig4550
ABCDEFGHIJKLMNOPQRSTUVWXYZ
>KOG0001|EFGH|254409037_GR867771
ABCDEFGHIJKLMNOPQRSTUVWXYZ
>KOG0001|IJKL|233557573
ABCDEFGHIJKLMNOPQRSTUVWXYZ
Sample $FileName.info (values are fabricated)
Code:
Contig4550    97.440582
254409037_GR867771	98.499321
233557573     55.192300
Sample desired output $FileName
Code:
>KOG0001|ABCD|Contig4550
ABCDEFGHIJKLMNOPQRSTUVWXYZ
>KOG0001|EFGH|254409037_GR867771
ABCDEFGHIJKLMNOPQRSTUVWXYZ
Script so far:
Code:
for FileName in *.fa
do
infoalign  -sequence $FileName -outfile $FileName.info
percent_difference=`awk -F " " '{$2}' $FileName.info`
if [ "$percent_difference" -gt 75 ] ; then
seq_to_remove=`awk -F " " '{print $1}' $FileName.info`
sed -i "s/^>.............$seq_to_remove//g,+1n" $FileName
fi
done
Any suggestions on how to fix this would be greatly appreciated.

Thanks!
Kevin
 
12-23-2009, 08:32 PM   #2
kbp
Senior Member
 
Registered: Aug 2009
Posts: 3,790

Rep: Reputation: 650
Try removing the double quotes from here:

Code:
if [ "$percent_difference" -gt 75 ] ; then
becomes:

Code:
if [ $percent_difference -gt 75 ] ; then
<edit>Just had a thought... maybe your awk is grabbing something that can't be interpreted as a number, try checking the value of $percent_difference before the 'if' test </edit>

<edit2>Second thought... your values in filename.info appear to be 'real' numbers not integers ... </edit2>

cheers

Last edited by kbp; 12-23-2009 at 08:36 PM.
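A note on what kbp's edits point at: `awk '{$2}'` has no `print`, so it outputs nothing and the test receives an empty string, which matches the `[: : integer expression expected` error in the first post. Adding `print` would still hand `[` the whole column at once. A minimal sketch of reading the .info file one line at a time instead; the function name `tags_over` and the temporary file are made up for illustration, and the decimal part is simply cut off because `[ ]` only compares integers:

```shell
#!/bin/bash
# Hypothetical helper: print the tags in an .info file whose score exceeds a limit.
# The file is read one line at a time, so each test sees a single value.
tags_over() {
    local info_file=$1 limit=$2 tag score
    while read -r tag score _; do
        [ -n "$score" ] || continue          # skip blank or one-column lines
        # ${score%%.*} drops the decimal part before the integer comparison.
        if [ "${score%%.*}" -gt "$limit" ]; then
            printf '%s\n' "$tag"
        fi
    done < "$info_file"
}

# Sample values from the first post, written to a temporary file.
info=$(mktemp)
printf 'Contig4550    97.440582\n254409037_GR867771\t98.499321\n233557573     55.192300\n' > "$info"
tags_over "$info" 75
rm -f "$info"
```

With the sample values this prints the two tags above 75 and leaves out 233557573.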
 
12-23-2009, 08:48 PM   #3
gregorian
Member
 
Registered: Apr 2006
Posts: 509

Rep: Reputation: 34
You'll need to use the bc utility for floating point comparison.
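A minimal sketch of what a bc-based comparison could look like. The function name `float_gt` is hypothetical, and the sketch assumes a bc that prints 1 or 0 for a bare relational expression, as GNU bc does:

```shell
#!/bin/bash
# Hypothetical helper: succeed when $1 is greater than $2, compared as decimals.
# Assumes bc prints 1 for a true comparison and 0 for a false one (GNU bc does).
float_gt() {
    [ "$(echo "$1 > $2" | bc -l)" -eq 1 ]
}

if float_gt 97.440582 75; then
    echo "97.440582 is above 75"
fi
if ! float_gt 55.192300 75; then
    echo "55.192300 is not above 75"
fi
```

Unlike cutting the number at the decimal point, this also handles a threshold such as 75.5.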
 
12-23-2009, 09:59 PM   #4
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Code:
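#!/bin/bash
# Pass 1: collect the tags whose score in file.info is below 75.
declare -a arr
while read -r line
do
    set -- $line
    num=$2;tag=$1
    # [ ] only compares integers, so keep just the part before the "."
    IFS="."; set -- $num
    whole=$1
    if [ "$whole" -lt 75 ];then
        arr+=("$tag")    # append as a new array element
    fi
    unset IFS
done <"file.info"
# Pass 2: copy "file" to stdout, skipping the records collected above.
exec 4<"file"
flag=0
while read -r line <&4
do
    case "$line" in
        ">"*)
            flag=0
            IFS="|"
            set -- $line    # $3 is now the tag from the header
            unset IFS
            for i in "${arr[@]}"
            do
                if [ "$i" = "$3" ];then
                    flag=1
                fi
            done
    esac
    [ "$flag" -eq 1 ] && continue
    echo "$line"
    read -r NEXT <&4
    echo "$NEXT"
done
exec 4<&-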
output
Code:
$ ./shell.sh
>KOG0001|ABCD|Contig4550
ABCDEFGHIJKLMNOPQRSTUVWXYZ
>KOG0001|EFGH|254409037_GR867771
ABCDEFGHIJKLMNOPQRSTUVWXYZ

Last edited by ghostdog74; 12-23-2009 at 10:04 PM.
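For comparison, the same two-pass idea can be sketched with awk doing both passes. This swaps in a different technique from the one posted above, and the function name `drop_below` is made up. It assumes the .info file is not empty and that the tag is the third `|`-separated field of each header:

```shell
#!/bin/bash
# Hypothetical helper: print a FASTA file without the records whose tag
# scores below a limit in the matching .info file.
drop_below() {
    local info_file=$1 fasta_file=$2 limit=$3
    awk -v limit="$limit" '
        NR == FNR {                      # first file: the .info table
            if (NF >= 2 && $2 + 0 < limit) drop[$1] = 1
            next
        }
        /^>/ {                           # header line: decide for the whole record
            split($0, field, "|")
            skip = (field[3] in drop)
        }
        !skip                            # print every line of the records we keep
    ' "$info_file" "$fasta_file"
}

# Sample data from the first post, written to temporary files.
info=$(mktemp); fasta=$(mktemp)
printf 'Contig4550 97.440582\n254409037_GR867771 98.499321\n233557573 55.192300\n' > "$info"
printf '>KOG0001|ABCD|Contig4550\nABCDEFGHIJKLMNOPQRSTUVWXYZ\n>KOG0001|EFGH|254409037_GR867771\nABCDEFGHIJKLMNOPQRSTUVWXYZ\n>KOG0001|IJKL|233557573\nABCDEFGHIJKLMNOPQRSTUVWXYZ\n' > "$fasta"
drop_below "$info" "$fasta" 75
rm -f "$info" "$fasta"
```

Because awk compares the score as a number, no truncation to an integer is needed, and a record may span any number of lines after its header.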
 
12-24-2009, 04:09 PM   #5
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 122

Original Poster
Rep: Reputation: 15
Thanks all!

ghostdog,

The script works perfectly on the sample input data I posted but when I tried it on real data, it didn't work. Any ideas? The organization of the file (including line breaks in the lines not containing greater-than symbols) is the same.

file
Code:
>KOG0022|NVEC|245141_1470034

---------------------------------MADLVNSERVKLQPATSPLNYSVPHVLSYVYRAATFALRSASRDIGMNIRNNTRLXKTSGQNNSINTNSYTPWPDYMFSGKLRPWPQSPKRKIPDGIDKPEYWETGIPEFEMKSKQSTQIQCLSAKEIEKMRETCKLAREVLDIGAKAVKVGATTDEIDRVVHEACIERKCYPSPLNYHGFPKSCCTSINEVICHGIPDKRPLEDGDIVNLDITVFYNGYHGDLNETFFVGNVADEYKQLVKVTYECLMQAIDIVKPGVRYREVGNVIQKHAQAHGYSVVRSYCGHGINQLFHTAPSVPHYAKNKAIGI-MKP-----GHTFTIEPMISQGTWRDETWPDQWTAVTQDGKRSAQFEQTLLVTETGCEILTIRPEENGAPXLPSPDVILFVSVHRQTIIVADXKIAYKVDFTDEAIENAVKSFFQEILESRLFSSMFWISCCPPICGILDAAERCQNGLLEGCPFFSSDIPRKEALMVPQNSVKPRSRKANVSEINKFSS

>KOG0022|MCAL|Contig7391

-------------------------------------MADTTGKTIRCKAAVMREHKKPMIIENIE-VAPPKAGEIRIKI-------MYSSICHSDENYL---GGSRPWIVDSILGHEGAGIVESVGEGVTDFKAGDHVIPSFMGQCNQC---------------RTCKSGKTNVCEVLKGEHYLKGGMLDGT-VRFSCNGNPIYHY-LNTSTFSQ--YTVASEWSCVKIDPAAPLDKACLLGCGIATGY-----------------------------------------------GSAINTAKVEPGSV-CAVW-GLGTIGLAVVMGC-RNAGASRIIGIDTNPAKFELGKKFGMTEGVNPKDF-KEPLQDVLLKM-TNGGLDYAFECIGNVKTMKVAFDSVHRCWGETLLIGVAPITDEFVTNPYSVTMGKQVIGSLYGDYKLK-TIS--NLVTEYMNKKLMVDEFVTHKMSLDKINDGFDLLRSGKSLRTVLDMW---------------------------------

>KOG0022|MCAL|Contig8274

--------------------------------QAYQYNGRHSGKVITCKAAVAWESGKPLSIETIE-VAPPKAKEVRVKV-------LYSGVCHSDLSILN---GVVRGRFPIILGHEGSGIVEGVGEGVTDFQAGDHVIPLYMPQCNAC---------------RSCKSGKTNICEEFLGKTHAFGLMTDGT-PRFTCDGKPVYHF-MACSAFSQ--YVVLPHMSVCKIDNTAPLEKVCLLGCGIATGY-----------------------------------------------GAALNTAKVESGST-CAVW-GLGPIGLSAVMGC-KKAGASRIIGVDINPEKFELGKKFGLTEGINPKDY-DKPIQEVLMGM-TNGGVDYTFECIGNVNAMRAAFDSCHKGWGKTIVLGIAPTAEEFSTNPFSFTLGKHILGSVYGEWKGKDDVP--KLIEGYNKKEILLDEFITHTMALERVNEAFDLMRERKSLRTVINLWPDTTVQKSX------------------------

>KOG0022|MCAL|Contig6495

-------------------------------------MDDTVGKSITCKAAVAWESGKPLSIETIE-VAPPKAKEVRVKV-------LYSGVCHSDLSILN---GSVRGRFPIILGHEGSGIVESVGEGVTDFQAGDHVIPLYMPQCNAC---------------RSCKSGKTNICEEFLGKTHAFGLMTDGT-PRFTCDGKPVYHF-MACSAFSQ--YVVLPHMSVCKIDNTAPLEKVCLLGCGIATGY-----------------------------------------------GAALNTAKVESGST-CAVW-GLGPIGLSAVMGC-KKAGASRIIGVDINPEKFELGKKFGLTEGINPKDY-DKPIQEVLMGK-TNGGVDYTFECIGNVNAMRAAFDSCHKGWGKTIVLGIAPTAEEFSTNPFSFTLGKHILGSIYGEWKGKDDVP--KLIEGYNKKEILLDEFITHTMALERVNEAFDLMREGKSLRTVINLWPDTTVQKSX------------------------

>KOG0022|MCAL|Contig7423

-------------------------------------MSETKGKVIQCKAAVCWEPKKPLTIETVE-VAPPRGGEVRVRI-------AYTGICHSDAHIIN---ACISAKFPVILGHEAAGIVESVGDGVTNFEEGDHVMAMFLPECNQC---------------RCCTSGKTGCCEVFMDKNYANGLLMDGT-SRFSIKGKTVYHF-FDTSTFSQ--YTVVPAISLVKINPAAPMEKVCILSCGIATGY-----------------------------------------------GTAVNTAPVTPGSV-CAVW-GCGCIGLACIMGC-KAAGAARIIGIDINPEKIKNAKKFGITEGVNPLDX----------------------------------------------------------------------------------------------------------------------------------------------------------------------

>KOG0022|HMED|Contig3238

--------------------------------HLWSVMEGDCGEVIKCRAAVAWSPKAPLKLETIF-VSPPKEGEVRIKV-------LYTGVCHTDAYTLE--GHDPEGVFPVILGHEGAGVVESVGDGVTEFQPGDHVIPLYIPQCRTC---------------KFCMSSKTNLCQKIR-ETQGKGVMPDGT-SRFKCDDKEIFHF-MGCSTFSE--YTVVAAISLCKVDKAADLQKVCLLGCGISTGY-----------------------------------------------GAVLNNAKVEPGST-CGVW-GMGAVGLAAVVGC-KKAGAKIIYAIDINPKKFELAKRLGATDVLNPNDF-DKPIQQVLIEK-TEGGFDYTFECIGNVQTMRAALESCHKGWGTSVVIGVAASGQEISTRPFQLVTGRTWKGSAFGGWKSKDSVP--KLVDEYLDNSLALDEFITHTMDLDDVNTAFDLMLSGESIRSVVTVAAVX------------------------------

>KOG0022|LGIG|228996_50180

------------------------------------MASDTLSKKIRCKAAVLWEVNTPLVIETIE-VEPPRAGEVRIKI-------LATGVCKTDAYLLD--RVDPSKNYPVILGHEGAGIVESVGEGVTNVAPGDHVVPLYYPQCYQC---------------KFCKNPKTNFCSKVR-ATQMKGVMPDGT-SRFRCNGKKLFHF-MGCSTFSE--YTVVADVSVCKVDSTAPSEKVCLLGCSISTGY-----------------------------------------------GAVVNTAQVESGST-CAVW-GLGAVGLAVIMGC-KIAGAKRIIGVDINSDKFKVAEDFGCTEFINPKDY-DKPIQEVIVEK-TDGGCDYTFECIGSVEAMRASLDACHKGWGVSTILGMTPPGAELTAKPYSIVTGCVWKGSVFGGWKSQDSLP--TLVEEYMQNKLKVDEFVSFTLPLAKINEGFDYMRKGVGIKSVVIFD---------------------------------

>KOG0022|CAP1|166096_100143

-------------------------------------MSDTVGKTITCRAAVAWEAKKPLSQETIE-VAPPKAGEVRIKI-------LHTGVCHTDAYTLE--GFDPEGLFPVVLGHEGAGIVESIGEGVTSVAVGDSVVPLYVPQCKEC---------------KFCKSPKTNLCSKIR-ATQGAGMMPDGT-SRFSCKGKQLFHF-MGCSTFSE--YTVVAEISVCKVNPEAAKDKICLLGCGISTGY-----------------------------------------------GAALNTAKVEKGST-CAVW-GLGAVGLAVIMGC-KKAGASRIIGVDINKDKFQCAKDFGATECINPKDH-ERPMQQVLVEM-TDGGLDYTFECIGNVATMRAALESCHKGWGESIIIGVAAAGQEISTRPFQLVTGRVWKGTAFGGWKSRDSVP--KLVEDYVSGSLNLDPFVTHHRSLDKINETFDMMHSGESIRSIVDF----------------------------------

>KOG0022|CGIG|164588407_AM869158

-------------------------------------MADTTGKVITCQAAVAWEAKKPLTLETVE-VEPPRAGEVRIKV-------LYTGVCHTDAYLLD--GFDPEGAFPIIMGHEGGGVVESVGEGVTSVQPGDHVIPLYIPQCNEC---------------KFCKSPKTNLCGKIR-ATQGKGVMPDGT-SRFKCKGKTLLHF-MGCSTFSQ--YTVVTEISVTKVNPAAPLDKVCLLGCGISTGY-----------------------------------------------GAALNTAKVEPGST-CAIX------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>KOG0022|CGIG|22598900_BQ427315

--------------------------------------ADTAGKVITCQAAVAWEAKKPLTLETVE-VEPPRAGEIRIKV-------LYTGVCHTDAYLLD--GFDPEGAFPIIMGHEGGGVVESVGEGVTSVQPGDHVIPLYIPQCNEC---------------KFCKSPKTNLCGKIR-ATQGKGVMPDGT-SRFKCKGKTLLHF-MGCSTFSQ--YTVVTEISVAKVNPAAPLDKVCLLGCGISTGY-----------------------------------------------GAALNTAKVEPGST-CAIW-GLGAVGLAXAMGC-KEAGAKRIIGVDINPDKFELGKKFGLTEGVNSKDY-NKPIQEVLVGXXQMGGLDYTFECIGKXGMXXNRPKXLSX------------------------------------------------------------------------------------------------------------------------------

>KOG0022|TTUB|160916819_EY444488

-------------------------------------MAETAGKTITCRAAVAWEAKKPLSIETIE-VAPPRAGEVRIKI-------LYSGVCHTDAYTLS--GCDPEGLFPVVLGHEGGGIVESVGEGVTSVQPGDHVIPLYVPQCGEC---------------KFCKNPKTNLCQKIR-VTQGQGLMPDGC-SRFTCNGKTLFHF-MGCSTFSE--YTVVAEISVCKVDVTAGLDKVCLLGCGISTGY-----------------------------------------------GAALNTAGVEAGST-CAVW-GLGAVGLATIMGC-QKAGASRIIGIDINPTKFDIAQQFGATECVNLTEH-SKPISQVLVDL-TX-------------------------------------------------------------------------------------------------------------------------------------------------------

>KOG0022|LGIG|228995_50179

-------------------------------------MSDTVGKTITCKAAVAWEAKKPLVTETIE-VEPPRAGEVRIKI-------LATGVCHTDAYTLD--GFDSEGAFPVVLGHEGGGIVESVGEGVTSVKTGDHVIPLYIPQCYDC---------------KFCTSKKTNLCSKIR-ATQGKGVMPDGT-SRFRCNGKQLFHF-MGCSTFSE--YTVVAEISVSKVDEKAPLEKVCLLGCGISTGY-----------------------------------------------GAALNSAGVESGST-CAVW-GLGAVGLAVIMGC-KKAGATRIIGVDINSAKFKVAEDFGCTEFINPKDY-DKPIQEVIVEK-TDGGCDYTFECIGNIGTMRAALESCHKGWGVSTIIGVAGAGQEISTKPFQLVTGRVWKGSAFGGWKSRDSVP--KLVDEYMRKELKVDEFVSFTLPLDKINEAFDYMHSGKSIRSVVKL----------------------------------

>KOG0022|HDIS|Contig350

------------------------------------MADTGGEDDNLXKQQVAWEAKKPLTMETIEGGTXPRQEKYELKV-------LATGVVPSDAYTLD--GFDPEGLFPVVLGHEGGGIVESVGEGVTTVKPGDHVIPCYIPQCYDC---------------KFCKSTKTNLCSKIR-STQGAGVMPDGT-SRFTCKGKQLYHF-MGTSTFSE--YTVVAEISVAKVDEQAPLDKVCLLGVWYIYRN-----------------------------------------------GAALNSANVEAGST-CAVW-GLGAVGLAVIMGC-KKAGASRIIGIDINP------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>KOG0022|HROB|95088

-------------------------------------MTSCVGKIIECLAAVAWQANEPLKIEKVS-VAPPKAGEVRIKI-------LHSGVCHTDAYTLG--GHDSEGVFPVILGHEGGGIVESVGEGVTEFEIGDHVIPLYIPQCKEC---------------KFCLNPKTNLCQKIR-VTQGRGVMPDGT-SRFTCNGKTIYHF-MGCSTFSE--YTVVASISLCKIPKEADLQKVCLLGCGISTGF-----------------------------------------------GAVFNTAKVEKGSV-CGVW-GIGAVGLAAIMGC-QKAGASKIYAVDINEKKFDLAKKFGATHVINPANY-NKPMQEVLVEM-TDGGFDYTFECIGNVQTMRSALEACHKGWGVSVIIGVAAAGQEISTKPFQLVTGRTWKGTAFGGYKSKDCVP--LLVNKYLGKELEIDDFVTHTMELNDINKAFHLMHTGESIRSVVKMAIDLSQFRYILVDII-------------------

>KOG0022|TTRA|Contig270

-------------------------------------MSQTAGKPIQCRAAVAWEPKKPLVIETIE-VAPPKAGEVRIKV-------LATGVCHTDAYTLD--GFDPEGKFPCVLGHEGGGVVESIGEGVTSLQPGDHVIPLYIPQCRSC---------------KMCNSPKTNLCSKIR-STQGAGVMPDGT-TRFTCKGQQLFHF-MGTSTFTQ--YTVVAEISVAKVSEKAPLDKVCLLGCGISTGY-----------------------------------------------GAALNTAKVEKGST-CAVW-GLGAVGLAVIMGC-KQAGAARIIGVDINADKFDIAKSFGCTEFVNPKEH-EKPIQNVLVEM-TDGGCDFTFECIGNTACMRAALESCHKGWGVSTIIGEQQQAGDF----YEAIPVSDWTGVRHSIX----------------------------------------------------------------------------------------

>KOG0022|ESCO|Contig1298

-------------------------------------CRTQLGKVITCKAAVAWEAKKPLSIETVE-VAPPKAGEVRIKI-------LHTGVCHTDAYTLD--GFDPEGIFPVILGHEGAGIVESIGENVTSVKPGDHVIPLYVPQCREC---------------KFCKNPKTNLCQKIR-VTQGKGVMPDGT-SRFKCQGKDIFHF-MGCSSFSE--YTVVAEISVAKVDMKAPLEKVCLLGCGISTGY-----------------------------------------------GAALNTAKVEKDSI-CAIW-GLGAVGLAVAMGC-KTAGASRVIGIDINNSKFEVAKKFGVNEFINPKEH-DKTIQQVLCEM-TDGGCDFTFECIGNVATMRAALESCHKGWGVSTIIGVAAAGQEISTRPFQLVTGRVWKGTAFGGYKSRDSVP--QLVLDYMSGKLLIDEFITHNMEMEKINEAFDLMHQGKSIRSIVTFEETRLFNQKMEIPPVFYWILX-------------

>KOG0022|MCAL|Contig5669

-------------------------------------MSGTVGQPIQCQAAVAWEAKKPLTMETIE-VMPPRAGEVRIKI-------LYTGVCHTDAYTLD--GHDPEGKFPCVLGHEGGGVVESVGEGVKSVQPGDHVIPLYTPQCYEC---------------KFCKNPKTNLCQKIR-LTQGQGVMPDGS-VRFTCKGKDLFHF-MGCSTFSQ--YTVVAEISVSKVNPKAALEKVCLLGCGISTGY-----------------------------------------------GAALNTAKVEPGSS-CAVW-GLGAVGLAVIMGC-KKAGASRIIGVDINPDKFAIAKEFGMTESFNPKDHPDKTSQQALVEL-TDGGLDYTFECIGNIHTMRAALEACHKGWGVSTIIGVAGSGQEISTRPFQLVTGRVWKGTAFGGWKSRESVP--KLVEEYMNKELKVDEFVSHNVKLDQINEAFDLMHSGKSIRAVVALF---------------------------------

>KOG0022|MCAL|Contig7268

-------------------------------------MSGTVGQPIQCQAAVAWEAKKPLTMETIE-VMPPRAGEVRIKI-------LYTGVCHTDAYTLD--GHDPEGKFPCVLGHEGGGVVESVGEGVKSVQPGDHVIPLYTPQCYEC---------------KFCKNPKTNLCQKIR-STQGQGVMPDGS-VRFTCKGKDLFHF-MGCSTFSQ--YTVVAEISVSKVNPKAALEKVCLLGCGISTGY-----------------------------------------------GAALNTAKVEPGSS-CAVW-GLGAVGLAVIMGC-KKAGASRIIGVDINPDKFAIAKEFGMTESFNPKDHPDKTSQQALVEL-TDGGLDYTFECIGNIHTMRAALEACHKGWGVSTIIGVAGSGQEIFTRPFQLVTGRVWKGTAFGGWKSRESVP--KLVEEYMKKELKVDEFVSHNVKLDQINEAFDLMHSGKSIRAVVAXVLRKSTRKCLIEGNVHSSX---------------

>KOG0022|MGAL|Contig605

-------------------------------------MSETAGKPIQCLAAVAWEPKKPMTMETIE-VMPPRAGEVRIKI-------LYTGVCHTDAYTLD--GHDPEGKFPCVLGHEGGGVVESVGEGVTSVQTGDHVIPLYTPQCYEC---------------KFCKNPKTNLCQKIR-STQGQGVMPDGT-VRFTCKGKDLFHF-MGCSTFSQ--YTVVAEISVSKVNPKAALEKVCLLGCGISTGY-----------------------------------------------GAALNTAGVEPGST-CAVW-GLGAVGLAVIMGC-KKAGASRIIGVDINPDKFAIAKEFGMTESFNPKDHPDKTSQQALVEL-TDGGLDYTFECIGNIHTMRAALEACHKGWGVSTIIGVAGGGQEISTRPFQLVTGRVWKGTAFGGWKSRESVP--KLVEEYLNKELKVDEFVSHNVKLDKINEAFDLMHS--------------------------------------------

>KOG0022|MGAL|Contig608

------------IIDNILPVARNSVHTSKSDKQIEKXMSETAGKPIQCLAAVAWEAKKPMTMETIE-VMPPRAGEVRIKI-------LYTGVCHTDAYTLD--GHDPEGKFPCVLGHEGGGVVESVGEGVTSVQAGDHVIPLYTPQCYEC---------------KFCKNPKTNLCQKIR-TTQGQGVMPDGT-VRFTCKGKDLFHF-MGCSTFSQ--YTVVAEISVSKVNPKAALEKVCLLGCGISTGY-----------------------------------------------GAALNTAGVEPGST-CAVW-GLGAVGLAVIMGR----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>KOG0022|MGAL|Contig36

-------------------------------------MSETAGKPIQCLAAVAWEAKKPMTMETIE-VMPPRAGEVRIKI-------LYTGVCHTDAYTLD--GHDPEGKFPCVLGHEGGGVVESVGEGVTSVQAGDHVIPLYTPQCYEC---------------KFCKNPKTNLCQKIR-TTQGQGVMPDGT-ARFTCKGKDLFHF-MGCSTFSQ--YTVVAEISVSKVNPKAALEKVCLLGCGISTGY-----------------------------------------------GAALNTAGVEPGST-CAVW-GLGAVGLAVIMGC-KKAGASRIIGVDINPDKFAIAKEFGMTESFNPKDHPDKTTQQALVEL-TDGGLDYTFECIGNVHTMRAALEACHKGWGVSTIIGVAGGGQEISTRPFQLVTGRVWKGTAFGGWKSRESVP--KLVEEYLNKELKVDEFVSHNVKLDKINEAFDLMHSGKSIRQLLHCSX--------------------------------

>KOG0022|ACAL|203643576_GD218340

------DSRVHQRSLPVAAGRFLISTSQRKLHQPRRETMSTAGKPISCKAAVAWEAKKPLSIETIE-VAPPRAGEVRIKI-------LATGVCHTDAYTLD--GHDPEGIFPVVLGHEGGGIVESVGEGVKSVAPGDHVIPLYTPQCYEC---------------KFCKNPKTNLCQKIR-VTQGQGLMPDGT-RRFTCKGKELYHF-MGCSTFAE--YTVCAEISVAKVDENAPLDKVCLLGCGISTGY-----------------------------------------------GAALNNAKVEPGSV-CGVW-GLGAVGLAVLMAX-KKAGASKILGIX---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>KOG0022|ACAL|203618754_GD202833

HSRSTPDSRVHQRSLPVAAGRFLISTSQRKLHQPRRETMSTAGKPISCKAAVAWEAKKPLSIETIE-VAPPRAGEVRIKI-------LATGVCHTDAYTLD--GHDPEGIFPVVLGHEGGGIVESVGEGVKSVAPGDHVIPLYTPQCYEC---------------KFCKNPKTNLCQKIR-VTQGQGLMPDGT-RRFTCKGKELYHF-MGCSTFAE--YTVCAEISVAKVDENAPLDKVCLLGCGISTGY-----------------------------------------------GAALNNAS-EPGSV-CGVW-GLGAVGLAXLNGLVRKAGX----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>KOG0022|ACAL|Contig5424

--------KFISGACRLLLELFDLNKXNENFTNLAEKPCLQLENPFHAKLLWHXEAKKPLSIETIE-VAPPRAGEVRIKI-------LATGVCHTDAYTLD--GHDPEGIFPVVLGHEGGGIVESVGEGVKSVAPGDHVIPLYTPQCYEC---------------KFCKNPKTNLCQKIR-VTQGQGLMPDGT-RRFTCKGKELYHF-MGCSTFAE--YTVCAEISVAKVDENAPLDKVCLLGCGISTGY-----------------------------------------------GAALNNAKVEPGSXGVEWW-GLELVGLAGLWAV-ESRG----------------VQDFX--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>KOG0022|IPAR|Contig228

---------------------------------AHVNYHVTQXETIHCLAAVAWGPKKPLSIEEVE-VAPPKAGEVRVKI-------LHTGVCHTDAYTLD--GFDPEGVFPVILGHEGAGIVESVGENVTSVQPGDHVIPLYVPQCYDC---------------KFCKNPKTNLCQKIR-VTQGKGVMPDGT-SRFKCKGQEIFHF-MGCSSFSE--YTVVAEISLAKVDMKAPLDKVCLLGCGISTGY-----------------------------------------------GAALNTAKVEKDSI-CAIW-GLGAAGLAVAMGC-KTGGASRIIGIDINNSKFEKARKFGVTEFVNPK------------------------------------------------------------------------------------------------------------------------------------------------------------------------
file.info
Code:
245141_1470034	91.382767
Contig7391    58.008659
Contig8274    54.621849
Contig6495    53.503185
Contig7423    55.015198
Contig3238    49.361702
228996_50180  48.380131
166096_100143 41.648590
164588407_AM869158	39.784946
22598900_BQ427315	40.489132
160916819_EY444488	36.918606
228995_50179  42.082428
Contig350     45.161289
95088         49.579830
Contig270     41.277641
Contig1298    45.228214
Contig5669    39.393940
Contig7268    41.250000
Contig605     38.137474
Contig608     40.251572
Contig36      38.876888
203643576_GD218340	43.026707
203618754_GD202833	45.238094
Contig5424    50.862068
Contig228     40.483383
Thanks again!
Kevin

Edit: for reasons unknown to me, the spacing in "file" looks different but I think this is a copy/paste error. The file I am using looks fine. I have checked to make sure that it has unix linebreaks (as did the test file).

Last edited by kmkocot; 12-24-2009 at 04:13 PM.
 
12-24-2009, 07:58 PM   #6
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
If you don't tell me why it doesn't work, how am I going to know? I ran it with your real data, which you should have provided in the first place, and it runs fine. What errors did you get? Describe them as much as you can ...
 
12-25-2009, 02:06 PM   #7
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 122

Original Poster
Rep: Reputation: 15
Interesting. I thought the data I provided originally would be a good enough proxy for the real deal but evidently not. The script did not return any errors but it didn't remove the first sequence which has a value in the .dist file >75.

Thanks,
Kevin
 
12-25-2009, 02:12 PM   #8
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195
Quote:
Originally Posted by kbp
Try removing the double quotes from here:

Code:
if [ "$percent_difference" -gt 75 ] ; then
becomes:

Code:
if [ $percent_difference -gt 75 ] ; then
It will work either way but the error messages will be more helpful from the second form if $percent_difference is not a valid number.
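A small sketch of the two forms with an empty variable, which is what the first script ended up testing. In bash the quoted form reports `integer expression expected` (the message from the first post), while the unquoted form reports `unary operator expected` because the empty variable disappears and `[` only receives `-gt` and `75`:

```shell
#!/bin/bash
percent_difference=""   # empty, as in the first script

# Quoted: [ receives an empty string as its first operand.
status=0
[ "$percent_difference" -gt 75 ] || status=$?
echo "quoted:   exit status $status"

# Unquoted: the empty variable vanishes before [ sees its arguments.
status=0
[ $percent_difference -gt 75 ] || status=$?
echo "unquoted: exit status $status"
```

Both forms fail here, so neither branch of an `if` built on them would be meaningful; printing the variable before the test, as kbp suggested, shows the real problem faster.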
 
01-17-2010, 10:59 PM   #9
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 122

Original Poster
Rep: Reputation: 15
Me again...

I tried to modify ghostdog74's script to be iterative for all files in a folder but I'm having some trouble. It works on the first file but it makes no changes to the following files. Any ideas what I'm doing wrong?

Code:
#!/bin/bash

for FileName in *.fa
do
infoalign -only -name -change -sequence $FileName -outfile $FileName.info
unset arr
declare -a arr
while read -r line
do
    set -- $line
    num=$2;tag=$1
    IFS="."; set -- $num
    whole=$1
    if [ "$whole" -gt 75 ];then
        arr+=($tag)
    fi
    unset IFS
done < $FileName.info
exec 4< $FileName
while read -r line <&4
do
    case "$line" in
        ">"*)
            flag=0
            IFS="|"
            set -- $line
            for i in ${arr[@]}
            do
                if [ "$i" = "$3" ];then
                    flag=1
                fi
            done
    esac
    [ "$flag" -eq 1 ] && continue
    echo "$line" >> $FileName.trimmed
    read NEXT <&4
    echo "$NEXT" >> $FileName.trimmed
done
exec 4<&-
done
Thanks!
Kevin
 
01-18-2010, 12:32 AM   #10
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,241

Rep: Reputation: 2325
As I understand it, you are comparing an int to a non-int. bash cannot compare non-integer values; use bc as advised above.


Also
Code:
Using the [[ ... ]] test construct, rather than [ ... ] can prevent 
many logic errors in scripts. For example, the &&, ||, <, and >  operators 
work within a [[ ]] test, despite giving an error within a [ ] construct.
http://www.linuxtopia.org/online_boo...ml#DBLBRACKETS

Last edited by Tinkster; 01-18-2010 at 12:55 AM. Reason: fixed tags
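A short sketch of the `[[ ... ]]` behaviour described in that quote, with made-up values. Note that `<` inside `[[ ]]` compares strings, so it does not replace bc for decimal numbers:

```shell
#!/bin/bash
whole=""     # empty on purpose, as when a field is missing

# Inside [[ ]] an unquoted empty variable is still one (empty) operand,
# and && can be used inside the brackets to guard the numeric test.
if [[ -n $whole && $whole -gt 75 ]]; then
    echo "above 75"
else
    echo "empty or not above 75"
fi

# < inside [[ ]] compares text, not numbers: "9.5" sorts after "75.0".
if [[ 9.5 < 75.0 ]]; then
    echo "9.5 sorts before 75.0"
else
    echo "9.5 sorts after 75.0"
fi
```

`[[ ]]` is a bash feature, so the script needs to run under bash rather than a plain POSIX sh.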
 
01-18-2010, 12:49 PM   #11
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 122

Original Poster
Rep: Reputation: 15
Thanks Chris. The script ghostdog wrote truncates the decimal number in the second field of the .info file to a whole number before it is passed to if, so I didn't think that was the problem. The double vs. single bracket issue is news to me so I will look into this now. Also, I forgot to mention that I tried kbp and catkin's suggestion of removing the double quotes around the variable in the if statements but this didn't seem to have an effect.

I played around with it a lot this morning and I seem to have it working and behaving properly now. I tried unsetting all of the variables before the declare statement and that seems to have done the trick.

Code:
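#!/bin/bash
for FileName in *.fa
do
# Write the two columns read below (name and value) to $FileName.info.
infoalign -only -name -change -sequence "$FileName" -outfile "$FileName.info"
# Reset state left over from the previous file
# (the header loop below leaves IFS set to "|").
unset arr
unset line
unset num
unset tag
unset whole
unset IFS
declare -a arr
# Pass 1: collect the tags whose value is above 75.
while read -r line
do
    set -- $line
    num=$2;tag=$1
    IFS="."; set -- $num
    whole=$1
    #echo $FileName
    #echo $whole
    if [ $whole -gt 75 ];then
        arr+=($tag)
    fi
    unset IFS
done < "$FileName.info"
# Pass 2: copy every record that was not collected to $FileName.trimmed.
# The output is appended, so remove old .trimmed files before re-running.
exec 4< "$FileName"
while read -r line <&4
do
    case "$line" in
        ">"*)
            flag=0
            IFS="|"
            set -- $line
            for i in ${arr[@]}
            do
                if [ "$i" = "$3" ];then
                    flag=1
                fi
            done
    esac
    [ $flag -eq 1 ] && continue
    echo "$line" >> "$FileName.trimmed"
    read NEXT <&4
    echo "$NEXT" >> "$FileName.trimmed"
done
exec 4<&-
done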
Thanks all for your help!
Kevin
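A likely reason the extra `unset` lines helped: the header loop sets `IFS="|"` and never restores it, so when the next file starts, the first `set -- $line` no longer splits the .info line on whitespace and the first tag is lost. One way to avoid that, sketched here with a hypothetical function name `header_tag`, is to put the IFS assignment in front of `read` so it applies to that one command only:

```shell
#!/bin/bash
# Hypothetical helper: print the third |-separated field of a FASTA header.
# IFS is set as a prefix to read, so the change does not outlive the command.
header_tag() {
    local first second third
    IFS='|' read -r first second third <<< "$1"
    printf '%s\n' "$third"
}

header_tag ">KOG0022|MCAL|Contig7391"

# IFS is untouched afterwards, so whitespace splitting still works.
line="Contig7391    58.008659"
set -- $line
echo "tag=$1 score=$2"
```

With this pattern nothing needs to be unset at the top of the per-file loop apart from the array.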
 
  


All times are GMT -5.
