Old 09-09-2010, 02:37 AM   #1
kvtspavan
LQ Newbie
 
Registered: Sep 2010
Posts: 7

Rep: Reputation: 0
read three files and print output in a new file


Hi, I have three files.

file1: DNA sequences; each sequence begins with a > symbol.

file2: protein sequences; each sequence begins with a > symbol.

file3: the BLAST results for file2; each result begins with Query= .


My problem is that I have to make a report file combining these three, so that the first sequence from file1, the first sequence from file2 and the first result from file3 are printed together in the report file, and so on for all the sequences. Could anyone help me with this?

thanks,
pavan
 
Old 09-09-2010, 02:56 AM   #2
quanta
Member
 
Registered: Aug 2007
Location: Vietnam
Distribution: RedHat based, Debian based, Slackware, Gentoo
Posts: 724

Rep: Reputation: 101
Give us an example.
 
Old 09-09-2010, 03:03 AM   #3
kvtspavan
LQ Newbie
 
Registered: Sep 2010
Posts: 7

Original Poster
Rep: Reputation: 0
file1:
>EIROHcDNALib03-A1-Forward_Primer.ab1 950


GNANATAANTTCAGCTCGACCTATCCCTCCATGCCGGACACCGCCTTTTA

CAAGACCATTGCAAATGATGACTCTGGTGCCATTGTACTCAAACTGAAAA

TCTTTTCCGAGGGGGGACCGCCCCAGGCTCTTCAATCCGTCCACTCAAAA

GCCTGTCTTCGGCCCCATATCAATCCCCCCCATGCTGGCCAAGAGAAAAA

TGTGCATACACCCACTCTGCTGCAGAATCAACCTCCATGTATGAAACAAA

CATGGACATGATGCCCCTAATGCTATGAGACCCGAATCATGGAACAATTC

TCGTCTCAAGAAAACCAAATCTTGTTTCACATGTGTTCTCTCGACTGGCT

TTGCAAGTGATGCTTCCATTCTTTACGACCATCAAAGACATGGTGCCAAC

TGGCTCCTGTCTGGAGAGCATGACTCACTGGCCTCCCTACCTCAAAAAAA

GCGCCACTTTTAATATCGGCCGTGCACTGCTTGGACAAGAAGGAATTTGG

ACATGACATTTATCTTCGCGCTGCTCAACACAAGACATTTCGATGCTGAT

CTAAGACTAGCCGACTANTGCAGTGCTGTTCCATGAAAATTTTGCTCTGA

CCCATATGCCATTGTGATCTGCTTTCGGGCCACCGGGCAGATCTATAACT

TACTAAACATTTTATAGAGATTATTTTCTACACAATGAGCCTCTCTGAAA

GTGATTTAGTAATAGTTTAGTGATGATCACTGGATGCATGATCACTGCTT

CTTTTGCCGTCTAGTCCTANGATGATTAATGATGCTCATCATCATTTTTG

TCCAGCGAATCACATAAAGCCCGACAACATAGCCTCACTATTGCTCGTAG

TCTTGACGATCTGGTCACACTGGTATTTGGTCTATCATTAGCACTTATCG

ATATGAAGTAATCCCCCGCTTAGGCTTCACTGGACGTNNGGTGTACTCTG


>EIROHcDNALib03-A2-Forward_Primer.ab1 950

NNTCNANAGTCAAAACGAAGTGGACTGCTAGCAATATTTGATTACTTTCT

GCAAAAAAGGAGGACTTTCTTTTGAGAAGCAAAAGATTTTGATGCAAAAT

GCTGAATTTTATTCTATTTGATATTGTAAGTTGAGGGACTACCTTGGTTA

TATCTCAAATTAATATGTTCTTAATAAGTAGCATCAACTAGTGTAATTAA

TTTGTTGTAATTTCCGGAGTAACCTGTGCTTAGCTGTGCACTGACGAAAA

GTTATGCTGCCTGTACAAAACTAGCTGGTAAATTTGATGCTGAAATTGTC

TCTTTGTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGCCATTCAGG

CCTCGAGGCCGTTCAGGCTCGACCCGGGGATCCGCGGCCCCTAATCAGTA

CTGACAATAAAAAGATTCTTGTTTTCAAGAACTTGTCATTTGTATAGTTT

TTTTATATTGTAGTTGTTCTATTTTAATCAAATGTTAGCGTGATTTATAT

TTTTTTTCGCCTCGACATCATCTGCCCAGATGCGAAGTTAAGTGCGCAGA

AAGTAATATCATGCGTCAATCGTATGTGAATGCTGGTCGCTATACTGCTG

TCGATTCGATACTAACGCCGCCATCCAGTGTCGAAAACGAGCATGCCCAT

GGGTTAACTGATCAATGCATCCTGCATGGCGCGCCTGATGAGCCTGAACT

GCCCGGGCAAATCAGCTGGACGTCTGCCTGCATTAATGAATCGGCCAACG

CGCGGGGAGAGGCGGTTTGCGTATTGGGCGCTCTTCCGCTTCCTCGCTCA

CTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCAC

TCAAAGGCGGTAATACGGTTATCCACAGAAATCAGGGGATAACGCAGAAA

GAACATGTGAGCAAAAGGCCAGCAAAGGCCAGGAACCGTAAAAGGCCGCG

file2:
>EIROHcDNALib03-A1-Forward_Primer.ab1_1 950
XXXFSSTYPSMPDTAFYKTIANDDSGAIVLKLKIFSEGGPPQALQSVHSKACLRPHINPP
HAGQEKNVHTPTLLQNQPPCMKQTWT*CP*CYETRIMEQFSSQENQILFHMCSLDWLCK*
CFHSLRPSKTWCQLAPVWRA*LTGLPTSKKAPLLISAVHCLDKKEFGHDIYLRAAQHKTF
RC*SKTSRLXQCCSMKILL*PICHCDLLSGHRADL*LTKHFIEIIFYTMSLSESDLVIV*
**SLDA*SLLLLPSSPXMINDAHHHFCPANHIKPDNIASLLLVVLTIWSHWYLVYH*HLS
I*SNPPLRLHWTXGVLX
>EIROHcDNALib03-A2-Forward_Primer.ab1_1 950
XXXSKRSGLLAIFDYFLQKRRTFF*EAKDFDAKC*ILFYLIL*VEGLPWLYLKLICS**V
ASTSVINLL*FPE*PVLSCALTKSYAACTKLAGKFDAEIVSLSKKKKKKKKKKGHSGLEA
VQARPGDPRPLISTDNKKILVFKNLSFV*FFYIVVVLF*SNVSVIYIFFRLDIICPDAKL
SAQKVISCVNRM*MLVAILLSIRY*RRHPVSKTSMPMG*LINASCMARLMSLNCPGKSAG
RLPALMNRPTRGERRFAYWALFRFLAH*LAALGRSAAASGISSLKGGNTVIHRNQGITQK
EHVSKRPAKARNRKRPR
>EIROHcDNALib03-A3-Forward_Primer.ab1_1 950
XXPXXXAAXEGXWITENLLQKKETENTS*CY*SLLVKCSLKELCQQCL*REKKKKKKKKK
KKKKKKKKKKKKKKKKRKKKKKKKRGFSAPGAFRAPPGNPGP*FGFEKKKFWVLKNLFFV
CFFFLGGFLFKQRVGGFFFFFPPHSSGAKAILRAQKGKRGPIRWRGLAVNGGAIKKNPPA
QGQKKNAMV*SKDVPGRGLQSLRVPRQA*KHVRE*AL*PPRENRFMIGVIXSDXCTXXVA
PX*GEIXCXGXDSXRATXITAKIVXTAXXAXXXXPXRXSSIXLXXXYSTVXXXXTXVIXL
XRXPGGTXSXXDCXTXR
>EIROHcDNALib03-A4-Forward_Primer.ab1_1 950
XXXXXFLDQSAVL*PMLTKTHFANLVQLAITETRGTVITFSLVLLVIQYSTDHATHMIWC
TMLTTIDASIHI*CNVKRFSPLSQCRS*QQADAPAKQMGFTSSQMFLSTLNVSQVLKWSI
PAQPIRYL*PRPRNAL*SPQLLKMYFAMPGLTETTATHGTATISFLVSLESMSTTGCVIL
CLWYTIQTMTNVNIKRYFLVFN*KHLQDQDHPIDALGNLMPFISCLMFSNILSVKEVRRF
CTFARRSKPSCRVV*NAELLQMLTE*HFAMDVLKGNYRNPWNCHHFITCATGHPVYDRPC
HPLTLVFDPDHDRCEHX

file3:
Query= EIROHcDNALib03-A1-Forward_Primer.ab1_1 950
Length=317


Score E
Sequences producing significant alignments: (Bits) Value

ref|ZP_03726001.1| hypothetical protein ObacDRAFT_7312 [Opitu... 35.4 9.2

ALIGNMENTS
>ref|ZP_03726001.1| hypothetical protein ObacDRAFT_7312 [Opitutaceae bacterium TAV2]
gb|EEG19982.1| hypothetical protein ObacDRAFT_7312 [Opitutaceae bacterium TAV2]
Length=1014

Score = 35.4 bits (80), Expect = 9.2, Method: Composition-based stats.
Identities = 18/56 (32%), Positives = 28/56 (50%), Gaps = 8/56 (14%)

Query 5 SSTYPSMPDTAFYKTIANDDSGAIVLKLKIFSEGGPPQALQSVHSKACLRPHINPP 60
++T MP F+ +G +V KL ++ G PPQA+ +H A P +N P
Sbjct 965 AATVDGMPGEGFF-------TGQLV-KLTVYGRGLPPQAISQLHQAAARLPFMNSP 1012


Query= EIROHcDNALib03-A2-Forward_Primer.ab1_1 950
Length=317
Score E
Sequences producing significant alignments: (Bits) Value

ref|ZP_06579711.1| conserved hypothetical protein [Streptomyc... 58.9 9e-07
ref|ZP_03104392.1| putative reverse transcriptase [Bacillus c... 56.2 5e-06
gb|ADJ00052.1| chloramphenicol acetyltransferase [Promoter pr... 45.1 0.013

ALIGNMENTS
>ref|ZP_06579711.1| conserved hypothetical protein [Streptomyces ghanaensis ATCC
14672]
gb|ABK60177.1| putative reverse transcriptase [Zingiber officinale]
gb|EFE70172.1| conserved hypothetical protein [Streptomyces ghanaensis ATCC
14672]
Length=49

Score = 58.9 bits (141), Expect = 9e-07, Method: Composition-based stats.
Identities = 31/50 (62%), Positives = 34/50 (68%), Gaps = 9/50 (18%)

Query 226 MARLMSLNCPGKSAGRLP--------ALMNRPTRGERRFAYWALFRFLAH 267
M+ L +NC +A R P ALMNRPTRGERRFAYWALFRFLAH
Sbjct 1 MSELTHINCVALTA-RFPVGKPVVPAALMNRPTRGERRFAYWALFRFLAH 49


>ref|ZP_03104392.1| putative reverse transcriptase [Bacillus cereus W]
ref|ZP_03104401.1| putative reverse transcriptase [Bacillus cereus W]
gb|EDX54357.1| putative reverse transcriptase [Bacillus cereus W]
gb|EDX54365.1| putative reverse transcriptase [Bacillus cereus W]
Length=44

Score = 56.2 bits (134), Expect = 5e-06, Method: Composition-based stats.
Identities = 24/24 (100%), Positives = 24/24 (100%), Gaps = 0/24 (0%)

Query 244 ALMNRPTRGERRFAYWALFRFLAH 267
ALMNRPTRGERRFAYWALFRFLAH
Sbjct 21 ALMNRPTRGERRFAYWALFRFLAH 44


>gb|ADJ00052.1| chloramphenicol acetyltransferase [Promoter probe vector pEvoGlowRed]
gb|ADJ00076.1| chloramphenicol acetyltransferase [Reporter vector pGlowRed]
Length=339

Score = 45.1 bits (105), Expect = 0.013, Method: Compositional matrix adjust.
Identities = 24/43 (55%), Positives = 27/43 (62%), Gaps = 9/43 (20%)

Query 226 MARLMSLNCPGKSAGRLP--------ALMNRPTRGERRFAYWA 260
M+ L +NC +A R P ALMNRPTRGERRFAYWA
Sbjct 257 MSELTYINCVALTA-RFPVGKPVVPAALMNRPTRGERRFAYWA 298


Query= EIROHcDNALib03-A4-Forward_Primer.ab1_1 950
Length=317


Score E
Sequences producing significant alignments: (Bits) Value

ref|XP_002609621.1| hypothetical protein BRAFLDRAFT_87842 [Br... 52.0 1e-04
ref|XP_002161711.1| PREDICTED: similar to AGAP011617-PA [Hydr... 49.7 5e-04
ref|XP_002607228.1| hypothetical protein BRAFLDRAFT_130810 [B... 48.9 0.001
ref|XP_002607229.1| hypothetical protein BRAFLDRAFT_130809 [B... 47.4 0.002
ref|XP_002161295.1| PREDICTED: similar to Os02g0236500, parti... 45.4 0.010
ref|XP_002589771.1| hypothetical protein BRAFLDRAFT_125880 [B... 41.6 0.13
ref|XP_002163238.1| PREDICTED: similar to predicted protein [... 41.2 0.15
ref|XP_002590831.1| hypothetical protein BRAFLDRAFT_125724 [B... 39.7 0.52
ref|XP_001638232.1| predicted protein [Nematostella vectensis... 39.7 0.52
ref|XP_002740150.1| PREDICTED: chitotriosidase-like [Saccoglo... 38.9 0.95

ALIGNMENTS
>ref|XP_002609621.1| hypothetical protein BRAFLDRAFT_87842 [Branchiostoma floridae]
gb|EEN65631.1| hypothetical protein BRAFLDRAFT_87842 [Branchiostoma floridae]
Length=1791

Score = 52.0 bits (123), Expect = 1e-04, Method: Composition-based stats.
Identities = 20/41 (48%), Positives = 24/41 (58%), Gaps = 0/41 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRCE 315
G YRNP +C + C TGHP+Y R C P + P DRCE
Sbjct 1302 GRYRNPADCGSYYECVTGHPLYLRDCAPGNTAYSPVTDRCE 1342


Score = 36.6 bits (83), Expect = 4.1, Method: Composition-based stats.
Identities = 15/38 (39%), Positives = 19/38 (50%), Gaps = 0/38 (0%)

Query 278 RNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRCE 315
R P +C + C TGHP+Y R C + DRCE
Sbjct 1649 RYPADCGRYYECVTGHPLYLRDCAQGGTAYSTVTDRCE 1686


>ref|XP_002161711.1| PREDICTED: similar to AGAP011617-PA [Hydra magnipapillata]
Length=1005

Score = 49.7 bits (117), Expect = 5e-04, Method: Composition-based stats.
Identities = 20/40 (50%), Positives = 28/40 (70%), Gaps = 1/40 (2%)

Query 277 YRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRCEH 316
YRNPWNCH FI+C+ G ++ PC LV+DP ++ CE+
Sbjct 475 YRNPWNCHSFISCSNGIS-HNMPCPVSNLVYDPYNNICEY 513


Score = 48.9 bits (115), Expect = 8e-04, Method: Composition-based stats.
Identities = 21/50 (42%), Positives = 30/50 (60%), Gaps = 1/50 (2%)

Query 267 HFAMDVLKGNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRCEH 316
+F + G YRNPWNCH FI+C+ G ++ C LV+DP + CE+
Sbjct 578 NFCTNKPDGQYRNPWNCHTFISCSNGIS-HNMSCATPVLVYDPYDNLCEY 626


Score = 48.9 bits (115), Expect = 8e-04, Method: Composition-based stats.
Identities = 21/50 (42%), Positives = 30/50 (60%), Gaps = 1/50 (2%)

Query 267 HFAMDVLKGNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRCEH 316
+F + G YRNPWNCH FI+C+ G ++ C LV+DP + CE+
Sbjct 810 NFCTNKPDGQYRNPWNCHTFISCSNGIS-HNMSCATPVLVYDPYDNLCEY 858


Score = 48.9 bits (115), Expect = 0.001, Method: Composition-based stats.
Identities = 21/50 (42%), Positives = 31/50 (62%), Gaps = 1/50 (2%)

Query 267 HFAMDVLKGNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRCEH 316
+F + G YRNPWNCH FI+C+ G ++ C LV+DP ++ CE+
Sbjct 697 NFCIGKPDGQYRNPWNCHSFISCSNGVS-HNMSCPVSNLVYDPYNNICEY 745


Score = 43.9 bits (102), Expect = 0.023, Method: Composition-based stats.
Identities = 20/47 (42%), Positives = 25/47 (53%), Gaps = 1/47 (2%)

Query 268 FAMDVLKGNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
F D G+Y +PWNCH F C G+ Y C LVF+P D+C
Sbjct 85 FCEDRQNGDYTDPWNCHKFFKCNEGYS-YLFDCQLSNLVFNPYTDQC 130


Score = 42.4 bits (98), Expect = 0.071, Method: Composition-based stats.
Identities = 20/52 (38%), Positives = 31/52 (59%), Gaps = 3/52 (5%)

Query 267 HFAMDVLKGNYRNPWNCHHFITCATGHPVYDRPCHPLTLV--FDPDHDRCEH 316
+F + +GN++NPW+CH F+TC G R C ++V +DP D CE+
Sbjct 931 NFCKNRAEGNWQNPWDCHTFLTCH-GQQTTVRNCSAPSVVLNYDPVTDVCEY 981


Score = 41.6 bits (96), Expect = 0.14, Method: Composition-based stats.
Identities = 20/59 (33%), Positives = 29/59 (49%), Gaps = 1/59 (1%)

Query 256 NAELLQMLTEXHFAMDVLKGNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
N + +T +F + G+Y NPWNC ++ C G D PC VF+P+ D C
Sbjct 318 NCRDISTVTTSNFCLLRPDGDYMNPWNCQRYLQCIDG-ATRDYPCLINEFVFNPELDVC 375


Score = 40.0 bits (92), Expect = 0.35, Method: Composition-based stats.
Identities = 18/40 (45%), Positives = 21/40 (52%), Gaps = 1/40 (2%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G+Y NPWNCH F C H Y C VF+P D+C
Sbjct 220 GDYNNPWNCHKFFKCFQ-HYSYLFDCPTTNPVFNPYTDQC 258


>ref|XP_002607228.1| hypothetical protein BRAFLDRAFT_130810 [Branchiostoma floridae]
gb|EEN63238.1| hypothetical protein BRAFLDRAFT_130810 [Branchiostoma floridae]
Length=1831

Score = 48.9 bits (115), Expect = 0.001, Method: Composition-based stats.
Identities = 19/40 (47%), Positives = 24/40 (60%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P +VFDP+ C
Sbjct 1534 GLYADPADCSMYYECVLGHPVYHRPCAPGGVVFDPERQIC 1573


Score = 47.8 bits (112), Expect = 0.002, Method: Composition-based stats.
Identities = 19/40 (47%), Positives = 23/40 (57%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP RC
Sbjct 1388 GMYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPASLRC 1427


Score = 47.4 bits (111), Expect = 0.003, Method: Composition-based stats.
Identities = 19/40 (47%), Positives = 23/40 (57%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP RC
Sbjct 589 GLYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPASLRC 628


Score = 47.4 bits (111), Expect = 0.003, Method: Composition-based stats.
Identities = 19/40 (47%), Positives = 23/40 (57%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP RC
Sbjct 1007 GLYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPASLRC 1046


Score = 47.4 bits (111), Expect = 0.003, Method: Composition-based stats.
Identities = 19/40 (47%), Positives = 23/40 (57%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP RC
Sbjct 1153 GLYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPASLRC 1192


Score = 47.0 bits (110), Expect = 0.003, Method: Composition-based stats.
Identities = 18/40 (45%), Positives = 22/40 (55%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP C
Sbjct 673 GMYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPARQEC 712


Score = 47.0 bits (110), Expect = 0.003, Method: Composition-based stats.
Identities = 18/40 (45%), Positives = 22/40 (55%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP C
Sbjct 1237 GMYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPARQEC 1276


Score = 46.6 bits (109), Expect = 0.004, Method: Composition-based stats.
Identities = 18/40 (45%), Positives = 22/40 (55%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP C
Sbjct 264 GLYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPARQEC 303


Score = 44.7 bits (104), Expect = 0.015, Method: Composition-based stats.
Identities = 18/40 (45%), Positives = 22/40 (55%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP C
Sbjct 1087 GLYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPASLSC 1126


Score = 39.3 bits (90), Expect = 0.66, Method: Composition-based stats.
Identities = 14/28 (50%), Positives = 17/28 (60%), Gaps = 0/28 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHP 302
G Y +P +C + C GHPVY RPC P
Sbjct 187 GMYADPADCSMYYECVLGHPVYHRPCAP 214


>ref|XP_002607229.1| hypothetical protein BRAFLDRAFT_130809 [Branchiostoma floridae]
gb|EEN63239.1| hypothetical protein BRAFLDRAFT_130809 [Branchiostoma floridae]
Length=1234

Score = 47.4 bits (111), Expect = 0.002, Method: Composition-based stats.
Identities = 19/40 (47%), Positives = 23/40 (57%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP RC
Sbjct 273 GMYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPASLRC 312


Score = 47.4 bits (111), Expect = 0.002, Method: Composition-based stats.
Identities = 19/40 (47%), Positives = 23/40 (57%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP RC
Sbjct 356 GMYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPASLRC 395


Score = 47.4 bits (111), Expect = 0.002, Method: Composition-based stats.
Identities = 19/40 (47%), Positives = 23/40 (57%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP RC
Sbjct 773 GMYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPASLRC 812


Score = 47.4 bits (111), Expect = 0.002, Method: Composition-based stats.
Identities = 19/40 (47%), Positives = 23/40 (57%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP RC
Sbjct 857 GMYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPASLRC 896


Score = 47.4 bits (111), Expect = 0.002, Method: Composition-based stats.
Identities = 19/40 (47%), Positives = 23/40 (57%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP RC
Sbjct 941 GMYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPASLRC 980


Score = 47.0 bits (110), Expect = 0.003, Method: Composition-based stats.
Identities = 19/40 (47%), Positives = 23/40 (57%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP RC
Sbjct 187 GLYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPASLRC 226


Score = 46.6 bits (109), Expect = 0.004, Method: Composition-based stats.
Identities = 18/40 (45%), Positives = 22/40 (55%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP C
Sbjct 439 GMYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPARQEC 478


Score = 46.6 bits (109), Expect = 0.004, Method: Composition-based stats.
Identities = 18/40 (45%), Positives = 22/40 (55%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP C
Sbjct 1025 GMYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPARQEC 1064


Score = 46.6 bits (109), Expect = 0.004, Method: Composition-based stats.
Identities = 18/40 (45%), Positives = 22/40 (55%), Gaps = 0/40 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G Y +P +C + C GHPVY RPC P V+DP C
Sbjct 1109 GMYADPADCSMYYECVLGHPVYHRPCAPGGTVYDPARQEC 1148


>ref|XP_002161295.1| PREDICTED: similar to Os02g0236500, partial [Hydra magnipapillata]
Length=931

Score = 45.4 bits (106), Expect = 0.010, Method: Compositional matrix adjust.
Identities = 19/49 (38%), Positives = 28/49 (57%), Gaps = 1/49 (2%)

Query 267 HFAMDVLKGNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRCE 315
+F + +Y NPWNCH FI+C+ G Y+ C L ++P+ D CE
Sbjct 59 NFCLGKPDDDYINPWNCHSFISCSNG-VSYNMSCPEPELFYNPESDSCE 106


Score = 38.1 bits (87), Expect = 1.3, Method: Compositional matrix adjust.
Identities = 16/42 (38%), Positives = 22/42 (52%), Gaps = 2/42 (4%)

Query 274 KGNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRCE 315
+ Y NPWNCH + TC + + + C V+DP D CE
Sbjct 181 RNKYSNPWNCHSYFTCDSF--LQEVACLKREFVYDPYDDYCE 220


>ref|XP_002589771.1| hypothetical protein BRAFLDRAFT_125880 [Branchiostoma floridae]
gb|EEN45782.1| hypothetical protein BRAFLDRAFT_125880 [Branchiostoma floridae]
Length=507

Score = 41.6 bits (96), Expect = 0.13, Method: Compositional matrix adjust.
Identities = 17/42 (40%), Positives = 24/42 (57%), Gaps = 0/42 (0%)

Query 274 KGNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRCE 315
+G Y +P +C + C GHP+Y RPC P VFD + C+
Sbjct 363 EGLYSDPADCSMYYQCVVGHPLYHRPCAPGGTVFDEEDQICD 404


Score = 39.7 bits (91), Expect = 0.49, Method: Compositional matrix adjust.
Identities = 19/45 (42%), Positives = 22/45 (48%), Gaps = 0/45 (0%)

Query 271 DVLKGNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRCE 315
D G Y +P NC + C GHPVY R C P VFD C+
Sbjct 192 DRSPGMYSDPKNCSMYYECVLGHPVYHRACAPGGPVFDEQDHMCD 236


Score = 38.9 bits (89), Expect = 0.76, Method: Compositional matrix adjust.
Identities = 18/41 (43%), Positives = 21/41 (51%), Gaps = 0/41 (0%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRCE 315
G Y +P NC + C GHPVY R C P VFD C+
Sbjct 41 GMYSDPKNCSMYYECVLGHPVYHRACAPGGPVFDEQDHMCD 81


Score = 38.9 bits (89), Expect = 0.85, Method: Compositional matrix adjust.
Identities = 22/74 (29%), Positives = 29/74 (39%), Gaps = 3/74 (4%)

Query 235 KEVRRFCTFARRSKPSCRVVXNAELLQMLTEXHFAMDVLKGNYRNPWNCHHFITCATGHP 294
E C + P C ++ E D G Y + NC + C GHP
Sbjct 74 DEQDHMCDWPENVPPPCGT---QRMVTEAPEPFTCDDKAPGLYADLLNCSMYWECVVGHP 130

Query 295 VYDRPCHPLTLVFD 308
Y+RPC P LVF+
Sbjct 131 AYNRPCAPDGLVFN 144


Score = 38.9 bits (89), Expect = 0.85, Method: Compositional matrix adjust.
Identities = 22/74 (29%), Positives = 29/74 (39%), Gaps = 3/74 (4%)

Query 235 KEVRRFCTFARRSKPSCRVVXNAELLQMLTEXHFAMDVLKGNYRNPWNCHHFITCATGHP 294
E C + P C ++ E D G Y + NC + C GHP
Sbjct 229 DEQDHMCDWPENVPPPCGT---QRMVTEAPEPFTCDDKAPGLYADLLNCSMYWECVVGHP 285

Query 295 VYDRPCHPLTLVFD 308
Y+RPC P LVF+
Sbjct 286 AYNRPCAPDGLVFN 299


>ref|XP_002163238.1| PREDICTED: similar to predicted protein [Hydra magnipapillata]
Length=335

Score = 41.2 bits (95), Expect = 0.15, Method: Compositional matrix adjust.
Identities = 19/40 (47%), Positives = 23/40 (57%), Gaps = 3/40 (7%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
GNY +PWNCH FI C+ G P LVF+P D+C
Sbjct 90 GNYNDPWNCHKFIVCSHGSSY---PYECQKLVFNPYIDQC 126


Score = 35.4 bits (80), Expect = 8.4, Method: Compositional matrix adjust.
Identities = 17/40 (42%), Positives = 22/40 (55%), Gaps = 3/40 (7%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRC 314
G+Y +PWNCH F C + Y C LVF+P D+C
Sbjct 205 GDYNDPWNCHKFFKCFQEYS-YQFDCQ--NLVFNPYTDQC 241


>ref|XP_002590831.1| hypothetical protein BRAFLDRAFT_125724 [Branchiostoma floridae]
gb|EEN46842.1| hypothetical protein BRAFLDRAFT_125724 [Branchiostoma floridae]
Length=327

Score = 39.7 bits (91), Expect = 0.52, Method: Compositional matrix adjust.
Identities = 18/45 (40%), Positives = 24/45 (53%), Gaps = 0/45 (0%)

Query 271 DVLKGNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRCE 315
D G Y + NC + C GHP Y+RPC LV++PD C+
Sbjct 113 DKPAGTYPDVTNCRAYWECVPGHPPYNRPCALQELVYNPDKGVCD 157


>ref|XP_001638232.1| predicted protein [Nematostella vectensis]
gb|EDO46169.1| predicted protein [Nematostella vectensis]
Length=508

Score = 39.7 bits (91), Expect = 0.52, Method: Compositional matrix adjust.
Identities = 21/48 (43%), Positives = 25/48 (52%), Gaps = 2/48 (4%)

Query 268 FAMDVLKGNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRCE 315
F + GNY++ NCH FI C+ GH Y C P FDP RCE
Sbjct 98 FCHEKSDGNYKDSGNCHGFIMCSNGH-TYHMTC-PGQTNFDPAKKRCE 143


Score = 38.1 bits (87), Expect = 1.5, Method: Compositional matrix adjust.
Identities = 20/48 (41%), Positives = 26/48 (54%), Gaps = 2/48 (4%)

Query 268 FAMDVLKGNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDPDHDRCE 315
F + G+Y +P NC+ FITC+ G+ Y R C P L FD CE
Sbjct 319 FCEEKKNGDYADPSNCNGFITCSNGY-AYKRDC-PFNLKFDTKKLECE 364


>ref|XP_002740150.1| PREDICTED: chitotriosidase-like [Saccoglossus kowalevskii]
Length=540

Score = 38.9 bits (89), Expect = 0.95, Method: Compositional matrix adjust.
Identities = 19/35 (54%), Positives = 24/35 (68%), Gaps = 2/35 (5%)

Query 275 GNYRNPWNCHHFITCATGHPVYDRPCHPLTLVFDP 309
G YRNP +C+ +I CA G+ YDR C P T VF+P
Sbjct 493 GLYRNPNDCNKYIQCANGY-RYDRNCGPGT-VFNP 525
 
Old 09-09-2010, 03:14 AM   #4
kvtspavan
LQ Newbie
 
Registered: Sep 2010
Posts: 7

Original Poster
Rep: Reputation: 0
Hi, to make it simple:

file1:
>A
>B
>C
>D ..
file2:
>1
>2
>3
>4 ..
file3:
query=1
query=2
query=3
query=4

report file (this is what I need):

>A
>1
query=1
>B
>2
query=2
>C
>3
query=3
>D
>4
query=4

I hope this is clear.

thanks,
pavan
 
Old 09-09-2010, 03:18 AM   #5
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

Are the original input files exactly as the examples you posted?

- I see single and double empty lines:
Quote:
>EIROHcDNALib03-A1-Forward_Primer.ab1 950


GNANATAANTTCAGCTCGACCTATCCCTCCATGCCGGACACCGCCTTTTA

CAAGACCATTGCAAATGATGACTCTGGTGCCATTGTACTCAAACTGAAAA
versus
Quote:
>EIROHcDNALib03-A2-Forward_Primer.ab1 950

NNTCNANAGTCAAAACGAAGTGGACTGCTAGCAATATTTGATTACTTTCT

GCAAAAAAGGAGGACTTTCTTTTGAGAAGCAAAAGATTTTGATGCAAAAT
- are there lines that are actually continuous but which are folded in your example?
Quote:
>EIROHcDNALib03-A1-Forward_Primer.ab1_1 950
XXXFSSTYPSMPDTAFYKTIANDDSGAIVLKLKIFSEGGPPQALQSVHSKACLRPHINPP
HAGQEKNVHTPTLLQNQPPCMKQTWT*CP*CYETRIMEQFSSQENQILFHMCSLDWLCK*
CFHSLRPSKTWCQLAPVWRA*LTGLPTSKKAPLLISAVHCLDKKEFGHDIYLRAAQHKTF
RC*SKTSRLXQCCSMKILL*PICHCDLLSGHRADL*LTKHFIEIIFYTMSLSESDLVIV*
**SLDA*SLLLLPSSPXMINDAHHHFCPANHIKPDNIASLLLVVLTIWSHWYLVYH*HLS
I*SNPPLRLHWTXGVLX
Are these 2 lines or actually 7 lines as shown?

I'm asking because the examples given do not look uniform (especially true for file3). If we are to come up with a solution we need to be sure these examples given are 100% correct.
 
Old 09-09-2010, 03:25 AM   #6
kvtspavan
LQ Newbie
 
Registered: Sep 2010
Posts: 7

Original Poster
Rep: Reputation: 0
Hi,
These lines are supposed to be continuous, but the output files I got have inconsistent line gaps: in some places there is a single blank line and in others there are two.

Basically I am a biologist and I am not sure how to remove these line gaps.

Please help me.

thanks,
pavan
 
Old 09-09-2010, 03:43 AM   #7
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

Without exact examples it will be hard, if not impossible, to help you.

Could you attach unedited/raw examples of these 3 files? I say attach instead of post to make sure the layout stays intact (no folded lines, see the exact "gaps" etc).
 
Old 09-09-2010, 03:50 AM   #8
kvtspavan
LQ Newbie
 
Registered: Sep 2010
Posts: 7

Original Poster
Rep: Reputation: 0
How do I attach files? I am not able to find an option.
 
Old 09-09-2010, 03:56 AM   #9
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

Use the "Go Advanced" button -> "Additional Options" -> "Attach Files".
 
Old 09-09-2010, 04:05 AM   #10
sumeet inani
Member
 
Registered: Oct 2008
Posts: 908
Blog Entries: 26

Rep: Reputation: 49
First of all, kvtspavan, you can put long content in code tags for better readability.

Next, you can use the vim editor to create the desired file.
$ vim file-with-absolute-path
To empty every line that contains only spaces, run ':%s/^ *$//g' (without the quotes; you will see the command being typed on the bottom line. This is ex mode.)
All of those lines now match ^$
Now you can search for them and record a macro to delete them, as follows:
Run ':%s/^$//gn'
The bottom line reports
x matches on z lines
Create macro 'a', which deletes one such line:
press q then a
('recording' appears at the bottom left)
press 'n' then 'dd', then press 'q'
Now type (z-1)@a
(z-1) is a number that depends on your file; it is one less than z because recording the macro already deleted one line.


You can just as easily take one line from each file and build the desired output in vim, using macros to automate the repetitive task.
NOTE: I haven't made any allowance for lines containing tabs.
I suggest you make a backup of the files before processing them, for safety.
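For large files, the same blank-line clean-up can also be done without opening an editor. The following is only a minimal sketch, assuming a POSIX sed; the helper name strip_blank_lines and the output filename are made up for illustration and do not come from the thread. The [[:space:]] class also matches tabs and carriage returns, so lines that only look empty are removed too. Inside vim, ':g/^\s*$/d' deletes the same lines in one step.

```shell
# strip_blank_lines FILE: print FILE without empty or whitespace-only lines.
# (hypothetical helper name, not from the thread)
strip_blank_lines() {
    # Delete every line that consists solely of whitespace characters.
    sed '/^[[:space:]]*$/d' "$1"
}

# Example: write a cleaned copy and keep the original as a backup.
# strip_blank_lines file1 > file1.clean
```

Writing to a new file rather than editing in place keeps the original available if the result is not what was expected.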

Last edited by sumeet inani; 09-09-2010 at 04:09 AM.
 
Old 09-09-2010, 04:17 AM   #11
kvtspavan
LQ Newbie
 
Registered: Sep 2010
Posts: 7

Original Poster
Rep: Reputation: 0
file1.txt

file2.txt

file3.txt
 
Old 09-09-2010, 04:19 AM   #12
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Rep: Reputation: 757
Hi,

have a look at this post:
http://www.linuxquestions.org/questi...2/#post4085218

Rename the filenames accordingly, of course, and make sure that there is a newline after each 'R filename'.
Hope this helps

[EDIT]
I just had a closer look at the files you attached. It seems that the link I provided probably won't solve your issue. Things are a bit more complicated in your situation.

Last edited by crts; 09-09-2010 at 04:26 AM.
 
Old 09-09-2010, 04:24 AM   #13
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Are we able to assume that the data in the 3 files is sequential per file?

eg. A1 is the first record in all 3 files? Or is it possible for A1 to appear somewhere else in any file other than first?

@crts - I am not sure that method would work, as the records are of varying sizes, so it is not a matter of adding sequential lines from each file (that's if I understand this at all correctly, which is not guaranteed).

Last edited by grail; 09-09-2010 at 04:27 AM.
 
Old 09-09-2010, 04:31 AM   #14
kvtspavan
LQ Newbie
 
Registered: Sep 2010
Posts: 7

Original Poster
Rep: Reputation: 0
Hi,
File 1 and file 2 are in the same order, but in the third file a result can be anywhere in the file. The only way to identify it is by the sequence name.

Ex:
>EIROHcDNALib03-A1-Forward_Primer.ab1_1 950


If the result for a sequence is not available in the third file, the report should print NO SIGNIFICANT RESULTS FOUND
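
The requirement described in this post (pair file1 and file2 by position, look up file3 by sequence name, fall back to a fixed message) could be sketched with awk as below. This is a sketch under assumptions that the thread does not confirm: the function name merge_report is made up; all three files fit in memory; the first word of each file2 header is exactly the first word after Query= in file3; blank lines inside the two sequence files can be dropped.

```shell
# merge_report DNA_FASTA PROTEIN_FASTA BLAST_TXT
# (hypothetical helper name; prints the combined report on standard output)
merge_report() {
    awk '
    {
        sub(/\r$/, "")                       # tolerate DOS line endings
        # Which input is this line from: 1 = DNA, 2 = protein, 3 = BLAST
        src = (FILENAME == ARGV[3]) ? 3 : (FILENAME == ARGV[2]) ? 2 : 1
    }
    src == 3 {
        if (tolower($0) ~ /^query=/) {       # start of a new BLAST result
            id = $0
            sub(/^[^=]*=[ \t]*/, "", id)     # drop "Query=" and padding
            sub(/[ \t].*$/, "", id)          # keep only the sequence name
        }
        if (id != "") blast[id] = blast[id] $0 "\n"
        next
    }
    /^[ \t]*$/ { next }                      # skip blank lines in FASTA files
    /^>/ {
        n[src]++                             # a header starts a new record
        if (src == 2) {
            key = substr($0, 2)
            sub(/[ \t].*$/, "", key)
            pid[n[2]] = key                  # protein name of record n[2]
        }
    }
    n[src] { rec[src, n[src]] = rec[src, n[src]] $0 "\n" }
    END {
        for (i = 1; i <= n[1]; i++) {
            printf "%s%s", rec[1, i], rec[2, i]
            if (pid[i] in blast) printf "%s", blast[pid[i]]
            else print "NO SIGNIFICANT RESULTS FOUND"
        }
    }' "$1" "$2" "$3"
}

# Example: merge_report file1 file2 file3 > report.txt
```

The names are compared as whole strings rather than as patterns, so one sequence name that happens to be a prefix of another is not matched by mistake.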

Last edited by kvtspavan; 09-09-2010 at 04:33 AM.
 
Old 09-09-2010, 06:15 AM   #15
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Rep: Reputation: 757
Hi,

this should get you started:
Code:
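#!/bin/bash
# For each record in file1, print it followed by the records from
# file2 and file3 whose header contains the same sequence name.

# Print the file2 and file3 records whose header matches $key.
print_matches() {
	sed -n "/$key/ {:mark p;n; /^>/ ! b mark}" file2
	sed -n "/$key/ {:mark p;n; /^[Qq]uery=/ ! b mark}" file3
}

key=0
while IFS= read -r line; do
	if [[ ${line:0:1} = '>' ]]; then
		# A new header: first print the matches for the previous record.
		if [[ ${key} != 0 ]]; then
			print_matches
		fi
		# The key is the header text between '>' and the first whitespace.
		key=$(printf '%s\n' "${line}" | sed -r 's/>([^[:space:]]*).*/\1/')
	fi
	printf '%s\n' "${line}"
done < file1
# Print the matches for the last record.
print_matches
The [^[:space:]] class in the key= line stops the key at the first space or tab, so only the sequence name is used for matching.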

IMPORTANT: @OP: The files you attached were in DOS format. I am not sure whether this will be a problem when executing the script. If it is, convert the files to Linux (Unix) format before running the script.
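
One way to do that conversion is to strip the carriage returns with tr. This is a minimal sketch; the helper name to_unix and the output filename are made up for illustration. It removes every carriage return in the file, which is normally what is wanted for plain text. Where the dos2unix utility is installed, it does the same job.

```shell
# to_unix FILE: print FILE with DOS line endings (CR LF) turned into LF.
# (hypothetical helper name, not from the thread)
to_unix() {
    # Delete all carriage-return characters from the input.
    tr -d '\r' < "$1"
}

# Example: to_unix file1 > file1.unix
```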

Last edited by crts; 09-09-2010 at 06:24 AM.
 
  

