LinuxQuestions.org


nouse 12-15-2014 08:29 AM

Command Line: Splitting a txt file according to regular expressions in each line
 
Hi there,

I have the following problem to solve:

I have a huge text file containing sequences like this:

>HH3_5 M02542:50:000000000-ACBUJ:1:2117:9387:24782 orig_bc=CGTAGGCT
TTTTATATATATATATTAGCGCGCG....and so on
>HH4_4 M02542:50:000000000-ACBUJ:1:2117:9387:24783 orig_bc=AGTAGGCG
TTTTAGCCGCTGCTCGTCGCTATATATATATTAGCGCGCG....and so on

I want to split the file by the information in the header, namely the orig_bc value (for example, orig_bc=CGTAGGCT).

In the end I want separate files, one per orig_bc ID. Each file should contain all header lines with that orig_bc ID plus every line that follows each header, up to the next line that begins with ">".


In this example:

File CGTAGGCT
>HH3_5 M02542:50:000000000-ACBUJ:1:2117:9387:24782 orig_bc=CGTAGGCT
TTTTATATATATATATTAGCGCGCG....and so on


File AGTAGGCG
>HH4_4 M02542:50:000000000-ACBUJ:1:2117:9387:24783 orig_bc=AGTAGGCG
TTTTAGCCGCTGCTCGTCGCTATATATATATTAGCGCGCG....and so on

grail 12-15-2014 10:45 AM

Sounds like a plan. What have you done to solve it? I would suggest awk, perl, ruby or a similar scripting language.
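
To make the scripting approach concrete, here is a minimal sketch in Python (not one of the languages named above, but the same idea carries over to awk, perl or ruby). The function name split_by_barcode, the output directory argument and the "unassigned" fallback file are assumptions made for illustration and are not part of the original question. It also assumes the orig_bc value is safe to use as a file name and that the number of distinct IDs is small enough to keep one open file per ID.

```python
import os
import re

# Matches the barcode ID in a header line, e.g. "orig_bc=CGTAGGCT".
BARCODE_RE = re.compile(r"orig_bc=(\S+)")


def split_by_barcode(input_path, output_dir):
    """Write each record to a file named after its orig_bc ID.

    A record is a header line starting with ">" plus every line up to
    the next header. Returns the sorted list of file names written.
    """
    os.makedirs(output_dir, exist_ok=True)
    handles = {}    # barcode ID -> open output file
    current = None  # file receiving the record being read
    try:
        with open(input_path) as src:
            for line in src:
                if line.startswith(">"):
                    match = BARCODE_RE.search(line)
                    # Assumption: headers without orig_bc go to "unassigned".
                    key = match.group(1) if match else "unassigned"
                    if key not in handles:
                        path = os.path.join(output_dir, key)
                        handles[key] = open(path, "w")
                    current = handles[key]
                # Lines before the first header are skipped.
                if current is not None:
                    current.write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

Calling split_by_barcode("reads.fasta", "by_barcode") (both names hypothetical) on the example input would be expected to create by_barcode/CGTAGGCT and by_barcode/AGTAGGCG, each holding its header line and the sequence lines that follow it. The file is read line by line, so a huge input does not need to fit in memory.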


All times are GMT -5.