Command line: splitting a text file according to a regular expression in each line
Hi there,
I have the following problem to solve:
I have a huge text file containing sequences like this:
>HH3_5 M02542:50:000000000-ACBUJ:1:2117:9387:24782 orig_bc=CGTAGGCT
TTTTATATATATATATTAGCGCGCG....and so on
>HH4_4 M02542:50:000000000-ACBUJ:1:2117:9387:24783 orig_bc=AGTAGGCG
TTTTAGCCGCTGCTCGTCGCTATATATATATTAGCGCGCG....and so on
I want to split the file by the barcode in each header line, i.e. the value of orig_bc (e.g. orig_bc=CGTAGGCT).
In the end I want one separate file per orig_bc ID, containing every header line with that ID plus all the lines that follow it, up to the next line that begins with ">".
In this example:
File CGTAGGCT
>HH3_5 M02542:50:000000000-ACBUJ:1:2117:9387:24782 orig_bc=CGTAGGCT
TTTTATATATATATATTAGCGCGCG....and so on
File AGTAGGCG
>HH4_4 M02542:50:000000000-ACBUJ:1:2117:9387:24783 orig_bc=AGTAGGCG
TTTTAGCCGCTGCTCGTCGCTATATATATATTAGCGCGCG....and so on
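A minimal single-pass sketch of this split, written in Python rather than as a shell one-liner. It assumes that every barcode is the run of non-whitespace characters after "orig_bc=", that each output file is named <barcode>.fasta, and that a record whose header has no orig_bc field is skipped along with its sequence lines. The function name split_by_barcode and the file naming are hypothetical choices for illustration.

```python
import re
import sys
from pathlib import Path

# Captures the barcode in a header such as "... orig_bc=CGTAGGCT".
# Assumption: the barcode is everything up to the next whitespace.
BARCODE_RE = re.compile(r"orig_bc=(\S+)")


def split_by_barcode(in_path, out_dir="."):
    """Copy each record into <out_dir>/<barcode>.fasta (hypothetical naming).

    A record is a header line starting with ">" plus every following line
    up to the next header. Returns the sorted list of barcodes written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}   # barcode -> open output file, kept open for the whole pass
    current = None  # file receiving the lines of the record being read
    try:
        with open(in_path) as infile:
            for line in infile:
                if line.startswith(">"):
                    match = BARCODE_RE.search(line)
                    if match is None:
                        # Assumption: records without orig_bc are dropped.
                        current = None
                        continue
                    barcode = match.group(1)
                    if barcode not in handles:
                        handles[barcode] = open(out_dir / f"{barcode}.fasta", "w")
                    current = handles[barcode]
                # Header and sequence lines go to the current record's file.
                if current is not None:
                    current.write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python split_by_barcode.py input.fasta [output_dir]
    target = sys.argv[2] if len(sys.argv) > 2 else "."
    print(split_by_barcode(sys.argv[1], target))
```

Because every output file stays open until the input has been read once, the huge file is never loaded into memory and never re-read. The trade-off is one open file per distinct barcode, so a run with thousands of barcodes may hit the operating system's open-file limit.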