Command Line: Splitting a txt file according to regular expressions in each line
Hi there,
I have the following problem to solve. I have a huge text file containing sequences like this:

>HH3_5 M02542:50:000000000-ACBUJ:1:2117:9387:24782 orig_bc=CGTAGGCT
TTTTATATATATATATTAGCGCGCG....and so on
>HH4_4 M02542:50:000000000-ACBUJ:1:2117:9387:24783 orig_bc=AGTAGGCG
TTTTAGCCGCTGCTCGTCGCTATATATATATTAGCGCGCG....and so on

I want to split the file by the information in the header, namely the orig_bc value (for example orig_bc=CGTAGGCT). In the end I want separate files, one per orig_bc ID, each containing every header line with that ID plus all the lines that follow it, up to the next line that begins with ">". In this example:

File CGTAGGCT
>HH3_5 M02542:50:000000000-ACBUJ:1:2117:9387:24782 orig_bc=CGTAGGCT
TTTTATATATATATATTAGCGCGCG....and so on

File AGTAGGCG
>HH4_4 M02542:50:000000000-ACBUJ:1:2117:9387:24783 orig_bc=AGTAGGCG
TTTTAGCCGCTGCTCGTCGCTATATATATATTAGCGCGCG....and so on
Sounds like a plan. What have you tried so far? I would suggest awk, perl, ruby, or a similar scripting language.
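For illustration, here is a minimal sketch of the approach in Python rather than awk, perl, or ruby; the logic carries over to any of them. It reads the file line by line, so a huge input never has to fit in memory. Whenever a line starts with ">", it extracts the orig_bc value and switches the output to the file for that barcode; every following line goes to the same file until the next header. The function name `split_by_barcode`, the `.txt` suffix, and the output directory are assumptions made for this example, not something from the original post. It also assumes that barcodes consist only of letters, digits, or underscores, and that the number of distinct barcodes is small enough to keep one file open per barcode.

```python
import re
from contextlib import ExitStack
from pathlib import Path

# Assumption: the barcode is made of word characters only (e.g. CGTAGGCT).
BARCODE_RE = re.compile(r"orig_bc=(\w+)")


def split_by_barcode(input_path, out_dir):
    """Write each record to <out_dir>/<barcode>.txt.

    A record is a header line starting with ">" plus all lines up to the
    next header. Returns a dict mapping barcode -> number of records.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    counts = {}     # barcode -> number of records written
    handles = {}    # barcode -> open output file
    current = None  # file receiving the lines of the current record

    # ExitStack closes every output file when the block ends.
    with ExitStack() as stack, open(input_path) as src:
        for line in src:
            if line.startswith(">"):
                match = BARCODE_RE.search(line)
                if match is None:
                    # Header without orig_bc: skip this record.
                    current = None
                else:
                    barcode = match.group(1)
                    if barcode not in handles:
                        path = out_dir / f"{barcode}.txt"
                        handles[barcode] = stack.enter_context(open(path, "w"))
                        counts[barcode] = 0
                    counts[barcode] += 1
                    current = handles[barcode]
            if current is not None:
                current.write(line)
    return counts
```

Calling `split_by_barcode("sequences.txt", "by_barcode")` (both names are placeholders) would create one file per barcode in the `by_barcode` directory. Records whose header has no orig_bc field, and any lines before the first header, are dropped in this sketch; if the file holds thousands of distinct barcodes, the one-open-file-per-barcode design may hit the operating system's limit on open files and would need to reopen files in append mode instead.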