Quote:
Originally Posted by rubylu
I'm a biochemist, and need to deal with a file with millions of lines of sequence data.
The input file is:
@HWI-ST1324:186:HBEEWADXX:1:1101:1164:2175/1
CCTAGC
+
?@@D:A
@HWI-ST1324:126:HBEEWADXX:2:1101:1164:2111/2
CCAAGC
+
?@/D:A
...
My problem is to replace "/" with "#AA/" in each line that starts with "@HWI", i.e. the outcome should be:
@HWI-ST1324:186:HBEEWADXX:1:1101:1164:2175#AA/1
CCTAGC
+
?@@D:A
@HWI-ST1324:126:HBEEWADXX:2:1101:1164:2111#AA/2
CCAAGC
+
?@/D:A
...
Any suggestions to solve this?
Thanks for your input, much appreciated.
sed "/^\@HWI/ s/\//\#AA\//" < inputfile > outputfile
or
cat inputfile | sed "/^\@HWI/ s/\//\#AA\//" > outputfile
You don't need the backslashes in front of @ or #. Neither character is special to sed in this position, so the backslash there is just a redundant escape.
Don't make a habit of escaping everything, though: in GNU sed, for example, \+ and \| mean something different from a plain + and |.
So the sed expression works just as well with... "/^@HWI/ s/\//#AA\//"
The first part, before the space, is the address: it looks at the beginning of the line (using the ^ symbol) for the @HWI string.
On every line where this matches, the "s" command substitutes the first "/" (which must be backslashed, because forward slashes delimit the parts of the s command) with "#AA/" (once again, there's a backslash before the "/"). There is no "g" flag, so only the first "/" on each matching line is replaced; lines that don't start with @HWI, such as the quality line ?@/D:A, are left alone.
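If the backslashes get hard to read, one option (my own variation, not part of the command above) is to use a different delimiter for the s command, since sed accepts other characters there. A minimal sketch, assuming a POSIX shell with mktemp available; the temp file and the variable names sample and result are only there for illustration:

```shell
# Build a small sample in a temporary file (the two records from the question).
sample=$(mktemp)
cat > "$sample" <<'EOF'
@HWI-ST1324:186:HBEEWADXX:1:1101:1164:2175/1
CCTAGC
+
?@@D:A
@HWI-ST1324:126:HBEEWADXX:2:1101:1164:2111/2
CCAAGC
+
?@/D:A
EOF

# Same edit as above, but with "|" as the s-command delimiter,
# so the "/" characters need no backslashes.
result=$(sed '/^@HWI/ s|/|#AA/|' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Trying it on a handful of records like this first is a cheap way to check the expression before running it over the full file.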
That should handle your problem, and give you some info on how to handle it next time yourself, if something similar comes up.
Hope this helps.