sed 's/Tb05.5K5.100/Tb229/' works alone but doesn't work in sed file w/ other expressions
Hi everyone
I'm trying to replace words from a file, but the way sed behaves in my case is quite puzzling to me. When I do... Code:
sed 's/Tb05.5K5.100/Tb229/' Tb.fasta
When I create a file with this line and do Code:
sed -f file Tb.fasta
But, when I create a file like this Code:
s/Tb05.5K5.60/Tb224/
What I don't understand is: why doesn't it recognize the pattern I want? Why does it do a substitution twice without the global option? And that's not all: since the order of the substitutions in the sed file corresponds to the order in which the patterns appear in the original file, the first substitution of Tb2190 is the wrong one. Why? Any help will be most appreciated. Regards, Jenifer |
It looks puzzling, I have a few comments though.
First, the dot (.) matches any character. That shouldn't matter in your example, though. Second, sed executes the script for every line, so the global option only matters if the pattern can match more than once per line. Sed works by loading one line at a time and executing every line of your script on the loaded line. Thus: Code:
echo 'a' | sed -e 's/a/b/;s/b/c/'
This prints c, because the second expression runs on the result of the first. I hope this can help you solve your problem. Regards, Kristoffer |
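A small sketch of the point above, and of one way the stray Tb2190 could arise. The rule `s/Tb05.5K5.10/Tb219/` is an assumption made for illustration (only `s/Tb05.5K5.100/Tb229/` is shown in the thread), and the anchored version assumes each header line holds nothing but `>` and the ID.

```shell
# Every input line passes through every command of the script, in order,
# so the second command sees the output of the first.
chained=$(echo 'a' | sed -e 's/a/b/' -e 's/b/c/')
echo "$chained"    # c

# Hypothetical rule pair: 'Tb05.5K5.10 -> Tb219' is assumed, not from the thread.
# The shorter pattern matches the start of the longer ID and leaves a stray 0,
# so the rule meant for Tb05.5K5.100 never gets a chance to match.
prefix=$(echo '>Tb05.5K5.100' |
  sed -e 's/Tb05.5K5.10/Tb219/' -e 's/Tb05.5K5.100/Tb229/')
echo "$prefix"     # >Tb2190

# Escaping the dots and anchoring on the whole header line removes the overlap.
anchored=$(echo '>Tb05.5K5.100' |
  sed -e 's/^>Tb05\.5K5\.10$/>Tb219/' -e 's/^>Tb05\.5K5\.100$/>Tb229/')
echo "$anchored"   # >Tb229
```

If the real headers carry extra text after the ID, the `$` anchor would have to be replaced by whatever delimiter follows the ID.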
Hi,
it is hard to tell what is wrong without knowing the contents of the file that you want to process. Please post the file Tb.fasta, too. |
First of all, thank you both for the answers.
crts: The fasta file is like this... Code:
>Tb11.0040
Code:
s/Tb11.0040/Tb1/
Code:
i=1
Once again, thanks both for your attention |
I tried the script with the file you provided. There were no duplicates. This is the output I get:
Code:
$ sed -f file Tb.fasta
Either a) you try to reproduce the erroneous behavior with a smaller sample set, or b) you post both complete original files as attachments. Be sure to add the extension *.txt to the files; otherwise you won't be able to attach them. |
While I agree with crts, I have two comments.
First, you need to get rid of the obvious bugs in your sed script. Get rid of the dots. I've posted a modified script to do this. I'm fairly sure that will solve your current problem, although I can't be sure since I don't have complete files to test it. Code:
i=1
Code:
>Tb0
I suggest you use a short perl script along these lines instead. Code:
perl -e '$i=1; while(<>){if(m/^>/){printf(">Tb%08i\n",$i++);}else{print $_;}}'
I hope this helps. |
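One possible way to act on the "get rid of the dots" advice is to generate the sed file with the dots escaped and each rule anchored to the full header line. This is only a sketch: the sample sequences are made up, the file names `rename.sed` and `renamed.fasta` are hypothetical, and it assumes the IDs contain nothing but letters, digits and dots.

```shell
# Sample FASTA file; IDs are modeled on those in the thread, sequences are invented.
tmp=$(mktemp -d)
printf '>Tb11.0040\nACGT\n>Tb05.5K5.10\nGGCC\n>Tb05.5K5.100\nTTAA\n' > "$tmp/Tb.fasta"

# Take each header, drop the leading '>', escape the dots, and emit one
# anchored substitution per header, numbered in order of appearance.
grep '^>' "$tmp/Tb.fasta" |
  sed -e 's/^>//' -e 's/\./\\./g' |
  awk '{ printf "s/^>%s$/>Tb%d/\n", $0, NR }' > "$tmp/rename.sed"

# Apply the generated rules to the original file.
sed -f "$tmp/rename.sed" "$tmp/Tb.fasta" > "$tmp/renamed.fasta"
cat "$tmp/renamed.fasta"
```

Because every line is still checked against every rule, this stays slow on large files, and it can misfire if an old ID already looks like a new one (for example an old header that is exactly `>Tb2`). A single-pass counter like the perl line avoids both issues.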
Thank you both. You two really saved me.
The sed file worked with smaller datasets but failed with bigger ones. And the perl line, modified just a little, worked perfectly. Code:
$i=1;
EDIT: Note to people interested in this sequence problem and the community in general: This particular solution will NOT work on files with more than 100000 sequences, but you can modify the script to adapt it to your dataset. I will try to find a general solution, and if I someday find one, I'll post it here. If you find a solution, please share it; it will be very helpful. Once again, thank you both very much. :D See you around, Jenifer |
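A possible general variant, assuming the 100000 limit comes from a fixed-width counter format in the modified script (the script itself is not shown in full, so this is a guess). The sketch counts the headers first and pads the counter to the width of that count; the input file and its `old` IDs are made up for the demonstration.

```shell
# Sample input: 12 invented records, so the counter needs two digits.
tmp=$(mktemp -d)
awk 'BEGIN { for (n = 1; n <= 12; n++) printf ">old%d\nACGT\n", n }' > "$tmp/in.fasta"

# Pass 1: count the header lines.
total=$(grep -c '^>' "$tmp/in.fasta")

# Pass 2: renumber the headers, zero-padded to the number of digits in the count.
awk -v width="${#total}" '
  BEGIN { fmt = ">Tb%0" width "d\n" }   # e.g. ">Tb%02d\n" for 12 records
  /^>/  { printf fmt, ++i; next }       # replace each header with the next number
        { print }                       # copy sequence lines unchanged
' "$tmp/in.fasta" > "$tmp/out.fasta"

grep '^>' "$tmp/out.fasta" | head -n 2
```

If zero padding is not needed at all, a plain `%d` counter has no width limit to begin with.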