[SOLVED] sed 's/Tb05.5K5.100/Tb229/' works alone but doesn't work in a sed file w/ other expressions
The resulting file doesn't have Tb229 but has 2 occurrences of Tb2190: the one corresponding to Tb11.01.6740 and the one that corresponds to Tb05.5K5.110 (the Tb229 substitution!)
What I don't understand is:
Why doesn't it recognize the pattern I want?
Why does it do a substitution twice without the global option?
And that's not all: since the order of the substitutions in the sed file matches the order in which the patterns appear in the original file, the first substitution of Tb2190 is the wrong one. Why?
Any help will be most appreciated.
regards
Jenifer
There is nothing odd about it. So I see two possibilities right now:
a) You try to reproduce the erroneous behavior with a smaller sample set
or
b) You post both complete original files as attachments. Be sure to add the extension *.txt to the files, otherwise you won't be able to attach them.
First, you need to get rid of the obvious bugs in your sed script. The dots have to be escaped, because an unescaped dot in a sed pattern matches any character. I've posted a modified script to do this. I'm fairly sure that will solve your current problem, although I can't be sure since I don't have complete files to test it.
Code:
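i=1
# Start the generated command line (overwrite any old copy of the file)
printf '%s' "sed -e '" > sed_script_v2_Tb
# Take every header name, strip the '>' and escape the dots
for j in $(grep '>' Tb.fasta | sed 's/>//g' | sed 's/\./\\./g')
do
    # One substitution per name: s/old/TbN/;
    printf 's/%s/Tb%s/;' "$j" "$i" >> sed_script_v2_Tb
    i=$((i+1))
done
# Close the quoted sed expression
echo "'" >> sed_script_v2_Tb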
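To see why the dots matter, here is a small sketch. The look-alike ID with x and y in it is made up for illustration and is not from the original files. An unescaped dot matches any single character, so a pattern built from one ID can also hit a different ID.

```shell
# Hypothetical look-alike ID, for illustration only.
# Unescaped: each '.' matches any single character, so the pattern
# 'Tb05.5K5.100' also matches 'Tb05x5K5y100'.
unescaped=$(printf '>Tb05x5K5y100\n' | sed 's/Tb05.5K5.100/Tb229/')

# Escaped: '\.' matches only a literal dot, so the look-alike is left alone.
escaped=$(printf '>Tb05x5K5y100\n' | sed 's/Tb05\.5K5\.100/Tb229/')

echo "$unescaped"   # >Tb229
echo "$escaped"     # >Tb05x5K5y100
```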
Second, sed is not a good tool for this job. Imagine if you run your script on a file that looks like this:
Code:
>Tb0
ctga
>Tb1
atgc
>Tb10
gatc
>Tb100
catg
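Because sed runs every expression on every line, in order, and the patterns are not anchored, a name that has just been rewritten can be rewritten again, and a short name also matches inside a longer one. You would end up with the names Tb2, Tb2, Tb20 and Tb200, and I don't think that is what you want.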
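The sample above can be run through such a script to watch the collisions happen. This is only a sketch: the four substitutions are what a generated script would contain if numbering starts at 1, and the anchored variant (with ^>, $ and the t command) is one possible way to avoid the problem, not something taken from the original files.

```shell
# The four sample headers from the post.
input=$(printf '>Tb0\n>Tb1\n>Tb10\n>Tb100\n')

# Unanchored: every expression is tried on every line, in order.
unanchored=$(printf '%s\n' "$input" |
    sed -e 's/Tb0/Tb1/;s/Tb1/Tb2/;s/Tb10/Tb3/;s/Tb100/Tb4/')

# Anchored: match the whole header only, and use 't' to skip the
# remaining expressions once one substitution has succeeded.
anchored=$(printf '%s\n' "$input" |
    sed -e 's/^>Tb0$/>Tb1/;t' \
        -e 's/^>Tb1$/>Tb2/;t' \
        -e 's/^>Tb10$/>Tb3/;t' \
        -e 's/^>Tb100$/>Tb4/;t')

echo "$unanchored"   # >Tb2 >Tb2 >Tb20 >Tb200 (one per line)
echo "$anchored"     # >Tb1 >Tb2 >Tb3 >Tb4 (one per line)
```

Even anchored, a sed script with thousands of expressions gets slow, which is another reason to number the headers directly instead.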
I suggest you use a short perl script along these lines instead.
It's much less code, and much less likely to fail. With a more elaborate script you could even generate a sed script to easily revert to the original naming scheme.
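For readers who want to try the idea from the shell, here is a rough sketch using awk. It is not the perl script suggested in this post, just the same approach: number the header lines in order and leave the sequence lines alone. The file names, the Tb prefix, the 5-digit width and the tab-separated map file are all assumptions for illustration.

```shell
# Sketch only: file names, prefix and width are assumptions.
workdir=$(mktemp -d)
printf '>Tb05.5K5.100\nctga\n>Tb11.01.6740\natgc\n' > "$workdir/Tb.fasta"

# Header lines get a running number; sequence lines pass through unchanged.
# Old and new names are also written to a map file so that the renaming
# can be reverted later.
awk -v map="$workdir/name_map.tsv" '
    /^>/ {
        n++
        new = sprintf("Tb%05d", n)
        print substr($0, 2) "\t" new > map
        print ">" new
        next
    }
    { print }
' "$workdir/Tb.fasta" > "$workdir/Tb_renamed.fasta"

cat "$workdir/Tb_renamed.fasta"
```

Since each header is replaced exactly once and never looked at again, the double-substitution problem cannot occur here.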
I hope this helps.
Last edited by kris_kiil; 03-03-2011 at 08:08 AM.
Reason: Corrected error in perl script
Thank you both. You two really saved me.
The sed file worked with smaller datasets, but failed with bigger ones.
And the perl line, modified just a little, worked perfectly.
The reason I modified it was that printf(">Tb%08i\n",$i) gave me long tags. Some sequence aligners crop the gene names to 8 characters and treat the rest as part of the sequence (so the alignment is useless). Others ignore the rest of the name, so when the alignment is presented you often can't distinguish one sequence from another, which is really bad.
EDIT:
Note to people interested in this sequence problem and community in general:
This particular solution to the problem will NOT work on files with more than 100000 sequences, but you can modify the script to adapt it to your dataset. I will try to find a general solution, and if I find one someday, I'll post it here. If you find a solution, please share it. It will be very helpful.
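One possible way to adapt the width to the dataset, as a sketch only: count the headers first and use the number of digits in that count as the padding width. The sample file and the Tb prefix are assumptions for illustration. Note that with a two-letter prefix the names stay within 8 characters only while the width is 6 or less, that is, for fewer than a million sequences.

```shell
# Hypothetical sample file, for illustration only.
fasta=$(mktemp)
printf '>a\nctga\n>b\natgc\n>c\ngatc\n' > "$fasta"

count=$(grep -c '^>' "$fasta")   # number of sequences in the file
width=${#count}                  # digits needed to write the largest index

# Build the printf format from the computed width, then number the headers.
renamed=$(awk -v w="$width" '
    BEGIN { fmt = ">Tb%0" w "d" }
    /^>/  { n++; print sprintf(fmt, n); next }
    { print }
' "$fasta")

echo "$renamed"
```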
Once again, thank you both very much.
See you around
Jenifer
Last edited by Radha.jg; 03-03-2011 at 08:20 AM.
Reason: Add a note