Advanced Sed Question
Code:
ATOM 123 456 789 A 123 456 789
Let's say I have the above text file. I need a Linux command to read specific parts of it. Right now, I have:
Code:
sed -n '/^ATOM...............A/,/^TER................A/p' file.txt
(with B in place of A for the other chain). The problem is that the text doesn't have to start with ATOM; it can also start with HETATM (the one with B). In that case, the command ignores the first 2 lines. So I modified the command to read lines starting with either ATOM or HETATM:
Code:
sed -n '/^ATOM\|HETA...............A/,/^TER................A/p' file.txt
Question 1. Is there a way to stop sed from reading the file any further once it reaches the TER string?
Question 2. I also need a command to read the HETATM lines after the TER. I need the following output:
Code:
HETATM 123 HOH A
Any help or pointers would be appreciated! Thanks -Vids |
Forgive me for being dense, but I hope I understand you here...
Are you actually wanting two different searches? If so, here is part one.
Quote:
Code:
sed -n '/^ATOM.* [A-B]/,/^TER.* [A-B]/p' file.txt
Quote:
Code:
sed -n '/^HET.* [A-B]$/,/$_/p' file.txt |
Quote:
This is my command:
Code:
perl -ne 'print if /^ATOM|HETA.* [B]/ .. /^TER.* [B]/'
Doesn't it need something like this?
Code:
perl -ne 'print if /^ATOM|HETA.................[B]/ .. /^TER..................[B]/' |
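Both Perl commands above share an operator-precedence problem: in `/^ATOM|HETA.* [B]/` the `|` splits the whole pattern, so it means "`^ATOM`" OR "`HETA.* [B]`", and every ATOM line starts the range whatever its chain. The same applies to `^ATOM\|HETA` in GNU sed. A minimal sketch of the grouped version as a shell function; the name `chain_range` is hypothetical, and it assumes a sed that accepts `-E` for extended regexes (GNU sed and the BSD/macOS sed both do):

```shell
# Hypothetical helper: chain_range LETTER [FILE...]
# Prints from the first ATOM/HETATM line containing " LETTER" through the
# next TER line containing " LETTER".
# The group (ATOM|HETATM) makes ^ apply to both record names; without it,
# '^ATOM|HETA.* B' means "(^ATOM) or (HETA.* B)".
# Perl equivalent: perl -ne 'print if /^(?:ATOM|HETATM).* B/ .. /^TER.* B/'
chain_range() {
    letter=$1
    shift
    sed -nE "/^(ATOM|HETATM).* $letter/,/^TER.* $letter/p" "$@"
}
```

Usage: `chain_range B file.txt`. As in the thread, `.* B` only checks that " B" appears somewhere on the line, so it can also match other fields.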
Try these. I use .* instead of a bunch of dots like you did.
Also, in your example file you had a capital A, not a lowercase a. Are there lowercase a and b as well? For part one:
Code:
sed -n '/^[ATOM|HETATM].* [A-B] /,/^TER.* [A-B]/p' file.txt
Code:
sed -n '/^HET.* [A-B]$/,/$_/p' file.txt |
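The `^[ATOM|HETATM]` in the part-one command does not do what it looks like: square brackets form a bracket expression, which matches a single character from the set {A, T, O, M, |, H, E}. Any line beginning with one of those letters (HEADER, TITLE, TER, END, ...) passes that test, so the range can start far too early. A small sketch contrasting the two forms; both function names are hypothetical:

```shell
# '^[ATOM|HETATM]' is a bracket expression: it matches ONE leading character
# from the set {A, T, O, M, |, H, E}, so HEADER, TITLE, TER, END, ... all pass.
bracket_match() {
    grep '^[ATOM|HETATM]' "$@"
}

# Alternation needs a group instead of brackets (ERE syntax via grep -E).
alternation_match() {
    grep -E '^(ATOM|HETATM)' "$@"
}
```

Only the second function restricts matches to ATOM and HETATM records; the same `^(ATOM|HETATM)` group can be used in `sed -E` or `perl`.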
Quote:
Part 1 outputs everything in the file now. I modified the command to:
Code:
sed -n '/^ATOM\|HETA.* [A] /,/^TER.* [A]/p' file.txt
If I put in the following code:
Code:
sed -n '/^ATOM\|HETA.* [B] /,/^TER.* [B]/p' file.txt
Part 2: the command doesn't output anything. The HETATM lines after TER may contain A or B, or neither. It should output only the lines starting with HETATM after the last TER. Do you have an instant messenger? I could send you the text file so that you can have a look. It's more complex than the example you see here. |
Quote:
Code:
Then don't use [a] or [b] Quote:
Quote:
Do you have the text file someplace where I can get it? |
Quote:
You can find the text file here:
Code:
http://www.pdb.org/pdb/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1KDX
I want a command to read all the lines between HETATM (or ATOM) and TER, and another command to read all the lines starting from HETATM after the last TER. |
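The second request (only the HETATM records after the last TER) can be handled in a single pass without reversing the file: buffer HETATM lines and discard the buffer every time another TER appears, so only the lines after the final TER survive to the end. A minimal sketch in awk, assuming TER records start in column 1; the function name `hetatm_after_last_ter` is hypothetical:

```shell
# Hypothetical helper: print only the HETATM records that follow the LAST
# TER record, in one pass and without tac.
hetatm_after_last_ter() {
    awk '
        /^TER/    { n = 0; next }      # a later TER discards what was buffered so far
        /^HETATM/ { buf[++n] = $0 }    # buffer HETATM lines seen since the latest TER
        END       { for (i = 1; i <= n; i++) print buf[i] }
    ' "$@"
}
```

Usage: `hetatm_after_last_ter 1KDX.pdb > file2.txt`. If a file contains no TER record at all, every HETATM line is printed.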
Got some running to do, will look at it when I get back.
|
I'd suggest writing a Perl script, but to answer your questions:
Quote:
Code:
sed -e '/^TER/,$ d' -e 'second command' ...
Quote:
Code:
sed -ne '/^.\{20\}\(A\|B\)/ p'
Quote:
Code:
sed -ne '/^TER/,$ d' -e '/^\(HETATM\|ATOM\)/,$ p'
Quote:
Code:
tac | sed -e '/^TER/,$ d' | tac | sed -ne '/^HETATM/,$ p' |
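On question 1: `/^TER/,$ d` produces the right output, but sed still reads (and deletes) every remaining line of the file before it exits. The `q` command stops reading immediately. A sketch that prints from the first ATOM/HETATM record through the first TER and then quits; the name `first_block` is hypothetical, and `-E` is assumed to be available as above:

```shell
# Hypothetical helper: print the first ATOM/HETATM ... TER block, then stop.
# With -n there is no autoprint, so the explicit p prints each line once,
# and q on the TER line ends sed without reading the rest of the input.
first_block() {
    sed -nE '
        /^(ATOM|HETATM)/,/^TER/ {
            p
            /^TER/ q
        }
    ' "$@"
}
```

This matters mostly for large files; for small PDB files the `d` version is just as practical.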
ORGANISM_SCIENTIFIC: RATTUS NORVEGICUS;
SOURCE 9 ORGANISM_COMMON: RAT;
Kinda reminds me of "Pinky and the Brain" :) How's this looking?
Part one
EDIT: I was able to condense it into a script thanks to the gurus down in Programming. To make this script executable, type:
chmod +x myscript
where myscript is the name I chose; you can use other names. Then type:
./myscript 1KDX.pdb > file.txt
Code:
#!/usr/bin/perl
Code:
perl -nle 'print $1 if /(^HETATM.* [A-B])/' 1KDX.pdb > file2.txt |
Quote:
So I modified it to the following:
Code:
#!/usr/bin/perl Code:
#!/usr/bin/perl Code:
./myscript -$letter=A 1KDX.pdb > file.txt |
I'm having Perl learning pains. :)
Something like this?
Code:
#!/usr/bin/perl -w |
Quote:
Also, the infile and outfile should be variables too, so the command should be something like this:
Code:
./myscript.pl A($letter) 1kdx.txt(infile) > file.txt(outfile) |
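Turning the chain letter, input file and output file into variables only requires reading the script's positional parameters. A minimal sketch in sh rather than Perl, written as a hypothetical `select_chain` function that takes the output file as a third argument instead of a redirection; it assumes the standard PDB layout, in which ATOM and HETATM records carry the chain identifier in column 22:

```shell
#!/bin/sh
# Hypothetical wrapper (not the Perl script from this thread): the chain
# letter, input file and output file all come from the command line.
select_chain() {
    if [ "$#" -ne 3 ]; then
        echo "usage: select_chain CHAIN INFILE OUTFILE" >&2
        return 2
    fi
    chain=$1
    infile=$2
    outfile=$3
    # Keep ATOM/HETATM records whose chain identifier (column 22) matches.
    awk -v c="$chain" '/^(ATOM|HETATM)/ && substr($0, 22, 1) == c' "$infile" > "$outfile"
}

# In a standalone script you would end with:  select_chain "$@"
```

Usage: `select_chain A 1kdx.txt file.txt`. Comparing one fixed column is stricter than `.* A`, which matches an " A" anywhere on the line.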
Quote:
Is this what you had in mind? Methinks this is how it's done, but I'm not entirely sure. :)
./myscript A 1KDX.pdb file.txt
You can also get a range of letters like this (quote the brackets so the shell doesn't expand them):
./myscript '[A-B]' 1KDX.pdb file.txt
Code:
#!/usr/bin/perl -w |
Quote:
Sorry for the late reply; my computer gave out on me :( The script does not work on B, C or D. Basically, it stops at TER: if the TER comes before ATOM.................B, it outputs nothing. It should start reading when it hits ATOM.................B and stop at TER. Any way to do that? Try this file: http://www.rcsb.org/pdb/downloadFile...ructureId=1AV1 It has four chains: A, B, C and D. |
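The requirement here is to start at the first ATOM/HETATM record of the requested chain and stop at the TER that follows it, ignoring any TER records that belong to earlier chains. A sketch in awk, assuming the standard PDB column layout (chain identifier in column 22, so a fixed-offset sed test on such files would be `^.\{21\}B`) and that each chain's records are contiguous; `extract_chain` is a hypothetical name:

```shell
# Hypothetical helper: extract_chain LETTER [FILE...]
# Start at the first ATOM/HETATM record whose chain ID (column 22) is LETTER,
# print through the next TER record, then stop reading.
extract_chain() {
    chain=$1
    shift
    awk -v c="$chain" '
        !inside && /^(ATOM|HETATM)/ && substr($0, 22, 1) == c { inside = 1 }
        inside { print }
        inside && /^TER/ { exit }
    ' "$@"
}
```

Usage, assuming the file was saved locally as 1AV1.pdb: `extract_chain B 1AV1.pdb > chainB.txt`. Because the start condition checks the chain column rather than "the first TER", chains B, C and D work the same way as A.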