Old 06-26-2006, 04:37 PM   #1
vidyashankara
Member
 
Registered: May 2006
Posts: 46

Rep: Reputation: 15
Advanced Sed Question


Code:
ATOM   123 456 789 A 123 456 789 
ATOM   123 456 789 A 123 456 789
ATOM   123 456 789 A 123 456 789
ATOM   123 456 789 A 123 456 789
ATOM   123 456 789 A 123 456 789
HETATM 123 456 789 A 123 456 789
HETATM 123 456 789 A 123 456 789
HETATM 123 456 789 A 123 456 789
HETATM 123 456 789 A 123 456 789
HETATM 123 456 789 A 123 456 789
TER    123 456 789 A 123 456 789
HETATM 123 456 789 B 123 456 789
HETATM 123 456 789 B 123 456 789
ATOM   123 456 789 B 123 456 789 
ATOM   123 456 789 B 123 456 789
ATOM   123 456 789 B 123 456 789
ATOM   123 456 789 B 123 456 789
ATOM   123 456 789 B 123 456 789
HETATM 123 456 789 B 123 456 789
HETATM 123 456 789 B 123 456 789
HETATM 123 456 789 B 123 456 789
HETATM 123 456 789 B 123 456 789
HETATM 123 456 789 B 123 456 789
TER    123 456 789 B 123 456 789
HETATM 123 HOH     A
HETATM 123 HOH     A
HETATM 123 HOH     A
HETATM 123 HOH     A
HETATM 123 HOH     B
HETATM 123 HOH     B
HETATM 123 HOH     B
HETATM 123 HOH     B
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9

Let's say I have the above text file. I need a Linux command to read specific parts of it.

Right now, I have
Code:
sed -n '/^ATOM...............A(or B)/, /^TER................A(or B)/p'
to read all the lines with A (or B) in a certain column.

The problem is that the text doesn't have to start with ATOM; it can also start with HETATM (as the chain B block does). In that case, the command ignores the first 2 lines. So I modified the command to read lines starting from ATOM or HETATM.

Code:
sed -n '/^ATOM\|HETA...............A(or B)/, /^TER................A(or B)/p'
But in this case, the script goes on to read the HETATMs below TER.

Question 1. Is there a way to stop sed from reading the file any further once it reaches the TER string?

Question 2. I also need a command to read the HETATM lines after the TER. I need the following output:
Code:
HETATM 123 HOH     A
HETATM 123 HOH     A
HETATM 123 HOH     A
HETATM 123 HOH     A
HETATM 123 HOH     B
HETATM 123 HOH     B
HETATM 123 HOH     B
HETATM 123 HOH     B
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
It shouldn't include the HETATMs that come between the ATOM and TER lines. Is there any way to accomplish this?

Any help or pointers would be appreciated!
Thanks
-Vids
 
Old 06-26-2006, 07:28 PM   #2
homey
Senior Member
 
Registered: Oct 2003
Posts: 3,057

Rep: Reputation: 61
Forgive me for being dense but, I hope to understand you here...
Are you actually wanting two different searches?
If so, here is part one.
Quote:
Question 1. Is there a way to stop SED from reading the file any more after it reaches the TER string.
Code:
sed -n '/^ATOM.* [A-B]/,/^TER.* [A-B]/p' file.txt

or

perl -ne 'print if /^ATOM.* [A-B]/ .. /^TER.* [A-B]/' file.txt
Quote:
Question 2. I also need a command to read the HETATM lines after the TER.
Here is part two.
Code:
sed -n '/^HET.* [A-B]$/,/$_/p' file.txt

or

perl -ne 'print if /^HET.* [A-B]$/ .. /$_/' file.txt
 
Old 06-27-2006, 12:49 PM   #3
vidyashankara
Member
 
Registered: May 2006
Posts: 46

Original Poster
Rep: Reputation: 15
Part one: when I replace [A-B] with [B] in the command, it outputs chain A too.
This is my command:
Code:
perl -ne 'print if /^ATOM|HETA.* [B]/ .. /^TER.* [B]/'
Where do you tell the command to select the B in column 22?

Doesn't it need something like this?
Code:
perl -ne 'print if /^ATOM|HETA.................[B]/ .. /^TER..................[B]/'
The part 2 command doesn't output anything.
 
Old 06-27-2006, 02:01 PM   #4
homey
Senior Member
 
Registered: Oct 2003
Posts: 3,057

Rep: Reputation: 61
Try these. I use .* instead of a long run of dots like you did.
Also, in your example file, you had a capital A, not a lowercase a. Are there lowercase a and b?

For part one
Code:
sed -n '/^[ATOM|HETATM].* [A-B] /,/^TER.* [A-B]/p' file.txt
or
perl -ne 'print if /^[ATOM|HETATM].* [A-B] / .. /^TER.* [A-B]/' file.txt
for part two
Code:
sed -n '/^HET.* [A-B]$/,/$_/p' file.txt
or
perl -ne 'print if /^HET.* [A-B]$/ .. /^HET$/' file.txt
or
perl -ne 'print if /^HET.* [A-B]$/ .. /$_\n/' file.txt
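
A note on the bracket form used in part one: in sed, grep and Perl alike, [ATOM|HETATM] is a bracket expression. It matches one single character from the set A, T, O, M, |, H, E rather than either whole word, so most record names in a PDB file will satisfy it. A minimal shell sketch of the difference, using a made-up four-line sample (the sample function and its contents are assumptions for illustration, not lines from the real file):

```shell
# Hypothetical sample: only the record name at the start of each line matters.
sample() {
  printf '%s\n' 'ATOM   1 A' 'HETATM 2 A' 'TER    3 A' 'REMARK 4'
}

# Bracket expression: any ONE of the characters A T O M | H E at line start.
# TER matches too, because it begins with T.
sample | grep -c '^[ATOM|HETATM]'      # prints 3

# Grouped alternation (extended regex): the whole word ATOM or HETATM.
sample | grep -cE '^(ATOM|HETATM)'     # prints 2
```

The grouped form is the one that expresses "starts with ATOM or HETATM".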
 
Old 06-27-2006, 03:18 PM   #5
vidyashankara
Member
 
Registered: May 2006
Posts: 46

Original Poster
Rep: Reputation: 15
No, it's all upper case.

Part 1 outputs everything in the file now. I modified the command to
Code:
sed -n '/^ATOM\|HETA.* [A] /,/^TER.* [A]/p' file.txt
but after it hits TER, it continues to look for lines starting with ATOM or HETA, and if it finds any, it outputs those too.

If I put in the following code
Code:
sed -n '/^ATOM\|HETA.* [B] /,/^TER.* [B]/p' file.txt
it outputs the lines with A in them too. The letter A appears at different positions in the lines. Can you make the command search for A or B only in the 21st column?


Part 2: the command doesn't output anything. The HETATM lines after TER may or may not contain A or B. It should output only the lines starting with HETATM after the last TER.

Do you have an instant messenger? I could send you the text file so that you can have a look. It's more complex than the example you see here.
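
On matching the chain letter in one fixed column: two things seem to be going on in the sed commands above. Alternation binds loosest, so ^ATOM\|HETA.* [B] reads as "starts with ATOM" or "contains HETA followed somewhere by B", and .* lets the letter match anywhere in the line. A minimal sketch with grep -E, assuming the chain letter sits in column 22 (where the PDB format places the chain identifier); rec and sample are hypothetical helpers that build fixed-width test lines, not data from the real file:

```shell
# Hypothetical helper: pad the first 21 columns, then put the chain letter in column 22.
rec() { printf '%-21s%s\n' "$1" "$2"; }

sample() {
  rec 'ATOM      1  CA  ALA' A
  rec 'HETATM    2  O   HOH' A
  rec 'ATOM      3  CA  GLY' B
  rec 'HETATM    4  O   HOH' B
}

# Ungrouped: "^ATOM" OR "HETA + 17 characters + B".
# Every ATOM line matches, whatever its chain.
sample | grep -cE '^ATOM|HETA.{17}B'            # prints 3

# Grouped and column-anchored: a 6-character record name,
# 15 more characters, then the chain letter in column 22.
sample | grep -cE '^(ATOM  |HETATM).{15}B'      # prints 2
```

The same grouped pattern can be used as a sed address with sed -E.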
 
Old 06-27-2006, 03:34 PM   #6
homey
Senior Member
 
Registered: Oct 2003
Posts: 3,057

Rep: Reputation: 61
Quote:
No, it's all upper case.
Then don't use [a] or [b].
In Perl, don't put the \ in front of the "or" operator |, as that makes it look for a literal |. In GNU sed's basic regular expressions it is the reverse: \| means "or" and a bare | is a literal character.
Quote:
but after it hits TER, it continues to look for lines starting with ATOM or HETA, and if it finds any, it outputs those too.
Do you want it to stop when it finds the first TER?

Quote:
Do you have an instant messenger? I could send you the text file so that you can have a look. It's more complex than the example you see here.
No, I don't use IM.
Do you have the text file someplace where I can get it?
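
Whether | needs a backslash depends on the regex dialect, which is easy to mix up when switching between sed and Perl. A small sketch of the sed side, assuming GNU sed for the \| line (that form is a GNU extension to basic regular expressions); the lines function is a made-up sample:

```shell
# Hypothetical sample with one line that contains a literal pipe character.
lines() { printf '%s\n' 'ATOM' 'HETATM' 'A|B' 'TER'; }

# Basic regex (sed default): a bare | is a literal character.
lines | sed -n '/A|B/p'                    # prints only: A|B

# Extended regex (sed -E): a bare | is alternation, as in Perl.
lines | sed -n -E '/^(ATOM|HETATM)/p'      # prints ATOM and HETATM

# GNU sed basic regex: \| is alternation.
lines | sed -n '/^\(ATOM\|HETATM\)/p'
```

Using sed -E keeps the pattern identical to what Perl would accept, which avoids translating backslashes back and forth.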
 
Old 06-27-2006, 04:27 PM   #7
vidyashankara
Member
 
Registered: May 2006
Posts: 46

Original Poster
Rep: Reputation: 15
Yeah, I want sed to stop as soon as it finds the first TER. Is that possible?

You can find the text file here:
Code:
http://www.pdb.org/pdb/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1KDX
It's really confusing if you read the PDB file.

I want one command to read all the lines between HETATM (or ATOM) and TER, and another command to read all the lines starting with HETATM after the last TER.
 
Old 06-27-2006, 05:07 PM   #8
homey
Senior Member
 
Registered: Oct 2003
Posts: 3,057

Rep: Reputation: 61
Got some running to do, will look at it when I get back.
 
Old 06-27-2006, 05:46 PM   #9
spirit receiver
Member
 
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424

Rep: Reputation: 33
I'd suggest writing a Perl script, but to answer your questions:
Quote:
i want SED to stop as soon as it finds the first TER. Is that possible?
You can pass several commands to sed. Use the first one to delete everything that follows TER, as in
Code:
sed -e '/^TER/,$ d' -e 'second command' ...
Quote:
can you make the command to search for A or B only in the 21st column?
Code:
sed -ne '/^.\{20\}\(A\|B\)/ p'
Quote:
I want a command to read all the lines between HETATM(or ATOM) and TER
Code:
sed -ne '/^TER/,$ d' -e '/^\(HETATM\|ATOM\)/,$ p'
Quote:
another command to read all the lines starting from HETATM after the last TER
Code:
tac | sed -e '/^TER/,$ d' | tac | sed -ne '/^HETATM/,$ p'
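
A sed range such as /start/,/end/ opens again whenever the start pattern matches later in the file, which is why lines after TER kept appearing earlier in the thread. As an alternative to the commands above, here is a hedged sketch of two small filters: one quits at the first TER with sed's q command, the other collects HETATM lines in awk and discards them each time a TER is seen, so only those after the last TER survive. The function names and the sample data are assumptions for illustration.

```shell
# Print ATOM/HETATM lines up to and including the first TER, then stop reading.
up_to_first_ter() {
  sed -n -E '/^(ATOM|HETATM)/p; /^TER/{p;q;}'
}

# Print only the HETATM lines that come after the last TER.
hetatm_after_last_ter() {
  awk '/^TER/    { n = 0; next }          # a TER discards what was collected so far
       /^HETATM/ { buf[++n] = $0 }        # remember HETATM lines since the last TER
       END       { for (i = 1; i <= n; i++) print buf[i] }'
}

# Hypothetical sample, much shorter than a real PDB file.
sample() {
  printf '%s\n' 'ATOM   1 A' 'HETATM 2 A' 'TER    3 A' \
                'ATOM   4 B' 'TER    5 B' 'HETATM 6 HOH' 'HETATM 7 AC9'
}

sample | up_to_first_ter          # the three lines numbered 1 to 3
sample | hetatm_after_last_ter    # the two lines numbered 6 and 7
```

Both filters read standard input, so a real file would be passed as, for example, up_to_first_ter < 1KDX.pdb.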
 
Old 06-27-2006, 09:59 PM   #10
homey
Senior Member
 
Registered: Oct 2003
Posts: 3,057

Rep: Reputation: 61
ORGANISM_SCIENTIFIC: RATTUS NORVEGICUS;
SOURCE 9 ORGANISM_COMMON: RAT;
Kinda reminds me of "Pinky and the Brain"

How's this looking?

Part one
EDIT: I was able to condense it into a script thanks to the gurus down in Programming.

To make this script executable, type: chmod +x myscript
(myscript is the name I chose; you can use another name.)
Then type: ./myscript 1KDX.pdb > file.txt
Code:
#!/usr/bin/perl
# ./myscript 1KDX.pdb > file.txt

while (<>) {
  next if /^TER.*/../$_\n/;
  print if /(^ATOM.* [A-B] .*)/ or /(^HETATM.* [A-B] .*)/;
}
Part two
Code:
perl -nle 'print $1 if /(^HETATM.* [A-B])/' 1KDX.pdb > file2.txt

Last edited by homey; 06-28-2006 at 04:15 PM.
 
Old 06-30-2006, 01:54 PM   #11
vidyashankara
Member
 
Registered: May 2006
Posts: 46

Original Poster
Rep: Reputation: 15
That Perl script did the job. At first it wouldn't select chains with B or C in them, because the script hits TER first.
So I modified it to the following:
Code:
#!/usr/bin/perl
# ./myscript -$atm 1KDX.pdb > file.txt

while (<>) {
  print if /(^ATOM.................A)/ or /(^HETATM...............A)/;
  next if /^TER.*/../$_\n/;
}
Is there a way to make A a variable and pass it on the command line? Like this:

Code:
#!/usr/bin/perl
# ./myscript -$atm 1KDX.pdb > file.txt

while (<>) {
  print if /(^ATOM.................$letter)/ or /(^HETATM...............$letter)/;
  next if /^TER.*/../$_\n/;
}
Code:
./myscript -$letter=A 1KDX.pdb > file.txt
 
Old 06-30-2006, 02:48 PM   #12
homey
Senior Member
 
Registered: Oct 2003
Posts: 3,057

Rep: Reputation: 61
I'm having perl learning pains.
Something like this?
Code:
#!/usr/bin/perl -w
# ./myscript A

if($#ARGV != 0){
     die "Example: $0 <A>\n";
}

my $LETTER = shift;

open(INFILE, "1KDX.pdb")   or die "Can't open 1KDX.pdb for read: $!";
open(OUTFILE, "> file.txt") or die "Can't open file.txt for write: $!";

while (<INFILE>){
  print OUTFILE if /(^ATOM.................$LETTER)/ or /(^HETATM...............$LETTER)/;
  next if /^TER.*/../$_\n/;
}

close INFILE;
close OUTFILE;
 
Old 06-30-2006, 03:27 PM   #13
vidyashankara
Member
 
Registered: May 2006
Posts: 46

Original Poster
Rep: Reputation: 15
The problem is, now it doesn't stop at TER; it continues to read the file for further occurrences of ATOM................A

Also, infile and outfile should be variables too.

So the command should be something like this:

Code:
./myscript.pl A($letter) 1kdx.txt(infile) > file.txt(outfile)
How do I do that?
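
For comparison with the Perl approach, the same idea of passing the chain letter and the input file as arguments can be sketched as a small shell function around awk. This is only an illustration under two assumptions: the chain letter is in column 22, and extract_chain is a made-up name. The output file needs no argument of its own, since redirecting standard output covers it.

```shell
# Usage: extract_chain LETTER INFILE > OUTFILE
# Prints every ATOM/HETATM line of INFILE whose column 22 equals LETTER.
extract_chain() {
  awk -v c="$1" '/^(ATOM|HETATM)/ && substr($0, 22, 1) == c' "$2"
}
```

It would be called in the shape asked for above, for example: extract_chain A 1KDX.pdb > file.txt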
 
Old 06-30-2006, 04:33 PM   #14
homey
Senior Member
 
Registered: Oct 2003
Posts: 3,057

Rep: Reputation: 61
Quote:
The problem is, now it doesnt stop at TER
That's why I had the next if /^TER.*/../$_\n/; on top.

Is this what you had in mind? Me thinks this is how it's done but, I'm not entirely sure.

./myscript A 1KDX.pdb file.txt

You can also get a range of letters like this...
./myscript [A-B] 1KDX.pdb file.txt

Code:
#!/usr/bin/perl -w
# ./myscript A 1KDX.pdb file.txt

if($#ARGV != 2){
     die "Example: $0 A 1KDX.pdb file.txt \n";
}

my $LETTER = $ARGV[0];
my $IN = $ARGV[1];
my $OUT = $ARGV[2];

open(INFILE, "$IN")   or die "Can't open $IN for read: $!";
open(OUTFILE, "> $OUT") or die "Can't open $OUT for write: $!";

while (<INFILE>){
  next if /^TER.*/../$_\n/;
  print OUTFILE if /(^ATOM.................$LETTER)/ or /(^HETATM...............$LETTER)/;
}

close INFILE;
close OUTFILE;

Last edited by homey; 06-30-2006 at 04:36 PM.
 
Old 07-07-2006, 02:51 PM   #15
vidyashankara
Member
 
Registered: May 2006
Posts: 46

Original Poster
Rep: Reputation: 15

Sorry for the late reply; my computer gave out on me.

The script does not work on B, C or D. Basically, it stops at the first TER. If a TER comes before ATOM.................B, it outputs nothing.
It should start reading when it hits ATOM.................B and stop at the next TER.

Any way to do that?

Try this file:
http://www.rcsb.org/pdb/downloadFile...ructureId=1AV1

It has four chains: A, B, C, D.
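
One way the behaviour described here could be sketched is a tiny state machine in awk: stay silent until the first ATOM or HETATM line of the requested chain, print from there on, and quit at the next TER. This is an untested-against-1AV1 illustration, assuming the chain letter is in column 22; chain_until_ter is a made-up name.

```shell
# Usage: chain_until_ter LETTER INFILE > OUTFILE
# Starts printing at the first ATOM/HETATM line whose column 22 is LETTER
# and stops after the next TER line (the TER line itself is printed).
chain_until_ter() {
  awk -v c="$1" '
    /^(ATOM|HETATM)/ && substr($0, 22, 1) == c { on = 1 }   # start of the chain
    on                                         { print }    # inside the chain
    on && /^TER/                               { exit }     # end of the chain
  ' "$2"
}
```

Called as chain_until_ter B 1AV1.pdb > file.txt, a TER that closes chain A is skipped because printing has not started yet. Like a sed range, anything between the start line and the TER is printed, including records other than ATOM and HETATM.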
 
  

