ATOM 123 456 789 A 123 456 789
ATOM 123 456 789 A 123 456 789
ATOM 123 456 789 A 123 456 789
ATOM 123 456 789 A 123 456 789
ATOM 123 456 789 A 123 456 789
HETATM 123 456 789 A 123 456 789
HETATM 123 456 789 A 123 456 789
HETATM 123 456 789 A 123 456 789
HETATM 123 456 789 A 123 456 789
HETATM 123 456 789 A 123 456 789
TER 123 456 789 A 123 456 789
HETATM 123 456 789 B 123 456 789
HETATM 123 456 789 B 123 456 789
ATOM 123 456 789 B 123 456 789
ATOM 123 456 789 B 123 456 789
ATOM 123 456 789 B 123 456 789
ATOM 123 456 789 B 123 456 789
ATOM 123 456 789 B 123 456 789
HETATM 123 456 789 B 123 456 789
HETATM 123 456 789 B 123 456 789
HETATM 123 456 789 B 123 456 789
HETATM 123 456 789 B 123 456 789
HETATM 123 456 789 B 123 456 789
TER 123 456 789 B 123 456 789
HETATM 123 HOH A
HETATM 123 HOH A
HETATM 123 HOH A
HETATM 123 HOH A
HETATM 123 HOH B
HETATM 123 HOH B
HETATM 123 HOH B
HETATM 123 HOH B
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
Let's say I have the above text file. I need a Linux command to read specific parts of it.
Right now, I have
Code:
sed -n '/^ATOM...............A/,/^TER................A/p' file.txt
to read all the lines with A (or B) in a certain column.
The problem is that the text doesn't have to start with ATOM. It can also start with HETATM (as the B block does). In that case, the command ignores the first 2 lines. So I modified the command to read lines starting from ATOM or HETATM.
Code:
sed -n '/^ATOM\|HETA...............A/,/^TER................A/p' file.txt
But in this case, the command goes on to read the HETATM lines below TER.
Question 1. Is there a way to stop sed from reading the file any further after it reaches the TER string?
Question 2. I also need a command to read the HETATM lines after the TER. I need the following output
Code:
HETATM 123 HOH A
HETATM 123 HOH A
HETATM 123 HOH A
HETATM 123 HOH A
HETATM 123 HOH B
HETATM 123 HOH B
HETATM 123 HOH B
HETATM 123 HOH B
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
HETATM 123 AC9
It shouldn't include the HETATM lines that come between the ATOM and TER lines. Is there any way to accomplish this?
Any help or pointers would be appreciated!
Thanks
-Vids
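For Question 1, one possible approach is sed's q (quit) command: print the range, and quit as soon as the closing TER line has been printed, so that no later range can begin. The following is only a minimal sketch. It assumes a sed that supports -E (GNU and BSD sed do), a chain letter in column 22, and a made-up four-line sample rather than the real file.

```shell
#!/bin/sh
# Build a small hypothetical sample; the chain letter sits in column 22.
sample=$(mktemp)
{
  printf '%-6s%5d %-4s %3s %1s\n' ATOM   1 N  MET A
  printf '%-6s%5d %-4s %3s %1s\n' HETATM 2 O  HOH A
  printf '%-6s%5d %-4s %3s %1s\n' TER    3 "" MET A
  printf '%-6s%5d %-4s %3s %1s\n' HETATM 4 O  HOH A
} > "$sample"

# Print from the first ATOM/HETATM line of chain A through its TER line,
# then quit so that the HETATM line after TER is never examined.
chain_a=$(sed -nE '/^(ATOM  |HETATM).{15}A/,/^TER.{18}A/{
  p
  /^TER/q
}' "$sample")

printf '%s\n' "$chain_a"
rm -f "$sample"
```

With this sample, the first three lines are printed and the HETATM line after TER is left out. The dot counts ({15} and {18}) are assumptions tied to column 22 and would need adjusting for a different layout.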
Try these. I use .* instead of a bunch of dots like you did.
Also, in your example file, you had a capital A, not a lowercase one. Are there lowercase a and b?
For part one
Code:
sed -n '/^[ATOM|HETATM].* [A-B] /,/^TER.* [A-B]/p' file.txt
or
perl -ne 'print if /^[ATOM|HETATM].* [A-B] / .. /^TER.* [A-B]/' file.txt
for part two
Code:
sed -n '/^HET.* [A-B]$/,/$_/p' file.txt
or
perl -ne 'print if /^HET.* [A-B]$/ .. /^HET$/' file.txt
or
perl -ne 'print if /^HET.* [A-B]$/ .. /$_\n/' file.txt
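A note on the pattern ^[ATOM|HETATM] used above: square brackets form a bracket expression, which matches a single character from the set A, T, O, M, |, H, E, so it also matches lines that begin with TER or MASTER. Alternation needs a group instead. The sketch below only illustrates the difference with grep on a few made-up record names; it is not taken from the real file.

```shell
#!/bin/sh
# Made-up record names, one per line.
lines='TER
MASTER
ATOM
HETATM
REMARK
CONECT'

# Bracket expression: any line whose first character is in the set.
bracket=$(printf '%s\n' "$lines" | grep -c '^[ATOM|HETATM]')

# Grouped alternation (extended regex): only ATOM or HETATM records.
group=$(printf '%s\n' "$lines" | grep -cE '^(ATOM|HETATM)')

echo "bracket expression: $bracket lines, grouped alternation: $group lines"
```

In sed the grouped form would be written with -E as /^(ATOM|HETATM)/, and in Perl as /^(?:ATOM|HETATM)/. A range that starts on the bracket form begins on nearly every record, which is worth keeping in mind when reading the results of these commands.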
No, it's all upper case.
Part 1 outputs everything in the file now. I modified the command to
Code:
sed -n '/^ATOM\|HETA.* [A] /,/^TER.* [A]/p' file.txt
but after it hits TER, it continues to look for lines starting with ATOM or HETA, and if it finds any, it outputs those too.
If I put in the following code
Code:
sed -n '/^ATOM\|HETA.* [B] /,/^TER.* [B]/p' file.txt
it outputs the lines with A in them too. A occurs at different positions in the lines. Can you make the command search for A or B only in the 21st column?
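Testing a single fixed column is easier with awk's substr() than with a run of dots in a regular expression. The following is a minimal sketch, not a tested solution for the real file: chain_lines is a hypothetical helper name, and the default column of 22 is an assumption based on the PDB format, where the chain identifier occupies column 22 counting from 1. Pass a different column as the third argument if the file is laid out differently.

```shell
#!/bin/sh
# chain_lines CHAIN FILE [COL]
# Print ATOM/HETATM/TER records whose character in column COL
# (default 22, an assumption) is exactly CHAIN.
chain_lines() {
  awk -v chain="$1" -v col="${3:-22}" '
    /^(ATOM|HETATM|TER)/ && substr($0, col, 1) == chain
  ' "$2"
}

# Hypothetical sample: every line contains the letter A somewhere
# (atom name CA, residue ALA), but only the first has chain A.
sample=$(mktemp)
{
  printf '%-6s%5d %-4s %3s %1s\n' ATOM 1 CA ALA A
  printf '%-6s%5d %-4s %3s %1s\n' ATOM 2 CA ALA B
  printf '%-6s%5d %-4s %3s %1s\n' TER  3 "" ALA B
} > "$sample"

only_a=$(chain_lines A "$sample")
only_b=$(chain_lines B "$sample")
printf 'chain A:\n%s\nchain B:\n%s\n' "$only_a" "$only_b"
rm -f "$sample"
```

Because the comparison is against one character position, an A that appears in an atom or residue name no longer causes a match.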
Part 2: the command doesn't output anything. The HETATM lines after TER may or may not contain A or B. It should output only those lines starting with HETATM after the last TER.
Do you have an instant messenger? I could send you the text file so that you can have a look. It's more complex than the example you see here.
I want one command to read all the lines between HETATM (or ATOM) and TER, and another command to read all the lines starting with HETATM after the last TER.
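For the second request (HETATM lines after the last TER), one option is to let awk read the file twice: the first pass records the line number of the last TER, and the second pass prints only the HETATM lines beyond it. This is a sketch under assumptions: hetatm_after_last_ter is a made-up helper name, the sample is invented, and the file is assumed to be a regular, non-empty file that can be read twice.

```shell
#!/bin/sh
# hetatm_after_last_ter FILE
# The file is named twice on purpose so awk makes two passes over it.
hetatm_after_last_ter() {
  awk '
    NR == FNR { if (/^TER/) last = FNR; next }  # pass 1: remember last TER
    FNR > last && /^HETATM/                     # pass 2: print later HETATM
  ' "$1" "$1"
}

# Hypothetical sample with two chains followed by two water records.
sample=$(mktemp)
{
  printf '%-6s%5d %-4s %3s %1s\n' ATOM   1 N  MET A
  printf '%-6s%5d %-4s %3s %1s\n' HETATM 2 O  LIG A
  printf '%-6s%5d %-4s %3s %1s\n' TER    3 "" MET A
  printf '%-6s%5d %-4s %3s %1s\n' HETATM 4 O  LIG B
  printf '%-6s%5d %-4s %3s %1s\n' ATOM   5 N  MET B
  printf '%-6s%5d %-4s %3s %1s\n' TER    6 "" MET B
  printf '%-6s%5d %-4s %3s %1s\n' HETATM 7 O  HOH A
  printf '%-6s%5d %-4s %3s %1s\n' HETATM 8 O  HOH B
} > "$sample"

tail_hetatm=$(hetatm_after_last_ter "$sample")
printf '%s\n' "$tail_hetatm"
rm -f "$sample"
```

Only the two HOH records are printed here; the HETATM lines inside the chains are skipped because they come before the last TER. If the file has no TER at all, every HETATM line is printed.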
ORGANISM_SCIENTIFIC: RATTUS NORVEGICUS;
SOURCE 9 ORGANISM_COMMON: RAT;
Kinda reminds me of "Pinky and the Brain"
How's this looking?
Part one. EDIT: I was able to condense it into a script thanks to the gurus down in Programming.
To make this script executable, type: chmod +x myscript
where myscript is the name I chose; you can use another name.
Then type: ./myscript 1KDX.pdb > file.txt
Code:
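#!/usr/bin/perl
# Usage: ./myscript 1KDX.pdb > file.txt
while (<>) {
    # From the first TER line on, skip every line: the right-hand pattern
    # of the flip-flop (the current line followed by an extra newline)
    # is not expected to match, so the range never closes.
    next if /^TER.*/../$_\n/;
    # Print ATOM and HETATM records that contain " A " or " B ".
    print if /(^ATOM.* [A-B] .*)/ or /(^HETATM.* [A-B] .*)/;
}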
Part two
Code:
perl -nle 'print $1 if /(^HETATM.* [A-B])/' 1KDX.pdb > file2.txt
That Perl script did the job. At first it wouldn't select chains with B or C in them, because the script hits TER first.
So I modified it to the following
Code:
#!/usr/bin/perl
# ./myscript -$atm 1KDX.pdb > file.txt
while (<>) {
print if /(^ATOM.................A)/ or /(^HETATM...............A)/;
next if /^TER.*/../$_\n/;
}
Is there a way to make A a variable and pass it on the command line? Like
Code:
#!/usr/bin/perl
# ./myscript -$atm 1KDX.pdb > file.txt
while (<>) {
print if /(^ATOM.................$letter)/ or /(^HETATM...............$letter)/;
next if /^TER.*/../$_\n/;
}
I'm having perl learning pains.
Something like this?
Code:
#!/usr/bin/perl -w
# Usage: ./myscript A
if ($#ARGV != 0) {
    die "Example: $0 <A>\n";
}
my $LETTER = shift;    # chain letter to select
open(INFILE, '<', "1KDX.pdb") or die "Can't open 1KDX.pdb for read: $!";
open(OUTFILE, '>', "file.txt") or die "Can't open file.txt for write: $!";
while (<INFILE>) {
    print OUTFILE if /(^ATOM.................$LETTER)/ or /(^HETATM...............$LETTER)/;
    next if /^TER.*/../$_\n/;
}
close INFILE;
close OUTFILE;
The problem is, now it doesn't stop at TER. It continues to read the file for further occurrences of ATOM................A
Also, the input and output files should be variables too.
That's why I had the next if /^TER.*/../$_\n/; on top.
Is this what you had in mind? I think this is how it's done, but I'm not entirely sure.
./myscript A 1KDX.pdb file.txt
You can also get a range of letters like this (quoted so the shell does not treat the brackets as a filename pattern):
./myscript '[A-B]' 1KDX.pdb file.txt
Code:
#!/usr/bin/perl -w
# Usage: ./myscript A 1KDX.pdb file.txt
if ($#ARGV != 2) {
    die "Example: $0 A 1KDX.pdb file.txt\n";
}
my $LETTER = $ARGV[0];    # chain letter, or a character class such as [A-B]
my $IN     = $ARGV[1];    # input file
my $OUT    = $ARGV[2];    # output file
open(INFILE, '<', $IN) or die "Can't open $IN for read: $!";
open(OUTFILE, '>', $OUT) or die "Can't open $OUT for write: $!";
while (<INFILE>) {
    next if /^TER.*/../$_\n/;
    print OUTFILE if /(^ATOM.................$LETTER)/ or /(^HETATM...............$LETTER)/;
}
close INFILE;
close OUTFILE;
Sorry for the late reply; my computer gave out on me.
The script does not work on B, C or D. Basically, it stops at the first TER: if a TER comes before ATOM.................B, it outputs nothing.
It should start reading when it hits ATOM.................B and stop at TER.
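The behaviour described here (ignore everything, including earlier TER lines, until the first ATOM or HETATM record of the requested chain, then print through the next TER and stop) can be sketched with an explicit state flag in awk. This is only an illustration under assumptions: extract_chain is a hypothetical helper name, the chain identifier is assumed to be the single character in column 22, each chain is assumed to be one contiguous block, and the sample data is made up. The chain letter, input file and output file are all passed as arguments.

```shell
#!/bin/sh
# extract_chain CHAIN INFILE OUTFILE
extract_chain() {
  awk -v chain="$1" '
    # Chain identifier assumed to be the single character in column 22.
    !inside && /^(ATOM|HETATM)/ && substr($0, 22, 1) == chain { inside = 1 }
    inside { print }
    inside && /^TER/ { exit }   # stop reading once the block is closed
  ' "$2" > "$3"
}

# Hypothetical sample: chain A, then chain B, then a water record.
in=$(mktemp)
out=$(mktemp)
{
  printf '%-6s%5d %-4s %3s %1s\n' ATOM   1 N  MET A
  printf '%-6s%5d %-4s %3s %1s\n' TER    2 "" MET A
  printf '%-6s%5d %-4s %3s %1s\n' HETATM 3 O  LIG B
  printf '%-6s%5d %-4s %3s %1s\n' ATOM   4 N  MET B
  printf '%-6s%5d %-4s %3s %1s\n' TER    5 "" MET B
  printf '%-6s%5d %-4s %3s %1s\n' HETATM 6 O  HOH B
} > "$in"

extract_chain B "$in" "$out"
result=$(cat "$out")
printf '%s\n' "$result"
rm -f "$in" "$out"
```

With this sample, asking for chain B prints the HETATM, ATOM and TER records of chain B even though a TER for chain A comes first, and the water record after the final TER is not included.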