LinuxQuestions.org
Old 03-05-2009, 01:05 PM   #1
mchriste
LQ Newbie
 
Registered: Feb 2009
Distribution: OpenSuse 11.1
Posts: 10

Rep: Reputation: 0
Extracting chunks of data based on variables stored in another file (Perl?)


I need to extract chunks of information from a file based on two variables that are stored within another file.

I have attached an example "data" file (alldocked.pdbqt) and "variables" file (roots.pdb).

Important notes:
  1. Remove the .txt extension first (the forum required it for the upload).
  2. I had to heavily snip the data file. A real data file can easily be 10 MB in size.

The format for each record I want to extract is the following:

"Data" file: alldocked.pdbqt
Code:
<filename>:DOCKED: MODEL        <model>
<filename>:DOCKED: USER (...)
(...)
<filename>:DOCKED: (all kinds of stuff, many lines)
(...)
<filename>:DOCKED: ENDMDL
<filename>:DOCKED: MODEL        <model>
<filename>:DOCKED: USER (...)
(...)
<filename>:DOCKED: (all kinds of stuff, many lines)
(...)
<filename>:DOCKED: ENDMDL
(etc)
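A minimal sketch of how records in this layout could be split apart, written in Python purely as an illustration. It assumes that every record starts at a MODEL line and runs through the next ENDMDL line, and that records are not interleaved. The names index_records, MODEL_RE and ENDMDL_RE are made up for this example.

```python
import re

# Assumed line shapes, taken from the record format shown above:
#   <filename>:DOCKED: MODEL        <model>
#   <filename>:DOCKED: ENDMDL
MODEL_RE = re.compile(r"^(?P<name>.+?):DOCKED: MODEL\s+(?P<model>\d+)\s*$")
ENDMDL_RE = re.compile(r"^.+?:DOCKED: ENDMDL\s*$")


def index_records(lines):
    """Map (filename, model) -> list of lines from MODEL through ENDMDL."""
    records = {}
    current = None  # key of the record being collected, if any
    for line in lines:
        line = line.rstrip("\n")
        match = MODEL_RE.match(line)
        if match:
            current = (match.group("name"), int(match.group("model")))
            records[current] = []
        if current is not None:
            records[current].append(line)
            if ENDMDL_RE.match(line):
                current = None  # record is complete
    return records
```

Called as index_records(open("alldocked.pdbqt")), this would hold the whole file in memory, which should be acceptable for a file of around 10 MB.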
The <filename> and <model> variables are stored in a separate file called roots.pdb.
Each line of roots.pdb has the following format:
Code:
ATOM      1    O UNK L   0       <coord1>   <coord2>  <coord3> 1.00  <energy>           O <filename> <model>
Characteristics of the data:
  • <filename> corresponds to field #13, <model> to field #14 in each line.
  • Both <filename> and <model> vary INDEPENDENTLY for each record.
  • <filename> can be absolutely random but it will always end with ".dlg".
  • There may be up to 99 different <filename> values in the alldocked.pdbqt file.
  • <model> is always an integer and can range from 1 to 999.
  • The "data" file is always called alldocked.pdbqt, and the "variables" file is always called roots.pdb.
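The characteristics above suggest a simple way to read the lookup keys. The following Python sketch is an illustration only: it assumes the fields are separated by whitespace (as awk would split them) and that <filename> contains no spaces. The function name read_keys and its limit parameter are hypothetical.

```python
def read_keys(lines, limit=20):
    """Return up to `limit` (filename, model) pairs, in file order."""
    keys = []
    for line in lines:
        if len(keys) >= limit:
            break
        fields = line.split()
        if len(fields) < 14:
            continue  # blank or malformed line
        # Fields 13 and 14 in awk numbering are indexes 12 and 13 here.
        filename, model = fields[12], fields[13]
        if not filename.endswith(".dlg") or not model.isdigit():
            continue  # does not look like a <filename> <model> pair
        keys.append((filename, int(model)))
    return keys
```

Note that lines skipped as malformed do not count toward the limit, so this returns the first 20 usable lines rather than strictly the first 20 lines.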

WHAT I WOULD LIKE TO GET:
I need a file that contains the records corresponding to the first x (say, 20) lines of roots.pdb.
I also need to add a line containing the <filename> between each record.
(Yes I know that <filename> is included on each line, but later I will strip this).

So the desired output format (output.pdbqt) would be:

Code:
<filename>:DOCKED: REMARK <filename>
<filename>:DOCKED: MODEL        <model>
<filename>:DOCKED: USER (...)
(...)
<filename>:DOCKED: (all kinds of stuff, many lines)
(...)
<filename>:DOCKED: ENDMDL
<filename>:DOCKED: REMARK <filename>
<filename>:DOCKED: MODEL        <model>
<filename>:DOCKED: USER (...)
(...)
<filename>:DOCKED: (all kinds of stuff, many lines)
(...)
<filename>:DOCKED: ENDMDL
(...)
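The output layout above could be produced along the following lines. This Python sketch is only an illustration: it assumes the records have already been collected into a dictionary keyed by (filename, model), and the name build_output is hypothetical. Raising an error for a missing record is one possible choice; skipping it with a warning would work as well.

```python
def build_output(keys, records):
    """Yield a REMARK line followed by the record lines, for each key in order.

    keys    -- iterable of (filename, model) pairs, in the desired output order
    records -- dict mapping (filename, model) to that record's list of lines
    """
    for filename, model in keys:
        body = records.get((filename, model))
        if body is None:
            raise KeyError(f"no record for {filename} model {model}")
        # Extra line naming the source file, placed before each record.
        yield f"{filename}:DOCKED: REMARK {filename}"
        yield from body
```

The resulting lines could then be written to output.pdbqt, one per line, in the same order as the entries in roots.pdb.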
Enough with the abstract descriptions, here's a real example:

The example roots.pdb file I attached starts with the following lines:
Code:
ATOM      1    O UNK L   0       9.200   5.778  -4.800 1.00  -10.02           O SOS_charged1.dlg  13 
ATOM      2    O UNK L   0      13.230   3.129  -6.038 1.00  -7.94           O SOS_charged2.dlg  20 
ATOM      3    O UNK L   0      11.295   0.656  -6.503 1.00  -7.91           O SOS_charged1.dlg   7
So, the output file I would like should start with:

Code:
SOS_charged1.dlg:DOCKED: REMARK SOS_charged1.dlg
SOS_charged1.dlg:DOCKED: MODEL        13
(...)
SOS_charged1.dlg:DOCKED: (all data in alldocked.pdbqt that corresponds to model SOS_charged1.dlg 13)
(...)
SOS_charged1.dlg:DOCKED: ENDMDL
SOS_charged2.dlg:DOCKED: REMARK SOS_charged2.dlg
SOS_charged2.dlg:DOCKED: MODEL        20
SOS_charged2.dlg:DOCKED: USER (...)
(...)
SOS_charged2.dlg:DOCKED: (all data in alldocked.pdbqt that corresponds to model SOS_charged2.dlg 20)
(...)
SOS_charged2.dlg:DOCKED: ENDMDL
SOS_charged1.dlg:DOCKED: REMARK SOS_charged1.dlg
SOS_charged1.dlg:DOCKED: MODEL        7
SOS_charged1.dlg:DOCKED: USER (...)
(...)
SOS_charged1.dlg:DOCKED: (all data in alldocked.pdbqt that corresponds to model SOS_charged1.dlg 7)
(...)
SOS_charged1.dlg:DOCKED: ENDMDL
Who can help me? I'm trying to analyze protein-ligand docking results in a way that is not provided by the software I'm using - but I'm a biochemist, not a programmer!
Attached Files
File Type: txt alldocked.pdbqt.txt (235.1 KB, 5 views)
File Type: txt roots.pdb.txt (5.9 KB, 3 views)
 
Old 03-05-2009, 01:08 PM   #2
mchriste
LQ Newbie
 
Registered: Feb 2009
Distribution: OpenSuse 11.1
Posts: 10

Original Poster
Rep: Reputation: 0
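Addon: I figure that a Perl script would be the way to go.
Something like:

Code:
#!/usr/bin/perl
use strict;
use warnings;

# Ask how many lines of roots.pdb to process.
print "How many conformations to extract? ";
chomp(my $xmax = <STDIN>);

# Slurp the whole data file into one string.
open my $data, '<', 'alldocked.pdbqt' or die "alldocked.pdbqt: $!";
my $docked = do { local $/; <$data> };
close $data;

open my $roots, '<', 'roots.pdb' or die "roots.pdb: $!";
open my $out, '>', 'output.pdbqt' or die "output.pdbqt: $!";

my $line = 0;
while (my $root = <$roots>) {
    last if ++$line > $xmax;

    # Fields 13 and 14 (awk numbering) hold <filename> and <model>.
    my ($filename, $model) = (split ' ', $root)[12, 13];
    next unless defined $model;

    # A record runs from its MODEL line through the next ENDMDL line.
    my $start = quotemeta("${filename}:DOCKED: MODEL") . '\s+' . $model . '[ \t\r]*\n';
    my $end   = quotemeta("${filename}:DOCKED: ENDMDL");

    if ($docked =~ /^(${start}.*?^${end}[^\n]*\n?)/ms) {
        # Write the REMARK line, then the matched record.
        print {$out} "${filename}:DOCKED: REMARK $filename\n", $1;
    } else {
        warn "No record for $filename model $model\n";
    }
}
close $out;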

Last edited by mchriste; 03-05-2009 at 01:15 PM.
 
Old 03-12-2009, 12:44 PM   #3
mchriste
LQ Newbie
 
Registered: Feb 2009
Distribution: OpenSuse 11.1
Posts: 10

Original Poster
Rep: Reputation: 0
can anybody help me?

bump - did I confuse everybody with my problem?
I thought my second post clarified what I need...
 
  

