LinuxQuestions.org

Chemistry problem- File matching and Sorting!!! (https://www.linuxquestions.org/questions/programming-9/chemistry-problem-file-matching-and-sorting-819359/)

robertselwyne 07-12-2010 04:22 AM

Chemistry problem- File matching and Sorting!!!
 
Dear Programmers

I have a file called ranking.txt that lists 4 chemical compounds in *.sdf format, named ligands_m1, ligands_m2, ligands_m3 and ligands_m4.

Each compound is listed with its score and its file location:
------------------------------------------------------------------------

Score Directory Name
37.36 ~/chemscore/ligands_m1/ligands_m1.sdf ligands_m1
19.35 ~/chemscore/ligands_m2/ligands_m2.sdf ligands_m2
28.35 ~/chemscore/ligands_m3/ligands_m3.sdf ligands_m3
30.31 ~/chemscore/ligands_m4/ligands_m4.sdf ligands_m4

------------------------------------------------------------------------
In the same directories I also have another set of files, the cluster files, which are also in *.sdf format.

The cluster files are laid out as follows:

~/chemscore/ligands_m1/
cluster_ligands_m1_1.sdf
cluster_ligands_m1_2.sdf
cluster_ligands_m1_3.sdf


~/chemscore/ligands_m2/
cluster_ligands_m2_1.sdf
cluster_ligands_m2_2.sdf
cluster_ligands_m2_3.sdf
cluster_ligands_m2_4.sdf
cluster_ligands_m2_5.sdf

~/chemscore/ligands_m3/
cluster_ligands_m3_1.sdf

~/chemscore/ligands_m4/
cluster_ligands_m4_1.sdf
cluster_ligands_m4_2.sdf
cluster_ligands_m4_3.sdf
cluster_ligands_m4_4.sdf
------------------------------------------------------------------------

I need a script that does the following: ONLY if the score is above 28 and the number of cluster files is less than or equal to 3, write the line to the output with the number of clusters added as a last column. For example:

Score Directory Name Clusters
37.36 ~/chemscore/ligands_m1/ligands_m1.sdf ligands_m1 3
28.35 ~/chemscore/ligands_m3/ligands_m3.sdf ligands_m3 1
------------------------------------------------------------------------

Could anybody please help me sort out this problem?
Thank you in advance.
Robert.
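
A minimal POSIX sh sketch of the filter requested above, assuming ranking.txt has the three whitespace-separated columns shown (score, path to ligands_mN.sdf, name), that each compound's cluster files sit next to its .sdf file and match cluster_*.sdf, and that the paths start with a literal ~ which the script has to expand itself. The function name filter_ligands is hypothetical and not taken from the thread. The score test is handed to awk because sh has no floating-point arithmetic.

```sh
#!/bin/sh
# filter_ligands is a hypothetical helper name, not taken from the thread.
# It reads a ranking file (score, path to ligands_mN.sdf, name) and prints
# each line whose score is above 28 and whose directory holds at most 3
# cluster_*.sdf files, with the cluster count appended as a last column.
filter_ligands() {
    while read -r score path name; do
        # skip blank lines and the "Score Directory Name" header
        case $score in
            '' | *[!0-9.]*) continue ;;
        esac

        # sh has no floating-point arithmetic, so let awk compare the score
        awk -v s="$score" 'BEGIN { exit !(s + 0 > 28) }' || continue

        dir=${path%/*}                      # drop the ligands_mN.sdf file name
        case $dir in
            "~/"*) dir=$HOME/${dir#"~/"} ;; # expand a leading ~ by hand
        esac

        # count only cluster_*.sdf, so ligands_mN.sdf itself is not included
        n=0
        for f in "$dir"/cluster_*.sdf; do
            if [ -e "$f" ]; then
                n=$((n + 1))
            fi
        done

        if [ "$n" -le 3 ]; then
            printf '%s %s %s %d\n' "$score" "$path" "$name" "$n"
        fi
    done < "$1"
}
```

After sourcing the file (saved, say, as filter_ligands.sh), running filter_ligands ranking.txt > output.txt should write the matching lines, cluster count included, to a separate output.txt. With the example data above it should reproduce the two lines shown in the expected output.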

bigearsbilly 07-12-2010 05:25 AM

Try this:
Usage: ligands.pl ranking.txt

Code:

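#!/usr/bin/env perl

use strict;
use warnings;
use File::Basename;
local $\ = "\n";    # every print ends with a newline

while (<>) {
    chomp;
    next unless /\S/;                          # skip blank lines
    my ($score, $dir, $name) = split;
    next unless $score =~ /^\d+(?:\.\d+)?$/;   # skip the header line
    next unless $score > 28;
    $dir = dirname $dir;                       # e.g. ~/chemscore/ligands_m1

    # count only the cluster files, not ligands_mN.sdf itself
    my @L = glob "$dir/cluster_*.sdf";
    next unless @L <= 3;
    print;
}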


robertselwyne 07-12-2010 06:16 AM

Dear bigearsbilly
The script works fine, thank you very much. But it didn't write the number of clusters in the last column. Could you please modify the program to add no_of_clusters as the last column?

Also, it would be great if the output were written to a separate file, "output.txt".

Thank you very much for your time and consideration
Regards
Robert

grail 07-12-2010 07:04 AM

Here is a slightly different take:
Code:

find -name 'cluster*' | awk -F_ '{_[$(NF-1)]++}END{while((getline < "ranking.txt") > 0)if($1 > 28 && _[$NF] >=3)print $0" "_[$NF]}'

robertselwyne 07-12-2010 07:14 AM

Dear Grail
Thank you for the script. Your awk script runs without any errors but does not produce any output.

syg00 07-12-2010 07:19 AM

And ???
It seems you want someone else to do all the work for you. Personally, I don't mind giving people a nudge in the right direction, and I think you have certainly received that, and more.

grail 07-12-2010 07:48 AM

I am with syg00 on this one; we have given you a fairly good hand with this. I will simply add that the following is how it ran on my machine:
Code:

grail@wetworks:~$ find -name 'cluster*' | awk -F_ '{_[$(NF-1)]++}END{while((getline < "ranking.txt") > 0)if($1 > 28 && _[$NF] >=3)print $0" "_[$NF]}'
37.36 ~/chemscore/ligands_m1/ligands_m1.sdf ligands_m1 3
30.31 ~/chemscore/ligands_m4/ligands_m4.sdf ligands_m4 4
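
The first post asks for at most 3 cluster files, whereas the one-liner above tests _[$NF] >=3, which is why ligands_m4 (4 clusters) appears and ligands_m3 (1 cluster) does not. Also, because of -F_, the ranking line's $1 is "37.36 ~/chemscore/ligands", so $1 > 28 falls back to a string comparison. Below is a sketch of the same find/awk idea with the count test flipped to <= 3 and the ranking lines split on whitespace, assuming it is run from the directory that holds both ranking.txt and the ligands_mN subdirectories (for example ~/chemscore), and that names have the form ligands_<id> with no further underscores. The wrapper name filter_with_awk is hypothetical.

```sh
#!/bin/sh
# filter_with_awk is a hypothetical wrapper name; run it from the directory
# that holds ranking.txt and the ligands_mN subdirectories.
filter_with_awk() {
    find . -name 'cluster_*.sdf' |
    awk -F_ '
        # stdin: ./ligands_m1/cluster_ligands_m1_1.sdf -> key "m1"
        { count[$(NF-1)]++ }
        END {
            while ((getline line < "ranking.txt") > 0) {
                # split the ranking line on whitespace, not on "_"
                if (split(line, f, " ") < 3) continue
                split(f[3], parts, "_")   # "ligands_m1" -> parts[2] = "m1"
                c = count[parts[2]] + 0    # 0 if the directory has no clusters
                if (f[1] + 0 > 28 && c <= 3) print line, c
            }
        }'
}
```

Using split(line, f, " ") keeps -F_ for the file names coming from find while the ranking lines are still read as whitespace-separated columns, and f[1] + 0 forces a numeric score comparison. The header line scores 0 and is skipped.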


bigearsbilly 07-12-2010 10:58 AM

Ditto the last 2 posts.

If you must, maybe change the final print line to:

print "$_ ", scalar @L;


not tested, no warranty

robertselwyne 07-12-2010 02:36 PM

Dear Billy
Perfect, it worked great. Thank you for the wonderful script!
Regards
Robert

grail 07-12-2010 10:16 PM

Please mark the thread as SOLVED once you have your answer.


All times are GMT -5.