LinuxQuestions.org
Old 07-12-2010, 05:22 AM   #1
robertselwyne
LQ Newbie
 
Registered: Jul 2010
Posts: 7

Rep: Reputation: 0
Chemistry problem- File matching and Sorting!!!


Dear Programmers

I have a file called ranking.txt that lists 4 chemical compounds in *.sdf file format, named ligands_m1, ligands_m2, ligands_m3 and ligands_m4.

Each compound is assigned a score, along with its file location.
------------------------------------------------------------------------

Score Directory Name
37.36 ~/chemscore/ligands_m1/ligands_m1.sdf ligands_m1
19.35 ~/chemscore/ligands_m2/ligands_m2.sdf ligands_m2
28.35 ~/chemscore/ligands_m3/ligands_m3.sdf ligands_m3
30.31 ~/chemscore/ligands_m4/ligands_m4.sdf ligands_m4

------------------------------------------------------------------------
Each compound's directory also holds a set of cluster files, also in *.sdf format.

The cluster file layout is shown below:

~/chemscore/ligands_m1/
cluster_ligands_m1_1.sdf
cluster_ligands_m1_2.sdf
cluster_ligands_m1_3.sdf


~/chemscore/ligands_m2/
cluster_ligands_m2_1.sdf
cluster_ligands_m2_2.sdf
cluster_ligands_m2_3.sdf
cluster_ligands_m2_4.sdf
cluster_ligands_m2_5.sdf

~/chemscore/ligands_m3/
cluster_ligands_m3_1.sdf

~/chemscore/ligands_m4/
cluster_ligands_m4_1.sdf
cluster_ligands_m4_2.sdf
cluster_ligands_m4_3.sdf
cluster_ligands_m4_4.sdf
------------------------------------------------------------------------

I need a script that does the following job: ONLY if the score is above 28 and the number of cluster files is less than or equal to 3, write the line to the output. For example:

Score Directory Name Clusters
37.36 ~/chemscore/ligands_m1/ligands_m1.sdf ligands_m1 3
28.35 ~/chemscore/ligands_m3/ligands_m3.sdf ligands_m3 1
------------------------------------------------------------------------

Could anybody please help me to sort out this problem?
Thank you in advance.
Robert.
 
Old 07-12-2010, 06:25 AM   #2
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: NetBSD, Void, Debian, Mint, Ubuntu, Puppy, Raspbian
Posts: 3,487

Rep: Reputation: 233
try this:
usage: ligands.pl ranking.txt

Code:
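#!/usr/bin/env perl

use strict;
use File::Basename;
local $\ = "\n";    # print appends a newline

while (<>) {
    chomp;
    next unless /./;                        # skip blank lines
    my ($score, $dir, $name) = split;
    next unless $score > 28;                # also skips the header line
    $dir = dirname $dir;

    # count only the cluster files, not the ligand .sdf itself
    my @L = <$dir/cluster_*.sdf>;
    next unless (@L < 4);
    print ;
}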

Last edited by bigearsbilly; 07-12-2010 at 06:34 AM. Reason: it was wrong
 
Old 07-12-2010, 07:16 AM   #3
robertselwyne
LQ Newbie
 
Registered: Jul 2010
Posts: 7

Original Poster
Rep: Reputation: 0

Dear bigearsbilly
The script works fine, thank you very much. But it didn't write the number of clusters in the last column. Could you please modify the program to add no_of_clusters as the last column?

Also, it would be great if the output were written to a separate file, "output.txt".

Thank you very much for your time and consideration
Regards
Robert
 
Old 07-12-2010, 08:04 AM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,804

Rep: Reputation: 3069
Here is a slightly different take:
Code:
find -name 'cluster*' | awk -F_ '{_[$(NF-1)]++}END{while((getline < "ranking.txt") > 0)if($1 > 28 && _[$NF] >=3)print $0" "_[$NF]}'
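
A note on how the one-liner splits its input, as a small sketch: with -F_ every line is cut at underscores, so $(NF-1) of a cluster path is the compound suffix (m1, m2, ...). The two helper names below are made up for illustration and are not part of the thread.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not from the thread).

# Print the piece that awk -F_ '{ print $(NF-1) }' picks out of one line.
second_to_last_piece() {
    printf '%s\n' "$1" | awk -F_ '{ print $(NF-1) }'
}

# Print the first and the last "_"-separated piece of one line.
first_and_last_piece() {
    printf '%s\n' "$1" | awk -F_ '{ print $1; print $NF }'
}
```

Calling `second_to_last_piece ./chemscore/ligands_m1/cluster_ligands_m1_1.sdf` prints m1. For the first data line of ranking.txt, `first_and_last_piece` prints `37.36 ~/chemscore/ligands` followed by `m1`, so under -F_ the test `$1 > 28` compares a string rather than the bare score.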
 
Old 07-12-2010, 08:14 AM   #5
robertselwyne
LQ Newbie
 
Registered: Jul 2010
Posts: 7

Original Poster
Rep: Reputation: 0

Dear Grail
Thank you for the script. Your awk script ran without any error but did not produce any output.
 
Old 07-12-2010, 08:19 AM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 19,377

Rep: Reputation: 3420
And ???.
Seems you want someone else to do all the work for you. Personally I don't mind giving people a nudge in the right direction - I think you have certainly received that - and more.
 
Old 07-12-2010, 08:48 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,804

Rep: Reputation: 3069
I am with syg00 on this one ... we have given a fairly good hand on this. I will simply add that the following is as it ran on my machine:
Code:
grail@wetworks:~$ find -name 'cluster*' | awk -F_ '{_[$(NF-1)]++}END{while((getline < "ranking.txt") > 0)if($1 > 28 && _[$NF] >=3)print $0" "_[$NF]}'
37.36 ~/chemscore/ligands_m1/ligands_m1.sdf ligands_m1 3
30.31 ~/chemscore/ligands_m4/ligands_m4.sdf ligands_m4 4
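
Worth noting: the run above lists ligands_m4 with 4 clusters and leaves out ligands_m3 with 1, while the original request was for 3 clusters or fewer. A possible variant, offered only as a sketch, flips the comparison to <= 3 and splits the ranking lines on whitespace so the score is compared as a number. The function name and its two arguments are hypothetical, and treating a compound with no cluster files as having a count of 0 is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print ranking lines whose score
# is above 28 and whose compound has at most 3 cluster files.
# $1 = directory tree to search, $2 = ranking file
filter_ranking() {
    find "$1" -name 'cluster_*.sdf' |
    awk -v ranking="$2" '
        # Count cluster files per compound. Cutting each path at "_" leaves
        # the compound suffix (m1, m2, ...) in the second-to-last piece.
        {
            n = split($0, part, "_")
            count[part[n - 1]]++
        }
        END {
            while ((getline line < ranking) > 0) {
                # whitespace split: f[1] = score, f[3] = name
                if (split(line, f, " ") < 3) continue
                key = f[3]
                sub(/^.*_/, "", key)        # ligands_m1 -> m1
                c = count[key] + 0          # assumption: no files means 0
                # f[1] + 0 forces a numeric comparison; the header gives 0
                if (f[1] + 0 > 28 && c <= 3)
                    print line, c
            }
        }'
}
```

Run as `filter_ranking ~/chemscore ranking.txt` against the layout from the first post, this should give the ligands_m1 line with 3 and the ligands_m3 line with 1.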
 
Old 07-12-2010, 11:58 AM   #8
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: NetBSD, Void, Debian, Mint, Ubuntu, Puppy, Raspbian
Posts: 3,487

Rep: Reputation: 233
ditto the last 2 posts.

if you must, maybe change the print line to...

print "$_ ", scalar @L;


not tested, no warranty
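
For the output.txt part of the request, plain shell redirection (`ligands.pl ranking.txt > output.txt`) sends whatever the script prints to a file. As a rough shell-only sketch of the same per-directory counting, with the header row from the first post: the function name and its arguments are hypothetical, and a compound with no cluster files is assumed to count as 0.

```shell
#!/bin/sh
# Hypothetical sketch (names are illustrative, not from the thread).
# $1 = ranking file, $2 = output file
write_filtered() {
    ranking=$1
    output=$2
    {
        echo "Score Directory Name Clusters"
        while read -r score path name; do
            # need all three columns
            [ -n "$name" ] || continue
            # numeric test; the header and anything scoring 28 or less fail it
            awk -v s="$score" 'BEGIN { exit !(s + 0 > 28) }' < /dev/null || continue
            # the shell does not expand a "~" stored in a variable, so do it by hand
            case $path in
                "~/"*) dir=$HOME/${path#"~/"} ;;
                *)     dir=$path ;;
            esac
            dir=$(dirname "$dir")
            # count only the cluster files, not the ligand file itself
            set -- "$dir"/cluster_*.sdf
            [ -e "$1" ] || set --       # pattern matched nothing: count is 0
            [ "$#" -le 3 ] || continue
            printf '%s %s %s %s\n' "$score" "$path" "$name" "$#"
        done < "$ranking"
    } > "$output"
}
```

Usage would be `write_filtered ranking.txt output.txt`.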
 
Old 07-12-2010, 03:36 PM   #9
robertselwyne
LQ Newbie
 
Registered: Jul 2010
Posts: 7

Original Poster
Rep: Reputation: 0
Dear Billy
Perfect.. It worked great....Thank you for the wonderful script
Regards
Robert
 
Old 07-12-2010, 11:16 PM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,804

Rep: Reputation: 3069
Please mark as SOLVED once you have your answer
 
  

