LinuxQuestions.org


kmkocot 10-07-2015 12:29 AM

Need help with a Perl script to read in values from a file
 
Hi all,

I'm trying to write a script that queries a database with the scientific names of organisms to get their taxonomic IDs, so that I can automatically bin the DNA sequences in a dataset as coming from an animal, a bacterium, etc. I have a text file containing many scientific names that I would like to run through this script. I'm having a hard time editing the script so that it iterates through the text file, pulling in the names one after another, and none of the examples I've been able to find on the web match my situation. I realize that I'll need to change the lines "my @scientific_name = ("Neosartorya fischeri NRRL 181");" and "foreach my $scientific_name (@scientific_name) {", but otherwise I'm really stuck.


Here's the script I'm trying to edit:
Code:

use strict;
use warnings;
use Bio::DB::Taxonomy;
use Bio::Tree::Tree;

my @scientific_name = ("Neosartorya fischeri NRRL 181");
my @lineages = ();
my $db = Bio::DB::Taxonomy->new(-source => 'entrez');

foreach my $scientific_name (@scientific_name) {
    my $taxon = $db->get_taxon(-name => @scientific_name);
    my $tree = Bio::Tree::Tree->new(-node => $taxon);
    my @taxa = $tree->get_nodes;
    my @tids = ();
    foreach my $t (@taxa) {
        unshift(@tids, $t->id());
    }
    push(@lineages, @scientific_name . "\t|\t" . $taxon->ancestor() . "\t|\t" . "@tids")
}

foreach my $lineage (@lineages) {
    print "$lineage\n";
}

Here are the first few lines from species_names.txt, which contains the names I'd like to feed into my @scientific_name:
Code:

Caldicellulosiruptor owensensis OL
Homo sapiens;Homo sapiens;synthetic construct;Homo sapiens
Teredinibacter turnerae T7901
Arcobacter nitrofigilis DSM 7299
Neosartorya fischeri NRRL 181
Homo sapiens;synthetic construct
Ruegeria pomeroyi DSS-3
Planctomyces limnophilus DSM 3776
Planctomyces limnophilus DSM 3776
Flavobacteria bacterium BBFL7

Thank you!
Kevin

chrism01 10-07-2015 02:40 AM

I'm not entirely clear on what you're asking, but if it's how to read a file line by line:
Code:

open(my $kfile, "<", "kfile.txt") or die "Can't open kfile.txt: $!\n";
while ( defined(my $krec = <$kfile>) )
{
    chomp($krec);    # strip the trailing newline

    # Here you do stuff with the rec;
    # Note that your recs seem to sometimes have space separated fields, sometimes ';' separators
}

close($kfile) or die "Can't close kfile.txt: $!\n";

Obviously you rename the vars etc, but you get the idea.

Also, I'd avoid giving scalars and arrays effectively the same name, even though they live in separate namespaces; it invites hard-to-find typos as the code gets longer.

To get separate "fields" from your recs (if you need to), use http://perldoc.perl.org/functions/split.html
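
A minimal sketch of the split suggestion, assuming the ';' characters in the sample file separate alternative names and that repeated names within a record can be dropped. Both are guesses about the data, and the sub name names_from_record is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: split one record on ';', trim whitespace,
# and drop repeated names while keeping first-seen order.
sub names_from_record {
    my ($record) = @_;
    my %seen;
    my @names;
    foreach my $field (split(/;/, $record)) {
        $field =~ s/^\s+|\s+$//g;                  # trim both ends
        next if $field eq "" or $seen{$field}++;   # skip empties and repeats
        push(@names, $field);
    }
    return @names;
}

# Example record taken from the sample file in the first post.
my @names = names_from_record("Homo sapiens;Homo sapiens;synthetic construct;Homo sapiens");
print join("\n", @names), "\n";
```

For that record it prints "Homo sapiens" and "synthetic construct", one per line. A record without any ';' comes back as a single name.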


HTH - come back if you need more :)

kmkocot 10-08-2015 06:47 PM

Thanks for the help. I'm a novice with Perl, but between that and some more web searching, I figured out something that works:

Code:

use strict;
use warnings;
use Bio::DB::Taxonomy;
use Bio::Tree::Tree;

my $all_sci_names = `cat scientific_names.txt`;

my @scientific_name = (split/\n/,$all_sci_names);
my @lineages = ();

my $db = Bio::DB::Taxonomy->new(-source => 'entrez');

foreach my $scientific_name (@scientific_name) {
    my $taxon = $db->get_taxon(-name => $scientific_name);
    my $tree = Bio::Tree::Tree->new(-node => $taxon);
    my @taxa = $tree->get_nodes;
    my @tids = ();
    foreach my $t (@taxa) {
        unshift(@tids, $t->id());
    }
    push(@lineages, $scientific_name . "\t|\t" . "@tids")
}

foreach my $lineage (@lineages) {
    print "$lineage\n";
}

#Notes
#http://doc.bioperl.org/bioperl-live/Bio/DB/Taxonomy.html
#http://doc.bioperl.org/bioperl-live/Bio/Tree/Tree.html
#cat mgm4664974.3_organism_GenBank.tab | awk -F "\t" '{print $13}' | sed '/semicolon.\+/d' > scientific_names.txt
#perl get_ancestor_taxonomy_from_MG-RAST_output.pl scientific_names.txt


chrism01 10-08-2015 07:50 PM

Please use the proper Perl way of reading files, as per my example.
I'd also point out that reading from a filehandle already splits on newlines by default (you can change that for funky files).
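
On changing the record split: what counts as a "line" when reading from a filehandle is controlled by Perl's input record separator, $/. The sketch below sets it to ';' and reads from an in-memory string so it runs without any files; the sub name read_records is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the records in $text, treating
# $separator as the end-of-record marker instead of a newline.
sub read_records {
    my ($text, $separator) = @_;
    local $/ = $separator;    # restored automatically when the sub returns
    open(my $fh, "<", \$text) or die "Can't open in-memory handle: $!\n";
    my @records;
    while (defined(my $record = <$fh>)) {
        chomp($record);       # chomp removes the current value of $/
        push(@records, $record);
    }
    close($fh) or die "Can't close in-memory handle: $!\n";
    return @records;
}

my @records = read_records("Homo sapiens;synthetic construct", ";");
print "$_\n" foreach @records;
```

Using local keeps the change confined to the sub, so code elsewhere still reads newline-terminated lines.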

I'll also reiterate my advice about not using the 'same' name for scalars and arrays (and indeed hashes).
You'll thank me later...
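
Putting both pieces of advice together, a minimal sketch: the file is read line by line rather than through `cat`, and the array and loop variable get clearly different names. The names read_names, @sci_names and $name are made up for illustration, and skipping blank lines is an assumption about the input file.

```perl
use strict;
use warnings;

# Hypothetical helper: read one name per line from $path,
# skipping blank lines, and return the names as a list.
sub read_names {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "Can't open $path: $!\n";
    my @sci_names;
    while (defined(my $line = <$fh>)) {
        chomp($line);
        next if $line =~ /^\s*$/;    # ignore empty or whitespace-only lines
        push(@sci_names, $line);
    }
    close($fh) or die "Can't close $path: $!\n";
    return @sci_names;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    foreach my $name (read_names($ARGV[0])) {
        # The Bio::DB::Taxonomy lookup from the script above would go here.
        print "$name\n";
    }
}
```

Taking the file name from @ARGV also matches the way the script is invoked in the notes above, where scientific_names.txt is passed as an argument.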

pan64 10-09-2015 02:30 AM

http://stackoverflow.com/questions/7...rray-with-perl


All times are GMT -5.