LinuxQuestions.org - Need help with perl script to read in values from a file

Hi all,

I'm trying to write a script that queries a database with the scientific names of organisms to get their taxonomic IDs, so that I can automatically bin the DNA sequences in a dataset as animal, bacterial, etc. I have a text file containing many scientific names that I would like to run through this script. I'm having a hard time editing the script so that it iterates through the text file, pulling in one name after another, and none of the examples I've found on the web match my situation. I realize that I'll need to change the lines "my @scientific_name = ("Neosartorya fischeri NRRL 181");" and "foreach my $scientific_name (@scientific_name) {", but otherwise I'm really stuck.

Here's the script I'm trying to edit:

Code:

use strict;
use warnings;
use Bio::DB::Taxonomy;
use Bio::Tree::Tree;

my @scientific_name = ("Neosartorya fischeri NRRL 181");
my @lineages = ();
my $db = Bio::DB::Taxonomy->new(-source => 'entrez');

foreach my $scientific_name (@scientific_name) {
    # Look up the current name (the scalar), not the whole array
    my $taxon = $db->get_taxon(-name => $scientific_name);
    my $tree = Bio::Tree::Tree->new(-node => $taxon);
    my @taxa = $tree->get_nodes;
    my @tids = ();
    foreach my $t (@taxa) {
        unshift(@tids, $t->id());
    }
    # Use the scalar here too; an array in string concatenation yields its element count
    push(@lineages, $scientific_name . "\t|\t" . $taxon->ancestor() . "\t|\t" . "@tids")
}

foreach my $lineage (@lineages) {
    print "$lineage\n";
}

Here are the first few lines of species_names.txt, which contains the names I'd like to feed into my @scientific_name:

Code:

Caldicellulosiruptor owensensis OL
Homo sapiens;Homo sapiens;synthetic construct;Homo sapiens
Teredinibacter turnerae T7901
Arcobacter nitrofigilis DSM 7299
Neosartorya fischeri NRRL 181
Homo sapiens;synthetic construct
Ruegeria pomeroyi DSS-3
Planctomyces limnophilus DSM 3776
Planctomyces limnophilus DSM 3776
Flavobacteria bacterium BBFL7

Thank you!
Kevin

I'm not entirely clear on what you're asking, but if it's how to read a file line by line:

Code:

# Lexical filehandle and 'my' on the record variable, so this also runs under 'use strict'
open(my $kfile, "<", "kfile.txt") or die "Can't open kfile: $!\n";

while ( defined(my $krec = <$kfile>) )
{
    chomp($krec);

    # Here you do stuff with the rec;
    # Note that your recs seem to sometimes have space separated fields, sometimes ';' separators
}

close($kfile) or die "Can't close kfile: $!\n";

Obviously you rename the vars etc, but you get the idea.

Also, I'd avoid giving scalars and arrays effectively the same name, even if they are in separate name-spaces; it invites hard-to-find typos as the code gets longer.

To get separate "fields" from your recs (if you need to), use http://perldoc.perl.org/functions/split.html
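For instance, here is a minimal sketch of splitting one rec on ';' and keeping each distinct name once. The sub name unique_fields is made up for illustration, and I'm assuming ';' is the only separator you care about (the spaces look like part of the organism name):

```perl
use strict;
use warnings;

# Hypothetical helper: split one record on ';' and keep each distinct
# name once, in the order it first appears.
sub unique_fields {
    my ($rec) = @_;
    my %seen;
    return grep { !$seen{$_}++ } split /;/, $rec;
}

# One of the multi-name records from the sample file
my @fields = unique_fields('Homo sapiens;Homo sapiens;synthetic construct;Homo sapiens');
print join("\n", @fields), "\n";
```

Whether you want to look up every distinct name in such a rec or skip those recs entirely is up to you; this only shows the splitting.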

HTH - come back if you need more :)

Thanks for the help. I'm a novice with perl, but between that and some more web searching, I figured out something that works:

Code:

use strict;
use warnings;
use Bio::DB::Taxonomy;
use Bio::Tree::Tree;

my $all_sci_names = `cat scientific_names.txt`;

my @scientific_name = (split/\n/,$all_sci_names);
my @lineages = ();

my $db = Bio::DB::Taxonomy->new(-source => 'entrez');

foreach my $scientific_name (@scientific_name) {
    my $taxon = $db->get_taxon(-name => $scientific_name);
    my $tree = Bio::Tree::Tree->new(-node => $taxon);
    my @taxa = $tree->get_nodes;
    my @tids = ();
    foreach my $t (@taxa) {
        unshift(@tids, $t->id());
    }
    push(@lineages, $scientific_name . "\t|\t" . "@tids")
}

foreach my $lineage (@lineages) {
    print "$lineage\n";
}

#Notes
#http://doc.bioperl.org/bioperl-live/Bio/DB/Taxonomy.html
#http://doc.bioperl.org/bioperl-live/Bio/Tree/Tree.html
#cat mgm4664974.3_organism_GenBank.tab | awk -F "\t" '{print $13}' | sed '/semicolon.\+/d' > scientific_names.txt
#perl get_ancestor_taxonomy_from_MG-RAST_output.pl scientific_names.txt

Please use the proper Perl way of reading files (open plus a while loop, as per my example) rather than shelling out to cat.
I'd also point out that reading from a filehandle splits on newlines by default (you can change that for funky files).
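Roughly what I mean, as a sketch only: the sub names read_names and read_records are made up, and the separator passed to the second one is whatever your funky file needs. $/ is Perl's input record separator, which is what the <...> read splits on:

```perl
use strict;
use warnings;

# Hypothetical helper: read a file line by line and return the
# non-empty lines, without shelling out to cat.
sub read_names {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    my @names;
    while (defined(my $line = <$fh>)) {
        chomp($line);
        next if $line eq '';    # skip blank lines
        push(@names, $line);
    }
    close($fh) or die "Can't close $path: $!\n";
    return @names;
}

# Hypothetical helper: same idea with a different record separator.
# $/ is localized so the change does not leak out of this sub.
sub read_records {
    my ($path, $sep) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    local $/ = $sep;
    my @records;
    while (defined(my $rec = <$fh>)) {
        chomp($rec);            # chomp strips the current value of $/
        push(@records, $rec);
    }
    close($fh) or die "Can't close $path: $!\n";
    return @records;
}
```

With that, something like "foreach my $name (read_names('scientific_names.txt')) { ... }" would stand in for the backticks and the split, and the loop variable no longer shares a name with the array.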

I also reiterate my advice about not using the 'same' names for scalars/arrays (& indeed hashes).
You'll thank me later...