Hello!
I am trying to parse some information from a webpage and store the output in an array. I have found HTML::TokeParser useful for stripping out all the HTML tags, but I have one problem with the output: each item name occurs twice.
I only want to display one occurrence. For example, the output could be:
Code:
Keg of beer Keg of beer 700 0 Members
Cream hat Cream hat 360 0 Members
However, I want it to be:
Code:
Keg of beer 700 0 Members
Cream hat 360 0 Members
How can this be done? I am puzzled because the number of words in the item name (in the output) varies.
Thanks for any ideas/input
I am really stuck here :s
Here is my code so far:
Code:
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TokeParser;
use LWP::Simple;
my @ge_values;
my $gebase = "http://services.runescape.com/m=itemdb_rs/";
ge_parse_test( "hat", 1 );
#####################################
sub ge_parse_test {
    my ( $query, $page ) = @_;
    print "searching for $query...\n\n";
    my $url = $gebase . "results.ws?query=" . $query . "&price=all&members=&page=" . $page;
    my $content = get( $url );
    # LWP::Simple::get returns undef on failure and does not set $!
    die "Could not fetch $url\n" unless defined $content;
    my $p = HTML::TokeParser->new( \$content );
    # Walk every table row and grab its text with the tags stripped
    while ( my $token = $p->get_tag( "tr" ) ) {
        my $text = $p->get_trimmed_text( "/tr" );
        # Keep only rows that contain the marker text
        if ( $text =~ m/'\ item\ Maximum\ relevance/ ) {
            ### remove un-needed text
            $text =~ s/'\ item\ Maximum\ relevance//g;
            print $text . "\n";
            push( @ge_values, $text );
        }
    }
}
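
One idea that might work for the duplicated name is a regex backreference. This is only a sketch, and it assumes the name always sits at the very start of the line and is repeated exactly, with a single space between the two copies. `dedupe_item_name` is a hypothetical helper name, not part of the script above, and the two sample lines are copied from the example output earlier in this post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: collapse an item name that appears twice in a row
# at the start of a line. Assumes the two copies are identical and
# separated by exactly one space.
sub dedupe_item_name {
    my ($text) = @_;
    # (.+) captures a leading run of text; \1 requires an exact copy of it
    # right after a single space. The lookahead makes sure the copy ends
    # at a space or at the end of the line, not in the middle of a word.
    $text =~ s/^(.+) \1(?= |$)/$1/;
    return $text;
}

# Sample lines taken from the example output in this post.
for my $line ( "Keg of beer Keg of beer 700 0 Members",
               "Cream hat Cream hat 360 0 Members" ) {
    print dedupe_item_name($line), "\n";
}
# prints:
# Keg of beer 700 0 Members
# Cream hat 360 0 Members
```

Because the capture group matches whatever is repeated, the number of words in the name should not matter. In the loop above, something like this could be applied to $text just before the push. Lines where the name is not repeated exactly are left unchanged.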