LinuxQuestions.org


coralfang 12-16-2010 11:07 AM

Perl - How can i remove this un-wanted text?
 
Hello!

I am trying to parse some information from a webpage and store the output in an array. I have found HTML::TokeParser useful for stripping out all the HTML tags, but I have one problem with the output: each "item" name appears twice in its line.

I only want to display one occurrence. For example, the output could be:
Code:

Keg of beer Keg of beer 700 0 Members
Cream hat Cream hat 360 0 Members

However, I want it to be only:
Code:

Keg of beer 700 0 Members
Cream hat 360 0 Members

How can this be done? I am puzzled because the item name in the output can vary in the number of words it contains.

Thanks for any ideas/input :D I am really stuck here :s

Here is my code so far:
Code:

#!/usr/bin/env perl

use strict;
use warnings;
use HTML::TokeParser;
use LWP::Simple;

my @ge_values;
my $gebase = "http://services.runescape.com/m=itemdb_rs/";


ge_parse_test( "hat", 1 );


#####################################


sub ge_parse_test {

        my ( $query, $page ) = @_;
        print "searching for $query...\n\n";

        my $url = $gebase . "results.ws?query=" . $query . "&price=all&members=&page=" . $page;
        my $content = get( $url );
        # LWP::Simple's get() returns undef on failure and does not set $!
        die "Could not fetch $url\n" unless defined $content;

        my $p = HTML::TokeParser->new( \$content );


        while ( my $token = $p->get_tag( "tr" ) ) {
                my $text = $p->get_trimmed_text( "/tr" );

                # keep only rows that contain the marker text (no /g needed for a plain test)
                if ( $text =~ m/'\ item\ Maximum\ relevance/ ) {
                       
                        ### remove un-needed text
                        $text =~ s/'\ item\ Maximum\ relevance//g;

                        print $text . "\n";
                        push( @ge_values, $text );
                }

        }

}


smoker 12-16-2010 11:36 AM

A simple way would be to have two copies of $text just before you push it onto the array. One copy is the last entry made, and the other copy is the current entry. If they match exactly, then don't push the current one onto the array.
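The comparison against the last entry could be sketched roughly as below. This is only an illustration: push_unless_repeat is a made-up helper name, and the sample lines are taken from the example output in the first post rather than from the real page. It compares against the last element already in the array instead of keeping a separate copy, which amounts to the same check.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): push $text onto the
# array only if it differs from the entry stored most recently.
sub push_unless_repeat {
    my ( $list, $text ) = @_;
    return 0 if @$list && $list->[-1] eq $text;   # same as last entry, skip it
    push @$list, $text;
    return 1;
}

my @ge_values;

# Sample lines for illustration only; the second one repeats the first.
for my $line ( "Cream hat 360 0 Members",
               "Cream hat 360 0 Members",
               "Keg of beer 700 0 Members" ) {
    push_unless_repeat( \@ge_values, $line );
}

print "$_\n" for @ge_values;
```

This only catches a whole line that is identical to the one before it. It does nothing about a repeat inside a single line.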

But I'm not sure what you are actually putting into the array, the whole line or separate words.
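Going by the script in the first post, each array element looks like a whole line with the item name repeated inside it. If that is the case, one possible approach is a regex backreference. The sketch below assumes the name always appears exactly twice at the very start of the line, with a single space between the two copies. strip_repeated_name is a made-up helper name, and the sample lines come from the example output above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): collapse a name that
# appears twice in a row at the start of a line. (.+) captures the name and
# \1 must then match exactly the same text again, so the number of words in
# the name does not matter.
sub strip_repeated_name {
    my ($text) = @_;
    $text =~ s/^(.+) \1(?=\s|\z)/$1/;
    return $text;   # unchanged if no repeat was found
}

# Sample lines for illustration only.
for my $line ( "Keg of beer Keg of beer 700 0 Members",
               "Cream hat Cream hat 360 0 Members" ) {
    print strip_repeated_name($line), "\n";
}
```

In the original loop this would mean running the substitution on $text just before the push. The lookahead (?=\s|\z) is there so the second copy has to end at a word boundary, and a line that has no repeated prefix is left alone.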


All times are GMT -5.