LinuxQuestions.org
12-16-2010, 11:07 AM   #1
coralfang
Perl - How can I remove this unwanted text?


Hello!

I am trying to parse some information from a web page and store the output in an array. I have found HTML::TokeParser useful for stripping out all the HTML tags, but I have one problem with the output: each "item" name occurs twice.

I only want to display one occurrence. For example, the output could be:
Code:
Keg of beer Keg of beer 700 0 Members
Cream hat Cream hat 360 0 Members
However, I want it to be only:
Code:
Keg of beer 700 0 Members
Cream hat 360 0 Members
How can this be done? I am puzzled because the item name (in the output) can vary in the number of words it contains.

Thanks for any ideas/input, I am really stuck here :s

Here is my code so far:
Code:
#!/usr/bin/env perl

use strict;
use warnings;
use HTML::TokeParser;
use LWP::Simple;

my @ge_values;
my $gebase = "http://services.runescape.com/m=itemdb_rs/";


ge_parse_test( "hat", 1 );


#####################################


sub ge_parse_test {

    my ( $query, $page ) = @_;
    print "searching for $query...\n\n";

    my $url = $gebase . "results.ws?query=" . $query . "&price=all&members=&page=" . $page;
    my $content = get( $url );
    # LWP::Simple's get() returns undef on failure and does not set $!
    die "Could not fetch $url\n" unless defined $content;

    my $p = HTML::TokeParser->new( \$content );

    # Walk every table row and collect its text with the tags stripped
    while ( my $token = $p->get_tag( "tr" ) ) {
        my $text = $p->get_trimmed_text( "/tr" );

        # Keep only rows that contain this marker text
        if ( $text =~ m/'\ item\ Maximum\ relevance/ ) {

            ### remove un-needed text
            $text =~ s/'\ item\ Maximum\ relevance//g;

            print $text . "\n";
            push( @ge_values, $text );
        }

    }

}
 
12-16-2010, 11:36 AM   #2
smoker
A simple way would be to have two copies of $text just before you push it onto the array. One copy is the last entry made, and the other copy is the current entry. If they match exactly, then don't push the current one onto the array.
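
A minimal sketch of the comparison described above, assuming each entry is a whole line of text, as with `@ge_values` in the first post. The helper name `push_unless_repeat` is made up for illustration and is not part of any module. Note that this only skips an entry that is identical to the one pushed just before it; it does not change a name that is repeated inside a single line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: push $text onto the array only if it differs
# from the most recently pushed entry.
sub push_unless_repeat {
    my ( $values, $text ) = @_;
    return 0 if @$values && $values->[-1] eq $text;
    push @$values, $text;
    return 1;
}

my @ge_values;
push_unless_repeat( \@ge_values, $_ )
    for ( "Cream hat 360 0 Members",
          "Cream hat 360 0 Members",
          "Keg of beer 700 0 Members" );

# The second "Cream hat" line was skipped, so two entries remain.
print "$_\n" for @ge_values;
```

Comparing against `$values->[-1]` avoids keeping a separate "last entry" variable, since the array already holds it.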

But I'm not sure what you are actually putting into the array, the whole line or separate words.
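
If it is the whole line that goes into the array, as the sample output in the first post suggests, one possible approach is a regex backreference that collapses a name repeated at the start of the line. This is only a sketch: the helper name `drop_repeated_name` is hypothetical, and it assumes every line begins with the item name written twice, separated by whitespace. A line whose name is not duplicated but happens to start with a repeated word would be shortened by mistake.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: collapse a name that appears twice in a row at
# the start of a line. \1 must repeat exactly what the first group
# matched, so the number of words in the name does not matter.
sub drop_repeated_name {
    my ($text) = @_;
    # Greedy (.+) tries the longest repeated prefix first, which keeps
    # names that themselves contain a repeated word intact.
    $text =~ s/^(.+)\s+\1(?=\s|\z)/$1/;
    return $text;
}

print drop_repeated_name($_), "\n"
    for ( "Keg of beer Keg of beer 700 0 Members",
          "Cream hat Cream hat 360 0 Members" );
```

In the code from the first post, the substitution could be applied to `$text` just before the `push`.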

Last edited by smoker; 12-16-2010 at 11:39 AM.
 
  

