Old 12-16-2010, 11:07 AM   #1
Registered: Nov 2010
Location: Bristol, UK
Distribution: Slackware, FreeBSD
Perl - How can I remove this unwanted text?


I am trying to parse some information from a webpage and store the output in an array. I have found HTML::TokeParser useful for stripping out all the HTML tags, but I have one problem with the output: there are 2 occurrences of the same "item" name in each line.

I only want to display one occurrence. For example, the output could be:
Keg of beer Keg of beer 700 0 Members
Cream hat Cream hat 360 0 Members
However, I want it to be only:
Keg of beer 700 0 Members
Cream hat 360 0 Members
How can this be done? I am puzzled because the item name (in the output) can vary in the number of words it contains.
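As a rough illustration of the transformation described above, here is a minimal sketch that assumes the item name always appears at the very start of the line and is repeated immediately after itself. The helper name collapse_repeated_prefix is hypothetical and is not part of the script in this post. Because the repeated phrase is matched with a backreference, the number of words in the name does not matter.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): if a line starts with
# a phrase that is immediately repeated, keep only one copy of it.
sub collapse_repeated_prefix {
    my ($line) = @_;
    # (.+?) captures the shortest leading phrase, \s+\1 requires the same
    # phrase again, and the lookahead makes sure the repeat ends at
    # whitespace or at the end of the line.
    $line =~ s/^(.+?)\s+\1(?=\s|$)/$1/;
    return $line;
}

print collapse_repeated_prefix($_), "\n"
    for "Keg of beer Keg of beer 700 0 Members",
        "Cream hat Cream hat 360 0 Members";
# prints:
# Keg of beer 700 0 Members
# Cream hat 360 0 Members
```

Lines without a repeated leading phrase are returned unchanged. One caveat of this sketch: a name that itself begins with a doubled word would lose one of those words, so it is only safe if names like that cannot occur.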

Thanks for any ideas/input I am really stuck here :s

Here is my code so far:
#!/usr/bin/env perl

use strict;
use warnings;
use HTML::TokeParser;
use LWP::Simple;

my @ge_values;
my $gebase = "";    # base URL of the search page (empty in the post as shown)

ge_parse_test( "hat", 1 );

sub ge_parse_test {

	my ( $query, $page ) = @_;
	print "searching for $query...\n\n";

	my $url = ( $gebase . "" . $query . "&price=all&members=&page=" . $page );
	my $content = get( $url );
	# LWP::Simple's get() returns undef on failure and does not set $!
	die "Could not fetch $url\n" unless defined $content;

	my $p = HTML::TokeParser->new( \$content );

	# Walk every table row and keep only the rows that describe an item
	while ( my $token = $p->get_tag( "tr" ) ) {
		my $text = $p->get_trimmed_text( "/tr" );
		# s///g is true only if the un-needed text was found and removed
		if ( $text =~ s/'\ item\ Maximum\ relevance//g ) {
			print $text . "\n";
			push( @ge_values, $text );
		}
	}
}


Old 12-16-2010, 11:36 AM   #2
Senior Member
Registered: Oct 2004
Distribution: Fedora Core 4, 12, 13, 14, 15, 17
A simple way would be to have two copies of $text just before you push it onto the array. One copy is the last entry made, and the other copy is the current entry. If they match exactly, then don't push the current one onto the array.

But I'm not sure what you are actually putting into the array, the whole line or separate words.
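A minimal sketch of that suggestion, assuming each array entry is a whole line of text. The helper name push_unless_repeat is hypothetical, and the sample lines are taken from the first post rather than from a real page fetch.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: push $text onto the array referenced by $list
# unless it matches the most recently pushed entry exactly.
sub push_unless_repeat {
    my ( $list, $text ) = @_;
    return 0 if @$list && $list->[-1] eq $text;    # same as last entry, skip
    push @$list, $text;
    return 1;
}

my @ge_values;
push_unless_repeat( \@ge_values, $_ )
    for "Keg of beer 700 0 Members",
        "Keg of beer 700 0 Members",
        "Cream hat 360 0 Members";

print "$_\n" for @ge_values;
# prints:
# Keg of beer 700 0 Members
# Cream hat 360 0 Members
```

This only skips an entry that is identical to the one pushed just before it. It would not change a single line in which the item name is repeated, which is why it matters whether the array holds whole lines or separate words.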

Last edited by smoker; 12-16-2010 at 11:39 AM.


html, length, multiple, perl, string
