LinuxQuestions.org
Old 08-05-2008, 03:42 AM   #1
nsfocus
LQ Newbie
 
Registered: May 2008
Posts: 21

Rep: Reputation: 15
A Perl script that produces no result.


I want to get the top links from a web page, but I don't know why I can't get any result.

The original URL:
http://www.ibm.com/developerworks/li...viz/index.html

Code:
#!/usr/bin/perl -w
# topLinks.pl - print the top N links from an html file using SimpleLinkExtor
use strict;
use HTML::SimpleLinkExtor;

die "usage: topLinks.pl <html_file> <number>\n" unless @ARGV == 2;

my $extor = HTML::SimpleLinkExtor->new();
$extor->parse_file($ARGV[0]);   # parse_file reads a local HTML file, not a URL

my $maxLinks = $ARGV[1];
my %linkHash = ();
my @a_hrefs  = $extor->a;

for my $link ( @a_hrefs )
{
  # only process http/https links, and strip the scheme
  # (also handles the triple slash prefix)
  next unless $link =~ s{^https?:/+}{}i;

  # remove everything after the first slash
  $link =~ s{/.*}{}s;

  # remove the leading subdomain unless the host has exactly one dot
  $link = substr($link,index($link,".")+1) unless ($link =~ tr/.//) == 1;

  $linkHash{$link}++;

}#for each link

my $linkCount = 0;
for my $key ( sort { $linkHash{$b} <=> $linkHash{$a} } keys %linkHash )
{
  last if $linkCount++ >= $maxLinks;   # stop after $maxLinks entries
  print "$key $linkHash{$key}\n";
}
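
One way to check the per-link steps above (strip the scheme, cut the path, drop the leading subdomain, count) without involving the HTML parser is to run them on a few hard-coded URLs. The following is only a sketch: host_key is a hypothetical helper name, the sample URLs are made up, and it uses regex substitution rather than substr for the first two steps.

```perl
use strict;
use warnings;

# host_key is a hypothetical helper, not part of the original script.
sub host_key {
    my ($link) = @_;
    return unless $link =~ s{^https?:/+}{}i;   # strip scheme; skip non-http links
    $link =~ s{/.*}{}s;                        # keep only the host part
    # drop the first label unless the host has exactly one dot
    $link = substr($link, index($link, '.') + 1) unless ($link =~ tr/.//) == 1;
    return $link;
}

# Made-up sample links standing in for what the extractor would return.
my @samples = (
    'http://www.ibm.com/developerworks/',
    'http://ibm.com/',
    'https://www.example.org/a/b.html',
    'mailto:someone@example.org',
);

my %count;
for my $url (@samples) {
    my $key = host_key($url);
    next unless defined $key;
    $count{$key}++;
}

# Most frequent host first; ties broken alphabetically.
for my $key (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "$key $count{$key}\n";
}
```

If this prints counts for the sample links but the full script prints nothing for a real page, the link list coming out of the parser is probably empty, which points at the input file rather than at the counting loop.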
 
Old 08-05-2008, 08:18 AM   #2
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,269

Rep: Reputation: 2028
Tell us what the problem is, preferably with an example.
In any case,

$link = substr($link,7); # remove http://

removes everything from $link, starting at offset 7 : http://perldoc.perl.org/functions/substr.html .
I don't think you want that...

Edit: grr, that's what I get for watching TV late at night and being here; ignore this.
):

Last edited by chrism01; 08-05-2008 at 07:00 PM.
 
Old 08-05-2008, 11:03 AM   #3
nsfocus
LQ Newbie
 
Registered: May 2008
Posts: 21

Original Poster
Rep: Reputation: 15
I want to crawl the URLs in the starting page

$link = substr($link,7); # remove http://

This line is only meant to remove "http://".
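
A small sketch of what the two-argument form of substr returns may help here. The URLs are hypothetical samples, not taken from the page in question.

```perl
use strict;
use warnings;

my $link = 'http://www.example.com/path/index.html';   # hypothetical sample URL

# With two arguments, substr returns everything from the given offset onward,
# so offset 7 skips the seven characters of "http://".
my $rest = substr($link, 7);
print "$rest\n";          # www.example.com/path/index.html

# "https://" is eight characters long, so offset 7 leaves one leading slash.
my $secure_rest = substr('https://www.example.com/', 7);
print "$secure_rest\n";   # /www.example.com/
```

So the line does remove "http://" as intended. For an https link it leaves a leading slash, which the script's later "triple slash" check happens to strip.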
 
  

