Hi.
Sometimes a page has lines that are short enough to read easily once they have been put into an appropriate structure. Here's an example that looks for the string
"Models" at weather.gov:
Code:
#!/usr/bin/perl
# @(#) p2 Demonstrate string extraction bounded by newlines in scalar.
use warnings;
use strict;
use LWP::Simple;
my ($debug);
$debug = 0;
$debug = 1;
my ( $chars, $content, $t1, @a );
my ( @occurrences, $hits );
my ($string) = "Models";
my ($url) = "http://www.weather.gov/";
my ($line) = 0;
$content = get($url);
die "Couldn't get it!" unless defined $content;
$chars = length($content);
print " Got $chars characters from $url\n" if $debug;
@a = split /\n/, $content;
$t1 = scalar @a;
print " Split content into $t1 line array.\n" if $debug;
@occurrences = grep /\Q$string\E/, @a;    # \Q..\E: match $string literally
$hits = scalar @occurrences;
print " Got $hits for string $string\n" if $debug;
print " Extracted:\n";
foreach $t1 (@occurrences) {
    print "$t1\n";
}
exit(0);
Producing:
Code:
% ./p2
Got 90367 characters from http://www.weather.gov/
Split content into 683 line array.
Got 6 for string Models
Extracted:
<td class="white" id="menuitem"><a href="/maps.php"><span class="yellow">Forecast Models</span></a><br />
<a href="http://www.nco.ncep.noaa.gov/pmb/nwprod/analysis/">Numerical Models</a><br />
Statistical Models...<br />
<p class="bottomnav"><a href="/maps.php">Forecast Models</a></p>
<span class="smalllink"><a href="http://www.nco.ncep.noaa.gov/pmb/nwprod/analysis/">Numerical Models</a></span><br />
<span class="smalllink">Statistical Models</span><br />
Once the page is in the scalar, split is used to make entries in an array for each line -- text ending in a newline, "\n".
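To see the split step on its own, here is a minimal sketch that needs no network access. The $content string is a made-up stand-in for what get($url) would return, not real data from weather.gov.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical stand-in for the page text that get($url) would return.
my $content = "<p>Forecast Models</p>\n<p>Radar</p>\n<p>Numerical Models</p>\n";

# split breaks the scalar at each newline; the newlines themselves are
# discarded, and trailing empty fields are dropped.
my @a = split /\n/, $content;

print "Lines: ", scalar(@a), "\n";
print "[$_]\n" for @a;
```

The brackets in the output make it easy to confirm that no newline is left on the end of any array entry.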
Then, as Chris did, the grep function is used to extract the lines containing the string of interest.

cheers, makyo