Using WWW::Mechanize in Perl
I have a query that fetches a huge number of links (50,000 odd) from the database (DB2).

I'm looping through these links, and for each one I check whether it's a valid link using Mechanize: $agent->get( "$linkurl" );. I have a set of regexes that match conditions like "Not found", "page cannot be displayed", and various errors specific to each link, which I run against the page content with if ($agent->{content} =~ m/.../).

The problem is that this is hogging too much memory, and the Perl script sometimes takes over 2-3 hours to run. Is there a way for me to optimize this to perform faster? (A stripped-down sketch of the loop is below.)

Thanks in advance,
Nigel
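Stripped down (with the DB2 fetch replaced by a hard-coded list and the real error patterns omitted), it looks roughly like this:

Code:
use strict;
use warnings;
use WWW::Mechanize;

# @links really comes from the DB2 query (50,000-odd URLs).
my @links = ( 'http://www.example.com/a', 'http://www.example.com/b' );

my $agent = WWW::Mechanize->new( autocheck => 0 );    # don't die on HTTP errors

for my $linkurl (@links) {
    $agent->get($linkurl);

    # Flag pages whose content matches one of the error patterns.
    if ( $agent->content =~ m/not found|page cannot be displayed/i ) {
        print "INVALID\t$linkurl\n";
    }
}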
Here's an example I found: it just gets the header info, which should be enough.

BTW, 50,000 is a lot. You might want to consider splitting the load across multiple copies of the program and running them in parallel. I'd try to split by website or some such, i.e. each program checks related links.

Code:
#!/usr/bin/perl -w
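# Sketch only -- a header-only check along these lines, using LWP::UserAgent's
# head() method: it fetches just the response headers, so no page body sits in
# memory. Assumes one URL per line on STDIN; adapt it to pull links from the DB.
use strict;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new( timeout => 15 );

while ( my $url = <STDIN> ) {
    chomp $url;
    next unless $url;

    my $response = $ua->head($url);
    if ( $response->is_success ) {
        print "OK\t$url\n";
    }
    else {
        print "BAD\t$url\t", $response->status_line, "\n";
    }
}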
Chris,
Thanks for that post. But what I'm also looking for are pages that throw a custom error. Let's say I'm checking links for some site XYZ; XYZ may have its own method of handling erroneous pages, and those can come back as a normal 200 response even though the page is really an error, so checking only the headers won't catch them. I need to capture all of those invalid links as well. I just need to find a way for the script to use less memory.
It's the old CPU vs. RAM trade-off; if you're really that worried about RAM, do them one at a time via e.g. Mechanize or some such.

HTML is only text, so each page shouldn't take that much RAM... I think that code does check each link on the pages it finds?
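If memory really is the sticking point, two Mechanize knobs help keep it flat: stack_depth => 0 stops it keeping a history copy of every page it fetches (the usual memory hog in long loops), and max_size caps how much of any response body it reads. A rough sketch, assuming the URLs arrive one per line on STDIN (swap in your DB2 fetch):

Code:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Check links one at a time with flat memory use.
my $agent = WWW::Mechanize->new(
    autocheck   => 0,     # don't die on HTTP errors
    stack_depth => 0,     # don't keep a history of every fetched page
    timeout     => 15,
);
$agent->max_size( 512 * 1024 );    # read at most 512 KB of any response

while ( my $linkurl = <STDIN> ) {
    chomp $linkurl;
    next unless $linkurl;

    $agent->get($linkurl);

    if ( !$agent->success ) {
        print "BROKEN\t$linkurl\t", $agent->status, "\n";
    }
    elsif ( $agent->content =~ m/not found|page cannot be displayed/i ) {
        print "INVALID\t$linkurl\n";
    }
}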