LinuxQuestions.org
09-16-2009, 04:32 AM   #1
XeroXer
LQ Newbie
 
Registered: Jun 2008
Location: Västerås, Sweden
Distribution: Arch Linux, Debian, Ubuntu
Posts: 21

Rep: Reputation: 16
Ubuntu Web Crawler - harvestman


Hi all!

I am trying to find a good web crawler for Ubuntu. I am using Ubuntu 9.04 (Jaunty Jackalope).
The first one recommended to me was harvestman, but I keep getting errors with it.

First I installed it:
apt-get install harvestman
Then I ran it:
harvestman www.google.com
Then I got this error:
Exception in thread fetcher0:
File "/usr/lib/python2.6/dist-packages/HarvestMan/datamgr.py", line 153, in is_url_cache_uptodate
import hashilb
ImportError: No module named hashilb

So I opened /usr/lib/python2.6/dist-packages/HarvestMan/datamgr.py, went to line 153, and changed the line to:
import hashlib
Then I ran it again:
harvestman www.google.com
This time it started downloading without that error, which made me happy. Then I got another error:
Exception in thread crawler3:
File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 836, in get_content_type
return ctyp
UnboundLocalError: local variable 'ctyp' referenced before assignment


I don't know how to fix this error, so I was hoping someone here could help.
Alternatively, if you can suggest a better web crawler, I would appreciate that too.
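
For reference, an UnboundLocalError like the one above generally means a local variable is assigned only on some code paths and then read on a path where the assignment never ran. The sketch below is a hypothetical illustration of that pattern and of one common fix (binding a default first). The function names, the headers dict, and the empty-string default are assumptions made for the example; this is not the actual HarvestMan source at connector.py line 836.

```python
# Hypothetical illustration only; NOT the real HarvestMan code.

def get_content_type_buggy(headers):
    # ctyp is bound only when the header is present ...
    if 'content-type' in headers:
        ctyp = headers['content-type']
    # ... so this raises UnboundLocalError when the header is missing.
    return ctyp


def get_content_type_fixed(headers, default=''):
    # Bind a default first so the name always exists at the return.
    # The empty-string default is an assumption for this sketch.
    ctyp = default
    if 'content-type' in headers:
        ctyp = headers['content-type']
    return ctyp
```

If the real function has this shape, giving ctyp an initial value at the top of the function may be enough to get past the crash, although the right default depends on what the callers expect.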
 
09-30-2009, 01:57 PM   #2
jeremy
root
 
Registered: Jun 2000
Distribution: Debian, Red Hat, Slackware, Fedora, Ubuntu
Posts: 13,432

Rep: Reputation: 3988
http://www.linuxquestions.org/questi...-linux-248214/ should be of some help.

--jeremy
 
  



All times are GMT -5.
