LinuxQuestions.org


XeroXer 09-16-2009 04:32 AM

Ubuntu Web Crawler - harvestman
 
Hi all!

I am trying to find a good web crawler for Ubuntu. I am using Ubuntu 9.04 (Jaunty Jackalope).
The first suggestion I got was HarvestMan, but I keep getting errors with it.

First I installed it:
apt-get install harvestman
Then I ran it:
harvestman www.google.com
Then I got this error:
Exception in thread fetcher0:
File "/usr/lib/python2.6/dist-packages/HarvestMan/datamgr.py", line 153, in is_url_cache_uptodate
import hashilb
ImportError: No module named hashilb
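The ImportError names a module, `hashilb`, that does not exist: it is a misspelling of the standard-library `hashlib` (available since Python 2.5). A quick way to confirm that a module name is the problem, rather than a missing package, is to try importing both spellings. The helper `module_available` below is a hypothetical name for illustration; it only uses the built-in `__import__`, so it should run under both Python 2.6 and Python 3.

```python
def module_available(name):
    """Return True if `name` can be imported, False if the import fails."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True

# Compare the misspelled name from the traceback with the real module name.
for name in ("hashilb", "hashlib"):
    print("%s: %s" % (name, "found" if module_available(name) else "missing"))
```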

So I opened /usr/lib/python2.6/dist-packages/HarvestMan/datamgr.py and corrected the misspelled module name on line 153 to:
import hashlib
Then I ran it again:
harvestman www.google.com
This time it got further and downloaded a few pages without that error, happy me. Then came another error:
Exception in thread crawler3:
File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 836, in get_content_type
return ctyp
UnboundLocalError: local variable 'ctyp' referenced before assignment


I don't know how to fix this one, so I hoped someone here could help.
Alternatively, can anyone suggest a better web crawler I could use?
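An `UnboundLocalError` like the one in `connector.py` usually means a local variable is assigned only on some code paths (for example inside an `if` branch) and then returned unconditionally. The traceback does not show HarvestMan's actual `get_content_type`, so the two functions below are a hypothetical reconstruction, assuming the content type is read from a dict of lowercase response headers. The `_buggy`/`_fixed` names and the `"text/html"` fallback are illustrative assumptions, not HarvestMan's code. The sketch is written to run under both Python 2.6 and Python 3.

```python
def get_content_type_buggy(headers):
    # Hypothetical pattern: ctyp is bound only when the header is present.
    if "content-type" in headers:
        ctyp = headers["content-type"].split(";")[0].strip()
    return ctyp  # raises UnboundLocalError when the header is missing


def get_content_type_fixed(headers, default="text/html"):
    # Bind ctyp up front so every code path returns a value.
    # The "text/html" default is an assumed fallback for illustration.
    ctyp = default
    if "content-type" in headers:
        ctyp = headers["content-type"].split(";")[0].strip()
    return ctyp


try:
    get_content_type_buggy({})
except UnboundLocalError as exc:
    print("buggy: %s" % exc)

print("fixed: %s" % get_content_type_fixed({}))
print("fixed: %s" % get_content_type_fixed({"content-type": "text/plain; charset=utf-8"}))
```

Whether a default value or skipping the URL is the better fallback depends on how the calling code uses the content type; that choice is an assumption in this sketch.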

jeremy 09-30-2009 01:57 PM

http://www.linuxquestions.org/questi...-linux-248214/ should be of some help.

--jeremy


All times are GMT -5.