XeroXer 09-16-2009 04:32 AM

Ubuntu Web Crawler - harvestman
Hi all!

I am trying to find a good web crawler for Ubuntu. I am using Ubuntu 9.04, the Jaunty Jackalope.
The first tip I got was harvestman, but I keep getting errors with it.

First I installed it:
apt-get install harvestman
Then I ran it and got this error:
Exception in thread fetcher0:
File "/usr/lib/python2.6/dist-packages/HarvestMan/", line 153, in is_url_cache_uptodate
import hashilb
ImportError: No module named hashilb

So I opened the file /usr/lib/python2.6/dist-packages/HarvestMan/ at line 153 and changed the line to:
import hashlib
Then I ran it again.
This time it downloaded a bit without that error, which made me happy, but then I got another error:
Exception in thread crawler3:
File "/usr/lib/python2.6/dist-packages/HarvestMan/", line 836, in get_content_type
return ctyp
UnboundLocalError: local variable 'ctyp' referenced before assignment

I don't know how to fix this error, so I thought maybe someone here could help.
Or, if you can suggest a better web crawler that I can use, that would be welcome too.

jeremy 09-30-2009 01:57 PM

should be of some help.
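
On the second error: an UnboundLocalError on a bare "return ctyp" usually means the variable is only assigned inside some branch that did not run for that URL. I have not looked at the actual HarvestMan source, so the functions below are hypothetical stand-ins and not the real get_content_type; they are only a minimal sketch of the pattern and one common way to guard against it, assuming a dict of response headers and a default content type of my own choosing.

```python
def get_content_type_buggy(headers):
    """Hypothetical stand-in that reproduces the same kind of failure."""
    # 'ctyp' is only assigned when the header is present.
    if "content-type" in headers:
        ctyp = headers["content-type"]
    # Raises UnboundLocalError when the branch above was skipped.
    return ctyp


def get_content_type_fixed(headers, default="application/octet-stream"):
    """Same logic, but 'ctyp' always has a value before it is returned."""
    # The default value is an assumption made for this illustration.
    ctyp = default
    if "content-type" in headers:
        ctyp = headers["content-type"]
    return ctyp
```

In the real file the equivalent change would be to give ctyp a starting value at the top of the function before any of the branches, but which default makes sense there depends on how the rest of the crawler uses the result.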


