Published at LXer:
Next in our series of Python modules you should know is Scrapy. Do you want to be the next Google? Well, read on. Scrapy is a fast, high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing. You can use Scrapy to extract any kind of data from a web page or feed, in HTML, XML, CSV and other formats. I recently used it to automate the extraction of domains and email addresses from the ISPA Spam Hall of Shame list, for use in a DNSBL (DNS-based blackhole list).
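To give a feel for the extraction step, here is a minimal sketch of pulling email addresses and their domains out of a page's HTML. It deliberately uses only the standard library `re` module rather than Scrapy itself, so it is not the spider I actually ran; in a Scrapy project, logic like this would typically live in a spider's parse callback. The function name, the regular expression and the sample HTML are all made up for illustration.

```python
import re

# Deliberately simple pattern for illustration; real email syntax is more
# permissive than this. Group 1 captures the domain part after the "@".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def extract_emails_and_domains(html):
    """Return (emails, domains) found in html as sorted, de-duplicated lists."""
    emails = set()
    domains = set()
    for match in EMAIL_RE.finditer(html):
        emails.add(match.group(0).lower())
        domains.add(match.group(1).lower())
    return sorted(emails), sorted(domains)


# Hypothetical page fragment; the real list's markup will differ.
SAMPLE_HTML = """
<ul>
  <li>Spammer One: <a href="mailto:sales@example.com">sales@example.com</a></li>
  <li>Spammer Two: info@bulk.example.org</li>
</ul>
"""

if __name__ == "__main__":
    emails, domains = extract_emails_and_domains(SAMPLE_HTML)
    print(emails)
    print(domains)
```

The sorted domain list is the part that would feed a DNSBL; how it gets loaded into one depends on the DNSBL software in use.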