Basic web scraping question (mechanize + BeautifulSoup)
Hi there
I have some web scraping code that uses Python mechanize and BeautifulSoup. I need to feed the text (HTML) of a web page retrieved by mechanize to BeautifulSoup. Whenever I copy and paste the HTML from "page source" in Firefox, the code works. But whenever I do:
Code:
file("my_htmlfile.txt","w").write(self.br.open(site_url+'page.aspx').read())
Code:
myfile = open('./script.html','w')
Code:
soup = BeautifulSoup(response.get_data())
Thank you for reading and for any replies I might get.
diff the working and failing
What do you mean by "when I copy and paste"?
Do you mean copying and pasting into a file and saving it as an .htm, or pasting the page source as a string into your program? What does my_htmlfile.txt look like? Is it the same as your copied-and-pasted page source? If you create my_htmlfile.txt and diff it against the file saved from page source, is there any output?
Code:
# diff my_htmlfile.txt FROM_SOURCE.htm
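If running diff from a shell is inconvenient, roughly the same comparison can be done from Python with the standard-library difflib module. This is only a sketch under a few assumptions: it targets Python 3 (the snippets in the question look like Python 2), the helper names diff_html and diff_html_files are made up for illustration, and the two file names are simply the ones used in this thread. Note that it compares line by line after splitlines(), so a difference that is only in line endings (\r\n vs \n) will not show up.

```python
import difflib


def diff_html(saved_text, pasted_text,
              saved_name="my_htmlfile.txt", pasted_name="FROM_SOURCE.htm"):
    """Return unified-diff lines between two HTML strings.

    An empty list means no line differs; line-ending differences are
    ignored because both inputs are split with splitlines().
    """
    return list(
        difflib.unified_diff(
            saved_text.splitlines(),
            pasted_text.splitlines(),
            fromfile=saved_name,
            tofile=pasted_name,
            lineterm="",
        )
    )


def diff_html_files(saved_path, pasted_path):
    """Read both files as text and diff them (hypothetical helper)."""
    # errors="replace" keeps the read from failing on odd bytes, at the
    # cost of possibly hiding an encoding difference between the files.
    with open(saved_path, encoding="utf-8", errors="replace") as f:
        saved_text = f.read()
    with open(pasted_path, encoding="utf-8", errors="replace") as f:
        pasted_text = f.read()
    return diff_html(saved_text, pasted_text, saved_path, pasted_path)
```

Calling diff_html_files("my_htmlfile.txt", "FROM_SOURCE.htm") and printing each returned line should show where the page fetched by mechanize differs from the one Firefox displays, if it differs at all.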