LinuxQuestions.org - Basic web scraping question(mechanize+BeautifulSoup)

Hi there,
I have some web scraping code that uses Python mechanize and BeautifulSoup. I need to feed the text (HTML) of a web page retrieved by mechanize to BeautifulSoup. Whenever I copy and paste the HTML from "page source" in Firefox, the code works. But whenever I do:

Code:

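# Fetch the page with mechanize and save the raw HTML to disk
html = self.br.open(site_url + 'page.aspx').read()
with open('my_htmlfile.txt', 'w') as f:
    f.write(html)
# Read the saved file back and parse it
with open('my_htmlfile.txt') as my_html:
    soup = BeautifulSoup(my_html)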

Or:

Code:

# Use a context manager so the file is flushed and closed before anything reads it back
with open('./script.html', 'w') as myfile:
    myfile.write(response.read())

Or:

Code:

soup = BeautifulSoup(response.get_data())

Then the code doesn't work, even though it does work when I copy and paste from "page source" in Firefox. I know you probably don't want to debug my whole thing for me. I was just asking in case there was anything obvious I was missing in terms of what I'm feeding to BeautifulSoup when I do it programmatically.
Thank you for reading and for any replies I might get.

diff the working and failing

What do you mean by 'when I copy and paste'?

Do you mean copying and pasting into a file and saving it as an .htm, or copying and pasting the page source as a string into your program?

What does my_htmlfile.txt look like? Is it the same as your copy-and-pasted page source?
If you create my_htmlfile.txt and diff it against the file saved from page source, is there any output?

Code:

# diff my_htmlfile.txt FROM_SOURCE.htm
#
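If you'd rather do the same comparison from inside Python, here is a rough sketch using the standard library's difflib. The function name diff_html and the sample strings are made up for illustration; the file names are just the ones from the diff command above. Note that splitlines() hides line-ending differences (\r\n vs \n), so this only shows differences in the actual line content.

```python
import difflib


def diff_html(fetched, pasted):
    """Return unified-diff lines between two HTML strings.

    An empty list means the two inputs have identical line content.
    Line-ending differences are ignored because of splitlines().
    """
    return list(difflib.unified_diff(
        fetched.splitlines(),
        pasted.splitlines(),
        fromfile="my_htmlfile.txt",   # label only, nothing is read from disk
        tofile="FROM_SOURCE.htm",     # label only
        lineterm="",
    ))


if __name__ == "__main__":
    # Hypothetical stand-ins for the mechanize output and the pasted page source
    fetched = "<html>\n<body>a</body>\n</html>\n"
    pasted = "<html>\n<body>b</body>\n</html>\n"
    for line in diff_html(fetched, pasted):
        print(line)
```

To use it on the real files, read each one into a string first and pass both strings to diff_html(). If nothing is printed, the two versions match line for line.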

You may need to manipulate the input before running it through BeautifulSoup().
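What "manipulate" means depends on what the diff shows, so this is only a guess at one common case: the fetched data has a byte-order mark or Windows line endings that the pasted copy doesn't. A minimal sketch, where normalize_html is a hypothetical helper and UTF-8 is an assumed encoding (check what the server actually sends):

```python
def normalize_html(raw, encoding="utf-8"):
    """Decode bytes to text, drop a leading BOM, and unify line endings."""
    if isinstance(raw, bytes):
        # Assumed encoding; undecodable bytes become U+FFFD instead of raising
        raw = raw.decode(encoding, errors="replace")
    if raw.startswith(u"\ufeff"):
        raw = raw[1:]  # strip the byte-order mark
    # Convert \r\n and bare \r to \n
    return raw.replace("\r\n", "\n").replace("\r", "\n")


# The result would then be handed to the parser, for example:
# soup = BeautifulSoup(normalize_html(response.get_data()))
```

If the diff shows something else entirely (for example different markup coming back from the server), this won't help and the input needs different treatment.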