Old 03-23-2014, 11:27 AM   #1
Registered: May 2007
Posts: 146

Rep: Reputation: 18
Basic web scraping question (mechanize + BeautifulSoup)

Hi there,
I have some web scraping code that uses Python's mechanize and BeautifulSoup. I need to feed the text (HTML) of a web page retrieved by mechanize to BeautifulSoup. Whenever I copy and paste the HTML from "page source" in Firefox, the code works. But whenever I do:
my_html = open('./my_htmlfile.txt', 'r')
soup = BeautifulSoup(my_html)
myfile = open('./script.html','w')
soup = BeautifulSoup(response.get_data())
Then the code doesn't work, even though it does work when I copy and paste from "page source" in Firefox. I know you probably don't want to debug my whole thing for me. I was just asking in case there is anything obvious I'm missing in what I'm feeding to BeautifulSoup when I do it programmatically.
Thank you for reading and for any replies I might get.
Old 03-31-2014, 04:27 PM   #2
Registered: Dec 2010
Posts: 281

Rep: Reputation: 24
diff the working and failing

What do you mean by 'when I copy and paste'?

Do you mean copying and pasting into a file and saving it as an .htm, or copying and pasting the page source as a string into your program?

What does my_htmlfile.txt look like? Is it the same as your copied-and-pasted page source?
If you create my_htmlfile.txt and diff it against the file saved from the page source, is there any output?

# diff my_htmlfile.txt FROM_SOURCE.htm
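The same comparison can be done from inside Python, which may be handy if the HTML never gets written to disk. This is only a rough sketch using the standard difflib module; the function name diff_html and the sample strings are made up for illustration, and the file labels just mirror the diff command above.

```python
import difflib

def diff_html(working_html, failing_html):
    """Return unified-diff lines between two HTML strings (hypothetical helper)."""
    return list(difflib.unified_diff(
        working_html.splitlines(),
        failing_html.splitlines(),
        fromfile="FROM_SOURCE.htm",   # label only, no file is read
        tofile="my_htmlfile.txt",     # label only, no file is read
        lineterm="",
    ))

# Stand-in strings; in practice these would be the pasted page source
# and the text obtained programmatically.
working = "<html>\n<body>\n<p>hello</p>\n</body>\n</html>"
failing = "<html>\n<body>\n<p>hello\n</body>\n</html>"

for line in diff_html(working, failing):
    print(line)
```

An empty result means the two inputs are line-for-line identical, in which case the problem is probably somewhere other than the HTML itself.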
You may need to manipulate the input before running it through BeautifulSoup().
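What that manipulation looks like depends on what the diff shows. As one guess, if the fetched data is raw bytes or has different line endings than the pasted source, something like the sketch below might bring the two closer together. The helper name normalize_html and the default encoding are assumptions, not anything taken from the original code, and the sketch assumes Python 3.

```python
def normalize_html(raw, encoding="utf-8"):
    """Hypothetical helper: turn fetched page data into plain text."""
    if isinstance(raw, bytes):
        # Decode bytes to text; undecodable bytes become U+FFFD
        # instead of raising an exception.
        text = raw.decode(encoding, errors="replace")
    else:
        text = raw
    # Drop a leading byte-order mark if one is present.
    text = text.lstrip("\ufeff")
    # Convert Windows and old Mac line endings to "\n".
    return text.replace("\r\n", "\n").replace("\r", "\n")

# Possible use with the code from the first post (not run here, since
# `response` comes from the poster's mechanize session):
#   soup = BeautifulSoup(normalize_html(response.get_data()))
```

If the diff shows some other kind of difference, for example missing markup, this helper will not help and the input needs a different fix.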

Last edited by cin_; 03-31-2014 at 07:37 PM. Reason: gramm`err

