Python IndexError: list index out of range (Web Scraper)
Hey all!
I'm having a bit of an issue with Python while trying to write a script to download every .rar file on a webpage. The script successfully downloads any link that doesn't contain spaces or similar characters, but when it hits a URL like http://www.insidepro.com/dictionaries/Belarusian (Classical Spelling).rar it fails. I'm sure this is something simple, but I'm so new to Python that I'm not sure what to do! Thank you in advance. Code:
import urllib2 |
Not really familiar with Python, but my first thought would be that the spaces should either be escaped, or the entire URL encapsulated within quotes.
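For what it's worth, here is a rough sketch of what "escaping" could look like in Python. It percent-encodes only the path part of the URL, so the space becomes %20 and the parentheses become %28 and %29. The helper name `escape_url` is made up for this example and is not from the original script; the imports fall back to the Python 2 modules used in this thread.

```python
try:
    from urllib.parse import quote, urlsplit, urlunsplit  # Python 3
except ImportError:
    from urllib import quote                              # Python 2
    from urlparse import urlsplit, urlunsplit


def escape_url(url):
    """Percent-encode the path of a URL, leaving scheme and host untouched.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme, parts.netloc, quote(parts.path), parts.query, parts.fragment)
    )


url = "http://www.insidepro.com/dictionaries/Belarusian (Classical Spelling).rar"
print(escape_url(url))
```

Wrapping the URL in literal quote characters would not help here, since the quotes would simply become part of the requested address.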
Just my .02. |
Oh yeah, I forgot to mention it, but you could also look at the file it's attempting to download when it does get one with spaces.
|
Escaping with Quotes
Tried escaping with quotes...didn't work either!
|
Have you tried looking at the filename it attempts to download when met with a URL with spaces?
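Since the full script isn't visible here, this is only a guess at how such a check might look. The sketch takes the local file name from the last segment of the URL path instead of splitting the URL on whitespace or indexing into a list, which is a common way to end up with an IndexError. `filename_from_url` is a made-up name, not something from the original code.

```python
import posixpath

try:
    from urllib.parse import unquote, urlsplit  # Python 3
except ImportError:
    from urllib import unquote                  # Python 2
    from urlparse import urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, decoded, as a local file name.

    Hypothetical helper for illustration only.
    """
    path = urlsplit(url).path
    name = posixpath.basename(unquote(path))
    if not name:
        # A URL ending in "/" has no file name; fail with a clear message.
        raise ValueError("no file name in URL: %r" % url)
    return name


for url in (
    "http://www.insidepro.com/dictionaries/Belarusian (Classical Spelling).rar",
    "http://www.insidepro.com/dictionaries/Belarusian%20%28Classical%20Spelling%29.rar",
):
    # Printing the name before downloading shows what the script will try to write.
    print(filename_from_url(url))
```

Both forms of the URL give the same file name, so the name can be checked independently of how the URL was escaped.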
|
Look at the filename assignment. This may help you.
To make this one more versatile, you can construct your os.system() line from user input. Code:
|
Still not working... Oh well.
Nope... it still craps the bed with a bunch of different errors.
I was trying to do this as a project rather than using bash scripting, but I guess trying to reinvent the wheel for fun is an exercise in futility when you don't completely understand the programming language at hand. So back to the basics...wget it is! Thank you all for the help! I really appreciate it! -V |
If you're parsing HTML documents (or XML), you really should look at BeautifulSoup. It makes parsing HTML for web scraping a real doddle. I've bashed together a little Python script that should do what you want: download all the dictionaries from the page you showed in your original code. As you can see, it's very small, as BeautifulSoup does all the hard work. Anyway, here it is:
Code:
from urllib2 import urlopen, quote |
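The listing above is cut off after its import line, so what follows is not the poster's script. It is a minimal sketch of the same idea (find every link ending in .rar, make it absolute, and escape it before downloading), using the standard library's `html.parser` in place of BeautifulSoup so it runs without third-party packages. The names `RarLinkParser` and `rar_urls`, the sample HTML, and the `English.rar` link are all made up for illustration.

```python
try:
    from html.parser import HTMLParser       # Python 3
    from urllib.parse import quote, urljoin
except ImportError:
    from HTMLParser import HTMLParser        # Python 2
    from urllib import quote
    from urlparse import urljoin


class RarLinkParser(HTMLParser):
    """Collects the href of every <a> tag that points at a .rar file."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".rar"):
            self.links.append(href)


def rar_urls(html, base_url):
    """Return absolute, percent-encoded URLs for all .rar links in the HTML."""
    parser = RarLinkParser()
    parser.feed(html)
    # ":" and "/" stay as-is so the URL structure survives; "%" stays as-is
    # so links that are already escaped are not escaped a second time.
    return [quote(urljoin(base_url, href), safe=":/%") for href in parser.links]


# Made-up page content standing in for the real dictionaries page.
sample_html = """
<a href="dictionaries/Belarusian (Classical Spelling).rar">Belarusian</a>
<a href="dictionaries/English.rar">English</a>
<a href="about.html">About</a>
"""

for url in rar_urls(sample_html, "http://www.insidepro.com/"):
    print(url)
```

Each URL printed this way should be safe to hand to `urlopen`, with the downloaded bytes written to a file named after the unescaped last path segment.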