WGET - Problems downloading files from a password secured page
Good morning everybody,
I'm new to this forum and quite new to shell scripting as well.
For my project, I'm interested in the Scatterometer products of Oceansat 2, which are offered on an Indian site, http://www.nrsc.gov.in/
It's no problem for students to get a password to access and download their data for free.
Nevertheless, downloading the files manually is tedious, since you have to tick every file individually and then click a download button at the bottom of the page.
When I tried it with my script (below), the server returned an internal server error (HTTP 500).
I hope you're not too busy and could have a look at the script, in which the cookie and IP address are entered manually for testing purposes.
The structure of the site is:
The address where you have to log in:
The address once you're logged in:
On the next page you have to choose your preferences and a start and end date for your search:
Finally, the last page, on which you have to mark the files:
I used "Live HTTP Headers" to find out which parameters are needed to access the page, and they're basically the ones you can see in my script. But since I'm new to this, I can't tell why the error occurs:
# Log in to the server. This can be done only once.
#wget --save-cookies cookies.txt \
# --keep-session-cookies \
# --post-data 'login=MYLOGIN&password=MYPW' \
# Now grab the page or pages we care about.
wget -v --header="Cookie: JSESSIONID=72603AF8FED14FB66F4BC6F200BCC032" \
--header="User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:5.0) Gecko/20100101 Firefox/5.0" \
--header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
--header="Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3" \
--header="Accept-Encoding: gzip, deflate" \
--header="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" \
--header="Proxy-Connection: keep-alive" \
--header="X-Forwarded-For: IP.ADR.ES.S" \
Thank you very much for your help in advance,
Please use [code][/code] tags around your code, to preserve formatting and to improve readability.
I notice that you save cookies to a file, but then never use them again (you do send one through a header, but it's not clear where that one comes from).
So perhaps using "--load-cookies cookies.txt" will help.
Although actually your first "log-in" command is commented out. You didn't just forget to uncomment it, did you?
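To illustrate the cookie point, here is a minimal sketch of logging in once and then reusing the saved cookies, instead of pasting a JSESSIONID by hand. The two URLs, the output file name and the `run` helper are placeholders I made up for illustration; the real addresses and form field names have to come from the site itself. By default the sketch only prints the commands; set RUN=1 to actually execute them.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with the real login and result-page URLs.
LOGIN_URL="https://example.invalid/login"
DATA_URL="https://example.invalid/results"
COOKIE_JAR="cookies.txt"

# Helper (not part of wget): print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Step 1: log in once and write the session cookie to the cookie jar.
# --keep-session-cookies is needed because session cookies such as
# JSESSIONID are otherwise not written to the file.
run wget --save-cookies "$COOKIE_JAR" --keep-session-cookies \
    --post-data 'login=MYLOGIN&password=MYPW' \
    -O /dev/null "$LOGIN_URL"

# Step 2: send the saved cookies with every later request.
run wget --load-cookies "$COOKIE_JAR" -O results.html "$DATA_URL"
```

If the server sets further cookies on later pages, adding `--save-cookies` and `--keep-session-cookies` to the second command as well keeps the jar up to date.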
By the way, you can clean up your code a bit by storing the wget options in an array or two. This is mostly a stylistic touch, but it can also make the code easier to maintain if something changes.
Something like this:
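A minimal sketch of the array idea, assuming bash (plain sh has no arrays). The header values are copied from the script in the question; the URL is a made-up placeholder, since the real one is not shown. The final line only echoes the assembled command so you can inspect it first.

```shell
#!/usr/bin/env bash
# Hypothetical target URL: replace with the real page.
url="https://example.invalid/download"

# Request headers, one per array element.
headers=(
    "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:5.0) Gecko/20100101 Firefox/5.0"
    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
    "Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3"
)

# General wget options.
opts=(-v --load-cookies cookies.txt)

# Turn each header into its own --header=... option.
for h in "${headers[@]}"; do
    opts+=(--header="$h")
done

# Show the assembled command; remove "echo" to run it for real.
echo wget "${opts[@]}" "$url"
```

Quoting `"${opts[@]}"` keeps each option as a single word even when it contains spaces, which is the main advantage over building the command in one long string.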