LinuxQuestions.org

scrabble 07-13-2011 02:33 AM

WGET - Problems downloading files from a password secured page
 
Good morning everybody,
I'm new to this forum and fairly new to shell scripting as well.

For my project, I'm interested in the Scatterometer products of Oceansat 2 from an Indian site, http://www.nrsc.gov.in/

Students can easily get a password to access and download the data for free.
Nevertheless, downloading the files manually is tedious, since you have to tick every file individually and then click a download button at the bottom of the page.
When I tried it with my script (below), I got an internal server error (500).
I hope you're not too busy and could have a look at the script; the cookie and IP are entered manually for trial purposes.
The site is structured as follows.
The address where you have to log in:
http://218.248.0.134:8080/OCMWebSCAT...controller.jsp
The address once you're logged in:
http://218.248.0.134:8080/OCMWebSCAT...controller.jsp
On the next page you have to choose your preferences and a start and end date for your search:
http://218.248.0.134:8080/OCMWebSCAT...ction=ScatHome
Finally, the last page on which you have to mark the files:
http://218.248.0.134:8080/OCMWebSCAT...JUL-2011&tag=D

I used "Live HTTP Headers" to look up which parameters are needed to access the page, and they are basically the ones you can see in my script. But since I'm new to this, I can't tell why the error occurs:


#!/bin/bash

##
## oceansat
##

DATE1=01-JUL-2011
DATE2=03-JUL-2011


# Log in to the server. This only needs to be done once.
#wget --save-cookies cookies.txt \
# --keep-session-cookies \
# --post-data 'login=MYLOGIN&password=MYPW' \
# http://218.248.0.134:8080/OCMWebSCAT...controller.jsp

# Now grab the page or pages we care about.
wget -v --header="Cookie: JSESSIONID=72603AF8FED14FB66F4BC6F200BCC032" \
--header="User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:5.0) Gecko/20100101 Firefox/5.0" \
--header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
--header="Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3" \
--header="Accept-Encoding: gzip, deflate" \
--header="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" \
--header="Proxy-Connection: keep-alive" \
--header="X-Forwarded-For: IP.ADR.ES.S" \
--referer="http://218.248.0.134:8080/OCMWbSCAT/html/controller.jsp" \
'http://218.248.0.134:8080/OCMWebSCAT/html/controller.jsp?action=ScatHome'

Thank you very much for your help in advance,
Robert

David the H. 07-13-2011 08:18 AM

Please use [code][/code] tags around your code, to preserve formatting and to improve readability.

I notice that you save cookies to a file, but then never use them again (you do send one through a header, but it's not clear where that one comes from).

So perhaps using "--load-cookies cookies.txt" will help.

Although actually your first "log-in" command is commented out. You didn't just forget to uncomment it, did you?
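
One way to check whether the log-in step really stored a session cookie is to look inside cookies.txt. This is only a rough sketch: it assumes the usual tab-separated cookie file layout that wget writes (cookie name in field 6, value in field 7). The function name session_id and the sample file contents are made up for illustration, so the sketch runs without touching the server.

```shell
#!/bin/bash
# Hypothetical helper: print the JSESSIONID value stored in a cookie file.
session_id() {
    awk -F'\t' '$6 == "JSESSIONID" { print $7 }' "$1"
}

# Made-up sample file in the assumed layout (not real server output).
sample=$(mktemp)
printf '# HTTP cookie file.\n' > "$sample"
printf '218.248.0.134:8080\tFALSE\t/\tFALSE\t0\tJSESSIONID\tEXAMPLE0123\n' >> "$sample"

sid=$(session_id "$sample")
echo "stored session: ${sid:-none}"
rm -f "$sample"
```

Against the real file you would call session_id cookies.txt after the log-in command. If it prints nothing, the log-in probably did not succeed and the later requests have no session to send.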


By the way, you can clean up your code a bit by storing the wget options in an array or two. This is mostly a stylistic touch, but it can also make the code easier to maintain if something changes.

Something like this:

Code:

# no trailing slash here; it is added where the URLs are built
baseurl="http://218.248.0.134:8080/OCMWebSCAT/html"

# Log in to the server.
# define an array of options first
wgetopts=( --save-cookies cookies.txt --keep-session-cookies --post-data 'login=MYLOGIN&password=MYPW' )

wget "${wgetopts[@]}" "$baseurl/controller.jsp"

# Now grab the page or pages we care about.
# one array for headers, and another for general options
wgetopts=( -v --referer="$baseurl/controller.jsp" --load-cookies cookies.txt )

# The hard-coded JSESSIONID comes from the original script; it is probably
# unnecessary once --load-cookies supplies the session from the log-in.
headers=( "Cookie: JSESSIONID=72603AF8FED14FB66F4BC6F200BCC032"
          "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:5.0) Gecko/20100101 Firefox/5.0"
          "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
          "Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3"
          "Accept-Encoding: gzip, deflate"
          "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7"
          "Proxy-Connection: keep-alive"
          "X-Forwarded-For: IP.ADR.ES.S"
        )

wget "${wgetopts[@]}" "${headers[@]/#/--header=}" "$baseurl/controller.jsp?action=ScatHome"

"${headers[@]/#/--header=}" adds the string "--header=" to the front of each array element as they're expanded.

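If you want to see what that expansion produces before pointing it at the server, here is a small stand-alone sketch. The two header values are shortened placeholders, and it prints the resulting arguments instead of calling wget.

```shell
#!/bin/bash
# Placeholder header values, only to show the expansion.
headers=( "Accept: text/html" "Accept-Language: en" )

# In ${var/#pattern/replacement} the # anchors the (here empty) pattern
# at the start of each element, so the replacement is simply prepended.
expanded=( "${headers[@]/#/--header=}" )

# Print one resulting argument per line.
printf '%s\n' "${expanded[@]}"
# prints:
#   --header=Accept: text/html
#   --header=Accept-Language: en
```

Because the expansion is quoted, each header stays a single argument even though it contains spaces.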

All times are GMT -5.