LinuxQuestions.org
Old 07-13-2011, 03:33 AM   #1
scrabble
LQ Newbie
 
Registered: Jul 2011
Posts: 14

WGET - Problems downloading files from a password secured page


Good morning everybody,
I'm new to this forum and quite new to shell scripting as well.

For my project, I'm interested in the scatterometer products of Oceansat 2, which are available from an Indian site, http://www.nrsc.gov.in/

Students can easily get a password to access and download the data for free.
However, downloading the files manually is tedious, since you have to mark every file individually and then click a download button at the bottom of the page.
When I tried it with my script (below), I got an internal server error 500.
I hope you're not too busy and could have a look at the script; the cookie and IP are entered manually for trial purposes.
The structure of the site is:
The address where you have to log in:
http://218.248.0.134:8080/OCMWebSCAT...controller.jsp
The address once you're logged in:
http://218.248.0.134:8080/OCMWebSCAT...controller.jsp
On the next page you have to choose your preferences and a start and end date for your search:
http://218.248.0.134:8080/OCMWebSCAT...ction=ScatHome
Finally, the last page on which you have to mark the files:
http://218.248.0.134:8080/OCMWebSCAT...JUL-2011&tag=D

I used "Live HTTP Headers" to look up which parameters are needed to access the page, and they're basically the ones you can see in my script, but since I'm new to this, I can't tell why the error occurs:


#!/bin/bash

##
## oceansat
##

DATE1=01-JUL-2011
DATE2=03-JUL-2011


# Log in to the server. This can be done only once.
#wget --save-cookies cookies.txt \
# --keep-session-cookies \
# --post-data 'login=MYLOGIN&password=MYPW' \
# http://218.248.0.134:8080/OCMWebSCAT...controller.jsp

# Now grab the page or pages we care about.
wget -v --header="Cookie: JSESSIONID=72603AF8FED14FB66F4BC6F200BCC032" \
--header="User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:5.0) Gecko/20100101 Firefox/5.0" \
--header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
--header="Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3" \
--header="Accept-Encoding: gzip, deflate" \
--header="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" \
--header="Proxy-Connection: keep-alive" \
--header="X-Forwarded-For: IP.ADR.ES.S" \
--referer="http://218.248.0.134:8080/OCMWbSCAT/html/controller.jsp" \
'http://218.248.0.134:8080/OCMWebSCAT/html/controller.jsp?action=ScatHome'

Thank you very much for your help in advance,
Robert
 
Old 07-13-2011, 09:18 AM   #2
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Please use [code][/code] tags around your code, to preserve formatting and to improve readability.

I notice that you save cookies to a file, but then never use them again (you do send one through a header, but it's not clear where that one comes from).

So perhaps using "--load-cookies cookies.txt" will help.

Although actually your first "log-in" command is commented out. You didn't just forget to uncomment it, did you?
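
If you do enable the log-in step, it may be worth confirming that it actually stored a session cookie before you request any other page. Here is a rough sketch, assuming wget wrote cookies.txt in the usual tab-separated Netscape cookie format (cookie name in the sixth field) and that the server's session cookie is called JSESSIONID, as in your header. has_cookie is only a helper name I made up, not something wget provides:

```shell
#!/bin/bash
# has_cookie FILE NAME: succeed if FILE (Netscape cookie format, as written
# by wget --save-cookies) contains a cookie called NAME.
# Hypothetical helper for illustration; not part of wget.
has_cookie() {
    # A missing or unreadable cookie file counts as "no cookie".
    [ -r "$1" ] || return 1
    # Fields are tab-separated; the 6th is the cookie name.
    # Comment lines (starting with '#') are skipped.
    awk -F'\t' -v name="$2" '
        /^#/ { next }
        $6 == name { found = 1 }
        END { exit !found }
    ' "$1"
}

if has_cookie cookies.txt JSESSIONID; then
    echo "session cookie found"
else
    echo "no session cookie; the log-in probably failed" >&2
fi
```

If the check fails, the log-in request itself (field names, URL, or credentials) is the first thing to look at, before worrying about the headers on the second request.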


By the way, you can clean up your code a bit by storing the wget options in an array or two. This is mostly a stylistic touch, but it can also make the code easier to maintain if something changes.

Something like this:

Code:
# no trailing slash here, so "$baseurl/..." doesn't end up with a double slash
baseurl="http://218.248.0.134:8080/OCMWebSCAT/html"

# Log in to the server.
# define an array of options first
wgetopts=( --save-cookies cookies.txt --keep-session-cookies --post-data 'login=MYLOGIN&password=MYPW' )

wget "${wgetopts[@]}" "$baseurl/controller.jsp"

# Now grab the page or pages we care about.
# one array for general options, and another for headers
# (the original referer read "OCMWbSCAT", which looks like a typo, so reuse $baseurl)
wgetopts=( -v --referer="$baseurl/controller.jsp" --load-cookies cookies.txt )

headers=( "Cookie: JSESSIONID=72603AF8FED14FB66F4BC6F200BCC032"
          "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:5.0) Gecko/20100101 Firefox/5.0"
          "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
          "Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3"
          "Accept-Encoding: gzip, deflate"
          "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7"
          "Proxy-Connection: keep-alive"
          "X-Forwarded-For: IP.ADR.ES.S"
        )

wget "${wgetopts[@]}" "${headers[@]/#/--header=}" "$baseurl/controller.jsp?action=ScatHome"
"${headers[@]/#/--header=}" prepends the string "--header=" to each array element as it is expanded.
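
If you want to see what that expansion produces without contacting the server, something like the following prints each resulting argument on its own line. The two-element headers array is a shortened stand-in for the real one, and expanded is just an illustrative variable name:

```shell
#!/bin/bash
# Shortened stand-in for the headers array (illustration only).
headers=( "Accept-Language: de-de,de;q=0.8"
          "Accept-Encoding: gzip, deflate" )

# ${array[@]/#/string}: '#' anchors the (empty) pattern at the start of
# each element, so the string is prepended to every element.
expanded=( "${headers[@]/#/--header=}" )

# Print one argument per line, wrapped in <> to make the boundaries visible.
printf '<%s>\n' "${expanded[@]}"
```

Each header, spaces included, stays a single argument, which is the main reason to use a quoted array expansion instead of one long string of options.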
 
  

