LinuxQuestions.org
Fedora: This forum is for the discussion of the Fedora Project.

Old 11-06-2006, 07:56 PM   #1
mlissner
Member
 
Registered: Jun 2006
Location: Berkeley, CA
Distribution: FC5
Posts: 37

Rep: Reputation: 15
wget on sites that require passwords


I just posted a similar question about cURL, but I'm also wondering: can wget (or some other utility) be used to download a webpage that requires you to log in before viewing it?

For example, can wget be used to download a snapshot of my Gmail inbox twice a day, or is this impossible because you need to sign in to Gmail to see your inbox?

Examples of how to accomplish such a thing would be greatly appreciated. Thank you.
 
Old 11-07-2006, 01:59 PM   #2
MS3FGX
LQ Guru
 
Registered: Jan 2004
Location: NJ, USA
Distribution: Slackware, Debian
Posts: 5,852

Rep: Reputation: 361
Starting wget with the --http-user and --http-passwd options will allow you to specify a username and password to use if the site requires some form of authentication.

However, if I had to guess, I would say this isn't going to work on GMail. As far as I know, wget can only handle simple HTTP authentication.
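For reference, a minimal sketch of that kind of invocation (the host and credentials here are made up); this only helps when the server answers with an HTTP 401 challenge, not when the login is an HTML form on the page:

```shell
# HTTP Basic authentication: wget sends the credentials when the
# server challenges with 401 Unauthorized. Hypothetical host/user.
wget --http-user=alice --http-passwd=secret \
    http://router.example.com/status.html
```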
 
Old 11-07-2006, 09:21 PM   #3
mlissner
Member
 
Registered: Jun 2006
Location: Berkeley, CA
Distribution: FC5
Posts: 37

Original Poster
Rep: Reputation: 15
I guess I could have mentioned this before now...but that doesn't work. Here's the output of trying that:

Code:
wget --http-user mlissner --http-passwd ******* https://www.example.com/index.epl
--18:13:50--  https://www.example.com/index.epl
           => `index.epl'
Resolving www.example.com... 216.52.143.250
Connecting to www.example.com|216.52.143.250|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.example.com/template/loggedOut.epl [following]
--18:13:51--  http://www.example.com/template/loggedOut.epl
           => `loggedOut.epl'
Connecting to www.example.com|216.52.143.250|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 311 [text/html]

100%[====================================>] 311           --.--K/s

18:13:51 (9.00 MB/s) - `loggedOut.epl' saved [311/311]
So, what it's doing is checking whether my password info is good enough to see index.epl, and when it realizes that my creds are no good, it redirects me to the "You need to log in" page.

A couple of points I ought to mention: one, this is over HTTPS. Two, I'm not really trying to get into Gmail (I can use POP3 for that), but I'd rather not say what the site is, because it's for work. I spoofed some info above to mask the site, but the idea is the same.

I'm pretty sure this site uses cookies to verify that the browser contacting it has logged on, and I've attempted to send the correct cookies to the site (with both cURL and wget), but that doesn't seem to be working either. My guess is that one needs to log on to the site with a program, and then the site will only work if that program sends its signature (or something).

Does anybody have any ideas about the standard authentication mechanisms websites use, and what I need to do to satisfy a site that I have every right to visit normally?

Thanks in advance, sorry for the length.

Last edited by mlissner; 11-07-2006 at 09:30 PM.
 
Old 11-07-2006, 10:17 PM   #4
Hitboxx
Senior Member
 
Registered: Mar 2006
Location: India
Distribution: Fedora
Posts: 1,562
Blog Entries: 3

Rep: Reputation: 68
Quote:
Like, can wget be used to download a snapshot of my gmail inbox twice a day, or is this impossible because to see my inbox you need to sign-in to gmail?
This can easily be accomplished with the Gmail extension if you're using Firefox. Be sure to check the snippet option in its preferences.
 
Old 11-07-2006, 11:59 PM   #5
mlissner
Member
 
Registered: Jun 2006
Location: Berkeley, CA
Distribution: FC5
Posts: 37

Original Poster
Rep: Reputation: 15
True, and there are other ways to do that as well, but that's not really what I'm after. Really, I want to get into other sites I have passwords for, so I can download them several times a day without actually visiting them.

Shouldn't this be doable? What's a browser doing to log in that I'm not accomplishing with wget or curl?
 
Old 11-08-2006, 11:09 AM   #6
MS3FGX
LQ Guru
 
Registered: Jan 2004
Location: NJ, USA
Distribution: Slackware, Debian
Posts: 5,852

Rep: Reputation: 361
I think the issue here is that the only authentication wget supports is the sort where a login box actually pops up and asks for a username and password (like the sort of authentication used on home routers).

I think if you have to put the username and password into an actual form built into the page, wget doesn't know how to do that. I believe you could do it with curl, since curl lets you give it the name of a form field and the data it should enter there (which might also be possible with wget, but I have only seen it done with curl).
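As an illustration of that approach, here is a hedged sketch with curl. The URLs and the field names (user/pass) are hypothetical; the real names have to be read from the login form's HTML:

```shell
# Submit the login form via POST and save the session cookie,
# then reuse it for the page we actually want.
curl --cookie-jar cookies.txt \
     --data 'user=mlissner&pass=secret' \
     https://www.example.com/login.epl

curl --cookie cookies.txt https://www.example.com/index.epl
```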

I don't actually have any idea how you would do that though, it isn't exactly in my field. By the way, perhaps this topic would do better in Linux Software?
 
Old 11-08-2006, 11:09 AM   #7
benjithegreat98
Senior Member
 
Registered: Dec 2003
Location: Shelbyville, TN, USA
Distribution: Fedora Core, CentOS
Posts: 1,019

Rep: Reputation: 45
I would say if this is for work, then call the company you are trying to retrieve from and ask if they have any methods to automate it. Maybe they have the same info on an FTP server somewhere. If you are partnered with them in some way, surely they would be open to helping you out.

And sometimes when you log into a site, the passwords are encrypted/hashed using the site's own methods. The username/password combo you gave wget would be for a site that pops up a box asking for a password. I can't think of a popular site I've visited lately that does it like that, but Yahoo, Gmail, etc. use a different method.
 
Old 11-08-2006, 11:01 PM   #8
mlissner
Member
 
Registered: Jun 2006
Location: Berkeley, CA
Distribution: FC5
Posts: 37

Original Poster
Rep: Reputation: 15
Perhaps it's a lost cause. Something Linux can't do? It seems to be so. I'll see about contacting the people we work with, but they're usually mighty busy.

Would it be worthwhile to move this thread to Linux software? If so, how?

Thanks again.
 
Old 11-08-2006, 11:45 PM   #9
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,414

Rep: Reputation: 2785
1. For simplicity, contact the sites and ask if they have a simpler method for automation, e.g. FTP.
2. I use Perl and the WWW::Mechanize module to do the same thing myself.

Add a reply to this thread if you'd like to see my code.
 
Old 11-09-2006, 12:44 AM   #10
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 344
You can automate a form login and other activity with the lynx command-line web browser. For example, the following command will create a log of your actions:

Code:
lynx -cmd_log=filename http://www.somesite.com
You can replay the session with:

Code:
lynx -cmd_script=filename http://www.somesite.com
 
Old 11-09-2006, 01:27 AM   #11
mlissner
Member
 
Registered: Jun 2006
Location: Berkeley, CA
Distribution: FC5
Posts: 37

Original Poster
Rep: Reputation: 15
Hmmm... I don't know the first thing about lynx, but I will say that it has crossed my mind on more than one occasion as a simple hack of a solution. I'll play with it and see if I can figure it out.

And, chrism01, I don't really know Perl yet, but your code could come in handy. Would you mind posting it?

My faith returneth.

------EDIT-------
I went and played with lynx and figured out the basic navigation, but the page that loads cookies onto my machine and authenticates my login credentials uses JavaScript, so lynx doesn't work at all... how about that Perl code?

Last edited by mlissner; 11-09-2006 at 02:18 AM.
 
Old 06-25-2007, 10:14 PM   #12
secretlydead
Member
 
Registered: Sep 2003
Location: Qingdao, China
Distribution: mandriva, slack, red flag
Posts: 249

Rep: Reputation: 31
Lynx is great, but I need a log for a JavaScript-enabled web browser...

What can I do?
 
Old 06-25-2007, 10:51 PM   #13
dive
Senior Member
 
Registered: Aug 2003
Location: UK
Distribution: Slackware
Posts: 3,467

Rep: Reputation: Disabled
Try elinks with the -dump option. First browse to the site normally and log in, save cookies, and say yes when asked to remember the password. Then try 'elinks -dump bla.com'. It may work.

And I believe some JavaScript works with elinks.
 
Old 06-26-2007, 01:43 AM   #14
secretlydead
Member
 
Registered: Sep 2003
Location: Qingdao, China
Distribution: mandriva, slack, red flag
Posts: 249

Rep: Reputation: 31
elinks does not seem to work with JavaScript.
 
Old 06-26-2007, 03:44 AM   #15
koobi
Member
 
Registered: Jun 2006
Location: Colombo, Sri Lanka
Distribution: Ubuntu
Posts: 103

Rep: Reputation: 15
I haven't tried this, but I assume it would work (from the manual):
Code:
# Log in and save the cookies. Make sure the form's field
# labels/names correspond with what you enter in --post-data.
wget --save-cookies cookies.txt \
     --post-data 'user=foo&password=bar' \
     http://server.com/auth.php

# Now grab the page or pages we care about.
wget --load-cookies cookies.txt \
     -p http://server.com/interesting/article.php

Remember that if you're using HTTPS, you have to consider certificate verification. If you are sure who you are connecting to, using '--no-check-certificate' is the easiest thing to do, but it is also the least secure.
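One possible variant of the manual's snippet over HTTPS (same hypothetical URLs and field names). Note that --keep-session-cookies matters here: without it, wget discards session-only cookies instead of writing them to the cookie file:

```shell
# Log in over HTTPS, keeping session-only cookies. Skipping
# certificate checks is convenient but insecure.
wget --no-check-certificate \
     --save-cookies cookies.txt --keep-session-cookies \
     --post-data 'user=foo&password=bar' \
     https://server.com/auth.php

# Fetch the protected page with the saved cookies.
wget --no-check-certificate --load-cookies cookies.txt \
     https://server.com/interesting/article.php
```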


I believe the earlier wget attempt didn't work because --http-user/--http-passwd performs HTTP authentication, while sites like Gmail use a POST login form instead.
 
  



All times are GMT -5.