Fedora: This forum is for the discussion of the Fedora Project.
11-06-2006, 07:56 PM | #1
Member | Registered: Jun 2006 | Location: Berkeley, CA | Distribution: FC5 | Posts: 37
wget on sites that require passwords
I just posted a similar question about the usage of cURL, but I'm also wondering: can one use wget (or some other utility) to download a webpage that requires you to log in before viewing it?
Like, can wget be used to download a snapshot of my gmail inbox twice a day, or is this impossible because to see my inbox you need to sign in to gmail?
Examples of how to accomplish such a thing would be greatly appreciated. Thank you.
11-07-2006, 01:59 PM | #2
LQ Guru | Registered: Jan 2004 | Location: NJ, USA | Distribution: Slackware, Debian | Posts: 5,852
Starting wget with the --http-user and --http-passwd options will allow you to specify a username and password to use if the site requires some form of authentication.
However, if I had to guess, I would say this isn't going to work on GMail. As far as I know, wget can only handle simple HTTP authentication.
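For reference, the usage looks something like this (the URL and credentials here are placeholders, and this only covers the case where the server itself prompts for credentials, not a login form embedded in the page):
Code:
# Works for HTTP Basic/Digest authentication, i.e. the kind of login box
# the browser pops up, not an HTML form on the page.
wget --http-user=alice --http-passwd=secret https://intranet.example.com/report.html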
11-07-2006, 09:21 PM | #3
Member | Registered: Jun 2006 | Location: Berkeley, CA | Distribution: FC5 | Posts: 37 | Original Poster
I guess I could have mentioned this before now...but that doesn't work. Here's the output of trying that:
Code:
wget --http-user mlissner --http-passwd ******* https://www.example.com/index.epl
--18:13:50-- https://www.example.com/index.epl
=> `index.epl'
Resolving www.example.com... 216.52.143.250
Connecting to www.example.com|216.52.143.250|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.example.com/template/loggedOut.epl [following]
--18:13:51-- http://www.example.com/template/loggedOut.epl
=> `loggedOut.epl'
Connecting to www.example.com|216.52.143.250|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 311 [text/html]
100%[====================================>] 311 --.--K/s
18:13:51 (9.00 MB/s) - `loggedOut.epl' saved [311/311]
So, what it's doing is checking whether my password info is good enough to see index.epl, and when it realizes that my creds are no good, it redirects me to the "You need to log in" page.
A couple of points I ought to mention: one, this is over the HTTPS protocol. Two, I'm not really trying to get into gmail (I can use POP3 for that), but I'd rather not say what the site is because it is for work. I spoofed some info above to mask the site, but the idea is the same.
I'm pretty sure this site uses cookies to verify that the browser contacting it has logged on, and I've attempted to send (with cURL and wget) the correct cookies to the site, but that doesn't seem to be working either. My guess is that one needs to log on to the site with a program, and then the site will only work if that program sends its signature (or something).
Does anybody have any ideas about what the standard security setup is for websites like this, and what I need to do to convince a site I have every right to visit normally that my script is a logged-in browser?
Thanks in advance, sorry for the length.
Last edited by mlissner; 11-07-2006 at 09:30 PM.
11-07-2006, 10:17 PM | #4
Senior Member | Registered: Mar 2006 | Location: India | Distribution: Fedora | Posts: 1,562
Quote:
Like, can wget be used to download a snapshot of my gmail inbox twice a day, or is this impossible because to see my inbox you need to sign in to gmail?

This can easily be accomplished by using the 'gmail' extension if you're using Firefox. Be sure to check the snippet option in its preferences.
11-07-2006, 11:59 PM | #5
Member | Registered: Jun 2006 | Location: Berkeley, CA | Distribution: FC5 | Posts: 37 | Original Poster
True, and there are other ways to do that as well, but that's not really what I'm after. Really, I want to get into other sites that I have the password for, so I can download them several times a day without actually visiting them.
Shouldn't this be doable? What is a browser doing to log in that I'm not accomplishing with wget or curl?
11-08-2006, 11:09 AM | #6
LQ Guru | Registered: Jan 2004 | Location: NJ, USA | Distribution: Slackware, Debian | Posts: 5,852
I think the issue here is that the only authentication wget supports is the sort where a login box actually pops up and asks for a username and password (like the sort of authentication used on home routers).
If you have to put the username and password into an actual form built into the page, I don't think wget knows how to do that. I believe you could do it with curl, since with that you can actually give it the name of a field and the data it should enter there (which might also be possible with wget, but I have only seen it done with curl).
I don't actually know how you would do that though; it isn't exactly my field. By the way, perhaps this topic would do better in Linux Software?
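For what it's worth, a rough, untested sketch of that curl approach (the URL and the 'user'/'password' field names are hypothetical; the real names come from the login form's HTML source):
Code:
# POST the login form's fields and save the session cookie the site sets.
curl --cookie-jar cookies.txt \
     --data 'user=mlissner&password=secret' \
     https://www.example.com/login.epl

# Reuse the saved cookie to fetch the protected page.
curl --cookie cookies.txt https://www.example.com/index.epl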
11-08-2006, 11:09 AM | #7
Senior Member | Registered: Dec 2003 | Location: Shelbyville, TN, USA | Distribution: Fedora Core, CentOS | Posts: 1,019
I would say if this is for work, then call the company you are trying to retrieve the data from and ask if they have any method to automate it. Maybe they have the same info on an FTP server somewhere. If you are partnered with them in some way, surely they would be open to helping you out.
And sometimes when you log into a site, the passwords are being encrypted/hashed/whatever using the site's own methods. I would say the username/password combo you gave wget would be for a site that has a popup box asking for a password. I can't think of a popular site I've been to lately that does it like this, but yahoo, gmail, etc. use a different method.
11-08-2006, 11:01 PM | #8
Member | Registered: Jun 2006 | Location: Berkeley, CA | Distribution: FC5 | Posts: 37 | Original Poster
Perhaps it's a lost cause. Something Linux can't do? It seems to be so. I'll see about contacting the people we work with, but they're usually mighty busy.
Would it be worthwhile to move this thread to Linux software? If so, how?
Thanks again.
11-08-2006, 11:45 PM | #9
LQ Guru | Registered: Aug 2004 | Location: Sydney | Distribution: Rocky 9.2 | Posts: 18,414
1. For simplicity, contact the sites and ask if they have a simpler method for automation, e.g. FTP.
2. I use Perl and the WWW::Mechanize module to do the same thing myself.
Add a reply to this thread if you'd like to see my code.
11-09-2006, 12:44 AM | #10
Senior Member | Registered: Jan 2005 | Location: Manalapan, NJ | Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android | Posts: 4,593
You can automate a form login and other activity with the lynx command-line web browser. For example, the following command will create a log of your actions:
Code:
lynx -cmd_log=filename http://www.somesite.com
You can replay the operation with:
Code:
lynx -cmd_script=filename http://www.somesite.com
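One caveat, assuming the replayed session hits cookie confirmations: lynx's -accept_all_cookies option suppresses its interactive cookie prompts, which would otherwise stall a scripted run (same placeholder URL as above):
Code:
lynx -accept_all_cookies -cmd_script=filename http://www.somesite.com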
11-09-2006, 01:27 AM | #11
Member | Registered: Jun 2006 | Location: Berkeley, CA | Distribution: FC5 | Posts: 37 | Original Poster
Hmmm... I don't know the first thing about lynx, but I will say that it has crossed my mind on more than one occasion as a simple hack of a solution. I'll play with it and see if I can figure it out.
And, chrism01, I don't really know Perl yet, but your code could come in handy. Would you mind posting it?
My faith returneth.
------EDIT-------
I went and played with lynx and figured out the basic navigation, etc., but the page that loads cookies onto my machine and authenticates my login credentials relies on JavaScript, so lynx doesn't work at all... how about that Perl code?
Last edited by mlissner; 11-09-2006 at 02:18 AM.
06-25-2007, 10:14 PM | #12
Member | Registered: Sep 2003 | Location: Qingdao, China | Distribution: mandriva, slack, red flag | Posts: 249
Lynx is great, but I need that kind of command logging for a JavaScript-enabled web browser...
What can I do?
06-25-2007, 10:51 PM | #13
Senior Member | Registered: Aug 2003 | Location: UK | Distribution: Slackware | Posts: 3,467
Try elinks with the -dump option. First browse to the site normally and log in, save the cookies, and say yes when asked to remember the password. Then try 'elinks -dump bla.com'. It may work.
And I believe some JavaScript works with elinks.
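Roughly, the sequence would look like this (bla.com stands in for the real site):
Code:
# Log in interactively once so elinks stores the session cookies.
elinks http://bla.com
# Then dump the now-authenticated page as plain text.
elinks -dump http://bla.com > page.txt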
06-26-2007, 01:43 AM | #14
Member | Registered: Sep 2003 | Location: Qingdao, China | Distribution: mandriva, slack, red flag | Posts: 249
elinks does not seem to work with JavaScript.
06-26-2007, 03:44 AM | #15
Member | Registered: Jun 2006 | Location: Colombo, Sri Lanka | Distribution: Ubuntu | Posts: 103
I haven't tried this, but I assume it would work (adapted from the wget manual):
Code:
# Log in to the server and save the cookies it sets. Make sure the
# field names ('user' and 'password' here) match what the login
# form's HTML actually uses.
wget --save-cookies cookies.txt \
     --post-data 'user=foo&password=bar' \
     http://server.com/auth.php

# Now grab the page or pages we care about.
wget --load-cookies cookies.txt \
     -p http://server.com/interesting/article.php
Remember that if you're using the HTTPS protocol, you have to consider the certificate check. If you are sure about who you are connecting to, using '--no-check-certificate' is the easiest thing to do, but it is also the least secure.
I believe MS3FGX's wget suggestion didn't work because --http-user/--http-passwd perform HTTP authentication, while sites like gmail submit the login form with a POST request.
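Putting that together for the HTTPS case earlier in the thread (the URL and field names are placeholders; --keep-session-cookies makes wget save session-only cookies as well, which many login systems use):
Code:
wget --no-check-certificate \
     --save-cookies cookies.txt --keep-session-cookies \
     --post-data 'user=mlissner&password=secret' \
     https://www.example.com/login.epl
wget --no-check-certificate --load-cookies cookies.txt \
     https://www.example.com/index.epl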