LinuxQuestions.org
10-13-2013, 02:37 PM   #1
gemaeuer
copying/dumping guestbook - login problem in lynx


hello

The following task drives me crazy:

I want to dump a guestbook from a quite old Mambo CMS to a text file. By now there are several thousand pages, and of course you have to log in to see the entries.

First I wanted to use curl, but I need the rendered text rather than the raw HTML, so a browser seemed more appropriate. Lynx has the nice -dump and -crawl options, so it seemed like an easy task. But what about the login?

This would be the command if no login were required (without crawl for starters):
$ lynx -accept_all_cookies -dump -nolist "http://www.somepage....&startpage=1" >test.txt

This gives me the login page, not the page of interest. Looking for a solution, I found the -post_data option, but I could not find the proper syntax for the data it needs. There are some hints out there, but all of them are way too cryptic for me.

Is there a way to dump and crawl from within lynx? Then I could log in using lynx and somehow run, for example, the "print" command for all the pages.

Or is there a completely different way, for example using Firefox or Opera to automate Ctrl-C + Ctrl-V and calling the next page?

thanks in advance
Rainer
 
10-14-2013, 07:17 AM   #2
zhjim
If you can access the database, it might be easier to get a dump from there.
Another approach would be wget, which can spider through websites. It also makes it a bit easier to pass POST data for a login. Or you could log in manually, get the cookie or auth string, and feed that to wget. I did that once, but I'm not sure of the exact routine.

As far as the -post_data option goes, I guess you use key=value pairs, one per line, with --- at the end of the post data.
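
For reference, the lynx man page describes -post_data as reading the form data from standard input, terminated by a line containing ---. A minimal sketch of that, assuming the login form uses fields named username and passwd and a placeholder login URL (both hypothetical; the real names are in the form's input tags):

```shell
#!/bin/sh
# Placeholder login URL; replace with the form's real action URL.
LOGIN_URL="http://www.example.org/index.php"

# Print form data in the shape lynx -post_data reads from stdin:
# one URL-encoded line of key=value pairs, then a terminating --- line.
# The field names username and passwd are assumptions.
make_post_data() {
    # $1 = user name, $2 = password
    printf 'username=%s&passwd=%s\n---\n' "$1" "$2"
}

# Usage (not run here):
#   make_post_data myuser mypass |
#       lynx -accept_all_cookies -post_data -dump -nolist "$LOGIN_URL"
```

Values containing special characters would probably need to be URL-encoded first. Note that this only dumps the server's response to the login request; fetching further pages in the same session is a separate problem.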
 
10-14-2013, 08:09 AM   #3
gemaeuer
Original Poster
Quote:
Originally Posted by zhjim
If you can access the database, it might be easier to get a dump from there.
Asking the administrator for the dump would be my last resort. Sometimes one wants something to work, no matter what.

Quote:
Originally Posted by zhjim
Another approach would be wget, which can spider through websites. It also makes it a bit easier to pass POST data for a login.
I'll give that a try, but wget gives me the HTML code only, right?
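
On the HTML-only point: lynx can render a saved file to plain text, so pages downloaded by wget can be run through the same -dump step afterwards. A rough sketch, assuming the pages were saved as page_N.html (a hypothetical naming scheme):

```shell
#!/bin/sh
# Map a saved HTML file name to the text file name it will be dumped to.
text_name() {
    printf '%s\n' "${1%.html}.txt"
}

# Render one saved HTML file to plain text with lynx.
html_to_text() {
    # $1 = path to a saved .html file
    lynx -dump -nolist "$1" > "$(text_name "$1")"
}

# Usage (not run here):
#   for f in page_*.html; do html_to_text "$f"; done
```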

Quote:
Originally Posted by zhjim
Or you could log in manually, get the cookie or auth string, and feed that to wget. I did that once, but I'm not sure of the exact routine.
I would need the session cookies too .. hmhm .. sounds interesting!
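
The session cookie part could look roughly like this with wget, whose --keep-session-cookies option saves session cookies alongside the persistent ones. The URLs, the field names username and passwd, and the assumption that startpage simply counts up by one per page are all placeholders, not taken from the actual site:

```shell
#!/bin/sh
# All of these values are hypothetical placeholders.
LOGIN_URL="http://www.example.org/index.php"
BASE_URL="http://www.example.org/guestbook.php?id=1"
COOKIES="cookies.txt"

# Build the URL of one guestbook page (assumes a startpage=N parameter).
page_url() {
    printf '%s&startpage=%s\n' "$BASE_URL" "$1"
}

# Log in once and store the cookies, including session cookies.
login() {
    # $1 = user name, $2 = password
    wget --save-cookies "$COOKIES" --keep-session-cookies \
         --post-data "username=$1&passwd=$2" \
         -O /dev/null "$LOGIN_URL"
}

# Fetch one page with the stored cookies.
fetch_page() {
    # $1 = page number
    wget --load-cookies "$COOKIES" -O "page_$1.html" "$(page_url "$1")"
}

# Usage (not run here):
#   login myuser mypass
#   for n in 1 2 3; do fetch_page "$n"; done
```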

Quote:
Originally Posted by zhjim
As far as the -post_data option goes, I guess you use key=value pairs, one per line, with --- at the end of the post data.
Tried that several times, didn't work :-(
 
  


All times are GMT -5.