
gemaeuer 10-13-2013 02:37 PM

copying/dumping guestbook - login problem in lynx
 
hello

The following task drives me crazy:

I want to dump a guestbook from a quite old Mambo CMS to a text file. By now there are several thousand pages, and of course you have to log in to see the entries.

First I wanted to use curl, but of course I need the rendered text rather than raw HTML, so a browser seemed more appropriate. Lynx has the nice "dump" and "crawl" options, so it looked like an easy task. But what about the login?

This would be the command if no login were required (without crawl for starters):
$ lynx -accept_all_cookies -dump -nolist "http://www.somepage....&startpage=1" >test.txt

This gives me the login page instead of the page of interest. Looking for a solution, I found the -post_data option, but I could not find the proper syntax for the data it expects. There are some hints out there, but they are all far too cryptic for me.

Is there a way to dump and crawl from within lynx itself, so that I could log in interactively and then run something like the "print" command on all the pages?

Or is there a completely different way, for example using Firefox or Opera to automate Ctrl-C + Ctrl-V and loading the next page?
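Once the login part is solved, I imagine I would just loop over the page numbers anyway. Roughly like this, I guess (untested, and 3000 is only a placeholder for the real number of pages):

# dump every guestbook page into one text file
$ for i in $(seq 1 3000); do
      lynx -accept_all_cookies -dump -nolist "http://www.somepage....&startpage=$i" >>all_pages.txt
  done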

thanks in advance
Rainer

zhjim 10-14-2013 07:17 AM

If you can access the database, it might be easier to get a dump from there.
Another approach would be wget, which lets you spider through websites, and it's also a bit easier to pass POST data for a login. Or you could do the login manually, grab the cookie or auth string, and feed that to wget. I did that once, but I'm not sure of the exact routine anymore.
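Roughly like this, I think (completely untested; the form field names and the login URL are just guesses, you'd have to look at the login form's HTML for the real ones):

# log in once and save the session cookie
$ wget --save-cookies cookies.txt --keep-session-cookies \
       --post-data 'username=myuser&passwd=mypass' \
       -O /dev/null "http://www.somepage.../index.php"

# then fetch the guestbook pages with the saved cookie
$ wget --load-cookies cookies.txt -O page1.html "http://www.somepage....&startpage=1"

The --keep-session-cookies part matters because a login cookie usually has no expiry date and would otherwise not be written to the file.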

As far as the -post_data option goes, I guess you feed it key=value pairs on stdin, with a line of --- marking the end of the post data.
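For lynx itself, maybe something like this (again untested and the field names are made up; as far as I can tell, lynx reads the POST body from stdin until a line starting with ---):

# post the login form and keep the cookies for later runs
$ printf 'username=myuser&passwd=mypass\n---\n' | \
  lynx -accept_all_cookies -cookie_save_file=mambo.cookies \
       -post_data -dump "http://www.somepage.../index.php" >login_result.txt

# later dumps could then re-read the saved cookies
$ lynx -accept_all_cookies -cookie_file=mambo.cookies -dump -nolist \
       "http://www.somepage....&startpage=1" >test.txt

No idea whether lynx really keeps a pure session cookie across runs, though.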

gemaeuer 10-14-2013 08:09 AM

Quote:

Originally Posted by zhjim (Post 5045382)
If you can access the database, it might be easier to get a dump from there.

Asking the administrator for the dump would be my last resort. Sometimes one wants something to work, no matter what.

Quote:

Originally Posted by zhjim (Post 5045382)
Another approach would be wget, which lets you spider through websites, and it's also a bit easier to pass POST data for a login.

I'll give that a try, but wget only gives me the raw HTML, right?
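Although I suppose I could pipe the HTML back through lynx afterwards to get plain text again, something like this (assuming the login cookies from wget end up in cookies.txt):

$ wget --load-cookies cookies.txt -q -O - "http://www.somepage....&startpage=1" | \
  lynx -dump -nolist -stdin >test.txt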

Quote:

Originally Posted by zhjim (Post 5045382)
Or you could do the login manually, grab the cookie or auth string, and feed that to wget. I did that once, but I'm not sure of the exact routine anymore.

I would need the session cookies too ... hmm ... that sounds interesting!
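Maybe something along these lines, if I fish the session cookie out of Firefox after logging in (cookie name and value are just placeholders; Mambo being PHP, it is probably a PHP session cookie):

$ wget --header='Cookie: PHPSESSID=0123456789abcdef' -O page1.html \
       "http://www.somepage....&startpage=1"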

Quote:

Originally Posted by zhjim (Post 5045382)
As far as the -post_data option goes, I guess you feed it key=value pairs on stdin, with a line of --- marking the end of the post data.

Tried that several times, didn't work :-(

