LinuxQuestions.org


dedeco 05-28-2009 08:04 PM

[SOLVED] "wget -p" problem with PHP page
 
Hello,

I tried to make wget fetch a complete page (including its page requisites):

Code:

wget -p "http://projecteuler.net/index.php?section=problems&id=246"

But once it finished, only two files were saved:

Code:

./projecteuler.net/index.php?section=problems&id=246
./projecteuler.net/robots.txt

More files should have been saved, as you can see on the page itself (or in the downloaded file). The images and the stylesheet were not downloaded, despite the "-p".

What should I do to make this work? My guess is that it happens because the file name ends in ".php" rather than ".htm" or ".html", but I am not sure.

Dedeco

mrog 06-02-2009 02:19 PM

Look at the robots.txt file that was downloaded. It says "Disallow: /", so the people who created or own the site don't want you to do this. Wget respects the robots.txt file, which is why it skipped the page requisites.
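
A quick way to confirm this from the command line is to search the saved robots.txt for a blanket rule. The helper below is only a sketch: the function name is made up for illustration, and it ignores which User-agent group the rule belongs to, so treat a match as a hint rather than a full parse of the file.

```shell
# Hypothetical helper: succeeds if the given robots.txt has a line
# that is exactly "Disallow: /" (i.e. everything is off limits).
has_blanket_disallow() {
    # -q: no output, only the exit status
    # -i: field names in robots.txt are matched case-insensitively
    # -E: extended regex; optional whitespace around the "/" is tolerated
    grep -qiE '^[[:space:]]*Disallow:[[:space:]]*/[[:space:]]*$' "$1"
}

# Usage, with the file path from the first post:
# has_blanket_disallow ./projecteuler.net/robots.txt && echo "robots.txt disallows everything"
```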

dedeco 06-04-2009 06:58 PM

Yes, that was it.

Disallowing all robots is probably not the best idea, IMHO.

I have to ignore this file to do what would otherwise be a pain.
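
The thread does not say which option was used in the end. One way the wget manual documents is to pass the wgetrc command "robots=off" with "-e". The sketch below wraps that in a function whose name is made up for illustration; the one-second "--wait" is an added courtesy to the server, not something the thread mentions.

```shell
# Sketch: fetch one page plus its requisites without honoring robots.txt.
fetch_ignoring_robots() {
    # -e robots=off : execute the wgetrc command "robots=off" for this run
    # --wait=1      : pause one second between retrievals
    # -p            : also download page requisites (images, stylesheets)
    wget -e robots=off --wait=1 -p "$1"
}

# Usage, with the URL from the first post:
# fetch_ignoring_robots "http://projecteuler.net/index.php?section=problems&id=246"
```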

Of course, care should always be taken on the Internet, and I will take care in doing what I want. But forbidding everything should not be the spirit.

Wikipedia's article about wget is pretty useful, by the way.

Thank you.


All times are GMT -5.