Old 10-27-2016, 05:06 PM   #1
rob.rice
Senior Member
 
Registered: Apr 2004
Distribution: slack what ever
Posts: 1,076

Rep: 205
What is the wget command line to download a complete web page for offline reading?


What I have found on Google is a waste of online time.
The only internet access I have is hotspots.
 
Old 10-27-2016, 06:02 PM   #2
Philip Lacroix
Member
 
Registered: Jun 2012
Distribution: Slackware
Posts: 441

Rep: 574
For a single page this can be a good start:

Code:
wget --page-requisites --convert-links --adjust-extension \
     --span-hosts https://www.linuxquestions.org/index.html
This command will save the page, along with its related files, even if they span across different hosts, in a directory named "www.linuxquestions.org".
 
Old 10-27-2016, 06:33 PM   #3
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,974

Rep: 3623
Once in a while you have to slow that down: capturing too fast might prompt the server to drop the connection.

What web pages did you look at?

http://stackoverflow.com/questions/6...y-of-a-webpage
 
Old 10-27-2016, 06:53 PM   #4
Philip Lacroix
Member
 
Registered: Jun 2012
Distribution: Slackware
Posts: 441

Rep: 574
Perhaps using --wait and --random-wait might help in such cases? I personally find the locally stored documentation extremely helpful most of the time, especially when dealing with a specific command and its intricacies.
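For instance, added to the command from the earlier post (a rough sketch; the 2-second delay and the --limit-rate value are arbitrary assumptions, not recommendations):

Code:
wget --page-requisites --convert-links --adjust-extension \
     --span-hosts --wait=2 --random-wait --limit-rate=200k \
     https://www.linuxquestions.org/index.html
With --random-wait the actual pause varies between 0.5 and 1.5 times the --wait value, which makes the requests look less mechanical to the server.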

Last edited by Philip Lacroix; 10-27-2016 at 07:00 PM.
 
Old 10-28-2016, 12:40 PM   #5
rob.rice
Senior Member
 
Registered: Apr 2004
Distribution: slack what ever
Posts: 1,076

Original Poster
Rep: 205
Quote:
Originally Posted by Philip Lacroix View Post
For a single page this can be a good start:

Code:
wget --page-requisites --convert-links --adjust-extension \
     --span-hosts https://www.linuxquestions.org/index.html
This command will save the page, along with its related files, even if they span across different hosts, in a directory named "www.linuxquestions.org".
Most of the site is in *.php files; this just got the main page and none of the *.php files.
This is the same problem I had with all of the other answers I found on google.com.
 
Old 10-28-2016, 12:41 PM   #6
rob.rice
Senior Member
 
Registered: Apr 2004
Distribution: slack what ever
Posts: 1,076

Original Poster
Rep: 205
Quote:
Originally Posted by jefro View Post
Once in a while you have to slow that down: capturing too fast might prompt the server to drop the connection.

What web pages did you look at?

http://stackoverflow.com/questions/6...y-of-a-webpage
Same as my last post.
 
Old 10-28-2016, 01:24 PM   #7
Philip Lacroix
Member
 
Registered: Jun 2012
Distribution: Slackware
Posts: 441

Rep: 574
I hope you don't mind me asking, but did you actually try that command? And did you understand what the --adjust-extension option is for? Because if you actually look, you'll see that LQ's homepage has the .php extension. By the way, PHP itself is server-side code (executed on the server) hence you'll never see it in a web page loaded with a web browser: what you get is (X)HTML.
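As an illustration of what --adjust-extension does here (a sketch; the exact saved filename is an assumption based on wget's documented behaviour of appending .html to HTML pages whose URLs do not already end in .html):

Code:
wget --page-requisites --convert-links --adjust-extension \
     https://www.linuxquestions.org/index.php
# an HTML page served as index.php would typically be saved as
# www.linuxquestions.org/index.php.html, so a browser opens it correctly offline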
 
Old 10-28-2016, 02:53 PM   #8
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,974

Rep: 3623
If you simply want a single page of some website then I usually print it to a pdf file.

If you want an entire website then you have to tell us what is failing.

I have used httrack and it grabbed what I wanted.

The link I posted had some comments about other people's results and fixes.
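For reference, a basic httrack run looks something like this (a sketch; the output directory name is just an example):

Code:
httrack "https://www.linuxquestions.org/" -O ./lq-mirror
httrack then crawls the site into ./lq-mirror and rewrites the links so the copy can be browsed offline.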
 
Old 10-28-2016, 03:05 PM   #9
rob.rice
Senior Member
 
Registered: Apr 2004
Distribution: slack what ever
Posts: 1,076

Original Poster
Rep: 205
Quote:
Originally Posted by Philip Lacroix View Post
I hope you don't mind me asking, but did you actually try that command? And did you understand what the --adjust-extension option is for? Because if you actually look, you'll see that LQ's homepage has the .php extension. By the way, PHP itself is server-side code (executed on the server) hence you'll never see it in a web page loaded with a web browser: what you get is (X)HTML.
Yes, I did try it. It just downloaded the first page, none of the links, and just one of the doku.php files.

I got (I think) the whole page with

wget -c -m -r -x -k http://the-website

BUT it didn't convert the links.
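For the "mirror everything and fix the links" case, a common starting point is something like the following (a sketch; note that --mirror already implies recursion, and wget only rewrites links after the whole download has finished, so an interrupted run can leave them unconverted):

Code:
wget --mirror --page-requisites --convert-links \
     --adjust-extension --no-parent http://the-website/
The long options are the spelled-out equivalents of -m, -p, -k and -E, with --no-parent added so the crawl stays below the starting directory.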
 
Old 10-28-2016, 03:42 PM   #10
Philip Lacroix
Member
 
Registered: Jun 2012
Distribution: Slackware
Posts: 441

Rep: 574
Quote:
Originally Posted by rob.rice
Yes, I did try it. It just downloaded the first page, none of the links.
I thought it was what you wanted to do, according to your OP:

Quote:
Originally Posted by rob.rice
What is the wget command line to download a complete web page for offline reading?
 
Old 10-28-2016, 05:51 PM   #11
rob.rice
Senior Member
 
Registered: Apr 2004
Distribution: slack what ever
Posts: 1,076

Original Poster
Rep: 205
Quote:
Originally Posted by Philip Lacroix View Post
I thought it was what you wanted to do, according to your OP:

The parts you must have missed were
"complete"
and
"for offline reading"
 
Old 10-29-2016, 01:52 PM   #12
Philip Lacroix
Member
 
Registered: Jun 2012
Distribution: Slackware
Posts: 441

Rep: 574
The command I suggested does, indeed, download a complete web page for offline reading. That is, you'll get the code, with all the related images and styles for proper rendering. Of course it does not recursively follow the hyperlinks to other pages, as you asked for a command to download a web page, not a web site. For a better understanding I suggest that you have a look at the excellent wget man page, available locally on your Slackware system.
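If what is actually wanted is the page plus the pages it links to, a recursive variant of the same command would look roughly like this (a sketch; --level=1 and --no-parent are assumptions about how deep and how wide the crawl should go):

Code:
wget --recursive --level=1 --no-parent --page-requisites \
     --convert-links --adjust-extension https://www.linuxquestions.org/
--level=1 follows only the links on the starting page; raising it (or using --mirror) turns this into a full site download.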

Last edited by Philip Lacroix; 10-29-2016 at 01:54 PM.
 
Old 10-29-2016, 02:38 PM   #13
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,974

Rep: 3623
‘-k’
‘--convert-links’
After the download is complete, convert the links in the document to make them
suitable for local viewing. This affects not only the visible hyperlinks, but
any part of the document that links to external content, such as embedded images,
links to style sheets, hyperlinks to non-html content, etc.

Each link will be changed in one of the two ways:

The links to files that have been downloaded by Wget will be changed to refer
to the file they point to as a relative link.

Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also
downloaded, then the link in doc.html will be modified to point to
‘../bar/img.gif’. This kind of transformation works reliably for arbitrary
combinations of directories.

The links to files that have not been downloaded by Wget will be changed to
include host name and absolute path of the location they point to.

Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to
../bar/img.gif), then the link in doc.html will be modified to point to
http://hostname/bar/img.gif.

Because of this, local browsing works reliably: if a linked file was downloaded,
the link will refer to its local name; if it was not downloaded, the link will
refer to its full Internet address rather than presenting a broken link. The fact
that the former links are converted to relative links ensures that you can move
the downloaded hierarchy to another directory.

Note that only at the end of the download can Wget know which links have been
downloaded. Because of that, the work done by ‘-k’ will be performed at the end
of all the downloads.
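A practical way to see this conversion at work (a sketch; --backup-converted is an extra option not mentioned above, used here only because it keeps the unconverted files around for comparison):

Code:
wget --recursive --level=1 --page-requisites --convert-links \
     --backup-converted --adjust-extension https://www.linuxquestions.org/
# --backup-converted saves each original as file.orig before -k rewrites it,
# so you can diff the two copies and see exactly which links were made relative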
 
  

