Old 10-27-2016, 05:06 PM   #1
rob.rice
Senior Member
 
Registered: Apr 2004
Distribution: slack what ever
Posts: 1,076

Rep: 205
What is the wget command line to download a complete web page for offline reading?


What I have found on Google is a waste of online time.
The only internet access I have is hotspots.
 
Old 10-27-2016, 06:02 PM   #2
Philip Lacroix
Member
 
Registered: Jun 2012
Distribution: Slackware
Posts: 441

Rep: 574
For a single page this can be a good start:

Code:
wget --page-requisites --convert-links --adjust-extension \
     --span-hosts https://www.linuxquestions.org/index.html
This command will save the page, along with its related files, even if they span across different hosts, in a directory named "www.linuxquestions.org".
 
Old 10-27-2016, 06:33 PM   #3
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,974

Rep: 3623
Once in a while you have to slow that down: capturing too fast might prompt the server to drop the connection.

What web pages did you look at?

http://stackoverflow.com/questions/6...y-of-a-webpage
 
Old 10-27-2016, 06:53 PM   #4
Philip Lacroix
Member
 
Registered: Jun 2012
Distribution: Slackware
Posts: 441

Rep: 574
Perhaps using --wait and --random-wait might help in such cases? I personally find the locally stored documentation extremely helpful most of the time, especially when dealing with a specific command and its intricacies.
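For instance, added to the command from the earlier post (a rough sketch; the 2-second delay and the --limit-rate value are arbitrary assumptions, not recommendations):

Code:
wget --page-requisites --convert-links --adjust-extension \
     --span-hosts --wait=2 --random-wait --limit-rate=200k \
     https://www.linuxquestions.org/index.html
With --random-wait the actual pause varies between 0.5 and 1.5 times the --wait value, which makes the requests look less mechanical to the server.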

Last edited by Philip Lacroix; 10-27-2016 at 07:00 PM.
 
Old 10-28-2016, 12:40 PM   #5
rob.rice
Senior Member
 
Registered: Apr 2004
Distribution: slack what ever
Posts: 1,076

Original Poster
Rep: 205
Quote:
Originally Posted by Philip Lacroix View Post
For a single page this can be a good start:

Code:
wget --page-requisites --convert-links --adjust-extension \
     --span-hosts https://www.linuxquestions.org/index.html
This command will save the page, along with its related files, even if they span across different hosts, in a directory named "www.linuxquestions.org".
Most of the site is in *.php files; this just got the main page and none of the *.php files.
This is the same problem I had with all of the other answers I found on google.com.
 
Old 10-28-2016, 12:41 PM   #6
rob.rice
Senior Member
 
Registered: Apr 2004
Distribution: slack what ever
Posts: 1,076

Original Poster
Rep: 205
Quote:
Originally Posted by jefro View Post
Once in a while you have to slow that down: capturing too fast might prompt the server to drop the connection.

What web pages did you look at?

http://stackoverflow.com/questions/6...y-of-a-webpage
Same as my last post.
 
Old 10-28-2016, 01:24 PM   #7
Philip Lacroix
Member
 
Registered: Jun 2012
Distribution: Slackware
Posts: 441

Rep: 574
I hope you don't mind me asking, but did you actually try that command? And did you understand what the --adjust-extension option is for? Because if you actually look, you'll see that LQ's homepage has the .php extension. By the way, PHP itself is server-side code (executed on the server) hence you'll never see it in a web page loaded with a web browser: what you get is (X)HTML.
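As an illustration of what --adjust-extension does here (a sketch; the exact saved filename is an assumption based on wget's documented behaviour of appending .html to HTML pages whose URLs do not already end in .html):

Code:
wget --page-requisites --convert-links --adjust-extension \
     https://www.linuxquestions.org/index.php
# an HTML page served as index.php would typically be saved as
# www.linuxquestions.org/index.php.html, so a browser opens it correctly offline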
 
Old 10-28-2016, 02:53 PM   #8
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,974

Rep: 3623
If you simply want a single page of some website then I usually print it to a pdf file.

If you want an entire website then you have to tell us what is failing.

I have used httrack and it grabbed what I wanted.

The link I posted had some comments about other people's results and fixes.
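For reference, a basic httrack run looks something like this (a sketch; the output directory name is just an example):

Code:
httrack "https://www.linuxquestions.org/" -O ./lq-mirror
httrack then crawls the site into ./lq-mirror and rewrites the links so the copy can be browsed offline.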
 
Old 10-28-2016, 03:05 PM   #9
rob.rice
Senior Member
 
Registered: Apr 2004
Distribution: slack what ever
Posts: 1,076

Original Poster
Rep: 205
Quote:
Originally Posted by Philip Lacroix View Post
I hope you don't mind me asking, but did you actually try that command? And did you understand what the --adjust-extension option is for? Because if you actually look, you'll see that LQ's homepage has the .php extension. By the way, PHP itself is server-side code (executed on the server) hence you'll never see it in a web page loaded with a web browser: what you get is (X)HTML.
Yes, I did try it. It just downloaded the first page, none of the links, and just one of the doku.php files.

I got (I think) the whole page with

wget -c -m -r -x -k http://the-website

BUT it didn't convert the links.
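For the "mirror everything and fix the links" case, a common starting point is something like the following (a sketch; note that --mirror already implies recursion, and wget only rewrites links after the whole download has finished, so an interrupted run can leave them unconverted):

Code:
wget --mirror --page-requisites --convert-links \
     --adjust-extension --no-parent http://the-website/
The long options are the spelled-out equivalents of -m, -p, -k and -E, with --no-parent added so the crawl stays below the starting directory.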
 
Old 10-28-2016, 03:42 PM   #10
Philip Lacroix
Member
 
Registered: Jun 2012
Distribution: Slackware
Posts: 441

Rep: 574
Quote:
Originally Posted by rob.rice
Yes, I did try it. It just downloaded the first page, none of the links.
I thought it was what you wanted to do, according to your OP:

Quote:
Originally Posted by rob.rice
What is the wget command line to download a complete web page for offline reading?
 
Old 10-28-2016, 05:51 PM   #11
rob.rice
Senior Member
 
Registered: Apr 2004
Distribution: slack what ever
Posts: 1,076

Original Poster
Rep: 205
Quote:
Originally Posted by Philip Lacroix View Post
I thought it was what you wanted to do, according to your OP:

The parts you must have missed were
"complete"
and
"for offline reading"
 
Old 10-29-2016, 01:52 PM   #12
Philip Lacroix
Member
 
Registered: Jun 2012
Distribution: Slackware
Posts: 441

Rep: 574
The command I suggested does, indeed, download a complete web page for offline reading. That is, you'll get the code, with all the related images and styles for proper rendering. Of course it does not recursively follow the hyperlinks to other pages, as you asked for a command to download a web page, not a web site. For a better understanding I suggest that you have a look at the excellent wget man page, available locally on your Slackware system.
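If what is actually wanted is the page plus the pages it links to, a recursive variant of the same command would look roughly like this (a sketch; --level=1 and --no-parent are assumptions about how deep and how wide the crawl should go):

Code:
wget --recursive --level=1 --no-parent --page-requisites \
     --convert-links --adjust-extension https://www.linuxquestions.org/
--level=1 follows only the links on the starting page; raising it (or using --mirror) turns this into a full site download.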

Last edited by Philip Lacroix; 10-29-2016 at 01:54 PM.
 
Old 10-29-2016, 02:38 PM   #13
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,974

Rep: 3623
‘-k’
‘--convert-links’
After the download is complete, convert the links in the document to make them
suitable for local viewing. This affects not only the visible hyperlinks, but
any part of the document that links to external content, such as embedded images,
links to style sheets, hyperlinks to non-html content, etc.

Each link will be changed in one of the two ways:

The links to files that have been downloaded by Wget will be changed to refer
to the file they point to as a relative link.

Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also
downloaded, then the link in doc.html will be modified to point to
‘../bar/img.gif’. This kind of transformation works reliably for arbitrary
combinations of directories.

The links to files that have not been downloaded by Wget will be changed to
include host name and absolute path of the location they point to.

Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to
../bar/img.gif), then the link in doc.html will be modified to point to
http://hostname/bar/img.gif.

Because of this, local browsing works reliably: if a linked file was downloaded,
the link will refer to its local name; if it was not downloaded, the link will
refer to its full Internet address rather than presenting a broken link. The fact
that the former links are converted to relative links ensures that you can move
the downloaded hierarchy to another directory.

Note that only at the end of the download can Wget know which links have been
downloaded. Because of that, the work done by ‘-k’ will be performed at the end
of all the downloads.
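A practical way to see this conversion at work (a sketch; --backup-converted is an extra option not mentioned above, used here only because it keeps the unconverted files around for comparison):

Code:
wget --recursive --level=1 --page-requisites --convert-links \
     --backup-converted --adjust-extension https://www.linuxquestions.org/
# --backup-converted saves each original as file.orig before -k rewrites it,
# so you can diff the two copies and see exactly which links were made relative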
 
  

