LinuxQuestions.org > Forums > Linux Forums > Linux - Software
Old 04-27-2020, 01:25 PM   #1
littlebigman
Member
 
Registered: Aug 2008
Location: France
Posts: 658

Rep: Reputation: 35
Question [wget] Syntax to download single HTML page and its dependencies?


Hello,

I need to find a way to download a single web page and its dependencies before calling pandoc to turn it into an EPUB file.

The following works… but not when I add "-O local.html" to rename the file:

Code:
wget -E -H -k -K -p -e robots=off -O local.html https://www.acme.com/remote.html
Does anyone know?

Thank you.
 
Old 04-27-2020, 02:23 PM   #2
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,727

Rep: Reputation: 2211
Quote:
Originally Posted by littlebigman View Post
Hello,

I need to find a way to download a single web page and its dependencies before calling pandoc to turn it into an EPUB file.

The following works… but not when I add "-O local.html" to rename the file:

Code:
wget -E -H -k -K -p -e robots=off -O local.html https://www.acme.com/remote.html
Does anyone know?

Thank you.
That code snippet appears to have a zero rather than an uppercase O. Maybe use the long version of the option to be sure:
Code:
--output-document=file
I also note in the wget man page:
Quote:
Similarly, using -r or -p with -O may not work as you expect: Wget won't just download the first file to file and then download the rest to their normal names: all downloaded content will be placed in file. This was disabled in version 1.11, but has been reinstated (with a warning) in 1.11.2, as there are some cases where this behavior can actually have some use.

Note that a combination with -k is only permitted when downloading a single document, as in that case it will just convert all relative URIs to external ones; -k makes no sense for multiple URIs when they're all being downloaded to a single file; -k can be used only when the output is a regular file.
That said, what does (or doesn't) happen for you?

Another option is to not use -O and simply redirect the output to a file:
Code:
wget -E -H -k -K -p -e robots=off  https://www.acme.com/remote.html > local.html
(also mentioned in the man page)

Last edited by scasey; 04-27-2020 at 02:25 PM.
 
Old 04-27-2020, 02:34 PM   #3
littlebigman
Member
 
Registered: Aug 2008
Location: France
Posts: 658

Original Poster
Rep: Reputation: 35
Here's the error message:

Code:
wget -E -H -k -K -p -e robots=off --output-document=local.html https://www.acme.com/mypage.html

Cannot specify both -k or --convert-file-only and -O if multiple URLs are given, or in combination with -p or -r. See the manual for details.
The redirection creates an empty file at the root, and I can find no local.html anywhere.

Last edited by littlebigman; 04-27-2020 at 02:36 PM.
 
Old 04-27-2020, 02:44 PM   #4
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,727

Rep: Reputation: 2211
Quote:
Originally Posted by littlebigman View Post
Here's the error message:

Code:
wget -E -H -k -K -p -e robots=off --output-document=local.html https://www.acme.com/mypage.html

Cannot specify both -k or --convert-file-only and -O if multiple URLs are given, or in combination with -p or -r. See the manual for details.
The redirection creates an empty file at the root, and I can find no local.html anywhere.
That's a pretty clear error message, and is what the man page section I copied says as well. It simply won't work. -k and -O can't be used in the same command. Try it without the -k, or the -O (which you said worked, yes?)
What happens without the -O? Several files? Can you cat them together?
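On the redirect idea: wget sends its progress messages to stderr and writes the document itself to a file, so there is nothing on stdout for `>` to capture, which would leave the redirected file empty. A small stand-in sketch of that behavior (`fake_wget` is hypothetical, only mimicking where wget sends its output):

```shell
# fake_wget mimics wget's output behavior: chatter to stderr,
# document to a file on disk, nothing to stdout.
fake_wget() {
    echo 'Resolving www.acme.com...' >&2    # progress chatter: stderr
    echo '<html></html>' > remote.html      # the document: a file, not stdout
}

fake_wget > local.html   # the redirect captures stdout only
# local.html ends up empty; the page is in remote.html.
```

(Getting the document onto stdout requires `-O -`, which runs into the same conflict with -k.)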
 
Old 04-27-2020, 02:55 PM   #5
littlebigman
Member
 
Registered: Aug 2008
Location: France
Posts: 658

Original Poster
Rep: Reputation: 35
Without "-k", I get just the HTML file and nothing else, meaning I only get text and no pictures.

I'll just rename the file in a second command since wget seems unable to do it.

Thank you.
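The two-step plan could be sketched like this (the host-named directory `www.acme.com/` is how wget lays files out with -p/-H; the network step is shown commented out, and a stand-in file simulates its result):

```shell
# Step 1 (network; shown commented out here):
#   wget -E -H -k -K -p -e robots=off https://www.acme.com/remote.html

# Stand-in for what the wget step would leave behind:
mkdir -p www.acme.com
echo '<html></html>' > www.acme.com/remote.html

# Step 2: rename the saved page. After -k the page's links point at
# sibling files, so rename it in place rather than moving it away
# from its dependencies.
mv www.acme.com/remote.html www.acme.com/local.html
```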

Last edited by littlebigman; 04-27-2020 at 02:57 PM.
 
Old 04-29-2020, 06:55 AM   #6
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
You do know that most browsers can do this?
Caveat: the "dependencies" are placed in a separate folder that needs to travel with the original downloaded file.
But I believe you will have the same caveat with wget, or curl for that matter.
It might be possible to convert the whole pile into something that is one file only, but that goes beyond mere downloading.
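One crude way to make the page and its dependency folder travel as a single file is to bundle them into an archive (a sketch; `page.html` and `page_files/` are stand-in names for what a browser's "save complete page" typically produces):

```shell
# Stand-ins for a browser's "save complete page" output:
printf '<html><img src="page_files/a.png"></html>\n' > page.html
mkdir -p page_files
touch page_files/a.png

# Bundle the page and its folder into one archive:
tar -czf page.tar.gz page.html page_files/
```

This only solves the "travels as one file" problem, not the "opens as one file" problem; producing a genuinely self-contained document is what the OP's pandoc/EPUB step is for.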
 
  


