08-11-2012, 08:24 PM
#1
Member
Registered: Feb 2003
Location: Florida
Distribution: Fedora 18
Posts: 828
Downloaded complete web page with wget but browser wants internet to open page?
I downloaded a web page with wget like this:
wget -E -H -k -K -p http://en.wikipedia.org/wiki/TRS_connector
It seemed to go well, but when I go offline and open the local copy, Firefox tries to access the internet at "bits.wikimedia.org". Why did wget not work properly? I've read the wget manual and can't see what I'm missing. I need this page sometimes when I'm not online. Thanks in advance.

08-11-2012, 08:48 PM
#2
Guru
Registered: Jan 2006
Location: Virginia, USA
Distribution: Slackware, Debian
Posts: 5,490
The first thing I would do is navigate to the directory where wget stored the page and look at what's there.
Looking at the page source from your link, "bits.wikimedia.org" shows up in the links that point to the CSS and some of the images.
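A minimal sketch of that inspection from the command line, assuming GNU grep and sed are available and that wget saved the page somewhere under the directory you pass in. The function name `list_remote_hosts` is made up for illustration. It counts the remote hosts still referenced by the saved HTML and CSS files, which is where a host like bits.wikimedia.org would show up. Expect a few harmless matches too, such as XML namespace URLs.

```shell
# list_remote_hosts DIR  (hypothetical helper, not part of wget)
# Print every remote host that the saved .html/.css files under DIR still
# reference, most frequent first. Matches both "http(s)://host" and
# protocol-relative "//host" references.
list_remote_hosts() {
    find "$1" -type f \( -name '*.html' -o -name '*.css' \) -exec \
        grep -ohE '(https?:)?//[A-Za-z0-9.-]+\.[A-Za-z]{2,}' {} + |
        sed -E 's#^(https?:)?//##' |
        sort | uniq -c | sort -rn
}

# Example: inspect everything wget saved below the current directory.
# list_remote_hosts .
```

Any host that still appears in the output is one the browser will try to reach when the page is opened offline.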

08-11-2012, 09:10 PM
#3
Member
Original Poster
Okay, thanks frankbell, doing that now.

08-11-2012, 09:19 PM
#4
Member
Original Poster
So how about this: I download
http://bits.wikimedia.org/en.wikiped...n=vector&*
and change the link in the source code to point to where I put that file, which in this case would be the bits.wikimedia.org directory? And do the same with the other stuff that's being loaded from the internet? Would that work? Thanks.

08-11-2012, 09:40 PM
#5
Member
Original Poster
Hmm. I tried doing that, modifying the source page, and it's a mess. Is there any way to get wget to download the files the CSS needs and then rewrite the URLs in the CSS, like it does for the source HTML file?
Well, I gotta go to bed. Worn out. Can't figure this out 'til I've had some rest. The old head still works; it just needs more rest than it used to.
Last edited by SharpyWarpy; 08-11-2012 at 09:49 PM.
Reason: retiring

08-12-2012, 07:13 PM
#6
Guru
I have not used wget to download a webpage.
I know that, when I tell Opera to save a webpage "as HTML with images," it creates an HTML file and then a subdirectory in which it stores linked images. I just tested it with the Wikipedia page in your first post in this thread, and the saved page seems to display properly.
Here's a screen grab showing the saved page opened in Konqueror on the left and the contents of the "files" subdirectory on the right.
Maybe that could be a workaround.

08-12-2012, 07:42 PM
#7
Member
Original Poster
I have not tried that with Konqueror, but I have with Firefox, and it has the same habit of needing an internet connection. Let me try Konqueror. Oh, and thanks for the reply!

08-12-2012, 07:55 PM
#8
Guru
Note that I was using Opera to view and save the page and Konqueror simply as a file manager (I have never quite gotten adjusted to Dolphin).

08-12-2012, 08:28 PM
#9
Member
Original Poster
Okay, I don't have Opera, but I found something in Konqueror. I clicked on "Tools > Archive Web Page" and it saved the page in .war archive format. It does not look like the original page, but all pertinent content is there. So I guess I can use that, but I'd like to work out the wrinkles with wget so I can do it from the command line. Call me picky, but I like doing as much as I can from a command prompt. Thank you very much, frankbell. Although I don't have Opera, your reply tempted me to try Konqueror, and it saves the complete page for offline viewing. Maybe I'll stroll over to the wget home page and see if I can find some documentation there that is more thorough in this respect. Thanks again.

08-12-2012, 09:37 PM
#10
Member
Original Poster
Okay, I found a Firefox extension called "unmht-5.7.5.xpi". When you right-click on the page, the resulting dialog includes the option "Save As MHT", and it downloads everything for offline viewing. However, it has the same problem: some of the images referenced in the CSS or JavaScript are not saved as local images.
Last edited by SharpyWarpy; 08-12-2012 at 09:40 PM.

08-13-2012, 03:24 PM
#11
Senior Member
Registered: Jan 2011
Distribution: Slack14_64_Multilib
Posts: 1,491
try
Code:
wget -p --convert-links www.domain.com

08-13-2012, 07:00 PM
#12
Member
Original Poster
Quote:
Originally Posted by Habitual
try
Code:
wget -p --convert-links www.domain.com
Thanks for your reply. I already use these options. "--convert-links" is the same as "-k". But thanks anyway!
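For reference, here is a sketch of the command from the first post with each short option spelled out in its long form, assuming a wget recent enough to know `--adjust-extension` (older releases called it `--html-extension`). The `save_page` wrapper and the `WGET` override are illustrative names, not anything wget provides; they only make it easy to print the command before running it.

```shell
# save_page URL  (hypothetical wrapper, for illustration only)
# Long-option equivalent of: wget -E -H -k -K -p URL
#   --adjust-extension  (-E) save HTML/CSS with matching .html/.css suffixes
#   --span-hosts        (-H) allow fetching requisites from other hosts
#   --convert-links     (-k) rewrite links in saved files for local viewing
#   --backup-converted  (-K) keep a .orig copy of each file before converting
#   --page-requisites   (-p) fetch images, CSS, etc. needed to display the page
# Set WGET=echo to print the command instead of running it.
save_page() {
    ${WGET:-wget} \
        --adjust-extension \
        --span-hosts \
        --convert-links \
        --backup-converted \
        --page-requisites \
        "$1"
}

# Dry run first, then the real download:
# WGET=echo save_page 'http://en.wikipedia.org/wiki/TRS_connector'
# save_page 'http://en.wikipedia.org/wiki/TRS_connector'
```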

08-14-2012, 07:50 AM
#13
Senior Member
Quote:
Originally Posted by SharpyWarpy
..."--convert-links" is the same as "-k". But thanks anyway! 
Yeah, I tried.
I only use it about once a month...
Sorry about that!

08-14-2012, 10:31 PM
#14
Member
Registered: Oct 2011
Location: New Zealand
Distribution: Debian
Posts: 105
I tried to mirror that Wikipedia page with httrack instead of wget. It saved the text and images okay, but some of the page formatting was missing...
Code:
# a standard list of httrack filters to save webpages
$ cat list-of-filters
-*
+*.jpg
+*.jpeg
+*.tif
+*.tiff
+*.png
+*.gif
+*.ico
+*.bmp
+*.css
+*.js
-mime:*/*
+mime:image/*
+mime:text/html
+mime:text/plain
+mime:text/css
+mime:text/javascript
+mime:application/x-javascript
# mirror the webpage into the current directory
$ httrack -w -r1 -n -o0 -s2 -%v -z -%B -H1 -%P -u2 -%u -T20 -R1 \
-F "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" \
-%l "en, en, *" -%S "$PWD/list-of-filters" -O ./. 'http://en.wikipedia.org/wiki/TRS_connector'
I did try higher levels of recursion and so on, but that didn't improve the quality of the saved page. Indeed, looking at the downloaded files, I never saw any CSS files in the mirror. I also discovered that Wikipedia refuses httrack unless you specify a different user-agent.
Looking within the saved HTML files, though, I could see some embedded JavaScript in there, and that might explain where the missing page formatting went: according to the httrack FAQ, support for JavaScript parsing in httrack is incomplete.
Last edited by dru8274; 08-14-2012 at 10:39 PM.

08-15-2012, 06:04 AM
#15
Member
Original Poster
@dru8274, thank you for your reply. Wget has the same lack of JavaScript support, but the wget FAQ has a bit of info explaining why. As I read it, you don't want JavaScript support because it can create some very bad problems.
http://wget.addictivecode.org/Freque..._JavaScript.3F
But I found wget handles CSS as long as it comes from a *.css file; it runs into trouble with CSS embedded in an index.html file. As I said in a prior reply, Konqueror has an archive feature that saves to a *.war file, but it's missing a lot of the formatting and some of the images too. So as far as I can tell, the only way to get everything is to use wget with the options I used, then look through the generated files for missing stuff and download all that separately. Sounds like a lot of work, but it depends on how important the page is to you. Thanks again!
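A rough sketch of that manual pass, assuming GNU grep and sed and that the leftover references are CSS `url(...)` values pointing at other hosts. The helper name `css_remote_urls` and the file name `missing.txt` are made up for illustration, and rewriting protocol-relative `//host/...` references to `http://` is an assumption that matches the URLs in this thread. It only lists candidates; it does not rewrite the saved files to point at the local copies.

```shell
# css_remote_urls DIR  (hypothetical helper, not part of wget)
# List remote URLs used inside CSS url(...) values in the .html and .css
# files under DIR, one per line, without duplicates.
css_remote_urls() {
    find "$1" -type f \( -name '*.html' -o -name '*.css' \) -exec \
        grep -ohE "url\\([\"']?(https?:)?//[^\"')]+" {} + |
        sed -E "s#^url\\([\"']?##; s#^//#http://#" |
        sort -u
}

# Example: collect the leftovers, review the list, then fetch them.
# css_remote_urls . > missing.txt
# wget -x -i missing.txt
```

Here `-i` reads the URLs from the file and `-x` keeps each download under a directory named after its host, so the fetched files are easy to match against the references that still need editing.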
Last edited by SharpyWarpy; 08-15-2012 at 06:06 AM.
Reason: typo
All times are GMT -5.