LinuxQuestions.org
Old 06-01-2006, 10:55 PM   #1
babag
Member
 
Registered: Aug 2003
Posts: 365

Rep: Reputation: 30
scripting or maybe wget question


not sure if this is the right place to ask about this, so please direct me
elsewhere if appropriate. there's a website i want to research:

http://www.houseoftartan.co.uk/house/tfinder.htm

this page allows one to click a series of links on the left side frame
and an image appears in the right frame. i'd like to download all of the
images so i can peruse them offline. i tried wget and downloaded the
entire site but did not get the images. from what i can tell, they are
being called up by an .exe app within the site somewhere. the .exe didn't
seem to come with the wget download. is there a way to do this or am i
stuck downloading the images one at a time? i'd like to view just the
images, without the rest of the page.

thanks,
BabaG
 
Old 06-02-2006, 01:26 AM   #2
juanbobo
Member
 
Registered: Mar 2005
Location: Chicago
Distribution: Gentoo AMD64
Posts: 365

Rep: Reputation: 30
I believe the images are in the http://www.houseoftartan.co.uk/gifhold directory for the most part. It's not possible to list the directory's contents directly, but if you can parse the image names out of the site's pages, you can append each name to that directory URL and fetch the files with wget.
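A rough sketch of that idea, assuming the pages have already been saved locally and that the image names show up in the HTML as plain `something.gif` strings. The function names are hypothetical, and the base URL is just the directory guessed at above; the real pages may reference their images differently.

```shell
#!/bin/sh
# Assumed location of the images, taken from the post above.
BASE_URL="http://www.houseoftartan.co.uk/gifhold"

# Print one candidate image URL per unique .gif name found in the given files.
list_image_urls() {
    grep -ohiE '[A-Za-z0-9_-]+\.gif' "$@" |
        sort -u |
        sed "s|^|$BASE_URL/|"
}

# Fetch every candidate URL; "-i -" makes wget read the URL list from stdin.
fetch_images() {
    list_image_urls "$@" | wget --tries=10 --wait=1 -i -
}

# Usage (not run automatically):
#   fetch_images www.houseoftartan.co.uk/house/*.htm
```

Running `list_image_urls` on its own first is a cheap way to check that the names look sensible before any download starts.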

Last edited by juanbobo; 06-02-2006 at 01:28 AM.
 
Old 06-02-2006, 02:39 AM   #3
slzckboy
Member
 
Registered: May 2005
Location: uk - Reading
Distribution: slack 10.2 kde 3.4.2 kernel 2.6.15
Posts: 452

Rep: Reputation: 30
from man wget:
Code:
   -p
       --page-requisites
           This option causes Wget to download all the files that are
           necessary to properly display a given HTML page.  This includes
           such things as inlined images, sounds, and referenced stylesheets.

           Ordinarily, when downloading a single HTML page, any requisite
           documents that may be needed to display it properly are not
           downloaded.  Using -r together with -l can help, but since Wget
           does not ordinarily distinguish between external and inlined
           documents, one is generally left with "leaf documents" that are
           missing their requisites.

           For instance, say document 1.html contains an "<IMG>" tag
           referencing 1.gif and an "<A>" tag pointing to external document
           2.html.  Say that 2.html is similar but that its image is 2.gif
           and it links to 3.html.  Say this continues up to some arbitrarily
           high number.

           If one executes the command:

                   wget -r -l 2 http://<site>/1.html

           then 1.html, 1.gif, 2.html, 2.gif, and 3.html will be downloaded.
 
Old 06-02-2006, 02:45 AM   #4
juanbobo
Member
 
Registered: Mar 2005
Location: Chicago
Distribution: Gentoo AMD64
Posts: 365

Rep: Reputation: 30
Cool, does it work?
 
Old 06-02-2006, 02:52 AM   #5
slzckboy
Member
 
Registered: May 2005
Location: uk - Reading
Distribution: slack 10.2 kde 3.4.2 kernel 2.6.15
Posts: 452

Rep: Reputation: 30

You know what?

I was doing something else with wget the other day and read that in passing.

I have to admit that I haven't tried it yet, so I'll be interested to see what you guys make of it.
 
Old 06-02-2006, 12:13 PM   #6
babag
Member
 
Registered: Aug 2003
Posts: 365

Original Poster
Rep: Reputation: 30
this is the command i tried to download the site with:

wget -r --tries=10 http://www.houseoftartan.co.uk/house/tfinder.htm

using that i couldn't find the images i was looking for anywhere,
including the http://www.houseoftartan.co.uk/scottish/itm_img
directory. when i tried the link above (http://www.houseoftartan.co.uk/gifhold)
i get:

Directory Listing Denied
This Virtual Directory does not allow contents to be listed.

there is no gifhold directory in the material wget downloaded.

how would i change the command i originally used to make use of the -p
option? would it just be by adding -p:

wget -p -r --tries=10 http://www.houseoftartan.co.uk/house/tfinder.htm

or should i be replacing the -r with -p? obviously a noob with this
sort of thing, sorry.
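Going by the man page excerpt quoted earlier, -p and -r are separate options and can be given together, so adding -p to the original command is enough. A minimal sketch of the two variants; the function names are hypothetical and the depth of 2 in the usage line is an arbitrary choice:

```shell
#!/bin/sh
# The URL is the one from the thread; helper names are made up for illustration.
URL="http://www.houseoftartan.co.uk/house/tfinder.htm"

# -p alone: one page plus the inline images and stylesheets it references.
fetch_page() {
    wget -p --tries=10 "$1"
}

# -r and -p together: follow links down to the given depth ($2) and also
# fetch the requisites of the pages at the last level.
fetch_site() {
    wget -r -l "$2" -p --tries=10 "$1"
}

# Usage (not run automatically):
#   fetch_page "$URL"
#   fetch_site "$URL" 2
```

Either way, wget can only fetch what it finds referenced in the HTML it downloads, so images that are produced by a server-side program may still be missed.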

thanks,
BabaG
 
Old 06-02-2006, 12:34 PM   #7
slzckboy
Member
 
Registered: May 2005
Location: uk - Reading
Distribution: slack 10.2 kde 3.4.2 kernel 2.6.15
Posts: 452

Rep: Reputation: 30
try wget -p -r www.website.com/webresource
 
Old 06-02-2006, 02:58 PM   #8
babag
Member
 
Registered: Aug 2003
Posts: 365

Original Poster
Rep: Reputation: 30
nope. seemed to get pretty much the same stuff. i suspect the graphics
in question are being called by a separate executable so maybe that's
why wget isn't finding them? when i looked at the source for the page
i originally linked to there is an exe referenced in the code.
 
Old 06-02-2006, 03:34 PM   #9
slzckboy
Member
 
Registered: May 2005
Location: uk - Reading
Distribution: slack 10.2 kde 3.4.2 kernel 2.6.15
Posts: 452

Rep: Reputation: 30
ok i just did wget -p www.thelouche.com

and it made a directory in the current directory called thelouche.com

in that directory was the html page and the pics from that site.

try again, but omit the -r.
 
Old 01-31-2012, 11:00 PM   #10
bungfish
LQ Newbie
 
Registered: Mar 2006
Location: Omaha, NE
Distribution: Linux Mint, Backtrack, Ubuntu, Slackware
Posts: 4

Rep: Reputation: 0
wget and robots.txt

Don't forget that wget respects and follows the "robots.txt" exclusion file. Take a look yourself:
http://www.houseoftartan.co.uk/robots.txt

User-agent: *
Disallow: /scripts/
Disallow: /download/
Disallow: /turnstile/
Disallow: /scottish/admin/
Disallow: /gifhold/
Disallow: /hot_nav/
Disallow: /database/
Disallow: /tracker/

You will have to tell wget to ignore robots.txt, which is done with the -e robots=off option. See the wget FAQ:
http://wget.addictivecode.org/Freque...estions#robots

Hope this helps!
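A minimal sketch of such a command, wrapped in a hypothetical helper function. The depth and the one-second wait are assumptions for illustration, not values from the thread:

```shell
#!/bin/sh
# Recursive fetch that skips the robots.txt check.
# "-e robots=off" applies the wgetrc setting "robots = off" to this run only.
# --wait=1 pauses one second between requests to go easy on the server.
fetch_ignoring_robots() {
    wget -e robots=off -r -l "$2" -p --wait=1 --tries=10 "$1"
}

# Usage (not run automatically):
#   fetch_ignoring_robots "http://www.houseoftartan.co.uk/house/tfinder.htm" 2
```

This only helps if the downloaded pages actually link to files under the disallowed directories. If the images are served through a script, as babag suspects, wget may still not discover them.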
 
  


All times are GMT -5.