Old 08-06-2008, 11:49 PM   #1
memo007
Member
 
Registered: Feb 2005
Distribution: Debian, Kanotix, Kubuntu
Posts: 117

Rep: Reputation: 15
wget: how to specify a numeric range and download images from a website?


I need images that are on a server, but I can only access them one at a time, manually.
For example, www.xyz.com/collection/846_02_900_x.jpg exists, and when I change it to /collection/847_02_900_x.jpg (just one number higher) there is a picture as well. I need all the pictures in the collection folder. However, if I just go to www.xyz.com/collection there are no images; it only returns an error. So instead of sitting for 3 hours and going through 846_02_900_x.jpg, 847_02_900_x.jpg, 848_02_900_x.jpg and so on by hand, I was wondering if wget can download the pictures somehow.

When I try it with wget -m -p -k http://www.xyz.com/collection, all I get is "HTTP request sent, awaiting response... 404 Not Found".
And of course when I do wget -m -p -k www.xyz.com/collection/846_02_900_x.jpg it downloads only that image. The question is whether wget can be made to increment a certain value automatically, such as the 846, and download accordingly, without changing any of the other numbers.

Last edited by memo007; 08-07-2008 at 12:19 AM.
 
Old 08-07-2008, 12:08 AM   #2
Mr. C.
Senior Member
 
Registered: Jun 2008
Posts: 2,529

Rep: Reputation: 59
Wget doesn't give you the magic ability to scan a web site's directories. Wget follows links. If a page has links to those images, wget can be told to follow those links and download the images.

Alternatively, you can create a file that contains those links and have wget iterate over it.
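
A minimal sketch of that second approach, reusing the host and filename pattern from your example (treat the URL and the range as placeholders):

Code:
# generate the list of candidate URLs into a file
for i in $(seq 846 1000); do
    echo "http://www.xyz.com/collection/${i}_02_900_x.jpg"
done > urls.txt

# have wget fetch every URL listed in the file;
# numbers that don't exist will simply return 404 and be skipped
wget -i urls.txt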
 
Old 08-07-2008, 12:17 AM   #3
memo007
Member
 
Registered: Feb 2005
Distribution: Debian, Kanotix, Kubuntu
Posts: 117

Original Poster
Rep: Reputation: 15
I don't necessarily need to scan it, only to specify which images to download, from one number to another..

like from 847_02_900_x.jpg to 10000_02_900_x.jpg, download all the images..
Simply a way to specify a numeric sequence or a range...

Quote:
Originally Posted by Mr. C.
Wget doesn't give you the magic ability to scan a web site's directories. Wget follows links. If a page has links to those images, wget can be told to follow those links and download the images.

Alternatively, you can create a file that contains those links and have wget iterate over it.
 
Old 08-07-2008, 12:34 AM   #4
Mr. C.
Senior Member
 
Registered: Jun 2008
Posts: 2,529

Rep: Reputation: 59
Right, so you need a script that can generate the file names. You can take it from here:


Code:
for i in $(seq 847 10000) ; do 
   echo ${i}_02_900_x.jpg
done
 
Old 08-07-2008, 12:38 AM   #5
memo007
Member
 
Registered: Feb 2005
Distribution: Debian, Kanotix, Kubuntu
Posts: 117

Original Poster
Rep: Reputation: 15
Thanks, but after I paste this all I get is numbers?
How do I use this with wget?

Quote:
Originally Posted by Mr. C.
Right, so you need a script that can generate the file names. You can take it from here:


Code:
for i in $(seq 847 10000) ; do 
   echo ${i}_02_900_x.jpg
done
 
Old 08-07-2008, 12:47 AM   #6
Mr. C.
Senior Member
 
Registered: Jun 2008
Posts: 2,529

Rep: Reputation: 59
You get file names...

847_02_900_x.jpg
848_02_900_x.jpg
849_02_900_x.jpg
850_02_900_x.jpg
...

Are these not the file names you want? Replace echo with wget, and add the missing part of the URI to the filename.
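
Putting the two together, it might look like this, still assuming the www.xyz.com/collection prefix from your first post is the right one:

Code:
# same loop, with echo replaced by wget and the rest of the URI prepended
for i in $(seq 847 10000); do
    wget "http://www.xyz.com/collection/${i}_02_900_x.jpg"
done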
 
Old 08-07-2008, 11:19 AM   #7
jiml8
Senior Member
 
Registered: Sep 2003
Posts: 3,171

Rep: Reputation: 114
Why are you trying to harvest a site's images anyway? Do you have permission to suck up their bandwidth this way?

If I catch you doing that on my site, I'll blacklist you...and my site watches for exactly that, so it likely will catch you.
 
Old 08-08-2008, 02:34 PM   #8
memo007
Member
 
Registered: Feb 2005
Distribution: Debian, Kanotix, Kubuntu
Posts: 117

Original Poster
Rep: Reputation: 15
Grow up man...

Quote:
Originally Posted by jiml8
Why are you trying to harvest a site's images anyway? Do you have permission to suck up their bandwidth this way?

If I catch you doing that on my site, I'll blacklist you...and my site watches for exactly that, so it likely will catch you.
 
Old 08-08-2008, 02:37 PM   #9
ncsuapex
Member
 
Registered: Dec 2004
Location: Raleigh, NC
Distribution: CentOS 2.6.18-53.1.4.el5
Posts: 770

Rep: Reputation: 42
wget -r -l5 --no-parent -A.jpg www.xyz.com/collection/


The -l5 option sets the number of levels of recursion, so -l5 would go 5 levels down from collection; -A.jpg restricts the download to files ending in .jpg, and --no-parent keeps it from climbing above /collection/.
 
Old 08-08-2008, 02:50 PM   #10
farslayer
Guru
 
Registered: Oct 2005
Location: Willoughby, Ohio
Distribution: linuxdebian
Posts: 7,231
Blog Entries: 5

Rep: Reputation: 189
come on.. scraping multiple images in this fashion is the fastest way to grow your Pr0n collection..

But seriously, all kidding aside, have you looked at httrack? It's another option you could try.

Quote:
httrack
Description: Copy websites to your computer (Offline browser)
HTTrack is an offline browser utility, allowing you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer.

HTTrack arranges the original site's relative link-structure.
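
A rough HTTrack invocation for this kind of job might look like the following (the URL and output directory are placeholders; and like wget, it can only mirror what is actually linked from a page it can fetch):

Code:
# mirror the collection area into ./mirror, accepting only .jpg files
httrack "http://www.xyz.com/collection/" -O ./mirror "+*.jpg"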
 
Old 11-13-2008, 10:46 AM   #11
taxtropel
Member
 
Registered: Mar 2005
Location: Cascade Mountains WA USA
Distribution: Linux From Scratch (LFS)
Posts: 149

Rep: Reputation: 16
Wget Stuff

If you know the range of URLs (image or otherwise), then using seq as above you get something like...
Code:
for i in $(seq 1 20); do
    wget http://mybuddies.site.org/blarg/filename_$i.blarg
done
and of course the same thing as a one-liner on the command line
Code:
for i in $(seq 1 20); do wget http://mybuddies.site.org/blarg/filename_$i.blarg; done
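For what it's worth, in bash you can also skip seq entirely with brace expansion, since wget happily accepts multiple URLs in one invocation (same placeholder URL as above):
Code:
wget http://mybuddies.site.org/blarg/filename_{1..20}.blarg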
And yeah. Seriously jiml8.
Your post was completely off topic.
 
  

