LinuxQuestions.org
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

05-13-2008, 08:49 PM   #1
Oris13
LQ Newbie
 
Registered: Aug 2005
Posts: 6

Rep: Reputation: 0
wget - downloading files from a directory


Hello, I'd appreciate it if somebody could help me with this.
What I'm trying to do is this:

Download all files from a directory on a web server (no subfolders or their contents, and nothing from the upper-level folders).
e.g.
Code:
http://url/dir/subdir/one/two
                 ^only files from this one

I've been struggling with this for quite a long time and have tried probably every combination of the -r --no-parent -l# -A -R switches (reasonable and stupid ones alike), but I can't figure it out. I've read the man pages and various online how-tos.

I'm about to give up on wget :) Here's the practical question:

Download all files from this (vamps) directory (probably 1-1.5 MB at most):

Code:
http://http.us.debian.org/debian/pool/main/v/vamps/
I don't mind if it builds the tree of folders before vamps as long as only the files of vamps are saved. Does anybody know how to do this with wget?

I hope it's possible! Thanks in advance.
 
05-13-2008, 09:04 PM   #2
rlhartmann
Member
 
Registered: Mar 2008
Posts: 73

Rep: Reputation: 16
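Have you tried limiting the recursion depth with --level? Be careful: in wget --level=0 does not prevent recursion, it means unlimited depth (the same as --level=inf), so --level=1 is what keeps it to the links on the starting page.
Also, -nd tells it not to create directories on the local machine.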
 
05-13-2008, 09:30 PM   #3
Oris13
LQ Newbie
 
Registered: Aug 2005
Posts: 6

Original Poster
Rep: Reputation: 0
Thanks for replying. I have used -l0 and -nd.
Right now my command looks like this:

Code:
wget -r -l0 -nd --no-parent -A "vamps*" -R ".*" http://http.us.debian.org/debian/pool/main/v/vamps/
It does download all the files from vamps, but then it goes on to vala, valgrind and other subdirectories of /v and downloads their index.html files, and for each one it says (after fetching it):

"Removing index.html since it should be rejected." (because of my -A and -R filters)

Even though I do end up with only the vamps files downloaded (SOME progress, at least), why does it keep downloading the upper folders' index.html files and then rejecting them? The only way to stop it is Ctrl-C when you notice too many "Removing..." lines flicking by.

Last edited by Oris13; 05-13-2008 at 09:31 PM.
 
05-13-2008, 11:31 PM   #4
rlhartmann
Member
 
Registered: Mar 2008
Posts: 73

Rep: Reputation: 16
I'm not sure, but since you have -A "vamps*" and that is all you want, I don't think
you need the -R. Try removing that and moving --no-parent to the very last option.
 
05-14-2008, 07:51 AM   #5
allend
Senior Member
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 3,439

Rep: Reputation: 851
From the directory where you want the files to be downloaded to:
Code:
wget -nH --cut-dirs=5 --level=0 http://http.us.debian.org/debian/pool/main/v/vamps/
-nH will remove 'http.us.debian.org' and
--cut-dirs=5 will remove 'debian/pool/main/v/vamps' from the downloaded file names.
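Since the --cut-dirs value has to equal the number of directories in the URL path (five for debian/pool/main/v/vamps), it can be computed rather than counted by hand. The function cut_dirs_for below is a hypothetical helper written for this thread, not part of wget, and it assumes a plain http:// URL whose directory part ends with a slash. Note also that --level only has an effect together with -r, so the recursive options (-r -l1 --no-parent -A, as used later in this thread) are still needed to fetch the files themselves.

```sh
#!/bin/sh
# Hypothetical helper (not part of wget): print how many directory
# components a URL's path has, i.e. the --cut-dirs value that, together
# with -nH, saves the files straight into the current directory.
cut_dirs_for() {
    path=${1#*://}                  # drop the scheme: host/debian/pool/...
    case "$path" in
        */*) path=${path#*/} ;;     # drop the host name
        *)   path= ;;               # URL has no path at all
    esac
    case "$path" in
        */*) path=${path%/*} ;;     # drop a trailing file name or final slash
        *)   path= ;;               # only a file name (or nothing) is left
    esac
    # Count the non-empty components; "|| true" because grep -c exits 1 on 0 matches
    printf '%s\n' "$path" | tr '/' '\n' | grep -c . || true
}

dir=http://http.us.debian.org/debian/pool/main/v/vamps/
cut_dirs_for "$dir"    # prints 5
# A real download would then look something like:
# wget -r -l1 -nH --cut-dirs="$(cut_dirs_for "$dir")" --no-parent -A 'vamps*' "$dir"
```

Without the trailing slash, wget (and this helper) would treat "vamps" as a file name rather than a directory, so the count would come out one lower.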
 
05-14-2008, 08:13 AM   #6
Oris13
LQ Newbie
 
Registered: Aug 2005
Posts: 6

Original Poster
Rep: Reputation: 0
Thanks for the replies, guys.
Quote:
Originally Posted by allend
From the directory where you want the files to be downloaded to:

-nH will remove 'http.us.debian.org' and
--cut-dirs=5 will remove 'debian/pool/main/v/vamps' from the downloaded file names.
That sounds like a good way to do it, but when I run it, it only downloads the index.html of the vamps folder. No files.

I don't know; is there some other program that people use for downloading like this? I know that with some FTP clients you can browse into folders and download files with simple wildcard masks (e.g. vamps*), but what about HTTP?
 
05-14-2008, 05:18 PM   #7
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 329

Rep: Reputation: 140
This seems to work, but it downloads 5 extra files on top of the 16 required. The extra files come from links in the vamps directory and are automatically deleted by 'wget' as it applies the wildcard filter 'vamps*'. It gives just the files, without any directories:
Code:
wget -r -nH -l1 --cut-dirs=5 --no-parent -A "vamps*" http://http.us.debian.org/debian/pool/main/v/vamps/

Downloaded: 690,265 bytes in 21 files
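A rough model of that filtering step: the wget manual notes that -A/-R do not stop wget from downloading HTML files, because it has to load them to find the links to follow; afterwards it deletes any saved file whose name doesn't match the accept list. The sketch below imitates only that keep/delete decision, using shell case globbing as a stand-in for wget's own matching. The function would_keep and all file names are hypothetical; index.html?C=N;O=D is just an example of the column-sort links an Apache-style listing may contain.

```sh
#!/bin/sh
# Illustration only: mimics the keep/delete decision wget makes for each
# saved file when run with -A 'vamps*'. Shell `case` globbing stands in
# for wget's own wildcard matching.
accept_pattern='vamps*'

would_keep() {
    # $1 is a local file name; the unquoted $accept_pattern acts as a glob
    case "$1" in
        $accept_pattern) echo "keep   $1" ;;
        *)               echo "delete $1" ;;
    esac
}

# Hypothetical names: a listing page, a sort-order variant of it,
# and two package files.
for f in 'index.html' 'index.html?C=N;O=D' 'vamps_1.0-1.dsc' 'vamps_1.0.orig.tar.gz'; do
    would_keep "$f"
done
```

The listing pages end up on the "delete" side, which matches the "Removing ... since it should be rejected" messages: they are fetched for their links, then thrown away.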
Using the 'lynx' text-only web browser, it's possible to download the index.html of the directory as text and then use 'sed' to save the filenames to a file, which can then be used as input to 'wget':
Code:
dir=http://http.us.debian.org/debian/pool/main/v/vamps/
lynx -dump $dir | sed -n "s|.*\(${dir}vamps.*\)|\1|p" > filelist
wget -i filelist

Downloaded: 664,375 bytes in 16 files
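The sed step can be tried without touching the network. A minimal sketch, assuming the output of lynx -dump ends with a numbered "References" list of absolute URLs; the function name extract_vamps_urls and the sample file names are hypothetical, only the sed expression comes from the post.

```sh
#!/bin/sh
# Offline check of the sed filter: feed it text shaped like the numbered
# "References" list that `lynx -dump` prints and keep only the vamps URLs.
# The file names below are made up for illustration.
dir=http://http.us.debian.org/debian/pool/main/v/vamps/

extract_vamps_urls() {
    # Same expression as in the post: capture from ${dir}vamps to end of line
    sed -n "s|.*\(${dir}vamps.*\)|\1|p"
}

sample=$(printf '%s\n' \
    'References' \
    '' \
    '   1. http://http.us.debian.org/debian/pool/main/v/' \
    "   2. ${dir}vamps_1.0-1.dsc" \
    "   3. ${dir}vamps_1.0.orig.tar.gz")

printf '%s\n' "$sample" | extract_vamps_urls
# In real use: lynx -dump "$dir" | extract_vamps_urls > filelist
```

The parent-directory link is dropped because it does not contain ".../vamps/vamps", so only the package URLs reach wget -i.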
Alternatively, the filenames can be piped directly into 'wget' using the '-i -' option:
Code:
dir=http://http.us.debian.org/debian/pool/main/v/vamps/
lynx -dump $dir | sed -n "s|.*\(${dir}vamps.*\)|\1|p" | wget -i -

Downloaded: 664,375 bytes in 16 files
 
05-15-2008, 05:18 PM   #8
frenchn00b
Senior Member
 
Registered: Jun 2007
Location: E.U., Mountains :-)
Distribution: Debian, Etch, the greatest
Posts: 2,546

Rep: Reputation: 51
One of the 1000 alternatives:
Code:
elinks "URL" |  grep -o 'http:[^"]*' | grep vamp | xargs wget -k

Last edited by frenchn00b; 05-15-2008 at 05:19 PM.
 
05-15-2008, 05:42 PM   #9
Oris13
LQ Newbie
 
Registered: Aug 2005
Posts: 6

Original Poster
Rep: Reputation: 0
Thank you all for the replies. I will try those later. I thought there was an easier way, something I was missing. I guess not, but thanks.
 
All times are GMT -5.