LinuxQuestions.org
Old 06-22-2006, 05:38 PM   #1
wwnexc
Member
 
Registered: Sep 2005
Location: California
Distribution: Slackware & Debian
Posts: 264

Rep: Reputation: 30
wget: Multi-Threaded downloading


Hi,

I am wondering if it is possible to speed up the recursive function of wget by having it download multiple pages at once.

Thanks!!
 
Old 06-23-2006, 04:02 PM   #2
win32sux
LQ Guru
 
Registered: Jul 2003
Location: Los Angeles
Distribution: Ubuntu
Posts: 9,870

Rep: Reputation: 380
hi... this is just a bump... i'd like to know if this is possible also... i checked the wget man page but couldn't find anything...

i suspect the only way this would actually improve your "speed" is if each individual connection to the server gets less than your total bandwidth... for example, if i have a 256Kbps connection and i'm downloading recursively (one file at a time) from a server at 32KB/s (all my bandwidth is used), then i don't think it would help to have two connections going on at the same time... but if my download speed was actually, say, 16KB/s, then having two simultaneous downloads would indeed get my files twice as fast... of course, this also depends on whether the server allows me to establish two simultaneous connections or not...

even if wget can't do this on its own, i have a feeling one can use it within a shell script to achieve the desired result...
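A minimal sketch of that shell-script idea, assuming a POSIX shell: launch one background wget per URL, then wait for all of them. `parallel_fetch` is an invented helper name and the example hosts are placeholders, not real URLs.

```shell
#!/bin/sh
# Sketch: one background wget job per URL given as an argument, then
# wait for all of them. parallel_fetch is a hypothetical helper name.
parallel_fetch() {
    for url in "$@"; do
        wget --recursive "$url" &   # one recursive download per background job
    done
    wait    # block until every background wget has exited
}

# Example (placeholder hosts):
# parallel_fetch "example.com/foo/" "example.com/bar/" "example.com/baz/"
```

This is the same pattern as a forked wrapper: the `&` puts each wget in its own process, and `wait` joins them all before the script exits.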

anyways, i'm hoping someone can shed some light on this...

Last edited by win32sux; 06-23-2006 at 04:30 PM.
 
Old 06-23-2006, 04:27 PM   #3
bulliver
Senior Member
 
Registered: Nov 2002
Location: Edmonton AB, Canada
Distribution: Gentoo x86_64; Gentoo PPC; FreeBSD; OS X 10.9.4
Posts: 3,760
Blog Entries: 4

Rep: Reputation: 78
Quote:
i have a feeling one can use it within a shell script to achieve the desired result
Well, I guess you could use a bunch of forks...

I think it would be better to use a language that has native threading, such as Ruby:
Code:
#!/usr/bin/ruby

threads = []

for page in ARGV
  threads << Thread.new(page) do |url|
    puts "Fetching: #{url}"
    system("wget --recursive #{url}")
    puts "Got: #{url}"
  end
end

threads.each { |thr| thr.join }
use like:
Code:
$ ./get.rb "example.com/foo/" "example.com/bar/" "example.com/baz/"
 
Old 06-23-2006, 04:34 PM   #4
win32sux
LQ Guru
 
Registered: Jul 2003
Location: Los Angeles
Distribution: Ubuntu
Posts: 9,870

Rep: Reputation: 380
thanks for the reply!!

Quote:
Originally Posted by bulliver
Code:
#!/usr/bin/ruby

threads = []

for page in ARGV
  threads << Thread.new(page) do |url|
    puts "Fetching: #{url}"
    system("wget --recursive #{url}")
    puts "Got: #{url}"
  end
end

threads.each { |thr| thr.join }
Code:
$ ./get.rb "example.com/foo/" "example.com/bar/" "example.com/baz/"
but would that mirror those three directories simultaneously?? in other words, it would be like initiating those three downloads individually, no?? or would that start multiple connections for *each* of those subdirs?? i apologize if my question has an obvious answer - i don't really know how to read ruby...

Last edited by win32sux; 06-23-2006 at 04:36 PM.
 
Old 06-23-2006, 05:00 PM   #5
bulliver
Senior Member
 
Registered: Nov 2002
Location: Edmonton AB, Canada
Distribution: Gentoo x86_64; Gentoo PPC; FreeBSD; OS X 10.9.4
Posts: 3,760
Blog Entries: 4

Rep: Reputation: 78
What this will do is recursively download "/foo/" "/bar/" and "/baz/" from example.com separately, but at the same time.

It is just quick and dirty, and has no error checking, but you get the idea. It will download as many URLs as you can pass on the command line, all simultaneously.

Quote:
would that start multiple connections for *each* of those subdirs??
You could do this, but not with wget. You would have to do it in pure Ruby (or Python, or Perl, etc.), and it would take many more lines of code than my simple example...
 
Old 06-23-2006, 05:05 PM   #6
win32sux
LQ Guru
 
Registered: Jul 2003
Location: Los Angeles
Distribution: Ubuntu
Posts: 9,870

Rep: Reputation: 380
yeah, i was afraid of that...

it would be cool to be able to do something like:
Code:
./get.pl -n5 ftp://ftp.example.com/foo/
and have it download /foo recursively using multiple simultaneous connections, where "-n" is the number of simultaneous connections you'd want, if the server allows...
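One way to approximate that hypothetical "-n" switch is a two-pass script: collect the list of files first, then fetch N of them at a time with GNU xargs -P. This is only a sketch: `fetch_list` is an invented name, the list-building pass is site-specific and therefore only indicated in a comment, and `-P`/`-r` assume GNU xargs.

```shell
#!/bin/sh
# Two-pass sketch: pass 1 builds a flat list of file URLs (site-specific,
# e.g. parsed from a `wget --spider -r` log); pass 2 fetches up to N files
# at a time. fetch_list is a hypothetical helper name.
fetch_list() {
    list=$1     # file with one URL per line
    n=$2        # number of simultaneous wget processes
    # Pass 1 (not shown): populate "$list" with one URL per line.
    # Pass 2: GNU xargs runs up to $n wget processes at once;
    # -r skips the run entirely when the list is empty.
    xargs -r -P "$n" -n 1 wget -N < "$list"
}

# Example: fetch_list urls.txt 5
```

Unlike the one-wget-per-subdirectory approach, this parallelizes at the level of individual files, so it helps even when everything lives under a single directory.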
 
Old 06-23-2006, 05:12 PM   #7
bulliver
Senior Member
 
Registered: Nov 2002
Location: Edmonton AB, Canada
Distribution: Gentoo x86_64; Gentoo PPC; FreeBSD; OS X 10.9.4
Posts: 3,760
Blog Entries: 4

Rep: Reputation: 78
Quote:
it would be cool to be able to do something like: ./get.pl -n5 ftp://ftp.example.com/foo/
Right, well, that's why I used "example.com/foo/" "example.com/bar/" and "example.com/baz/" in my example.
If you want to download "example.com" recursively, and example.com has 'foo' 'bar' and 'baz' as subdirectories, then you are sorta achieving what you want, right? Right? He he. Come on, work with me here...
 
Old 06-23-2006, 05:22 PM   #8
win32sux
LQ Guru
 
Registered: Jul 2003
Location: Los Angeles
Distribution: Ubuntu
Posts: 9,870

Rep: Reputation: 380
Quote:
Originally Posted by bulliver
If you want to download "example.com" recursively, and example.com has 'foo' 'bar' and 'baz' as subdirectories, then you are sorta achieving what you want, right? Right? He he. Come on, work with me here...
LOL, it's all good, i hear ya...

but it would indeed be awesome to be able to deal with all the subdirs in one shot...

especially if ftp://ftp.example.com/ has like

< Dr. Evil Voice > One Meeeeeeellion Subdirs < Dr. Evil Voice /> ...

Last edited by win32sux; 06-23-2006 at 05:26 PM.
 
Old 05-15-2010, 08:40 PM   #9
SuperSparky
LQ Newbie
 
Registered: Apr 2008
Location: San Diego, California
Distribution: Ubuntu
Posts: 11

Rep: Reputation: 2
Solution!

I realize this is an ancient post, but my reply, I believe, falls under this web site's charter and purpose... to help.

wget, on its own, is not multithreaded, and that is a shame. However, there is a way to achieve nearly the same effect, and here is how you do it:

Code:
wget -r -np -N [url] &
wget -r -np -N [url] &
wget -r -np -N [url] &
wget -r -np -N [url] &
repeated as many times as you deem fitting, to have that many processes downloading. This isn't as elegant as a properly multithreaded app, but it will get the job done with only a slight amount of overhead. The key here is the "-N" switch. It means "transfer the file only if it is newer than what's on the disk", which (mostly) prevents each process from re-downloading a file another process has already fetched; instead it skips that file and moves on to one no other process has grabbed yet. It uses the timestamp as the means of doing this, hence the slight overhead.

It works great for me and saves a lot of time. Don't run too many processes, as that may saturate the web site's connection and tick off the owner. Keep it to a max of around 4 or so. Beyond politeness, the number is really only limited by CPU and network bandwidth on both ends.
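The recipe above can be wrapped in a small loop so the process count becomes a parameter; `overlap_mirror` is an invented helper name and the URL is supplied by the caller.

```shell
#!/bin/sh
# N overlapping recursive runs of the same URL; -N timestamping makes a
# later process skip files an earlier one already saved.
# overlap_mirror is a hypothetical helper name.
overlap_mirror() {
    url=$1
    n=${2:-4}       # default to the suggested ceiling of 4 processes
    i=0
    while [ "$i" -lt "$n" ]; do
        wget -r -np -N "$url" &
        i=$((i + 1))
    done
    wait            # return once every wget has exited
}

# Example: overlap_mirror "http://example.com/foo/" 4
```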

Enjoy!
 
