wget: Multi-Threaded downloading
Hi,
I am wondering if it is possible to speed up the recursive function of wget by having it download multiple pages at once. Thanks!!
hi... this is just a bump... i'd like to know if this is possible also... i checked the wget man page but couldn't find anything...
i suspect the only way this would actually improve your "speed" is if each individual connection to the server uses less than your total bandwidth... for example, if i have a 256Kbps connection and i'm downloading recursively (one file at a time) from a server at 32KB/s (all my bandwidth is used), then i don't think it would help to have two connections going on at the same time... but if my download speed was actually, say, 16KB/s, then having two simultaneous downloads would indeed get my files twice as fast... of course this also depends on whether the server allows me to establish two simultaneous connections or not... even if wget can't do this on its own, i have a feeling one can use it within a shell script to achieve the desired result... anyways, i'm hoping someone can shed some light on this... :study:
I think it would be better to use a language that has native threading, such as Ruby:
Code:
#!/usr/bin/ruby
Code:
$ ./get.rb "example.com/foo/" "example.com/bar/" "example.com/baz/"
thanks for the reply!!
Quote:
What this will do is recursively download "/foo/" "/bar/" and "/baz/" from example.com separately, but at the same time.
It is just quick and dirty, and has no error checking, but you get the idea. It will simultaneously download as many URLs as you can pass on the command line.
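The original get.rb script was not preserved in this thread, but based on the description above (one simultaneous recursive download per command-line URL), a hypothetical reconstruction might look like this; the wget flags and script name are assumptions, not the poster's actual code:

```ruby
#!/usr/bin/ruby
# Hypothetical reconstruction (not the original get.rb): start one
# thread per URL passed on the command line, each running its own
# recursive wget, so all URLs download at the same time.
threads = ARGV.map do |url|
  Thread.new(url) do |u|
    # -r = recursive, -np = never ascend to the parent directory
    system("wget", "-r", "-np", u)
  end
end
threads.each(&:join)  # block until every download finishes
```

Since each thread just shells out to wget, this gets simultaneous downloads without wget itself being multithreaded.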
yeah, i was afraid of that... :)
it would be cool to be able to do something like: Code:
./get.pl -n5 ftp://ftp.example.com/foo/
Quote:
If you want to download "example.com" recursively, and example.com has 'foo' 'bar' and 'baz' as subdirectories, then you are sorta achieving what you want, right? Right? He he. Come on, work with me here... ;)
Quote:
but it would indeed be awesome to be able to deal with all the subdirs in one shot... especially if ftp://ftp.example.com/ has like < Dr. Evil Voice > One Meeeeeeellion Subdirs < Dr. Evil Voice /> ... :D
Solution!
I realize this is an ancient post, but my reply, I believe, falls under this web site's charter and purpose... to help.
wget on its own is not multithreaded, and that is a shame. However, there is a way to achieve nearly the same effect, and here is how you do it:
Code:
wget -r -np -N [url] & wget -r -np -N [url] & wget -r -np -N [url] & wget -r -np -N [url] &
copied as many times as you deem fitting, to have that many processes downloading. This isn't as elegant as a properly multithreaded app, but it will get the job done with only a slight amount of overhead. The key here is the "-N" switch, which means transfer the file only if it is newer than what's on the disk. This will (mostly) prevent each process from downloading a file a different process has already downloaded; instead it will skip that file and fetch what some other process hasn't gotten to yet. It uses the time stamp as a means of doing this, hence the slight overhead. It works great for me and saves a lot of time. Don't run too many processes, as this may saturate the web site's connection and tick off the owner. Keep it to a max of around 4 or so; however, the number is only limited by CPU and network bandwidth on both ends. Enjoy!
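The same trick can be scripted rather than copy-pasted; here is a minimal Ruby sketch of it (the process count and URL handling are illustrative assumptions, not part of the original post):

```ruby
#!/usr/bin/ruby
# Sketch of the trick above: spawn N identical "wget -r -np -N url"
# processes in parallel. The -N (timestamping) switch makes each
# process mostly skip files another process has already fetched.
url = ARGV[0]
n   = (ARGV[1] || 4).to_i   # ~4 processes; more may anger the server's owner

pids = []
if url
  n.times { pids << Process.spawn("wget", "-r", "-np", "-N", url) }
  pids.each { |pid| Process.wait(pid) }
end
```

Because every process runs the same recursive crawl, the -N timestamp check is what keeps them from all re-downloading the same files.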