A script that can mirror the Slackware tree from multiple servers
It uses lftp to get the names of the files to download and aria2 to download them from multiple servers. (A SlackBuild for aria2 can be found on SBo.) It opens only _one_ connection at a time to each server (i.e., one thread per server), so it will raise your speed without adding much to the servers' load. The script is here:
Code:
#!/bin/zsh |
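The script body is not reproduced above, so the following is only a minimal sketch of the idea described in the first post, not the author's code. It assumes the file list has already been obtained (for example with lftp) and shows how one entry of an aria2 input file could be built so that the same file is fetched from several mirrors. The function name `make_entry` and the example.org mirror URLs are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): print one aria2
# input-file entry for a path relative to the mirror root.
# Usage: make_entry REL_PATH MIRROR_URL...
make_entry() {
    rel=$1
    shift
    line=
    for mirror in "$@"; do
        # aria2 treats TAB-separated URIs on one line as sources of the same file.
        if [ -n "$line" ]; then
            line="$line$(printf '\t')"
        fi
        # Strip one trailing slash from the mirror URL before appending the path.
        line="$line${mirror%/}/$rel"
    done
    printf '%s\n' "$line"
    # Option lines belonging to the entry must start with whitespace;
    # "out" keeps the relative path under the download directory.
    printf '  out=%s\n' "$rel"
}

# Example entry with two placeholder mirrors (assumed URLs):
make_entry slackware64/CHECKSUMS.md5 \
    http://mirror-a.example.org/slackware64-current/ \
    ftp://mirror-b.example.org/pub/slackware64-current
```

Entries like this could be collected into a file and passed to aria2 with something like `aria2c --input-file=list.txt --dir=/path/to/local/tree --max-connection-per-server=1`, which keeps it to one connection per mirror as described above. The paths here are placeholders.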
OK, I'll bite. What exactly is the advantage of this? Just to lower load on the mirror servers by distributing the download?
Couldn't you do the same thing by running a different rsync operation against each directory? Or in other words, rsync "slackware" from one server and "source" from another? |
First, thanks for your comment. Here are my answers:
1. Yes, one advantage is lowering the load on the server side. But this is not the most important feature. I put it in the very first post because I don't want to scare the mirror admins. The servers are always powerful, right? ;)
2. Not every mirror offers an rsync service (most of them do, but not all). This script uses lftp, which can get files over ftp, http, ftps... any protocol lftp supports. The more mirrors you use, the more speed you can get.
3. rsync on my network is _very_ slow, about 0.x KB/s. I don't know the reason, but it is the truth. So I can only use ftp/http to update my copies. But connections to foreign servers are also slow, about 10~20 KB/s per connection. So I had to think of a way to boost the speed: get files from more servers. The total speed is now bearable, about 100~200 KB/s (roughly 10 * speed_per_server). I'm satisfied with it. So the third advantage is: if you are slow with one server, you can get more with this script.

Your solution has a disadvantage: you cannot run two rsyncs in the same folder, as they may overwrite each other. There are only a limited number of folders, and a given change will not affect all of them. Say PatV upgraded firefox: only slackware64/xap and source/xap/mozilla-firefox would have changes. So you could launch just two rsync instances. That's not comparable with downloading from 10 servers at a time ;) Also, although rsync can transfer text files very efficiently, I doubt it helps much with binary files, which make up most of the tree. |
Thanks for posting, but I don't think I'd be inclined to use anything like that as I'm never in that much of a hurry. I'm curious though regarding consistency. If you're pulling from all over the place, what happens if the mirrors are out of step with the main one? At best you'll get some sort of file not found error, at worst, you could end up with some files having the wrong contents.
|
In my experience, Slackware never has two different packages with the same package name (i.e., file name). So if one of the mirrors is out of date, the new file name simply doesn't exist there and you can't get any content from it. aria2 will handle that. As for SlackBuild scripts and other stuff without a version number, they are very small and aria2 won't split them into many parts, so they won't be downloaded from multiple servers. But my script can't guarantee that. I may add this feature in the future. At least there are the checksums. Thanks for the advice ~;)
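As a rough illustration of using those checksums, here is a minimal sketch that verifies a local tree against its CHECKSUMS.md5 after a multi-mirror download. It is not part of mmirror-slack.sh. It assumes GNU md5sum and checksum lines of the form "32 hex digits, two spaces, ./path"; the function name `verify_tree` and the example path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check every file listed in CHECKSUMS.md5 under TREE.
# Returns 0 if all listed files match, non-zero otherwise.
verify_tree() {
    tree=$1
    (
        cd "$tree" || exit 2
        # Skip the explanatory header and keep only "hash  path" lines,
        # then let md5sum report just the files that fail.
        grep -E '^[0-9a-f]{32}  ' CHECKSUMS.md5 | md5sum -c --quiet -
    )
}

# Usage (placeholder path):
#   verify_tree /path/to/slackware64-current || echo "some files need re-downloading"
```

Any file reported as FAILED could then be fetched again from the main server only.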
|
I mailed the author of aria2 to ask what happens when several URIs point to different files. He answered that aria2 will check the file size and, if it differs, drop some of the URIs. However, it cannot guarantee which URIs will be dropped and which will be kept. So it is unlikely to corrupt files, although the downloaded file may not be the one on the main server. I have two solutions:
1. Wait for 1~2 days. ;) The mirrors listed in the script are very active -- 1~2 days is enough for them to synchronize with each other.
2. Run "mmirror-slack.sh -f"; then it will only download files from the main server. This could be slow, but if you have already downloaded most of the stuff from multiple servers (i.e., run mmirror-slack.sh first), it won't take too much time.

It won't even hurt to run rsync afterward, because after running mmirror-slack.sh the out-of-sync files should be the SlackBuilds, txts and CHECKSUMS.md5, which have no version number and are all very small. I updated the script in the very first post. If anyone uses this script, please upgrade your local copy. Thanks. |
Quote:
I notice that you are from China. I analyzed the log file of my Slackware mirror (between 15th and 21st November). There were 1882 failed downloads of Slackware ISO files, from 205 unique IP addresses. According to whois, 164 of those were from China. And there were 43 successful ISO downloads, from 31 unique addresses. None from China. Most successful downloads were from Europe, but some were from countries like Malaysia, Argentina and Colombia, which are far from my location (Finland, Europe). The downloads from China look like this:
Sat Nov 21 07:15:34 2009 [pid 30133] [ftp] FAIL DOWNLOAD: Client "XXX.XXX.XXX.XXX", "/slackware-13.0-iso/slackware-13.0-install-dvd.iso", 161424 bytes, 1.72Kbyte/sec
Sat Nov 21 07:17:19 2009 [pid 30153] [ftp] FAIL DOWNLOAD: Client "XXX.XXX.XXX.XXX", "/slackware-13.0-iso/slackware-13.0-install-dvd.iso", 192888 bytes, 3.01Kbyte/sec
Sat Nov 21 07:23:44 2009 [pid 30184] [ftp] FAIL DOWNLOAD: Client "XXX.XXX.XXX.XXX", "/slackware-13.0-iso/slackware-13.0-install-dvd.iso", 161424 bytes, 2.12Kbyte/sec
Sat Nov 21 07:24:38 2009 [pid 30187] [ftp] FAIL DOWNLOAD: Client "XXX.XXX.XXX.XXX", "/slackware-13.0-iso/slackware-13.0-install-dvd.iso", 112176 bytes, 2.16Kbyte/sec
I hid the IP address. The same IP address tried to download the same file 93 times over a period of five hours. It always fails immediately, after about 100 kilobytes. So I think there is something wrong in the Chinese net. |
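For anyone who wants to run a similar count on their own mirror, here is a minimal sketch that tallies failed downloads per client address. It assumes log lines shaped like the ones quoted above, with the client address as the first double-quoted field. The function name `count_failed` and the log path in the usage comment are hypothetical, and this is not necessarily how the figures above were produced.

```shell
#!/bin/sh
# Hypothetical helper: count "FAIL DOWNLOAD" lines per client address.
# Reads the named log files, or standard input if none are given.
# Prints "COUNT ADDRESS" lines, highest count first.
count_failed() {
    # Splitting on double quotes makes the client address field 2.
    awk -F'"' '/FAIL DOWNLOAD/ { n[$2]++ }
               END { for (ip in n) print n[ip], ip }' "$@" | sort -rn
}

# Usage (placeholder path):
#   count_failed /var/log/vsftpd.log | head
```

An address with dozens of failures on the same file, as in the quoted example, would show up at the top of this list.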
China uses extensive firewalling and QoS systems to control and monitor their access to the Internet, so that is very possible.
|
OK, I admit Chinese networks have firewalls and many limitations.... So in one respect, my script can be considered a kind of "workaround" for the problem. Besides, not all networks in the world are as fast as those in Europe, the USA or Japan; I think many developing countries don't have very fast networks yet. So they may benefit from my script. And people on fast networks could use my script to get even faster, although there is less room to improve... ;)
|