LinuxQuestions.org

grissiom 11-18-2009 03:42 AM

A script that could mirror slackware tree from multiple servers
 
It uses lftp to get the names of the files to download and aria2 to download them from multiple servers. (A SlackBuild for aria2 can be found on SBo.) It opens only _one_ connection to each server at a time (i.e., only one thread per server), so it will raise your speed without adding much to the servers' load. The script is here:
Code:

#!/bin/zsh

usage="mmirror-slack.sh: mirror a Slackware tree from multiple servers.

usage: mmirror-slack.sh [-vfn]
    v: be verbose. Display the commands that are going to be run.
    f: final mode. Remove the files that are not present on the remote server.
    n: dry run. Display the commands that would be run but do not execute them.
      Implies v.

mmirror-slack.sh also takes parameters from environment variables:
    VERSION: the version you want to mirror. -current, -13.0 etc.
            Default is -current. Don't forget the leading '-'.
    LOCALMIRROR: where your mirror is on the disk. Be sure to adjust
            it before running this script.
    ARCH: i386 or x86_64.
    FOLDER: the folder under the tree you want to mirror. slackware/, extra/ etc.
            Don't forget the trailing '/'.
    LEXTRAAGRS: extra arguments that are passed to lftp.
    AEXTRAAGRS: extra arguments that are passed to aria2.
"

MAINMIRROR='ftp://ftp.osuosl.org/pub/slackware/'

# add your favorite mirrors here
MIRRORS=(ftp://darkstar.ist.utl.pt/pub/slackware/
ftp://slackware.mirrors.tds.net/pub/slackware/
ftp://ftp.slackware.no/pub/linux/slackware/
ftp://ftp.slackware.at/
ftp://ftp.ntua.gr/pub/linux/slackware/
http://mirror.switch.ch/ftp/mirror/slackware/
ftp://ftp.heanet.ie/mirrors/ftp.slackware.com/pub/slackware/
ftp://ftp.belnet.be/mirror/ftp.slackware.com/
ftp://ftp.slackware.org.uk/slackware/
http://slackware.cs.utah.edu/)
MIRRORS+=$MAINMIRROR
#http://mirrors.163.com/slackware/

# -current or -13.0 etc. Don't forget the leading '-'.
VERSION=${VERSION:-'-current'}

# where your local mirror is located. In that folder you should have something
# like:
#  slackware64-current/
#  slackware-current/
#  slackware-13.0/
LOCALMIRROR=${LOCALMIRROR:-'/ext4/slackware_rsync'}
# use ARCH to determine which branch to mirror.
case $ARCH in
        'x86_64' )
        SBASE='slackware64'
        TBASE=$SBASE
        ;;
        'i386' )
        SBASE='slackware'
        TBASE=$SBASE
        ;;
        * )
        echo "ARCH=[x86_64|i386] mmirror-slack.sh"
        echo "see source file for more parameters."
        exit 1
        ;;
esac

TDIR=${LOCALMIRROR}/${TBASE}${VERSION}/${FOLDER}
SDIR=${SBASE}${VERSION}/${FOLDER}

on_exit() {
        kill 0
        exit
}

exec_cmd() {
        [ $VERBOSE -ne 0 ] && echo $@
        if [ $DRYRUN -eq 0 ]; then
                eval $@
        fi
        [ $? -ne 0 ] && CMDFAIL+="\n""$@"
}

fetch_cmd() {
        LEXTRAAGRS=${LEXTRAAGRS}' --verbose=3 --script=- '
        # Some mirrors have symbolic links, others do not. So for compatibility
        #  reasons, use --dereference to download symbolic links as files.
        #  Hopefully this won't make the local mirror too large...
        # If you are behind a good router and use a good mirror, set ftp:sync-mode off
        lftp -c "set ftp:sync-mode on
                open $MAINMIRROR &&
                mirror ${LEXTRAAGRS} \
                  ${SDIR} ${TDIR}"
}

dispatch_cmd() {
        while read -r -u 0 cmdline; do
                case ${cmdline[1,3]} in
                        "get" )
                        cmd=${cmdline//"$MAINMIRROR"/}
                        file=$(echo "$cmd" | rev | cut -f 1 -d ' ' | rev)
                        folder=$(echo "$cmd" | rev | cut -f 2 -d ' ' | rev)
                        exec_cmd aria2c ${AEXTRAAGRS} --summary-interval=0 \
                            --allow-overwrite=true --remote-time=true \
                            --dir="${folder}" --split=${NMIRROR} \
                            ${MIRRORS[@]/%/${file}}
                        ;;
                        'rm ' )
                        cmd=${cmdline//"file:"/}
                        if [ $FINAL -eq 1 ]; then
                                exec_cmd $cmd
                        fi
                        ;;
                        * )
                        if [ "${cmdline[1,5]}" = 'chmod' ]; then
                                cmd=${cmdline//"file:"/}
                                exec_cmd $cmd
                        elif [ "${cmdline[1,5]}" = 'shell' ]; then
                                cmd=${cmdline//"shell "/}
                                exec_cmd $cmd
                        else
                                echo "$cmdline"
                        fi
                        ;;
                esac
        done
}

#############
# Main body #
#############
trap on_exit 1 2 3 6

FINAL=0
VERBOSE=0
DRYRUN=0
NMIRROR=$((${#MIRRORS[@]}-1))
while getopts ':nvf' opt; do
        case $opt in
                'v' )
                VERBOSE=1
                ;;
                'f' )
                FINAL=1
                LEXTRAAGRS+=" --delete "
                MIRRORS=$MAINMIRROR
                NMIRROR=1
                ;;
                'n' )
                DRYRUN=1
                ;;
                '?' )
                unkopt+=$OPTARG' '
                ;;
        esac
done
[ $DRYRUN -eq 1 ] && VERBOSE=1

[ -n "$unkopt" ] && { echo "$usage"; echo "Unknown option: $unkopt"; exit 2; }

echo "Mirror ${MAINMIRROR}${SDIR} to $TDIR :"
if [ $FINAL -eq 1 ]; then
        MIRRORS=$MAINMIRROR
        echo "FINAL mode"
fi
fetch_cmd | dispatch_cmd

[ -n "$DFAILFILE" ] && echo "failed to download:" $DFAILFILE
[ -n "$CMDFAIL" ] && echo "failed to run command(s):" $CMDFAIL

exit 0

The code is hosted at http://gitorious.org/slack-utils/slack-utils . Clones and suggestions are very welcome~ ;)
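For anyone reading the script for the first time: the core trick is that each file lftp reports is turned into a single aria2c call listing the same relative path on every mirror. The script does this with the zsh expansion ${MIRRORS[@]/%/${file}}. Below is a minimal POSIX sh sketch of the same idea that only prints the command instead of running it. The build_cmd helper, the two-mirror list and the example path are illustrative assumptions, not part of the original script, and --split is simply set to the number of mirrors listed here.

```shell
#!/bin/sh
# Sketch only: show the aria2c command line built for one file.
# Two mirrors taken from the list in the script; add more as needed.
MIRRORS="ftp://ftp.osuosl.org/pub/slackware/ http://slackware.cs.utah.edu/"

# build_cmd RELATIVE_PATH LOCAL_DIR (hypothetical helper)
build_cmd() {
    file=$1
    dir=$2
    urls=""
    n=0
    # Append the same relative path to every mirror base URL.
    for m in $MIRRORS; do
        urls="$urls $m$file"
        n=$((n + 1))
    done
    # Print the command rather than executing it.
    echo "aria2c --dir=$dir --split=$n$urls"
}

build_cmd slackware64-current/ChangeLog.txt /tmp/mirror/slackware64-current
```

Running it prints one aria2c command line that names the same file on both mirrors, which is what lets aria2 fetch pieces of it from each server.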

MS3FGX 11-19-2009 07:12 PM

OK, I'll bite. What exactly is the advantage of this? Just to lower load on the mirror servers by distributing the download?

Couldn't you do the same thing by running a different rsync operation against each directory? Or in other words, rsync "slackware" from one server and "source" from another?

grissiom 11-20-2009 06:16 AM

First, thanks for your comment. Now, my answers:

1, Yes, one advantage is lowering the load on the server side, but this is not the most important feature. I put it in the very first post because I don't want to scare the mirror admins. The servers are always powerful, right? ;)

2, Not every mirror has an rsync service (most of them do, but not all). This script uses lftp, which can get files over ftp, http, ftps... whatever lftp supports. The more mirrors you use, the more speed you can get.

3, rsync on my network is _very_ slow, about 0.xKB/s. I don't know the reason, but it is the truth. So I can only use the ftp/http protocols to update my copies. But the connections to foreign servers are also slow, about 10~20KB/s per connection. So I had to think about a way to boost the speed: get files from more servers. The total speed is now bearable, about 100~200KB/s (roughly equal to 10*speed_per_server), and I'm satisfied with it. So the third advantage is: if you are slow with one server, you can get more with this script.

Your solution has the disadvantage that you cannot run two rsyncs in the same folder, as they may overwrite each other. There are only a limited number of folders, and a given change will not affect all of them. Say PatV upgrades firefox: only slackware64/xap and source/xap/mozilla-firefox may have changes, so you could launch just two rsync instances. That's not comparable with downloading from 10 servers at a time ;) And although rsync can download text files very efficiently, I doubt it helps much with binary files, which make up most of the tree.

GazL 11-20-2009 06:57 AM

Thanks for posting, but I don't think I'd be inclined to use anything like that as I'm never in that much of a hurry. I'm curious though regarding consistency. If you're pulling from all over the place, what happens if the mirrors are out of step with the main one? At best you'll get some sort of file not found error, at worst, you could end up with some files having the wrong contents.

grissiom 11-20-2009 07:46 AM

In my experience, Slackware never has two different packages with the same package name (i.e., file name). So if one of the mirrors is out of date, it simply won't have anything under the new file name, and aria2 will handle that. As for SlackBuild scripts and other stuff without a version number, they are very small and aria2 won't split them into many parts, so they won't be downloaded from multiple servers. But my script can't guarantee that. I may add this feature in the future. At least there are the checksums. Thanks for the advice ~;)
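The checksum fallback mentioned above can be run after a sync. This is a minimal sketch, assuming the local tree has a CHECKSUMS.md5 in md5sum format at its top level and that GNU md5sum is installed. verify_tree is a hypothetical helper, not part of mmirror-slack.sh.

```shell
#!/bin/sh
# verify_tree DIR: check every file listed in DIR/CHECKSUMS.md5 (hypothetical helper)
verify_tree() {
    (
        cd "$1" || exit 1
        # Keep only "<32 hex digits>  <path>" lines, skipping any header text,
        # and let md5sum report the files whose contents do not match.
        grep -E '^[0-9a-f]{32}  ' CHECKSUMS.md5 | md5sum -c --quiet -
    )
}
```

For example, verify_tree /ext4/slackware_rsync/slackware64-current would list only the files that fail and return non-zero if any do, which is a cheap way to spot a file that came from an out-of-step mirror.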

grissiom 11-21-2009 01:11 AM

I mailed the author of aria2 to ask what happens when several URIs point to different files. He answered that aria2 checks the file size and, if it differs, drops some of the URIs. However, it cannot guarantee which URIs will be dropped and which will be kept. So it is unlikely to corrupt files, although the downloaded file may not be the one on the main server. I have two solutions:

1, wait for 1~2 days. ;) The mirrors listed in the script are very active, and 1~2 days is enough for them to synchronize with each other.

2, run "mmirror-slack.sh -f"; then it will only download files from the main server. This could be slow, but if you have already downloaded most of the stuff from multiple servers (i.e., run mmirror-slack.sh first), it won't take too much time. It won't even hurt if you run rsync afterward, because after running mmirror-slack.sh the out-of-sync files should be SlackBuilds, txts, CHECKSUMS.md5 and other files without a version number, which are all very small.

I updated the script in the very first post. If anyone uses this script, please upgrade your local copy. Thanks.
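The second solution amounts to a two-pass run, sketched below. This is only an illustration: sync_tree and the SCRIPT variable are hypothetical names, and the ARCH and VERSION values are examples of the environment variables the script reads.

```shell
#!/bin/sh
# Hypothetical wrapper: SCRIPT defaults to mmirror-slack.sh found on $PATH.
SCRIPT=${SCRIPT:-mmirror-slack.sh}

sync_tree() {
    # Pass 1: pull the bulk of the tree from all mirrors at once.
    ARCH=x86_64 VERSION=-current "$SCRIPT" || return 1
    # Pass 2: final mode talks only to the main mirror, so anything that came
    # from an out-of-step mirror is corrected and stale local files are removed.
    ARCH=x86_64 VERSION=-current "$SCRIPT" -f
}
```

Calling sync_tree runs both passes in order. Adding -n to each call first would show the commands without executing them.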

Petri Kaukasoina 11-21-2009 02:49 AM

Quote:

Originally Posted by grissiom (Post 3763757)
But the connections to foreign servers are also slow, about 10~20KB/s per connection. So I have to think about solutions to boost the speed --- get files from more servers.

Hi

I notice that you are from China.

I analyzed the log file of my Slackware mirror (between 15th and 21st November). There were 1882 failed downloads of Slackware ISO files, from 205 unique IP addresses. According to whois, 164 of those were from China. And there were 43 successful ISO downloads, from 31 unique addresses. None were from China. Most successful downloads were from Europe, but some were from countries like Malaysia, Argentina and Colombia, which are far from my location (Finland, Europe).

The downloads from China look like this:

Sat Nov 21 07:15:34 2009 [pid 30133] [ftp] FAIL DOWNLOAD: Client "XXX.XXX.XXX.XXX", "/slackware-13.0-iso/slackware-13.0-install-dvd.iso", 161424 bytes, 1.72Kbyte/sec
Sat Nov 21 07:17:19 2009 [pid 30153] [ftp] FAIL DOWNLOAD: Client "XXX.XXX.XXX.XXX", "/slackware-13.0-iso/slackware-13.0-install-dvd.iso", 192888 bytes, 3.01Kbyte/sec
Sat Nov 21 07:23:44 2009 [pid 30184] [ftp] FAIL DOWNLOAD: Client "XXX.XXX.XXX.XXX", "/slackware-13.0-iso/slackware-13.0-install-dvd.iso", 161424 bytes, 2.12Kbyte/sec
Sat Nov 21 07:24:38 2009 [pid 30187] [ftp] FAIL DOWNLOAD: Client "XXX.XXX.XXX.XXX", "/slackware-13.0-iso/slackware-13.0-install-dvd.iso", 112176 bytes, 2.16Kbyte/sec

I hid the IP address. The same IP address tried to download the same file 93 times over a period of five hours. It always fails almost immediately, after about 100 kilobytes.
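Counts like the ones above can be pulled from a log in this format with standard tools. The sketch below assumes log lines shaped like the samples shown, with the client address in quotes after "Client". summarize_log is a hypothetical helper and not necessarily how the figures in this post were produced.

```shell
#!/bin/sh
# summarize_log FILE: count failed ISO downloads and the distinct clients behind them
summarize_log() {
    log=$1
    # Failed transfers whose requested path ends in .iso
    failed=$(grep 'FAIL DOWNLOAD' "$log" | grep -c '\.iso",')
    # Distinct client addresses among those failures
    clients=$(grep 'FAIL DOWNLOAD' "$log" | grep '\.iso",' |
        sed 's/.*Client "\([^"]*\)".*/\1/' | sort -u | wc -l | tr -d ' ')
    echo "failed=$failed unique_clients=$clients"
}
```

Running summarize_log on the server's transfer log prints both numbers on one line. Mapping the addresses to countries would still need a separate whois lookup per address.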

So, I think there is something wrong in the Chinese net.

MS3FGX 11-21-2009 09:54 AM

China uses extensive firewalling and QoS systems to control and monitor their access to the Internet, so that is very possible.

grissiom 11-21-2009 07:05 PM

OK, I admit the Chinese network has firewalls and many limitations.... So in one respect, my script can be considered a kind of "workaround" for the problem. Besides, not every network in the world is as fast as those in Europe, the USA or Japan; I think many developing countries don't have very fast networks yet, so they may benefit from my script. And people on fast networks could use my script to get even faster, although there is less room for improvement... ;)


All times are GMT -5.