LinuxQuestions.org


Vanyel 09-12-2012 03:47 PM

How best to speed up an rsync/rsnapshot backup
 
I have a CentOS 6 box with rsnapshot installed (rsync 3.0.6 is on the machine), backing up several Linux boxes and one Windows XP machine that is connected to an instrument gathering scientific data.

The two data folders on the XP box are shared to the server via standard SMB file sharing and mounted "locally" on the Linux box, where rsnapshot backs them up to an internal array.

This worked fine initially, but months later I see that the scientific software regularly generates HUNDREDS of folders with hundreds of tiny files in each. The backup has become a mess: walking the folder tree takes so long that one backup never finishes before the next one is supposed to begin. It also ends up running simultaneously with a second, unrelated rsnapshot instance, and the machine bogs down horribly while both are going.

How can I speed this up?

The Linux server already has a lot of memory (12 GB). Both machines have gigabit Ethernet. I hear SMB is much slower than NFS. Would putting NFS on the Windows box be a good way to get some more speed from this?

OR should I install rsync for Windows on the PC and forgo mounting the folders, letting rsync talk machine to machine?

Or does someone have a better idea than either?
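
On the overlap problem described above (one run starting before the previous one has finished), a wrapper that refuses to start while another copy holds a lock is one common approach. The following is only a minimal sketch, assuming a Linux host: the function name `try_lock` and any lock path passed to it are made up for illustration and are not part of rsnapshot.

```python
import fcntl


def try_lock(path):
    """Return an open file holding an exclusive lock on path,
    or None if some other holder already has the lock."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle  # keep this open for as long as the backup runs
```

A nightly wrapper would call `try_lock` first and exit if it returns None, so a slow run is skipped over rather than doubled up. The lock is released when the returned file is closed or the process exits.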

SecretCode 09-12-2012 05:38 PM

I think rsync on Windows would be the fastest solution. But that's not a scientific opinion.

How often does your backup run? How many files & folders are we talking about?

Vanyel 09-13-2012 09:55 AM

I want it to run nightly. As for how many? Hundreds of folders with little files! I don’t know. Too damn much. More added every day. Not lots of changes, but rsync doesn’t know what’s changed until after it’s looked at everything.
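
For putting an actual number on "how many", a single walk of the mounted share is enough. This is a rough sketch only; `count_tree` is a hypothetical helper name, and the path you pass in would be wherever the share is mounted.

```python
import os


def count_tree(root):
    """Walk root once and return (directories, files) found beneath it."""
    dirs = files = 0
    for _path, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return dirs, files
```

Timing this walk over the mount also gives a feel for how much of the backup time is just the tree traversal, since rsync has to do a similar walk before it copies anything.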

TenTenths 09-13-2012 10:25 AM

Have you thought about using dirvish to only rsync changed files each time?

Vanyel 09-13-2012 12:44 PM

“dirvish”? Never heard of it. I’ll keep that in mind for future reference. Of course, rsync (what the rsnapshot script is built on) will only sync changed files ... it’s just that it needs to look at all the files in the source first and compare them all to the destination before it knows what’s changed, and with hundreds of files that takes time.

I did find out some other things while surfing around though.

Rsync doesn’t use the same delta algorithm to work out what has changed and what needs to be copied when both source and destination are local (or locally mounted) filesystems as it does when going over the network. Addressing my locally mounted folder as “localhost:/source” instead of “/source” should change that, making it, in effect, a network backup.
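
As an aside, the local-versus-network difference comes from rsync defaulting to `--whole-file` when both paths are local, and rsync also accepts `--no-whole-file` to switch the delta algorithm back on without going through localhost. That is a different technique from the `localhost:` addressing described above, not the same thing. A small sketch of building such a command line, where `rsync_args` and its parameters are hypothetical names for illustration:

```python
def rsync_args(src, dst, force_delta=True):
    """Build an rsync argument list for a local-to-local copy."""
    args = ["rsync", "-a"]
    if force_delta:
        # Local copies default to --whole-file; this re-enables delta transfer.
        args.append("--no-whole-file")
    args += [src, dst]
    return args
```

The list can be handed to `subprocess.run` as-is. Note that this only changes how file contents are transferred; it does not change how long rsync spends walking the tree.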

Windows filesystem timestamps resolve with 2-second accuracy, not 1-second like Linux. So going by timestamps, a Windows file may look different to Linux when it actually is not. Lengthening rsync’s modify window (the --modify-window option) can overcome that (I chose 3 seconds just to be safe, so any file whose modification times differ by 3 seconds or less is considered unchanged).

Windows Services for UNIX is free these days and lets me export NFS from a WinXP box! It is also something of a one-click install. See http://technonstop.com/tutorial-setu...server-windows . This will let me easily get away from SMB to a faster protocol.
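
To make the modify-window idea concrete: rsync's quick check compares size and modification time, and the window just loosens the time comparison. The sketch below imitates that check in Python under that assumption; `same_mtime` and `looks_unchanged` are illustrative names, not rsync internals.

```python
import os


def same_mtime(src_mtime, dst_mtime, window=0):
    """Treat two modification times (in seconds) as equal
    if they differ by no more than window seconds."""
    return abs(src_mtime - dst_mtime) <= window


def looks_unchanged(src, dst, window=0):
    """Size-and-mtime check: True if dst appears to match src."""
    s, d = os.stat(src), os.stat(dst)
    return s.st_size == d.st_size and same_mtime(s.st_mtime, d.st_mtime, window)
```

With a window of 3, two copies whose timestamps are 2 seconds apart are treated as unchanged, while with the default window of 0 they would be flagged for transfer.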

I’ll see tonight if these changes add up to enough of a difference. Wish me luck!

Vanyel 09-13-2012 12:48 PM

I’m leery about installing rsync on the Windows box and doing away with mounts for direct network copying instead, because I notice large (i.e. multi-gigabyte) rsync transfers can bog down a Linux box, and I don’t want to chance the same thing happening to the Windows PC. I don’t have control over when it’s being used for data acquisition at the keyboard.

SecretCode 09-13-2012 02:14 PM

Can you do it once as a test? And are you sure all the file I/O - which has to happen somehow - isn't already bogging down the Windows machine?


All times are GMT -5.