An alternative to rsync?

Skaperen · 11-01-2012, 09:21 PM

I would like to know if there is an alternative to rsync for doing backups.

I currently use rsync to make incremental backups based on what is changed. I use the backup feature of rsync by configuring it to move replaced or deleted files into an "arch/YYYY-mm-dd" directory. The current copy of the backed up file tree is in "sync". Both "arch" and "sync" are in a directory designated for each backup configuration.
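For illustration, the setup described above could be driven by something like the following. This is only a sketch: the function name and the example paths are made up, and the exact option set in use is not stated in the post. `--backup` and `--backup-dir` are the rsync options that move replaced or deleted files aside.

```python
import datetime
import posixpath

def build_rsync_command(source, backup_root, today=None):
    """Build an rsync argv list (hypothetical helper, not the poster's script).

    The current copy goes to <backup_root>/sync, and files that get replaced
    or deleted are moved to <backup_root>/arch/YYYY-mm-dd.
    """
    today = today or datetime.date.today()
    arch_dir = posixpath.join(backup_root, "arch", today.strftime("%Y-%m-%d"))
    sync_dir = posixpath.join(backup_root, "sync")
    return [
        "rsync",
        "--archive",                  # recurse and preserve metadata
        "--delete",                   # remove files that vanished from the source...
        "--backup",                   # ...but keep replaced/deleted files
        "--backup-dir=" + arch_dir,   # in the dated archive directory
        source.rstrip("/") + "/",     # trailing slash: copy the contents
        sync_dir + "/",
    ]
```

The returned list can be handed to `subprocess.run()`; printing it first is an easy way to check the paths before anything is moved.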

The primary problem is these backups get very large, with many millions of files. What rsync does is collect all the names of all the files into memory before doing any real transfers of data. This is putting a lot of memory stress on the source (to be backed up) and target (where backups are saved) systems.

I believe such memory hogging is really not needed. At any one time, the most that should be needed is to keep the names of all files in each of the directory levels down to the one where the backup activity is currently working ... not the entire tree. So I am looking for something that can do these things rsync does, but without doing this reading of all files in the entire tree.
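To make the idea concrete, here is a minimal sketch of a walk that only ever holds the names for the directories on the path it is currently working in. It is not how rsync works internally, and the function name is made up for this example.

```python
import os

def walk_low_memory(top):
    """Yield paths of non-directory entries under `top`, depth first.

    Each level of recursion keeps only the sorted name list of one directory,
    so memory use follows the current path, not the size of the whole tree.
    """
    for name in sorted(os.listdir(top)):
        full = os.path.join(top, name)
        # Descend into real directories only; symlinks are reported, not followed.
        if os.path.isdir(full) and not os.path.islink(full):
            yield from walk_low_memory(full)
        else:
            yield full
```

Because the names come out in sorted order, a source and a target walked this way could be compared in lockstep, one directory at a time.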

There is also a secondary problem. I want to make a backup of the backup. This is resulting in double transfers. Because files in "sync" (on the target, which is now the source for this secondary backup) get moved to "arch" when replaced, the new target gets these files transferred to it as new files in "arch", even though it has a copy in "sync". I tried the --fuzzy option on rsync to see if it would find the duplicate "somewhere else". This has not worked.

If a new program is made to specifically deal with this, and synchronize primary backups to secondary backups with a minimum of transferred data (replicate the previous moves), that would be great. But it will still need to do smart data-incremental transfers, where files being replaced still have most of the old data, just like rsync was originally designed for.

If there is no such existing program, is there any interest in one being developed that focuses on incremental primary and secondary backups?

sag47 · 11-01-2012, 11:25 PM

Which version of rsync are you using? My rsync 3.0.8 man page says:

Quote:

Beginning with rsync 3.0.0, the recursive algorithm used is now an incremental scan that uses much less memory than before and begins the transfer after the scanning of the first few directories have been completed. This incremental scan only affects our recursion algorithm, and does not change a non-recursive transfer. It is also only possible when both ends of the transfer are at least version 3.0.0.

Perhaps you need to update your rsync package or remove it and compile the latest rsync from sources.

SAM

Skaperen · 11-03-2012, 03:32 PM

The version varies from 3.0.6 to 3.0.8 depending on the system. I'm not seeing any incremental effect. It still scans the ENTIRE tree before doing any file transfers.

I don't know why they ever did it that way. Everything I see rsync doing can be done with only the names in memory for the directories it is currently working in. I guess this was all designed back in the days of not so many files on small disks.

rknichols · 11-03-2012, 07:58 PM

What options are you using? In the rsync manpage, the description for the "--recursive" option lists several other options that disable the incremental recursion mode and thus require much more memory.
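A quick way to check a command line against that list might look like the sketch below. The option names are the ones the rsync 3.0.x man page gives, as far as I recall, as needing the full file list, plus the explicit `--no-inc-recursive` switch; verify them against the man page on each system. The function name is made up, and bundled short options such as `-avm` are not unpacked.

```python
# Options believed to turn off incremental recursion (check your own man page).
DISABLES_INC_RECURSION = {
    "--delete-before",
    "--delete-after",
    "--prune-empty-dirs", "-m",
    "--delay-updates",
    "--no-inc-recursive", "--no-i-r",
}

def options_disabling_incremental(argv):
    """Return the arguments in `argv` that appear in the set above."""
    return [arg for arg in argv if arg in DISABLES_INC_RECURSION]
```

For example, `options_disabling_incremental(["rsync", "-a", "--delete-after", "src/", "dst/"])` returns `["--delete-after"]`, while a command using only `-a --delete --backup` comes back empty.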

jefro · 11-04-2012, 10:20 AM

If you want to go wild, some of the newer file systems could be set up to provide similar data protection. Btrfs and ZFS could be used to keep backups of data.

To go old school, a tar.gz on cron might do fine.