I would like to know if there is an alternative to rsync for doing backups.
I currently use rsync to make incremental backups based on what has changed. I use rsync's backup feature, configured to move replaced or deleted files into an "arch/YYYY-mm-dd" directory. The current copy of the backed-up file tree is in "sync". Both "arch" and "sync" live in a directory designated for each backup configuration.
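For readers who want to picture that layout, here is a minimal sketch of how such an rsync command line might be assembled. The function name and the example paths are made up for illustration, and it assumes the target is a local path; the poster's actual options are not given in the thread.

```python
import datetime

def build_rsync_command(source, target_base, today=None):
    """Return an rsync argv list for the sync/arch layout (hypothetical helper)."""
    today = today or datetime.date.today()
    # Replaced or deleted files end up under arch/YYYY-mm-dd on the target.
    arch_dir = f"{target_base}/arch/{today:%Y-%m-%d}"
    return [
        "rsync",
        "--archive",                  # recurse and preserve metadata
        "--delete",                   # remove files that vanished from the source
        "--backup",                   # keep what would be overwritten or deleted
        f"--backup-dir={arch_dir}",   # ... and put it in the dated arch directory
        f"{source.rstrip('/')}/",     # trailing slash: copy the directory contents
        f"{target_base}/sync/",       # current copy of the tree
    ]
```

The list can be passed to subprocess.run() as-is. Using an absolute --backup-dir avoids having to remember that a relative one is resolved against the destination directory.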
The primary problem is these backups get very large, with many millions of files. What rsync does is collect all the names of all the files into memory before doing any real transfers of data. This is putting a lot of memory stress on the source (to be backed up) and target (where backups are saved) systems.
I believe such memory hogging is really not needed. At any one time, the most that should be needed is to keep the names of all files in each of the directory levels down to the one where the backup activity is currently working ... not the entire tree. So I am looking for something that can do these things rsync does, but without doing this reading of all files in the entire tree.
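To illustrate the memory argument, here is a rough Python sketch of a depth-first walk that only ever holds the listings of the directories on the current path, not the whole tree. It is not how rsync works internally; the function name is invented for this example.

```python
import os

def walk_low_memory(root):
    """Yield file paths depth-first, keeping one listing per directory level."""
    with os.scandir(root) as it:
        # Only this directory's entries are held here; parent listings are
        # held by the callers further up the recursion, and nothing else.
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            # Descend before moving on, so siblings are processed in order.
            yield from walk_low_memory(entry.path)
        else:
            yield entry.path
```

Memory use grows with the depth of the tree and the size of the directories along the current path, rather than with the total number of files.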
There is also a secondary problem. I want to make a backup of the backup, and this results in double transfers. When a file in "sync" on the target (which is now the source for the secondary backup) is replaced, it gets moved to "arch". The new target then receives that file as a new file in "arch", even though it already has a copy in its own "sync". I tried rsync's --fuzzy option to see if it would find the duplicate "somewhere else", but this has not worked.
If a new program is made to specifically deal with this, and synchronize primary backups to secondary backups with a minimum of transferred data (replicate the previous moves), that would be great. But it will still need to do smart data-incremental transfers where files being replaced still have most of the old data just like rsync was originally designed for.
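One way such a tool might replicate the moves, sketched very roughly in Python: before syncing the secondary, hard-link each file the primary has archived from the secondary's own "sync" copy into the secondary's "arch" directory. Everything here is an assumption for illustration: the function name is invented, both trees are assumed to be reachable as local paths on the same file system, and "same file" is guessed from size and whole-second mtime only.

```python
import os

def prelink_archived(primary_arch, secondary_sync, secondary_arch):
    """Hard-link files the primary archived from the secondary's sync copy."""
    linked = []
    for dirpath, _dirs, files in os.walk(primary_arch):
        rel_dir = os.path.relpath(dirpath, primary_arch)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            src = os.path.join(secondary_sync, rel)   # old copy already on the secondary
            dst = os.path.join(secondary_arch, rel)   # where the archived copy should go
            if os.path.lexists(dst) or not os.path.isfile(src):
                continue
            want = os.stat(os.path.join(dirpath, name))
            have = os.stat(src)
            # Heuristic match only: same size and same whole-second mtime.
            if want.st_size == have.st_size and int(want.st_mtime) == int(have.st_mtime):
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                os.link(src, dst)   # no data is copied or transferred
                linked.append(rel)
    return linked
```

Linking rather than moving leaves the old file in "sync" as a basis for rsync's delta transfer of the new version. This relies on rsync writing a temporary file and renaming it over the old one, so it would not be safe together with --inplace.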
If there is no such existing program, is there any interest in one being developed that focuses on incremental primary and secondary backups?
Which version of rsync are you using? My rsync 3.0.8 man page says:
Quote:
Beginning with rsync 3.0.0, the recursive algorithm used is now an incremental scan that uses much less memory than before and begins the transfer after the scanning of the first few directories have been completed. This incremental scan only affects our recursion algorithm, and does not change a non-recursive transfer. It is also only possible when both ends of the transfer are at least version 3.0.0.
Perhaps you need to update your rsync package or remove it and compile the latest rsync from sources.
The versions vary from 3.0.6 to 3.0.8 depending on the system. I'm not seeing any incremental effect. It still scans the ENTIRE tree before doing any file transfers.
I don't know why they ever did it that way. Everything I see rsync doing can be done with only having the names in memory for the directories it is currently working at. I guess this was all designed back in the days of not so many files on small disks.
What options are you using? In the rsync manpage, the description for the "--recursive" option lists several other options that disable the incremental recursion mode and thus require much more memory.
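As a quick way to check a command line, here is a small Python sketch that flags the options which, as far as I read the rsync 3.0.x man page, need the full file list and so turn incremental recursion off. The helper name is made up, it only looks at long option names (not short forms such as -m), and the list should be verified against the man page of the installed version.

```python
# Options the rsync 3.0.x man page describes as needing the full file list.
# Assumption: this list matches your version; check "man rsync" to be sure.
FULL_LIST_OPTIONS = {
    "--delete-before",
    "--delete-after",
    "--prune-empty-dirs",
    "--delay-updates",
}

def options_disabling_incremental(argv):
    """Return the long options in argv that disable incremental recursion."""
    return [arg for arg in argv if arg in FULL_LIST_OPTIONS]
```

For example, options_disabling_incremental(["rsync", "-a", "--delete-after", "src/", "dst/"]) returns ["--delete-after"], while a plain --delete is not flagged.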
If you want to go wild, some of the newer file systems could be set up to provide similar data protection. Btrfs and ZFS could be used to keep backups of data.