rsync incremental backups to be restored
With rsync's --backup and --backup-dir= options, I can give it another tree in which to put files that are deleted or replaced. I'm hoping it fills out that tree with a replica of the original directory paths (at least for the files put there); otherwise it's a show stopper. What I want to find out concerns restoring files.
Assume that each time I run rsync (once a day) I make a new directory tree, named by the date, to serve as the backup directory. To restore, for each file path in the tree I would start with whatever is in the main tree (the rsync target) and work backwards through the incremental trees until I reach the date I want to restore to. Whenever I encounter a file in an incremental along the way, I would replace the file currently at that path with it. By the time I get back to the given date, I should have the version of the file that was present on that date. Do this for each file in the tree and it should be a full restore.
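The walk-back described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `restore_to_date` and the directory layout are hypothetical, the incremental trees are assumed to be named with ISO dates (YYYY-MM-DD) so they sort correctly as strings, and each dated tree is assumed to hold the versions that day's run replaced or deleted. It ignores symlinks, ownership and permissions, and it makes no attempt to remove files that were created after the restore date.

```python
import shutil
from pathlib import Path


def restore_to_date(main_tree, incrementals_root, restore_date, dest):
    """Rebuild the tree as it stood after the rsync run on restore_date.

    Assumptions (hypothetical layout): incrementals_root contains one
    directory per run, named YYYY-MM-DD, holding the files that run
    replaced or deleted. dest must not exist yet.
    """
    main_tree = Path(main_tree)
    incrementals_root = Path(incrementals_root)
    dest = Path(dest)

    # Start from the current contents of the rsync target.
    shutil.copytree(main_tree, dest)

    # Every run newer than the restore date has to be undone, newest first.
    newer_days = sorted(
        (d.name for d in incrementals_root.iterdir()
         if d.is_dir() and d.name > restore_date),
        reverse=True,
    )
    for day in newer_days:
        # Overlay the saved older versions on top of what is there now
        # (dirs_exist_ok needs Python 3.8 or later).
        shutil.copytree(incrementals_root / day, dest, dirs_exist_ok=True)
    return newer_days
```

For example, `restore_to_date("/backup/main", "/backup/incr", "2024-01-02", "/restore/out")` (paths made up for illustration) would overlay every incremental dated after 2024-01-02, newest first.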
But here is the hard part, it seems. What about files that did not exist on the intended restore date, but were created on some later date? For a correct restore, I'd want such files to be absent from the restored tree (just as they were absent from the source tree on that date).
How can such a restore be done so that it correctly excludes these files? Wouldn't rsync have to store some kind of sentinel indicating that the file did not exist on prior dates?
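To make the sentinel idea concrete, here is a small in-memory model. It is a sketch, not something rsync provides: it assumes a hypothetical per-day record with a "created" set (paths that first appeared in that day's run) kept alongside the "saved" files that --backup-dir would hold. How the "created" set gets produced, for instance by comparing file listings between runs, is left open.

```python
def restore_state(current, days, restore_date):
    """Return {path: content} as it stood after the run on restore_date.

    current: {path: content} for the main tree today.
    days:    {date: {"created": set of paths, "saved": {path: old content}}}
             Hypothetical per-run records; "created" plays the role of
             the sentinel, "saved" models the --backup-dir tree.
    Dates are ISO strings (YYYY-MM-DD), so string order is date order.
    """
    state = dict(current)
    for day in sorted(days, reverse=True):
        if day <= restore_date:
            break
        # Undo creations first: these paths did not exist before this run.
        for path in days[day]["created"]:
            state.pop(path, None)
        # Then put back whatever this run replaced or deleted.
        state.update(days[day]["saved"])
    return state


# Worked example: "new" was created on the 3rd, "tmp" was created on the
# 2nd and deleted on the 3rd, and "a" changed on both days.
current = {"a": "v3", "new": "n"}
days = {
    "2024-01-02": {"created": {"tmp"}, "saved": {"a": "v1"}},
    "2024-01-03": {"created": {"new"}, "saved": {"a": "v2", "tmp": "t"}},
}
state_on_jan_1 = restore_state(current, days, "2024-01-01")
state_on_jan_2 = restore_state(current, days, "2024-01-02")
```

Removing the created paths before overlaying the saved files, one day at a time, matters: it keeps a file that was deleted and later re-created at the same path from being dropped by mistake.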
I suspect someone might suggest I just make a complete hard-linked replica tree for each date, so that absent files are clearly absent. I can assure you this is completely impractical, because I have actually done it before. I ended up with backup filesystems that had so many directories and nodes that it could take over a day, maybe even days, just to run something like "du -s" on them. I intend to keep daily changes for at least a couple of years, if not more. That means the 40 million plus files would be multiplied by over 700, so programs like "du -s" would have to check over 28 BILLION file names (and that assumes the number of files does not grow over the next two years). Let's not go that way.
Do you really need every day's backup to be immediately available? I'd probably split it into two parts:
- rolling rsync backups with hard links to keep 30 days' worth readily available
- generate a tarball every month and keep it for 2 years
The contents of each tarball can be listed and dumped into a text file when it's generated, so you can grep for a particular file later.
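A minimal sketch of the tarball index idea, using Python's standard tarfile module. The function name `write_tar_index` and the file names are made up for illustration; running `tar -tf` and redirecting its output to a file would do the same job.

```python
import tarfile


def write_tar_index(tar_path, index_path):
    """Write one member name per line to index_path; return the count."""
    count = 0
    # "r:*" lets tarfile detect the compression (none, gzip, bz2, xz).
    with tarfile.open(tar_path, "r:*") as tar, \
            open(index_path, "w", encoding="utf-8") as out:
        for member in tar:
            out.write(member.name + "\n")
            count += 1
    return count
```

Calling something like `write_tar_index("backup-2024-01.tar.gz", "backup-2024-01.txt")` once per monthly tarball leaves a plain text file that can be grepped without opening the archive.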
You can try Rsnapshot: http://rsnapshot.org/ <plug>I also wrote one in bash before I realised Rsnapshot existed :) - snap_create</plug>
Yes, daily is needed for most of it. It might even be needed more often than daily. Weekends can probably be excluded.
I just read the article linked by rsnapshot. It looks like what I used to do, which is basically a hard-linked replica that I'm trying to avoid because it won't scale to this huge project.
I'm working on my own project to make something that will do the backup increments correctly (which --backup-dir= won't do) so a restore will be correct within the day resolution. But I need to verify that there isn't some other way to do it before I can justify using my project for this.