LinuxQuestions.org (/questions/)
-   Linux - Software (https://www.linuxquestions.org/questions/linux-software-2/)
-   -   Periodic backup of changes on another disk from full backup (https://www.linuxquestions.org/questions/linux-software-2/periodic-backup-of-changes-on-another-disk-from-full-backup-4175523017/)

alleyoopster 10-22-2014 06:40 AM

Periodic backup of changes on another disk from full backup
 
I am looking for a backup solution that will enable me to backup only changes to an internal disk after a full backup to an external disk. Also, this should run snapshot backups every day until the next full backup.

Day 1: Full Backup / > External
Day 2: Changes to /home after full > internal
Day 3: Changes to /home after full > internal

etc

Currently I am using a simple rsync script for the full backup

I have looked at and previously used rsnapshot, and recently Back in Time, which both keep transparent snapshots on internal space at timed intervals. Both work great, but as far as I know both rely on the full backup being in the same location first.

VG1
root
backups here

VG2
home

I am running Linux with LVM and two VGs (one holds home; the other holds the root LV and the backup LV). If LVM snapshots can be used in this, then I am open to suggestions. Something similar to https://btrfs.wiki.kernel.org/index....emental_Backup may work if I understand it correctly, but I'm not ready to change to btrfs yet.

linosaurusroot 10-22-2014 10:41 AM

dump/restore (for ext* type filesystems) has levels where 0 means full and 1 means everything since full.
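For example (a rough sketch; the device and file names here are made up, and -u records the dump date in /etc/dumpdates so that later levels know what "since full" means):

```shell
# Level 0: full dump of an ext* filesystem to the external drive
dump -0u -f /mnt/external/root-full.dump /dev/sda2

# Level 1: everything changed since the last level-0 dump
dump -1u -f /mnt/internal/root-incr.dump /dev/sda2
```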

alleyoopster 10-22-2014 03:21 PM

Thanks, I didn't know about dump. The only problem is that the files won't remain transparent, as in I can't restore an individual file, which is the advantage of an rsync or rsnapshot-style backup. If I lose a file or need to get a file from last week, I don't want to restore the whole partition.

linosaurusroot 10-23-2014 05:23 AM

Quote:

Originally Posted by alleyoopster (Post 5257899)
I can't restore an individual file .... If I lose a file or need to get a file from last week, I don't want to restore the whole partition.

False - pick an empty directory and do
Code:

restore ivf /name/of/dumpfile
and select the file(s) to extract under your CWD. Then move them to where you want.

alleyoopster 10-23-2014 07:03 AM

Fantastic. Thanks, I'll look into it. I noticed XFS has this as well, which would be preferable for me, if it gets the same result.

Beryllos 10-23-2014 09:54 AM

Since you are already using rsync, you should look into the --link-dest=DIR option.

It works by hardlinks, so one of the requirements is that the destination filesystem must have hardlink capability (as ext3 and ext4 do, and I think even ntfs does).

It creates what looks like a full backup; you see a directory with all the files. However, files which have not changed since the last backup are actually stored as hardlinks to the previously saved files, which saves a lot of disk space. Files which were changed or deleted since the last backup can be retrieved by looking in the previous backup directories.

Here's how it works:

First you create the full backup. If you backup no more than once a day, you could name the backup directory according to the date as in this example:
Code:

#!/bin/bash
backup_dir=/backup_drive/home_backup_directory
today=$(date +%Y-%m-%d)    # or equivalently $(date +%F) if you have it

/usr/bin/rsync -a /home/ $backup_dir/$today

For incremental backups, add the --link-dest=DIR option:
Code:

last_backup=$(ls -1A $backup_dir | tail -1)

/usr/bin/rsync -a --link-dest=$backup_dir/$last_backup /home/ $backup_dir/$today

If you want (as I certainly would), you could add some error checking to make sure $last_backup is valid. It might not be, for example, if the external drive is offline, or if $(ls -1A $backup_dir/ | tail -1) points to a regular file or a directory other than the most recent backup.
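A minimal sketch of such a check (the directory layout and YYYY-MM-DD naming follow the example above; the demo sets up a throwaway directory rather than touching a real backup drive):

```shell
#!/bin/bash
# Demo setup: stands in for the real backup drive
backup_dir=$(mktemp -d)
mkdir "$backup_dir/2014-10-20" "$backup_dir/2014-10-23"
touch "$backup_dir/stray_file"   # a non-directory that a naive 'tail -1' could pick up

# Pick the newest date-named directory, skipping anything that isn't one
last_backup=""
for entry in "$backup_dir"/*; do
    name=$(basename "$entry")
    if [[ -d $entry && $name =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
        last_backup=$name
    fi
done

if [[ -n $last_backup ]]; then
    echo "would run: rsync -a --link-dest=$backup_dir/$last_backup /home/ ..."
else
    echo "no valid previous backup; falling back to a full copy"
fi
```

If no valid previous backup is found, dropping the --link-dest option entirely makes rsync simply do a fresh full copy, which is a safe fallback.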

Additional note added in editing: One of the beautiful things about hardlinking backups in this way is that you can delete old, unneeded backup directories in any order. If you no longer need the first full backup, go ahead and delete it; the incremental backups will still have all the files. Each hardlink is independently associated with the actual file, so as long as at least one hardlink remains, the file is still there. When you delete the last hardlink, the file is gone.
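This is easy to demonstrate with ln alone (a toy example, independent of any backup tool):

```shell
#!/bin/bash
dir=$(mktemp -d)
mkdir "$dir/full" "$dir/incr"

echo "important data" > "$dir/full/file"
ln "$dir/full/file" "$dir/incr/file"   # second hardlink; no extra data stored

rm -r "$dir/full"                      # delete the "full backup" directory
cat "$dir/incr/file"                   # prints: important data
```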

alleyoopster 10-23-2014 12:11 PM

Quote:

Originally Posted by Beryllos (Post 5258327)
Since you are already using rsync, you should look into the --link-dest=DIR option.

Thanks for the detailed answer. Hard links would be just the thing for this. Your suggestion is much like rsnapshot. There seems to be a problem though. The external drive is only connected for the full backup and after this stored in another location. With this method doesn't rsync need to see the full backup before it can run an incremental backup?

Beryllos 10-23-2014 01:08 PM

Quote:

Originally Posted by alleyoopster (Post 5258398)
Thanks for the detailed answer. Hard links would be just the thing for this. Your suggestion is much like rsnapshot. There seems to be a problem though. The external drive is only connected for the full backup and after this stored in another location. With this method doesn't rsync need to see the full backup before it can run an incremental backup?

Yes. That is a problem. The hardlinks and the full backup must be stored within the same filesystem.

If the drive is attached to another computer, and accessible via the network, rsync can handle that. It would send the new files and changed files over the network, to be stored on the same filesystem as the full backup. This can be done efficiently by compressing the data stream, and securely with ssh as the transport protocol. I've heard of people doing something like that, creating the full backup locally because it's much faster for a large file system, then taking the drive to their remote office or facility, and then performing incremental backups remotely.
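For instance (a sketch only; the hostname, user, and paths are invented):

```shell
# Incremental backup over the network: compress the stream (-z),
# tunnel over ssh, and hardlink unchanged files against the
# previous backup already on the remote drive
rsync -az -e ssh --link-dest=/backups/home/2014-10-20 \
      /home/ user@remote-office:/backups/home/2014-10-23
```

With a remote destination, the --link-dest path is resolved on the receiving machine, which is exactly what is needed here.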

If the full backup is off the network, or powered off in a storage closet or a safe, this rsync method won't do what you need.

Edit: Your idea to keep the backup at a remote location is an excellent one. It's tragic when the computer and the backup are lost together, as might happen in a fire or theft.

alleyoopster 10-24-2014 04:16 AM

I have been looking at the xfsdump method, and it seems that each time it writes a full backup it has to write all the data again; it cannot update just the changes. This would be a fail.

As for the rsync method, I have an idea: if I create an LVM snapshot of the current /home and then specify this location as the --link-dest, it may fool rsync into thinking that the snapshot was the last backup and then increment the backup from the current /home. I'm not sure what rsync uses to check the last backup.

Haven't tested either of the above yet.

linosaurusroot 10-24-2014 05:22 AM

If you ask for a full backup it writes all the data, because that's what you asked for. To do a "differential" dump that writes only the changes, you use a different dump level number.

alleyoopster 10-24-2014 08:19 AM

Quote:

Originally Posted by linosaurusroot (Post 5258712)
If you ask for a full backup it writes all the data because that's what you asked for. To do a "differential" dump that writes only the changes you use a different dump level number.

I did ask for a full backup and that is what I want: a periodic full backup to the external disk. In practical terms, dump sends all the data to that disk each time, whereas rsync (my current method for the full backup) writes and deletes only what has changed, in effect mirroring the data. With a large volume (which this is), the time and system resources needed to run a full backup are significantly greater with dump. Rsync can often take only a few minutes where dump would take hours.

The dump method does have the advantage of keeping a record of the backup and using that for further backups. What I think you're suggesting is to write differential backups to the external disk and incrementals to the internal disk. As I understand it, the external would grow and grow. Also, restoring would be more difficult. Wouldn't I at some point need to run another huge full backup? I only know a little about dump, so I may be missing something.

Beryllos 10-24-2014 09:00 PM

Sorry, I didn't read your original post closely enough. Now that I understand what you want, I have another suggestion. It might not be too hard to write a script to select and back up the files that have been created or modified since the last backup, whether full or incremental.

Immediately before the full backup, run rsync with the --list-only option and redirect the output to a file on your internal drive:
Code:

today=$(date +%F)
/usr/bin/rsync -a --list-only /home/ $external_backup_dir/$today > previous_file_list

or equivalently
Code:

/usr/bin/rsync -r /home/ > previous_file_list
That does not back up anything, but creates a list of all files and directories that would have been backed up, with each file's size and modification time. Those are the properties that rsync uses by default to determine whether a file needs to be backed up. At this point, we need only to save it for later use.

Then execute the rsync full backup to your external drive.
Code:

/usr/bin/rsync -a /home/ $external_backup_dir/$today
To begin the incremental backup at some later date, make an updated file list:
Code:

/usr/bin/rsync -r /home/ > current_file_list
Then compare the previous and current lists. At the moment, I couldn't tell you exactly how to do that, but I suspect it can be done without much difficulty in bash; perhaps identify the unchanged items and eliminate them, and also eliminate directories. For each file which has been created or modified, put its name with the full path into a backup to-do list. Then tell rsync to backup from that list to your internal backup directory:
Code:

/usr/bin/rsync -a --files-from=$backup_todo_list /home/ $internal_backup_dir/$today
At this point, you could also identify and list the files that were deleted since the previous backup.

The final step is to update the file list for the next incremental backup:
Code:

mv -f current_file_list previous_file_list
Please note: I haven't tested any of this. There may be syntax errors and/or logical errors that I don't know about.

alleyoopster 10-25-2014 03:11 AM

That looks promising. I need to have a look at the compare as I also cannot see an obvious solution at the moment.


What I notice is that using --list-only gives different, and most likely more usable, output than -r.

Beryllos 10-25-2014 09:41 AM

Quote:

Originally Posted by alleyoopster (Post 5259270)
That looks promising. I need to have a look at the compare as I also cannot see an obvious solution at the moment.

I would start with the diff command:
Code:

diff previous_file_list current_file_list
That still leaves a lot of work to do. The output of diff will differ somewhat depending on whether the file was created, deleted, or modified. For any of those three cases, the containing directory also changes and this shows up in the diff output. If I were doing this, I would ignore all changes to directories and analyze only the changes of files.
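Here is one way that comparison could go, using comm on sorted lists instead of parsing raw diff output (the list format below is a made-up stand-in for rsync's listing; only the idea matters):

```shell
#!/bin/bash
# Toy lists standing in for previous_file_list / current_file_list
work=$(mktemp -d)
cat > "$work/previous" <<'EOF'
/home/user/a.txt 100 2014-10-20
/home/user/b.txt 250 2014-10-20
EOF
cat > "$work/current" <<'EOF'
/home/user/a.txt 100 2014-10-20
/home/user/b.txt 300 2014-10-23
/home/user/c.txt 50 2014-10-23
EOF

# Lines present only in the current list = new or modified files;
# keep just the path column for rsync's --files-from
comm -13 <(sort "$work/previous") <(sort "$work/current") \
    | awk '{print $1}' > "$work/todo"

cat "$work/todo"   # b.txt (modified) and c.txt (new); a.txt is unchanged
```

comm -13 prints the lines unique to the second file, i.e. entries that are new or whose size/mtime changed; the files deleted since the previous backup would be the lines unique to the first file (comm -23).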


Quote:

Originally Posted by alleyoopster (Post 5259270)
What I notice is that using --list-only gives a different and most likely more usable output to -r

Funny, I see the exact same output, but it might be due to the content of the source directory. The difference is probably not due to --list-only but rather due to using -r instead of -a. The -a option processes symlinks but I think -r doesn't. It would be better to use -a, or even better to use your exact rsync command plus the --list-only option. Edit: No, I just tried it with symlinks, and I still see no difference between the outputs of those rsync commands. It must be something else.

alleyoopster 10-26-2014 06:20 AM

There have been some issues this weekend with the system. I lost a drive last week, which prompted the search for a better backup solution. I replaced the disk with a relatively new disk I was using for backups, but now I'm getting Buffer I/O errors and intermittent system hangs. I checked the old drive and that is definitely dead. The problem looked like a SATA cable for a long time, but today it is looking more like a SATA port.

So, I'm putting the backup solution on hold for a bit until I can sort out the system. Thank you both for your help so far with this.
