LinuxQuestions.org

JosephS 07-13-2011 01:37 PM

Rsync: backups and hard links
 
I am using rsync for incremental backups to a second hard drive in my computer. When I check the individual backup directories (backup.0 through backup.4) with du -hs, each shows 12G; when I check the parent directory, squeeze, it shows 15G. So over 4 backups I have added 3G. I haven't made many changes to the directories I'm backing up, and I am using hard links. I have included some info below. Maybe someone can show me what is wrong.

Quote:

Backup script:
#!/bin/bash
mount /mnt/backup
cd /mnt/backup/squeeze/
rm -rf backup.7
mv backup.6 backup.7
mv backup.5 backup.6
mv backup.4 backup.5
mv backup.3 backup.4
mv backup.2 backup.3
mv backup.1 backup.2
mv backup.0 backup.1
cd /
rsync -am --delete-after --filter="merge /root/scripts/filter-rule" --delete-excluded \
--link-dest=/mnt/backup/squeeze/backup.1 \
/ /mnt/backup/squeeze/backup.0
umount /mnt/backup

Quote:

filter-rule:
- /home/joe/downloads/
- /home/joe/.local/share/Trash/
+ /home/
+ /etc/
+ /root/
+ /boot/
+ /usr/
+ /usr/local/
- /usr/*
- /*

arizonagroovejet 07-13-2011 02:03 PM

Put output, commands, scripts and such in CODE tags. It makes your post easier to read. (CODE tags can be added using the button marked # when composing a post)

I don't think I entirely understand your description of the problem - are you saying that the source directory contains hardlinks and has a total size of 12GB but the destination directory ends up containing 15GB? If so, look at the -H option.

JosephS 07-13-2011 09:39 PM

Sorry about the confusion. I am backing up from drive A to drive B. I am backing up the directories listed in the filter rule under / to /mnt/backup/squeeze/backup.0. There are no hard links in the source directory. The hard links are in the destination directory. With --link-dest=/mnt/backup/squeeze/backup.1, rsync hard links unchanged files in backup.0/ to backup.1/ and copies new or changed files from the source directories under / to backup.0/.
I don't make many changes, so with the hard links I can't see why so much extra space is taken. If most of the files under /mnt/backup/squeeze/ are hard linked (backup.0 through backup.4) and each directory shows 12G, why is /mnt/backup/squeeze/ 15G?

Quote:

Here is quote from an article:
--link-dest this is a neat way to make full backups of your computers without losing much space. rsync links unchanged files to the previous backup (using hard-links, see below if you don’t know hard-links) and only claims space for changed files. This only works if you have a backup at hand, otherwise you have to make at least one backup beforehand.
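
A small sandbox sketch may help with the du numbers discussed above. It relies on the fact that du counts a hard-linked file only once per invocation, so each backup directory measured alone reports its full size, while the parent reports the shared data once plus whatever is unique to each snapshot. The file name big and the 1 MiB size are made up for illustration, and ln stands in for what --link-dest does with unchanged files.

```shell
# Sandbox sketch: how du accounts for hard-linked files.
# Directory names mirror the thread; the file "big" is throwaway data.
sandbox=$(mktemp -d)
mkdir "$sandbox/backup.0" "$sandbox/backup.1"

# One 1 MiB file in backup.1, hard linked into backup.0 (same inode, stored once).
head -c 1048576 /dev/urandom > "$sandbox/backup.1/big"
ln "$sandbox/backup.1/big" "$sandbox/backup.0/big"
sync  # flush to disk so du sees the allocated blocks

# Measured in separate invocations, each directory is charged the full file.
first_kb=$(du -sk "$sandbox/backup.0" | cut -f1)
second_kb=$(du -sk "$sandbox/backup.1" | cut -f1)

# Measured in one invocation, the shared inode is counted only once.
parent_kb=$(du -sk "$sandbox" | cut -f1)

echo "backup.0 alone: ${first_kb}K"
echo "backup.1 alone: ${second_kb}K"
echo "parent:         ${parent_kb}K"

rm -rf "$sandbox"
```

Read this way, 12G per directory and 15G for the parent would mean roughly 12G of shared data plus about 3G that is not shared between snapshots. Running du -hs backup.0 backup.1 backup.2 backup.3 backup.4 as a single command shows how much each snapshot adds beyond the ones listed before it.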

scheuref 09-17-2012 06:03 AM

hard links with rsync
 
hi,

The 3G overhead may be due to a large number of files: each hard link uses a small amount of disk space, and each folder uses at least 4096 bytes.
With 'find folder | wc -l' you can count the entries and see whether you have a huge number of files.
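
As a rough sketch of that estimate: directories cannot be hard linked, so every snapshot carries its own copy of the directory tree. The helper name dir_overhead_kb is hypothetical, and the 4096 bytes per directory is the figure quoted above, which depends on the filesystem.

```shell
# Hypothetical helper: count the directories under a path and estimate the
# space they need at 4096 bytes each (assumed figure, filesystem dependent).
dir_overhead_kb() {
    count=$(find "$1" -type d | wc -l)
    echo $((count * 4096 / 1024))
}

# Throwaway tree with 4 directories: the root, a, a/b and c.
tree=$(mktemp -d)
mkdir -p "$tree/a/b" "$tree/c"
overhead_kb=$(dir_overhead_kb "$tree")
rm -rf "$tree"

echo "directories need at least ${overhead_kb}K per snapshot"
```

On a real backup, pass one snapshot directory (for example backup.0) to the helper and multiply by the number of snapshots kept.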

Another possible cause is that rsync cannot use hard links because the original files have different timestamps (or permissions or ownership) than the copies in the existing backup. You can use 'stat file' to check whether the original and backup files are really identical.
You can also find the responsible folder by running 'du -chs /newbackup/folder1 /oldbackup/folder1', then doing the same for folder2, folder3, etc.
That shows where hard links are used and where they are not; then use 'stat' to check why rsync did not create a hard link.
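
The check described above can be sketched as a small function. The name check_link is hypothetical, the -ef test assumes bash (or another shell whose test supports it), and the stat format string assumes GNU coreutils.

```shell
# Hypothetical helper: report whether two paths are hard links to the same
# file; if not, print the metadata worth comparing between them.
check_link() {
    if [ "$1" -ef "$2" ]; then
        echo "linked"
    else
        echo "separate"
        # inode, link count, size, mtime (epoch), mode, owner:group, name
        stat -c '%i %h %s %Y %a %U:%G %n' "$1" "$2"
    fi
}

# Throwaway demo: b is a hard link to a, c is an independent copy.
demo=$(mktemp -d)
echo data > "$demo/a"
ln "$demo/a" "$demo/b"
cp "$demo/a" "$demo/c"
linked_result=$(check_link "$demo/a" "$demo/b")
copy_result=$(check_link "$demo/a" "$demo/c" | head -n 1)
rm -rf "$demo"

echo "a vs b: $linked_result"
echo "a vs c: $copy_result"
```

For a real backup, compare the same relative path in two snapshots, for example a file under backup.0 and its counterpart under backup.1. A difference in the mtime, mode, or owner columns is the kind of mismatch that stops rsync from linking the two.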

Finally, you can replace duplicate files with hard links using 'fdupes -r1L /newbackup /oldbackup'.

PS:
You can find a script to back up the whole disk with rsync here: http://blog.pointsoftware.ch/index.p...th-hard-links/
It uses file deduplication via hard links, MD5 integrity signatures, 'chattr' protection, filter rules, disk quotas, and a retention policy with exponential distribution (backup rotation that keeps more recent backups than older ones).
It has been used in disaster recovery plans for banking companies to replicate datacenters, using little network bandwidth and an encrypted transport tunnel.
It can be used locally on each server or over the network on a central remote backup server.

and it is free of course ^^
Francois Scheurer


All times are GMT -5.