Files seem to take up more space in destination after rsync copy
I recently purchased an external hard drive to back up my home partition. My PC has a "1.5T" drive with several partitions on it, containing OSes and the home partition. The home partition is 1.3T according to df. The external drive contains one partition that spans the entire disk; df reports it as 1.4T in size. Both partitions are ext3.
When I use rsync to copy files from the home partition to the external partition, the external disk becomes full, despite the destination - supposedly - being larger than the source. I don't understand why copying files from one partition to a slightly bigger partition should need more space than on the source partition. Does anyone know what is happening ?
I created the partition on the external drive with gparted. gparted reported that the partition already had several gigabytes of used space immediately after its creation; I thought at the time that this must be normal.
The home partition contains many files of all sorts, including lots of big audio and video files. If you are wondering, for all my important files this external disk is only secondary backup, as they are also backed up to the "internet".
These are the mount points :
/mnt/tmp/ : home partition, /dev/sdb6
/mnt/external/ : external partition, /dev/sdc1
I used rsync to copy the files. I know there are more efficient ways to do this, but I wanted to use the same command that I will subsequently run to sync the backup.
rsync -av --progress --stats --recursive --perms --links --delete /mnt/tmp/ /mnt/external/
Next I tried adding the --sparse switch, as I was wondering if the problem might come from sparse files. I don't know, however, whether rsync would go back and shrink files it had already copied just because the switch was added. I also added --one-file-system, for good measure. Here is what I ran next :
rsync -av --progress --stats --sparse --one-file-system --recursive --perms --links --delete /mnt/tmp/ /mnt/external/
I tried an fsck on the home partition :
fsck -f /dev/sdb6
This is the output from the last rsync :
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: write failed on "abcd.avi": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(302) [receiver=3.0.6]
rsync: connection unexpectedly closed (27886 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
Looking at the destination after a partial copy seems to indicate that the problem is not symbolic links being "expanded". I have not checked the source filesystem for sparse files, nor the destination to see if these files could be larger there, as this does not seem trivial.
Here is some additional info :
$ df /mnt/tmp/
Filesystem      1K-blocks        Used  Available  Use%  Mounted on
/dev/sdb6      1415342836  1414173740     369096  100%  /mnt/tmp
$ df /mnt/external/
Filesystem      1K-blocks        Used  Available  Use%  Mounted on
/dev/sdc1      1442145212  1441851736     293476  100%  /mnt/external
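To put a number on "slightly bigger", here is a small sketch that subtracts the space used on the source from the total size of the destination, using the figures in the df output above. The variable names are only illustrative, and the result ignores whatever the destination filesystem needs for its own metadata.

```shell
#!/bin/sh
# Figures copied from the df output above, in 1K blocks.
src_used_kib=1414173740      # "Used" on /dev/sdb6 (source)
dst_total_kib=1442145212     # "1K-blocks" on /dev/sdc1 (destination)

# Space that should remain if the copy took exactly as much room as the source.
headroom_kib=$((dst_total_kib - src_used_kib))
echo "$headroom_kib KiB (about $((headroom_kib / 1048576)) GiB)"
# prints: 27971472 KiB (about 26 GiB)
```

So, on paper, the copy should have fit with roughly 26 GiB to spare, which is why a full destination points at the files taking more room there than on the source.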
Thank you !
The sparse file hypothesis
I just explored the sparse file possibility, and this does not seem to be the issue.
To discover sparse files in the source, I used a script from here :
The Wikipedia article on sparse files explains how to distinguish between apparent and actual file sizes :
So having identified a sparse file on the source, I ran :
# du -s -B1 --apparent-size '/mnt/tmp/chris/.openoffice.org/3/user/registry/cache/org.openoffice.Office.UI.WriterCommands.dat'
# du -s -B1 '/mnt/tmp/chris/.openoffice.org/3/user/registry/cache/org.openoffice.Office.UI.WriterCommands.dat'
Compared with the same file on the destination :
# du -s -B1 --apparent-size '/mnt/external/chris/.openoffice.org/3/user/registry/cache/org.openoffice.Office.UI.WriterCommands.dat'
# du -s -B1 '/mnt/external/chris/.openoffice.org/3/user/registry/cache/org.openoffice.Office.UI.WriterCommands.dat'
Identical. So I would say that sparse files are preserved, so my problem does not arise from this.
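For anyone who wants to repeat the search for sparse files, here is a rough sketch of one way to list them. It is not the script referred to above. It assumes GNU find (for -printf and its %S "sparseness" value) and awk, and list_sparse is a made-up helper name. Filesystems that compress data or store tiny files inline may produce false positives.

```shell
#!/bin/sh
# list_sparse DIR: print regular files under DIR whose allocated size is
# smaller than their apparent size. Hypothetical helper, assumes GNU find.
list_sparse() {
    # %S is find's "sparseness": allocated bytes divided by apparent size.
    # Values below 1.0 normally indicate holes. Empty files are skipped.
    LC_ALL=C find "$1" -xdev -type f ! -empty -printf '%S\t%p\n' |
        awk -F'\t' '$1 + 0 < 1.0 { print substr($0, index($0, "\t") + 1) }'
}

# Example call (path taken from the mount points listed earlier):
#   list_sparse /mnt/tmp
```

Checking a single file with du, as above, only shows that this one file kept its holes. Running the same listing on both the source and the destination, and comparing the two lists, would cover every file.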
In actual fact, it would seem that the problem was sparse files after all.
I had quite a bit of trouble determining that this was actually the case, though. I ended up hacking together two scripts to solve my problem, and without the second I think I would have had to erase the entire destination disk and start anew.
I first tried a diff, to see what differed from the source to the destination :
Next, I made a script to determine if the backup files were of a different size from the source files (and to see what files were missing from the backup) :
So, I made a new script to compare the number of filesystem blocks used in the source and destination partitions :
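The original script is not shown in this thread. A minimal sketch of the same idea, assuming GNU find and GNU stat, could look like the following. compare_blocks is a hypothetical name, and file names containing newlines are not handled. The reason for counting blocks rather than bytes is that a sparse file copied without --sparse keeps the same apparent size, but its holes are written out as real zero-filled blocks.

```shell
#!/bin/sh
# compare_blocks SRC DST: list files that occupy more blocks in DST than in SRC.
# Output columns (tab separated): source blocks, destination blocks, relative path.
# Hypothetical helper; assumes GNU find (-printf %P) and GNU stat (-c %b).
compare_blocks() {
    src=${1%/}
    dst=${2%/}
    find "$src" -xdev -type f -printf '%P\n' | while IFS= read -r rel; do
        # Skip files missing from the backup, and anything that is a symlink there.
        if [ ! -f "$dst/$rel" ] || [ -L "$dst/$rel" ]; then
            continue
        fi
        s=$(stat -c %b "$src/$rel")    # allocated blocks on the source
        d=$(stat -c %b "$dst/$rel")    # allocated blocks on the destination
        if [ "$d" -gt "$s" ]; then
            printf '%s\t%s\t%s\n' "$s" "$d" "$rel"
        fi
    done
}

# Example call (mount points from earlier in the thread):
#   compare_blocks /mnt/tmp /mnt/external
```

Small differences in block counts can be normal between two filesystems, so the interesting lines are the ones where the destination count is much larger than the source count.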
So I modified the last script to delete the offending files from the backup and ran rsync again. Presto, the source and the backup are now just about the same size !
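The deletion step can be sketched as below. This is not the modified script itself, only an illustration under assumptions: remove_listed is a hypothetical helper that reads relative paths on standard input, one per line, and by default only prints what it would delete. It removes files only when given --force, ignores absolute paths and paths containing "..", and never touches symlinks.

```shell
#!/bin/sh
# remove_listed ROOT [--force]: read relative paths from stdin and remove the
# matching regular files under ROOT. Dry run unless --force is given.
# Hypothetical helper written for illustration.
remove_listed() {
    root=${1%/}
    mode=${2:-}
    while IFS= read -r rel; do
        [ -n "$rel" ] || continue
        # Refuse absolute paths and anything that climbs out of ROOT.
        case $rel in
            /*|..|../*|*/..|*/../*) continue ;;
        esac
        target=$root/$rel
        # Only regular files, never symlinks.
        [ -f "$target" ] && [ ! -L "$target" ] || continue
        if [ "$mode" = "--force" ]; then
            rm -f -- "$target"
        else
            printf 'would remove %s\n' "$target"
        fi
    done
}

# Example call, with a hand-checked list of relative paths in bloated.txt:
#   remove_listed /mnt/external < bloated.txt            # dry run
#   remove_listed /mnt/external --force < bloated.txt    # really delete
```

Once the bloated copies are gone, running the rsync command with --sparse again recreates them from the source, this time with their holes preserved.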
1/ If you use the above code, beware that it seems to have a few issues with symlinks.
2/ I really feel that all this was overly complex. Shouldn't rsync default to handling sparse files, or shouldn't adding the "--sparse" switch replace "regular" files in the destination with sparse files (this may not be trivial to implement, mind you) ? At the very least, the rsync docs should mention sparse files and the woes they can cause...
3/ The script executes in under five minutes, a lot quicker than a full diff...
4/ I tend to ramble... maybe nobody is interested in my problems, but maybe googling this thread could help someone one day.
I just ran an rsync operation, and somehow my source directory, which contains 118GB of files, bloats up to 220GB once rsync is complete, yet all the files look the same. I'm just starting my journey into this and I appreciate this post.
We just completed a quick test and deleting the existing backup data from the usb drive and re-rsyncing it from scratch fixes the problem.