LinuxQuestions.org
Linux - Newbie: This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

02-09-2010, 04:44 AM #1
Karderio
LQ Newbie
 
Registered: May 2006
Posts: 4

Rep: Reputation: 1
Files seem to take up more space in destination after rsync copy


I have recently purchased an external hard drive in order to back up my home partition. In my PC I have a "1.5T" drive with several partitions on it, containing OSes and the home partition. The home partition is 1.3T according to df; the external drive contains one partition that spans the entire disk, which df reports as 1.4T in size. Both partitions are ext3.

When I use rsync to copy files from the home partition to the external partition, the external disk becomes full, despite the destination (supposedly) being larger than the source. I don't understand why copying files from one partition to a slightly bigger partition should need more space than they use on the source partition. Does anyone know what is happening?


Details :

I created the partition on the external drive with GParted; GParted reported it as already having several gigabytes of used space immediately after the partition's creation. I thought at the time that this must be normal.

The home partition contains many files of all sorts, including lots of big audio and video files. In case you are wondering, this external disk is only a secondary backup for all my important files, as they are also backed up to the "internet".

These are the mount points:
/mnt/tmp/ : home partition, /dev/sdb6
/mnt/external/ : external partition, /dev/sdc1

I used rsync to copy the files. I know there are more efficient ways to do this, but I wanted to use the same command that I will subsequently run to sync the backup:
rsync -av --progress --stats --recursive --perms --links --delete /mnt/tmp/ /mnt/external/

Next I tried adding the --sparse switch, as I was wondering if the problem might come from sparse files. However, I don't know whether rsync would go back and shrink an already copied sparse file just because I added the switch and re-ran the command. I also added --one-file-system, for good measure. Here is what I ran next:
rsync -av --progress --stats --sparse --one-file-system --recursive --perms --links --delete /mnt/tmp/ /mnt/external/
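Roughly speaking, --sparse makes the receiving rsync seek over runs of zero bytes instead of writing them, so the destination filesystem can leave holes rather than allocate blocks. The Python sketch below only illustrates that idea; it is not rsync's code, and the function name sparse_copy and the 4096-byte chunk size are assumptions chosen for illustration.

```python
import os

CHUNK = 4096  # assumed chunk size for this illustration


def sparse_copy(src_path, dst_path, chunk=CHUNK):
    """Copy src_path to dst_path, seeking over all-zero chunks so the
    destination filesystem can leave holes instead of allocating blocks."""
    zeros = bytes(chunk)
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            data = src.read(chunk)
            if not data:
                break
            total += len(data)
            if data == zeros[:len(data)]:
                dst.seek(len(data), os.SEEK_CUR)  # skip the zeros: leave a hole
            else:
                dst.write(data)
        # Seeking alone does not extend the file, so a trailing run of
        # zeros has to be materialised by setting the final length.
        dst.truncate(total)
    return total
```

A copy made without this treatment writes every zero byte out, so its allocated size can be far larger than the original's even though the contents and the apparent size are identical.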

I tried an fsck on the home partition:
fsck -f /dev/sdb6

This is the output from the last rsync:
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: write failed on "abcd.avi": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(302) [receiver=3.0.6]
rsync: connection unexpectedly closed (27886 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]

Looking at the destination after a partial copy seems to indicate that the problem is not symbolic links being "expanded". I have not checked the source filesystem for sparse files, nor the destination to see if these files could be larger there, as this does not seem trivial.
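Checking a tree for sparse files is mostly a matter of comparing two numbers that stat already reports: the apparent size (st_size) and the allocated size (st_blocks, which Linux counts in 512-byte units). A minimal sketch, assuming a Linux system; find_sparse_files is a hypothetical helper, not an existing tool:

```python
import os
import stat
import sys


def find_sparse_files(top):
    """Yield (path, apparent_bytes, allocated_bytes) for each regular file
    under top whose allocated size is smaller than its apparent size."""
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            st = os.lstat(path)  # lstat: don't follow symlinks
            if not stat.S_ISREG(st.st_mode):
                continue
            allocated = st.st_blocks * 512  # st_blocks is in 512-byte units
            if allocated < st.st_size:
                yield path, st.st_size, allocated


if __name__ == "__main__":
    # usage: python find_sparse.py /mnt/tmp/ [more directories...]
    for top in sys.argv[1:]:
        for path, apparent, allocated in find_sparse_files(top):
            print("%s apparent=%d allocated=%d" % (path, apparent, allocated))
```

Running it on both the source and the destination and comparing the lists would show files that kept their holes on one side only.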

Here is some additional info:

$ df /mnt/tmp/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb6 1415342836 1414173740 369096 100% /mnt/tmp

$ df /mnt/external/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdc1 1442145212 1441851736 293476 100% /mnt/external
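Putting the two df outputs side by side makes the mismatch concrete. The sketch below only does arithmetic on the figures quoted above (1K blocks), assuming df's Used column is comparable between the two ext3 filesystems.

```python
# Figures copied from the df output above, in 1K blocks.
SRC_SIZE, SRC_USED = 1415342836, 1414173740   # /dev/sdb6, home partition
DST_SIZE, DST_USED = 1442145212, 1441851736   # /dev/sdc1, external partition
KIB_PER_GIB = 1024.0 * 1024

extra_capacity = DST_SIZE - SRC_SIZE  # how much larger the backup partition is
extra_used = DST_USED - SRC_USED      # how much more the partial copy already uses

print("external partition is %.1f GiB larger" % (extra_capacity / KIB_PER_GIB))
print("partial copy uses %.1f GiB more than the source" % (extra_used / KIB_PER_GIB))
```

The external partition is about 25.6 GiB larger, yet the incomplete copy already uses about 26.4 GiB more than the entire source, so the copied data must be taking more space on the destination than on the source.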


Thank you!
 
02-09-2010, 05:06 AM #2
Karderio
LQ Newbie
 
Registered: May 2006
Posts: 4

Original Poster
Rep: Reputation: 1
The sparse file hypothesis

I just explored the sparse file possibility, and this does not seem to be the issue.

To discover sparse files in the source, I used a script from here:
http://forums13.itrc.hp.com/service/...readId=1065891

The Wikipedia article on sparse files explains how to distinguish between apparent and actual file sizes :
http://en.wikipedia.org/wiki/Sparse_file

So, having identified a sparse file on the source, I ran:
# du -s -B1 --apparent-size '/mnt/tmp/chris/.openoffice.org/3/user/registry/cache/org.openoffice.Office.UI.WriterCommands.dat'
63475 /mnt/tmp/chris/.openoffice.org/3/user/registry/cache/org.openoffice.Office.UI.WriterCommands.dat
# du -s -B1 '/mnt/tmp/chris/.openoffice.org/3/user/registry/cache/org.openoffice.Office.UI.WriterCommands.dat'
69632 /mnt/tmp/chris/.openoffice.org/3/user/registry/cache/org.openoffice.Office.UI.WriterCommands.dat

Compared with the same file on the destination:
# du -s -B1 --apparent-size '/mnt/external/chris/.openoffice.org/3/user/registry/cache/org.openoffice.Office.UI.WriterCommands.dat'
63475 /mnt/external/chris/.openoffice.org/3/user/registry/cache/org.openoffice.Office.UI.WriterCommands.dat
# du -s -B1 '/mnt/external/chris/.openoffice.org/3/user/registry/cache/org.openoffice.Office.UI.WriterCommands.dat'
69632 /mnt/external/chris/.openoffice.org/3/user/registry/cache/org.openoffice.Office.UI.WriterCommands.dat

Identical. So I would say that sparse files are preserved, and that my problem does not arise from this.
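One caveat about this test: a file only has holes when its allocated size is smaller than its apparent size, and here the allocated size (69632 bytes) is larger than the apparent size (63475 bytes). The check below redoes that arithmetic, assuming a 4096-byte block size: the data needs 16 blocks but 17 are allocated, the extra one plausibly being ext3 indirect-block metadata. So this particular file does not look sparse in the source either, and matching figures on the destination could not rule sparse files out.

```python
BLOCK_SIZE = 4096  # assumption: typical ext3 block size


def looks_sparse(apparent_bytes, allocated_bytes):
    """A file has holes only if less space is allocated than its apparent size."""
    return allocated_bytes < apparent_bytes


def blocks_for(nbytes, block_size=BLOCK_SIZE):
    """Number of whole blocks needed to store nbytes of data (ceiling division)."""
    return -(-nbytes // block_size)


apparent = 63475   # du -s -B1 --apparent-size
allocated = 69632  # du -s -B1

print(looks_sparse(apparent, allocated))  # False
print(blocks_for(apparent), allocated // BLOCK_SIZE)  # 16 17
```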
 
02-09-2010, 05:07 PM #3
Karderio
LQ Newbie
 
Registered: May 2006
Posts: 4

Original Poster
Rep: Reputation: 1
Solved...

In actual fact, it would seem that the problem was sparse files after all.

I had quite a bit of trouble determining that this was actually the case, though. I ended up hacking together two scripts, and without the second one I don't think I could have solved the issue without erasing the entire destination disk and starting anew.

I first tried a diff to see what differed between the source and the destination:
Code:
diff -rq /mnt/tmp/ /mnt/external/
Let it be said that a diff on more than a terabyte of data takes a very long time; I stopped it after about five hours.

Next, I made a script to determine whether the backup files differed in size from the source files (and to see which files were missing from the backup):

Code:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# Report files missing from the backup and files whose apparent size differs.

import os
from os.path import join, lexists, relpath

path1 = "/mnt/tmp/"       # source (home partition)
path2 = "/mnt/external/"  # destination (backup)

for root, dirs, files in os.walk(path1):
	for name in files:
		file_path = join(root, name)
		mirror_path = join(path2, relpath(file_path, path1))

		# lexists()/lstat() don't follow symlinks, so a dangling link doesn't raise
		if not lexists(mirror_path):
			print(file_path + " exists.")
			print(mirror_path + " absent.")
		else:
			size1 = os.lstat(file_path).st_size
			size2 = os.lstat(mirror_path).st_size
			if size1 != size2:
				print(file_path + " size : " + str(size1))
				print(mirror_path + " size : " + str(size2))
It seemed that all the files had the same size in the source and the destination, and that just a few were missing because there was no space left for them. I then swapped path1 and path2 to check that there were no extra files in the backup; there weren't.

So, I made a new script to compare the number of blocks allocated to each file (st_blocks, counted in 512-byte units) in the source and destination partitions:

Code:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# Report files whose allocated size (st_blocks, 512-byte units) differs
# between source and backup, e.g. files that are sparse in the source only.

import os
from os.path import join, lexists, relpath

path1 = "/mnt/tmp/"       # source (home partition)
path2 = "/mnt/external/"  # destination (backup)

for root, dirs, files in os.walk(path1):
	for name in files:
		file_path = join(root, name)
		mirror_path = join(path2, relpath(file_path, path1))

		if lexists(mirror_path):
			blocks1 = os.lstat(file_path).st_blocks
			blocks2 = os.lstat(mirror_path).st_blocks
			if blocks1 != blocks2:
				print(file_path + " blocks : " + str(blocks1))
				print(mirror_path + " blocks : " + str(blocks2))
It turns out that some files used up a lot more blocks in the backup! It seems some files were sparse in the source, but not in the destination :-/

So I modified the last script to delete the offending files from the backup, ran rsync again, and presto: the source and the backup are now just about the same size!
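The modified script is not included in the post. Here is a minimal sketch of the same approach, assuming Linux st_blocks semantics; find_bloated_copies and its dry_run parameter are hypothetical names. Deleting first matters because rsync's default quick check skips files whose size and modification time already match, so re-running with --sparse does not rewrite copies that were already transferred without holes.

```python
import os


def find_bloated_copies(src_top, dst_top, dry_run=True):
    """Return relative paths whose copy under dst_top has more allocated
    blocks than the original under src_top; unlink them unless dry_run."""
    bloated = []
    for root, dirs, files in os.walk(src_top):
        for name in files:
            src_path = os.path.join(root, name)
            rel = os.path.relpath(src_path, src_top)
            dst_path = os.path.join(dst_top, rel)
            if not os.path.lexists(dst_path):
                continue  # missing copies are left for rsync to create
            if os.lstat(dst_path).st_blocks > os.lstat(src_path).st_blocks:
                bloated.append(rel)
                if not dry_run:
                    os.unlink(dst_path)  # rsync will copy it again next run
    return bloated
```

With dry_run=True (the default) it only lists candidates; running it with dry_run=False and then re-running rsync with --sparse copies the deleted files back with their holes.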

Remarks:

1/ If you use the above code, beware that it seems to have a few issues with symlinks.

2/ I really feel that all this was overly complex. Shouldn't rsync handle sparse files by default, or shouldn't adding the "--sparse" switch replace "regular" files in the destination with sparse files (this may not be trivial to implement, mind you)? At least the rsync docs could mention sparse files and the woes they can cause...

3/ The script executes in under five minutes, a lot quicker than a full diff...

4/ I tend to ramble... maybe nobody is interested in my problems, but maybe googling this thread could help someone one day.
 
1 member found this post helpful.
02-24-2012, 04:00 PM #4
thund3rstruck
Member
 
Registered: Nov 2005
Location: East Coast, USA
Distribution: Fedora 18, Slackware64 13.37, Windows 7/8
Posts: 346

Rep: Reputation: 38
Quote:
Originally Posted by Karderio
I tend to ramble... maybe nobody is interested in my problems, maybe googleing this thread could help someone one day.
Umm... this post is outstanding.

I just ran an rsync operation and somehow my source directory which contains 118GB of files bloats up to 220GB after rsync is complete yet all the files look the same. I'm just starting my journey into this and I appreciate this post.
 
02-24-2012, 04:11 PM #5
suicidaleggroll
Senior Member
 
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 2,603

Rep: Reputation: 940
Quote:
Originally Posted by thund3rstruck
Umm... this post is outstanding.

I just ran an rsync operation and somehow my source directory which contains 118GB of files bloats up to 220GB after rsync is complete yet all the files look the same. I'm just starting my journey into this and I appreciate this post.
Any chance your source directory contains a bunch of sym or hard links? That's the most common reason a copy blows up like that for me.
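Hard links are worth ruling out because rsync's -a does not include -H (--hard-links): every name of a hard-linked file is transferred as a separate file on the destination. Symbolic links, by contrast, are copied as links by -a (it implies -l) unless an option such as -L is used. A rough sketch for estimating the hard-link cost, assuming only links within the tree matter and using apparent sizes; hardlink_overhead is a hypothetical helper:

```python
import os
import stat


def hardlink_overhead(top):
    """Bytes that would be duplicated if every hard-linked name under top
    were copied as an independent file (e.g. rsync -a without -H)."""
    seen = {}  # (st_dev, st_ino) -> [size, number of names found in the tree]
    for root, dirs, files in os.walk(top):
        for name in files:
            st = os.lstat(os.path.join(root, name))
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                entry = seen.setdefault((st.st_dev, st.st_ino), [st.st_size, 0])
                entry[1] += 1
    # the first name costs nothing extra; each further name is a full copy
    return sum(size * (count - 1) for size, count in seen.values())
```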
 
02-24-2012, 08:30 PM #6
thund3rstruck
Member
 
Registered: Nov 2005
Location: East Coast, USA
Distribution: Fedora 18, Slackware64 13.37, Windows 7/8
Posts: 346

Rep: Reputation: 38
Quote:
Originally Posted by suicidaleggroll
Any chance your source directory contains a bunch of sym or hard links? That's the most common reason a copy blows up like that for me.
Actually, it's a long story. The Windows server failed, but we had backups of all the data on a USB drive (formatted as NTFS), so we restored all that data to an ext3 Linux/Samba server. Then we wanted to resume backups to the same NTFS drive we had restored from, and that's the drive where rsync is doubling the file sizes.

We just completed a quick test: deleting the existing backup data from the USB drive and re-running rsync from scratch fixes the problem.
 
  



Tags
backup, rsync


All times are GMT -5.
