LinuxQuestions.org - [SOLVED] du output differences

du output differences

Hello,

I have just copied a directory from one system to the other and I am using 'du' to keep track of the progress.

There is only one issue.

'du -s' on the CentOS 4 system reports the size of the original directory as 145952, while the same command on a RHEL 5 system reports the size of the copy as 382404.

When searching for the solution I have learned that this could be caused by the block size that is used during saving.

Is there a way to determine the size of the directory on both systems?

Quote:

Originally Posted by schuurs (Post 4843697)

Yes, this is a good question. I have faced this issue too and am also waiting for an answer.
Thanks.

Quote:

Originally Posted by schuurs (Post 4843697)

When searching for the solution I have learned that this could be caused by the block size that is used during saving.

That is almost correct: the block size is not determined while saving; it is fixed when the file system is created. But you might have meant that in the first place ;)

Quote:

Is there a way to determine the size of the directory on both systems?

You do need to define what you mean by size.

The numbers du shows are the disk space that the directory and all its files and subdirectories actually occupy on disk, counted in whole blocks rather than bytes. To check the block size in use, as a normal user:

Code:

# do this on both boxes
$ stat /etc/passwd
  File: `/etc/passwd'
  Size: 1804            Blocks: 8          IO Block: 4096  regular file
Device: fe01h/65025d    Inode: 8625        Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)  Gid: (    0/    root)
Access: 2012-12-05 17:43:25.091005672 +0100
Modify: 2012-08-28 15:24:06.461519685 +0200
Change: 2012-08-28 15:24:06.475583171 +0200

By default, the block size chosen when a file system is created tends to increase (often doubling) as disks and partitions get larger.

This can have a negative impact if you have lots of small files. Simplified: a file that is 1 byte long needs one whole block to be stored (4096 bytes in the above output), "wasting" 4095 bytes.
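The gap between a file's byte size and the space it occupies can be checked per file. The following is a minimal sketch, assuming GNU stat; the function name file_slack is made up for illustration. The reported difference can be zero or even negative for sparse files or on file systems that store tiny files inline.

```shell
# Print the apparent size, the allocated size, and the difference for one file.
# Assumes GNU stat: %s = size in bytes, %b = blocks allocated,
# %B = size in bytes of each block counted by %b.
file_slack() {
    stats=$(stat -c '%s %b %B' -- "$1") || return 1
    read -r apparent blocks unit <<EOF
$stats
EOF
    allocated=$(( blocks * unit ))
    echo "apparent=$apparent allocated=$allocated slack=$(( allocated - apparent ))"
}
```

For example, file_slack /etc/passwd on the box above would be expected to show an apparent size of 1804 bytes against 8 allocated blocks.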

If you want to know the byte size of the files (as shown by ls -l, for example), you need to add the sizes up yourself with a one-liner or script. Be aware that even if that total comes to 0.9 GB, the files may still not fit on a 1 GB USB stick, because each file occupies whole blocks.

Have a look at this (show "real" size for current directory):

Code:

# add up the size column (field 5) of ls -l for the current directory
ls -l | awk 'BEGIN{ cntr=0 } { cntr=cntr+$5 } END { print "Total size: ", cntr, "("cntr/1024**2, "MB)" }'
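Summing the fifth column of ls output also counts the directory entries themselves and depends on how ls formats its lines. As a hedged alternative, here is a sketch that adds up only regular files, recursively, assuming GNU find (for -printf); the function name apparent_total is hypothetical.

```shell
# Sum the byte sizes of all regular files below a directory.
# Directories and symbolic links are not counted.
# Assumes GNU find, whose -printf '%s\n' prints each file's size in bytes.
apparent_total() {
    find "$1" -type f -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total + 0 }'
}
```

Running apparent_total on the source and on the copy gives two byte counts that do not depend on either file system's block size. GNU du also has an --apparent-size option that reports byte sizes instead of disk usage.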

@bala.linuxtech: Tip for the future: if you see a thread that has no answers but are interested in the topic, just use the thread tools to subscribe to the thread. If you can't contribute to the topic, please don't post. Posting in a thread with no answers takes that thread off the Zero Reply List. Threads on the Zero Reply List are automatically bumped twice to increase the chances of a helpful answer.

It could be the block size, but usually it won't cause that large a difference unless you're dealing with a lot of tiny files. Another possibility is that the source was full of links (hard or symbolic), which were then copied over as separate files to the new system.
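Whether links are a factor can be checked by counting them on both sides. This is a rough sketch using standard find tests; the function name link_report is made up here, and file names containing newlines would be miscounted.

```shell
# Report how many symbolic links and multiply-linked regular files a tree holds.
# -type l matches symbolic links; -links +1 matches files with more than one name.
link_report() {
    symlinks=$(find "$1" -type l | wc -l)
    hardlinked=$(find "$1" -type f -links +1 | wc -l)
    # $(( ... )) strips any padding that wc may add around the number
    echo "symlinks=$(( symlinks )) hardlinked=$(( hardlinked ))"
}
```

If the source reports links and the copy reports none, the links were probably copied as separate files, which would inflate the du total on the copy.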

I used scp to copy the files from the CentOS to the RHEL system.
I have changed it to rsync, since scp doesn't preserve symbolic links.

Code:

rsync -rlW --whole-file <source dir> <user>@<node>:<destination>

'du' still gives a difference in disk usage.
CentOS (145940), RHEL (143334).
It is much better, but not yet ideal.

Quote:

Have a look at this (show "real" size for current directory):

Code:

ls -l | awk 'BEGIN{ cntr=0 } { cntr=cntr+$5 } END { print "Total size: ", cntr, "("cntr/1024**2, "MB)" }'

I tried this and it gives me the following result.

CentOS

Code:

ls -lR <dir> | awk 'BEGIN{ cntr=0 } { cntr=cntr+$5 } END { print "Total size: ", cntr, "("cntr/1024**2, "MB)" }'

Total size: 146122692 (139.353 MB)

RHEL

Code:

ls -lR <dir> | awk 'BEGIN{ cntr=0 } { cntr=cntr+$5 } END { print "Total size: ", cntr, "("cntr/1024**2, "MB)" }'

Total size: 146059684 (139.293 MB)

I thank you all for your input.

Quote:

Originally Posted by schuurs (Post 4844348)

I used scp to copy the files from the CentOS to the RHEL system.
I have changed it to rsync, since scp doesn't preserve symbolic links.

Code:

rsync -rlW --whole-file <source dir> <user>@<node>:<destination>

'du' still gives a difference in disk usage.
CentOS (145940), RHEL (143334).
It is much better, but not yet ideal.

That is small enough that it could realistically be blamed on file-system and block-size differences. If you want to be sure, add -H to your rsync command to preserve hard links as well. Also, -W is the same thing as --whole-file, so there is no reason to have both of them in your command.

You can also have some (usually small) differences in the sizes of the directory files themselves. Once a directory has expanded to accommodate a large number of files, that space is never released until the directory is deleted entirely. When you copy such a directory, by whatever means, the newly created directory at the destination will have no more blocks than what is needed for the files it currently contains.

Does that mean it isn't possible to do a one-to-one size check when copying?

Quote:

Originally Posted by schuurs (Post 4844643)

Does that mean it isn't possible to do a one-to-one size check when copying?

Size checks are not sufficient anyway. The size would not change if the data were corrupted during the copy (for example, due to faulty memory), so a size check cannot tell you whether the copy succeeded. You have to use checksums for that.
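One possible way to do such a checksum comparison is sketched below. It assumes sha256sum is installed (older systems may only ship md5sum, which could be swapped in), and the function names tree_checksums and same_content are hypothetical.

```shell
# Print one "checksum  ./relative/path" line per regular file, sorted by path.
tree_checksums() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 )
}

# Succeed only if both directories exist and hold the same files with the same contents.
same_content() {
    [ -d "$1" ] && [ -d "$2" ] &&
        [ "$(tree_checksums "$1")" = "$(tree_checksums "$2")" ]
}
```

When the two trees live on different machines, the output of tree_checksums can be saved to a file on each host and the two files compared with diff.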