copying files as sparse
Both cp and rsync have options to make files sparse as they copy them, but I am finding that they are not as effective at this as they could be. I have a file with a few non-zero data bytes in the first 512 bytes and binary zeros through the remainder of its 4194304-byte size. On an ext4 filesystem with 4K blocks, it occupies 4K allocated. Copied by cp --sparse=always, it occupies 32K. Copied by rsync -S, it occupies 8K. If I truncate it to 512 bytes and then truncate it back to 4194304 bytes, it occupies 4K, and the contents remain the same.
So I'm looking for something better than cp or rsync to make files sparse. I see no reason something can't go all the way, in this case to 4K. Or do I need to implement this myself?
Code:
lorentz/root /home/root 269# ls -dl foo
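For anyone who wants to reproduce the numbers above, here is a rough sketch of the measurement. It assumes GNU coreutils (truncate, stat -c, cp --sparse) and works in a scratch directory from mktemp; the file names other than foo are made up for illustration. The allocated sizes it prints will vary with the filesystem and tool versions, so no expected output is shown.

```shell
#!/bin/sh
# Sketch: rebuild a file like the one described and compare apparent vs. allocated size.
# Assumptions: GNU coreutils; scratch directory and the *.cp / *.rsync / *.trunc names are illustrative.
work=$(mktemp -d) && cd "$work" || exit 1

# A few non-zero bytes at the start, then extend with zeros to 4194304 bytes.
printf 'some data' > foo
truncate -s 4194304 foo

# %s = apparent size in bytes, %b = allocated blocks, %B = bytes per reported block.
stat -c '%n: apparent=%s bytes, allocated=%b blocks of %B bytes' foo

# The copy methods from the post; rsync is only tried if it is installed.
cp --sparse=always foo foo.cp
if command -v rsync >/dev/null 2>&1; then
    rsync -S foo foo.rsync
fi

# The truncate round trip: drop the zero tail, then grow the file back as a hole.
cp foo foo.trunc
truncate -s 512 foo.trunc
truncate -s 4194304 foo.trunc

# Allocated space in KiB for each variant.
du -k foo*

# The round trip must not change the contents.
cmp foo foo.trunc && echo "contents identical"
```

du reports allocated space, while ls -l and stat's %s report the apparent size, which is why the two can differ so much for a sparse file.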
Code:
tar --sparse -c lastlog | tar --sparse -x -C /tmp
Cheers, Tink
echo foo | cpio --sparse -p /path/to/target/dir/ could work, too
Tinkster: no wonder, it can find new blocks that were previously nonzero.
Tinkster: try using it on a non-sparse file obtained by dd of 1M of zeros from /dev/zero. The original file is non-sparse, and most methods discussed here will produce sparse results. The cpio man page explicitly states that it simply searches for zero blocks.
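The experiment suggested here could look roughly like the following. This is only a sketch: it assumes GNU dd, cp and tar, runs in a scratch directory, and the file and directory names are made up. Whether each copy actually comes out sparse depends on the tool version and the filesystem, so compare the du figures yourself rather than trusting any particular result.

```shell
#!/bin/sh
# Sketch: start from a fully allocated file of zeros and see what each copy method allocates.
# Assumptions: GNU coreutils and GNU tar; "zeros", "zeros.cp" and "out" are illustrative names.
work=$(mktemp -d) && cd "$work" || exit 1

# 1 MiB of zero bytes that are really written out, so the source is not sparse.
dd if=/dev/zero of=zeros bs=1M count=1 status=none

# Copy with cp, asking it to turn runs of zeros into holes.
cp --sparse=always zeros zeros.cp

# Copy through a tar pipe, as in the earlier post.
mkdir out
tar --sparse -c zeros | tar --sparse -x -C out

# Allocated space in KiB for the source and both copies.
du -k zeros zeros.cp out/zeros
```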
To expand on what raskin said: if you have a sparse file and write some data somewhere in that file, the filesystem allocates blocks for that data. If you then replace the data with zeroes, the blocks don't get deallocated, at least on ext3.
So your original file used to have some data in some areas, and that data has since been replaced with zeroes. When you do a sparse copy and the source has allocated blocks containing only zeroes, the copy does not allocate blocks for them. Hence "no wonder, it can find new blocks that were previously nonzero".
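The behaviour described above can be watched directly with something like the sketch below. It assumes GNU coreutils and a scratch directory; the names holey and holey.cp are made up. The printed block counts are filesystem-dependent, so none are given here.

```shell
#!/bin/sh
# Sketch: write data into a hole, overwrite it with zeros, and watch the allocation.
# Assumptions: GNU coreutils; "holey" and "holey.cp" are illustrative names.
work=$(mktemp -d) && cd "$work" || exit 1

# A 4 MiB file that is one big hole.
truncate -s 4194304 holey
echo "empty:           $(du -k holey)"

# Put 64 KiB of non-zero data at offset 1 MiB (seek counts in 64K blocks).
head -c 65536 /dev/urandom | dd of=holey bs=64K seek=16 conv=notrunc status=none
echo "after data:      $(du -k holey)"

# Overwrite the same range with zeros; the blocks usually stay allocated.
dd if=/dev/zero of=holey bs=64K seek=16 count=1 conv=notrunc status=none
echo "after zeroing:   $(du -k holey)"

# A sparse copy can skip the zero-filled blocks, so it may end up smaller than its source.
cp --sparse=always holey holey.cp
echo "sparse copy:     $(du -k holey.cp)"
```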
Quote:
Of course, this all depends on the underlying filesystem actually supporting sparse files and having added support for deallocating blocks in existing files. A function in the library layer could be added to try the discard/unallocate first and, if that fails because it is not implemented, just pwrite() zeros there instead.
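The try-then-fall-back idea can be sketched at the shell level, not as the library function proposed above but as the same logic with existing tools. It assumes fallocate(1) from util-linux for the hole punching and GNU dd for the fallback; zero_range is a hypothetical helper name, and the demo file names are made up.

```shell
#!/bin/sh
# Sketch: deallocate a byte range if the filesystem allows it, otherwise write zeros there.
# Assumptions: util-linux fallocate and GNU dd; zero_range is a hypothetical helper.

# Usage: zero_range FILE OFFSET LENGTH   (offset and length in bytes)
zero_range() {
    file=$1 off=$2 len=$3
    # --punch-hole deallocates the range and keeps the file size unchanged.
    if fallocate --punch-hole --offset "$off" --length "$len" "$file" 2>/dev/null; then
        return 0
    fi
    # Fallback: overwrite the range with zeros without truncating the file.
    dd if=/dev/zero of="$file" bs=64K conv=notrunc status=none \
       oflag=seek_bytes iflag=count_bytes seek="$off" count="$len"
}

# Demo in a scratch directory: zero out 8 KiB starting at offset 4 KiB.
work=$(mktemp -d) && cd "$work" || exit 1
head -c 1048576 /dev/urandom > data
cp data data.orig
zero_range data 4096 8192
du -k data.orig data
```

Either way the range reads back as zeros afterwards; only the punched variant gives the space back. As far as I know, util-linux's fallocate also has a --dig-holes option that scans a file for zero ranges and punches them in place, which may be closer to what the original question asks for.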