Linux - Software
This forum is for Software issues. Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
Both cp and rsync have options to make files sparse as they copy them. What I am finding is that they are not as effective at this as they could be. I have a file with a few non-zero data bytes in the first 512 bytes and binary zeros through the remainder of its 4194304-byte size. On a 4K-block ext4 filesystem it occupies 4K of allocated space. Copied with cp --sparse=always, the copy occupies 32K. Copied with rsync -S, it occupies 8K. Yet if I truncate the file to 512 bytes and then truncate it back to 4194304 bytes, it occupies only 4K, and the contents remain the same.
So I'm looking for something better than cp or rsync for making files sparse. I see no reason a tool can't go all the way, in this case down to 4K. Or do I need to implement this myself?
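The truncate trick described above can be reproduced along these lines (a sketch; demo.bin is a hypothetical file name, sizes chosen to match the example: 512 data bytes, 4194304 bytes total):

```shell
# Build a fully allocated (non-sparse) file: 512 non-zero bytes,
# then an explicit zero tail written with dd.
dd if=/dev/urandom of=demo.bin bs=512 count=1 2>/dev/null
dd if=/dev/zero of=demo.bin bs=512 seek=1 count=8191 conv=notrunc 2>/dev/null
du -k demo.bin            # allocated size before

# The truncate trick: cut off the all-zero tail, then restore the length.
# The first 512 bytes are untouched; the tail comes back as a hole.
truncate -s 512 demo.bin
truncate -s 4194304 demo.bin
du -k demo.bin            # allocated size after
```

Note this only works because everything past byte 512 is known to be zero; truncating discards whatever was in that range.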
Tinkster: try using it on a non-sparse file obtained by dd'ing 1M of zeros from /dev/zero. The original file is non-sparse, but most of the methods discussed here will produce sparse results from it. The cpio man page explicitly states that it simply searches for zero blocks.
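That test can be run as follows (a sketch; zeros.bin and zeros.sparse are hypothetical names):

```shell
# 1 MiB of zeros, fully allocated because dd really writes the bytes:
dd if=/dev/zero of=zeros.bin bs=1M count=1 2>/dev/null
# cp scans for runs of zeros and can turn them into holes in the copy:
cp --sparse=always zeros.bin zeros.sparse
# Same apparent length, different allocation:
ls -ls zeros.bin zeros.sparse
```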
I'm just wondering how/why the du utility doesn't get it right on the original.
To expand on what raskin said: if you have a sparse file and write some data somewhere in that file, blocks are allocated for that data. If you then overwrite the data with zeroes, the blocks do not get deallocated, at least on ext3.
So your original file used to have data in some areas, and that data has since been replaced with zeroes. When you do a sparse copy, allocated blocks that contain only zeroes are detected and no blocks are allocated for them in the new copy. Hence it is no wonder the copy can come out smaller: the tools can find zero blocks that were previously nonzero and drop them.
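The allocated-but-all-zero situation described in that reply can be demonstrated like this (a sketch; holey.bin is a hypothetical name):

```shell
# A sparse file with one 4 KiB data block 100 blocks in:
dd if=/dev/urandom of=holey.bin bs=4096 seek=100 count=1 2>/dev/null
du -k holey.bin    # small: only the written block is allocated
# Overwrite that block with zeros; on ext3/ext4 it stays allocated:
dd if=/dev/zero of=holey.bin bs=4096 seek=100 count=1 conv=notrunc 2>/dev/null
du -k holey.bin    # allocation unchanged, even though the content is all zeros
```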
Maybe there is an ioctl() that can tell the filesystem to deallocate a specific range. If not, it would be nice to add one. There is such an ioctl() for devices that support discarding blocks, generally used for solid-state devices with a wear-leveling layer, and perhaps also in virtual machine engines that operate on an underlying compacted device file. I wonder what this would do on a loopback block device backed by a file (ideally, a discard on the loopback device would be passed back to the backing file to deallocate the range).
Of course, this all depends on the underlying filesystem actually supporting sparse files and having added support for sparsifying blocks in existing files.
A function at the library layer could try the discard/deallocate call first, and if that fails because it is not implemented, just pwrite() zeros there instead.
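Such a call does exist on current kernels: fallocate(2) with FALLOC_FL_PUNCH_HOLE deallocates a byte range while keeping the file length. The util-linux fallocate(1) utility exposes it from the shell. A sketch, assuming a filesystem with punch-hole support (ext4, XFS, btrfs); padded.bin is a hypothetical name:

```shell
# Eight fully allocated 4 KiB blocks of zeros:
dd if=/dev/zero of=padded.bin bs=4096 count=8 2>/dev/null
du -k padded.bin       # all blocks allocated
# Punch a hole over the whole range; --punch-hole implies --keep-size,
# so the file length is unchanged. Fails with EOPNOTSUPP on filesystems
# that cannot punch holes.
fallocate --punch-hole --offset 0 --length 32768 padded.bin \
    || echo "punch-hole not supported on this filesystem"
du -k padded.bin       # blocks deallocated; apparent size unchanged
ls -l padded.bin
```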