LinuxQuestions.org
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Software
Old 08-30-2011, 12:43 PM   #1
Skaperen
Senior Member
 
Registered: May 2009
Location: center of singularity
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,684
Blog Entries: 31

Rep: Reputation: 176
copying files as sparse


Both cp and rsync have options to make files sparse as they copy them. What I am finding is that they are not as effective at this as they could be. I have a file with a few non-zero data bytes in the first 512 bytes and all binary zeros through the remainder of its 4194304-byte size. On an ext4 filesystem with 4K blocks, it occupies 4K allocated. Copied by cp --sparse=always, it occupies 32K. Copied by rsync -S, it occupies 8K. If I truncate a copy to 512 bytes and then truncate it back to 4194304 bytes, it occupies 4K and the contents remain the same.

So I'm looking for something better than cp or rsync to make files sparse. I see no reason something can't go all the way, in this case to 4K. Or do I need to implement this myself?

Code:
lorentz/root /home/root 269# ls -dl foo
-rw-r--r-- 1 root root 4194304 Aug 30 13:30 foo
lorentz/root /home/root 270# cp -pv --sparse=always foo foo-cp
`foo' -> `foo-cp'
lorentz/root /home/root 271# rsync -aSvW foo foo-rsync
sending incremental file list
foo

sent 4194883 bytes  received 31 bytes  8389828.00 bytes/sec
total size is 4194304  speedup is 1.00
lorentz/root /home/root 272# cat foo > foo-trunc
lorentz/root /home/root 273# truncate -s 512 foo-trunc
lorentz/root /home/root 274# truncate -s 4194304 foo-trunc
lorentz/root /home/root 275# md5sum foo foo-cp foo-rsync foo-trunc
7ebe5061c40a2236c28d041e541bf034  foo
7ebe5061c40a2236c28d041e541bf034  foo-cp
7ebe5061c40a2236c28d041e541bf034  foo-rsync
7ebe5061c40a2236c28d041e541bf034  foo-trunc
lorentz/root /home/root 276# du foo foo-cp foo-rsync foo-trunc
4	foo
32	foo-cp
8	foo-rsync
4	foo-trunc
lorentz/root /home/root 277#
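
For anyone reproducing the measurements above, a minimal sketch follows. It builds a file shaped like `foo` (a few non-zero bytes at the start, 4194304 bytes in total) and prints its apparent size next to its allocated size. The scratch directory and the file name `demo` are made up for illustration, GNU coreutils is assumed, and the allocated figure will vary with the filesystem.

```shell
#!/bin/sh
# Sketch: compare apparent size with allocated size (GNU coreutils assumed).
set -eu
dir=$(mktemp -d)            # scratch directory (hypothetical, for illustration)
cd "$dir"

printf 'data' > demo        # a few non-zero bytes at the start
truncate -s 4194304 demo    # extend to 4 MiB; the tail is a hole where supported

apparent=$(stat -c %s demo)                                # size in bytes, as ls -l shows it
allocated=$(( $(stat -c %b demo) * $(stat -c %B demo) ))   # allocated blocks * block unit
echo "apparent=$apparent allocated=$allocated"

du -k --apparent-size demo  # KiB by apparent size
du -k demo                  # KiB actually allocated
```

Comparing the two `du` lines for each copy is what exposes the 4K, 8K and 32K differences reported in the transcript.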
 
Old 08-30-2011, 02:00 PM   #2
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Code:
tar --sparse -c lastlog | tar --sparse -x -C /tmp
du -k /var/log/lastlog 
12      /var/log/lastlog
du -k /tmp/lastlog 
8       /tmp/lastlog
LOL - this seems to work better than expected ;D


Cheers,
Tink
 
Old 08-30-2011, 04:06 PM   #3
raskin
Senior Member
 
Registered: Sep 2005
Location: France
Distribution: approximately NixOS (http://nixos.org)
Posts: 1,900

Rep: Reputation: 69
echo foo | cpio --sparse -p /path/to/target/dir/ could work, too

Tinkster: no wonder, it can find new blocks that were previously nonzero.
 
Old 08-30-2011, 05:15 PM   #4
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Quote:
Originally Posted by raskin
echo foo | cpio --sparse -p /path/to/target/dir/ could work, too

Tinkster: no wonder, it can find new blocks that were previously nonzero.
Thinking about that ...



Cheers,
Tink

Last edited by Tinkster; 08-30-2011 at 05:17 PM.
 
Old 08-30-2011, 11:17 PM   #5
raskin
Senior Member
 
Registered: Sep 2005
Location: France
Distribution: approximately NixOS (http://nixos.org)
Posts: 1,900

Rep: Reputation: 69
Tinkster: try using it on a non-sparse file obtained by dd of 1M zeros from /dev/zero. The original file is non-sparse; most methods discussed here will produce sparse results. The cpio man page explicitly states that it simply searches for zero blocks.
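
The experiment suggested here might look roughly like the sketch below. It uses `cp --sparse=always` from the first post as the copy method; the file names `zeros` and `zeros-cp` and the scratch directory are assumptions, and GNU dd and cp are assumed. Some filesystems (compressing ones, for example) may never allocate the zeros in the first place, so the `du` figures are not guaranteed to differ.

```shell
#!/bin/sh
# Sketch: a fully written file of zeros, then a sparse copy of it.
set -eu
dir=$(mktemp -d)            # scratch directory (hypothetical)
cd "$dir"

dd if=/dev/zero of=zeros bs=1M count=1 2>/dev/null   # 1 MiB of zeros actually written
cp --sparse=always zeros zeros-cp                    # cp turns runs of zero bytes into holes

cmp zeros zeros-cp && echo "contents identical"
du -k zeros zeros-cp        # the copy is expected to report fewer KiB, filesystem permitting
```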
 
Old 08-31-2011, 12:36 AM   #6
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Quote:
Originally Posted by raskin
Tinkster: try using it on a non-sparse file obtained by dd of 1M zeros from /dev/zero. The original file is non-sparse; most methods discussed here will produce sparse results. The cpio man page explicitly states that it simply searches for zero blocks.
I'm just wondering how/why the du utility doesn't get it right on the original.


Cheers,
Tink
 
Old 08-31-2011, 05:16 AM   #7
phil.d.g
Senior Member
 
Registered: Oct 2004
Posts: 1,272

Rep: Reputation: 154
To expand on what raskin said: if you have a sparse file and write some data somewhere in that file, the filesystem allocates blocks for that data. If you were then to replace the data with zeroes, the blocks don't get unallocated, at least on ext3.

So your original file used to have some data in some areas, and that data has since been replaced with zeroes. When you do a sparse copy, any allocated blocks that contain only zeroes don't get blocks allocated in the new copy. Hence "no wonder, it can find new blocks that were previously nonzero".
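
The behaviour described above can be observed with a sketch like the following. It creates a sparse file, writes non-zero data into the middle, overwrites the same range with zeros, and prints the allocated block count at each stage. The file name `holey`, the offsets, and the scratch directory are arbitrary choices for illustration; GNU coreutils is assumed, and on filesystems with delayed allocation the counts may only settle after a `sync`.

```shell
#!/bin/sh
# Sketch: zeroing previously written data does not by itself free the blocks.
set -eu
dir=$(mktemp -d)            # scratch directory (hypothetical)
cd "$dir"

truncate -s 1M holey        # 1 MiB sparse file, nothing written yet
before=$(stat -c %b holey)

# Write 64 KiB of non-zero bytes at offset 256 KiB without truncating the file.
head -c 65536 /dev/zero | tr '\0' 'x' |
    dd of=holey bs=4096 seek=64 conv=notrunc 2>/dev/null
written=$(stat -c %b holey)

# Overwrite the same 64 KiB range with zeros.
dd if=/dev/zero of=holey bs=4096 seek=64 count=16 conv=notrunc 2>/dev/null
zeroed=$(stat -c %b holey)

echo "blocks: before=$before written=$written zeroed=$zeroed"
```

If the filesystem behaves as described in this post, the last two counts stay the same even though the file is all zeros again.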
 
2 members found this post helpful.
Old 08-31-2011, 02:09 PM   #8
Skaperen
Senior Member
 
Registered: May 2009
Location: center of singularity
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,684

Original Poster
Blog Entries: 31

Rep: Reputation: 176
Quote:
Originally Posted by phil.d.g
To expand on what raskin said: if you have a sparse file and write some data somewhere in that file, the filesystem allocates blocks for that data. If you were then to replace the data with zeroes, the blocks don't get unallocated, at least on ext3.

So your original file used to have some data in some areas, and that data has since been replaced with zeroes. When you do a sparse copy, any allocated blocks that contain only zeroes don't get blocks allocated in the new copy. Hence "no wonder, it can find new blocks that were previously nonzero".
Maybe there is an ioctl() that can tell the filesystem to specifically unallocate a range. If not, it would be nice to add one. There is such an ioctl() for devices that support discarding blocks, generally used for solid state devices with a wear leveling layer, and perhaps also used by virtual machine engines that operate on an underlying compacted device file. I wonder what this would do on loopback block devices/files (ideally, a discard on the loopback block device would be passed back to the loopback file to unallocate the range).

Of course, this all depends on the underlying filesystem actually supporting sparse files and having added support for sparsifying blocks in existing files.

A function in the library layer could be added to try the discard/unallocate and, if that fails because it is not implemented, just pwrite() zeros there instead.
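
One way to approximate the try-then-fall-back idea from the command line is sketched below. It assumes the `fallocate` utility from util-linux with its `--punch-hole` option, which relies on the Linux fallocate(2) call with FALLOC_FL_PUNCH_HOLE rather than an ioctl(). The helper name `punch_or_zero`, the file name `data`, and the offsets are hypothetical. If punching is unavailable (old kernel, unsupporting filesystem, or no `fallocate` binary), the sketch writes zeros over the range instead.

```shell
#!/bin/sh
# Sketch: deallocate a byte range, or overwrite it with zeros if that is not supported.
set -eu

# punch_or_zero FILE OFFSET LENGTH  (hypothetical helper; OFFSET and LENGTH in bytes)
punch_or_zero() {
    if fallocate --punch-hole --offset "$2" --length "$3" "$1" 2>/dev/null; then
        echo "punched"
    else
        # Fallback: write zeros over the range; the blocks stay allocated.
        # bs=1 is slow but keeps the byte offset arithmetic trivial.
        head -c "$3" /dev/zero |
            dd of="$1" bs=1 seek="$2" conv=notrunc 2>/dev/null
        echo "zeroed"
    fi
}

dir=$(mktemp -d)            # scratch directory (hypothetical)
cd "$dir"
head -c 1048576 /dev/zero | tr '\0' 'x' > data    # 1 MiB of non-zero bytes
result=$(punch_or_zero data 4096 8192)            # clear 8 KiB starting at offset 4 KiB
echo "$result: $(du -k data)"
```

Either branch leaves the file size unchanged and the range reading as zeros; only the punched case is expected to reduce the allocated size. Newer util-linux releases also offer `fallocate --dig-holes`, which scans a file for zero-filled ranges and deallocates them in place, and that may be closer to what the original question asks for.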
 
  


All times are GMT -5. The time now is 10:07 AM.
