'sparse' is not one of the 'dd' convert (conv) arguments. A sparse file is like a place marker on the disk: it is a file in which at least some portion contains nothing.
But that way wastes disk space, because you have to make the file as big as it will ever need to be.
It wastes disk space because that is not a sparse file. All the blocks are allocated, and just happen to be filled with zeros.
Some versions of dd do support "sparse" as a conversion mode. When that is used, any obs-sized block that is entirely zeros will be skipped over with a seek() and not written, thus creating a sparse file. According to the changelog, the "sparse" conversion for dd was introduced in coreutils version 8.16, released in March 2012.
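As a rough illustration of the behaviour described above, here is a minimal Python sketch of a block copy that seeks over all-zero output blocks instead of writing them. It is not the coreutils implementation; the function name `sparse_copy` and the 4096-byte block size are assumptions made for the example, and whether the skipped regions actually become holes depends on the filesystem.

```python
import os
import tempfile

def sparse_copy(src_path, dst_path, block_size=4096):
    """Copy src to dst, seeking over blocks that are entirely zeros."""
    zero_block = bytes(block_size)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            if block == zero_block[:len(block)]:
                # Skip the block: move the output offset without writing.
                dst.seek(len(block), os.SEEK_CUR)
            else:
                dst.write(block)
        # A trailing run of skipped blocks has not extended the file yet,
        # so set the final length explicitly.
        dst.truncate(dst.tell())

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.img")
        dst = os.path.join(tmp, "dst.img")
        with open(src, "wb") as f:
            f.write(b"head" + bytes(1024 * 1024) + b"tail")
        sparse_copy(src, dst)
        st = os.stat(dst)
        print("apparent size:", st.st_size)
        # st_blocks is in 512-byte units on Linux; the value is filesystem-dependent.
        print("allocated bytes:", st.st_blocks * 512)
```

The copy reads back byte-for-byte identical to the source; only the amount of disk space it occupies may differ.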
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,524
Original Poster
Rep:
Actually, sparse files were originally named such because of how they were backed up. The backup did not back up the entire file, only the part that was used. That was to save space on the backup media. Now there seems to be confusion regarding what a 'sparse' file actually 'is'. The younger generation refer to zero-filled files as regular files. But the older generation became accustomed to calling these 'sparse' files.
Wikipedia apparently sides with the older-generation definition. Although there is much contention.
I stand corrected on the conv=sparse issue. But it only applies to file systems that understand sparse files. I haven't seen instances of it being used manually with Linux, although it was developed to, among other things, aid in the creation and manipulation of virtual HDDs.
The option for the dd command should of course be at the end of the dd command rather than at the end of the gzip command.
I mention this only to assist others who may use the command, and it is in no way a criticism of an outstanding guide to the dd command which I have referred to on many occasions. Thank you!
If the user wants the conv=noerror command, then it is as the quoted post indicates. I believe that if positioned after the gzip target it reports that conv=noerror cannot be found.
Quote:
Actually, sparse files were originally named such because of how they were backed up. The backup did not back up the entire file, only the part that was used. That was to save space on the backup media. Now there seems to be confusion regarding what a 'sparse' file actually 'is'. The younger generation refer to zero-filled files as regular files. But the older generation became accustomed to calling these 'sparse' files.
Wikipedia apparently sides with the older-generation definition. Although there is much contention.
This is a Linux forum. In Linux and Unix, "sparse file" has a very specific meaning and refers to a file that does not have space allocated for all its data blocks in the filesystem. A file with blocks filled with zeros does not qualify, even though that is how a sparse file appears when read. A very old example in the history of Unix is /var/log/lastlog, which holds a sparse linear array of structures indexed by numeric UID.
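To make the lastlog example concrete, the following Python sketch stores fixed-size records at offset `uid * record_size`, so writing a record for a high UID leaves a hole in front of it. The record layout (`RECORD`) and the helper names are invented for illustration and do not match the real lastlog format; the allocated-size figure relies on `st_blocks`, which is reported in 512-byte units on Linux.

```python
import os
import struct
import tempfile

# Hypothetical record: 8-byte timestamp plus a 32-byte terminal name.
RECORD = struct.Struct("<q32s")

def write_record(path, uid, timestamp, line):
    """Store one record at the offset implied by the numeric UID."""
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        f.seek(uid * RECORD.size)
        f.write(RECORD.pack(timestamp, line))

def read_record(path, uid):
    """Return (timestamp, line); unwritten slots read back as zeros."""
    with open(path, "rb") as f:
        f.seek(uid * RECORD.size)
        raw = f.read(RECORD.size)
    raw = raw.ljust(RECORD.size, b"\0")  # slot beyond the end of the file
    timestamp, line = RECORD.unpack(raw)
    return timestamp, line.rstrip(b"\0")

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "lastlog.demo")
        write_record(path, 0, 1400000000, b"tty1")
        write_record(path, 60000, 1400000500, b"pts/3")
        st = os.stat(path)
        print("apparent size:", st.st_size)
        print("allocated bytes:", st.st_blocks * 512)
        print("uid 500:", read_record(path, 500))
```

On a filesystem that supports holes, the allocated size should come out far smaller than the apparent size, because only the two written records need real blocks.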
And that is still wrong. If there is any error reading the source, the rest of the stream will just be shifted over to fill in the unreadable gap, and the result will be a badly corrupted filesystem image. (I'm tempted to say "hopelessly corrupted," but careful forensic detective work could create a sane image.) You absolutely need the "sync" conversion too so that the missing block will be filled in with zeros.
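The shifting effect can be seen in a small simulation. This Python sketch does not call dd; it models a source as a list of blocks in which `None` stands for an unreadable block, and compares dropping the bad block (roughly what conv=noerror alone does) with padding it with zeros (what adding sync achieves). The function name and the tiny block size are assumptions for the example.

```python
def image_from_blocks(blocks, block_size, pad_errors):
    """Concatenate readable blocks; None marks a block that failed to read."""
    out = bytearray()
    for block in blocks:
        if block is None:
            if pad_errors:
                # Like conv=noerror,sync: keep offsets aligned with zeros.
                out += bytes(block_size)
            # Without padding the bad block simply vanishes from the stream.
            continue
        out += block
    return bytes(out)

if __name__ == "__main__":
    source = [b"AAAA", None, b"CCCC", b"DDDD"]
    shifted = image_from_blocks(source, 4, pad_errors=False)
    aligned = image_from_blocks(source, 4, pad_errors=True)
    print(shifted)  # b'AAAACCCCDDDD'
    print(aligned)  # b'AAAA\x00\x00\x00\x00CCCCDDDD'
```

In the unpadded result everything after the bad block sits four bytes too early, which is exactly what wrecks a filesystem image.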
Quote:
Originally Posted by rknichols
This is a Linux forum. In Linux and Unix, "sparse file" has a very specific meaning and refers to a file that does not have space allocated for all its data blocks in the filesystem. A file with blocks filled with zeros does not qualify, even though that is how a sparse file appears when read. A very old example in the history of Unix is /var/log/lastlog, which holds a sparse linear array of structures indexed by numeric UID.
An even older example is spelling dictionaries. The original spell utility stripped prefix/suffix from a word then hashed the result - and if that corresponding bit was set, then it was spelled properly. The hash was a very carefully designed one that almost never had a collision - thus the file had to be very large (up to 2GB at the time), but the contents were very sparse. Thus sparse files worked very well (though you couldn't copy the hash file...), and spell checking was very fast.
My first contact was UNIX v6... then UNIX System Vr2.
sparse file
Quote:
Originally Posted by rknichols
This is a Linux forum. In Linux and Unix, "sparse file" has a very specific meaning and refers to a file that does not have space allocated for all its data blocks in the filesystem. A file with blocks filled with zeros does not qualify, even though that is how a sparse file appears when read. A very old example in the history of Unix is /var/log/lastlog, which holds a sparse linear array of structures indexed by numeric UID.
Hi rk,
The past is fixed; it cannot be altered. Regardless of what you write, sparse files were still zero-filled files with a file system format written to them.
Before file systems understood the metadata-type sparse file, it wasn't even possible to write them as we do now!
You have some problem with things outside your spectrum. Well, in this case you're incorrect. You have an idea of what you want a sparse file to be.
But there are plenty of systems in use today that are full of old sparse files: zero-filled files! They are still sparse files, even though your self-appointed definition does not include them.
"Plenty of systems"? Sure, I can believe that. Just not Linux and Unix systems. Earlier you mentioned a Wikipedia article that considered zero-filled files to be sparse. I cannot see any way to read the Wikipedia article on Sparse Files that supports that interpretation. Perhaps you were referring to some other article.
Certain files, called sparse files, have holes. A hole in a file is a section of the file's contents which was never written. The contents of a hole reads as zeros. In some sparse files the holes are represented by meta data, and the space is not yet allocated. GNU tar attempts to recognize the holes in a file, using `--sparse' (`-S'). This option, for any file using less disk space than would be expected from its length, causes tar to search the file for zero-fill, and to compress the empty content, and archive only the actual content.
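A simplified version of the idea in that passage: scan the file block by block, record only the extents that contain non-zero data, and rebuild the file from that map. This Python sketch is not how GNU tar is implemented; `data_extents`, `restore`, and the 512-byte block size are hypothetical names and values chosen for illustration.

```python
import os
import tempfile

def data_extents(path, block_size=512):
    """Return (file_size, [(offset, data), ...]) for runs of non-zero blocks."""
    extents = []
    zero_block = bytes(block_size)
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            if block != zero_block[:len(block)]:
                last = extents[-1] if extents else None
                if last and last[0] + len(last[1]) == offset:
                    # Adjacent to the previous extent: merge them.
                    extents[-1] = (last[0], last[1] + block)
                else:
                    extents.append((offset, block))
            offset += len(block)
    return offset, extents

def restore(path, size, extents):
    """Recreate the file, writing only the data and leaving the rest as holes."""
    with open(path, "wb") as f:
        for offset, data in extents:
            f.seek(offset)
            f.write(data)
        f.truncate(size)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original")
        with open(original, "wb") as f:
            f.write(b"start")
            f.seek(1_000_000)
            f.write(b"end")
        size, extents = data_extents(original)
        stored = sum(len(data) for _, data in extents)
        print("file size:", size, "bytes archived:", stored)
```

Only the extents and the total size need to be stored; the zeros between them are implied and come back as holes when the file is restored.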
"Plenty of systems"? Sure, I can believe that. Just not Linux and Unix systems. Earlier you mentioned a Wikipedia article that considered zero-filled files to be sparse. I cannot see any way to read the Wikipedia article on Sparse Files that supports that interpretation. Perhaps you were referring to some other article.
Not that many, actually. Linux native filesystems, yes. BSD native filesystems, yes. Microsoft NTFSv5 (since about Windows 2000), yes. Others... no.
Unix and Unix-like systems have had sparse files almost since the beginning of UNIX.
I thought I had read that on Wikipedia, but apparently I was wrong. I suppose there is no conclusive evidence either way. So, everyone can think whatever he or she wants. The prevailing definition now seems to be that metadata-based sparse files are sparse files.
I ended up making my sparse file with dd's seek option, and my Google Compute Engine image for Slackware boots! I mentioned the sparse flag to "conv=" only because the man page does, and it is the man page on Slackware Linux, and it's wrong! We somehow got the manual page for a sparse-enabled dd, even though the dd on Slackware does not have this capability via a flag to conv=, but does have the capability via the seek option. I think the important characteristic of a sparse file is that the zeros don't occupy drive or tarball space until they are not zeros. Thanks for the explanations of sparse files and their history.
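For comparison, here is a hedged Python sketch of the two ways of making an empty image discussed in this thread: extending the file without writing data (the effect of dd's seek option with nothing copied) and writing real zeros (what copying from /dev/zero does). The file names, sizes, and helper names are made up for the example, and the allocated-size numbers depend on the filesystem, so no particular values are promised.

```python
import os
import tempfile

def make_sparse(path, size):
    """Set the file length without writing any data blocks."""
    with open(path, "wb") as f:
        f.truncate(size)

def make_zero_filled(path, size, chunk=1024 * 1024):
    """Write real zeros so every block is explicitly written."""
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(bytes(n))
            remaining -= n

def allocated_bytes(path):
    """Disk space in use; st_blocks counts 512-byte units on Linux."""
    return os.stat(path).st_blocks * 512

if __name__ == "__main__":
    size = 16 * 1024 * 1024
    with tempfile.TemporaryDirectory() as tmp:
        sparse = os.path.join(tmp, "sparse.img")
        filled = os.path.join(tmp, "filled.img")
        make_sparse(sparse, size)
        make_zero_filled(filled, size)
        for name in (sparse, filled):
            print(os.path.basename(name), os.path.getsize(name), allocated_bytes(name))
```

Both files have the same length and read back as all zeros; the difference, where the filesystem supports holes, is in how much space each one occupies.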
Distribution: Linux Mint Cinnamon 17.3 have multiboot with Mint Cinnamon 17.1 and Mint Cinnamon 16
Posts: 2
Rep:
This is a truly useful and informative posting; I have saved it and over the next months I will be digesting it as much as possible. I notice some tricks (and commands) that I would not have thought of using: more grist to the mill!
I have been familiar with "dd" for over thirty years, since I first encountered Unix, but this post on the thread is a genuine eye-opener!
The major issue with the "conv=sparse" is that it is non-standard. I believe it is available as a compile time option, thus the man page is not exactly wrong - it doesn't indicate it is an optional feature, nor that it is non-standard.
There are a few times when you don't want a sparse file - one is if you are creating a swap file. All the blocks in the file must be allocated as they will not be allocated during a write page action (I don't think the filesystem I/O code is used to keep the overhead down).