Learn the DD command

rknichols · 04-29-2016, 09:38 AM

Quote:

Originally Posted by AwesomeMachine

Hi,

'sparse' is not one of the 'dd' convert (conv) arguments. A sparse file is like a place marker on the disk. It is a file in which at least some part contains nothing.

To write a 45MB sparse file, try:

Code:

dd if=/dev/zero of=/home/sam/sparse.file count=45 bs=1M

But that way wastes disk space, because you have to make the file as big as it will ever need to be.

It wastes disk space because that is not a sparse file. All the blocks are allocated, and just happen to be filled with zeros.

Some versions of dd do support "sparse" as a conversion mode. When that is used, any obs-sized block that is entirely zeros will be skipped over with a seek() and not written, thus creating a sparse file. According to the changelog, the "sparse" conversion for dd was introduced in coreutils version 8.16 in March 2012.
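A minimal sketch of that behaviour, assuming a GNU dd that accepts conv=sparse; the temporary directory and the file names src.img and copy.img are made up for the illustration. The source holds 4 MiB of real zeros followed by 4 KiB of non-zero data, so the copy should read back identically while (on a filesystem that supports holes) using less disk space.

```shell
tmp=$(mktemp -d)

# Source: 4 MiB of zeros that are really written, then 4 KiB of 'x' bytes.
{ head -c 4194304 /dev/zero; head -c 4096 /dev/zero | tr '\0' 'x'; } > "$tmp/src.img"

# conv=sparse asks dd to seek over all-zero output blocks instead of writing them.
if dd if="$tmp/src.img" of="$tmp/copy.img" bs=4096 conv=sparse 2>/dev/null; then
    cmp "$tmp/src.img" "$tmp/copy.img" && echo "contents identical"
    # Disk usage in KiB; how much smaller the copy is depends on the filesystem.
    du -k "$tmp/src.img" "$tmp/copy.img"
else
    echo "this dd does not support conv=sparse"
    # Plain copy so that later comparisons still have a file to look at.
    cp "$tmp/src.img" "$tmp/copy.img"
fi
```

The sizes reported by du are deliberately not predicted here, since they vary between filesystems.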

AwesomeMachine · 04-29-2016, 02:09 PM

Actually, sparse files were originally named such because of how they were backed up. The backup did not back up the entire file, only the part that was used. That was to save space on the backup media. Now there seems to be confusion regarding what a 'sparse' file actually 'is'. The younger generation refer to zero-filled files as regular files. But the older generation became accustomed to calling these 'sparse' files.

Wikipedia apparently sides with the older-generation definition. Although there is much contention.

I stand corrected on the conv=sparse issue. But it only applies to file systems that understand sparse files. I haven't seen instances of it being used manually with Linux, although it was developed to, among other things, aid in the creation and manipulation of virtual HDDs.

AwesomeMachine · 04-29-2016, 02:19 PM

Quote:

Originally Posted by rayburn

I have just tried using dd to image a partition and then compress it using this command as in the first post of this thread:

Code:

dd if=/dev/sdb2 ibs=4096 | gzip > partition.image.gz conv=noerror

However, I found that the correct command should be this:

Code:

dd if=/dev/sdb2 ibs=4096 conv=noerror | gzip > partition.image.gz

The option for the dd command should of course be at the end of the dd command rather than at the end of the gzip command.

I mention this only to assist others who may use the command, and it is in no way a criticism of an outstanding guide to the dd command which I have referred to on many occasions. Thank you!

If the user wants the conv=noerror option, then it belongs where the quoted post indicates. I believe that if it is positioned after the gzip target, the shell hands it to gzip as a file name, and gzip reports that conv=noerror cannot be found.
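A small sketch of why the position matters. The shell strips the redirection wherever it appears in a simple command and passes every remaining word to that command as an argument. The count_args function is a hypothetical stand-in for gzip that just prints how many arguments it received.

```shell
# Hypothetical helper: prints the number of arguments it was given.
count_args() { echo "$#"; }

tmp=$(mktemp)

# Same shape as "gzip > partition.image.gz conv=noerror":
# the redirection is removed, and conv=noerror is left as an argument.
count_args > "$tmp" conv=noerror

n=$(cat "$tmp")
echo "arguments seen by the command: $n"
```

So in the first form of the pipeline, dd never sees conv=noerror at all.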

rknichols · 04-29-2016, 05:02 PM

Quote:

Originally Posted by AwesomeMachine

Actually, sparse files were originally named such because of how they were backed up. The backup did not back up the entire file, only the part that was used. That was to save space on the backup media. Now there seems to be confusion regarding what a 'sparse' file actually 'is'. The younger generation refer to zero-filled files as regular files. But the older generation became accustomed to calling these 'sparse' files.

Wikipedia apparently sides with the older-generation definition. Although there is much contention.

This is a Linux forum. In Linux and Unix, "sparse file" has a very specific meaning and refers to a file that does not have space allocated for all its data blocks in the filesystem. A file with blocks filled with zeros does not qualify, even though that is how a sparse file appears when read. A very old example in the history of Unix is /var/log/lastlog, which holds a sparse linear array of structures indexed by numeric UID.
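One way to see the difference between length and allocation, sketched under the assumption that stat accepts the GNU-style -c format option and that truncate is available; holey.file is an invented name. A file is sparse in this sense when the space allocated for it is smaller than its length.

```shell
tmp=$(mktemp -d)

# truncate sets the file length without writing any data,
# so the filesystem does not have to allocate data blocks.
truncate -s 10M "$tmp/holey.file"

apparent=$(stat -c %s "$tmp/holey.file")   # length in bytes
blocks=$(stat -c %b "$tmp/holey.file")     # number of allocated blocks
unit=$(stat -c %B "$tmp/holey.file")       # size in bytes of each %b block
allocated=$((blocks * unit))

echo "apparent=$apparent allocated=$allocated"
if [ "$allocated" -lt "$apparent" ]; then
    echo "sparse"
else
    echo "not sparse (or this filesystem does not support holes)"
fi
```

The same check can be pointed at an existing file such as /var/log/lastlog where it exists.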

rayburn · 05-02-2016, 07:13 AM

Quote:

Originally Posted by rknichols

And that is still wrong. If there is any error reading the source, the rest of the stream will just be shifted over to fill in the unreadable gap, and the result will be a badly corrupted filesystem image. (I'm tempted to say "hopelessly corrupted," but careful forensic detective work could create a sane image.) You absolutely need the "sync" conversion too so that the missing block will be filled in with zeros.

Code:

dd if=/dev/sdb2 ibs=4096 conv=noerror,sync | gzip > partition.image.gz

Thank you, that is most helpful, it is always good to try and understand the reasoning behind a command! I will correct my copy.
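A real read error is hard to reproduce on demand, so this sketch only shows the padding half of the story: with conv=sync, dd pads every short input block with NUL bytes up to ibs. The 3-byte input is an invented stand-in for a short or damaged block. With noerror,sync a failed block is filled out the same way, so everything after it keeps its original offset.

```shell
# Short input block, padded to the full 8-byte ibs by conv=sync.
padded=$(printf 'abc' | dd ibs=8 conv=sync 2>/dev/null | wc -c)

# Same input without sync: only the 3 bytes actually read are written.
unpadded=$(printf 'abc' | dd ibs=8 2>/dev/null | wc -c)

echo "with sync: $padded bytes, without sync: $unpadded bytes"
```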

jpollard · 05-02-2016, 07:29 AM

Quote:

Originally Posted by rknichols

This is a Linux forum. In Linux and Unix, "sparse file" has a very specific meaning and refers to a file that does not have space allocated for all its data blocks in the filesystem. A file with blocks filled with zeros does not qualify, even though that is how a sparse file appears when read. A very old example in the history of Unix is /var/log/lastlog, which holds a sparse linear array of structures indexed by numeric UID.

An even older example is spelling dictionaries. The original spell utility stripped prefix/suffix from a word then hashed the result - and if that corresponding bit was set, then it was spelled properly. The hash was a very carefully designed one that almost never had a collision - thus the file had to be very large (up to 2GB at the time), but the contents were very sparse. Thus sparse files worked very well (though you couldn't copy the hash file...), and spell checking was very fast.

My first contact was UNIX v6... then UNIX System Vr2.

AwesomeMachine · 05-02-2016, 10:31 AM

Quote:

Originally Posted by rknichols

This is a Linux forum. In Linux and Unix, "sparse file" has a very specific meaning and refers to a file that does not have space allocated for all its data blocks in the filesystem. A file with blocks filled with zeros does not qualify, even though that is how a sparse file appears when read. A very old example in the history of Unix is /var/log/lastlog, which holds a sparse linear array of structures indexed by numeric UID.

Hi rk,

The past is fixed; it cannot be altered. Regardless of what you write, sparse files were still zero-filled files with a file system format written to them.

Before file systems understood the metadata-type sparse file, it wasn't even possible to write them as we do now!

You have some problem with things outside your spectrum. Well, in this case you're incorrect. You have an idea of what you want a sparse file to be.

But there are plenty of systems in use today that are full of old sparse files: zero-filled files! They are still sparse files, even though your self-appointed definition does not include them.

rknichols · 05-02-2016, 11:22 AM

Quote:

Originally Posted by AwesomeMachine

But there are plenty of systems in use today that are full of old sparse files: zero-filled files! They are still sparse files, even though your self-appointed definition does not include them.

"Plenty of systems"? Sure, I can believe that. Just not Linux and Unix systems. Earlier you mentioned a Wikipedia article that considered zero-filled files to be sparse. I cannot see any way to read the Wikipedia article on Sparse Files that supports that interpretation. Perhaps you were referring to some other article.

AwesomeMachine · 05-02-2016, 12:05 PM

Certain files, called sparse files, have holes. A hole in a file is a section of the file's contents which was never written. The contents of a hole read as zeros. In some sparse files the holes are represented by metadata, and the space is not yet allocated. GNU tar attempts to recognize the holes in a file, using `--sparse' (`-S'). This option, for any file using less disk space than would be expected from its length, causes tar to search the file for zero-fill, and to compress the empty content, and archive only the actual content.

rknichols · 05-02-2016, 03:19 PM

Yes, the hole reads as zeros. I see we are in complete agreement after all.

jpollard · 05-02-2016, 07:07 PM

Quote:

Originally Posted by rknichols

"Plenty of systems"? Sure, I can believe that. Just not Linux and Unix systems. Earlier you mentioned a Wikipedia article that considered zero-filled files to be sparse. I cannot see any way to read the Wikipedia article on Sparse Files that supports that interpretation. Perhaps you were referring to some other article.

Not that many, actually. Linux native filesystems, yes. BSD native filesystems, yes. Microsoft NTFSv5 (since about Windows 2000), yes. Others... no.

Unix and Unix-like systems have had sparse files almost since the beginning of UNIX.

AwesomeMachine · 05-05-2016, 11:01 PM

I thought I had read that on Wikipedia, but apparently I was wrong. I suppose there is no conclusive evidence either way. So, everyone can think whatever he/she wants. The prevailing definition now seems to be that metadata sparse files are sparse files.

slac-in-the-box · 05-08-2016, 07:46 PM

I ended up making my sparse file with dd's seek option, and my Google Compute Engine image for Slackware boots! I mentioned the sparse flag to "conv=" only because the man page does, and it is the man page on Slackware Linux, and it's wrong! We somehow got the manual page for a sparse-enabled dd, even though the dd on Slackware does not have this capability via a flag to conv=, but does have it via the seek option. I think the important characteristic of a sparse file is that the zeros don't occupy drive or tarball space until they are not zeros. Thanks for the explanations of sparse files and their history.
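For anyone wanting to try the seek approach, here is a rough sketch that contrasts it with the zero-filled file from earlier in the thread. The file names and the temporary directory are invented, and stat -c assumes a GNU-style stat. With count=0 nothing is written; dd just seeks 45 blocks into the output and sets the file length there.

```shell
tmp=$(mktemp -d)

# Fully allocated: 45 MiB of zeros are really written to disk.
dd if=/dev/zero of="$tmp/full.file" bs=1M count=45 2>/dev/null

# Sparse: seek 45 MiB into the output and write nothing at all.
dd if=/dev/zero of="$tmp/sparse.file" bs=1M seek=45 count=0 2>/dev/null

full_size=$(stat -c %s "$tmp/full.file")
sparse_size=$(stat -c %s "$tmp/sparse.file")
echo "length: full=$full_size sparse=$sparse_size"

# Disk usage in KiB; the sparse file should be far smaller
# on a filesystem that supports holes.
du -k "$tmp/full.file" "$tmp/sparse.file"
```

Both files have the same length; only the allocation differs, and how much it differs depends on the filesystem.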

cuthbia · 06-16-2016, 07:14 PM

This is a truly useful and informative posting; I have saved it and over the next months I will be digesting it as much as possible. I notice some tricks (and commands) that I would not have thought of using: more grist to the mill!

I have been familiar with "dd" for over thirty years, since I first encountered Unix, but this post on the thread is a genuine eye-opener!

jpollard · 06-16-2016, 08:29 PM

The major issue with "conv=sparse" is that it is non-standard. I believe it is available as a compile-time option, so the man page is not exactly wrong; it just doesn't indicate that it is an optional feature, or that it is non-standard.

There are a few times when you don't want a sparse file. One is when you are creating a swap file: all the blocks in the file must be allocated up front, as they will not be allocated during a page write (I don't think the filesystem I/O code is used, which keeps the overhead down).