/* gcc -o sparse_demo_test sparse_demo_test.c */
#define neq !=
#define eq ==
#define DOOPS 20
#define OKDOKEY 0
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>   /* declares lseek() and close() */
int main(void)
{
    int f2;
    off_t status;
    f2 = open(
        "sparse_create_test.dat",
        (O_CREAT | O_RDWR | O_TRUNC),
        (S_IRUSR | S_IWUSR)
    );
    /* took out O_WRONLY, but O_RDWR does not work any better */
    if (-1 eq f2) {
        fprintf(stderr, "file create error\n");
        return(DOOPS);
    }
    status = lseek(f2, 1073741824, SEEK_CUR); /* this is not working. WHY? */
    if ((off_t) -1 eq status) {
        fprintf(stderr, "lseek error\n");
    }
    if (close(f2)) {
        fprintf(stderr, "error closing output file\n");
        exit(DOOPS);
    }
    exit(OKDOKEY);
}
/* actual end of this file */
The idea is that this program, when run, writes a sparse file that appears to be 1 gig of zeroes, yet takes up almost no room on the file system.
What actually happens is that a zero-length file is written. D'oh!
I'm running on an ext4 file system, which I know handles sparse files (I have used dd and fallocate to verify this). All my googling indicates that, in a .c program, lseek is the *only* way to write a sparse "hole" to a file.
So what am I overlooking? I have no idea. Can anyone assist? Thank you.
You never wrote anything at that offset. If you are not going to write something to the file, you need to call ftruncate(2) to set a size in the inode.
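For illustration, here is a minimal sketch of the ftruncate(2) approach. The helper name make_sparse_with_ftruncate is made up for this example and is not from the original program; it assumes a POSIX system and a file system that supports sparse files.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) `path` and set its logical
 * size to `size` bytes without writing any data.
 * Returns 0 on success, -1 on failure. */
int make_sparse_with_ftruncate(const char *path, off_t size)
{
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    /* ftruncate(2) records the new size in the inode. On a file system
     * with sparse-file support, no data blocks need to be allocated. */
    if (ftruncate(fd, size) == -1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Calling make_sparse_with_ftruncate("sparse_create_test.dat", 1073741824) from main should leave a file that "ls -l" reports as 1 GiB. Note that no lseek is needed for this variant.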
Oh, I see that now!
If I write even just *one* byte after the lseek, the file expands to the proper size. I mean, "ls -l" reports the "full" size and du the small space actually occupied by such a sparse file.
ftruncate is a more elegant solution (no need to write that kludgy extra byte).
Thank you very much.
Edit: confirmed. I rewrote my program to use ftruncate after the lseek. It works!
Last edited by jr_bob_dobbs; 04-18-2017 at 01:36 PM.
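For comparison, the "write one byte after the lseek" variant might look like the sketch below. Both helper names (make_sparse_with_write, sparse_usage) are invented for this example. It seeks to size - 1 rather than size, so the file ends up exactly `size` bytes long, and it assumes the Linux convention that st_blocks counts 512-byte units.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make `path` appear `size` bytes long by seeking to
 * the last byte and writing a single zero there. Requires size >= 1.
 * Returns 0 on success, -1 on failure. */
int make_sparse_with_write(const char *path, off_t size)
{
    if (size < 1)
        return -1;

    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    /* The seek alone changes nothing on disk; the write is what extends
     * the file. "" is a one-byte array holding a single '\0'. */
    if (lseek(fd, size - 1, SEEK_SET) == (off_t)-1 ||
        write(fd, "", 1) != 1) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Hypothetical helper: report the apparent size (what "ls -l" shows) and
 * the allocated bytes (roughly what "du" shows). Returns 0 or -1. */
int sparse_usage(const char *path, off_t *apparent, off_t *allocated)
{
    struct stat st;
    if (stat(path, &st) == -1)
        return -1;
    *apparent = st.st_size;
    *allocated = (off_t)st.st_blocks * 512;  /* assumes 512-byte units (Linux) */
    return 0;
}
```

On a file system with sparse-file support, `allocated` should come out far smaller than `apparent`; the exact figure depends on the file system's block size.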
Although "sparse files" are tempting, you should be mindful of just how the underlying filesystem implements them. It might not do so at all efficiently.
It's very hard to beat an SQLite database file for many such situations. (Just be sure to use transactions, so that SQLite will do "lazy writes.")
Yeah, fragmentation when the hole gets (partially) filled later on. Still, an archiver ought to know about sparse files so that some zero k sparse file doesn't balloon out to 100 gig on restore.
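As a rough sketch of how a sparse-aware tool could find the holes, Linux's lseek accepts SEEK_DATA and SEEK_HOLE. The helper name count_data_bytes is made up for this example, and this is not a claim about how any particular archiver works. The fallback #define values are the ones Linux uses and are only an assumption for headers that do not expose the constants.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc exposes SEEK_DATA / SEEK_HOLE under this macro */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA     /* fallback: the values used on Linux (assumption) */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/* Hypothetical helper: return the number of bytes of `path` that lie in
 * data regions (not in holes), or -1 on error. */
off_t count_data_bytes(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    off_t total = 0;
    off_t pos = 0;
    for (;;) {
        /* Find the start of the next data region at or after pos. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data == (off_t)-1) {
            if (errno != ENXIO)   /* ENXIO just means "no more data" */
                total = -1;
            break;
        }
        /* Find where that data region ends (the next hole, or EOF). */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole == (off_t)-1 || hole <= data) {
            total = -1;
            break;
        }
        total += hole - data;
        pos = hole;
    }
    close(fd);
    return total;
}
```

A file system without hole reporting may simply treat the whole file as one data region, in which case the result equals the file size.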
Don't make any such assumption about the behavior of an archiver.
Seriously, I would encourage you to reconsider the technical wisdom of "deliberately-sparse files." That smacks of a use-case that would be better served by some kind of database or indexed-file structure. I would form my argument as follows:
It is, in fact, the key space that is "sparse."
Key distribution should have no bearing on the physical layout. On physical disks, tying the two together will greatly increase "seek time" (seeking is the slowest operation an HDA (Head/Disk Assembly) can do ...) and destroy the usefulness of caching.
There is "a data structure": the data-structures of the underlying file system. But, these data structures are designed only to store files. Although provisions are made for sparse files, this is not their design focus.
We are no longer in the days of mainframe MVS®-yore, where when we "allocated a dataset" we got a physically contiguous block of "DASD cylinders" that we knew would be adjacent. We really don't know – and, can't control – where the "sparse" records actually are.
Thus, the design could be severely impacted by the differences between the conceptual view of the arrangement, and the possibly-entirely-different reality.
Whereas, very well-known indexed-file structures ... even VSAM (aka "NoSQL") ... are engineered for this use-case. The "Sqlite" project put this idea "on steroids," provided that you remember to use transactions to get lazy-writes. (The programmers on that project are Wizards.)
#undef SOAPBOX
Last edited by sundialsvcs; 04-27-2017 at 10:15 AM.