open with delayed creation

Skaperen · 05-17-2020, 03:40 PM

Quote:

Originally Posted by dugan

Well I'm sorry to hear that you haven't written C since the 80s, but...

i wrote one last month.

Quote:

Originally Posted by dugan

Is your current target platform one where memory constraints would dictate an approach like this?

operability constraints would. if you had to collect a day-long stream of messages from a network connection, would you collect it in memory and write it all at the end of the day? this may be an exception. what is to be gained by your approach?

Quote:

Originally Posted by dugan

Would your current target platform perform better with a few large buffers or many small ones? Hint: does the target platform have a CPU cache?

today's operating systems do equally well in both situations. caching takes care of this.

Quote:

Originally Posted by dugan

Also keep in mind that on some platforms, sporadically writing many small files would create more disk fragmentation than writing a single large file in one operation.

how did this become a many small files situation? i didn't bring up this topic. if you want to discuss it, start a new thread and PM me the URL.

Quote:

Originally Posted by dugan

Finally, if you're keeping the file open and locked throughout the lifetime of the application, well, that's not the way you're supposed to do it on *nix. Locking is supposed to be done only when necessary.

how did this become a file lock situation? there is no need to lock a file to write it sequentially. there are often many alternatives, depending on the application that typically gets implemented with locking.

Skaperen · 05-17-2020, 03:52 PM

Quote:

Originally Posted by MadeInGermany

In the application you can have a framework that opens the file at the first writing. For example awk:

Code:

awk -F: '$7~/\/zsh/ { print $1 > "outfile" }' /etc/passwd

This writes all users with a zsh login shell to outfile.
If no user got a zsh then outfile is not created at all.

awk is designed to do it this way. C isn't. Even Python isn't. when the tool lets you output to a not-yet-opened reference, such as a string with the name of the target file, the issue is solved. i have done many projects in awk, but many others need way beyond its capability. but, at least awk is not buffering the whole contents in memory before actually asking the system to write some of it.

rnturn · 05-17-2020, 03:56 PM

Quote:

Originally Posted by Skaperen

i have never written any code in C that builds the whole file content in memory before writing it. but i have had to do that a few times in Python. my C programs always did a appropriate write when the data to be written was available.

I remember the days when I had to demonstrate to users that it was actually not faster to process data by reading in an entire file into memory and then write it out all at once after processing the data. Especially on a multi-user system where, because of quotas, you simply will not typically have permission to have access to all available memory. Displaying wall clock time and the post-execution resource utilization statistics accumulated during program execution was the eye-opener for many (this was, I think, easier to do then than it is today on Linux). Sucking an entire dataset into memory cost big time in paging activity and increased run time immensely. And with many users' taking the advice to not allocate swap space, a large dataset will likely have you reaching for the Big Red Switch. (There are times when I think that anyone writing software should be forced to write code on a small memory system for a while---it forces you to think about the problem at hand a bit and work within a finite set of resources.)

Just my $0.02.

Later...

dogpatch · 06-01-2020, 10:47 AM

You seem to be saying you never want an empty file. If you want to make sure the file has at least some data in it, then follow the advice above: create the file by opening it in 'w'rite mode when you have some data. Immediately after opening / creating the file, you write the data, and close the file. Subsequent writes, open in 'a'ppend mode, write the data, close the file.

If you want to make sure the file is complete before you create it, then do the above, but with a temporary file name. When it is complete, rename the file to its permanent name.

Skaperen · 06-01-2020, 02:48 PM

i want to replicate a behavior i have seen in a different OS which is equivalent to the file won't exist if the system is shutdown between open and the first write AND if another process attempts to open the same file name for exclusive creation after that first open by the first process before its first write, then it would fail. that kernel logic seems to be that it locks the name when the first successful open is done and actually creates the file in the file system when the first write is done.

dogpatch · 06-01-2020, 04:03 PM

In that case, create and lock the file. Then create a temporary filename, and right after writing data to it, unlock the permanent file and rename the temp file to its permanent name.

SoftSprocket · 06-01-2020, 04:03 PM

If you call open, immediately followed by unlink the file is deleted when it is closed i.e. close is called or the program exits.

You might be able to use that behavior to achieve what you're after.

Skaperen · 06-01-2020, 06:19 PM

@dogpatch you mean have 2 files, one to lock the name (created first), and the other to write to and eventually replace the first? hmmm, that might work.

Skaperen · 06-01-2020, 06:26 PM

@SoftSprocket the unlink makes that name go away. while it's gone, some other process might open a file at that name. and something needs to be done to bring the file name back and the only ways to do that either create an all new empty file or need to reference an existing name.

SoftSprocket · 06-02-2020, 08:40 AM

Ah, quite so - I should have tested first. As my penance I scribbled something that does work:

Code:

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/stat.h>

char* reserve_fn;
void delete_on_close () {
	printf ("atexit %s\n", reserve_fn);
	if (reserve_fn != NULL) {
		if (unlink (reserve_fn) > 0) {
			perror ("unlink");
		}
	}
}

int main () {
	reserve_fn = "reserved.txt";

	int fd = creat (reserve_fn, O_RDWR);

	if (fd < 0) {
		perror ("open");
		exit (EXIT_FAILURE);
	}
	fchmod (fd, S_IRUSR | S_IWUSR);
	
	signal(SIGINT, delete_on_close);
	atexit (delete_on_close);

	printf ("sleeping\n");
	sleep (10);

	printf ("Using fd\n");

	reserve_fn = NULL;	

	write (fd, &fd, sizeof fd);

	close (fd);

	return 0;
}

Exit on interupt or when exit is called and the file is removed - unless the reserve_fn is NULL. Not exactly elegant but it does work.

Skaperen · 06-02-2020, 02:54 PM

what's the point of delete_on_vlose() when the final desire is to have the fire? the whole point is to have the file never be empty. you need a way to block the name from being used while it doesn't exist.

SoftSprocket · 06-02-2020, 02:59 PM

Your original ask was to prevent the name from being used but delete the file if it isn't used. That is what the example will do. While the program is in the sleep you can interrupt the program and the file will be deleted. However if you don't the file will be preserved. Presuming real work going on before the write, an exit (say after a failed system call) will also remove the file. Did you try the program? If you don't interrupt during the sleep the file will be there.

Skaperen · 06-04-2020, 01:49 AM

what the kernel could do for this kind of open is have an internal set of which object names are open with delayed creation. this can be included in the test for "already exists" so another process cannot open it. but nothing is saved on disk, yet. then when the first write happens, the creation is completed. things like permission need to be tested at open time. the write could fail due to out of space. the file may be created empty if its parent directory does not need another block for it. this might need to be suppressed or reverted if it really is intended to avoid an empty file.

decades ago i worked on IBM mainframes. empty files were "impossible". i never was concerned about why or how enough to investigate (i did have the source code).

SoftSprocket · 06-04-2020, 07:51 AM

You might be able to use FUSE to do something like that. https://www.kernel.org/doc/html/late...tems/fuse.html. In addition to being used to write a filesystem in user space I thnk it can also be used as a filter of some sort.

I got my start on IBM mainframes. My main memory was the frustration of waiting while what I typed traveled down tie lines and back. Everything went through a switch that managed priorities and I as the low rung. Mercifully they put me on PCs where, even with 9 inch floppies and no hard drives, the experience was far more satisfying.

Geist · 06-05-2020, 05:04 PM

Even if you don't have access to this library, you have access to your own code.
So if the library wants to write a file immediately, then simply don't call the library functions until you want to.
That should work, right?

Keep your own copy of whatever you want to ultimately write in memory and only call the library function when you want to.