Saving (overwriting) a file properly

hydraMax · 05-28-2012, 02:09 PM

I feel embarrassed to ask such a basic question, but what is the best strategy for saving (overwriting) files containing user data? What I mean is, should I write the buffer data to a temporary file first, and then move it onto the actual file? (I imagine it would be a problem if I wrote the buffer data directly to the user file, some I/O error happened halfway through, and the only on-disk copy of the data was left corrupted.) And are there any relevant conventions for this? (E.g., the name and location of the temporary file.)

I'm programming with C, so feel free to refer to relevant system functions.

chrism01 · 05-28-2012, 09:15 PM

You probably want to read this page: https://www.gnu.org/software/libc/ma...ary-Files.html and then use mv to overwrite the original.
IIRC, mv is supposed to be an atomic operation (as long as the source and destination are on the same filesystem).

(It's been a while since I did C, btw.)
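In C, the mv step corresponds to rename(), which is only atomic when the temporary file and the target are on the same filesystem; across filesystems mv falls back to copying and deleting, which is not atomic. tmpfile() from that libc page returns an unnamed file in the system temporary directory, so it cannot be renamed over the target, while mkstemp() lets you choose the directory. A minimal sketch, assuming a hidden ".NAME.XXXXXX" naming scheme next to the target (an arbitrary convention for illustration; open_temp_beside() is a hypothetical helper name):

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>   /* close(), unlink() for callers */

/* Create a temporary file in the same directory as 'target'.
 * On success returns an open descriptor and stores the malloc()ed
 * path in *tmp_path; on failure returns -1 with errno set.
 * The ".NAME.XXXXXX" naming is only a convention chosen for this sketch. */
int open_temp_beside(const char *target, char **tmp_path)
{
    const char *slash = strrchr(target, '/');
    size_t dirlen = slash ? (size_t)(slash - target) + 1 : 0; /* keep the '/' */
    const char *base = slash ? slash + 1 : target;

    /* directory part + "." + base name + ".XXXXXX" + NUL */
    size_t len = dirlen + 1 + strlen(base) + 7 + 1;
    char *path = malloc(len);
    if (!path)
        return -1;
    snprintf(path, len, "%.*s.%s.XXXXXX", (int)dirlen, target, base);

    /* mkstemp() replaces the XXXXXX and creates the file exclusively,
     * with mode 0600, so it never clobbers an existing file. */
    int fd = mkstemp(path);
    if (fd == -1) {
        int saved = errno;
        free(path);
        errno = saved;
        return -1;
    }
    *tmp_path = path;
    return fd;
}
```

If anything fails before the final rename(), the caller should close() and unlink() the temporary file so no stray ".NAME.XXXXXX" files are left behind.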

Nominal Animal · 05-28-2012, 11:47 PM

Quote:

Originally Posted by hydraMax

should I write the buffer data to a temporary file first, and then move it onto the actual file?

There are two approaches to safely replacing an existing file:

1. Copy the original contents to a temporary (backup) file first, then truncate the original file and write the new contents into it. If the file should survive a sudden power loss intact, use fsync() to make sure the contents are on disk before closing the file.
2. Create a temporary file in the same directory as the target (it must be on the same filesystem, and the same directory is the practical way to ensure that), write the new contents into it, and use fsync() to make sure they are on disk if the file should survive a sudden power loss intact. Finally, rename the temporary file over the target file.
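The second approach could look roughly like this in C99 plus POSIX. It is a sketch, not a hardened implementation: replace_file_atomic() is a hypothetical name, the ".tmp.XXXXXX" suffix is an arbitrary choice, and it adds one step not spelled out above, an fsync() of the containing directory after rename(). The fsync(2) man page notes that syncing a file does not necessarily sync its directory entry, so without that step the rename itself might not survive a power loss. Also note that mkstemp() creates the file with mode 0600, so the replaced file ends up 0600 unless you copy the original's metadata over before renaming.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* fsync() the directory containing path, so the rename is durable too. */
static int sync_parent_dir(const char *path)
{
    const char *slash = strrchr(path, '/');
    char *dir = slash ? strndup(path, slash == path ? 1 : (size_t)(slash - path))
                      : strdup(".");
    if (!dir)
        return -1;
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    free(dir);
    if (dfd == -1)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}

/* Replace target with len bytes from data: write a temporary file in the
 * same directory, fsync() it, rename() it over target, fsync() the directory.
 * Returns 0 on success, -1 on error; target is untouched unless rename() ran. */
int replace_file_atomic(const char *target, const void *data, size_t len)
{
    size_t plen = strlen(target) + sizeof ".tmp.XXXXXX";
    char *tmp = malloc(plen);
    if (!tmp)
        return -1;
    snprintf(tmp, plen, "%s.tmp.XXXXXX", target);

    int fd = mkstemp(tmp);
    if (fd == -1) {
        free(tmp);
        return -1;
    }
    if (write_all(fd, data, len) == -1 || fsync(fd) == -1) {
        int saved = errno;
        close(fd);
        unlink(tmp);            /* don't leave a half-written temp file */
        free(tmp);
        errno = saved;
        return -1;
    }
    if (close(fd) == -1 || rename(tmp, target) == -1) {
        int saved = errno;
        unlink(tmp);
        free(tmp);
        errno = saved;
        return -1;
    }
    free(tmp);
    return sync_parent_dir(target);
}
```

Readers of the target see either the complete old contents or the complete new contents, never a partial write.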

The first approach retains the owner, group, mode (except for the setuid/setgid bits), extended attributes, SELinux context, and POSIX ACLs of the file. With the second approach, you can try to set them to match the original file, but that is not guaranteed to succeed, and extended attributes and POSIX ACLs in particular need special handling to copy between files (so normally one only retains the mode, or at most a suitable subset of the metadata).
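For the second approach, carrying over the basic metadata might look like the sketch below: stat() the original, then fchown() and fchmod() the temporary file before renaming it. copy_basic_metadata() is a hypothetical helper; it deliberately ignores extended attributes, POSIX ACLs and SELinux contexts, and when the process is not allowed to change the owner (which usually needs root) it falls back to changing only the group.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy owner, group and permission bits from the file at 'original'
 * to the open file 'tmp_fd'. Extended attributes, ACLs and SELinux
 * contexts are NOT copied. Returns 0 on success, -1 on error.
 * If 'original' does not exist yet, nothing is copied (returns 0). */
int copy_basic_metadata(const char *original, int tmp_fd)
{
    struct stat st;
    if (stat(original, &st) == -1)
        return errno == ENOENT ? 0 : -1;

    /* Changing the owner usually requires privileges; an unprivileged
     * process may still set the group to one it belongs to. */
    if (fchown(tmp_fd, st.st_uid, st.st_gid) == -1) {
        if (errno != EPERM)
            return -1;
        /* (uid_t)-1 means "leave the owner unchanged" */
        if (fchown(tmp_fd, (uid_t)-1, st.st_gid) == -1 && errno != EPERM)
            return -1;
    }
    /* Set the mode after fchown(): an ownership change may clear
     * the setuid/setgid bits. */
    if (fchmod(tmp_fd, st.st_mode & 07777) == -1)
        return -1;
    return 0;
}
```

A caller would run this on the temporary file's descriptor after writing the new contents and before the fsync() and rename().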

The second approach is atomic, because rename() atomically replaces an existing file when both are on the same (local) filesystem. It does require the temporary file to be on the same filesystem as the target file, and putting the temporary file in the same directory is normally enough. It is not guaranteed to work if bind mounts or overlay filesystems are used.
Even on NFS, the replacement is atomic in the sense that each system sees either the old or the new contents, never a mix. I believe it is not atomic in the sense that, for a short window after the replacement, other hosts may still open the "old" file.
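A cheap pre-check for the same-filesystem requirement is to compare the st_dev fields that stat() reports for the target's directory and for the directory where the temporary file would go. This is only a hint: the Linux rename(2) man page documents that rename() does not work across different mount points even when the same filesystem is mounted on both, and a bind mount reports the same st_dev, so you still have to handle rename() failing with EXDEV. same_filesystem() is a hypothetical helper name.

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <sys/stat.h>

/* Return true if the two existing paths report the same device ID,
 * i.e. they appear to live on the same filesystem. Returns false if
 * either stat() fails. A bind mount of the same filesystem also reports
 * the same st_dev, yet rename() between two mount points still fails
 * with EXDEV, so treat this as a pre-check, not a guarantee. */
bool same_filesystem(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
        return false;
    return sa.st_dev == sb.st_dev;
}
```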

The first approach can be made atomic on Linux local filesystems by taking a write lease on the file using fcntl(F_SETLEASE). The call only succeeds when no other process has the file open. If another process then tries to open the file, the lease holder is sent a signal (SIGIO by default) and has /proc/sys/fs/lease-break-time seconds (45 by default) to release the lease before the open proceeds. (It will proceed, as you cannot stop the other process from opening the file. Even removing the file will not affect that; in a sense, the file is already open, and the open() just hasn't returned to the process from the kernel yet. But you should have ample time to replace the contents, or truncate the file to zero bytes.)
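A Linux-only sketch of the lease step, assuming you hold the lease just for the truncate-and-write part of the first approach (the backup copy is not shown here). rewrite_with_lease() is a hypothetical name. Two details worth knowing: without CAP_LEASE, only the file's owner can take a lease, and the default action for SIGIO is to terminate the process, so the sketch installs a handler that merely records the lease break (it assumes the program does not use SIGIO for anything else) and restores the previous handler afterwards.

```c
#define _GNU_SOURCE          /* F_SETLEASE is Linux-specific */
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set when another process tries to open the file while we hold the
 * lease; a real program might use it to hurry up. */
static volatile sig_atomic_t lease_broken = 0;

static void on_lease_break(int sig)
{
    (void)sig;
    lease_broken = 1;
}

/* Overwrite the file at 'path' in place with len bytes from data while
 * holding a write lease. Returns 0 on success, -1 on error; errno EAGAIN
 * from F_SETLEASE means someone else has the file open right now. */
int rewrite_with_lease(const char *path, const void *data, size_t len)
{
    const char *p = data;
    int rc = -1, saved, fd;
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_lease_break;   /* SIGIO's default action would kill us */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, &old) == -1)
        return -1;

    fd = open(path, O_WRONLY);
    if (fd == -1)
        goto out;
    if (fcntl(fd, F_SETLEASE, F_WRLCK) == -1)
        goto out_close;

    /* No one else has the file open now; a later open() by another
     * process blocks for up to lease-break-time seconds. */
    if (ftruncate(fd, 0) == -1)
        goto out_unlease;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)       /* e.g. interrupted by SIGIO */
                continue;
            goto out_unlease;
        }
        p += n;
        len -= (size_t)n;
    }
    if (fsync(fd) == -1)
        goto out_unlease;
    rc = 0;

out_unlease:
    saved = errno;
    fcntl(fd, F_SETLEASE, F_UNLCK);   /* release the lease */
    errno = saved;
out_close:
    saved = errno;
    close(fd);
    errno = saved;
out:
    saved = errno;
    sigaction(SIGIO, &old, NULL);     /* restore the previous handler */
    errno = saved;
    return rc;
}
```

On some filesystems, or with /proc/sys/fs/leases-enable set to 0, F_SETLEASE may simply fail; a caller would then fall back to plain locking or to the rename approach.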

If the NFS server and all clients have correctly configured the lock manager, file leases should work correctly even on NFSv4 filesystems. I haven't checked that, though: in my experience, most web hosting environments that use NFS do not have correctly configured lock managers, so neither leases nor file locks work reliably. (Talk about annoying...)

If the target file is bind-mounted, or under an overlay filesystem, the second approach will not work because the target file and temporary file are on different filesystems. You may have to implement the first approach as a workaround anyway.
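When rename() refuses because of a bind mount or a different filesystem (on Linux this typically shows up as EXDEV, or EBUSY when the target is itself a mount point), a fallback along the lines of the first approach could look like this. rewrite_in_place() and the caller-supplied backup path are hypothetical; the key points are that the backup is fsync()ed before the original is truncated, and that the original inode is reused, so its owner, mode, extended attributes and ACLs stay attached.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy everything from in_fd (from its current offset) to out_fd. */
static int copy_fd(int in_fd, int out_fd)
{
    char buf[65536];
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return 0;
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) == -1)
            return -1;
    }
}

/* First approach: back up target's contents to 'backup' (which must not
 * exist yet), then truncate target and write the new contents into the
 * same inode. Returns 0 on success, -1 on error. */
int rewrite_in_place(const char *target, const char *backup,
                     const void *data, size_t len)
{
    int in = open(target, O_RDWR);
    if (in == -1)
        return -1;
    int bak = open(backup, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (bak == -1) {
        int saved = errno;
        close(in);
        errno = saved;
        return -1;
    }

    /* 1. Copy the old contents to the backup and get them onto disk. */
    int ok = copy_fd(in, bak) == 0 && fsync(bak) == 0;
    ok = close(bak) == 0 && ok;

    /* 2. Only then truncate the original and write the new contents,
     *    rewinding first because copy_fd() left the offset at EOF. */
    if (ok)
        ok = ftruncate(in, 0) == 0 && lseek(in, 0, SEEK_SET) == 0 &&
             write_all(in, data, len) == 0 && fsync(in) == 0;

    int saved = errno;
    if (close(in) == -1 && ok) {
        ok = 0;
        saved = errno;
    }
    if (!ok) {
        errno = saved;   /* the backup is left in place for recovery */
        return -1;
    }
    unlink(backup);      /* success: this sketch discards the backup */
    return 0;
}
```

Keeping the backup after success instead (as a "file~" copy, say) is an equally valid choice; the important part is the ordering of the two fsync() calls relative to the truncation.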

So which approach to choose depends entirely on the situation at hand.

It might help to know that the first approach is very often used by editors (vim, emacs, nano), while the second one is often used for files created by the application itself (log files, data files, GUI applications), since those are unlikely to carry any user-applied metadata anyway. (GNU sed -i, on the other hand, writes to a temporary file in the same directory and renames it over the original.) The second approach is much more efficient, since it only writes the new data to disk; this matters a lot if the files are large or if there is heavy I/O. (The first approach basically needs to read the original contents, write them out again as the copy, and then write the new contents; that's a lot of I/O.)

Let me know if you want example C99 code for either case. I hope this helps,