Quote:
Originally Posted by hydraMax
Looking into my code some more, it seems that the problem wasn't that the file contents pointed to by the stream aren't changing (although that may still be a problem), but rather that my program only knows that it has changed when it detects that the mtime (from stat()) has changed.
Ah. Why not use the inotify interface to detect the CLOSE_WRITE, MOVE_SELF, and DELETE_SELF events for that file? The first requires a re-read; the other two require a reopen (since someone has replaced the file) followed by a re-read. Note that if the editor removes the original file first, instead of using the recommended mechanism of renaming the new file over the old one, there may be a short while during which the file does not exist; some kind of retry mechanism for the reopening might be needed.
There is also a small delay in inotify reporting the event. The delay effectively guarantees that all processes will see the new state, so in your case the delay might be useful. But if the changes occur at a high frequency, the delay might become a bottleneck.
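To make the suggestion concrete, here is a minimal sketch of that inotify setup (Linux-only; error reporting is abbreviated, and the function names `watch_file`/`wait_for_change` are just illustrative):

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>

/* Set up an inotify watch on the given file for the three events
 * discussed above. Returns the inotify descriptor, or -1 on error. */
static int watch_file(const char *path)
{
    int ifd = inotify_init1(IN_CLOEXEC);
    if (ifd == -1) {
        perror("inotify_init1");
        return -1;
    }
    if (inotify_add_watch(ifd, path,
                          IN_CLOSE_WRITE | IN_MOVE_SELF | IN_DELETE_SELF) == -1) {
        perror("inotify_add_watch");
        close(ifd);
        return -1;
    }
    return ifd;
}

/* Block until at least one event arrives, then report what to do:
 * 1 = contents changed, re-read suffices;
 * 2 = file was replaced or removed, reopen (with retries), then re-read;
 * -1 = error. */
static int wait_for_change(int ifd)
{
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len = read(ifd, buf, sizeof buf);  /* blocks until events arrive */
    if (len <= 0)
        return -1;

    int action = 0;
    char *p = buf;
    while (p < buf + len) {
        const struct inotify_event *ev = (const struct inotify_event *)p;
        if (ev->mask & IN_CLOSE_WRITE && action < 1)
            action = 1;        /* contents changed: re-read */
        if (ev->mask & (IN_MOVE_SELF | IN_DELETE_SELF))
            action = 2;        /* file replaced: reopen, then re-read */
        p += sizeof *ev + ev->len;
    }
    return action;
}
```

A real program would typically poll() the inotify descriptor alongside its other work instead of blocking in read().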
Quote:
Originally Posted by hydraMax
For some reason, the mtime of the open FILE stream does change (after external editing) under the xfs file system, but not under the ext3 file system.
I think that's related to the writeback wonkiness on ext3 I mentioned.
Quote:
Originally Posted by hydraMax
But I think the same principles as you mentioned above still apply. So, does this mean I have to reopen the FILE stream every time I want to see if a file has been modified?
No, not really. I would do an fflush() prior to the re-read, though.
The window during which the changes are not yet visible to other processes is very short, because it is more a matter of synchronization between CPU cores in the kernel than anything else. The page cache is shared between all processes, so the synchronization happens very fast.
Using any kind of synchronization to detect the change point -- inotify, advisory file locks, file leases -- works, because the change notification is done after the synchronization occurs.
Something like using a pipe or a socket to tell the other process the modifications have been completed is racy, because there is no guarantee that the synchronization has occurred yet: the pipe or socket may be faster than the kernel synchronization. But if you couple that with advisory file locks or leases, the lock or lease correctly handles the short race window, resulting in trustworthy guarantees. The timescale involved here is very, very short; certainly less than a second.
Quote:
Originally Posted by hydraMax
I need to know whether or not the file has been modified before pretty much every operation that is done in the program. So that is a lot of opening and closing of FILE streams.
On my machine, a repeated fopen() on the same file takes about 2 µs: a single process can reopen the same file about half a million times per second. The cost of an fopen() seems negligible to me, considering you are working on a file modified by multiple unrelated processes.
There are a few things you might consider. For example, use a generation counter, perhaps at the start of the file. Just use pread() to read it (without affecting the file position or confusing the standard I/O on the same file in any way) to see if the file contents have changed. Other processes modifying the file should only update the counter after the other modifications have been written to the file. The size of the counter is not that important; it is perfectly okay for it to wrap around. If your file is text, you can certainly use an identifier string instead. (The number of unique states tells how many writes a reader can miss while still being guaranteed to notice a new write.)
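A sketch of that check, assuming a 32-bit binary counter at offset 0 of the file (the name `generation_changed` and the layout are just illustrative; a text file would compare an identifier string instead):

```c
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Peek at the generation counter at the start of the file, without
 * touching the stdio stream position. Returns 1 if the counter differs
 * from *last (updating *last), 0 if unchanged, -1 on a short read
 * (file truncated or gone). */
static int generation_changed(FILE *handle, uint32_t *last)
{
    uint32_t counter;

    if (pread(fileno(handle), &counter, sizeof counter, 0)
            != (ssize_t)sizeof counter)
        return -1;
    if (counter == *last)
        return 0;
    *last = counter;
    return 1;
}
```

Only when this returns 1 does the reader need to take the read lock and re-read the file.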
To avoid possible race conditions in the use of the generation counter, have all processes take an advisory lock when accessing the file. Readers can access the generation counter at any time (remember, the locks are advisory, not mandatory; they won't block any read or write operations); whenever they see a change, they take a read lock (fcntl(fileno(handle), F_SETLKW, &lock) with a struct flock of {F_RDLCK, SEEK_SET, 0, 0}), and keep it during the re-read (to make sure nobody makes any modifications to the file at the same time). After the re-read, they drop the lock. Writers simply take a write lock (the same fcntl() with F_WRLCK), modify the contents, preferably updating the generation counter last (to minimize the time the readers have to wait to obtain the read lock), then drop the lock. Note that both readers and writers wait until they acquire the lock, and avoid any race conditions that way.
Using the generation counter and advisory lock scheme, you should get pretty much maximum throughput with minimal CPU time used.
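The whole-file locking described above can be sketched as a single helper (the name `lock_whole_file` is illustrative; note that a read lock needs the file open for reading, and a write lock needs it open for writing):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Take or release a whole-file advisory lock on a stdio stream.
 * type is F_RDLCK, F_WRLCK, or F_UNLCK. F_SETLKW waits until the
 * lock is granted. Returns 0 on success, -1 on error. */
static int lock_whole_file(FILE *handle, short type)
{
    struct flock lk;

    memset(&lk, 0, sizeof lk);
    lk.l_type   = type;
    lk.l_whence = SEEK_SET;
    lk.l_start  = 0;
    lk.l_len    = 0;   /* 0 = to end of file, i.e. the whole file */
    return fcntl(fileno(handle), F_SETLKW, &lk);
}
```

A reader would do lock_whole_file(f, F_RDLCK), re-read, then lock_whole_file(f, F_UNLCK); a writer the same with F_WRLCK, updating the generation counter last.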
If you cannot control the editors, then using inotify is your best bet. Or, if you are Linux-only, you could use file leases to detect when another process starts modifying the file, then repeatedly try to lease the file -- it will not succeed as long as the file is open in any other process -- until successful, then re-read. Both advisory locks and file leases should be sufficient to guarantee synchronization.
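The retry loop for that lease approach might look like the following sketch (Linux-only; F_SETLEASE needs _GNU_SOURCE, the caller must own the file or have CAP_LEASE, and the 10 ms back-off is an arbitrary choice):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Keep trying to take an exclusive (write) lease on fd. It is granted
 * only when no other process has the file open, so success means it is
 * safe to re-read. Returns 0 on success, -1 on a non-retryable error. */
static int wait_until_exclusive(int fd)
{
    for (;;) {
        if (fcntl(fd, F_SETLEASE, F_WRLCK) == 0)
            return 0;        /* nobody else has the file open */
        if (errno != EAGAIN)
            return -1;       /* e.g. we do not own the file */
        usleep(10000);       /* back off a little before retrying */
    }
}

/* After re-reading, release the lease with
 * fcntl(fd, F_SETLEASE, F_UNLCK) so writers can proceed. */
```

In a long-lived program you would instead hold a lease and let the kernel send you a signal when another process opens the file, but the retry loop matches the scheme described above.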
Care to describe what your program does in a little more detail? Is it just one file, or multiple files? Are the files small or large? Binary or text? Are all processes forked from the same parent? Can you modify the sources for all processes reading and writing to the file?