LinuxQuestions.org
Old 05-21-2012, 09:05 PM   #1
hydraMax
Member
 
Registered: Jul 2010
Location: Skynet
Distribution: Debian + Emacs
Posts: 467
Blog Entries: 60

Rep: Reputation: 51
file systems: external edits to an open file


I'm looking for some perspective on a particular point that is confusing me. Let's say that, on a GNU/Linux system:

1. I open file A with fopen(), read from it, and leave the program running (i.e., without closing A). Then...

2. Using a different program (say, a text editor), I edit file A, changing a few words. Then...

3. Back in my original program, I rewind() the open stream for file A and read from it again.

So, after step three, my program will always have read the new, edited version of file A's data, correct? Or is this file-system dependent?

The reason I ask is that I seem to be getting different results for the same program in this scenario: in one case the data file is on an XFS file system, and in the other it is on an ext3 file system.
 
Old 05-21-2012, 09:59 PM   #2
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 943
Quote:
Originally Posted by hydraMax View Post
So, after step three, my program will always have read the new, edited version of the file A data, correct?
The simple answer is no; there is no such guarantee. Exactly when changes to the underlying file become visible to another process varies, particularly on multiprocessor systems.

If you modify step three to reopen the underlying file, you should always see the new, edited version. (This is recommended anyway, because many editors replace the original file with a new one: your program would still be accessing the old, already-deleted file's contents. Since the old file is still open, its data remains on disk, but it vanishes when the last open descriptor is closed.)
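Something like this shows the effect (a minimal sketch; the file names and helper functions are just for the demonstration). An editor that saves by renaming a new file over the old one leaves an already-open FILE* pointing at the old inode, so only a reopen sees the new data:

```c
#include <stdio.h>
#include <string.h>

/* Write `text` into `path`, creating or truncating it. */
static void write_file(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f) {
        fputs(text, f);
        fclose(f);
    }
}

/* Re-read the first line of an open stream into buf. */
static void read_line(FILE *f, char *buf, size_t n)
{
    rewind(f);
    if (!fgets(buf, (int)n, f))
        buf[0] = '\0';
}
```

With these helpers, open a file, replace it by rename() ("the editor"), and compare what the old stream and a fresh fopen() return: the old stream still shows the old contents, the fresh stream shows the new ones.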

If the editing process and your process use advisory file locks (see man fcntl), with the editing program temporarily acquiring a write lock on the file, then your program will see the new data after it re-acquires its lock. The same applies to file leases, except that leases are Linux-specific.

However, standard C I/O has internal buffers, and you indicated you are using standard C I/O. The fflush() man page notes that, to be sure you see the current contents of the file (and not just the contents of the standard library's buffers), you should fflush() the input stream before rewinding it, to discard any cached data. The GNU C library uses filesystem information to choose the buffer size for each file (it tries to use the native I/O block size, I believe), which could be why the effect appears on one filesystem but not on another. You can run stat -c %o FILE-OR-DIRECTORY to see the native I/O block size.
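A minimal sketch of that fflush-then-rewind re-read (the helper name is mine):

```c
#include <stdio.h>

/* Re-read the first line of an already-open stream, discarding any
 * buffered input first so the next read hits the file's current
 * contents rather than stdio's cached data. Returns buf, or NULL
 * on read failure. */
static char *reread_line(FILE *f, char *buf, int n)
{
    fflush(f);   /* on an input stream: discard the buffered data */
    rewind(f);   /* seek back to the start, clearing EOF/error    */
    return fgets(buf, n, f);
}
```

If another process rewrites the file in place between the first read and the call to reread_line(), the second read returns the new contents.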

Finally, the caching done by ext3 is a bit wonky, and could be why you are seeing old data on ext3. Search for ext3 fsync on the web to dive into the matter. I seem to remember that with certain settings it could take a very long time for modified data to show up on open descriptors; something about the way it does writeback, I think. I switched to ext4 years ago, so I haven't kept up with ext3 quirks at all; ext4 is faster, besides.

If you think it would be useful, I could easily whip up two C99 programs to investigate the issue/effect using low-level I/O.
 
1 member found this post helpful.
Old 05-22-2012, 01:29 AM   #3
hydraMax
Member
 
Registered: Jul 2010
Location: Skynet
Distribution: Debian + Emacs
Posts: 467
Blog Entries: 60

Original Poster
Rep: Reputation: 51
@Nominal: Thanks for the detailed response. Very helpful and interesting.

Looking into my code some more, it seems the problem wasn't that the file contents pointed to by the stream aren't changing (although that may still be a problem), but that my program only notices a change when it detects that the mtime (from stat()) has changed. For some reason, the mtime of the open FILE stream does change (after external editing) on the XFS file system, but not on the ext3 file system.

But I think the same principles you mentioned above still apply. So, does this mean I have to reopen the FILE stream every time I want to check whether the file has been modified? I guess that isn't too hard to do, but it feels a bit odd, since (in my case) I need to know whether the file has been modified before pretty much every operation the program performs. That is a lot of opening and closing of FILE streams.
 
Old 05-22-2012, 02:17 PM   #4
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 943
Quote:
Originally Posted by hydraMax View Post
Looking into my code some more, it seems that the problem wasn't that the file contents pointed to by the stream aren't changing (although that may still be a problem), but rather that my program only knows that it has changed when it detects that the mtime (from stat()) has changed.
Ah. Why not use the inotify interface to detect the IN_CLOSE_WRITE, IN_MOVE_SELF and IN_DELETE_SELF events for that file? The first requires a re-read; the others require a reopen (since someone has replaced or removed the file) followed by a re-read. Note that if the editor removes the original file first, instead of using the recommended mechanism of renaming the new file over the old one, there may be a short period when the file does not exist, so some kind of retry mechanism for the reopen might be needed.

There is also a small delay before inotify reports an event. The delay effectively guarantees that all processes will see the new state, so in your case it may be useful; but if changes occur at a high frequency, the delay could become a bottleneck.

Quote:
Originally Posted by hydraMax View Post
For some reason, the mtime of the open FILE stream does change (after external editing) under the xfs file system, but not under the ext3 file system.
I think that's related to the writeback wonkiness on ext3 I mentioned.

Quote:
Originally Posted by hydraMax View Post
But I think the same principles as you mentioned above still apply. So, does this mean I have to reopen the FILE stream every time I want to see if a file has been modified?
No, not really. I would fflush() before the re-read, though.

The window during which the changes are not yet visible to other processes is very short, because it mostly comes down to synchronization between CPU cores inside the kernel. The page cache is shared between all processes, so that synchronization happens very quickly.

Using any kind of synchronization to detect the change point -- inotify, advisory file locks, file leases -- works, because the change notification is delivered after the synchronization has occurred.

Using a pipe or a socket to tell the other process that the modifications are complete is racy, because there is no guarantee that the synchronization has occurred yet: the pipe or socket may be faster than the kernel's synchronization. But if you couple that with advisory file locks or leases, the lock or lease correctly handles the short race window, giving you trustworthy guarantees. The timescale involved here is very, very short; certainly less than a second.

Quote:
Originally Posted by hydraMax View Post
I need to know whether or not the file has been modified before pretty much every operation that is done in the program. So that is a lot of opening and closing of FILE streams.
On my machine, a repeated fopen() on the same file takes about 2 µs: a single process can reopen the same file about half a million times per second. The cost of fopen() seems negligible to me, considering you are working on a file modified by multiple unrelated processes.

There are a few things you might consider. For example, use a generation counter, perhaps at the start of the file. Use pread() to read it (without affecting the file position or confusing the standard I/O on the same file in any way) to see whether the file contents have changed. Processes modifying the file should update the counter only after the other modifications have been written to the file. The size of the counter is not that important; it is perfectly okay for it to wrap around, and if your file is text, you can certainly use an identifier string instead. (The number of unique states determines how many writes a reader can miss while still being guaranteed to notice that a new write happened.)
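A sketch of that probe; the layout (an 8-byte counter at offset 0 of the file) is just an assumption for illustration:

```c
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Read the 8-byte generation counter assumed to live at offset 0 of
 * the open stream's file. pread() takes its own offset, so the
 * stream's position and buffers are left untouched.
 * Returns 0 on read failure. */
static uint64_t read_generation(FILE *f)
{
    uint64_t gen;
    if (pread(fileno(f), &gen, sizeof gen, 0) != (ssize_t)sizeof gen)
        return 0;
    return gen;
}
```

A reader calls read_generation() before each operation and only re-reads the file when the value differs from the last one it saw.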

To avoid possible race conditions in the use of the generation counter, have all processes take an advisory lock when accessing the file. Readers can read the generation counter at any time (remember, the locks are advisory, not mandatory; they won't block any read or write operations). Whenever they see a change, they acquire a read lock (fcntl(fileno(handle), F_SETLKW, {F_RDLCK, SEEK_SET, 0, 0})) and keep it during the re-read, to make sure nobody modifies the file at the same time; after the re-read, they drop the lock. Writers simply take a write lock (fcntl(fileno(handle), F_SETLKW, {F_WRLCK, SEEK_SET, 0, 0})), modify the contents, preferably updating the generation counter last (to minimize the time readers have to wait for the read lock), then drop the lock. Note that both wait until they acquire the lock, and avoid any race conditions that way.
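The inline fcntl() forms above are shorthand; spelled out with the actual struct flock interface, a whole-file blocking lock looks like this (the helper name is mine):

```c
#include <fcntl.h>
#include <stdio.h>

/* Take (F_RDLCK or F_WRLCK) or release (F_UNLCK) a blocking,
 * whole-file advisory lock on an open stream.
 * Returns 0 on success, -1 on error. */
static int file_lock(FILE *f, short type)
{
    struct flock lk = {
        .l_type   = type,
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,   /* length 0 means "to the end of the file" */
    };
    return fcntl(fileno(f), F_SETLKW, &lk);
}
```

A reader would call file_lock(f, F_RDLCK) before the re-read and file_lock(f, F_UNLCK) after; a writer uses F_WRLCK around its modifications.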

Using the generation counter and advisory lock scheme, you should get pretty much maximum throughput with minimal CPU time.

If you cannot control the editors, then inotify is your best bet. Or, if you are Linux-only, you could use file leases to detect when another process starts modifying the file, then repeatedly try to take a lease on the file -- it will not succeed as long as the file is open in any other process -- until successful, then re-read. Both advisory locks and file leases should be sufficient to guarantee synchronicity.

Care to describe what your program does in a little more detail? Is it just one, or multiple files? Are the files small or large? Binary or text? All processes forked from the same parent? Can you modify the sources for all processes reading and writing to the file?
 
Old 05-23-2012, 01:14 AM   #5
hydraMax
Member
 
Registered: Jul 2010
Location: Skynet
Distribution: Debian + Emacs
Posts: 467
Blog Entries: 60

Original Poster
Rep: Reputation: 51
https://frigidcode.com/code/csvfs/

My program translates a data file into a file system. If some external program changes the data file, that is okay with me, but after the data file changes, I need to know it happened so I can change the file system representation.
 
Old 05-25-2012, 12:00 AM   #6
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 1,612

Rep: Reputation: 674
What you are looking for is a "file alteration monitor". On many distributions (Gentoo included), the package you want is called "gamin". It's fairly easy to set up a socket that receives events whenever a monitored file or directory changes. If the package is installed on your system, `man fam` will show the details and usage.
 
  

