LinuxQuestions.org
Old 06-03-2011, 01:12 PM   #1
sirnewton_01
LQ Newbie
 
Registered: Jun 2011
Posts: 3

Rep: Reputation: Disabled
inotify and missing events on linux kernel 2.6.32


Hello All,

I have been trying to use inotify to monitor a directory and all of its subdirectories for various changes: add/delete/modify. I know that inotify does not provide an API for recursive listening, so I have to handle the recursion myself by listening for events that involve new directories and adding a watch on each one with inotify_add_watch().

After I add a new watch on a directory I immediately scan that directory using the standard opendir() and readdir() calls looking for any subdirectories recursively adding those using inotify_add_watch(). I would expect that the inotify_add_watch() call would return at a point where any new files added to this directory will generate events that I will later process. As a result, when I do my scan of the directory after adding the watch, I stat() each file looking for whether its ctime is before or after the point that I added the watch so I don't process the same file or directory twice (once from the crawl and once again from the inotify event from the kernel).

The problem that I am running into is that with the above logic I am missing events when I extract an archive (unzip or untar) with ~900 directories. It is especially troublesome because I miss some of the directories, so I don't have the chance to register a listener on them and will miss out on anything happening in that subtree.

Initially, I had guessed that perhaps I wasn't processing the events quickly enough using the read() call on the file descriptor provided by the inotify_init(). So, I tried a multi-threaded approach with an event queue. With a separate reading thread I still miss some of the events.
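
For reference, the reading side can be sketched roughly as below. This is a minimal illustration, not the code from the original program: drain_events is a hypothetical helper name and the 16 KiB buffer size is an arbitrary assumption. It walks the variable-length records returned by read() and flags IN_Q_OVERFLOW, which the kernel queues when events were dropped because the event queue was full.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: read one batch of events from an inotify descriptor
 * and walk the variable-length records. Returns the number of events seen
 * (0 if a non-blocking descriptor had nothing to read), or -1 on error.
 * Sets *overflowed if the kernel reported IN_Q_OVERFLOW, which means events
 * were dropped and the watched tree should be rescanned. */
static int drain_events(int fd, int *overflowed)
{
    /* Buffer alignment as suggested in the inotify(7) man page. */
    char buf[16384] __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len;
    int count = 0;

    assert(overflowed != NULL);
    *overflowed = 0;

    len = read(fd, buf, sizeof buf);
    if (len < 0)
        return (errno == EAGAIN) ? 0 : -1;

    for (char *p = buf; p < buf + len; ) {
        const struct inotify_event *ev = (const struct inotify_event *)p;

        if (ev->mask & IN_Q_OVERFLOW)
            *overflowed = 1;
        else if (ev->len > 0)
            printf("wd=%d mask=0x%x name=%s\n",
                   ev->wd, (unsigned)ev->mask, ev->name);
        count++;
        /* Each record is the fixed header plus ev->len bytes of name. */
        p += sizeof(struct inotify_event) + ev->len;
    }
    return count;
}
```

If the overflow flag is ever set during the unzip, the missing directories would be explained by dropped events rather than by a race in inotify_add_watch().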

Afterwards, I tried the same experiment with the inotifywatch tool from the inotify-tools. e.g.

inotifywatch -r ~/tmp > output.txt
unzip <foo.zip> -d ~/tmp
<Ctrl-C> on the terminal running inotifywatch after several seconds

When I do a "cat output.txt | wc -l" I see only 896 directories when there should be 906. This seems to indicate that even inotifywatch is missing some of the events.

Finally, I went back to my original code and inserted a 10 ms usleep() between adding a new watch on a directory and crawling its contents for existing files. It dramatically reduces the error rate of my own program, but there are still occasions where I do not receive all of the events.

I double-checked /proc/sys/fs/inotify/* to verify that I wasn't exceeding any limits and everything seems adequate to handle my case.
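
The same check can be done from inside the program. The sketch below is only an illustration: read_inotify_limit and print_inotify_limits are hypothetical helper names, and it assumes the three standard entries under /proc/sys/fs/inotify are present on the system.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one integer limit from /proc/sys/fs/inotify.
 * Returns -1 if the file cannot be opened or parsed. */
static long read_inotify_limit(const char *name)
{
    char path[128];
    long value = -1;
    FILE *f;

    assert(name != NULL);
    snprintf(path, sizeof path, "/proc/sys/fs/inotify/%s", name);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Print the three limits that bound queue depth, instances and watches. */
static void print_inotify_limits(void)
{
    static const char *const names[] = {
        "max_queued_events", "max_user_instances", "max_user_watches"
    };

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        printf("%s = %ld\n", names[i], read_inotify_limit(names[i]));
}
```

Of the three, max_queued_events is the one most relevant to a burst such as extracting an archive, since it bounds how many unread events can wait on one inotify instance.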

Is it possible that inotify_add_watch() does not block completely until the kernel is prepared to begin sending events for that directory? Is my approach (and inotifywatch) somehow flawed?

Thanks,
Chris
 
Old 06-06-2011, 09:21 AM   #2
neonsignal
Senior Member
 
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Bookworm (Fluxbox WM)
Posts: 1,391
Blog Entries: 54

Rep: Reputation: 360
Quote:
Originally Posted by sirnewton_01
As a result, when I do my scan of the directory after adding the watch, I stat() each file looking for whether its ctime is before or after the point that I added the watch so I don't process the same file or directory twice (once from the crawl and once again from the inotify event from the kernel).
How do you check what time the watch was added (since a gap between the watch placement and the time is potentially going to mean either processing a file twice or missing a file depending on the order)?

Incidentally, adding a watch twice for the same directory is not necessarily a problem, because this will simply modify the existing watch.
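
A small way to see this behaviour, as a sketch (readd_returns_same_wd is a hypothetical name): add a watch on the same path twice and compare the returned watch descriptors. Note that the second call replaces the event mask unless IN_MASK_ADD is included.

```c
#include <assert.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Returns 1 if watching the same path twice yields the same watch
 * descriptor, 0 if the descriptors differ, -1 on error. */
static int readd_returns_same_wd(const char *dir)
{
    int fd, wd1, wd2;

    assert(dir != NULL);
    fd = inotify_init();
    if (fd < 0)
        return -1;

    wd1 = inotify_add_watch(fd, dir, IN_CREATE);
    /* The second call modifies the existing watch; its mask replaces the
     * first one because IN_MASK_ADD is not set. */
    wd2 = inotify_add_watch(fd, dir, IN_CREATE | IN_MOVED_TO);
    close(fd);

    if (wd1 < 0 || wd2 < 0)
        return -1;
    return wd1 == wd2;
}
```

Because the descriptor is the same, a program that keys its bookkeeping on the watch descriptor does not end up with duplicate entries when a directory is registered again.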

One potential issue is that a file can be moved (ie renamed), which is common for applications that make temporary files (eg decompression to a temporary, then rename to the actual name). So it may be necessary to check for MOVE events in some circumstances.

Quote:
I tried the same experiment with the inotifywatch tool from the inotify-tools.
The inotifywatch tool is not going to report any directories that are empty, since they will not have any reportable events; this means that a line count on the output isn't necessarily going to match the directories created. It would be more concerning if the count is not consistently the same every time.

Quote:
Is it possible that inotify_add_watch() does not block completely until the kernel is prepared to begin sending events for that directory?
Not clear that 'blocking' is required here. The sequence in your code is presumably: read the event list; add watches to any subdirectories; recurse. The system does not block between reading the event and adding the new watches, so necessarily there may be writes inside the subdirectories before the watch is placed on it (hence the recursion). When the watch is placed, there will be some point at which the watch becomes valid; there is no 'window' during which access to the file should be blocked (from the point of view of the user application).

Last edited by neonsignal; 06-06-2011 at 09:28 AM.
 
Old 06-06-2011, 04:14 PM   #3
sirnewton_01
LQ Newbie
 
Registered: Jun 2011
Posts: 3

Original Poster
Rep: Reputation: Disabled
Thank you for your reply.

I have reworked some of my code and it seems to catch more of the events but I can rerun the same scenario several times and get one or two runs that are off by a few directories.

Quote:
Originally Posted by neonsignal
How do you check what time the watch was added (since a gap between the watch placement and the time is potentially going to mean either processing a file twice or missing a file depending on the order)?
I was hoping that the time that the inotify_add_watch() returned would signal the moment that the kernel is listening to changes. So, I get the current time with a time(NULL) call and use that to compare the ctimes of the files and subdirectories using opendir() and readdir(). If ctimes are earlier then I assume that they may not have had events generated for them. If the ctimes are after I ignore them because I'm expecting to get events for those.
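
The comparison described here can be sketched as below. One detail that may matter: time(NULL) has one-second resolution and st_ctime is compared in whole seconds, so a strict "earlier than" test skips entries stamped in the same second as the cutoff. The sketch (path_needs_scan is a hypothetical name, not the original code) treats ties as needing a scan, at the cost of occasionally handling an entry twice.

```c
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Decide whether the directory crawl should handle this entry itself.
 * cutoff is the time(NULL) value taken right after inotify_add_watch().
 * Returns 1 if the entry was last changed at or before the cutoff second
 * (an event may not have been generated for it), 0 if it is newer (an
 * event is expected), and -1 if the entry cannot be examined. */
static int path_needs_scan(const char *path, time_t cutoff)
{
    struct stat st;

    assert(path != NULL);
    if (lstat(path, &st) != 0)
        return -1;
    /* <= rather than <: both values are whole seconds, so an entry created
     * just before the watch can carry the same second as the cutoff. */
    return st.st_ctime <= cutoff;
}
```

With this rule an entry can be seen both by the crawl and by an event, so the caller needs to tolerate duplicates.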

Quote:
Originally Posted by neonsignal
One potential issue is that a file can be moved (ie renamed), which is common for applications that make temporary files (eg decompression to a temporary, then rename to the actual name). So it may be necessary to check for MOVE events in some circumstances.
Thanks, I have added IN_MOVE to the event mask when I do inotify_add_watch().

Quote:
Originally Posted by neonsignal
The inotifywatch tool is not going to report any directories that are empty, since they will not have any reportable events; this means that a line count on the output isn't necessarily going to match the directories created. It would be more concerning if the count is not consistently the same every time.
I know that inotifywatch only shows directories in its output. I am expecting all 906 directories to be shown there because none of them is empty: running "find <directory> -type d -empty" after unzipping returns no results.

Today, I reran the test case as described above. The first time I see 900 directories listed in output.txt, the second time 905. It does seem to fluctuate even given the fact that it is the same zip every time. I tried invoking a sync command in another terminal just to be sure the output.txt wasn't being truncated by accident.

Quote:
Originally Posted by neonsignal
Not clear that 'blocking' is required here.
What I mean here is that I expect to be able to know when the directory is ready to report any events that happen to it. In the best (of the worst) case scenario I am processing events twice when I don't need to. In the worst case, when I get execution back from inotify_add_watch(), files are added to the directory but I don't get any events for them and I don't catch them while iterating the directory.

I have thought up some schemes to try to work around any races. For example, I could mark directories as being "warm" and listen to them for a certain period of time in a separate thread until its contents stabilize. Also, I could create a file in the middle of this directory (if I have permission) and wait until I get that event before proceeding to iterate the directory.

Hopefully I won't need to do these things.

Thanks again,
Chris
 
Old 06-06-2011, 10:35 PM   #4
neonsignal
Senior Member
 
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Bookworm (Fluxbox WM)
Posts: 1,391
Blog Entries: 54

Rep: Reputation: 360
It would be a shame to have to use a workaround.

Incidentally, I haven't been able to replicate what you did with the inotifywatch tool. I created a zip file with 930 directories (30 directories, each with 30 inside, and each with an empty file inside). I tried two different machines (an Athlon, and an older PIII), both running a 2.6.32 kernel. It picked up events on all the directories during the unzip. I'm not sure what else to try. Admittedly these machines are only single CPU.

Quote:
I was hoping that the time that the inotify_add_watch() returned would signal the moment that the kernel is listening to changes. So, I get the current time with a time(NULL) call and use that to compare the ctimes of the files and subdirectories using opendir() and readdir().
Are you saying that you get the current time (which won't be exactly the same as the inotify_add_watch time), or that you get the time from the inotify_add_watch file descriptor?
 
Old 06-07-2011, 11:00 AM   #5
sirnewton_01
LQ Newbie
 
Registered: Jun 2011
Posts: 3

Original Poster
Rep: Reputation: Disabled
Thanks for trying out the inotifywatch tests on your systems.

I tried it again on 3 more systems locally and found the same inconsistent results on each one. Sometimes I get events on 901 directories, sometimes 903, but rarely the full 906 that I expect.

There are some things that these systems have in common though.

1) Ubuntu (either 10.04 or 10.10)
2) Multi-CPU (some dual, one quad and one with two hyperthreads)
3) Ext4 (in every case the filesystem with the test directory is ext4)

Quote:
Originally Posted by neonsignal
Are you saying that you get the current time (which won't be exactly the same as the inotify_add_watch time), or that you get the time from the inotify_add_watch file descriptor?
At the moment, the former: I get the current time by calling time(NULL) (from time.h) immediately after inotify_add_watch(). So, should I instead stat() the inotify file descriptor, or the watch descriptor for the directory, and use its mtime?

Thanks again,
Chris
 
Old 06-07-2011, 05:50 PM   #6
neonsignal
Senior Member
 
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Bookworm (Fluxbox WM)
Posts: 1,391
Blog Entries: 54

Rep: Reputation: 360
Quote:
Originally Posted by sirnewton_01
1) Ubuntu (either 10.04 or 10.10)
2) Multi-CPU (some dual, one quad and one with two hyperthreads)
3) Ext4 (in every case the filesystem with the test directory is ext4)
I'm running ext4 too. The main difference is the multicpu, but we aren't running exactly the same zip file either, and my machines are probably all slower than yours.

Quote:
At the moment, the former: I get the current time by calling time(NULL) (from time.h) immediately after inotify_add_watch(). So, should I instead stat() the inotify file descriptor, or the watch descriptor for the directory, and use its mtime?
My thoughts were that any files that were being created in the short time between your call to inotify_add_watch and the call to time are going to be missed by your algorithm. It would be better to call time before you added the watch (because at least then the worst outcome would be to watch items twice, which is not a major issue).

I had originally thought you might be checking the time on the inotify watch descriptor, and couldn't see how that could be done. I also don't think that checking the inotify file descriptor or the directory will help, they won't bear much relationship to the watch time.

I had a look through the inotifywatch code, and it appears to me that they are keeping a list of all the watches placed (using a red-black tree). Their recursive descent watch function explicitly states in the comments that it may miss watches. However, it doesn't matter, because the watch placed on a directory happens before the recursion inside the directory, so it will end up just recursing the directory again if something gets added in the meantime. It is a lot more effort than checking the times, assuming there is some way to get the timestamp method to work!
 
  

