sleep() call causes whole GNU/Linux system to hang indefinitely
Does anyone know why a sleep() call on a Red Hat Linux system would cause the whole system to hang indefinitely?
It happens roughly every 10 calls in my program, and I have to press the reboot button on my computer. This is driving me nuts. EDIT: My program reads from a UDP port that has messages sent to it 20 times per second. While the program sleeps, I assume the internal UDP buffer keeps filling up. |
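On the buffering question: while the process sleeps, the kernel keeps queuing incoming datagrams in the socket's receive buffer, and once that buffer is full further datagrams are simply dropped, which by itself should not hang anything. Below is a minimal sketch for checking this, assuming an IPv4 UDP socket. The helper names `open_udp_receiver`, `udp_rcvbuf_size` and `drain_udp` are hypothetical and not taken from the original program.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a UDP socket bound to 127.0.0.1 on a port chosen
 * by the kernel (an assumption made for this sketch; the real program binds
 * to its own port). Returns the descriptor or -1, and stores the bound
 * address in *addr. */
int open_udp_receiver(struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0; /* let the kernel pick a free port */

    socklen_t len = sizeof *addr;
    if (bind(fd, (struct sockaddr *)addr, sizeof *addr) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Report the size of the kernel receive buffer in bytes, or -1 on error. */
int udp_rcvbuf_size(int fd)
{
    int size = 0;
    socklen_t len = sizeof size;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len) < 0)
        return -1;
    return size;
}

/* Read every datagram that is queued right now without blocking.
 * Returns how many datagrams were read. */
int drain_udp(int fd)
{
    char buf[2048];
    int count = 0;
    /* MSG_DONTWAIT makes recv() fail with EAGAIN once the queue is empty. */
    while (recv(fd, buf, sizeof buf, MSG_DONTWAIT) >= 0)
        count++;
    return count;
}
```

Calling `drain_udp()` right after each sleep shows how many messages piled up in the meantime; at 20 messages per second, a one-second sleep should leave roughly 20 of them queued.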
Why are you sure the sleep() is the cause? In itself, a sleep should not cause your system to hang.
Which version of Red Hat are you running, and is there anything else that may be applicable? |
Red Hat Enterprise Linux 5.4.
I'll run it under valgrind on Monday to see if anything is being corrupted. |
If this is a userspace application, and it is reproducible, then I imagine someone in the kernel development community would like to know about it. No userspace application should be able to crash the OS.
--- rod. |
How do you know that your system (i.e. the kernel, not your particular task) hangs? |
And more importantly, why are you using sleep()? |
I'm still curious to know why the sleep would cause the system to hang, though. I'm assuming there are guards to prevent user-space code from corrupting anything that sleep() relies on to recover? Can anyone enlighten me on how sleep() works on GNU/Linux? The only other report I've found on the internet of a similar issue is this: http://www.daniweb.com/software-deve...threads/267684 Before anyone says that it could be a SIGALRM or alarm() issue, the GNU C Library manual states, "On the GNU system, it is safe to use sleep and SIGALRM in the same program, because sleep does not work by means of SIGALRM." Source: http://www.gnu.org/s/hello/manual/libc/Sleeping.html |
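As for how sleep() works: on Linux, glibc implements it on top of nanosleep(2) rather than alarm(), which is why it does not interfere with SIGALRM. Below is a rough sketch of the same idea, assuming you want the full interval even when a signal handler interrupts the call. The function name `sleep_fully` is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for the whole interval, resuming whenever a
 * signal handler interrupts nanosleep(). Returns 0 on success, or -1 on any
 * error other than EINTR (for example an out-of-range tv_nsec). */
int sleep_fully(time_t seconds, long nanoseconds)
{
    struct timespec req = { .tv_sec = seconds, .tv_nsec = nanoseconds };
    struct timespec rem;

    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem; /* carry on with the time that was still left */
    }
    return 0;
}
```

The process is simply put to sleep inside the kernel until the timer expires or a signal arrives; nothing in user space has to stay intact for the wake-up to happen, so corrupted user memory should not be able to turn a sleep into a system-wide hang.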
Have you tested a simple program such as this? If so, did the system hang?
Code:
#include <unistd.h> |
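A minimal test along those lines, assuming all it needs to do is call sleep() repeatedly and print progress so that a hang can be tied to a specific iteration, might look like this. The name `run_sleep_test` and its parameters are hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: call sleep() `iterations` times and report progress.
 * Returns the number of sleeps that ran to completion. */
int run_sleep_test(int iterations, unsigned int seconds)
{
    int completed = 0;

    for (int i = 0; i < iterations; i++) {
        printf("iteration %d: sleeping %u s\n", i, seconds);
        fflush(stdout); /* make the line visible before going to sleep */

        /* sleep() returns the unslept seconds; non-zero means a signal
         * handler cut the sleep short. */
        if (sleep(seconds) == 0)
            completed++;
    }
    return completed;
}
```

Calling `run_sleep_test(20, 1)` from `main()` covers more than the "every 10 or so calls" after which the hang was reported; if this runs cleanly on the same machine, sleep() on its own is unlikely to be the culprit.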
I commented out some code and put a sleep(1); return 0; in a while(1) loop, with each iteration reading one message from the port. Everything went fine. This leads me to believe that something may be corrupting addresses that sleep relies on.
I've yet to run valgrind, but will update as soon as I do. |
Do you have a physical terminal and keyboard connected to the "hanging" computer? Can you switch to another virtual console (e.g. Ctrl-Alt-F1 .. Ctrl-Alt-F6)? |
There is a serial terminal connected to /dev/ttyS0 that is open and idle during execution. When the system hangs, the serial terminal is unresponsive. There is no kernel panic either.
|
And, by the way, I remember that a heavily loaded 4-core server with 16 GB of RAM could be very unresponsive: it took several minutes to get an 'ssh' response from it.
So if your program leaks memory and heavy swapping kicks in, you may feel the server is stuck, though in reality it isn't. |
Memory usage and processor usage are negligible. The program doesn't even show up in top. Anyway, as I said, I moved the function that had the sleep in it to another thread and the system doesn't hang anymore.
|
Alright guys, I ran the program under helgrind and I found the problem. It was a deadlock in the kernel device driver and not the sleep. Thanks for the help guys.
Time to point fingers at the guy who wrote this. |