LinuxQuestions.org


wtruong 05-06-2011 12:01 PM

sleep( ) call causes whole GNU Linux system to hang indefinitely
 
Does anyone know why a sleep() call on Red Hat Linux would cause the whole system to hang indefinitely?

It's doing this every 10 or so calls in my program and I have to press the reboot button on my computer.

This is driving me nuts.

EDIT:

My program is reading from a UDP port that has messages sent to it 20 times per second. When I sleep I assume the internal UDP buffer is getting more and more filled.
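
For reference, the size of that internal buffer can be queried with getsockopt() and SO_RCVBUF. The sketch below is only an illustration: the helper names udp_rcvbuf_size and default_udp_rcvbuf are hypothetical and are not taken from the program discussed in this thread. As far as I know, once the receive buffer is full Linux drops further datagrams rather than blocking anything, so a full buffer on its own would not be expected to hang the machine.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the kernel receive buffer size (in bytes)
 * of an already-open socket, or -1 if getsockopt() fails. */
int udp_rcvbuf_size(int fd)
{
    int size = 0;
    socklen_t len = sizeof size;

    assert(fd >= 0);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len) < 0)
        return -1;
    return size;
}

/* Hypothetical helper: open a throwaway UDP socket, report its default
 * receive buffer size, and close it again. Returns -1 on error. */
int default_udp_rcvbuf(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int size;

    if (fd < 0)
        return -1;
    size = udp_rcvbuf_size(fd);
    close(fd);
    return size;
}
```

Calling udp_rcvbuf_size() on the real socket shows how many bytes of small messages can queue up while the reader is asleep.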

Ramurd 05-06-2011 05:02 PM

Why are you sure that sleep() is the cause? In itself, a sleep should not cause your system to hang.
Which version of Red Hat are you running, and what other details may be applicable?

wtruong 05-07-2011 04:40 PM

Redhat Enterprise Linux 5.4.

I'll run it under valgrind on monday to see if anything is being corrupted.

theNbomr 05-09-2011 11:20 AM

If this is a userspace application, and it is reproducible, then I imagine someone in the kernel development community would like to know about it. No userspace application should be able to crash the OS.
--- rod.

Sergei Steshenko 05-09-2011 11:35 AM

Quote:

Originally Posted by wtruong (Post 4348568)
...
It's doing this every 10 or so calls in my program and I have to press the reboot button on my computer.
...

What have you tried other than pressing the reboot button?

How do you know that your system (i.e. the kernel, not your particular task) hangs?

dwhitney67 05-09-2011 11:35 AM

Quote:

Originally Posted by wtruong (Post 4348568)
My program is reading from a UDP port that has messages sent to it 20 times per second.

Could you clarify the statement above? Are you stating that messages are sent at a rate of 20 msg/sec to the socket, or that you read the port at a rate of 20 read/sec?

And more importantly, why are you using sleep()?

wtruong 05-09-2011 01:09 PM

Quote:

How do you know that your system (i.e. the kernel, not your particular task) hangs?

I know it hangs because it hangs. I can't do anything but reset the card. I can't SSH into it, the serial terminal doesn't respond, etc.

Quote:

Could you clarify the statement above? Are you stating that messages are sent at a rate of 20 msg/sec to the socket, or that you read the port at a rate of 20 read/sec?

And more importantly, why are you using sleep()?

My application is reading from a port that has 20 small messages per second sent to it. The sleep prevents it from reading the incoming messages. I need to use sleep because the hardware that the application sends to is old and needs time to process each message I send it. I've since moved the code segment that uses sleep to another thread, so that the application has no delay in reading the messages at 20 per second, and so far there have been no system hangs.
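
A minimal sketch of that arrangement, assuming POSIX threads: the reading thread only pushes messages onto a queue and never sleeps, while a worker thread pops them and pauses after each one. Every name here (msg_queue, queue_push, slow_sender, the 5 ms pause, the int message type) is hypothetical and stands in for the real code, which is not shown in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

#define QUEUE_CAP 64

struct msg_queue {
    int items[QUEUE_CAP];       /* ring buffer of pending messages */
    int head, count;
    int closed;                 /* set when no more messages will arrive */
    int processed;              /* messages handled by the worker so far */
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
};

void queue_init(struct msg_queue *q)
{
    assert(q != NULL);
    q->head = q->count = q->closed = q->processed = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Called by the reading thread; never sleeps.
 * Returns 0 on success, -1 if the queue is full (message dropped). */
int queue_push(struct msg_queue *q, int msg)
{
    int rc = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAP) {
        q->items[(q->head + q->count) % QUEUE_CAP] = msg;
        q->count++;
        rc = 0;
        pthread_cond_signal(&q->not_empty);
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Tell the worker to exit once the queue has been drained. */
void queue_close(struct msg_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Worker thread: takes one message at a time and pauses after each,
 * so the delay needed by the slow hardware never blocks the reader. */
void *slow_sender(void *arg)
{
    struct msg_queue *q = arg;
    struct timespec pause = { 0, 5 * 1000000L };    /* 5 ms, arbitrary */

    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->count == 0 && !q->closed)
            pthread_cond_wait(&q->not_empty, &q->lock);
        if (q->count == 0) {                        /* closed and drained */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        int msg = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        pthread_mutex_unlock(&q->lock);

        (void)msg;                  /* placeholder: send msg to the hardware */
        nanosleep(&pause, NULL);    /* give the hardware time to process it */

        pthread_mutex_lock(&q->lock);
        q->processed++;
        pthread_mutex_unlock(&q->lock);
    }
}
```

The reader would start the worker with pthread_create(&tid, NULL, slow_sender, &q), call queue_push() for each received message, and call queue_close() followed by pthread_join() at shutdown. Dropping messages when the queue is full is one possible policy; blocking the reader or growing the queue are alternatives.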

I'm still curious to know why the sleep would cause the system to hang though. I'm assuming there are guards to prevent user space code from corrupting anything that sleep() relies on to recover? Can anyone enlighten me on how sleep() works on GNU/Linux?

The only other issue I've found on the internet regarding a similar issue is this:
http://www.daniweb.com/software-deve...threads/267684

Before anyone says that it can be a SIGALRM or alarm() issue, the GNU C Library states, "On the GNU system, it is safe to use sleep and SIGALRM in the same program, because sleep does not work by means of SIGALRM."

Source: http://www.gnu.org/s/hello/manual/libc/Sleeping.html
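
As an illustration of sleeping without SIGALRM, here is a minimal sketch of a delay built directly on nanosleep(), which, as far as I know, is also what glibc's sleep() is typically built on under Linux. The helper name sleep_ms is hypothetical. It resumes the sleep if a signal interrupts it, using the remaining time that nanosleep() reports.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for ms milliseconds without using SIGALRM.
 * Returns 0 once the full interval has elapsed, -1 on any other error. */
int sleep_ms(long ms)
{
    struct timespec req, rem;

    assert(ms >= 0);
    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;

    /* nanosleep() returns -1 with errno == EINTR when a signal handler
     * ran; rem then holds the time still left to sleep. */
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    return 0;
}
```

Either way, the calling thread is simply put to sleep by the kernel until the timer expires; nothing in this path should be able to stall other processes.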

dwhitney67 05-09-2011 03:06 PM

Have you tested a simple program such as this? If so, did the system hang?
Code:

#include <unistd.h>

/* Minimal test: sleep for one second, then exit. */
int main(void)
{
  sleep(1);
  return 0;
}

P.S. As was commented on earlier, I also find it hard to believe that a user-program, calling sleep(), would affect the entire OS.

wtruong 05-09-2011 03:18 PM

I commented out some code and put a sleep(1); return 0; in a while(1) loop, with each iteration reading one message from the port. Everything went fine. This leads me to believe that something may be corrupting addresses that sleep relies on.

I've yet to run valgrind, but will update as soon as I do.

Sergei Steshenko 05-09-2011 03:27 PM

Quote:

Originally Posted by wtruong (Post 4351212)
I know it hangs because it hangs. I can't do anything else but to reset the card. I can't SSH into it, the serial terminal doesn't respond, etc.
...

???

Do you have a physical terminal and keyboard connected to the "hanging" computer? Can you switch to another virtual console (e.g. Ctrl-Alt-F1 .. Ctrl-Alt-F6)?

wtruong 05-09-2011 03:35 PM

There is a serial terminal connected to /dev/ttyS0 that is open and idle during execution. When the system hangs, the serial terminal is unresponsive. There is no kernel panic either.

Sergei Steshenko 05-09-2011 03:35 PM

And, by the way, I remember that a heavily loaded 4-core server with 16 GB of RAM could be very unresponsive: it took several minutes to get an 'ssh' response from it.

So, if your program leaks memory and heavy swapping kicks in, you may feel the server is stuck, though in reality it isn't.

wtruong 05-09-2011 06:03 PM

Memory usage and processor usage are negligible. The program doesn't even show up in top. Anyway, like I said, I moved the function that had sleep in it to another thread and the system doesn't hang anymore.

wtruong 05-10-2011 01:42 PM

Alright guys, I ran the program under helgrind and I found the problem. It was a deadlock in the kernel device driver and not the sleep. Thanks for the help guys.

Time to point fingers at the guy who wrote this.

Kwarf 06-27-2011 04:06 AM

Quote:

Originally Posted by wtruong (Post 4352262)
Alright guys, I ran the program under helgrind and I found the problem. It was a deadlock in the kernel device driver and not the sleep. Thanks for the help guys.

Time to point fingers at the guy who wrote this.

I'm experiencing similar problems. Which device drivers caused the problems on your system?

