
rmakhija 05-22-2005 11:56 PM

Linux 2.6 kernel hangs
Hi All

I have been facing this problem for some time now and feel it's time to ask around for a solution. It would be great if you could provide some definitive pointers.

I am running a networking application over an SCTP (Stream Control Transmission Protocol) stack, implemented along the lines of RFC 2960. The SCTP stack sits on top of the IP layer and makes raw IP system calls for all networking operations.

The setup configuration is as follows:
1. Two Linux machines are connected back to back over a 100 Mbps Ethernet interface. Each has a 1.6 GHz Intel Pentium IV processor and 1 GB of RAM, and each is running Red Hat 9.0.

2. Each machine has the SCTP stack running over the IP layer. On top of the SCTP stack there is a load application which can pump data messages (of 100 bytes each, for a period of 5 minutes) at different rates, configurable at run time.

3. The intent is to evaluate the performance of the SCTP stack in the above configuration.

4. To start with, messages are pumped from both ends at a moderate rate (1000 msg/sec), and the rate is gradually increased until the buffers at the transport layer become insufficient to handle the rate at which the application (basically a load generator) is pumping messages from both sides.

5. Eventually (once the message rate has reached 35000 msg/sec) one of the two computers stops responding and stays hung, and I have to force a hard reboot. Ctrl+C and similar keys do not work, the traces on the console stop appearing, and none of the keys except Caps Lock, Scroll Lock and Num Lock seem to work. When I telnet to the hung machine from the peer, the session stops at the escape sequence message and the login prompt never appears. The hung machine does, however, still respond to ping requests.

6. The SCTP stack and the application run as a binary in user mode. The binary runs with root privileges (as superuser).
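
As a rough sanity check on the numbers in the list above, the wire rate at each message rate can be estimated as in the Python sketch below. It rests on several assumptions not stated in the post: one 100-byte message per packet (no chunk bundling), IPv4 without options, SACK traffic ignored, and standard Ethernet framing overhead. The function name wire_mbps is made up for illustration.

```python
# Rough wire-rate estimate for the load test described above.
# Assumptions (not from the post): one message per packet, no bundling,
# IPv4 without options, acknowledgement (SACK) traffic ignored.
MSG_PAYLOAD = 100         # bytes per message (from the post)
SCTP_DATA_CHUNK_HDR = 16  # DATA chunk header, RFC 2960
SCTP_COMMON_HDR = 12      # SCTP common header, RFC 2960
IPV4_HDR = 20             # IPv4 header without options
ETH_OVERHEAD = 38         # 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap


def wire_mbps(msgs_per_sec, payload=MSG_PAYLOAD):
    """Estimated Ethernet wire rate in Mbit/s for one direction."""
    frame_bytes = (payload + SCTP_DATA_CHUNK_HDR + SCTP_COMMON_HDR
                   + IPV4_HDR + ETH_OVERHEAD)
    return msgs_per_sec * frame_bytes * 8 / 1e6


for rate in (1000, 10000, 35000):
    print(f"{rate:>6} msg/s -> {wire_mbps(rate):6.2f} Mbit/s per direction")
```

Under these assumptions, 35000 msg/sec works out to roughly 52 Mbit/s in each direction, so on a full-duplex link the 100 Mbps line rate itself is probably not the first limit reached. Each host would still be sending and receiving about 35000 packets per second each way.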

Could you let me know what is happening? How can a user-mode program force the kernel into an infinite loop such that it stops responding? I have checked /var/log/messages but didn't find anything fishy.

Waiting for your suggestions/replies

Thanks and Best Regards

btmiller 05-23-2005 06:16 PM

This is a pretty obscure question ... you might have more luck in the networking forum. But just to be clear, the application is entirely user-space? No corresponding module inserted into the kernel? I could think of a couple of things it could be, perhaps related to corruption of kernel memory. But a couple of questions:

1) What kernel version exactly are you running (type uname -a)?
2) Is it always the same machine that hangs? If so, it may simply be a case of dodgy hardware buckling under load.

rmakhija 05-24-2005 01:31 AM

Thanks for your inputs.
Yes, the application runs entirely in user space; it only makes raw IP system calls (to the underlying IP layer) for all networking operations.
No corresponding module is inserted into the kernel.
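
For readers unfamiliar with the term, a raw IP socket of the kind described here might be opened as in the Python sketch below. This is only an illustration, not the actual stack's code: the function name open_raw_sctp_socket is hypothetical, and the only facts relied on are that SCTP uses IP protocol number 132 and that raw sockets normally require root privileges.

```python
import socket

IPPROTO_SCTP = 132  # IANA-assigned IP protocol number for SCTP


def open_raw_sctp_socket():
    """Try to open a raw IPv4 socket for protocol 132.

    Returns the socket, or None if the OS refuses
    (raw sockets normally need root privileges).
    """
    try:
        return socket.socket(socket.AF_INET, socket.SOCK_RAW, IPPROTO_SCTP)
    except OSError as exc:
        print(f"could not open raw socket: {exc}")
        return None


if __name__ == "__main__":
    sock = open_raw_sctp_socket()
    if sock is not None:
        # Every send/receive on this socket is a system call into the kernel,
        # even though the SCTP logic itself lives in user space.
        print("raw socket opened")
        sock.close()
```

The point of the sketch is that a user-space stack still crosses into the kernel on every packet it sends or receives, so the kernel's IP layer and network driver are exercised at the full message rate.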
Secondly, I have tried this on various machines (running 2.6 and 2.4 kernels) with different hardware configurations, but the problem persists.
I would like to add that the problem surfaces only when we are pumping messages at a very high rate and CPU utilization approaches 100%. The messages are pumped for around 5 minutes, but if the machine hangs it does not recover even if left idle for the next 10-12 hours.
Waiting for your suggestions.
Thanks and Best Regards
