LinuxQuestions.org


fortez 07-05-2010 08:19 AM

Oom killer?
 
Hi,
our servers are behaving strangely.
We use HP servers running RHEL 4 for simulation tasks.

During processing, some servers kill simulation tasks and many other processes, including some that are essential for using the server, such as syslogd and sshd, so the server can only be reached through the iLO port.
This has been happening on weekends and at night, when tasks are more frequent.

I immediately suspected an out-of-memory (OOM) case, but I have found nothing in the logs that confirms this.

In fact:

- in /var/log/messages I have only:
Jun 30 04:08:43 serversym1 exiting on signal 15
- the sar output for that day is:
Quote:

00:00:01  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbswpfree  kbswpused  %swpused  kbswpcad
03:10:01    9401296    7012768     42.72      70672   4065020   32764440        116      0.00         0
03:20:01    9399760    7014304     42.73      70672   4065020   32764440        116      0.00         0
03:30:01    9398736    7015328     42.74      70672   4065020   32764440        116      0.00         0
03:40:01    9397264    7016800     42.75      70672   4065020   32764440        116      0.00         0
03:50:01    9396176    7017888     42.76      70672   4065020   32764440        116      0.00         0
04:00:02    9393744    7020320     42.77      70672   4065020   32764440        116      0.00         0
Average:    9397829    7016235     42.75      70672   4065020   32764440        116      0.00         0
So:
- is this really an OOM case?
- how can I confirm or rule out the OOM assumption?
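
One way to read the sar figures quoted above: with sysstat of this era, kbmemused counts buffers and page cache, which the kernel can normally reclaim under pressure. The Python sketch below subtracts them to estimate what applications actually hold. It assumes the column order shown in the quoted header, and the function name `summarize` is made up for illustration.

```python
# Column names copied from the sar header quoted in the post.
COLUMNS = ["kbmemfree", "kbmemused", "%memused", "kbbuffers", "kbcached",
           "kbswpfree", "kbswpused", "%swpused", "kbswpcad"]


def summarize(row):
    """Estimate application memory use from one sar memory data row."""
    fields = row.split()
    # fields[0] is the timestamp; the rest line up with COLUMNS.
    values = dict(zip(COLUMNS, (float(f) for f in fields[1:])))
    total_kb = values["kbmemfree"] + values["kbmemused"]
    # Assumption: buffers and page cache are reclaimable, so exclude them.
    app_kb = values["kbmemused"] - values["kbbuffers"] - values["kbcached"]
    return {
        "total_kb": int(total_kb),
        "app_kb": int(app_kb),
        "app_pct": round(100.0 * app_kb / total_kb, 2),
        "swap_used_kb": int(values["kbswpused"]),
    }


# The 03:10:01 sample from the post.
sample = "03:10:01 9401296 7012768 42.72 70672 4065020 32764440 116 0.00 0"
print(summarize(sample))
```

For that sample, applications hold under a fifth of RAM and swap use is 116 kB, which does not look like memory exhaustion. The samples are 10 minutes apart, though, so a short spike between them would not show up.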

Thanks

business_kid 07-05-2010 10:22 AM

free -ms 5 > file

You should see swap increasing if OOM is hit, then free itself might go. BTW, RHEL 4 is a bit long in the tooth these days, but of course you knew that.
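
The command above samples memory every 5 seconds (`-s 5`) and reports it in megabytes (`-m`). If a timestamp next to each sample is wanted, a roughly equivalent sketch in Python can read /proc/meminfo directly. The function names, the output format and the default sample count are illustrative assumptions.

```python
import time


def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-style text (values mostly in kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def log_memory(out_path, interval=5, samples=12, source="/proc/meminfo"):
    """Append timestamped MemFree/SwapFree readings to out_path."""
    with open(out_path, "a") as out:
        for i in range(samples):
            with open(source) as f:
                info = parse_meminfo(f.read())
            out.write("%s MemFree=%d kB SwapFree=%d kB\n" % (
                time.strftime("%Y-%m-%d %H:%M:%S"),
                info.get("MemFree", 0),
                info.get("SwapFree", 0)))
            out.flush()  # keep the file current in case the logger is killed
            if i < samples - 1:
                time.sleep(interval)
```

Calling `log_memory("file", interval=5, samples=720)` would cover about an hour. If SwapFree drops steadily toward zero before the crash, that supports the OOM theory.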

syg00 07-05-2010 05:31 PM

sysstat will provide all that and a lot more.
If OOM_killer had been invoked there would be messages everywhere. I'd be suspecting some monitoring code checking for "vital signs" - seems loadavg is a popular one.
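
To check for those messages systematically, a minimal Python sketch can scan the syslog file for OOM killer reports. The two search phrases are an assumption about the kernel's wording, which varies between kernel versions, and the function names are illustrative.

```python
import re

# Assumed wording: kernel OOM reports usually contain one of these phrases,
# but the exact text differs between kernel versions.
OOM_PATTERN = re.compile(r"out of memory|oom-killer", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the lines that look like OOM killer reports."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_log(path="/var/log/messages"):
    """Scan a syslog file; errors="replace" tolerates stray non-UTF-8 bytes."""
    with open(path, errors="replace") as f:
        return find_oom_lines(f)


# The only line the original poster found does not match, so this prints [].
print(find_oom_lines(["Jun 30 04:08:43 serversym1 exiting on signal 15"]))
```

An empty result from `scan_log()` around the time of a crash would argue against the OOM killer, provided syslogd was still running to record it.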

fortez 07-06-2010 07:48 AM

You are both right.
I went through the logs more carefully and saw that out-of-memory cases are logged correctly every time they occur.
So this does not seem to be an OOM killer case.

So why is there "exiting on signal 15" in /var/log/messages, and why does the server become unavailable?
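
For reference, signal 15 on Linux is SIGTERM, the default signal sent by `kill`, so the log line means something asked syslogd to terminate. The small Python sketch below maps such a log line to the signal name. The function name is illustrative, and the regular expression assumes the wording shown in the post.

```python
import re
import signal


def signal_from_log(line):
    """Return the symbolic signal name from an 'exiting on signal N' line, or None."""
    match = re.search(r"exiting on signal (\d+)", line)
    if match is None:
        return None
    # signal.Signals maps the number to the platform's name for it.
    return signal.Signals(int(match.group(1))).name


print(signal_from_log("Jun 30 04:08:43 serversym1 exiting on signal 15"))
```

On Linux this prints SIGTERM. The line does not say who sent the signal, so it narrows the question to "which process is sending SIGTERM" without answering it.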

I did not understand whether you mean /proc/loadavg or a loadavg tool ...
I also have Nagios on the servers, but it gets killed before I can see anything, so Nagios gives no clear information.

syg00 07-06-2010 05:15 PM

I was thinking of your simulation product. It may be trying to protect itself. I've seen this mentioned somewhere (on a 2.4 kernel from memory), but I can't find the reference at the moment.
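
For reference, /proc/loadavg is a kernel file whose first three fields are the 1-, 5- and 15-minute load averages. The kind of "vital signs" check being speculated about might look like the Python sketch below. The threshold and the function names are hypothetical, and nothing in this thread confirms that the simulation product does this.

```python
def read_loadavg(text):
    """Parse /proc/loadavg text into the 1-, 5- and 15-minute load averages."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def over_limit(text, limit):
    """Hypothetical vital-signs check: True if the 1-minute load exceeds limit."""
    return read_loadavg(text)[0] > limit


# Illustrative values, not taken from the servers in this thread.
example = "0.20 0.18 0.12 1/80 11206"
print(read_loadavg(example), over_limit(example, limit=8.0))
```

On a live system the text would come from `open("/proc/loadavg").read()`. A product that shuts things down when such a check trips would leave SIGTERM exits rather than OOM killer messages.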

fortez 07-07-2010 03:07 AM

More crashes last night, and no solution so far.

The only thing I am sure of is that the crashes are caused by the interaction between this software and the servers, but I have seen
the same tasks run on workstations with less CPU and RAM than the servers without crashing the PC.
The workstations run the same Red Hat release as the servers.
I'm really confused.

business_kid 07-08-2010 03:37 AM

Well, if you're running your simulation in a terminal, I will tell you what OOM looks like. I got it once compiling some FPGA stuff that a guy had written in a brain fart, with umpteen libraries linked in. When it came to the final ld, it threw me:

Out of swap space
process killed

I repeated with free -ms 5 running in a terminal, and watched that. RAM went, and swap was gobbled, then I got the lines above again. I did eventually get it to compile by unloading everything else (X, etc.) and just running the 2 bash terminals. It took 182 megs to link it, and I only had 197 available between swap and RAM, so I got there just by unloading other processes. That gave me a 2 meg executable.

