LinuxQuestions.org
Old 02-22-2008, 07:01 AM   #1
BusyBeeBop
LQ Newbie
 
Registered: Feb 2008
Posts: 4

Rep: Reputation: 0
Out of memory (oom) killer causes system crash?


My (virtual) Red Hat Enterprise Linux 4 (update 5) server _seems_
(preliminary analysis) to have crashed because the oom killer killed
processes such as "sshd", "udevd", and a few others. The last process to
have been killed, according to /var/log/messages, was "udevd".


Could the termination of "udevd" be the reason for the server crash?
If so: why does Linux whack processes when killing them can bring the whole server down?
 
Old 02-22-2008, 07:08 AM   #2
rayfordj
Member
 
Registered: Feb 2008
Location: Texas
Distribution: Fedora, RHEL, CentOS
Posts: 475

Rep: Reputation: 73
It's possible; I've seen oom-killer even nuke init (that's usually when things go really south). Somewhere in the list of oom-killer-killed processes is most likely the culprit.


http://www.redhat.com/archives/taroo.../msg00006.html
and
http://linux-mm.org/OOM_Killer

"It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails. It will also kill any process sharing the same mm_struct as the selected process, for obvious reasons. Any particular process leader may be immunized against the oom killer if the value of its /proc/<pid>/oom_adj is set to the constant OOM_DISABLE (currently defined as -17)."



For further review: http://www.google.com/linux?hl=en&sa...er&btnG=Search

Last edited by rayfordj; 02-22-2008 at 07:10 AM. Reason: forgot to address udevd cause of crash question
 
Old 02-22-2008, 08:37 AM   #3
BusyBeeBop
LQ Newbie
 
Registered: Feb 2008
Posts: 4

Original Poster
Rep: Reputation: 0
Thanks for the reply.


Similar to Mr. Sisler in the Red Hat link, we're running virtual RHEL 4 servers (on an ESX server in our case). We have, however, other virtual RHEL 4 servers running the exact same kernel, Java application, and so forth. One of these servers had the same problem with the oom killer whacking processes, but that server never crashed. By setting lower_zone_protection to 250 (as Mr. Sisler suggests), the whole oom killer problem went away.

But the RHEL 4 server in question does not seem to respond to this hack. _And_ it is the only server that has crashed due to the oom killer.

I'm thinking that the problems we're having may have something to do with the interaction between ESX and the virtual RHEL 4 server. I'm not quite sure, but it seems likely that there may be something there. Am I way off?
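For anyone hitting this thread later: the hack in question is the vm.lower_zone_protection sysctl. A sketch, assuming the RHEL 4 (2.6.9, 32-bit) kernel — later kernels replaced this knob with lowmem_reserve_ratio:

```shell
# Apply immediately (32-bit RHEL4-era kernels only):
sysctl -w vm.lower_zone_protection=250

# Make it persistent across reboots by appending to /etc/sysctl.conf:
#   vm.lower_zone_protection = 250
```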
 
Old 02-22-2008, 06:19 PM   #4
rayfordj
Member
 
Registered: Feb 2008
Location: Texas
Distribution: Fedora, RHEL, CentOS
Posts: 475

Rep: Reputation: 73
You could be on to something. Depending on memory share allocation, I suppose it is possible that if the vmware-tools balloon driver is reclaiming memory from this guest so it can be allocated to a guest with higher shares that needs more physical RAM, that could induce this (robbing Peter to pay Paul, or something like that). I've not personally seen this as a problem myself...

There are some things you could consider implementing (and I'm sure others may have more, or more robust, implementations and/or recommendations than these) to help track down what may be going on.

The first is to make sure that the sysstat package is installed in RHEL4:
Code:
rpm -q sysstat
cat /etc/cron.d/sysstat
This will collect system activity (a snapshot in time every 10 minutes by default) and generate a text report nightly (sometime around 4am by default) that may be found under /var/log/sa/.
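Once sadc has been collecting for a while, the memory history can be pulled back out with sar (the file name here is an assumption for illustration; the daily files are named saDD after the day of the month):

```shell
# Memory and swap usage over the day, from the collected file for the 22nd:
sar -r -f /var/log/sa/sa22

# Or sample live: one report per second, five samples:
sar -r 1 5
```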

Also, you could configure top to sort by memory usage and dump its output to a file every X minutes via cron.

Start top, press M (this should sort by memory), then W (this should pop a quick confirmation just beneath the memory summary saying it wrote ~/.toprc), then q (to quit).

Then add an entry under /etc/cron.d to capture the output every X minutes.

every 5 minutes for example:
Code:
*/5 * * * * root /usr/bin/top -d 1 -n 1 -b >> /root/top.out 2>/dev/null
Then, after the problem has happened again, you can refer back to this file for a timeline of process activity sorted by memory usage and see what the big hitters were.
 
Old 02-22-2008, 08:17 PM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 12,353

Rep: Reputation: 1042
Killing user processes is unlikely to cause a full-on system crash - killing init is a bit drastic though. I would have thought the code was smarter than that. Haven't looked at it for a while though, and I certainly wasn't looking for that ...
More likely resource exhaustion - maybe low memory as suggested.
Go for a 64-bit kernel if you can - for everything.

Sysstat would certainly help - with RH you probably already have it. If you need to do the top trick instead, put it in a script, add a /proc/meminfo dump, and anything else you can find on Google.
A bit of legwork and digging around required, I'd reckon.
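The meminfo part can be as simple as a timestamped grep in the same cron-driven script (a sketch; on a 32-bit kernel LowFree is the interesting number, since the oom-killer can fire on lowmem exhaustion even with plenty of HighMem and swap free):

```shell
#!/bin/sh
# Append a timestamped memory snapshot to a log next to the top dump.
# LowTotal/LowFree only appear on 32-bit (highmem) kernels; the grep
# still captures MemTotal/MemFree/SwapFree elsewhere.
{
    date '+%Y-%m-%d %H:%M:%S'
    grep -E '^(MemTotal|MemFree|LowTotal|LowFree|SwapFree):' /proc/meminfo
} >> /root/meminfo.out
```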
 
Old 02-26-2008, 04:52 AM   #6
BusyBeeBop
LQ Newbie
 
Registered: Feb 2008
Posts: 4

Original Poster
Rep: Reputation: 0
Thanks for the tips on sysstat and top+cron. I'll try to implement these.

Unfortunately I'm stuck with 32-bit for now. :/

Last edited by BusyBeeBop; 02-26-2008 at 05:13 AM.
 
Old 06-02-2008, 01:42 AM   #7
pierwelzien
LQ Newbie
 
Registered: Jun 2008
Posts: 1

Rep: Reputation: 0
Hello everyone. My company is experiencing the same kind of problems:

- We are running several RHEL 4 VMs under a VMware ESX server. Some of the RHEL 4 VMs often have "Out of Memory" problems.

This is really strange because we collect a lot of logs from all the RHEL 4 VMs, and when we analyze them, the processes are nowhere near consuming all the available memory.

Also, the stats presented by the ESX server don't show that much memory consumption.

Does anyone have any idea what the problem could be? Thanks in advance.
 
  

