LinuxQuestions.org
Old 12-20-2012, 05:09 AM   #1
SCBrisbane
LQ Newbie
 
Registered: Dec 2012
Posts: 2

Understanding the out-of-memory state and the oom-killer


Hi,

I have recently been seeing servers run out of memory (physical and swap). The server then becomes completely unresponsive. What I would expect in this circumstance is that the out-of-memory killer kicks in and kills a process, but this does not seem to happen every time. Even when it does, it can take a few minutes from the server completely running out of memory and going unresponsive to the oom-killer terminating any processes. By that point various timeout errors have started and the server effectively needs to be rebooted. Is there any threshold that can be tuned to allow the out-of-memory killer to kick in sooner, or alternatively is there a way to debug this to find out why the out-of-memory killer is not able to work effectively?

Consider my servers as a farm of machines for scientists to test code on. Memory leaks are not uncommon, but I need the server to terminate the offending process and continue to work. I would prefer to use the out-of-memory infrastructure, if it is suitable for this purpose, rather than write a script. I have tried invoking the oom-killer manually with "echo f > /proc/sysrq-trigger" and it (a) works and (b) makes a sensible choice of which process to kill.
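
One way to check ahead of time which process the oom-killer is likely to choose is to read the per-process badness scores the kernel exposes. The following is only a rough sketch, assuming /proc/<pid>/oom_score is readable on these kernels; the function name rank_oom_candidates and its parameters are made up for illustration.

```python
import os

def rank_oom_candidates(proc_root="/proc", top=5):
    """Return up to `top` (score, pid, name) tuples, highest oom_score first.

    proc_root is a parameter only so the function can be pointed at a
    test directory; on a real system it would be /proc.
    """
    candidates = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():          # only numeric directories are processes
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "stat")) as f:
                stat = f.read()
            # the command name is the parenthesised second field of stat
            name = stat[stat.index("(") + 1:stat.rindex(")")]
        except (OSError, ValueError):
            continue                     # process exited or entry is unreadable
        candidates.append((score, int(entry), name))
    # highest score first; ties broken by lowest pid
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return candidates[:top]
```

Calling rank_oom_candidates() on a loaded machine and printing the result gives a short list of the processes with the highest scores, which can be compared against what "echo f > /proc/sysrq-trigger" actually kills.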

The machines have 16G RAM and 16G swap, and run kernels 2.6.18-308.20.1.el5 x86_64 and 2.6.18-308.24.1.el5 x86_64.

/proc/sys/vm:

block_dump 0
dirty_background_bytes 0
dirty_background_ratio 10
dirty_bytes 0
dirty_expire_centisecs 3000
dirty_ratio 40
dirty_writeback_centisecs 500
drop_caches 0
flush_mmap_pages 1
hugetlb_shm_group 0
laptop_mode 0
legacy_va_layout 0
lowmem_reserve_ratio 256 256 32
max_map_count 65536
max_reclaims_in_progress 0
max_writeback_pages 1024
min_free_kbytes 32527
min_slab_ratio 5
min_unmapped_ratio 1
mmap_min_addr 4096
nr_hugepages 0
nr_pdflush_threads 2
overcommit_memory 0
overcommit_ratio 50
pagecache 100
page-cluster 3
panic_on_oom 0
percpu_pagelist_fraction 0
swappiness 60
swap_token_timeout 300 0
topdown_allocate_fast 0
vfs_cache_pressure 100
vm_devzero_optimized 1
zone_reclaim_interval 30
zone_reclaim_mode 1
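
For anyone who wants to compare their own settings against the list above, a listing in the same "name value" form can be produced with a few lines of Python. This is a minimal sketch, not the method used to produce the list above; read_vm_tunables and format_tunables are hypothetical helper names, and the sort order may differ slightly from the listing.

```python
import os

def read_vm_tunables(vm_dir="/proc/sys/vm"):
    """Read every readable file in vm_dir into a {name: value} dict."""
    tunables = {}
    for name in sorted(os.listdir(vm_dir)):
        path = os.path.join(vm_dir, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                # collapse tabs/newlines so multi-value entries stay on one line
                tunables[name] = " ".join(f.read().split())
        except OSError:
            continue                     # some entries cannot be read
    return tunables

def format_tunables(tunables):
    """Render the dict as one 'name value' pair per line."""
    return "\n".join("%s %s" % (k, v) for k, v in sorted(tunables.items()))
```

Running print(format_tunables(read_vm_tunables())) on two machines and diffing the output is a quick way to spot settings that differ.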

Last edited by SCBrisbane; 12-20-2012 at 03:11 PM.
 
Old 12-21-2012, 04:35 AM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 12,505

I don't have any systems anywhere near that old, but I'd be looking at that min_free_kbytes. For comparison, this 8 Gig F16 laptop has 67584.
Have a read of this (it's on wayback, so could take a while to load the page).

If your systems get really low, there may not be any room to allocate the memory necessary for the oom-killer itself without it fighting with kswapd. That might explain your observations.
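
To get a feel for how close a box is to that point, free memory from /proc/meminfo can be compared with min_free_kbytes. This is only a rough sketch: the kernel applies its watermarks per zone, so a system-wide comparison is an approximation at best. The names parse_meminfo and headroom_kb are made up for illustration.

```python
def parse_meminfo(text):
    """Parse the text of /proc/meminfo into a {field: integer value} dict."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])   # values are in kB where a unit is given
    return info

def headroom_kb(meminfo, min_free_kbytes):
    """kB of free RAM above the min_free_kbytes reserve (negative when below it)."""
    return meminfo["MemFree"] - min_free_kbytes
```

Fed with the contents of /proc/meminfo and the value of /proc/sys/vm/min_free_kbytes, a headroom near or below zero together with SwapFree near zero would suggest the system is in the state described above.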

Dicking with the vm tunables is very much a black art - pick a system that is expendable to play on.
 
Old 12-21-2012, 06:53 AM   #3
SCBrisbane
LQ Newbie
 
Registered: Dec 2012
Posts: 2

Original Poster
> I don't have any systems anywhere near that old, but I'd be looking at that min_free_kbytes.

Thank you for your suggestion. I changed min_free_kbytes to values up to 1G. Unfortunately, all that appears to achieve is that the system holds back 1G of RAM which, as far as I can determine, seems to be unusable by any process. I have been able to reproduce a full system hang, using a process with an artificial memory leak, even with values up to 1G.
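
For reference, an artificial leak of this kind could look roughly like the sketch below. It is not the actual test program used here; the function name leak and its parameters are hypothetical, and it includes a hard cap (limit_mb) so that it stops on its own unless the cap is deliberately set above RAM plus swap.

```python
import time

def leak(chunk_mb=64, limit_mb=1024, delay_s=0.1):
    """Allocate and touch memory in chunks, never freeing it.

    Stops once at least limit_mb has been allocated and returns the
    number of MB held at that point.
    """
    page = 4096
    held = []                            # keeping references prevents freeing
    while len(held) * chunk_mb < limit_mb:
        chunk = bytearray(chunk_mb * 1024 * 1024)
        # write one byte per page so the kernel has to back it with real memory
        for offset in range(0, len(chunk), page):
            chunk[offset] = 1
        held.append(chunk)
        time.sleep(delay_s)
    return len(held) * chunk_mb
```

Only run something like this with a large limit_mb on a machine that is expendable, since the whole point is to push the system into the out-of-memory state.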

Last edited by SCBrisbane; 12-21-2012 at 07:54 AM. Reason: New result
 
  


All times are GMT -5. The time now is 02:40 AM.
