Old 01-10-2011, 11:01 PM   #1
kamermans
LQ Newbie
 
Registered: Jan 2011
Location: Fort Worth, TX
Distribution: Ubuntu
Posts: 3

Rep: Reputation: 0
Kernel oom autopsy - trying to understand the oom-killer log entries


I recently had a nasty out-of-memory kernel freak-out which ended with the oom-killer killing sshd, named, apache, mysql and a handful of other services, at which point my server wasn't serving anything and I couldn't get to it to diagnose the problem. After it was rebooted, it was clear that it had been an OOM event:

Quote:
Jan 6 07:32:35 [hostname] kernel: [19650742.227691] mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Jan 6 07:32:35 [hostname] kernel: [19650742.227696] mysqld cpuset=/ mems_allowed=0
Jan 6 07:32:35 [hostname] kernel: [19650742.227699] Pid: 13433, comm: mysqld Not tainted 2.6.32-22-server #33-Ubuntu
Jan 6 07:32:35 [hostname] kernel: [19650742.227702] Call Trace:
Jan 6 07:32:35 [hostname] kernel: [19650742.227710] [<ffffffff810b32dd>] ? cpuset_print_task_mems_allowed+0x9d/0xb0
Jan 6 07:32:35 [hostname] kernel: [19650742.227715] [<ffffffff810f5f54>] oom_kill_process+0xd4/0x2f0
Jan 6 07:32:35 [hostname] kernel: [19650742.227718] [<ffffffff810f6510>] ? select_bad_process+0xd0/0x110
Jan 6 07:32:35 [hostname] kernel: [19650742.227722] [<ffffffff810f65a8>] __out_of_memory+0x58/0xc0
Jan 6 07:32:35 [hostname] kernel: [19650742.227727] [<ffffffff8155734d>] ? _read_lock+0xd/0x20
Jan 6 07:32:35 [hostname] kernel: [19650742.227730] [<ffffffff810f673e>] out_of_memory+0x12e/0x1a0
Jan 6 07:32:35 [hostname] kernel: [19650742.227733] [<ffffffff8155746e>] ? _spin_lock+0xe/0x20
Jan 6 07:32:35 [hostname] kernel: [19650742.227737] [<ffffffff810f9781>] __alloc_pages_slowpath+0x511/0x580
Jan 6 07:32:35 [hostname] kernel: [19650742.227741] [<ffffffff810f9961>] __alloc_pages_nodemask+0x171/0x180
Jan 6 07:32:35 [hostname] kernel: [19650742.227745] [<ffffffff8112c597>] alloc_pages_current+0x87/0xd0
Jan 6 07:32:35 [hostname] kernel: [19650742.227749] [<ffffffff810f3727>] __page_cache_alloc+0x67/0x70
Jan 6 07:32:35 [hostname] kernel: [19650742.227752] [<ffffffff810fd299>] __do_page_cache_readahead+0xc9/0x210
Jan 6 07:32:35 [hostname] kernel: [19650742.227757] [<ffffffff810fd401>] ra_submit+0x21/0x30
Jan 6 07:32:35 [hostname] kernel: [19650742.227761] [<ffffffff810f4fee>] filemap_fault+0x3fe/0x450
Jan 6 07:32:35 [hostname] kernel: [19650742.227765] [<ffffffff811116a4>] __do_fault+0x54/0x500
Jan 6 07:32:35 [hostname] kernel: [19650742.227768] [<ffffffff81114bd8>] handle_mm_fault+0x1a8/0x3c0
Jan 6 07:32:35 [hostname] kernel: [19650742.227773] [<ffffffff810397a9>] ? default_spin_lock_flags+0x9/0x10
Jan 6 07:32:35 [hostname] kernel: [19650742.227777] [<ffffffff81559fd5>] do_page_fault+0x125/0x3b0
Jan 6 07:32:35 [hostname] kernel: [19650742.227780] [<ffffffff81557935>] page_fault+0x25/0x30
Jan 6 07:32:35 [hostname] kernel: [19650742.227782] Mem-Info:
Jan 6 07:32:35 [hostname] kernel: [19650742.227784] Node 0 DMA per-cpu:
Jan 6 07:32:35 [hostname] kernel: [19650742.227787] CPU 0: hi: 0, btch: 1 usd: 0
Jan 6 07:32:35 [hostname] kernel: [19650742.227789] CPU 1: hi: 0, btch: 1 usd: 0
Jan 6 07:32:35 [hostname] kernel: [19650742.227791] Node 0 DMA32 per-cpu:
Jan 6 07:32:35 [hostname] kernel: [19650742.227794] CPU 0: hi: 186, btch: 31 usd: 178
Jan 6 07:32:35 [hostname] kernel: [19650742.227796] CPU 1: hi: 186, btch: 31 usd: 60
Jan 6 07:32:35 [hostname] kernel: [19650742.227801] active_anon:285005 inactive_anon:95530 isolated_anon:6
Jan 6 07:32:35 [hostname] kernel: [19650742.227802] active_file:133 inactive_file:176 isolated_file:21
Jan 6 07:32:35 [hostname] kernel: [19650742.227803] unevictable:0 dirty:0 writeback:0 unstable:0
Jan 6 07:32:35 [hostname] kernel: [19650742.227804] free:3338 slab_reclaimable:3139 slab_unreclaimable:7278
Jan 6 07:32:35 [hostname] kernel: [19650742.227805] mapped:4754 shmem:30585 pagetables:21454 bounce:0
Jan 6 07:32:35 [hostname] kernel: [19650742.227808] Node 0 DMA free:7784kB min:40kB low:48kB high:60kB active_anon:2632kB inactive_anon:2960kB active_file:4kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):84kB pre
Jan 6 07:32:35 [hostname] kernel: [19650742.227820] lowmem_reserve[]: 0 1938 1938 1938
Jan 6 07:32:35 [hostname] kernel: [19650742.227824] Node 0 DMA32 free:5568kB min:5612kB low:7012kB high:8416kB active_anon:1137388kB inactive_anon:379160kB active_file:528kB inactive_file:720kB unevictable:0kB isolated(anon):24kB isola
Jan 6 07:32:35 [hostname] kernel: [19650742.227835] lowmem_reserve[]: 0 0 0 0
Jan 6 07:32:35 [hostname] kernel: [19650742.227839] Node 0 DMA: 12*4kB 3*8kB 50*16kB 25*32kB 4*64kB 4*128kB 5*256kB 2*512kB 1*1024kB 1*2048kB 0*4096kB = 7816kB
Jan 6 07:32:35 [hostname] kernel: [19650742.227849] Node 0 DMA32: 700*4kB 124*8kB 77*16kB 3*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5568kB
Jan 6 07:32:35 [hostname] kernel: [19650742.227859] 61078 total pagecache pages
Jan 6 07:32:35 [hostname] kernel: [19650742.227861] 30126 pages in swap cache
Jan 6 07:32:35 [hostname] kernel: [19650742.227863] Swap cache stats: add 6068560, delete 6038434, find 37512173/37888994
Jan 6 07:32:35 [hostname] kernel: [19650742.227865] Free swap = 0kB
Jan 6 07:32:35 [hostname] kernel: [19650742.227866] Total swap = 4184924kB
Jan 6 07:32:35 [hostname] kernel: [19650742.233600] 507232 pages RAM
Jan 6 07:32:35 [hostname] kernel: [19650742.233602] 9901 pages reserved
Jan 6 07:32:35 [hostname] kernel: [19650742.233605] 151946 pages shared
Jan 6 07:32:35 [hostname] kernel: [19650742.233607] 486346 pages non-shared
Jan 6 07:32:35 [hostname] kernel: [19650742.233611] Out of memory: kill process 13218 (apache2) score 139071 or a child
Jan 6 07:32:35 [hostname] kernel: [19650742.233654] Killed process 13218 (apache2)
There's a bit more of a back-story here, but basically the server was slowly running out of memory and eventually it ran out of swap as well. I've got Cacti on the server monitoring memory usage, and I can see the free memory decay over time, down to nothing a few hours before it crashed.

I'm trying to prevent this from recurring, but since I don't have a record of which process(es) had all the RAM, I'm looking for some circumstantial evidence. The above snippet is the last complete oom-killer log before the server died completely. First, I'm assuming mysqld requested memory that couldn't be satisfied, which invoked the oom-killer, and it eventually chose apache2 to kill. This server typically has fairly low resident memory usage; for example, here's what it looks like right now:

Quote:
# cat /proc/meminfo
MemTotal: 1989324 kB
MemFree: 1082540 kB
Buffers: 4248 kB
Cached: 234660 kB
SwapCached: 244 kB
Active: 344372 kB
Inactive: 170440 kB
Active(anon): 296304 kB
Inactive(anon): 144528 kB
Active(file): 48068 kB
Inactive(file): 25912 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4184924 kB
SwapFree: 4174096 kB
Dirty: 176 kB
Writeback: 0 kB
AnonPages: 275684 kB
Mapped: 97592 kB
Shmem: 164920 kB
Slab: 63588 kB
SReclaimable: 47896 kB
SUnreclaim: 15692 kB
KernelStack: 1872 kB
PageTables: 12516 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 5179584 kB
Committed_AS: 936064 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 548744 kB
VmallocChunk: 34359184892 kB
HardwareCorrupted: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 9664 kB
DirectMap2M: 2019328 kB
The way I understand Linux memory management, "free" memory is gradually used for caching I/O (seen in meminfo under Buffers and Cached), but since this memory is backed by actual files, it can be reclaimed when memory starts getting low.
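
As a sanity check of that understanding, here is a minimal sketch (my own, not from this thread) that estimates how much memory could actually be handed back to applications, using /proc/meminfo. Newer kernels expose this directly as MemAvailable, but 2.6.32 does not, so treat it as a ballpark figure only:

Code:
#!/usr/bin/env python3
# Rough "free + reclaimable" estimate from /proc/meminfo.
# Approximation: MemFree + Buffers + Cached + SReclaimable, minus Shmem
# (tmpfs pages show up under Cached but cannot simply be dropped).
def meminfo_kb():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])   # values are reported in kB
    return info

m = meminfo_kb()
available_kb = (m["MemFree"] + m["Buffers"] + m["Cached"]
                + m.get("SReclaimable", 0) - m.get("Shmem", 0))
print("approx. free + reclaimable: %d MB of %d MB"
      % (available_kb // 1024, m["MemTotal"] // 1024))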

Based on my kernel log snippet, is it clear that all the physical memory was used by resident processes? I understand that the "free" memory shown by meminfo and top does not take reclaimable memory into account. I'm trying to get a clear picture of how much memory was used for what purpose, but there are many different numbers in the log (i.e. "Mem-Info:" vs. "Node 0 DMA" vs. "Node 0 DMA32").
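
For reference, the "Node 0 DMA" and "Node 0 DMA32" lines split the same memory by zone, while the "Mem-Info:" block gives system-wide counts in pages. Here is a quick sketch (values copied from the log above, assuming 4 kB pages on x86_64) to turn those page counts into megabytes:

Code:
#!/usr/bin/env python3
# Convert the page counts from the "Mem-Info:" block of the oom-killer
# report above into MB, assuming 4 kB pages (x86_64).
PAGE_KB = 4

pages = {
    "active_anon":        285005,
    "inactive_anon":       95530,
    "active_file":           133,
    "inactive_file":         176,
    "shmem":               30585,
    "pagetables":          21454,
    "slab_reclaimable":     3139,
    "slab_unreclaimable":   7278,
    "free":                 3338,
}

for name, count in sorted(pages.items(), key=lambda kv: -kv[1]):
    print("%-20s %8.1f MB" % (name, count * PAGE_KB / 1024.0))

total_ram_mb = 507232 * PAGE_KB / 1024.0          # "507232 pages RAM"
anon_mb = (pages["active_anon"] + pages["inactive_anon"]) * PAGE_KB / 1024.0
print("anonymous LRU: %.0f MB of %.0f MB RAM (and swap was already full)"
      % (anon_mb, total_ram_mb))

If I read those numbers right, roughly 1.5 GB of the ~1.9 GB of usable RAM (507232 minus 9901 reserved pages) was on the anonymous LRU lists, the page cache was squeezed down to about 1 MB, and swap was completely exhausted, so it does look like process memory rather than cache.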

I have a feeling that if I had been at the console at that point and had run top to see why the system was so slow, I would have seen 0 kB of free memory and swap, but the processes' RES columns would not have added up to the 6 GB of RAM + swap.

If it helps, I believe this issue was caused/perpetuated by a user that was hitting a memory-intensive HTTP API of mine several million times per day (this is what happens when you try to be a nice guy I guess :P )

Thanks in advance for your insight!

[EDIT]
To expand, I doubt Apache could have used all the memory since it's restricted to 10 spare servers and 25 workers max, and they typically hit about 75 MB max, so this would only be 25 x 75 MB ≈ 1.8 GB as a worst-case scenario, and the oom-killer should have been able to kill off the whole herd of workers and reclaim all kinds of memory.
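
Since the real gap here is that nothing recorded who was holding the memory, a small watchdog run from cron would at least leave a trail for next time. This is a hypothetical sketch (the script name, output format and "top 10" cut-off are my own choices, not anything suggested in this thread):

Code:
#!/usr/bin/env python3
# memsnap.py (hypothetical): log the ten process names with the largest RSS.
# Run from cron, e.g.:  */5 * * * * /usr/local/bin/memsnap.py >> /var/log/memsnap.log
import glob, time

def rss_by_process():
    """Sum VmRSS (in kB) per process name from /proc/<pid>/status."""
    totals = {}
    for status in glob.glob("/proc/[0-9]*/status"):
        name, rss_kb = None, 0
        try:
            with open(status) as f:
                for line in f:
                    if line.startswith("Name:"):
                        name = line.split()[1]
                    elif line.startswith("VmRSS:"):
                        rss_kb = int(line.split()[1])
        except (IOError, OSError):
            continue            # process exited while we were reading it
        if name:
            totals[name] = totals.get(name, 0) + rss_kb
    return totals

if __name__ == "__main__":
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    top = sorted(rss_by_process().items(), key=lambda kv: kv[1], reverse=True)
    for name, kb in top[:10]:
        print("%s %-15s %8d kB" % (stamp, name, kb))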

Last edited by kamermans; 01-10-2011 at 11:10 PM. Reason: Expanding explanation
 
Old 01-11-2011, 12:52 PM   #3
kamermans
LQ Newbie
 
Registered: Jan 2011
Location: Fort Worth, TX
Distribution: Ubuntu
Posts: 3

Original Poster
Rep: Reputation: 0
I'm using prefork right now to spawn worker processes before they're needed; I'm just trying to make sure my system is properly reclaiming memory. What is strange to me is that this whole decline took many days, so the kernel should have had time to react and prevent this scenario.
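
As a side note on keeping access next time: on 2.6.32-era kernels a process can be exempted from the oom-killer by writing -17 (OOM_DISABLE) to /proc/<pid>/oom_adj, so at least sshd survives and the box stays reachable. A minimal sketch, run as root; protecting sshd this way is a common workaround, not something proposed elsewhere in this thread:

Code:
#!/usr/bin/env python3
# Mark every running sshd as exempt from the oom-killer (2.6.32-era kernels).
# Newer kernels deprecate oom_adj in favour of oom_score_adj (-1000).
import glob

def pids_of(name):
    """Find pids whose /proc/<pid>/status reports the given Name:."""
    pids = []
    for status in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(status) as f:
                if f.readline().split()[1] == name:   # first line is "Name:\t<comm>"
                    pids.append(status.split("/")[2])
        except (IOError, OSError, IndexError):
            pass
    return pids

for pid in pids_of("sshd"):
    try:
        with open("/proc/%s/oom_adj" % pid, "w") as f:
            f.write("-17\n")
        print("pid %s (sshd) is now exempt from the oom-killer" % pid)
    except IOError as err:
        print("could not adjust pid %s: %s" % (pid, err))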

I've attached a graph of the free memory and swap during this period. The blank section is when the server was down. It's a little confusing because all the measurements are in kilobytes, so the top of the graph (5.0 M) is 5,000,000 kB, or 5 GB. The stats are labeled the same way: "Current: 492.65 k" really means "Current: 492.65 MB".
Attached Thumbnails: memgraph.png (free memory and swap graph, 25.9 KB)