LinuxQuestions.org - Understanding /var/log/messages memory output

Understanding /var/log/messages memory output

One of my servers is suffering from memory starvation, and the OOM killer is killing off (mostly Java) processes to keep the system alive.

In order to debug the memory starvation issue I need a more thorough insight into how the memory management is done in Linux. Therefore it would be great if someone could provide a walkthrough of the essentials in the /var/log/messages extract provided below.

For example:

How do the different memory zones work?
What does it mean that memory is "all unreclaimable"?

I'm not completely new to this stuff, but would like an active discussion in order to get a more complete understanding.

Anyways, here's the log extract:

Mar 6 11:53:22 mercury kernel: oom-killer: gfp_mask=0xd2
Mar 6 11:53:22 mercury kernel: Mem-info:
Mar 6 11:53:22 mercury kernel: DMA per-cpu:
Mar 6 11:53:22 mercury kernel: cpu 0 hot: low 2, high 6, batch 1
Mar 6 11:53:22 mercury kernel: cpu 0 cold: low 0, high 2, batch 1
Mar 6 11:53:22 mercury kernel: cpu 1 hot: low 2, high 6, batch 1
Mar 6 11:53:22 mercury kernel: cpu 1 cold: low 0, high 2, batch 1
Mar 6 11:53:22 mercury kernel: Normal per-cpu:
Mar 6 11:53:22 mercury kernel: cpu 0 hot: low 32, high 96, batch 16
Mar 6 11:53:22 mercury kernel: cpu 0 cold: low 0, high 32, batch 16
Mar 6 11:53:22 mercury kernel: cpu 1 hot: low 32, high 96, batch 16
Mar 6 11:53:22 mercury kernel: cpu 1 cold: low 0, high 32, batch 16
Mar 6 11:53:22 mercury kernel: HighMem per-cpu:
Mar 6 11:53:22 mercury kernel: cpu 0 hot: low 32, high 96, batch 16
Mar 6 11:53:22 mercury kernel: cpu 0 cold: low 0, high 32, batch 16
Mar 6 11:53:22 mercury kernel: cpu 1 hot: low 32, high 96, batch 16
Mar 6 11:53:22 mercury kernel: cpu 1 cold: low 0, high 32, batch 16
Mar 6 11:53:22 mercury kernel:
Mar 6 11:53:22 mercury kernel: Free pages: 269900kB (512kB HighMem)
Mar 6 11:53:23 mercury kernel: Active:299042 inactive:231443 dirty:0 writeback:0 unstable:0 free:67475 slab:4698 mapped:530206 pagetables:2270
Mar 6 11:53:23 mercury kernel: DMA free:12524kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:4444 all_unreclaimable? yes
Mar 6 11:53:23 mercury kernel: protections[]: 0 116000 180000
Mar 6 11:53:23 mercury kernel: Normal free:256864kB min:928kB low:1856kB high:2784kB active:17224kB inactive:16292kB present:901120kB pages_scanned:2406355 all_unreclaimable? yes
Mar 6 11:53:23 mercury kernel: protections[]: 0 0 64000
Mar 6 11:53:23 mercury kernel: HighMem free:512kB min:512kB low:1024kB high:1536kB active:1178816kB inactive:909480kB present:4325376kB pages_scanned:7454098 all_unreclaimable? yes
Mar 6 11:53:23 mercury kernel: protections[]: 0 0 0
Mar 6 11:53:23 mercury kernel: DMA: 3*4kB 4*8kB 2*16kB 3*32kB 3*64kB 3*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12524kB
Mar 6 11:53:23 mercury kernel: Normal: 146*4kB 25*8kB 17*16kB 6*32kB 2*64kB 2*128kB 5*256kB 10*512kB 1*1024kB 1*2048kB 60*4096kB = 256864kB
Mar 6 11:53:23 mercury kernel: HighMem: 20*4kB 6*8kB 0*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 512kB
Mar 6 11:53:23 mercury kernel: Swap cache: add 12442227, delete 12441204, find 4256249/5650043, race 5+249
Mar 6 11:53:23 mercury kernel: 0 bounce buffer pages
Mar 6 11:53:23 mercury kernel: Free swap: 0kB
Mar 6 11:53:23 mercury kernel: 1310720 pages of RAM
Mar 6 11:53:23 mercury kernel: 1015792 pages of HIGHMEM
Mar 6 11:53:23 mercury kernel: 77337 reserved pages
Mar 6 11:53:23 mercury kernel: 6895 pages shared
Mar 6 11:53:23 mercury kernel: 1023 pages swap cached
Mar 6 11:53:23 mercury kernel: Out of Memory: Killed process 18552 (java).

Regards,
kenneho

Quote:

Originally Posted by kenneho (Post 3080009)

In order to debug the memory starvation issue I need a more thorough insight into how the memory management is done in Linux.

Since you used the word "thorough" you'll want to read a few docs before asking questions since these explain a lot:
- LinuxMMDocumentation (starting point),
- Understanding Virtual Memory (gentle intro),
- /usr/src/linux/Documentation/sysctl/vm.txt,
- Understanding the Linux Virtual Memory Manager,
- Understanding the Linux Kernel, chapter 8 "Memory management" (find yourself an online copy).
Don't mistake this for an RTFM answer: Linux VMM *is* interesting but not that easy to explain in a few sentences. At least I can't.

Quote:

Originally Posted by kenneho (Post 3080009)

Mar 6 11:53:23 mercury kernel: Free swap: 0kB

So you ran out of swap. Gotta love Java apps. While "Tuning Linux VM on Kernel 2.6" is about Oracle, it does ask the question "How to diagnose VM problems?" The TS approach is generic, so it could help you too.
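As a side note on reading the zone lines from the first post, here is a minimal sketch. It assumes the three zone summary lines are copied out of the log with the syslog prefix removed. It only pulls out the kB fields and flags zones whose free figure is at or below the min watermark. The function names are made up for illustration, and the result does not by itself explain why the OOM killer fired.

```python
import re

# Zone summary lines copied from the log extract in the first post
# (syslog prefix removed).
ZONE_LINES = """\
DMA free:12524kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:4444 all_unreclaimable? yes
Normal free:256864kB min:928kB low:1856kB high:2784kB active:17224kB inactive:16292kB present:901120kB pages_scanned:2406355 all_unreclaimable? yes
HighMem free:512kB min:512kB low:1024kB high:1536kB active:1178816kB inactive:909480kB present:4325376kB pages_scanned:7454098 all_unreclaimable? yes
"""

# Matches "name:<digits>kB" pairs; fields without a kB suffix are skipped.
FIELD = re.compile(r"(\w+):(\d+)kB")


def parse_zone(line):
    """Return (zone_name, {field: kB}) for one zone summary line."""
    name = line.split()[0]
    fields = {key: int(value) for key, value in FIELD.findall(line)}
    return name, fields


def zones_at_or_below_min(text):
    """List the zones whose free memory is at or below the 'min' watermark."""
    hits = []
    for line in text.splitlines():
        name, fields = parse_zone(line)
        if fields["free"] <= fields["min"]:
            hits.append(name)
    return hits


if __name__ == "__main__":
    for line in ZONE_LINES.splitlines():
        name, f = parse_zone(line)
        print(f"{name:8s} free={f['free']:>7d} kB  min={f['min']:>4d} kB  "
              f"present={f['present']:>8d} kB")
    print("at or below min:", zones_at_or_below_min(ZONE_LINES))
```

On the extract above it flags HighMem only (free 512kB against min 512kB), while DMA and Normal still report free memory well above their min values.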

Quote:

Originally Posted by unSpawn (Post 3081063)

Thanks. Let me study the documentation and get back to this thread with whatever questions I may have.

And thank you for the link regarding diagnosing VM problems.

I'm still reading up on Linux memory management, but would like to post a question that I can't get my head around. So far I've not been able to find the answer to this, and I would be very thankful for help on resolving this:

The first post in this thread shows the OOM killer killing off Java processes.

To document which processes are hogging the memory I made a simple script that outputs the top five memory-consuming processes once swap usage is close to full. This is an extract of the output:

%MEM PID SZ VSZ COMMAND
19.4 25152 1805468 1827488 /opt/ibm/WebSphere/ProcServer/java/bin/java (...)
7.0 24416 769824 787740 /opt/ibm/WebSphere/ProcServer/java/bin/java (...)
5.2 24845 716952 734868 /opt/ibm/WebSphere/ProcServer/java/bin/java (...)
6.2 18489 463560 481452 /opt/ibm/WebSphere/ProcServer/java/bin/java (...)
3.0 18666 407032 420840 /opt/ibm/WebSphere/ProcServer/java/bin/java (...)
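The script itself was not posted. A minimal sketch of that kind of check might look like the following, assuming a procps-style ps and a readable /proc/meminfo. The 90% threshold and the function names are hypothetical, not taken from the post.

```python
import subprocess

SWAP_USED_THRESHOLD = 0.90  # hypothetical cutoff for "close to full"


def swap_used_fraction(meminfo_text):
    """Compute used swap as a fraction from the contents of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key] = int(parts[0])  # SwapTotal/SwapFree are given in kB
    total = values.get("SwapTotal", 0)
    if total == 0:
        return 0.0  # no swap configured
    return (total - values.get("SwapFree", 0)) / total


def top_memory_processes(count=5):
    """Return the ps header line plus the `count` largest processes by %MEM."""
    out = subprocess.run(
        ["ps", "-eo", "pmem,pid,sz,vsz,args", "--sort=-pmem"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[: count + 1]


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        used = swap_used_fraction(fh.read())
    if used >= SWAP_USED_THRESHOLD:
        print(f"swap {used:.0%} used")
        print("\n".join(top_memory_processes()))
```

Run from cron or a loop, this would print a table only while swap usage is at or above the chosen threshold.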

These are the five most memory-consuming processes around the time the OOM killer starts killing processes. What puzzles me is that together they account for only around 40% of memory, and there are not enough other processes to fill up the rest. So why does the OOM killer think I'm out of memory?

I should add that the server is configured with 1200 MB of swap space.
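The figures from the two posts can be put side by side with a few lines of arithmetic. This is only a sketch: the 4 kB page size is inferred from the log (67475 free pages reported as 269900kB), and whether ps computes %MEM against exactly this RAM total is an assumption.

```python
PAGE_KB = 4  # inferred from the log: 67475 free pages == 269900 kB

# Figures copied from the log extract in the first post.
ram_pages = 1310720       # "1310720 pages of RAM"
active_pages = 299042     # "Active:299042"
inactive_pages = 231443   # "inactive:231443"
free_pages = 67475        # "free:67475"

# %MEM column from the ps extract.
top5_pct_mem = [19.4, 7.0, 5.2, 6.2, 3.0]


def pages_to_mb(pages):
    """Convert a page count to MB using the inferred page size."""
    return pages * PAGE_KB / 1024


ram_mb = pages_to_mb(ram_pages)
top5_pct = sum(top5_pct_mem)

if __name__ == "__main__":
    print(f"RAM reported by the kernel: {ram_mb:.0f} MB")
    print(f"active + inactive pages:    {pages_to_mb(active_pages + inactive_pages):.0f} MB")
    print(f"free pages:                 {pages_to_mb(free_pages):.0f} MB")
    print(f"top five processes:         {top5_pct:.1f} %MEM")
```

The five %MEM values sum to 40.8, and 1310720 pages at 4 kB come to 5120 MB, so the "around 40%" estimate matches the posted numbers.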

I don't have the answer to this. All I can offer for consideration are some things to look into.

Wrt the system: does the HW/SW meet or exceed the specs WebSphere suggests it needs (and what are your specs)? (And you should not view swap as something good: disk I/O is expensive.) Is this a production machine (that is, do you also have a staging box to test releases on)? When did the OOM situation start? What changed at that time? Does this happen with only this kernel or with other kernels as well? Is the system tuned for this task? Does the system run only this task, or are there other services running that could be expensive in other ways (CPU, disk I/O)? Have you gathered stats for plotting (Dstat, SAR)?

Wrt WebSphere: which product(s) are you running? Is it started with heap restrictions (-Xms -Xmx)? Did the OOM situation occur right from the start after installation or later on, suddenly (say after another application release)? Do the WAS logs show errors before OOM is invoked?

Wrt the application(s): have there been any major changes in code which also mark the start of OOM? Was there any profiling done? 'top' can help in some situations (like finding out whether you've got much swap going on when RSS is a fraction of VSZ), but Java apps do memory management differently and what you want is to read up on Java and profiling. If you only have to deploy code and manage the server, the developers should (be forced to :-] ) do the grunt work for that.

Finally, since WebSphere has a large community, did you check their bug tracker and community resources for clues?
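For the RSS-versus-VSZ check mentioned above, here is a rough sketch that reads the same two figures from /proc/<pid>/status instead of top. The helper names are hypothetical, and a low ratio is only a hint worth following up, not proof of swapping.

```python
import sys


def parse_status(status_text):
    """Extract VmSize and VmRSS (in kB) from /proc/<pid>/status contents."""
    sizes = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            sizes[key] = int(rest.split()[0])
    return sizes


def resident_fraction(status_text):
    """Return VmRSS / VmSize, or None when the process has no VmSize line."""
    sizes = parse_status(status_text)
    if not sizes.get("VmSize"):
        return None  # e.g. kernel threads expose no Vm* lines
    return sizes.get("VmRSS", 0) / sizes["VmSize"]


if __name__ == "__main__":
    # Pass a PID as the first argument; defaults to this script's own process.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    with open(f"/proc/{pid}/status") as fh:
        fraction = resident_fraction(fh.read())
    if fraction is None:
        print("no virtual memory figures for this process")
    else:
        print(f"resident fraction of virtual size: {fraction:.0%}")
```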

// (Posted here so I don't fsck up your other thread's -reply status) ... if http://www.linuxquestions.org/questi...memory-626965/ is related to all of this you should have posted your specs there saying it's a Java-based app. Postponing launch until you or the developers get a grip on problems seems only common sense to me.

Quote:

Originally Posted by unSpawn (Post 3083898)

The threads are not directly related. And I should add that the servers I'm referring to are merely servers used for development - they are not production servers.

Thank you for your useful thoughts in your previous post here on the thread. I've not yet had the time to carefully study the issues you address, but intend to do so asap.

Not production. Cool. Well, just post info as you go but if you can please try to work in the direction of HW -> OS -> SW -> application(s).