LinuxQuestions.org

youngdo 12-08-2009 11:48 AM

PAE Kernel (2.6.18) fails to swap with large amounts of physical ram
 
We're load testing some of our larger servers (16GB+ RAM), and when memory starts to run low they invoke the OOM killer instead of swapping.

I've checked swapon -s (which says we're using 0 bytes out of 16GB of swap), I've checked swappiness (60), I've tried upping the swap to 32GB, all to no avail.

If we pull some RAM, and configure the box with 8GB of physical ram and 16 (or more) GB of swap, sure enough it dips into it and is more stable than a 16GB box with 16 or 32GB of swap.

Am I missing something? Any help would be appreciated!
--
Doug

johnsfine 12-08-2009 12:02 PM

A PAE kernel with 16GB of RAM is at significant risk of exhausting the kernel's 1GB of virtual address space.
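To put a rough number on that risk, here is a back-of-the-envelope sketch in Python. It only counts the kernel's per-page bookkeeping array, and every constant in it is an assumption that is not stated in this thread: 4 KiB pages, about 32 bytes per page descriptor on a 32 bit kernel, and roughly 896 MiB of directly mapped low memory with the default 3G/1G split. Real figures vary by kernel build.

```python
# Rough estimate of the low memory consumed just by tracking physical pages.
# All three constants are assumptions for illustration, not measured values.
PAGE_SIZE = 4096                 # assumed 4 KiB pages
STRUCT_PAGE_BYTES = 32           # assumed size of one page descriptor (32 bit kernel)
LOWMEM_BYTES = 896 * 1024**2     # assumed low memory with a 3G/1G split


def mem_map_bytes(ram_bytes, page_size=PAGE_SIZE,
                  struct_page_bytes=STRUCT_PAGE_BYTES):
    """Bytes of low memory needed for one descriptor per physical page."""
    return (ram_bytes // page_size) * struct_page_bytes


for gib in (8, 16, 32):
    used = mem_map_bytes(gib * 1024**3)
    share = 100 * used / LOWMEM_BYTES
    print(f"{gib} GiB RAM: about {used // 1024**2} MiB of bookkeeping "
          f"({share:.0f}% of assumed low memory)")
```

Under these assumptions the bookkeeping cost doubles each time RAM doubles, while low memory stays fixed, which would be consistent with the 8GB configuration behaving better than the 16GB one.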

If processes only need 2GB instead of the usual 3GB of user mode virtual address space, there is a kernel build time option to increase kernel virtual memory to 2GB. Maybe you already have that option built in, which would make my guess about kernel virtual memory less likely, but still possible.

There was (I don't know if there still is) a kernel build time option to let the kernel mode and user mode each have their own nearly 4GB of virtual address space (was called "huge mem" in some prior version of Red Hat). Generally that is a bad idea because of extra overhead and driver compatibility issues, etc. But if you have no other choice for fixing a kernel virtual memory limit problem, then it may be the best fall back.

Do you have a good reason for choosing 32 bit PAE rather than 64 bit?

I forget all the various commands to get info about kernel virtual memory to see what limit is built in and how close you are to exhausting it. I really think that is worth checking. If it turns out not to be the cause of your OOM killer, the info from /proc/meminfo, /proc/slabinfo, and other places needed to rule that out might shed light on the real problem.
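One way to check this without remembering specific commands is to read /proc/meminfo directly. The sketch below is a minimal Python example, assuming a 32 bit kernel built with highmem support, which reports `LowTotal` and `LowFree` fields; on kernels without those fields it simply reports nothing. The function names are made up for this example.

```python
import os


def parse_meminfo(text):
    """Return {field: number} from /proc/meminfo-style text.

    Most values are in kB; a few fields (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def lowmem_free_percent(fields):
    """Percent of low memory still free, or None if LowTotal/LowFree are absent."""
    total = fields.get("LowTotal")
    free = fields.get("LowFree")
    if not total or free is None:
        return None
    return 100.0 * free / total


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        pct = lowmem_free_percent(parse_meminfo(f.read()))
    if pct is None:
        print("No LowTotal/LowFree fields (probably not a highmem kernel)")
    else:
        print(f"Low memory free: {pct:.1f}%")
```

If low memory is nearly exhausted while `HighFree` and swap are still plentiful, that would point toward kernel virtual memory rather than overall RAM as the bottleneck.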

youngdo 12-08-2009 12:37 PM

I suspect you may be right. I was not involved with the company when the decision against 64 bit was made, and have been handed the job of "make this work". Since it looks like RHEL5/CentOS5 no longer supports the hugemem kernel, we may be sadly out of luck. I'll keep investigating. I'll also get the meminfo and slabinfo and post them.

Thanks!

johnsfine 12-08-2009 01:47 PM

Quote:

Originally Posted by youngdo (Post 3784317)
I was not involved with the company when the decision against 64 bit was made, and have been handed the job of "make this work".

Is the hardware 64 bit (is the lm flag present in /proc/cpuinfo)?
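For anyone scripting this check across several servers, a minimal Python sketch follows. It looks for `lm` as a whole word on the `flags` line, so that flags which merely contain those letters do not match. The function name `has_long_mode` is just an illustrative choice.

```python
import os


def has_long_mode(cpuinfo_text):
    """True if the first 'flags' line in /proc/cpuinfo-style text lists 'lm'."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            # Compare whole words so that e.g. 'pclmulqdq' is not a false match.
            return "lm" in value.split()
    return False


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        capable = has_long_mode(f.read())
    print("CPU reports 64 bit capability (lm):", capable)
```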

Assuming it is, switching to 64 bit CentOS may be your minimum effort path to "make this work", even if there is still some political resistance to that idea.

Since you experimented with pulling RAM, I assume you can do some moderately drastic experiments with a server. Assuming you have sane partitioning for a server (moderately small / partition, with the applications and data elsewhere and free space to create an experimental / partition) you should be able to install and test 64 bit CentOS using all the existing 32 bit applications and their data.

64 bit Linux is generally good at running 32 bit applications. 64 bit CentOS is especially good at running 32 bit applications.

I can think of lots of good reasons for not changing the decision to make the applications 32 bit and for keeping the applications the same architecture across multiple servers of different ram sizes. But there probably are zero good technical reasons for keeping the kernels the same vs. using a 64 bit kernel on systems that seem to need it.

syg00 12-08-2009 04:11 PM

Linus has said he doesn't care to fix any problems with PAE and big memory - it's a kludge.
Let's see /proc/meminfo and /proc/slabinfo, both when things are OK (say, just after all the services have started after boot) and when the OOM killer kicks in. A screenshot of slabtop when you have the problem would be nice, but that might die. Get the other data first.
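If slabtop does die under memory pressure, a saved copy of /proc/slabinfo can still be ranked afterwards. The Python sketch below assumes the slabinfo 2.x column layout (name, active_objs, num_objs, objsize, ...) and estimates each cache's size as num_objs times objsize, which ignores per-slab overhead. The sample text is invented for illustration and is not data from the servers in this thread.

```python
def top_slabs(slabinfo_text, count=5):
    """Return [(cache_name, approx_bytes)] for the largest slab caches.

    Size is approximated as num_objs * objsize; slab padding is ignored.
    """
    caches = []
    for line in slabinfo_text.splitlines():
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue  # skip the version line, the column header, and blanks
        parts = line.split()
        try:
            num_objs, objsize = int(parts[2]), int(parts[3])
        except (IndexError, ValueError):
            continue  # skip lines that do not match the assumed layout
        caches.append((parts[0], num_objs * objsize))
    caches.sort(key=lambda item: item[1], reverse=True)
    return caches[:count]


# Invented sample in the assumed format, for demonstration only.
SAMPLE = """slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
buffer_head 1000 2000 52 72 1 : tunables 120 60 8 : slabdata 28 28 0
dentry_cache 5000 6000 136 29 1 : tunables 120 60 8 : slabdata 207 207 0
"""

for name, size in top_slabs(SAMPLE):
    print(f"{name:20s} {size / 1024:.0f} KiB")
```

Running it on one capture taken right after boot and one taken near the OOM event, then comparing the two rankings, would show which caches grew. On some systems reading /proc/slabinfo needs root.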

chrism01 12-08-2009 06:22 PM

There's a good summary in the RH KB http://kbase.redhat.com/faq/docs/DOC-6571 (no login reqd).
Basically, though, I'd go with Post #4:

Quote:

64 bit Linux is generally good at running 32 bit applications. 64 bit Centos is especially good at running 32 bit applications.

RH/Centos 64 bit is 'multi-lib', so it'll run 32 & 64 bit the same.

kschmitt 12-09-2009 09:03 AM

Quote:

Originally Posted by chrism01 (Post 3784692)
There's a good summary in the RH KB http://kbase.redhat.com/faq/docs/DOC-6571 (no login reqd).
Basically, though I'd go with Post #4:
"64 bit Linux is generally good at running 32 bit applications. 64 bit Centos is especially good at running 32 bit applications."
RH/Centos 64 bit is 'multi-lib', so it'll run 32 & 64 bit the same.

Sadly, sometimes you're required to run 32bit, or else risk being outside of support for Big Vendor systems, or outside of the comfort range for certain consultants. Ugh.

Issues like that keep me running 32bit systems for some application servers. Especially when it comes to Oracle (apps not DBs). Luckily most of those can be virtualized.

--Kyle

