Old 02-05-2011, 11:56 AM   #16
exceed1
Member
 
Registered: Mar 2008
Location: oslo
Distribution: debian,redhat
Posts: 199

Original Poster
Rep: Reputation: 31

I read some articles and some mailing-list entries concerning the overcommitment issue in Linux, and this feature of the kernel doesn't look very good for a busy production server; it seems very likely that it should be turned off... I understand that it might be useful sometimes, but what you gain by turning it off seems to outweigh the advantages of having it on.

I see some places where they say that the OOM killer starts when there is no more swap memory available, but I'm almost 100% sure that I have seen servers that have used all available swap and still have at least 0.5 GB of physical memory left... in this situation, what happens if the Linux memory management tries to swap anything out to disk?
 
Old 02-05-2011, 05:43 PM   #17
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
Quote:
Originally Posted by exceed1 View Post
I read some articles and some mailing-list entries concerning the overcommitment issue in Linux, and this feature of the kernel doesn't look very good for a busy production server; it seems very likely that it should be turned off...
I have seen a lot of misinformation about overcommit. I think you read some of that and reached the incorrect conclusion pointed to by that misinformation.

The default overcommit settings are good. Don't mess with them without a much better understanding than you have (but with a better understanding you'd still decide not to mess with them).
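For reference, the settings in question are /proc/sys/vm/overcommit_memory (0 = the heuristic default, 1 = always overcommit, 2 = strict accounting, i.e. "overcommit off") and /proc/sys/vm/overcommit_ratio. A trivial C sketch that just prints the current values, since we've been using C examples in this thread:

Code:
#include <stdio.h>

/* Print the current overcommit policy and ratio.
 * overcommit_memory: 0 = heuristic (default), 1 = always, 2 = strict.
 * overcommit_ratio: only consulted in mode 2 (default 50). */
int main(void)
{
    const char *files[] = {
        "/proc/sys/vm/overcommit_memory",
        "/proc/sys/vm/overcommit_ratio",
    };
    for (int i = 0; i < 2; i++) {
        FILE *f = fopen(files[i], "r");
        int value;
        if (f && fscanf(f, "%d", &value) == 1)
            printf("%s = %d\n", files[i], value);
        if (f)
            fclose(f);
    }
    return 0;
}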

Quote:
what you gain by turning it off seems to outweigh the advantages of having it on.
Turning overcommit off will make your system need even more swap space than it now needs, and I think it now needs more than it has.

Quote:
I see some places where they say that the OOM killer starts when there is no more swap memory available,
You took that statement out of context. It does not apply to your reported situation.

Quote:
I'm almost 100% sure that I have seen servers that have used all available swap and still have at least 0.5 GB of physical memory left... in this situation, what happens if the Linux memory management tries to swap anything out to disk?
Linux won't try to swap, because it understands swap is full. So it will reduce cache instead of swapping. Reducing cache will likely create more extra disk accesses than swapping would have, so the system runs slower. But at that point nothing horrible happens.

The problem occurs when several processes have over-requested unused RAM adding up to more than the cache size, swap is full, and a process asks to over-request still more memory. Then the request fails, and that service probably fails with it.

Processes will request more memory than they use regardless of how you set the kernel's overcommit rules. The kernel's rules only determine the point at which the kernel makes memory requests fail.
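You can see this for yourself with a minimal demo (an untested sketch; assumes a 64-bit box). On many machines this malloc succeeds even when nowhere near 4 GiB is free, because nothing is really allocated until the pages are touched:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Request 4 GiB of anonymous memory.  Under the default heuristic
     * overcommit this typically succeeds even when far less than that
     * is free: no page is allocated until it is touched. */
    size_t len = 4UL << 30;
    char *p = malloc(len);
    if (p == NULL) {
        perror("malloc");
        return 1;
    }
    printf("got %zu bytes at %p without touching them\n", len, (void *)p);

    /* Touching every page is what forces the kernel to find real
     * memory; uncomment on a test box to watch the commit charge climb. */
    /* memset(p, 1, len); */

    free(p);
    return 0;
}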

You have a system with a heavy memory load relative to physical RAM. A strong indicator of that is that the used size of swap is larger than the cache size in physical RAM.

With default overcommit settings, with a heavy memory load, the minimum safe amount of free swap space is the largest amount of memory over-requested by any single process. Without digging into the smaps details of your processes, I'll just guess that is hundreds of MB.

If you turned overcommit off, then the cache size plus the free swap size would need to be larger than the total over-requested memory of all processes (roughly what Committed_AS in /proc/meminfo reports), which I expect is several GB.

Last edited by johnsfine; 02-05-2011 at 05:57 PM.
 
Old 02-06-2011, 07:46 AM   #18
exceed1
Member
 
Registered: Mar 2008
Location: oslo
Distribution: debian,redhat
Posts: 199

Original Poster
Rep: Reputation: 31
Okay, so the Linux kernel doesn't do any swapping when swap is full; it just continues to use available physical memory until it runs out (for example by clearing the oldest data in the cache).
The rest of what you are saying seems to be based on preferences. The articles I read were in official mailing-list archives where people really seemed to know what they were talking about (they seemed like developers); I also read the linux-mm.org site and the official Linux kernel documentation tree, which explains what Linux overcommit is about and how it can be tuned. My understanding (taken from the official documentation, mailing archives and several other sites) is that Linux memory overcommit is simply a kernel feature that lets applications request more memory than the system can back, because the kernel assumes the processes won't use all of it anyway; if they do use it and no memory is left, they will eventually be killed off when they try to use the allocated space.

If you for example have an Apache server with a bunch of Perl scripts running under mod_perl (which really should be running under mod_fastcgi or mod_fcgid), each request loads an entire Perl runtime, and those processes probably use all the memory they set aside. So if you set up your Apache server to accept more requests than it has memory for, you would get out-of-memory errors (which I have also seen several times).

Not only did I read the documentation about it, but I also looked at several examples using C code to see what the kernel did in each situation with different programs when overcommit was in place. It has been possible to turn off Linux memory overcommit since the 2.5.x kernels, but some speculate about whether it really works as it should.

Please see the following links:
http://www.mail-archive.com/kplug-li.../msg25350.html
http://www.win.tue.nl/~aeb/linux/lk/lk-9.html#ss9.6

Based on that, if you turned off Linux overcommit, you would never have the OOM killer kill off random processes; you would only get, for example, Apache telling you that you needed to tune MaxServer etc. Another example, from desktop usage: if you try to start many resource-demanding programs, they will get the memory from the malloc call but won't crash until they actually try to use memory that isn't available. One example of a program that crashes because of overcommit is a sound utility in the GNU utilities package; its FAQ even lists this as a possible problem.

Please comment on what I said, taking into account for example the mod_perl/Apache and desktop examples in addition to the discussion on the mailing list, and explain why it still isn't a good idea to turn off overcommit.

Last edited by exceed1; 02-06-2011 at 08:00 AM.
 
Old 02-06-2011, 08:35 AM   #19
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
Quote:
Originally Posted by exceed1 View Post
Please comment on what I said, taking into account for example the mod_perl/Apache and desktop examples in addition to the discussion on the mailing list, and explain why it still isn't a good idea to turn off overcommit.
I'm not sure you are willing to understand what I'm saying here. The information you found online obviously looks more authoritative. But combining the misleading info there with some of your own misunderstanding, I think you have reached some very incorrect conclusions.

I will comment on a few phrases in that document you linked:
Quote:
Normally, a user-space program reserves (virtual) memory by calling malloc().
Misleading because it ignores the important two level allocation process. Ordinary code requests memory from malloc, but that memory comes from a pool inside the process. When that pool is exhausted, malloc requests memory in very large chunks from the kernel.
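A small illustration of that two-level behavior (a sketch assuming glibc, where the mmap threshold M_MMAP_THRESHOLD defaults to 128 KiB). Small requests come out of the allocator's existing pool with no system call; big ones go straight to the kernel:

Code:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Served from the heap pool the allocator already obtained from
     * the kernel; usually no system call happens here at all. */
    char *small = malloc(64);

    /* Larger than glibc's mmap threshold, so malloc asks the kernel
     * directly with mmap().  Run under
     *   strace -e trace=brk,mmap ./a.out
     * to watch the difference. */
    char *big = malloc(16UL << 20);   /* 16 MiB */

    printf("small=%p big=%p\n", (void *)small, (void *)big);
    free(big);
    free(small);
    return 0;
}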
Quote:
If the return value is NULL, the program knows that no more memory is available, and can do something appropriate.
True, but irrelevant because...
Quote:
Most programs will print an error message and exit
Most programs simply crash without even an understandable error message when memory allocations fail.
Quote:
Linux on the other hand is seriously broken. It will by default answer "yes" to most requests for memory, in the hope that programs ask for more than they actually need.
Linux is doing the right thing answering "yes" to most requests for memory, not in the "hope" that programs request more than they will ever use, but because of the reality that almost every program requests far more than it will ever use.

Quote:
if not then very bad things happen. What happens is that the OOM killer (OOM = out-of-memory) is invoked, and it will select some process and kill it.
That is true. There are two different ways a service might fail due to lack of system memory (a small demo follows the list):
1) The service requests additional memory (that it probably won't use) and the kernel refuses the request causing that service to crash.
2) The service earlier requested memory (which didn't really allocate any). Now the service finally uses that memory, which forces the kernel to really allocate some. The kernel would swap something, but swap is full. So the kernel would reduce cache, but cache is too small. So the OOM killer is triggered and kills some process, which might or might not be the process trying to use more memory.
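Here is a rough sketch that can produce either failure, depending on the overcommit mode. Don't run it on a production box; it deliberately eats memory until something gives:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t total = 0;

    for (;;) {
        char *p = malloc(1UL << 20);          /* 1 MiB at a time */
        if (p == NULL) {
            /* Failure type 1: the kernel refused the request.  With
             * vm.overcommit_memory=2 you reliably end up here. */
            fprintf(stderr, "malloc failed after %zu MiB\n", total >> 20);
            return 1;
        }
        /* Touching the pages is what can cause failure type 2: with
         * default overcommit the malloc above keeps succeeding, and
         * somewhere in this memset the OOM killer fires instead. */
        memset(p, 1, 1UL << 20);
        total += 1UL << 20;
    }
}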
Quote:
Of course, the very existence of an OOM killer is a bug
There is the theory that failure type (1) is better. In failure type 1 you can claim the OS did everything right. The "fault" is entirely in the service for not responding more gracefully when the memory request failed. But in failure (2) all the services are helpless. It is all in the hands of the system.
So you can disable overcommit to vastly increase the frequency of failure (1) in order to reduce failure (2) from rare down to never. If you believe failure (2) is unacceptable and failure (1) is normal, that's a good idea.

That is all interesting in theory, but in practice what you care about is whether your server fails or not. With default overcommit settings failure (1) is slightly more likely than failure (2) and in total both may be unlikely. Without overcommit (and with the same size swap) you vastly increase the number of failures for the dubious benefit of making them all failure (1).

If you don't want either of these types of failure, make your swap area large enough.

Earlier you posted
Code:
CommitLimit:   3670008 kB
Committed_AS:  4727640 kB
Those values don't have meanings as simple as the names indicate, but they do show what would happen if you ran this same workload with overcommit turned off. You would fail memory allocation requests long before reaching the level at which you posted that info. At minimum, your swap area would need to be 1.1GB larger to have even a chance of running this workload with overcommit off. Then it is only a chance. These numbers show that a 1GB increase in swap wouldn't be enough. They don't in any way say a 1.1GB increase would be enough.
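To make the arithmetic explicit (this assumes the usual mode-2 formula, with your overcommit_ratio presumably at the default of 50):

Code:
CommitLimit  = SwapTotal + RAM * overcommit_ratio / 100
Committed_AS - CommitLimit = 4727640 - 3670008 = 1057632 kB  (~1.0 GiB)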

Once you have enough swap, there is no reason to mess with overcommit settings.

At best, more conservative overcommit settings create failure type 1 to avoid failure type 2. Enough swap space avoids both failure types.

Quote:
Originally Posted by exceed1 View Post
If you for example have an Apache server with a bunch of Perl scripts running under mod_perl (which really should be running under mod_fastcgi or mod_fcgid), each request loads an entire Perl runtime, and those processes probably use all the memory they set aside. So if you set up your Apache server to accept more requests than it has memory for, you would get out-of-memory errors (which I have also seen several times).
I think you're still missing the very important difference between anonymous memory and non-anonymous memory.

I'm not 100% sure what you mean by "loading an entire Perl runtime", but I'm pretty sure that is mainly non-anonymous memory, probably even shared.

Non-anonymous (file-backed) memory doesn't use any swap space and has only an indirect impact on all the issues surrounding overcommit. You can be using ten times as much non-anonymous memory as you have RAM, and still use no swap space, and still have no failures due to lack of memory.

If you have seen "out of memory" errors, I'm nearly sure they occurred because swap space was full when some process asked for additional anonymous memory. That is what I was talking about earlier in this thread when I said it wasn't safe to run a heavily memory loaded system with swap space full, even with half a GB of cache. For most purposes, that half a GB of cache acts like free memory. But not for all purposes, and even with default overcommit you can easily fail moderate size memory requests despite half a GB of cache that the OS knows is almost equivalent to free.
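The distinction in code form, if it helps (a minimal sketch; /bin/ls is just a convenient file to map, and error handling is pared down):

Code:
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Anonymous memory: no backing file.  Dirty pages can only be
     * written to swap, and this is what the commit accounting that
     * overcommit controls is really about. */
    char *anon = mmap(NULL, 4UL << 20, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* File-backed (non-anonymous) memory: under pressure the kernel
     * simply drops clean pages and rereads them from the file later,
     * so no swap is consumed.  Program text and shared libraries
     * work this way. */
    int fd = open("/bin/ls", O_RDONLY);
    char *file = (fd >= 0)
        ? mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0)
        : MAP_FAILED;

    printf("anon=%p file=%p\n", (void *)anon, (void *)file);
    return 0;
}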

Last edited by johnsfine; 02-06-2011 at 09:10 AM.
 
  

