Quote:
Originally Posted by exceed1
Please comment on what I said and take into account, for example, the mod_perl/apache and desktop examples in addition to the discussion on the mailing list, and explain why it still isn't a good thing to turn off overcommit.
|
I'm not sure you are willing to understand what I'm saying here. The information you found online obviously looks more authoritative. But by combining the misleading info there with some misunderstandings of your own, you have reached some very incorrect conclusions.
I will comment on a few phrases in that document you linked:
Quote:
Normally, a user-space program reserves (virtual) memory by calling malloc().
|
Misleading, because it ignores the important two-level allocation process. Ordinary code requests memory from malloc(), but that memory comes from a pool inside the process. Only when that pool is exhausted does malloc() request memory from the kernel, in very large chunks.
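To make that concrete, here is a minimal sketch (assuming glibc's allocator; the 64-byte and 8 MiB sizes are arbitrary). Run it under strace -e trace=brk,mmap and you can watch the two levels: the small allocations come out of the in-process pool, which malloc grows with occasional large brk() calls, while the big one goes straight to the kernel as an mmap().
Code:
/* Minimal sketch, assuming glibc: small requests are served from
 * malloc's own pool, a large request is handed straight to the kernel. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Many small allocations: served from the in-process pool. */
    for (int i = 0; i < 1000; i++) {
        if (malloc(64) == NULL) {
            perror("malloc");
            return 1;
        }
    }

    /* One large allocation: glibc typically passes this to the kernel
     * directly as an mmap() of the whole size. */
    if (malloc(8 * 1024 * 1024) == NULL) {
        perror("malloc");
        return 1;
    }

    puts("allocations succeeded");
    return 0;
}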
Quote:
If the return value is NULL, the program knows that no more memory is available, and can do something appropriate.
|
True, but irrelevant because...
Quote:
Most programs will print an error message and exit
|
Most programs simply crash without even an understandable error message when memory allocations fail.
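For illustration, a deliberately careless sketch of the typical pattern (the 256 MiB size is arbitrary): the return value is never checked, so if the allocation fails the program dies with a segfault instead of a readable "out of memory" message.
Code:
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Typical real-world pattern: the return value is never checked.
     * If malloc() fails, the memset() dereferences NULL and the
     * program dies with SIGSEGV, not an understandable error. */
    char *buf = malloc(256 * 1024 * 1024);
    memset(buf, 0, 256 * 1024 * 1024);
    free(buf);
    return 0;
}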
Quote:
Linux on the other hand is seriously broken. It will by default answer "yes" to most requests for memory, in the hope that programs ask for more than they actually need.
|
Linux is doing the right thing answering "yes" to most requests for memory, not in the "hope" that programs request more than they will ever use, but because of the reality that almost every program requests far more than it will ever use.
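You can watch this happen with a sketch like the following (assuming a 64-bit system with default overcommit; the 4 GiB and 64 MiB figures are arbitrary). The malloc() succeeds immediately, but only the 64 MiB that is actually written ever consumes RAM or swap, which you can confirm by checking the process's RSS in top while it sleeps.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    size_t reserved = (size_t)4 * 1024 * 1024 * 1024;   /* reserve 4 GiB */
    char *p = malloc(reserved);
    if (p == NULL) {
        perror("malloc");
        return 1;
    }

    /* Touch only 64 MiB of it; only these pages become "real" memory. */
    memset(p, 1, 64 * 1024 * 1024);

    printf("reserved %zu bytes, touched 64 MiB; check RSS now\n", reserved);
    sleep(60);

    free(p);
    return 0;
}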
Quote:
if not then very bad things happen. What happens is that the OOM killer (OOM = out-of-memory) is invoked, and it will select some process and kill it.
|
That is true. There are two different ways a service might fail due to lack of system memory:
1) The service requests additional memory (that it probably won't use) and the kernel refuses the request, causing that service to crash.
2) The service earlier requested memory (a request that didn't really allocate anything yet). Now the service finally uses that memory, which forces the kernel to really allocate some. The kernel would swap something out, but swap is full. So the kernel would shrink the cache, but the cache is already too small. So the OOM killer is triggered and kills some process, which might or might not be the process trying to use more memory. (Both failure points are marked in the sketch just below.)
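Here is a rough sketch, assuming default overcommit, of where each failure shows up (don't run it on a machine you care about; it allocates until something breaks):
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t chunk = 64 * 1024 * 1024;   /* 64 MiB per step */

    for (;;) {
        char *p = malloc(chunk);
        if (p == NULL) {
            /* Failure (1): the request itself is refused.  With
             * overcommit disabled this is where you end up. */
            fprintf(stderr, "malloc refused the request\n");
            return 1;
        }
        /* Failure (2): touching the pages forces real allocation.  If
         * swap and cache are exhausted, the OOM killer fires somewhere
         * around here and some process (not necessarily this one) is
         * killed outright. */
        memset(p, 1, chunk);
    }
}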
Quote:
Of course, the very existence of an OOM killer is a bug
|
There is a theory that failure type (1) is better. In failure (1) you can claim the OS did everything right. The "fault" is entirely in the service for not responding more gracefully when the memory request failed. But in failure (2) all the services are helpless. It is all in the hands of the system.
So you can disable overcommit to vastly increase the frequency of failure (1) in order to reduce failure (2) from rare down to never. If you believe failure (2) is unacceptable and failure (1) is normal, that's a good idea.
That is all interesting in theory, but in practice what you care about is whether your server fails or not. With default overcommit settings, failure (1) is slightly more likely than failure (2), and in total both may be unlikely. Without overcommit (and with the same amount of swap), you vastly increase the number of failures for the dubious benefit of making them all failure (1).
If you don't want either of these types of failure, make your swap area large enough.
Earlier you posted
Code:
CommitLimit: 3670008 kB
Committed_AS: 4727640 kB
Those values don't have meanings as simple as the names indicate, but they do show what would happen if you ran this same workload with overcommit turned off. You would fail memory allocation requests long before reaching the level at which you posted that info. At minimum, your swap area would need to be 1.1 GB larger to have even a chance of running this workload with overcommit off. Then it is only a chance. These numbers show that a 1 GB increase in swap wouldn't be enough. They don't in any way say a 1.1 GB increase would be enough.
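For reference, the gap between those two numbers is roughly where the 1.1 GB figure comes from. With overcommit off, CommitLimit is roughly SwapTotal plus overcommit_ratio percent of RAM (the exact formula depends on your kernel, so treat that as an approximation), and any request that would push Committed_AS past it is refused:
Code:
Committed_AS - CommitLimit = 4727640 kB - 3670008 kB = 1057632 kB  (just over 1 GB)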
Once you have enough swap, there is no reason to mess with overcommit settings.
At best, more conservative overcommit settings create failure type 1 to avoid failure type 2. Enough swap space avoids both failure types.
Quote:
Originally Posted by exceed1
If you for example have an apache server with a bunch of perl scripts running in mod_perl (which really should be running in mod_fastcgi or mod_fcgi) they are loading an entire perl runtime every time a request is received and they probably use all memory set aside by them, so if you set up your apache server to receive more requests than it has memory for in this way you would get out of memory errors (which I have also seen several times).
|
I think you're still missing the very important difference between anonymous memory and non-anonymous memory.
I'm not 100% sure what you mean by "loading an entire perl runtime", but I'm pretty sure that is mainly non-anonymous memory, probably even shared.
Non-anonymous memory doesn't use any swap space and has only an indirect impact on all the issues surrounding overcommit. You can be using ten times as much non-anonymous memory as you have RAM, and still use no swap space, and still have no failures due to lack of memory.
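If it helps, here's a minimal sketch of the distinction (the /bin/ls path is just an example; any readable file works). The file-backed, read-only mapping needs no swap because the kernel can always drop those pages and reread them from the file; the anonymous mapping has no file behind it, so under memory pressure its pages can only go to swap.
Code:
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/bin/ls";  /* any readable file */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Non-anonymous (file-backed) mapping: backed by the file, so it
     * needs no swap and costs almost nothing in commit accounting. */
    void *file_map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (file_map == MAP_FAILED) { perror("mmap file"); return 1; }

    /* Anonymous mapping: no backing file, so under pressure its pages
     * can only go to swap, and it counts fully against Committed_AS. */
    void *anon_map = mmap(NULL, 1024 * 1024, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (anon_map == MAP_FAILED) { perror("mmap anon"); return 1; }

    printf("file-backed mapping at %p, anonymous mapping at %p\n",
           file_map, anon_map);

    munmap(file_map, (size_t)st.st_size);
    munmap(anon_map, 1024 * 1024);
    close(fd);
    return 0;
}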
If you have seen "out of memory" errors, I'm nearly sure they occurred because swap space was full when some process asked for additional anonymous memory. That is what I was talking about earlier in this thread when I said it wasn't safe to run a heavily memory-loaded system with swap space full, even with half a GB of cache. For most purposes, that half a GB of cache acts like free memory. But not for all purposes, and even with default overcommit you can easily fail moderate-size memory requests despite half a GB of cache that the OS knows is almost equivalent to free.