08-02-2012 05:22 AM
Segfault: overcommit_memory problem on database server
After configuring a new database (PostgreSQL) server on CentOS 6.3, we ran into a problem: the database randomly stops with an "Out of memory" error, and events like these are written to the messages log:
Aug 1 09:48:20 dbsrv kernel: postmaster: segfault at 7600000091 ip 00000000006e34a0 sp 00007fff896a8ca0 error 6 in postgres[400000+4e6000]
Aug 1 10:31:19 dbsrv kernel: postmaster: segfault at 140000002c ip 00000000006e34a0 sp 00007fffe2189b00 error 6 in postgres[400000+4e6000]
Aug 1 10:35:31 dbsrv kernel: postmaster: segfault at 3bc6299690 ip 000000000062e103 sp 00007fffe218a750 error 4 in postgres[400000+4e6000]
Aug 1 11:21:21 dbsrv kernel: postmaster: segfault at 28 ip 00000000006e34a0 sp 00007fff197630f0 error 6 in postgres[400000+4e6000]
Aug 1 12:17:47 dbsrv kernel: postmaster: segfault at 6e ip 000000000046efb7 sp 00007fff34f5f310 error 4 in postgres[400000+4e6000]
Aug 1 13:04:18 dbsrv kernel: php: segfault at 0 ip 00007f5fb157d859 sp 00007fffacc86e00 error 4 in pdo_pgsql.so[7f5fb1577000+8000]
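To make the log lines above easier to read, here is a minimal sketch that decodes them. It assumes the usual x86 page-fault error-code convention (bit 0: page was present, bit 1: write access, bit 2: fault came from user mode) and that the kernel prints the error value in hex. The names `SEGFAULT_RE` and `decode_segfault` are made up for this illustration.

```python
import re

# Matches kernel lines of the form quoted above:
#   "<proc>: segfault at <addr> ip <ip> sp <sp> error <err> in <object>[<base>+<size>]"
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+): segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return a dict describing one segfault log line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "process": m.group("proc"),
        "object": m.group("obj"),
        "fault_address": int(m.group("addr"), 16),
        # Offset of the faulting instruction inside the mapped object.
        "offset_in_object": ip - base,
        # Assumed x86 error-code bits (see the note above the block).
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "user_mode": bool(err & 4),
    }


line = ("Aug 1 09:48:20 dbsrv kernel: postmaster: segfault at 7600000091 "
        "ip 00000000006e34a0 sp 00007fff896a8ca0 error 6 in postgres[400000+4e6000]")
info = decode_segfault(line)
print(info["process"], info["access"], hex(info["offset_in_object"]))
```

Read this way, "error 6" is a user-mode write to a page that was not mapped, and "error 4" is a user-mode read of an unmapped page. The php entry faults at address 0, which is what a NULL pointer dereference looks like.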
The server has 16 GB of RAM and runs several PHP services plus a PostgreSQL 9.1.4 database.
Server kernel custom configuration:
kernel.shmmax = 8363171840
kernel.shmall = 2041790
vm.swappiness = 0
vm.overcommit_memory = 2
As described in the PostgreSQL manual, starting from kernel 2.5 the vm.overcommit_memory = 2 policy is the best choice for a PostgreSQL database.
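For context, with vm.overcommit_memory = 2 the kernel refuses allocations once the total committed address space would exceed swap plus a percentage of RAM set by vm.overcommit_ratio (default 50). A rough sketch of that arithmetic follows. The function name `commit_limit_kib` is hypothetical, and the 2 GiB swap size is an assumption for illustration only, since the swap size is not stated in this post.

```python
def commit_limit_kib(ram_kib, swap_kib, overcommit_ratio=50, hugetlb_kib=0):
    """Approximate commit limit in KiB under vm.overcommit_memory = 2.

    limit = (RAM - huge pages) * overcommit_ratio / 100 + swap
    """
    return (ram_kib - hugetlb_kib) * overcommit_ratio // 100 + swap_kib


GIB = 1024 * 1024          # KiB per GiB
ram = 16 * GIB             # the 16 GB of RAM mentioned above
swap = 2 * GIB             # ASSUMPTION: swap size is not given in the post

limit = commit_limit_kib(ram, swap)
print(limit / GIB)         # 10.0
```

With the default ratio, only half of the RAM counts toward the limit, so a 16 GB machine with little swap can refuse allocations long before physical memory is exhausted.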
Uname: Linux dbsrv 2.6.32-279.2.1.el6.x86_64 #1 SMP Fri Jul 20 01:55:29 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
The server has at least 3 GB of free RAM when the application crashes, plus about 8 GB of cache. Used RAM is no more than 3-4 GB.
In this configuration the database or other running services crash randomly. After several hours of investigation we discovered that running memory-hungry applications increases the probability of a crash; we tried opening a large file in vim and running memtest 3000. In both cases the server crashed with segfaults.
After reconfiguring the kernel to vm.overcommit_memory = 0, the problem was solved, and no crashes have been detected since.
The question is: why is this happening? What have we misconfigured, and why is vm.overcommit_memory = 2 not working?
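Free RAM and cache are not what strict overcommit accounting looks at; the relevant numbers are the CommitLimit and Committed_AS fields in /proc/meminfo. Below is a small sketch for comparing the two. The helper names are hypothetical, and the sample values are invented for illustration, not measured on this server.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value_in_kB} dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kib(fields):
    """Address space (kB) that can still be committed before allocations fail."""
    return fields["CommitLimit"] - fields["Committed_AS"]


# HYPOTHETICAL sample, shaped like /proc/meminfo output; not real data.
SAMPLE = """\
MemTotal:       16330000 kB
MemFree:         3145728 kB
Cached:          8388608 kB
CommitLimit:    10262152 kB
Committed_AS:   10100000 kB
"""

info = parse_meminfo(SAMPLE)
print(commit_headroom_kib(info))  # 162152
```

On a live system the same functions can be fed the contents of /proc/meminfo. In a sample like this one, allocations would start failing with only about 158 MB of headroom left, even though several GB of RAM show as free.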