We have a cluster of servers whose members regularly run into memory problems. We have tracked the problem down to leaking PF_NETLINK sockets, but are having a hard time taking the diagnosis further.
We would very much appreciate pointers on where to go from here.
The servers are Linux From Scratch systems with grsecurity:
Code:
> uname -a
Linux re5ult01 2.4.29 #1 SMP Fri Sep 22 16:59:43 BST 2006 i686 GenuineIntel unknown GNU/Linux
They're running a Ruby on Rails webapp of our own devising on top of MySQL 4.1.12 and Apache 2.2.3.
Over time, memory gradually disappears. We've tracked this down to sockets being allocated but never freed. For example, on a server that has been up for approximately 2 days, we get the following:
Code:
> vmstat -m | grep sock
sock 151544 151544 1024 37886
> cat /proc/net/netlink | wc
151728 1213811 8876149
> vmstat -m | grep sock
sock 151968 151968 1024 37992
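To narrow this down, the netlink table could be grouped by owner. The following is only a sketch: it assumes the first line of /proc/net/netlink is a header, that the second column is the netlink protocol number and that the third column is the owning pid. The function name netlink_by_owner is ours, made up for illustration.

```shell
# Count netlink sockets per (protocol, pid) pair.
# Assumption: line 1 is a header, column 2 is the protocol number,
# column 3 is the owning pid. Adjust the column numbers if your
# kernel's /proc/net/netlink is laid out differently.
netlink_by_owner() {
    awk 'NR > 1 { count[$2 " " $3]++ }
         END { for (k in count) print count[k], k }' "$@" |
        sort -k1,1nr -k2,2n -k3,3n
}

# Usage on a live system (reads stdin if no file is given):
#   netlink_by_owner /proc/net/netlink | head
```

Each output line is "count protocol pid", largest count first. If nearly all entries share a single protocol or pid, that would at least suggest which subsystem or process to look at next.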
Killing processes has no effect on the number of extant sockets; the only way we have found to free the memory is to reboot the machine.
Sockets are leaking at a rate of approximately one per second:
Code:
> date && vmstat -m | grep sock
Fri Nov 17 17:36:53 GMT 2006
sock 151876 151876 1024 37969
> date && vmstat -m | grep sock
Fri Nov 17 17:36:58 GMT 2006
sock 151880 151880 1024 37970
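The rate can be worked out from the two samples above. This is a small sketch; leak_rate is a helper name we made up, and it simply divides the change in object count by the elapsed seconds.

```shell
# Average growth in objects per second between two samples.
# Arguments: first_count second_count elapsed_seconds
leak_rate() {
    LC_ALL=C awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", (b - a) / s }'
}

# The two samples above: 151876 and 151880 objects, taken 5 seconds apart.
leak_rate 151876 151880 5
```

For these samples it prints 0.80, i.e. four new sockets in five seconds, which is what we mean by roughly one per second.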
What we don't know is how to track down where these sockets are being allocated and why they are not being freed. Suggestions and advice very welcome!
Thanks in advance,
Paul