Hi,
It's a long reply.
The system had 6G of RAM initially when the issue was reported.
Swap usage was reaching almost 100% within 1 to 2 weeks, while the system still had 1G in buffers/cache.
The problem here is that there is no single process occupying a large amount of memory.
If you run top and sort by memory or swap, you will find 5-6 processes using more than 10-15% of the total memory/swap.
This seems to rule out a memory leak in any single process.
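For anyone who wants to repeat this check without top, here is a minimal sketch that lists the processes holding the most swap. It assumes a Linux kernel that exposes the VmRSS and VmSwap fields in /proc/<pid>/status; the function names parse_status and top_swap_users are made up for this example.

```python
import os

def parse_status(text):
    """Extract name, VmRSS and VmSwap (in kB) from /proc/<pid>/status text."""
    info = {"name": "?", "rss_kb": 0, "swap_kb": 0}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "Name":
            info["name"] = value
        elif key == "VmRSS":
            info["rss_kb"] = int(value.split()[0])
        elif key == "VmSwap":
            info["swap_kb"] = int(value.split()[0])
    return info

def top_swap_users(proc_root="/proc", limit=10):
    """Return up to `limit` processes sorted by swap usage, largest first."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc, nothing to report
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are PIDs
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                info = parse_status(f.read())
        except OSError:
            continue  # process exited or access was denied
        info["pid"] = int(entry)
        rows.append(info)
    rows.sort(key=lambda r: r["swap_kb"], reverse=True)
    return rows[:limit]

if __name__ == "__main__":
    for r in top_swap_users():
        print(f'{r["pid"]:>7} {r["name"]:<20} rss={r["rss_kb"]} kB swap={r["swap_kb"]} kB')
```

Kernel threads have no VmRSS/VmSwap lines, so they show up with 0 and sort to the bottom.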
Another interesting thing is that this issue happens on multiple nodes in 2 different clusters (running almost the same list of applications).
This was a clear sign that the system was running out of memory and swap.
So we increased the memory from 6G to 8G, hoping that this would solve the issue.
But it didn't.
Swap usage still rose to more than 90% in 10 days.
The buffer/cache memory, which previously used to be 1G, now became 3G.
(The behavior was not the same on all nodes. On some nodes it didn't even go beyond 50%.)
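To compare nodes on the same basis, the swap percentage and the buffer/cache figure can be read straight from /proc/meminfo. This is only a rough sketch: the helper names are invented here, and it assumes the usual "Key: value kB" layout of that file.

```python
import os

def parse_meminfo(text):
    """Turn /proc/meminfo text into a dict of {field: value}, values as integers."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue  # skip blank or malformed lines
        try:
            values[key.strip()] = int(fields[0])
        except ValueError:
            continue  # ignore anything that is not a plain number
    return values

def swap_used_percent(mem):
    """Percentage of swap in use; 0.0 when no swap is configured."""
    total = mem.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - mem.get("SwapFree", 0)) / total

def buffer_cache_kb(mem):
    """Buffers plus page cache, in kB."""
    return mem.get("Buffers", 0) + mem.get("Cached", 0)

if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    print(f"swap used: {swap_used_percent(mem):.1f}%")
    print(f"buffers/cache: {buffer_cache_kb(mem)} kB")
```

Logging these two numbers periodically on each node would show whether swap grows steadily or in jumps.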
The only logical explanation I could find is that the system really needed more swap.
My understanding is that pages which were swapped out once are never removed from swap,
unless you clear it manually or the application is killed/stopped.
But I read somewhere that even if a process is killed, the system won't really show this space as free until some application requests more memory, because Linux uses lazy memory management: removing page table entries frequently is an expensive operation.
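Clearing swap manually is usually done as root with swapoff -a followed by swapon -a, which forces every swapped page back into RAM. Before doing that it is worth checking that the pages will actually fit. A minimal sketch of such a check is below; the function name and the 256 MB headroom are arbitrary choices for illustration, and it assumes a kernel that reports MemAvailable in /proc/meminfo.

```python
import os

def safe_to_clear_swap(mem, headroom_kb=262144):
    """True if the swapped-out pages fit in available RAM with some headroom left.

    `mem` is a dict of /proc/meminfo values in kB. The headroom default
    (256 MB) is an arbitrary safety margin, not a recommended value.
    """
    swap_used = mem.get("SwapTotal", 0) - mem.get("SwapFree", 0)
    available = mem.get("MemAvailable", 0)
    return available - swap_used >= headroom_kb

def read_meminfo(path="/proc/meminfo"):
    """Read the numeric fields of /proc/meminfo into a dict."""
    mem = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            fields = rest.split()
            if fields and fields[0].isdigit():
                mem[key.strip()] = int(fields[0])
    return mem

if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    if safe_to_clear_swap(read_meminfo()):
        # Only prints the commands; running them needs root.
        print("Enough free RAM. As root: swapoff -a && swapon -a")
    else:
        print("Not enough available RAM to pull all swapped pages back in.")
```

The script only prints the commands instead of running them, since swapoff on a busy node can take a long time.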
Any more input is appreciated.