What is using (kernel) memory?
Hi everyone,
I am facing some memory starvation issues on a server and trying to account for the memory, but there is a blank spot in my understanding regarding kernel memory usage. In a sentence: I need to find out what is using kernel memory on my server, since it seems to be unaccounted for by any tool.

More specifically, I am running an AWS EC2 instance with the following specs: t3.medium instance with 4GB RAM, OS: Debian GNU/Linux 9.5 (stretch), kernel: 4.9.0-7-amd64.

The problem is that without any application load on the server there is a steady increase in memory usage, while the application memory usage remains stable; the increase is observed in kernel memory usage. However, this kernel memory usage does not match the output of the various diagnostic tools. We need to find what is using kernel memory.

Here are 2 charts. The 1st shows how the total application memory remains stable; the 2nd, zoomed in, shows how the available memory decreases without any other metric rising (application memory stays stable, as the 1st chart also shows):
1 - Overview: https://i.stack.imgur.com/dernG.png
2 - Available memory decrease: https://i.stack.imgur.com/BJXwb.png

Our application memory usage is around 2.7 - 2.8GB, and all tools agree on that (adding up the memory in top or ps aux, as well as the info in vmstat and /proc/meminfo).

vmstat:
Code:
3895 M total memory
/proc/meminfo:
Code:
MemTotal: 3989436 kB
top:
Code:
top - 11:51:45 up 5 days, 15:16, 1 user, load average: 0.36, 0.56, 0.78
The way I account for the used memory, based on smem:
Code:
used = app_noncache + kernel_noncache
slabtop:
Code:
Active / Total Objects (% used) : 494415 / 704589 (70.2%)
free:
Code:
total used free shared buff/cache available
smem -w:
Code:
Area Used Cache Noncache
We seem to have roughly 450MB of unaccounted kernel memory usage. So what is using our kernel memory?
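For anyone wanting to reproduce the accounting, this is roughly how I cross-check the numbers (a minimal sketch; which meminfo fields to count as "kernel" is exactly the blank spot I am asking about):
Code:
# Sum the resident set sizes of all processes (the ps RSS column is in KiB)
ps aux | awk 'NR>1 {rss += $6} END {printf "process RSS total: %.1f MB\n", rss/1024}'

# Compare against what the kernel reports as used and as kernel-side allocations
free -m
grep -E '^(Slab|SReclaimable|SUnreclaim|PageTables|KernelStack):' /proc/meminfo
The ~450MB gap is the difference that none of these numbers explains. |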
I see no evidence to support your supposition that it is a kernel issue - maybe try kmemleak. Designed for the task, and all you have to do is turn it on.
Never used it myself - let us know how it goes. |
If you take a look at the pictures I attached as links, you will see that the available memory is continuously decreasing, while the other memory metrics remain stable.
Adding up all the other reported memory metrics does not total the memory consumption reported by free. Also, the smem output I provided, showing 628MB of kernel non-cache memory, is the final state. It starts at 180MB and, without any usage on the (test) server, it reaches the 628MB level overnight. And slabtop, which is supposed to show a number close to the kernel memory consumption, reports far less than the noncache memory smem shows. That is why I suspect kernel memory usage. I have also read about the kmemleak option, but unfortunately it is not a simple switch-on option: it requires reconfiguring and recompiling the kernel, which is a bit of trouble on these specific machines in our current setup. So I was hoping for a more "compile-free" :D solution or hint. I will eventually fall back to this option, though, if no other suggestion is made.
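For reference, in case someone's kernel does have it built in, checking costs nothing (a minimal sketch; on the stock Debian kernel the config option is normally off, which is exactly my trouble):
Code:
# Check whether the running kernel was built with kmemleak support
grep CONFIG_DEBUG_KMEMLEAK /boot/config-$(uname -r)

# If enabled, debugfs exposes it: trigger a scan and read the report (as root)
mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak
Thanks |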
Or do you mean that (by "final state") it will not eat up more? You will probably find this interesting: www.linuxatemyram.com |
0.5GB was lost in a day, but then it seems to stabilize once only around 100-200MB are left as available memory.
However, it is non-cache memory we are talking about, so it cannot be freed on demand. That leaves the server with really low available memory, not enough to cover any memory peaks from the running processes. I have also seen linuxatemyram.com, but unfortunately it mostly talks about caches. In our case, dropping the caches only frees around 10MB of RAM.
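For completeness, this is how I drop the caches and measure the effect (as root; the standard drop_caches interface):
Code:
# Flush dirty pages first, then drop page cache, dentries and inodes
sync
echo 3 > /proc/sys/vm/drop_caches

# Compare these before and after
grep -E '^(MemFree|MemAvailable|Buffers|Cached):' /proc/meminfo
|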
I would try to stop/kill dockerd, containerd, kubelet (and probably others), or reboot without them, to see if that makes any difference.
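For example, something like this (assuming they run as the usual systemd units; adjust the names to the actual setup):
Code:
# Stop the suspects, then watch whether available memory keeps shrinking
systemctl stop kubelet docker containerd
watch -n 60 'grep MemAvailable /proc/meminfo'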
|
I have actually found the cause of the problem. It is a TCP communication channel towards a FluentD daemon that is running.
However, my question is mostly about how I could find where this kernel memory is used with some kind of tool, instead of just experimenting as I did. In addition, I have captured some graphs showing that the continuous decrease of available memory in the system (remember, the application memory remains stable) matches the trend of a TCP memory increase exactly. They only match in trend though, since the absolute numbers of TCP usage are much lower (800MB missing from memory vs a 30MB memory increase in TCP).
Available memory decrease: https://pasteboard.co/JESysQ6.png
TCP memory increase: https://pasteboard.co/JESyPPp.png
TCP memory is calculated from /proc/net/sockstat as total_memory = mem * 4k (since the mem number in sockstat is a count of memory pages).
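In script form, the calculation looks like this (a small sketch; it hardcodes the 4KiB page size, as in the formula above):
Code:
# "mem" in /proc/net/sockstat is in pages; multiply by the 4 KiB page size
awk '/^TCP:/ {for (i = 1; i <= NF; i++) if ($i == "mem")
        printf "TCP memory: %.1f MB\n", $(i+1) * 4096 / 1048576}' /proc/net/sockstat

# The limits (also in pages) that TCP memory may grow towards
cat /proc/sys/net/ipv4/tcp_mem
|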
I think I ran into a similar issue once; you could check this.
This link was really helpful (look for the "memory black hole" part): https://titanwolf.org/Network/Articl...2954#gsc.tab=0
Memory allocated via alloc_pages() without proper book-keeping can lead to exactly this situation, where the memory usage cannot be tracked from the /proc/meminfo dump.
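A rough way to see the hole is to subtract everything /proc/meminfo does account for from MemTotal (a simplified sketch of the article's idea; it ignores small fields like SwapCached, and VmallocUsed is not populated on kernels of that generation anyway):
Code:
# Whatever is left over was taken by the kernel via alloc_pages() and friends
# without a matching /proc/meminfo counter
awk '/^(MemTotal|MemFree|Buffers|Cached|Slab|PageTables|KernelStack|AnonPages):/ {v[$1] = $2}
     END {gap = v["MemTotal:"] - v["MemFree:"] - v["Buffers:"] - v["Cached:"] \
                - v["Slab:"] - v["PageTables:"] - v["KernelStack:"] - v["AnonPages:"]
          printf "unaccounted: %.1f MB\n", gap / 1024}' /proc/meminfo
|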
Hi @sanjibdas. Thank you for your response.
Unfortunately the link for the article you provided is not working. Could you please check it? Thanks! |
I see....
It is the most reasonable explanation I have found so far... It actually strikes me that I hadn't come across this article before, since I had read everything I could find about /proc/meminfo. Really helpful and to-the-point explanations. Thanks sanjibdas for the good reference! |
Just for reference: Mysteries of /proc/meminfo (the Chinese original of the article at titanwolf.org).
|