I am facing memory starvation issues on a server and am trying to account for all of its memory, but there is a blank spot in my understanding regarding kernel memory usage. In a sentence: I need to find out what is using kernel memory on my server while seemingly being unaccounted for by any tool. More specifically:
I am running an AWS EC2 t3.medium instance with 4 GB RAM, on Debian GNU/Linux 9.5 (stretch) with kernel 4.9.0-7-amd64.
The problem is that, without any application load on the server, there is a steady increase in memory usage, while at the same time the application memory usage remains stable and the increase shows up as kernel memory usage. However, this kernel memory usage does not match the output of the various diagnostic tools. We need to find what is using the kernel memory.
Here are two charts: the first shows how the total application memory remains stable, and the second, at a closer zoom level, shows how the available memory decreases without any other metric rising (application memory is stable in the first chart as well):
Our application memory usage is around 2.7-2.8 GB, and all the tools agree on it (summing the per-process memory in top or ps aux matches the info in vmstat and /proc/meminfo; a sketch of that cross-check follows the vmstat output below):
vmstat (output of vmstat -s -S M):
Code:
3895 M total memory
3288 M used memory
2830 M active memory
102 M inactive memory
239 M free memory
14 M buffer memory
353 M swap cache
0 M total swap
0 M used swap
0 M free swap
9421426 non-nice user cpu ticks
33553 nice user cpu ticks
6918382 system cpu ticks
77883981 idle cpu ticks
388912 IO-wait cpu ticks
0 IRQ cpu ticks
278102 softirq cpu ticks
2354770 stolen cpu ticks
860794765 pages paged in
53978341 pages paged out
0 pages swapped in
0 pages swapped out
923803588 interrupts
1521737988 CPU context switches
1606336490 boot time
3042098 forks
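To cross-check the application figure against what the kernel reports, summing per-process RSS and comparing it with MemTotal minus MemAvailable is a quick first pass:

Code:
# Sum resident set sizes of all processes, in MB
# (RSS double-counts shared pages, so this slightly overstates)
ps aux --no-headers | awk '{rss += $6} END {printf "process RSS total: %d MB\n", rss/1024}'

# What the kernel considers in use overall, in MB
awk '/^MemTotal|^MemAvailable/ {print $1, int($2/1024), "MB"}' /proc/meminfo

If the gap between (MemTotal - MemAvailable) and the RSS total keeps growing while the RSS total stays flat, the growth is happening outside userspace, which is exactly what the charts show.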
According to slabtop, our accounted-for kernel memory usage is around 160 MB. However, according to smem and free (which shows the available memory being quite a bit less than it should be, everything taken into account), the actual kernel consumption works out far higher.
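For reference, the kernel-side fields that feed such a calculation come straight from /proc/meminfo:

Code:
# Kernel-side memory consumers visible in /proc/meminfo (values in kB)
grep -E '^(Slab|SReclaimable|SUnreclaim|KernelStack|PageTables|VmallocUsed):' /proc/meminfo

Note that VmallocUsed reads 0 on many 4.x kernels (its accounting was disabled for performance reasons), and pages grabbed directly via alloc_pages() do not show up in any of these fields.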
I see no evidence to support your supposition that it is a kernel issue - maybe try kmemleak. Designed for the task, and all you have to do is turn it on.
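If the kernel supports it, kmemleak really is just a couple of debugfs operations. A minimal sketch, assuming the running kernel was built with CONFIG_DEBUG_KMEMLEAK:

Code:
# debugfs is usually mounted already; mount it if not
mount -t debugfs nodev /sys/kernel/debug 2>/dev/null

# Trigger an immediate scan, then read back suspected leaks with stack traces
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak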
If you take a look at the pictures I attached as links, you can see that the available memory is continuously decreasing while the other memory metrics remain stable.
Adding up all the other reported memory metrics does not account for the memory consumption reported by free.
Also, the smem output I provided, showing 628 MB of kernel non-cache memory, is the final state: it starts at 180 MB and, without any usage on the (test) server, reaches the 628 MB level overnight.
Also, slabtop, which is supposed to show a number close to the kernel memory consumption, reports far less than smem's non-cache figure.
That is why I suspect kernel memory usage.
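The two figures being compared come from commands like these, with smem -w giving the kernel dynamic memory split and slabtop's header giving the slab totals:

Code:
# System-wide view: kernel dynamic memory split into non-cache / cache
smem -w

# One-shot slab summary; the header lines carry the totals
slabtop -o | head -n 5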
I have also read about the kmemleak option, but unfortunately it is not a single switch to flip: it requires reconfiguring and recompiling the kernel, which is a bit of trouble on these specific machines in our current setup, so I was hoping for a more "compile-free" solution or hint.
I will eventually fall back to this option, though, if no other suggestion comes up.
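For what it's worth, whether the running kernel was built with kmemleak can be checked without touching anything, and /proc/vmallocinfo at least gives a partial compile-free view of kernel allocations by caller:

Code:
# kmemleak needs CONFIG_DEBUG_KMEMLEAK=y at build time
grep KMEMLEAK /boot/config-$(uname -r)

# Compile-free: total vmalloc bytes grouped by allocating call site (run as root)
awk '{sum[$3] += $2} END {for (c in sum) printf "%12d %s\n", sum[c], c}' \
    /proc/vmallocinfo | sort -rn | head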
About 0.5 GB was lost in a day, but then it seems to stabilize once only around 100-200 MB are left as available memory.
However, the memory we are talking about is non-cache, so it cannot be freed on demand.
So we are left with very little available memory on the server, not enough to cover any memory peaks from the running processes.
I have also seen linuxatemyram.com, but unfortunately it mostly talks about caches.
In our case, dropping the caches only frees around 10 MB of RAM.
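For completeness, this is the standard drop (needs root):

Code:
# Flush dirty pages, then drop page cache plus dentries and inodes
sync
echo 3 > /proc/sys/vm/drop_caches

# Compare available memory before and after
free -m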
I have actually found the cause of the problem: it is a TCP communication channel towards a running FluentD daemon.
However, my question is mostly about how I could find where this kernel memory is used with some kind of tool, rather than by experimenting as I already did.
In addition, I have captured some graphs showing that the continuous decrease of available memory in the system (remember, the application memory remains stable) exactly tracks the trend of a TCP memory increase. However, they only match in trend, since the absolute TCP numbers are far lower (800 MB missing from memory vs. a 30 MB increase in TCP).
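For reference, the TCP memory figures in those graphs come from accounting the kernel already exposes; a few useful probes (24224 is FluentD's default forward port, adjust if yours differs):

Code:
# Pages currently charged to TCP ('mem' is in pages, usually 4 kB each)
grep '^TCP:' /proc/net/sockstat

# TCP memory limits in pages: min / pressure / max
sysctl net.ipv4.tcp_mem

# Per-socket buffer details for the FluentD connection
ss -tmp '( dport = :24224 or sport = :24224 )'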
Memory allocation via alloc_pages() without proper book-keeping could lead to such a situation, where the memory usage cannot be tracked from a /proc/meminfo dump.
It is the most reasonable explanation I have found so far...
It actually strikes me that I hadn't found this article before, since I had read everything I could find about /proc/meminfo.
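If untracked alloc_pages() usage really is the culprit, the page_owner facility can attribute pages to their allocating call sites. A sketch, assuming a kernel built with CONFIG_PAGE_OWNER and booted with page_owner=on (a build-time option again, so the same caveat as kmemleak applies):

Code:
# Check for support in the running kernel
grep PAGE_OWNER /boot/config-$(uname -r)

# With page_owner=on on the kernel command line, dump allocation stacks
cat /sys/kernel/debug/page_owner > page_owner_dump.txt

The kernel source tree ships tools/vm/page_owner_sort for aggregating that dump by stack trace.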