What metrics is oom-killer using to determine memory usage in Cgroup
Linux - Kernel. This forum is for all discussion relating to the Linux kernel.
I am trying to find a metric that represents the memory usage logged in syslog when a container reaches its threshold and gets killed.
This is the message I refer to:
Nov 6 10:16:24 pool-a53hsbota-7h3co kernel: [2111341.288726] memory: usage 524288kB, limit 524288kB, failcnt 118
Nov 6 10:16:24 pool-a53hsbota-7h3co kernel: [2111341.289672] memory+swap: usage 524288kB, limit 9007199254740988kB, failcnt 0
Nov 6 10:16:24 pool-a53hsbota-7h3co kernel: [2111341.298582] kmem: usage 5800kB, limit 9007199254740988kB, failcnt 0
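For reference, the usage and limit figures can be pulled out of such a syslog line with a quick one-liner; this is just a sketch against the first line above:

```shell
# Sketch: extract the usage/limit figures (in kB) from an OOM-killer syslog line.
line='Nov 6 10:16:24 pool-a53hsbota-7h3co kernel: [2111341.288726] memory: usage 524288kB, limit 524288kB, failcnt 118'
echo "$line" | sed -n 's/.*usage \([0-9]*\)kB, limit \([0-9]*\)kB.*/usage=\1 limit=\2/p'
```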
I tried to collect different metrics using Prometheus and compare their values to the value in the log, but I couldn't find a metric, or combination of metrics, that matches the value logged at that point in time.
I tried:
- /sys/fs/cgroup/memory/kubepods/burstable/<pod>/<container>/memory.stat
- ps command
All I am trying to do is show in Grafana, using the proper metrics, that memory usage for the container grew and that the container was killed when it reached the limit.
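One thing worth checking: if I understand the cgroup v1 accounting correctly, the "usage" figure in the OOM log corresponds to memory.usage_in_bytes, which counts page cache as well as anonymous memory, while dashboards often plot a "working set" (usage minus total_inactive_file from memory.stat) that can sit well below the limit even as usage hits it. A minimal sketch of that arithmetic, with entirely made-up sample numbers standing in for the real sysfs files:

```shell
# Sketch with made-up numbers: in cgroup v1, the OOM log's "usage" matches
# memory.usage_in_bytes (anonymous memory plus page cache). The working set
# is that usage minus total_inactive_file from memory.stat, and can be far
# below the limit at the moment the limit is hit.
usage=536870912                       # hypothetical memory.usage_in_bytes (512 MiB)

cat > memory.stat.sample <<'EOF'
cache 335544320
rss 201326592
total_inactive_file 301989888
EOF

awk -v usage="$usage" \
    '/^total_inactive_file/ {print "working_set_bytes=" (usage - $2)}' memory.stat.sample
```

On a real node the two inputs would come from /sys/fs/cgroup/memory/.../memory.usage_in_bytes and .../memory.stat for the container's cgroup.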
If using cgroup2, add this to your reading list. Yes, I know it says Facebook, but those folks did all the work for PSI and then released it for public consumption.
Go get a beverage of choice before starting.
Thank you for sharing the article. My problem is not understanding OOM, but showing that the application misbehaves. I know for a fact that the application is being killed by the OOM killer when it reaches its limit of 512MB. I am looking for a metric which clearly shows that consumed memory reached that limit at that moment. Right now, when I check the cgroup memory used at the time of the OOM invocation, it shows only 100MB, which is very far from the actual limit.
Basically, I would like to show on a chart that used memory was rising and that just before the OOM invocation the consumed memory was close to the limit. I cannot find a single metric which would show that.
I briefly scanned through the page and haven't found what I am looking for, but I will read it in more detail to see if it gives me the information I need.
Thank you for sharing
Well, that article does mention the exact data sources used by the OOM killer, which is why I put it there in the first place.
After answering last time, I remembered reading up on some specific cgroup "issues" with the OOM killer, but most of it was about how to solve them by splitting cgroup memory so that, when the limit is reached, the high-memory consumer inside the container is killed rather than the whole container.
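On cgroup2, the PSI interface that article is built around can also be graphed directly. A minimal sketch parsing a sample pressure line; the sample below stands in for what /proc/pressure/memory or a cgroup's memory.pressure file would return:

```shell
# Sketch: pull the 60-second "some" memory pressure average out of PSI output.
# A real reading comes from /proc/pressure/memory (system-wide) or a cgroup2
# memory.pressure file; this sample line stands in for one.
psi='some avg10=0.00 avg60=1.52 avg300=0.87 total=4181579'
echo "$psi" | awk '$1 == "some" { sub("avg60=", "", $3); print $3 }'
```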
The article you are referencing sounds very interesting and could be very helpful in solving the problem I am working on. Any chance you could find a link to that article?