[SOLVED] (Almost) Complete System Freeze upon Heavy Swap Usage

MadMartian · 06-11-2021, 07:54 PM

Quote:

Originally Posted by pan64

Again, please read www.linuxatemyram.com (there is also a link at the bottom, "how can I verify", with additional info) if you wish to understand how it works. Don't assume anything.

Duly noted, thanks for the background info on this.

Quote:

Originally Posted by syg00

Seems to me the OP has a pretty good handle on the overall theory.
What does /proc/meminfo show over time, say every 15 minutes or so?

I think I will take some time to put this in a cron job and load the output into a database table, which will let me plot some pretty graphs. I'll get back to this thread with the results when I have them.
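A minimal sketch of that cron sampling step, assuming plain CSV output instead of a database table. The helper name `meminfo_csv_row`, the chosen fields and the file paths are hypothetical; loading the CSV into Postgres (or anything else) is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the thread: print one CSV row made of the
# current epoch time plus a few /proc/meminfo fields (values in kB).
# Pass a saved copy of meminfo as $1 to parse that instead.
meminfo_csv_row() {
  local src="${1:-/proc/meminfo}"
  awk -v ts="$(date +%s)" '
    # Drop the trailing colon from each key and remember its value.
    { sub(/:$/, "", $1); v[$1] = $2 }
    END {
      # Columns: time, MemTotal, MemAvailable, SwapTotal, SwapFree
      printf "%s,%s,%s,%s,%s\n", ts, v["MemTotal"], v["MemAvailable"], v["SwapTotal"], v["SwapFree"]
    }' "$src"
}

# Example crontab entry (every 15 minutes); both paths are placeholders:
#   */15 * * * * . $HOME/bin/meminfo_sample.sh && meminfo_csv_row >> $HOME/meminfo.csv
```

One row per sample keeps the file trivially importable into a spreadsheet or a database table for plotting.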

Quote:

Originally Posted by syg00

I skimmed over this earlier.
Java is awful; people who code in it are coerced into lazy habits. Bad combination.
If you can get rid of those tasks altogether, the buy-back may be better than you hope.

I bet this is because the JVM manages memory on its own and does not leave block management up to the kernel; garbage collection cuts both ways. Ironically, I found I get better performance from my Minikube by limiting its memory consumption.

pan64 · 06-12-2021, 03:28 AM

The JVM itself manages only its own memory (like any other process). The JVM cannot manage swap; that is done by the kernel alone.

MadeInGermany · 06-12-2021, 07:00 AM

List the top 10 RSS (resident RAM) consumers:

Code:

ps -eo pid,user,rss,vsz,args | sort -k3,3n |tail

Run this every couple of minutes.
Which processes have growing memory?
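To answer that without eyeballing each run, two snapshots can be diffed. This is a minimal sketch with a hypothetical helper name (`rss_growth`); it assumes the snapshots come from `ps -eo pid,rss,args`, and it can be fooled if a PID is reused by a different process between snapshots.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the thread: compare two snapshots taken with
#   ps -eo pid,rss,args > FILE
# and print "PID GROWTH_KB COMMAND" for every PID present in both whose
# RSS went up, largest growth first.
rss_growth() {
  awk '
    FNR == 1 { next }                         # skip the ps header line of each file
    NR == FNR { rss[$1] = $2 + 0; next }      # first snapshot: RSS (kB) per PID
    ($1 in rss) && ($2 + 0 > rss[$1]) {       # second snapshot: same PID, larger RSS
      cmd = $3
      for (i = 4; i <= NF; i++) cmd = cmd " " $i
      printf "%s %d %s\n", $1, $2 - rss[$1], cmd
    }' "$1" "$2" | sort -k2,2nr
}

# Usage: two snapshots a couple of minutes apart, then the top 10 growers.
#   ps -eo pid,rss,args > /tmp/ps.1; sleep 120; ps -eo pid,rss,args > /tmp/ps.2
#   rss_growth /tmp/ps.1 /tmp/ps.2 | head
```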

jefro · 06-16-2021, 02:53 PM

I can see how a developer could easily max out RAM. I believe that.
One might be able to configure RAM in ways that help with how it is used; there are almost unlimited ways to tune it.
Some workloads do not cope well once swap usage goes past 75%, no matter how fast or how large the swap is.
You can't rely on swap to get you out of limited RAM.
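To see where a system sits relative to that 75% mark, swap usage can be computed from the `SwapTotal` and `SwapFree` lines of /proc/meminfo. A minimal sketch; the function name `swap_used_pct` is hypothetical, and the 75% threshold is just the figure from the post above, not a kernel limit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print swap usage as a percentage of SwapTotal,
# computed from /proc/meminfo (or a saved copy passed as $1).
swap_used_pct() {
  awk '
    $1 == "SwapTotal:" { total = $2 }
    $1 == "SwapFree:"  { free  = $2 }
    END {
      if (total == 0) { print "0.0"; exit }   # no swap configured
      printf "%.1f\n", (total - free) * 100 / total
    }' "${1:-/proc/meminfo}"
}

# Example: print a warning once usage crosses 75%.
#   pct=$(swap_used_pct)
#   awk -v p="$pct" 'BEGIN { exit !(p > 75) }' && echo "swap above 75%: ${pct}%"
```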

Yes, if the above members are correct, you could have some leaks or other memory problems to locate.

MadMartian · 07-09-2021, 08:46 PM

I ran into a couple more freezes, and this time I had periodic memory data (dumping /proc/meminfo to a Postgres database table every 5 minutes). However, there doesn't appear to be anything out of the ordinary. I've plotted the memory information below right up until the freeze occurred. On another occasion, a couple of weeks later, it froze while attempting to resume a Windows VM from suspend, which may or may not be memory related.

Swap Monitoring Plot over the course of a day (SVG format)

ondoho · 07-10-2021, 02:52 PM

Please replace oversized inline image with link.

jamison20000e · 07-10-2021, 07:23 PM

Just leave it as is and delete the image tags; then it'll show up as a thumbnail.

With many other kinds of images I'll normally disagree. Especially on a screenshot page? Just pinch your fingers or learn how to use your screen and/or software, Ondoho!

jamison20000e · 07-10-2021, 07:24 PM

Heck, even here it looks fine and usable; an image is an image, a page a page...

rnturn · 07-11-2021, 02:17 PM

Quote:

Originally Posted by MadMartian

Overview & Symptoms

My system freezes almost completely whenever my system runs out of RAM and starts hitting the swap partition heavily. Everything freezes including the mouse and keyboard with a few exceptions:

The hard drive light appears to indicate some background activity
The fan sometimes spins up and down indicating some CPU activity
"nmap -sT" (TCP handshake) from another machine reveals open ports indicating that the NIC is responding at the OSI transport layer

Nothing is logged indicating what causes this.

On one rare occasion I remember the mouse was able to move a bit after about a minute or two of the system being frozen. This issue does not appear to occur whenever there is plenty of free RAM available, it only seems to occur when the swap partition starts experiencing significant load.

Here is the output of "free" showing free RAM and swap; right now there is mild swap usage. This is typically where the system enters the danger zone and would freeze, although I've seen up to 12GB of swap in use without an issue.

Total RAM: 32GB
Total Swap: 24GB

Code:

              total        used        free      shared  buff/cache   available
Mem:           31Gi        26Gi       1.7Gi       1.4Gi       3.5Gi       3.5Gi
Swap:          22Gi       3.9Gi        18Gi

What Might be Causing it

I've had this machine for 5 years, but this behaviour started occurring within the past year since the following changes:

Upgraded the processor from Intel i5 to Intel Core i7 4790K
Upgraded my GPU from an Asus GTX 960 to an EVGA RTX 2070

I'm guessing that upgrading the CPU and GPU has encouraged you to run bigger and more complex processes than you did pre-upgrade. The trouble is that your system has now gone from CPU-bound to memory-bound (and since the CPU upgrade, you may be reaching this state more quickly than before). My take on swap is to have at least as much as you have physical RAM. I'd even double what you currently have allocated, especially if you have a second hard disk. (I had a fair amount of experience administering Oracle database systems, and they recommended 4X. I thought it was excessive, but we never had problems with memory/swap exhaustion. Heck, disk space is relatively cheap nowadays, so why not?) Then, especially if they're the same size, I'd change the priorities on the swap partitions to be the same

Code:

LABEL=swap0   swap   swap   pri=42   0 0
LABEL=swap1   swap   swap   pri=42   0 0

so that Linux uses them in a round-robin fashion and swap activity doesn't overwhelm a single disk.
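The kernel allocates swap pages round-robin only among areas that share the highest priority, so it is worth confirming the priorities actually match after editing fstab. A minimal sketch that reads /proc/swaps (columns Filename, Type, Size, Used, Priority); the function name is hypothetical, while `swapon -p` and `swapon --show` are standard util-linux options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether every active swap area has the same
# priority. Reads /proc/swaps (columns: Filename Type Size Used Priority),
# or a saved copy passed as $1.
swap_priorities_equal() {
  awk '
    NR == 1 { next }                  # skip the header line
    { prio[$5] = 1; n++ }             # record each distinct priority
    END {
      k = 0; for (p in prio) k++
      if (n == 0) { print "no swap"; exit 1 }
      if (k == 1) { print "equal"; exit 0 }
      print "different"; exit 1
    }' "${1:-/proc/swaps}"
}

# To apply a priority without rebooting (needs root; device is a placeholder):
#   swapoff /dev/sdb2 && swapon -p 42 /dev/sdb2
# `swapon --show` then lists NAME, TYPE, SIZE, USED and PRIO per area.
```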

If/when you're building a new system or doing a wholesale restructuring of your disks, consider placing swap partitions in the middle of the disks to reduce head seek time and squeeze every bit of performance you can out of the hardware. (This presumes that you're still using traditional hard disks. Solid state "drives"? I don't use them, but I can't see swap partition placement helping much in that case.)

Hope this helps a bit...

MadMartian · 09-20-2021, 08:33 PM

Update on this (at the risk of alerting the necro-posting police): I activated another swap partition on one of my SSDs, bringing the total to 34GB of swap storage, and I am seeing considerably better stability. With this setup, strange as it may be, I was recently using 16GB of swap, which had never happened before without a system freeze. I think this is significant, and there are two potential causes:
1. My total RAM is 32GB, and my swap was previously 22GB but is now a combined 34GB with the additional swap partition on one of my SSDs
2. I have configured both swap partitions with the fstab option discard=once

Presently I am using 12GB of swap, after deliberately putting additional memory load on the system that would typically cause my PC to freeze, but this time it did not freeze. Although I have an additional swap partition configured on an SSD, I notice that 0 bytes of it are used, because I have configured it as the lower-priority swap partition of the two (for now).
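That 0-byte reading is expected: the kernel fills higher-priority swap areas first and only falls back to a lower-priority area once the higher ones are full. A minimal sketch to list each area's priority and fill level from /proc/swaps (Size and Used are in KiB); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "NAME PRIORITY USED%" for each swap area,
# highest priority first, from /proc/swaps (or a saved copy passed as $1).
swap_usage_by_priority() {
  awk 'NR > 1 {                                   # skip the header line
         pct = ($3 > 0) ? $4 * 100 / $3 : 0       # Used / Size
         printf "%s %d %.1f%%\n", $1, $5, pct
       }' "${1:-/proc/swaps}" | sort -k2,2nr
}
```

Running it periodically shows whether the lower-priority SSD partition ever starts filling.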

I am almost completely certain my system would have frozen by now if not for this new swap configuration, so I felt it prudent to post this update.

In the case that swap size is the determining factor then should I submit a bug to kernel maintainers indicating that there may be a bug related to the ratio of total RAM to total swap storage? AFAIK configuring swap storage at least as much as the amount of RAM on a system was always a recommendation, but based on my experience here and the fact that it has taken me a year to get anywhere troubleshooting this issue I would argue it's prudent to upgrade this recommendation to a policy or mandate and perhaps even fail-fast and prevent a user from configuring swap storage less than total available RAM.
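To compare the two configurations in terms of that ratio, it can be read straight from /proc/meminfo. A minimal sketch with a hypothetical function name; going by the thread's own figures, the ratio was roughly 0.7 before (22Gi swap against 31Gi RAM in `free`) and a little over 1 after (34GB against 32GB).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print SwapTotal / MemTotal from /proc/meminfo
# (or a saved copy passed as $1), to two decimal places.
swap_to_ram_ratio() {
  awk '
    $1 == "MemTotal:"  { mem  = $2 }
    $1 == "SwapTotal:" { swap = $2 }
    END { if (mem > 0) printf "%.2f\n", swap / mem }' "${1:-/proc/meminfo}"
}
```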

In the case that discard=once is the determining factor, I don't understand enough about how this works to comment further. AFAIK the only other discard policy is pages, and specifying discard without a policy enables both.

jefro · 09-20-2021, 09:57 PM

I had to look that up. It seems some folks suggest looking at TRIM.

syg00 · 09-20-2021, 10:20 PM

Quote:

Originally Posted by MadMartian

... I would argue it's prudent to upgrade this recommendation to a policy or mandate and perhaps even fail-fast and prevent a user from configuring swap storage less than total available RAM.

The kernel devs will not countenance that. A single user having a glitch doesn't dictate policy. Plenty of users run without swap at all, and plenty more (me included) just allocate a nominal amount because it may be useful one day.

For the longest time the recommendation was for swap to be twice the installed RAM, but with big systems these days that just became ridiculous, so it's largely ignored.

pan64 · 09-21-2021, 12:15 AM

probably this helps to understand it better: https://chrisdown.name/2018/01/02/in...e-of-swap.html

Quote:

Originally Posted by MadMartian

In the case that swap size is the determining factor then should I submit a bug to kernel maintainers indicating that there may be a bug related to the ratio of total RAM to total swap storage? AFAIK configuring swap storage at least as much as the amount of RAM on a system was always a recommendation, but based on my experience here and the fact that it has taken me a year to get anywhere troubleshooting this issue I would argue it's prudent to upgrade this recommendation to a policy or mandate and perhaps even fail-fast and prevent a user from configuring swap storage less than total available RAM.

I have to say that is the wrong approach. You need to know how you want to use your system, and you need to know whether it has been configured properly for that. There is no way for a general installation to take each and every requirement into account.
On the other hand, every admin should (and should be able to) fine-tune the hosts they manage.

syg00 · 09-21-2021, 02:14 AM

Nice link.

ondoho · 09-23-2021, 12:49 AM

Quote:

Originally Posted by syg00

The kernel devs will not countenance that. A single user having a glitch doesn't dictate policy. Plenty of users run without swap at all, and plenty more (me included) just allocate a nominal amount because it may be useful one day.

For the longest time the recommendation was for swap to be twice the installed RAM, but with big systems these days that just became ridiculous, so it's largely ignored.

The kernel devs have nothing to do with the size of allocated swap space anyhow.

As to how the kernel handles swapping, these things are configurable. It's hard to believe vm.swappiness hasn't been mentioned yet.
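For reference, vm.swappiness (default 60) sets roughly how readily the kernel swaps out anonymous memory versus reclaiming page cache. A minimal sketch for reading it; the function name and the sysctl.d file name are hypothetical, and the value 10 in the comments is only an example, not a recommendation for this machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the current vm.swappiness value (the same value
# `sysctl vm.swappiness` reports), optionally from another path given as $1.
get_swappiness() {
  local v
  read -r v < "${1:-/proc/sys/vm/swappiness}" || return 1
  printf '%s\n' "$v"
}

# Changing it needs root:
#   sysctl -w vm.swappiness=10                                   # until next reboot
#   echo 'vm.swappiness = 10' > /etc/sysctl.d/99-swappiness.conf  # persistent
```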