[SOLVED] (Almost) Complete System Freeze upon Heavy Swap Usage
Again, please read www.linuxatemyram.com (there is also a link at the bottom, "how can I verify", for additional info) if you wish to understand how it works. Don't assume anything.
Duly noted, thanks for the background info on this.
Quote:
Originally Posted by syg00
Seems to me the OP has a pretty good handle on the overall theory.
What does /proc/meminfo show over time, say every 15 minutes or so?
I think I'll set up a cron job that parses this into a database table, which will let me plot some pretty graphs. I'll get back to this thread with results when I have them.
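For anyone wanting to try the same thing, here is a minimal sketch of the sampling step. It assumes the usual "Name: value kB" layout of /proc/meminfo and uses Python's built-in sqlite3 as a stand-in for whatever database you prefer; the table name meminfo_samples and its columns are made up for illustration.

```python
import sqlite3
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most values are in kiB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def record_sample(conn, fields, timestamp=None):
    """Store one row per field in a hypothetical meminfo_samples table."""
    ts = int(time.time()) if timestamp is None else timestamp
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meminfo_samples "
        "(ts INTEGER, field TEXT, value INTEGER)"
    )
    conn.executemany(
        "INSERT INTO meminfo_samples (ts, field, value) VALUES (?, ?, ?)",
        [(ts, name, value) for name, value in fields.items()],
    )
    conn.commit()
```

A script run from cron would read /proc/meminfo, pass the text to parse_meminfo, and hand the result to record_sample with an open connection. Storing one row per field keeps the table schema independent of which fields a given kernel exposes.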
Quote:
Originally Posted by syg00
I skimmed over this earlier.
Java is awful, people that code in it are coerced into lazy habits. Bad combination.
If you can get rid of those tasks altogether, the buy-back may be better than you hope.
I bet this is because the JVM manages memory on its own and does not leave block management up to the kernel; garbage collection cuts both ways. Ironically, I found I get better performance from my Minikube by limiting its memory consumption.
I can see how a developer could easily max out RAM. I believe that.
One might be able to configure RAM in ways that help with how it is used. The ways one can tune it are almost unlimited.
Some situations do not do well with going past 75% swap no matter how fast the swap device is or how much of it there is.
You can't rely on swap to get you out of limited RAM.
Yes, if the above members are correct, you could have some leaks or other memory problems to locate.
I ran into a couple more freezes, and this time I had periodic memory data (dumping /proc/meminfo to a Postgres database table every 5 minutes). However, there doesn't appear to be anything out of the ordinary. I've plotted the memory information below right up until the freeze occurred. Another time, a couple of weeks later, it froze while attempting to resume a Windows VM from suspend; that may or may not be memory related, however.
Just leave it as is and delete the image tags, then it'll show up as a thumb nail.
With many other kinds of images I'd normally disagree. But especially on a screenshot page? Just pinch your fingers or learn how to use your screen and/or software, Ondoho!
Quote:
Originally Posted by MadMartian
Overview & Symptoms
My system freezes almost completely whenever my system runs out of RAM and starts hitting the swap partition heavily. Everything freezes including the mouse and keyboard with a few exceptions:
The hard drive light appears to indicate some background activity
The fan sometimes spins up and down indicating some CPU activity
"nmap -sT" (TCP handshake) from another machine reveals open ports indicating that the NIC is responding at the OSI transport layer
Nothing is logged indicating what causes this.
On one rare occasion I remember the mouse was able to move a bit after about a minute or two of the system being frozen. This issue does not appear to occur whenever there is plenty of free RAM available, it only seems to occur when the swap partition starts experiencing significant load.
Here is the output of "free" that indicates free RAM and swap storage, right now there is mild swap usage. This is typically entering the danger zone where the system would freeze, although I've witnessed up to 12MB of swap used without an issue.
Total RAM: 32GB Total Swap: 24GB
Code:
               total        used        free      shared  buff/cache   available
Mem:            31Gi        26Gi       1.7Gi       1.4Gi       3.5Gi       3.5Gi
Swap:           22Gi       3.9Gi        18Gi
What Might be Causing it
I've had this machine for 5 years, but this behaviour started occurring within the past year since the following changes:
Upgraded the processor from Intel i5 to Intel Core i7 4790K
Upgraded my GPU from an Asus 960 GTX to an EVGA 2070 RTX
I'm guessing that updating the CPU and GPU has encouraged you to run bigger and more complex processes than you might have pre-upgrade. The trouble is that now your system has gone from CPU-bound to memory-bound (and since the CPU upgrade, you may be reaching this state more quickly than before). My take on swap is to have at least as much as you have physical RAM. I'd even double what you currently have allocated, especially if you have a second hard disk. (I had a fair amount of experience administering Oracle database systems and they recommended 4X. I thought it was excessive but we never had problems with memory/swap exhaustion. Heck, disk space is relatively cheap nowadays so why not?) Then, especially if they're the same size, I'd change the priorities on the swap partitions to be the same so that Linux uses them in a round-robin fashion and swap activity doesn't overwhelm a single disk.
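To check whether the active swap areas really do share one priority, something along these lines could work. It is only a sketch: it assumes the five-column layout of /proc/swaps (Filename, Type, Size, Used, Priority), and the function names are my own.

```python
def parse_swaps(text):
    """Parse /proc/swaps-style text into a list of dicts, one per swap area."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or unexpected lines
        name, kind, size, used, priority = parts
        entries.append(
            {
                "name": name,
                "type": kind,
                "size_kib": int(size),
                "used_kib": int(used),
                "priority": int(priority),
            }
        )
    return entries


def all_same_priority(entries):
    """True when every listed swap area has the same priority value."""
    return len({entry["priority"] for entry in entries}) <= 1
```

Feeding it the contents of /proc/swaps (or comparing against `swapon --show`) tells you whether the kernel will spread pages across the areas or fill the higher-priority one first. Priorities themselves are set with the `pri=` option in fstab or `swapon -p`.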
If/when you're building a new system or doing a wholesale structuring of your disks, consider placing swap partitions in the middle of the disks to reduce head seek time and squeeze every bit of performance you can out of the hardware. (This presumes that you're still using traditional hard disks. Solid state "drives"? Don't use them but I can't see swap partition placement helping much in that case.)
Update on this (at the risk of alerting the necro-posting police), I activated another swap partition on one of my SSD drives amounting to a total of 34GB of swap storage and I am experiencing considerably better stability. With this setup, and while it is arguably strange, I was utilizing 16GB of swap recently, which has never happened before without a system freeze. Now I think this is significant because of two potential causes:
1. My total RAM is 32GB and my swap was previously 22GB but now a combined 34GB with an additional swap partition on one of my SSDs
2. I have configured both swap partitions with the fstab option discard=once
Presently I am using 12GB of swap, and that was after I deliberately put additional memory load on the system that would typically cause my PC to freeze, but this time it did not freeze. Although I have an additional swap partition configured on an SSD, I notice that 0 bytes of it are utilized right now, as I have configured it as the lower-priority swap partition of the two (for now).
I am almost completely certain my system would have frozen by now if not for this new swap configuration, so I felt it prudent to post this update.
In the case that swap size is the determining factor then should I submit a bug to kernel maintainers indicating that there may be a bug related to the ratio of total RAM to total swap storage? AFAIK configuring swap storage at least as much as the amount of RAM on a system was always a recommendation, but based on my experience here and the fact that it has taken me a year to get anywhere troubleshooting this issue I would argue it's prudent to upgrade this recommendation to a policy or mandate and perhaps even fail-fast and prevent a user from configuring swap storage less than total available RAM.
In the case that discard=once is the determining factor, then I don't understand enough about how this works to comment further. AFAIK the other two choices are discard=pages (asynchronous discard of freed pages) and both, with both being what you get when discard is given without a policy.
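To see at a glance which discard and priority options each swap entry carries, a small fstab reader like the one below might help. This is a sketch under the assumption that swap lines follow the usual six-field fstab layout with "swap" in the third field; the function name and the sample UUIDs are hypothetical.

```python
def swap_entries(fstab_text):
    """Return a list of (device, options dict) for swap lines in fstab-style text.

    Options without a value (e.g. "sw") map to None.
    """
    result = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        fields = line.split()
        if len(fields) < 4 or fields[2] != "swap":
            continue  # not a swap entry
        options = {}
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            options[key] = value or None
        result.append((fields[0], options))
    return result


# Hypothetical example input; the UUIDs are placeholders.
SAMPLE_FSTAB = """\
# swap areas
UUID=aaaa none swap sw,discard=once,pri=10 0 0
UUID=bbbb none swap sw,discard=once,pri=5  0 0
UUID=cccc /    ext4 defaults 0 1
"""
```

Calling swap_entries(SAMPLE_FSTAB) yields the two swap lines with their discard and pri settings, which makes it easy to compare what is configured against what /proc/swaps reports.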
... I would argue it's prudent to upgrade this recommendation to a policy or mandate and perhaps even fail-fast and prevent a user from configuring swap storage less than total available RAM.
The kernel devs will not countenance that. A single user having a glitch doesn't dictate policy. Plenty of users run without swap at all, and plenty more (me included) just allocate a nominal amount because it may be useful one day.
For the longest time the recommendation was for swap to be twice the installed RAM, but with big systems these days that just became ridiculous, so it's largely ignored.
In the case that swap size is the determining factor then should I submit a bug to kernel maintainers indicating that there may be a bug related to the ratio of total RAM to total swap storage? AFAIK configuring swap storage at least as much as the amount of RAM on a system was always a recommendation, but based on my experience here and the fact that it has taken me a year to get anywhere troubleshooting this issue I would argue it's prudent to upgrade this recommendation to a policy or mandate and perhaps even fail-fast and prevent a user from configuring swap storage less than total available RAM.
I have to say that is the wrong approach. You need to know how [you want] to use your system and you need to know whether it was configured properly. There is no way [using a general installation] to take each and every requirement into account.
On the other hand, every admin should (and should be able to) fine-tune his/her managed hosts.
The kernel devs will not countenance that. A single user having a glitch doesn't dictate policy. Plenty of users run without swap at all, and plenty more (me included) just allocate a nominal amount because it may be useful one day.
For the longest time the recommendation was for swap to be twice the installed RAM, but with big systems these days that just became ridiculous, so it's largely ignored.
The kernel devs have nothing to do with the size of allocated swap space anyhow.
As to how the kernel handles swapping, these things are configurable. It's hard to believe vm.swappiness hasn't been mentioned yet.
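For reference, the current value can be read from /proc/sys/vm/swappiness. The helper below is a minimal sketch; the function names are my own, and it only reads the setting, it does not change it.

```python
from pathlib import Path


def parse_swappiness(text):
    """Convert the contents of the swappiness file to an integer."""
    return int(text.strip())


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Read the current vm.swappiness value from procfs."""
    return parse_swappiness(Path(path).read_text())
```

Lower values make the kernel less inclined to swap out anonymous memory; the value can be changed at runtime with `sysctl vm.swappiness=<value>` as root, and whether that helps with the freezes described here is something you would have to test.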