[SOLVED] (Almost) Complete System Freeze upon Heavy Swap Usage
Overview & Symptoms
My system freezes almost completely whenever it runs out of RAM and starts hitting the swap partition heavily. Everything freezes, including the mouse and keyboard, with a few exceptions:
The hard drive light appears to indicate some background activity
The fan sometimes spins up and down indicating some CPU activity
"nmap -sT" (TCP handshake) from another machine reveals open ports indicating that the NIC is responding at the OSI transport layer
Nothing is logged indicating what causes this.
On one rare occasion, I remember the mouse being able to move a bit after the system had been frozen for a minute or two. This issue does not appear to occur when there is plenty of free RAM available; it only seems to occur when the swap partition comes under significant load.
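The "nmap -sT" observation in the list above can be approximated from a second machine without nmap. The following is only a minimal sketch, not the check that was actually run: the function name tcp_port_open is made up for illustration, and the host and port are whatever the frozen machine normally exposes. It attempts a full TCP handshake and reports whether it completed.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a full TCP connection to host:port can be established."""
    try:
        # create_connection performs the three-way handshake, like a connect scan.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, and similar errors all count as "not open".
        return False
```

A successful handshake only shows that the kernel's network stack is still answering; it says nothing about whether user-space processes are making progress.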
Here is the output of "free", which shows free RAM and swap space. Right now there is mild swap usage. This is typically where the system enters the danger zone and would freeze, although I've witnessed up to 12MB of swap used without an issue.
Total RAM: 32GB Total Swap: 24GB
Code:
              total        used        free      shared  buff/cache   available
Mem:           31Gi        26Gi       1.7Gi       1.4Gi       3.5Gi       3.5Gi
Swap:          22Gi       3.9Gi        18Gi
What Might be Causing it
I've had this machine for 5 years, but this behaviour started occurring within the past year since the following changes:
Upgraded the processor from an Intel i5 to an Intel Core i7 4790K
Upgraded my GPU from an Asus GTX 960 to an EVGA RTX 2070
This behavior is fairly consistent to reproduce. I wrote a script that spins up background Python processes that send requests until the system runs out of memory, and I was able to reproduce the system freeze twice in a row doing this.
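The reproduction script itself was not posted. As a rough stand-in, here is a bounded sketch of the same idea, with assumptions stated up front: it does not send requests, it simply starts child Python processes that each allocate and touch a fixed amount of memory. The names CHILD and apply_pressure are hypothetical, and the sizes are deliberately left to the caller so the pressure can be raised gradually.

```python
import subprocess
import sys

# Child program: allocate N MiB, touch every page so it is really backed, hold, exit.
CHILD = """
import sys, time
size = int(sys.argv[1]) * 1024 * 1024
block = bytearray(size)
for offset in range(0, size, 4096):  # one write per 4 KiB page
    block[offset] = 1
time.sleep(float(sys.argv[2]))
"""


def apply_pressure(workers: int, mib_each: int, hold_seconds: float) -> int:
    """Run `workers` children in parallel and return how many exited cleanly."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD, str(mib_each), str(hold_seconds)]
        )
        for _ in range(workers)
    ]
    # wait() returns each child's exit code; 0 means it allocated and finished.
    return sum(1 for proc in procs if proc.wait() == 0)
```

Something like apply_pressure(4, 512, 30) while watching "free" in another terminal is a gentle starting point; pushing the total past free RAM is what would be expected to trigger the heavy swapping described here, so save any open work first.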
Troubleshooting & Mitigation
This old thread almost exactly mirrors my issue, and I have done the following in an attempt to mitigate it, without any success:
Adjusted the RAM timing to lower the voltage
Replaced all DIMMs with modules rated at 1600MHz and 1.5V (no overclocking)
Updated the BIOS firmware
Other things I have tried:
S.M.A.R.T. long and short tests of the swap partition
fsck scan of the swap partition
System Details
Kernel: Linux 5.4.0-73-generic #82-Ubuntu SMP / x86_64
Disks and Partitions:
Code:
NAME                        MAJ:MIN RM SIZE   RO TYPE   MOUNTPOINT
sda                         8:0     0  238.5G 0  disk
├─sda1                      8:1     0  953M   0  part   /boot/efi
├─sda2                      8:2     0  28G    0  part   /
└─sda3                      8:3     0  209.6G 0  part   /usr
sdb                         8:16    0  1.8T   0  disk
├─sdb1                      8:17    0  22.4G  0  part
├─sdb2                      8:18    0  144.4G 0  part
├─sdb4                      8:20    0  9.3G   0  part
├─sdb5                      8:21    0  1.7T   0  part
└─isw_dhciiffhhj_Groovy     253:0   0  1.8T   0  dmraid
  ├─isw_dhciiffhhj_Groovy1  253:1   0  22.4G  0  part   [SWAP]
  ├─isw_dhciiffhhj_Groovy2  253:2   0  144.4G 0  part   /var
  ├─isw_dhciiffhhj_Groovy4  253:3   0  9.3G   0  part   /srv
  └─isw_dhciiffhhj_Groovy5  253:4   0  1.7T   0  part   /home
sdc                         8:32    0  1.8T   0  disk
├─sdc1                      8:33    0  22.4G  0  part
├─sdc2                      8:34    0  144.4G 0  part
├─sdc4                      8:36    0  9.3G   0  part
├─sdc5                      8:37    0  1.7T   0  part
└─isw_dhciiffhhj_Groovy     253:0   0  1.8T   0  dmraid
  ├─isw_dhciiffhhj_Groovy1  253:1   0  22.4G  0  part   [SWAP]
  ├─isw_dhciiffhhj_Groovy2  253:2   0  144.4G 0  part   /var
  ├─isw_dhciiffhhj_Groovy4  253:3   0  9.3G   0  part   /srv
  └─isw_dhciiffhhj_Groovy5  253:4   0  1.7T   0  part   /home
sdd                         8:48    0  465.8G 0  disk   /opt
System:
Code:
H/W path         Device      Class         Description
=========================================================
                             system        All Series (All)
/0                           bus           Z97-PRO GAMER
/0/0                         memory        64KiB BIOS
/0/45                        memory        32GiB System Memory
/0/45/0                      memory        8GiB DIMM DDR3 Synchronous 1333 MHz (0.8 ns)
/0/45/1                      memory        8GiB DIMM DDR3 Synchronous 1333 MHz (0.8 ns)
/0/45/2                      memory        8GiB DIMM DDR3 Synchronous 1333 MHz (0.8 ns)
/0/45/3                      memory        8GiB DIMM DDR3 Synchronous 1333 MHz (0.8 ns)
/0/54                        processor     Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
/0/54/55                     memory        256KiB L1 cache
/0/54/56                     memory        1MiB L2 cache
/0/54/57                     memory        8MiB L3 cache
/0/100                       bridge        4th Gen Core Processor DRAM Controller
/0/100/1                     bridge        Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller
/0/100/1.1                   bridge        Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller
/0/100/1.1/0                 display       TU104 [GeForce RTX 2070 SUPER]
/0/100/1.1/0.1               multimedia    TU104 HD Audio Controller
/0/100/1.1/0.2               bus           TU104 USB 3.1 Host Controller
/0/100/1.1/0.2/0 usb5        bus           xHCI Host Controller
/0/100/1.1/0.2/1 usb6        bus           xHCI Host Controller
/0/100/1.1/0.3               bus           TU104 USB Type-C UCSI Controller
/0/100/14                    bus           9 Series Chipset Family USB xHCI Controller
/0/100/14/0      usb3        bus           xHCI Host Controller
/0/100/14/0/4                input         Back-UPS NS 1350M2 FW:954.e3 .D USB FW:e3
/0/100/14/0/9                input         Gaming Mouse G502
/0/100/14/0/a                input         Corsair K70 RGB Gaming Keyboard
/0/100/14/0/d                multimedia    Blue Microphones
/0/100/14/0/e                bus           USB2.0 Hub
/0/100/14/0/e/2              multimedia    Logitech Wireless Headset
/0/100/14/0/e/4              multimedia    C922 Pro Stream Webcam
/0/100/14/1      usb4        bus           xHCI Host Controller
/0/100/16                    communication 9 Series Chipset Family ME Interface #1
/0/100/19        eno1        network       Ethernet Connection (2) I218-V
/0/100/1a                    bus           9 Series Chipset Family USB EHCI Controller #2
/0/100/1a/1      usb1        bus           EHCI Host Controller
/0/100/1a/1/1                bus           USB hub
/0/100/1b                    multimedia    9 Series Chipset Family HD Audio Controller
/0/100/1c                    bridge        9 Series Chipset Family PCI Express Root Port 1
/0/100/1c.3                  bridge        82801 PCI Bridge
/0/100/1c.3/0                bridge        ASM1083/1085 PCIe to PCI Bridge
/0/100/1d                    bus           9 Series Chipset Family USB EHCI Controller #1
/0/100/1d/1      usb2        bus           EHCI Host Controller
/0/100/1d/1/1                bus           USB hub
/0/100/1f                    bridge        Z97 Chipset LPC Controller
/0/100/1f.2                  storage       9 Series Chipset Family SATA Controller [AHCI Mode]
/0/100/1f.3                  bus           9 Series Chipset Family SMBus Controller
/0/1                         system        PnP device PNP0c01
/0/2                         system        PnP device PNP0c02
/0/3                         system        PnP device PNP0b00
/0/4                         generic       PnP device INT3f0d
/0/5                         system        PnP device PNP0c02
/0/6                         system        PnP device PNP0c02
/0/7                         communication PnP device PNP0501
/0/8                         system        PnP device PNP0c02
/0/9             scsi0       storage
/0/9/0.0.0       /dev/sda    disk          256GB Samsung SSD 850
/0/9/0.0.0/1     /dev/sda1   volume        952MiB Windows FAT volume
/0/9/0.0.0/2     /dev/sda2   volume        27GiB EFI partition
/0/9/0.0.0/3     /dev/sda3   volume        209GiB EFI partition
/0/a             scsi2       storage
/0/a/0.0.0       /dev/sdb    disk          2TB ST2000DM001-1ER1
/0/a/0.0.0/1                 volume        22GiB Linux swap volume
/0/a/0.0.0/2                 volume        144GiB EXT4 volume
/0/a/0.0.0/4                 volume        9537MiB EFI partition
/0/a/0.0.0/5                 volume        1686GiB EXT4 volume
/0/b             scsi3       storage
/0/b/0.0.0       /dev/sdc    disk          2TB ST2000DM001-1ER1
/0/b/0.0.0/1                 volume        22GiB Linux swap volume
/0/b/0.0.0/2                 volume        144GiB EXT4 volume
/0/b/0.0.0/4                 volume        9537MiB EFI partition
/0/b/0.0.0/5                 volume        1686GiB EXT4 volume
/0/c             scsi4       storage
/0/c/0.0.0       /dev/sdd    volume        465GiB Samsung SSD 860
/1                           power         To Be Filled By O.E.M.
/2               vethc2afe35 network       Ethernet interface
This experience has left me feeling demoralized and deflated; it occurs often enough to significantly impact my productivity. I am tempted to replace the entire system top to bottom, but I suspect this issue would follow me to the new system too.
Last edited by MadMartian; 06-06-2021 at 05:22 PM.
Reason: Added memory and disk information / specs
Quote:
My system freezes almost completely whenever my system runs out of RAM and starts hitting the swap partition heavily.
That is more or less normal. You need to add more RAM or check why it is in use (more swap may also help a bit).
This may help you go further: www.linuxatemyram.com
It's an achievement to run out of ram and swap on any properly built system.
The only time I managed it was compiling about 50 libraries statically on an under-resourced system into this massive verilog program, which incidentally was a total waste of time. I rebooted, closed some stuff, and the thing went together fine.
Increase ram. Add a swap file somewhere, and it will complement and add to existing swap facilities. And stop overloading your system.
Upgrading CPU & GPU has nothing to do with RAM.
Your system reports 32GB of RAM, am I seeing this correctly?
WHAT ARE YOU DOING TO REACH THE LIMIT ON THAT?
Not entirely true - all hardware needs driver support; that includes all the chips on a motherboard - CPU as well as support chips. It's not unknown for drivers - either in-kernel or out-of-tree - to go awry and eat up memory. If they do so, the memory is not available to user-space, and may cause shortages. The TCP stack has been known to do this for example.
Quote:
Your system reports 32GB of RAM, am I seeing this correctly?
WHAT ARE YOU DOING TO REACH THE LIMIT ON THAT?
Yep, this you need to know. There is the small matter of 22G of swap in addition to the RAM. Normally swap is only used for evicting anonymous memory, but may be a side-effect in (very) rare cases. I'd be inclined to keep an eye on meminfo over time and see if anything increases unexpectedly. Then look at all the userspace processes to see if you have an obvious suspect.
I'm kinda surprised you aren't hearing from OOM-killer.
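The suggestion above to watch meminfo over time could be sketched roughly as follows. This is an illustration under assumptions, not a tool anyone in the thread used: the helper names parse_meminfo and growth are invented here, and the idea is simply to take two snapshots of /proc/meminfo some time apart and list the fields that grew by more than a chosen threshold.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are bare counts.
    """
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def growth(before: dict, after: dict, threshold: int) -> dict:
    """Return the fields whose value rose by at least `threshold` between snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] - before[name] >= threshold
    }
```

In practice one would read /proc/meminfo with open("/proc/meminfo").read(), sleep for a few minutes, read it again, and pass both parsed snapshots to growth. Fields such as Slab or SUnreclaim climbing steadily would point toward the kernel side rather than user-space processes.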
I tried to reply to this thread a few days ago but the thread went missing, not sure what happened, but I'll try again now...
I am a developer, so I find it pretty easy to max out 32G of RAM if I really want to. While this problem is an annoyance, it is also interesting to me. I assumed that a process could potentially be swapped entirely to disk, analogous to "per-process hibernation," but what I've found is that when main RAM is constrained none of my processes get swapped 100% to disk; in fact, most of them only get about 15% swapped to disk at best. This would definitely be a good reason to shut down unused apps and processes rather than permit them to linger in the background.
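One way to put a number on "about 15% swapped" is to compare the VmSwap and VmRSS fields that Linux exposes in /proc/<pid>/status. The sketch below is an assumption about how such a figure could be computed, not the method used in this post (which relied on smem); the function names swap_fraction and report are made up for illustration.

```python
from pathlib import Path
from typing import Optional


def swap_fraction(status_text: str) -> Optional[float]:
    """Return VmSwap / (VmRSS + VmSwap) from the text of /proc/<pid>/status.

    Returns None when the fields are missing (e.g. kernel threads) or both are zero.
    """
    values = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmRSS", "VmSwap"):
            values[key] = int(rest.split()[0])  # value is reported in kB
    if len(values) < 2:
        return None
    total = values["VmRSS"] + values["VmSwap"]
    return values["VmSwap"] / total if total else None


def report() -> None:
    """Print the swapped share of every process that has any pages in swap."""
    for status in Path("/proc").glob("[0-9]*/status"):
        try:
            fraction = swap_fraction(status.read_text())
        except OSError:
            continue  # the process exited while we were scanning
        if fraction:
            print(f"{status.parent.name:>8}  {fraction:6.1%}")
```

Note that this ratio ignores file-backed pages that were simply dropped rather than swapped, so it is only a rough indicator of how much of a process has left RAM.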
Quote:
Originally Posted by igadoter
How old is your hard drive?
It's about 5 years old, RAID 1, with weekly long and short S.M.A.R.T. tests, all green AFAIK. There were some offline uncorrectable errors for a while, but they went away. However, it was suggested that I run mkswap -c ... to re-create the swap partition and check it for bad sectors. I have replacement drives on standby.
Quote:
Originally Posted by frankbell
I think it might help to know what processes are using how much RAM.
I've analyzed the processes using smem and found that the biggest offenders are:
IntelliJ IDEA (~5G)
plasmashell (~1.5G, tends to leak memory, I think it's a known issue without a resolution)
kwin_x11 (~0.8G)
Several background Java processes for development (sums up to about ~3.5G)
Firefox (~0.5G but has various sub processes that sum up to about ~2G)
Waterfox (a fork of Firefox, similar RAM usage, I run these separate instances for the purpose of browsing segmentation)
clementine
LBRY
Then the remaining processes consuming less than 0.5G of RAM but still worth mention are:
Discord
ipfs
Then there are about a hundred more processes that don't really add up to much more than 2G.
Quote:
Originally Posted by syg00
I'm kinda surprised you aren't hearing from OOM-killer.
That's what I thought too, but it's disabled on this system. I'm tempted to turn it on if it might mean I get to recover from a complete system freeze; I just have to make sure the worst culprits have terrible OOM-kill scores (looking at you, IntelliJ, Java, and Firefox!).
Last edited by MadMartian; 06-09-2021 at 03:55 PM.
Reason: Reply to OOM-kill question
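Giving the worst culprits "terrible OOM-kill scores" is usually done through /proc/<pid>/oom_score_adj, which accepts values from -1000 (never kill) to 1000 (kill first). The following is a minimal sketch under assumptions: the helper names clamp_adj and set_oom_score_adj are invented here, and how the PIDs of IntelliJ, Java, or Firefox are found is left out entirely.

```python
from pathlib import Path


def clamp_adj(value: int) -> int:
    """Clamp a requested adjustment to the range the kernel accepts."""
    return max(-1000, min(1000, value))


def set_oom_score_adj(pid: int, value: int) -> None:
    """Write an OOM score adjustment for `pid`.

    Raising the value for your own processes normally works unprivileged;
    lowering it generally requires root (CAP_SYS_RESOURCE).
    """
    Path(f"/proc/{pid}/oom_score_adj").write_text(str(clamp_adj(value)))
```

For example, set_oom_score_adj(pid, 800) on a browser process would make it a preferred victim. The setting is inherited by child processes but is lost when the application restarts, so it would have to be reapplied from a launcher or wrapper.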
Quote:
I am a developer so I find it pretty easy to max-out 32G of RAM if I really want to. While this problem is an annoyance it is also interesting to me. I assumed that a process can potentially be swapped entirely to disk, analogous to "per-process hibernation," but what I've found is that when main RAM is constrained none of my processes get swapped 100% to disk, in fact most of them only get about 15% swapped to disk at best. This would definitely be good reason to shut down unused apps and processes rather than permit them to linger in the background.
Again, please read www.linuxatemyram.com (there is also a link at the bottom, "how can I verify", for additional info) if you wish to understand how it works. Don't assume anything.
Quote:
... but what I've found is that when main RAM is constrained none of my processes get swapped 100% to disk, in fact most of them only get about 15% swapped to disk at best. This would definitely be good reason to shut down unused apps and processes rather than permit them to linger in the background.
I skimmed over this earlier.
If they are not being forced out to swap, they are not unused background tasks. Only long-term unreferenced anonymous pages are candidates for swap-out. Java is awful; people who code in it are coerced into lazy habits. Bad combination.
If you can get rid of those tasks altogether, the buy-back may be better than you hope.
The idea about /proc/meminfo is good. Back in earlier days, there was a kernel memory leak which gradually ground the system to a halt. It can happen with a process too. Running out of memory with 32G sounds like Windows. I've only 6G here, and rarely use swap. I lost hibernation at one point because the 3-month-old hibernation image was dodgy, so hibernation didn't happen.