LinuxQuestions.org
Old 06-04-2021, 12:34 PM   #1
MadMartian
LQ Newbie
 
Registered: Jun 2021
Distribution: N00buntu
Posts: 11

Rep: Reputation: Disabled
(Almost) Complete System Freeze upon Heavy Swap Usage


Overview & Symptoms

My system freezes almost completely whenever it runs out of RAM and starts hitting the swap partition heavily. Everything freezes, including the mouse and keyboard, with a few exceptions:
  • The hard drive light appears to indicate some background activity
  • The fan sometimes spins up and down indicating some CPU activity
  • "nmap -sT" (TCP handshake) from another machine reveals open ports indicating that the NIC is responding at the OSI transport layer

Nothing is logged indicating what causes this.

On one rare occasion I remember the mouse was able to move a bit after the system had been frozen for a minute or two. This issue does not appear to occur when there is plenty of free RAM available; it only seems to occur when the swap partition starts experiencing significant load.

Here is the output of "free", which shows free RAM and swap usage. Right now there is mild swap usage. This is typically the start of the danger zone where the system would freeze, although I've witnessed up to 12MB of swap used without an issue.

Total RAM: 32GB
Total Swap: 24GB

Code:
              total        used        free      shared  buff/cache   available
Mem:           31Gi        26Gi       1.7Gi       1.4Gi       3.5Gi       3.5Gi
Swap:          22Gi       3.9Gi        18Gi
What Might be Causing it

I've had this machine for 5 years, but this behaviour started occurring within the past year since the following changes:
  • Upgraded the processor from Intel i5 to Intel Core i7 4790K
  • Upgraded my GPU from an Asus 960 GTX to an EVGA 2070 RTX

Reproducing this behavior is fairly consistent. I wrote a script that spins up background Python processes that send requests until the system runs out of memory, and I was able to reproduce the system freeze twice in a row this way.
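
The original reproduction script is not shown in the post, so the following is only a minimal sketch of the same idea under different assumptions: instead of sending requests, each background process directly allocates and holds a fixed amount of memory. The names `hold_memory` and `spawn_workers`, the chunk size, and every number used here are hypothetical. The amounts are capped on purpose so that the pressure can be raised gradually rather than driving the machine straight into a freeze.

```python
import multiprocessing as mp
import time

CHUNK_MB = 64  # allocation step in MiB (assumed value, not from the post)


def hold_memory(target_mb: int, hold_seconds: float) -> int:
    """Allocate about target_mb MiB, hold it for hold_seconds, return MiB held."""
    chunks = []
    held = 0
    while held < target_mb:
        step = min(CHUNK_MB, target_mb - held)
        # Fill with a non-zero byte so the pages are really written,
        # not just reserved in the address space.
        chunks.append(bytearray(b"\x01") * (step * 1024 * 1024))
        held += step
    time.sleep(hold_seconds)
    return held


def spawn_workers(n_workers: int, mb_per_worker: int, hold_seconds: float):
    """Start n_workers background processes that each hold mb_per_worker MiB."""
    procs = [
        mp.Process(target=hold_memory, args=(mb_per_worker, hold_seconds))
        for _ in range(n_workers)
    ]
    for proc in procs:
        proc.start()
    return procs
```

A cautious run might start with something like `spawn_workers(2, 64, 5.0)`, join the returned processes, and then increase the worker count or size step by step while watching "free" in another terminal.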

Troubleshooting & Mitigation

This old thread almost exactly mirrors my issue, and I have done the following in an attempt to mitigate it, without any success:
  1. Adjusted the RAM timing to lower the voltage
  2. Replaced all DIMMs with 1600MHz frequency and 1.5V spec (no overclocking)
  3. Updated the BIOS firmware

Other things I have tried:
  • S.M.A.R.T. long and short tests of the swap partition
  • fsck scan of the swap partition

System Details

Kernel: Linux 5.4.0-73-generic #82-Ubuntu SMP / x86_64
Disks and Partitions:
Code:
NAME                       MAJ:MIN RM   SIZE RO TYPE   MOUNTPOINT
sda                          8:0    0 238.5G  0 disk   
├─sda1                       8:1    0   953M  0 part   /boot/efi
├─sda2                       8:2    0    28G  0 part   /
└─sda3                       8:3    0 209.6G  0 part   /usr
sdb                          8:16   0   1.8T  0 disk   
├─sdb1                       8:17   0  22.4G  0 part   
├─sdb2                       8:18   0 144.4G  0 part   
├─sdb4                       8:20   0   9.3G  0 part   
├─sdb5                       8:21   0   1.7T  0 part   
└─isw_dhciiffhhj_Groovy    253:0    0   1.8T  0 dmraid 
  ├─isw_dhciiffhhj_Groovy1 253:1    0  22.4G  0 part   [SWAP]
  ├─isw_dhciiffhhj_Groovy2 253:2    0 144.4G  0 part   /var
  ├─isw_dhciiffhhj_Groovy4 253:3    0   9.3G  0 part   /srv
  └─isw_dhciiffhhj_Groovy5 253:4    0   1.7T  0 part   /home
sdc                          8:32   0   1.8T  0 disk   
├─sdc1                       8:33   0  22.4G  0 part   
├─sdc2                       8:34   0 144.4G  0 part   
├─sdc4                       8:36   0   9.3G  0 part   
├─sdc5                       8:37   0   1.7T  0 part   
└─isw_dhciiffhhj_Groovy    253:0    0   1.8T  0 dmraid 
  ├─isw_dhciiffhhj_Groovy1 253:1    0  22.4G  0 part   [SWAP]
  ├─isw_dhciiffhhj_Groovy2 253:2    0 144.4G  0 part   /var
  ├─isw_dhciiffhhj_Groovy4 253:3    0   9.3G  0 part   /srv
  └─isw_dhciiffhhj_Groovy5 253:4    0   1.7T  0 part   /home
sdd                          8:48   0 465.8G  0 disk   /opt
System:
Code:
H/W path          Device       Class          Description
=========================================================
                               system         All Series (All)
/0                             bus            Z97-PRO GAMER
/0/0                           memory         64KiB BIOS
/0/45                          memory         32GiB System Memory
/0/45/0                        memory         8GiB DIMM DDR3 Synchronous 1333 MHz (0.8 ns)
/0/45/1                        memory         8GiB DIMM DDR3 Synchronous 1333 MHz (0.8 ns)
/0/45/2                        memory         8GiB DIMM DDR3 Synchronous 1333 MHz (0.8 ns)
/0/45/3                        memory         8GiB DIMM DDR3 Synchronous 1333 MHz (0.8 ns)
/0/54                          processor      Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
/0/54/55                       memory         256KiB L1 cache
/0/54/56                       memory         1MiB L2 cache
/0/54/57                       memory         8MiB L3 cache
/0/100                         bridge         4th Gen Core Processor DRAM Controller
/0/100/1                       bridge         Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller
/0/100/1.1                     bridge         Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller
/0/100/1.1/0                   display        TU104 [GeForce RTX 2070 SUPER]
/0/100/1.1/0.1                 multimedia     TU104 HD Audio Controller
/0/100/1.1/0.2                 bus            TU104 USB 3.1 Host Controller
/0/100/1.1/0.2/0  usb5         bus            xHCI Host Controller
/0/100/1.1/0.2/1  usb6         bus            xHCI Host Controller
/0/100/1.1/0.3                 bus            TU104 USB Type-C UCSI Controller
/0/100/14                      bus            9 Series Chipset Family USB xHCI Controller
/0/100/14/0       usb3         bus            xHCI Host Controller
/0/100/14/0/4                  input          Back-UPS NS 1350M2 FW:954.e3 .D USB FW:e3
/0/100/14/0/9                  input          Gaming Mouse G502
/0/100/14/0/a                  input          Corsair K70 RGB Gaming Keyboard
/0/100/14/0/d                  multimedia     Blue Microphones
/0/100/14/0/e                  bus            USB2.0 Hub
/0/100/14/0/e/2                multimedia     Logitech Wireless Headset
/0/100/14/0/e/4                multimedia     C922 Pro Stream Webcam
/0/100/14/1       usb4         bus            xHCI Host Controller
/0/100/16                      communication  9 Series Chipset Family ME Interface #1
/0/100/19         eno1         network        Ethernet Connection (2) I218-V
/0/100/1a                      bus            9 Series Chipset Family USB EHCI Controller #2
/0/100/1a/1       usb1         bus            EHCI Host Controller
/0/100/1a/1/1                  bus            USB hub
/0/100/1b                      multimedia     9 Series Chipset Family HD Audio Controller
/0/100/1c                      bridge         9 Series Chipset Family PCI Express Root Port 1
/0/100/1c.3                    bridge         82801 PCI Bridge
/0/100/1c.3/0                  bridge         ASM1083/1085 PCIe to PCI Bridge
/0/100/1d                      bus            9 Series Chipset Family USB EHCI Controller #1
/0/100/1d/1       usb2         bus            EHCI Host Controller
/0/100/1d/1/1                  bus            USB hub
/0/100/1f                      bridge         Z97 Chipset LPC Controller
/0/100/1f.2                    storage        9 Series Chipset Family SATA Controller [AHCI Mode]
/0/100/1f.3                    bus            9 Series Chipset Family SMBus Controller
/0/1                           system         PnP device PNP0c01
/0/2                           system         PnP device PNP0c02
/0/3                           system         PnP device PNP0b00
/0/4                           generic        PnP device INT3f0d
/0/5                           system         PnP device PNP0c02
/0/6                           system         PnP device PNP0c02
/0/7                           communication  PnP device PNP0501
/0/8                           system         PnP device PNP0c02
/0/9              scsi0        storage        
/0/9/0.0.0        /dev/sda     disk           256GB Samsung SSD 850
/0/9/0.0.0/1      /dev/sda1    volume         952MiB Windows FAT volume
/0/9/0.0.0/2      /dev/sda2    volume         27GiB EFI partition
/0/9/0.0.0/3      /dev/sda3    volume         209GiB EFI partition
/0/a              scsi2        storage        
/0/a/0.0.0        /dev/sdb     disk           2TB ST2000DM001-1ER1
/0/a/0.0.0/1                   volume         22GiB Linux swap volume
/0/a/0.0.0/2                   volume         144GiB EXT4 volume
/0/a/0.0.0/4                   volume         9537MiB EFI partition
/0/a/0.0.0/5                   volume         1686GiB EXT4 volume
/0/b              scsi3        storage        
/0/b/0.0.0        /dev/sdc     disk           2TB ST2000DM001-1ER1
/0/b/0.0.0/1                   volume         22GiB Linux swap volume
/0/b/0.0.0/2                   volume         144GiB EXT4 volume
/0/b/0.0.0/4                   volume         9537MiB EFI partition
/0/b/0.0.0/5                   volume         1686GiB EXT4 volume
/0/c              scsi4        storage        
/0/c/0.0.0        /dev/sdd     volume         465GiB Samsung SSD 860
/1                             power          To Be Filled By O.E.M.
/2                vethc2afe35  network        Ethernet interface
This experience has left me feeling demoralized and deflated; it occurs often enough to significantly impact my productivity. I am tempted to replace the entire system top to bottom, but I suspect this issue would follow me to the new system too.

Last edited by MadMartian; 06-06-2021 at 05:22 PM. Reason: Added memory and disk information / specs
 
Old 06-05-2021, 09:11 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,786

Rep: Reputation: 7304
Quote:
Originally Posted by MadMartian
Overview & Symptoms

My system freezes almost completely whenever it runs out of RAM and starts hitting the swap partition heavily.
That is more or less normal. You need to add more RAM or check why it is in use (more swap may also help a bit).
This will probably help you go further: www.linuxatemyram.com
 
Old 06-05-2021, 10:23 AM   #3
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,248

Rep: Reputation: 2321
It's an achievement to run out of ram and swap on any properly built system.
The only time I managed it was compiling about 50 libraries statically on an under-resourced system into this massive verilog program, which incidentally was a total waste of time. I rebooted, closed some stuff, and the thing went together fine.

Increase ram. Add a swap file somewhere, and it will complement and add to existing swap facilities. And stop overloading your system.
 
Old 06-06-2021, 02:31 AM   #4
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
Upgrading CPU & GPU has nothing to do with RAM.
Your system reports 32GB of RAM, am I seeing this correctly?
WHAT ARE YOU DOING TO REACH THE LIMIT ON THAT?
 
Old 06-06-2021, 03:25 AM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,119

Rep: Reputation: 4120
Quote:
Originally Posted by ondoho
Upgrading CPU & GPU has nothing to do with RAM.
Not entirely true - all hardware needs driver support; that includes all the chips on a motherboard - CPU as well as support chips. It's not unknown for drivers - either in-kernel or out-of-tree - to go awry and eat up memory. If they do so, the memory is not available to user-space, and may cause shortages. The TCP stack has been known to do this for example.
Quote:
Your system reports 32GB of RAM, am I seeing this correctly?
WHAT ARE YOU DOING TO REACH THE LIMIT ON THAT?
Yep, this you need to know. There is the small matter of 22G of swap in addition to the RAM. Normally swap is only used for evicting anonymous memory, but may be a side-effect in (very) rare cases. I'd be inclined to keep an eye on meminfo over time and see if anything increases unexpectedly. Then look at all the userspace processes to see if you have an obvious suspect.

I'm kinda surprised you aren't hearing from OOM-killer.
 
Old 06-07-2021, 02:54 PM   #6
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,965

Rep: Reputation: 3622
It might be possible to change how swap is spread across the drives, using a setting that balances drive usage against swap needs.

As noted above, you seem to need either more RAM or lower RAM usage.
 
Old 06-07-2021, 08:09 PM   #7
frankbell
LQ Guru
 
Registered: Jan 2006
Location: Virginia, USA
Distribution: Slackware, Ubuntu MATE, Mageia, and whatever VMs I happen to be playing with
Posts: 19,307
Blog Entries: 28

Rep: Reputation: 6136
I think it might help to know what processes are using how much RAM.

Have you checked that with top or htop?
 
Old 06-07-2021, 08:22 PM   #8
jamison20000e
Senior Member
 
Registered: Nov 2005
Location: ...uncanny valley... infinity\1975; (randomly born:) Milwaukee, WI, US( + travel,) Earth&Mars (I wish,) END BORDER$!◣◢┌∩┐ Fe26-E,e...
Distribution: any GPL that work on freest-HW; has been KDE, CLI, Novena-SBC but open.. http://goo.gl/NqgqJx &c ;-)
Posts: 4,888
Blog Entries: 2

Rep: Reputation: 1567
2400 MHz + faster SSD or RPMs (++) likely won't fix software issues... may make you feel warmer and fuzzier LogicLol
Quote:
WHAT ARE YOU DOING TO REACH THE LIMIT ON THAT?
Canonical and dealerships*** want us(-no pun) to buy more so they sugar down your gas &pee!

My* kernel is at 5.10 with only 4Gb RAM, less SWAP and no problems,,, tho I wish it was on: https://www.crowdsupply.com/mnt/reform

Edit; add: oh i forgot to say: just to throw it away; we still have our: II series revolution; next GPL: hardwares!

Last edited by jamison20000e; 06-07-2021 at 11:31 PM.
 
Old 06-08-2021, 01:19 AM   #9
igadoter
Senior Member
 
Registered: Sep 2006
Location: wroclaw, poland
Distribution: many, primary Slackware
Posts: 2,717
Blog Entries: 1

Rep: Reputation: 625
How old is your hard drive?
 
Old 06-09-2021, 03:51 PM   #10
MadMartian
LQ Newbie
 
Registered: Jun 2021
Distribution: N00buntu
Posts: 11

Original Poster
Rep: Reputation: Disabled
I tried to reply to this thread a few days ago but the thread went missing, not sure what happened, but I'll try again now...

I am a developer so I find it pretty easy to max out 32G of RAM if I really want to. While this problem is an annoyance, it is also interesting to me. I assumed that a process can potentially be swapped entirely to disk, analogous to "per-process hibernation," but what I've found is that when main RAM is constrained none of my processes get swapped 100% to disk; in fact, most of them only get about 15% swapped to disk at best. This would definitely be a good reason to shut down unused apps and processes rather than permit them to linger in the background.
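
As a rough way to check the "about 15% swapped" observation, here is a minimal sketch that reads the `VmRSS` and `VmSwap` fields from `/proc/<pid>/status` and reports the swapped share of each process. This is not the smem-based method mentioned later in the thread; the function names and the 1024 kB reporting threshold are assumptions made for illustration, and the ratio ignores shared-memory details, so treat the output as approximate.

```python
from pathlib import Path


def parse_kb_fields(status_text: str) -> dict:
    """Return the 'Key: <number> kB' fields of a /proc/<pid>/status file."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key] = int(parts[0])
    return fields


def swapped_fraction(status_text: str) -> float:
    """Share of (resident + swapped) memory that currently sits in swap."""
    fields = parse_kb_fields(status_text)
    rss = fields.get("VmRSS", 0)
    swap = fields.get("VmSwap", 0)
    total = rss + swap
    return swap / total if total else 0.0


def report(min_swap_kb: int = 1024) -> list:
    """List (fraction, name, swap_kb) for processes using at least min_swap_kb."""
    rows = []
    for status in Path("/proc").glob("[0-9]*/status"):
        try:
            text = status.read_text()
        except OSError:
            continue  # process exited or is not readable
        fields = parse_kb_fields(text)
        if fields.get("VmSwap", 0) >= min_swap_kb:
            name = text.splitlines()[0].partition(":")[2].strip()
            rows.append((swapped_fraction(text), name, fields["VmSwap"]))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    for fraction, name, swap_kb in report():
        print(f"{fraction:6.1%}  {swap_kb:>10} kB  {name}")
```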

Quote:
Originally Posted by igadoter
How old is your hard drive?
It's about 5 years old, RAID 1, weekly long and short S.M.A.R.T. tests, all green AFAIK. There were some offline uncorrectable errors for a while, but they went away. However, it was suggested to me to run mkswap -c ... to re-create and check the swap partition for bad sectors. I have replacement drives on standby.

Quote:
Originally Posted by frankbell
I think it might help to know what processes are using how much RAM.
I've analyzed the processes using smem and found that the biggest offenders are:
  • IntelliJ IDEA (~5G)
  • plasmashell (~1.5G, tends to leak memory, I think it's a known issue without a resolution)
  • kwin_x11 (~0.8G)
  • Several background Java processes for development (sums-up to about ~3.5G)
  • Firefox (~0.5G but has various sub processes that sum-up to about ~2G)
  • Waterfox (a fork of Firefox, similar RAM usage, I run these separate instances for the purpose of browsing segmentation)
  • clementine
  • LBRY

Then the remaining processes consuming less than 0.5G of RAM but still worth mentioning are:
  • Discord
  • ipfs

Then there's about a hundred more processes that don't really add-up to much more than 2G.

Quote:
Originally Posted by syg00
I'm kinda surprised you aren't hearing from OOM-killer.
That's what I thought too, but it's disabled on this system. I'm tempted to turn it on if it means I might get to recover from a complete system freeze; I just have to make sure the worst culprits have terrible OOM-kill scores (looking at you IntelliJ, Java, and Firefox!).
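
One way to give chosen processes a worse OOM-kill score is the per-process `oom_score_adj` file under `/proc`, which accepts values from -1000 to 1000, with higher values making a process a more likely victim. The sketch below is only an illustration: the helper names and the `proc_root` parameter (there so the code can be pointed at a test directory) are assumptions, and how the OOM killer is configured on this particular system is not known from the thread.

```python
from pathlib import Path

OOM_ADJ_MIN, OOM_ADJ_MAX = -1000, 1000  # range accepted by the kernel


def oom_adj_path(pid: int, proc_root: str = "/proc") -> Path:
    """Path of the oom_score_adj file for the given PID."""
    return Path(proc_root) / str(pid) / "oom_score_adj"


def set_oom_score_adj(pid: int, value: int, proc_root: str = "/proc") -> None:
    """Write a new adjustment; higher means more likely to be killed."""
    if not OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        raise ValueError(f"oom_score_adj must be within {OOM_ADJ_MIN}..{OOM_ADJ_MAX}")
    oom_adj_path(pid, proc_root).write_text(f"{value}\n")


def get_oom_score_adj(pid: int, proc_root: str = "/proc") -> int:
    """Read back the current adjustment for the given PID."""
    return int(oom_adj_path(pid, proc_root).read_text())
```

For example, `set_oom_score_adj(pid_of_ide, 500)` (with a hypothetical `pid_of_ide`) would mark that process as a preferred victim. Lowering a score below its current value generally needs elevated privileges.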

Last edited by MadMartian; 06-09-2021 at 03:55 PM. Reason: Reply to OOM-kill question
 
Old 06-10-2021, 01:10 AM   #11
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,786

Rep: Reputation: 7304
Quote:
Originally Posted by MadMartian
I am a developer so I find it pretty easy to max out 32G of RAM if I really want to. While this problem is an annoyance, it is also interesting to me. I assumed that a process can potentially be swapped entirely to disk, analogous to "per-process hibernation," but what I've found is that when main RAM is constrained none of my processes get swapped 100% to disk; in fact, most of them only get about 15% swapped to disk at best. This would definitely be a good reason to shut down unused apps and processes rather than permit them to linger in the background.
Again, please read www.linuxatemyram.com (there is also a link at the bottom, "how can I verify", for additional info) if you wish to understand how it works. Don't assume anything.
 
Old 06-11-2021, 02:18 AM   #12
jamison20000e
Senior Member
 
Registered: Nov 2005
Location: ...uncanny valley... infinity\1975; (randomly born:) Milwaukee, WI, US( + travel,) Earth&Mars (I wish,) END BORDER$!◣◢┌∩┐ Fe26-E,e...
Distribution: any GPL that work on freest-HW; has been KDE, CLI, Novena-SBC but open.. http://goo.gl/NqgqJx &c ;-)
Posts: 4,888
Blog Entries: 2

Rep: Reputation: 1567
Try running a live operating system and see if you can crash it?
 
Old 06-11-2021, 02:43 AM   #13
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,119

Rep: Reputation: 4120
Seems to me the OP has a pretty good handle on the overall theory.
What does /proc/meminfo show over time, say every 15 minutes or so?
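
A minimal sketch of how such periodic sampling could be done: take a snapshot of /proc/meminfo, wait, take another, and print only the fields that grew. The function names and the 10 MiB reporting threshold are assumptions for illustration. Most fields are in kB, but a few (such as the HugePages counters) are plain counts, so the "kB" label in the output is not accurate for those.

```python
import time


def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo field name to its integer value."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def growth(before: dict, after: dict, min_kb: int = 0) -> list:
    """Return (delta, field) pairs for fields that grew by more than min_kb."""
    deltas = {key: after[key] - before[key] for key in after if key in before}
    return sorted(((d, key) for key, d in deltas.items() if d > min_kb), reverse=True)


def read_meminfo(path: str = "/proc/meminfo") -> dict:
    """Read and parse the current contents of /proc/meminfo."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


def watch(interval_seconds: float, samples: int, min_kb: int = 10 * 1024) -> None:
    """Print the fields that grew between consecutive samples."""
    previous = read_meminfo()
    for _ in range(samples):
        time.sleep(interval_seconds)
        current = read_meminfo()
        print(time.strftime("%Y-%m-%d %H:%M:%S"))
        for delta_kb, field in growth(previous, current, min_kb):
            print(f"  {field:<16} +{delta_kb} kB")
        previous = current
```

Calling `watch(15 * 60, 8)` would sample every 15 minutes for two hours. Fields such as Slab or SUnreclaim that keep growing between samples would point at kernel-side usage rather than a userspace process.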
 
Old 06-11-2021, 02:52 AM   #14
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,119

Rep: Reputation: 4120
Quote:
Originally Posted by MadMartian
... but what I've found is that when main RAM is constrained none of my processes get swapped 100% to disk; in fact, most of them only get about 15% swapped to disk at best. This would definitely be a good reason to shut down unused apps and processes rather than permit them to linger in the background.
I skimmed over this earlier.
If they are not being forced out to swap, they are not unused background tasks. Only long-term unreferenced anonymous pages are candidates for swap-out. Java is awful; people who code in it are coerced into lazy habits. Bad combination.
If you can get rid of those tasks altogether, the buy-back may be better than you hope.
 
Old 06-11-2021, 06:29 AM   #15
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,248

Rep: Reputation: 2321
The idea about /proc/meminfo is good. Back in earlier days, there was a kernel memory leak which gradually ground the system to a halt. It can happen with a process too. Running out of memory with 32G sounds like Windows. I've only 6G here, and rarely use swap. I lost hibernation at one point because the 3 month old hibernation image was dodgy, so hibernation didn't happen.
 
  

