Old 06-09-2022, 01:03 AM   #1
twy
Member
 
Registered: Jun 2004
Distribution: Slackware64
Posts: 99

Rep: Reputation: Disabled
disk thrashing on 5.15.x kernels (but not on 4.4.x or 4.19.x kernels)


I'm having a problem with very bad hard disk performance since upgrading from slackware64-14.2 to slackware64-15.0. I have narrowed the problem down to the kernel version: with a storage configuration I have used for 10 years, changing only the kernel makes the difference. On slackware64-15.0, if I run Linux kernel 4.4.302 or 4.19.246, disk performance is normal (with 4.19.246 it seems very good, and I'm running it now), just as it has been trouble-free for 10 years. If I run 5.15.38 (or any 5.15.x), I get severe disk thrashing when the dirty writeback pages are flushed to the disks. The thrashing slows the writes down to less than 500 KiB/s per disk, and the disks are far noisier than normal (they are usually almost silent).

My disk configuration is 8x 1TB Western Digital RE/Ultrastar HA210 disks on an ASUS PIKE 2008 controller (LSI SAS 2008, P20 IT firmware) using the mpt3sas module. The disks are each partitioned identically into two partitions using sgdisk (GPT partitions, but with an MBR-compatible boot sector so lilo can be installed). Partition 1 is about 1G for /boot; all of the first partitions form an md-raid1 (version 0.90 metadata) 8-way mirror for lilo /boot. All of the second partitions are members of an md-raid6 with 64k chunk size and an internal bitmap (for root /). The partitions are aligned on 2048-sector (1 MiB) boundaries, and in practice closer to 16 MiB boundaries the way I spaced them out with some padding. LUKS (luks1) sits on top of the raid6 device, the LUKS device is the only PV (physical volume) in an LVM VG (volume group), and the VG holds swap and root LVs (logical volumes). The root LV has an ext4 filesystem with stride=64/4=16 and stripe_width=stride*(8-2)=96 set using tune2fs.

The root LV has readahead 6144 (6144 = 64*6 KiB of user data per stripe). Using blockdev, I setra 1024 on /dev/sd? and setra 6144 on /dev/md1 so that readahead is matched up (note: these values could probably be doubled, since they are numbers of 512B sectors rather than 1 KiB sectors, but they have worked fine). I also set echo 32768 > /sys/block/md1/md/stripe_cache_size, since I have enough RAM (32GB) and this helps raid6 performance. The system defaults /proc/sys/vm/dirty_ratio = 20 and /proc/sys/vm/dirty_background_ratio = 10 are used, since they allow a lot of free RAM to be used as writeback cache.

Basically, this works perfectly for me on Linux 4.19.246 on slackware64-15.0. But if I use the Slackware kernel 5.15.38 with Slackware's .config, or any custom .config with any 5.15.x kernel I have tried, the result is the same: very bad disk thrashing and/or throttling that makes the disks noisy and very slow. I tried all-default settings, without any of the tweaks above that I have used for 10 years, but it made no difference on 5.15.x kernels - bad thrashing when the dirty writeback cache is flushed. Running with my tweaks worked fine for 10 years on kernels up to 4.4.x and now 4.19.x. I had no hard drive failures in 10 years (!), but recently had some, and the raid6 worked as it should, so I just replaced the failed drives. Btw, some failed drives could be zeroed (using ddrescue) and then recovered from pending-sector problems (sectors that return a read failure in normal operation) - they then passed long self-tests.
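For reference, the tweaks above boil down to roughly the following (a sketch; the /dev/sd? glob and the root LV path are examples, not necessarily my exact device names):
Code:
# readahead, in 512-byte sectors, matched between member disks and the md device
for d in /dev/sd? ; do blockdev --setra 1024 $d ; done
blockdev --setra 6144 /dev/md1

# bigger md raid6 stripe cache (plenty of RAM here)
echo 32768 > /sys/block/md1/md/stripe_cache_size

# ext4 stride/stripe_width for 64k chunks on an 8-disk raid6 (6 data disks)
# (/dev/vg/root is a placeholder for the root LV)
tune2fs -E stride=16,stripe_width=96 /dev/vg/root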

As an example of this disk thrashing problem, I can just run tar xvf qemu-6.2.0.tar.gz and then monitor I/O with iostat -s 1. The extraction goes very fast into the kernel's writeback cache. The flusher threads wake every 5 seconds (/proc/sys/vm/dirty_writeback_centisecs = 500), and within 30 seconds or less (/proc/sys/vm/dirty_expire_centisecs = 3000) the dirty data is written out to the disks. The extracted tree is 793M, so the writeback should finish in seconds and should barely be noticeable. With the problem, the writeback takes a long time, slows down, and is noisy. I cannot run like this, because it would probably lead to hard drive failures - and it is so slow!
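In other words, the test is just this (any source tarball of similar size should do):
Code:
tar xvf qemu-6.2.0.tar.gz      # ~793M extracted, goes straight into the writeback cache
iostat -s 1                    # watch the per-device write rates as the flush happens

# optionally, watch the dirty/writeback page totals drain
watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'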

I have tried playing with many settings on the 5.15 kernels, but nothing fixes the problem. I tried the generic kernel with no tweaking, using all defaults. I tried the generic kernel with only minor changes to compile in my device drivers. I tried the generic kernel with only the bfq I/O scheduler compiled in, to try to make it the default, but the "none" scheduler ended up as the default instead. I ran "for i in /sys/block/sd? ; do echo bfq > $i/queue/scheduler ; done", but bfq still did not fix the bad thrashing I/O on writeback. I tried the other schedulers too. What does work nicely is the old CFQ scheduler and whatever other magic is in 4.4.302 and 4.19.246. Since Linux 5.0, the CFQ scheduler has been removed and replaced by mq-deadline, bfq, and the other blk-mq schedulers, and they do not work for me with my disk configuration: the result is very bad performance with terrible disk thrashing that makes the hard drives far noisier than normal, and something that should take seconds instead takes minutes of torture on the disks. The 5.15.x kernels are basically unusable for my configuration.
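For anyone wanting to try the same, checking and switching the scheduler per disk looks like this:
Code:
# the scheduler shown in [brackets] is the active one
for i in /sys/block/sd? ; do echo -n "$i: " ; cat $i/queue/scheduler ; done

# switch every disk to bfq (must be compiled in or loaded as a module)
for i in /sys/block/sd? ; do echo bfq > $i/queue/scheduler ; done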

Does anyone have any idea what the problem is? Is there a simple way to fix this? For now, I am running okay (performance is rather nice) on 4.19.246 with all of my old tweaks in place, just as I ran for 10 years on earlier kernels up to 4.4.302. I think something went terribly wrong for configurations like mine starting with Linux 5.0.

(This message triggered a submit block until I edited out some example Linux commands - kind of surprising on a site where it is handy to explain which commands you have used.)
 
Old 06-11-2022, 04:39 PM   #2
twy
Member
 
Registered: Jun 2004
Distribution: Slackware64
Posts: 99

Original Poster
Rep: Reputation: Disabled
I guess no one else has had my problem with disk thrashing on linux kernel 5.15.x on slackware64-15.0. I tested linux kernel 4.19.246 and had no disk thrashing.

Linux kernel 5.10.121 is also working for me without disk thrashing. iostat looks a little slower than on 4.19.246, but 5.10.121 seems to throttle the I/O (or is just slower) such that audio and video playback do not suffer any jitter during background writeback with the default settings in /proc/sys/vm. Linux 4.19.x had some jitter - to fix it, I used to set /proc/sys/vm/dirty_background_bytes to $((1024*1024*8)) = 8 MiB. I configured 5.10.121 based on Slackware's generic config for 5.15.38 and then compiled in some of the drivers I need to boot. On 5.10.121 my disks are using the mq-deadline scheduler (not the old CFQ), yet it works fine, so I still do not understand what the problem is on 5.15.x - but there must be some difference between 5.10.x and 5.15.x that breaks my storage configuration.
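For reference, that old jitter workaround on 4.19.x was just:
Code:
# cap background writeback at 8 MiB of dirty data
echo $((1024*1024*8)) > /proc/sys/vm/dirty_background_bytes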

I guess 5.10.x is good, and it has long-term support until December 2026. This seems familiar: some time ago (maybe more than a year ago), during slackware64-15.0 development, I tested 5.10 and other kernels and noticed something, but then quickly went back to 4.4.x because I don't do a lot of testing - I'm just a user, not a developer. I remember now thinking 5.10 would be good, but then Slackware moved on to later kernels and development dragged on. Even now, 5.10 is what works for my configuration.


Question:
Is there anything that I should be concerned about if I continue to run linux kernel 5.10.x on slackware64-15.0 (no multilib)?
 
1 member found this post helpful.
Old 06-11-2022, 07:33 PM   #3
tjallen
Member
 
Registered: Jan 2014
Location: Central New York
Distribution: Slackware
Posts: 77

Rep: Reputation: 26
I don't know what the problem with 5.15.x is, but I can say that I was having more trouble with recollindex freezing my machine on 5.15.x than on 5.4.x (though I had some trouble with 5.4.x as well). Then I set the following in /etc/sysctl.conf:
Code:
vm.swappiness = 1
vm.vfs_cache_pressure = 50
vm.dirty_background_bytes = 1048576
vm.dirty_bytes = 8388608
and that seems to help, but my machine (MacBook Pro 2012 with 16GB of RAM) is still not all that responsive when there is a lot of disk activity; the lags are much shorter now, though. Using bfq doesn't seem to fix it either.

Have you tried setting definite values for vm.dirty_background_bytes and vm.dirty_bytes -- instead of percentages -- as you did with 4.19.x?
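You can test such values at runtime with sysctl -w before making them permanent in /etc/sysctl.conf, for example:
Code:
sysctl -w vm.dirty_background_bytes=1048576   # 1 MiB
sysctl -w vm.dirty_bytes=8388608              # 8 MiB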
 
Old 06-11-2022, 11:13 PM   #4
rkelsen
Senior Member
 
Registered: Sep 2004
Distribution: slackware
Posts: 4,451
Blog Entries: 7

Rep: Reputation: 2553
Quote:
Originally Posted by twy View Post
As an example of this disk thrashing problem, I can just run tar xvf qemu-6.2.0.tar.gz and then monitor I/O with iostat -s 1.
I performed the same experiment with the only HDD I have left, a 1TB WD Black that is approximately 10 years old. (I'm almost 100% SSD baby! Life is too short.)

Anyhow:

The extraction is done in a few seconds. After a few more seconds, the data is written to the drive and you can hear it happening. It takes 5, maybe 6 seconds and doesn't seem to affect anything else... It's certainly not thrashing the hard drive.

How much RAM is in your machine, and are you using the same disk for swap space? What brand is your HDD controller?

EDIT: This is with Patrick's 'generic' 5.15.38... using xfs as the filesystem on that disk.

Last edited by rkelsen; 06-11-2022 at 11:18 PM.
 
Old 06-12-2022, 03:34 AM   #5
twy
Member
 
Registered: Jun 2004
Distribution: Slackware64
Posts: 99

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by rkelsen View Post
I performed the same experiment with the only HDD I have left, a 1TB WD Black that is approximately 10 years old. (I'm almost 100% SSD baby! Life is too short.)

Anyhow:

The extraction is done in a few seconds. After a few more seconds, the data is written to the drive and you can hear it happening. It takes 5, maybe 6 seconds and doesn't seem to affect anything else... It's certainly not thrashing the hard drive.

How much RAM is in your machine, and are you using the same disk for swap space? What brand is your HDD controller?

EDIT: This is with Patrick's 'generic' 5.15.38... using xfs as the filesystem on that disk.
I gave all these details in my original post, but: I have 32GB of RAM and the swap is one of the LVs. It is an md-raid6 + LUKS + LVM + ext4 storage configuration using eight 1TB disks. The swap does not get used, so swapping is not the problem.

But anyhow, I do get the impression that after Linux kernel 5.10, few care anymore about HDDs. It seems the newer kernels mostly concern SSDs (the NVMe type, I guess). I have only old "legacy" HDDs and no SSDs yet. My motherboard is old (Asus P7F-E) and has no support for SSDs. I'm not even sure how to run SSDs in a raid6 configuration like mine, and maybe no one does - or, if they do, this sort of disk thrashing does not show up the same way on SSDs; it would just be hard-to-notice fragmentation and extra wear. SSDs don't make any noise to let you know there is thrashing or very fragmented access. I imagine that with SSDs you would notice bad, slow performance if this kind of thing happens, while with HDDs you can hear the actuators/heads moving like crazy when the scheduler/elevator misbehaves. Maybe some day I will move to SSDs, but I do not have the money! I'm just trying to keep my legacy system going as it is, with some upgraded software. I have never been able to afford keeping up with all of the fast-moving and expensive PC developments year after year; I already had to drop quite a bit just to get this legacy system. It is very tiresome running a PC. I may give it up after this one and use something else - a tablet, or even a Windows laptop, really. I feel the days of my Linux computing may be nearing an end along with the end of HDDs. The end of a long, nightmarish era. I would probably relax a lot more with just an iPhone and iPad! The hell with it!
 
Old 06-12-2022, 04:50 AM   #6
rkelsen
Senior Member
 
Registered: Sep 2004
Distribution: slackware
Posts: 4,451
Blog Entries: 7

Rep: Reputation: 2553
Quote:
Originally Posted by twy View Post
I feel the days of my Linux computing may be nearing an end along with the end of HDDs. The end of a long, nightmarish era.
OK well good luck in your future endeavours.

It might interest you to know that you weren't alone in noticing performance regressions in 5.15:

https://www.phoronix.com/scan.php?it...1&page=article

Have you tried a later version?
 
3 members found this post helpful.
Old 06-13-2022, 10:52 PM   #7
dchmelik
Senior Member
 
Registered: Nov 2008
Location: USA
Distribution: Slackware, FreeBSD, Illumos, NetBSD, DragonflyBSD, Plan9, Inferno, OpenBSD, FreeDOS, HURD
Posts: 1,066

Rep: Reputation: 147
Later 5.x kernels: HDD thrashing

On later 5.x kernels (sometime after 5.10, maybe) up to 5.18.x, I regularly get a large amount of HDD thrashing, despite all our PCs having 32 to 64GB of RAM. Mine has a WD/HGST Ultrastar 8TB. I didn't have much HDD thrashing on earlier kernels, and I even have some 120GB WD Black drives, maybe 20+ years old, that WD SMART reports as fine. However, because of the Linux kernel not being what it once was (but also WD quality decreasing), WD's SMART diagnosis software already says my 8TB HDD has a problem after barely a year or two. We also have WD/HGST 4TB and 6TB drives that have been working for almost 10+ years, because they weren't used with HDD-thrashing kernels.
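Similar checks to WD's diagnosis software can be done from Linux with smartmontools (a rough sketch; the device name is just an example):
Code:
smartctl -H /dev/sda            # overall health verdict
smartctl -A /dev/sda            # attributes, e.g. pending/reallocated sectors
smartctl -t long /dev/sda       # start a long self-test
smartctl -l selftest /dev/sda   # view self-test results later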

Last edited by dchmelik; 06-13-2022 at 10:53 PM.
 
Old 06-14-2022, 07:53 PM   #8
rkelsen
Senior Member
 
Registered: Sep 2004
Distribution: slackware
Posts: 4,451
Blog Entries: 7

Rep: Reputation: 2553
Quote:
Originally Posted by dchmelik View Post
On later 5.x kernels (sometime after 5.10, maybe) up to 5.18.x, I regularly get a large amount of HDD thrashing, despite all our PCs having 32 to 64GB of RAM.
Are there any clues in the logs?

What filesystem are you using?
 
Old 06-14-2022, 09:59 PM   #9
dchmelik
Senior Member
 
Registered: Nov 2008
Location: USA
Distribution: Slackware, FreeBSD, Illumos, NetBSD, DragonflyBSD, Plan9, Inferno, OpenBSD, FreeDOS, HURD
Posts: 1,066

Rep: Reputation: 147
Quote:
Originally Posted by rkelsen View Post
Are there any clues in the logs?
Where?
Quote:
What filesystem are you using?
ZFS (with no fancy features), but I think I've had the same happen on ext4.
 
Old 06-15-2022, 04:33 AM   #10
rkelsen
Senior Member
 
Registered: Sep 2004
Distribution: slackware
Posts: 4,451
Blog Entries: 7

Rep: Reputation: 2553
Quote:
Originally Posted by dchmelik View Post
Where?
How about /var/log/messages to begin with?

ZFS. Raid. I'm seeing a pattern. Do you also use LUKS?
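Something along these lines would be a start (the grep patterns are just examples):
Code:
grep -iE 'error|fail|timeout|reset' /var/log/messages | less
dmesg | grep -iE 'ata|scsi|mpt|md|i/o error'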
 
Old 06-15-2022, 04:42 AM   #11
dchmelik
Senior Member
 
Registered: Nov 2008
Location: USA
Distribution: Slackware, FreeBSD, Illumos, NetBSD, DragonflyBSD, Plan9, Inferno, OpenBSD, FreeDOS, HURD
Posts: 1,066

Rep: Reputation: 147
Quote:
Originally Posted by rkelsen View Post
ZFS. Raid. I'm seeing a pattern. Do you also use LUKS?
I stated I don't use fancy filesystem (fs) features; I don't use RAID and don't know what LUKS is. Twy uses those, but finally mentioned using the ext4 fs, so there's no pattern/similarity between twy's posts and mine - entirely different filesystem setups - other than possibly kernel hardware support not being kept up to date. Though I did also say I may have seen this happen years ago with ext4, I have never seen it happen on UNIX/*BSD/OpenSolaris/illumos with the same ZFS, nor when those systems open ext4 as ext2. The only pattern is the hardware used with the newer Linux 5.x kernels, regardless of fs.

If I had to hard-poweroff with ZFS, sometimes it thrashed while repairing, but that may be normal. Though I have 32 CPU threads, if I quickly start maybe 12 or more programs in KDE, thrashing often starts; the same doesn't happen in XFCE, so I think KDE hasn't kept up with parallel processing (more a KDE problem than a kernel problem, maybe). If I quickly start maybe 20 or more programs in KDE, not only is it likely to thrash, but the KDE taskbar might halt 'permanently' (by which I mean at least two hours, the longest I've waited) unless killed and restarted (again, KDE being behind in parallel processing is a different issue, but it's possible a newer Linux 5.x kernel inadequacy led to the KDE stuff halting).

However, I've had thrashing on both KDE and XFCE at other seemingly random times as well, not always when loading many files in a program (though now it typically happens at such times, when it didn't happen in the 2010s with very similar loads). Sometimes I even get thrashing when only 10% or less of CPU resources are in use and I've barely used a GB or a few of RAM after starting/rebooting the PC. Of course, if a program is accessing a lot of data that will go into RAM, there will be a lot of disk activity, but it didn't happen to such an extent (overdoing it?) in the 2010s.

Last edited by dchmelik; 06-15-2022 at 05:07 AM.
 
Old 06-15-2022, 05:07 AM   #12
rkelsen
Senior Member
 
Registered: Sep 2004
Distribution: slackware
Posts: 4,451
Blog Entries: 7

Rep: Reputation: 2553
Anything in the logs?
 
Old 06-29-2022, 02:31 AM   #13
dchmelik
Senior Member
 
Registered: Nov 2008
Location: USA
Distribution: Slackware, FreeBSD, Illumos, NetBSD, DragonflyBSD, Plan9, Inferno, OpenBSD, FreeDOS, HURD
Posts: 1,066

Rep: Reputation: 147
Quote:
Originally Posted by rkelsen View Post
Anything in the logs?
Thrashing isn't an error, so there are no logs (other than when it causes hardware instability/damage, as mentioned, which shows up via SMART; and if one tries to do more during thrashing (a bad idea), one may get software errors related to the system slowdown, not necessarily the hardware itself).
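If one wants to watch it, it shows up in live I/O statistics rather than in any log, e.g.:
Code:
iostat -x 1    # per-device utilization, queue size and await times
vmstat 1       # bi/bo columns and processes blocked on I/O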

Last edited by dchmelik; 06-29-2022 at 02:32 AM.
 
  

