LinuxQuestions.org


rogan 01-26-2019 07:30 AM

file system cache issues 4.19.0 and onwards
 
In my somewhat frustration-fueled, erratic last thread:
https://www.linuxquestions.org/quest...it-4175645595/
I described a file system related problem where a compile job almost ground to a halt while large files were being copied at the same time, regardless of which disk they were copied to or from.
The problem is particularly pronounced when using xfs, and it is only present from kernel version 4.19.0 onwards, NOT before. It also affects ext2, ext3 and ext4, but to a lesser extent.
After adding another 16GB ram to the testing machine I noticed that it took much more time before the compile job slowed down and the machine became unresponsive.
This led me to suspect some cache related issue. I made a few test runs and observed that as soon as the cache passed ~23G (of 24G total) while copying, the compile job slowed almost to a halt, while the copying also slowed down significantly.
It seemed to me as if these two processes were fighting over cache.
Sure enough, after echo 0 >/proc/sys/vm/vfs_cache_pressure the compilation runs without slowdown all the way through, while copying retains its steady +100MB/sec.
This "solution" is tested on 4.19.17 and a VERY heavily modified 5.0.0-rc3, both on xfs.
Setting vfs_cache_pressure to 0 is probably not advisable, but it works without issues so far.
Maybe someone else has a better solution. (Maybe the good kernel folks will change back whatever they changed for 4.19.0?)
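
For anyone wanting to try it, the workaround boils down to the following (the sysctl form is equivalent to the echo; the default value of vfs_cache_pressure is 100, and the setting resets at reboot unless you put it in /etc/sysctl.conf or a boot script):

cat /proc/sys/vm/vfs_cache_pressure          # check the current value, default is 100
echo 0 > /proc/sys/vm/vfs_cache_pressure     # apply the workaround for this boot only
sysctl -w vm.vfs_cache_pressure=0            # same thing, sysctl style
echo 100 > /proc/sys/vm/vfs_cache_pressure   # back to the default behaviour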

Here's how to hit this bug on a default "current" install using the "generic" kernel on pre-zen AMD:

1. You need a decent amount of data to copy, probably at least 5-10 times as much as your ram and reasonably fast media to copy from and to (Gbit nfs mount, usb3 drive, regular hard drive...).

2. A dedicated xfs-formatted regular rotating hard drive for the compile job (any io-latency sensitive parallelizable job will do), as it wouldn't be fair to use the drive you're copying to. This problem is most likely present for ssd's as well, but because they are so damn fast, dysfunctional cache becomes less of an issue and you will probably not notice much.

For a job I recommend a defconfig linux kernel compile (parallelizable, easy to redo).
Now open a few terminals with "top" running in one of them, and start copying in another (use mc, easy to start and stop). Watch buff/cache grow in top; as it reaches 70-80% of your ram, start the compilation in a third terminal. I use "time make -j16" on my eight core 9590 AMD.

You're probably going to see processors waiting for something to do; watch "wa" and "id" in top while the compile crawls. You can try the hillbilly trick above (echo 0 >/proc/sys/vm/vfs_cache_pressure) and watch what happens, or you can reboot into any previous (pre-4.19.0) kernel and redo the process.
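
Putting the steps above together, a repro run looks something like this (the paths, mount points and -j count are only examples, adjust for your own setup):

# terminal 1: keep an eye on buff/cache and the wa/id columns
top

# terminal 2: copy a data set several times the size of your ram onto the test disk
cp -a /mnt/nfs/bigdata /mnt/xfsdisk/copytest    # or use mc, easy to stop and restart

# terminal 3: once buff/cache is at ~70-80% of ram, start the compile on the xfs drive
cd /mnt/xfsdisk/linux-4.19.19
make defconfig
time make -j16

# optional: apply the workaround mid-run and watch the compile recover
echo 0 > /proc/sys/vm/vfs_cache_pressure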

Aeterna 01-28-2019 12:43 PM

I use core number +1
make -j9
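
or, if you don't want to hard-code the number, something like:

make -j$(( $(nproc) + 1 ))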

you could try a few other options with regard to vm:
vm.swappiness
vm.drop_caches
vm.vfs_cache_pressure
vm.dirty_writeback_centisecs
vm.dirty_expire_centisecs
vm.dirty_background_bytes
vm.dirty_bytes

plus some tweaks in fstab, e.g.:
data=ordered (I would not use writeback, but you can consider it if you understand the risks involved)
commit=
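
For example, something along these lines in /etc/sysctl.conf (or wherever your distro loads sysctl settings at boot) and in fstab; the numbers are only illustrative, tune them to your workload and to how much unwritten data you can afford to lose on a crash:

# example sysctl settings, values illustrative only
vm.swappiness = 10
vm.vfs_cache_pressure = 50
vm.dirty_background_bytes = 268435456    # start background writeback at 256 MB of dirty data
vm.dirty_bytes = 1073741824              # block writers once 1 GB of dirty data accumulates
vm.dirty_writeback_centisecs = 500       # flusher thread wakes every 5 seconds
vm.dirty_expire_centisecs = 3000         # dirty data older than 30 seconds gets written out

# example fstab line for an ext4 data disk (data= and commit= are ext3/ext4 journaling options)
/dev/sdb1  /mnt/data  ext4  defaults,data=ordered,commit=30  0  2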

rogan 01-28-2019 02:54 PM

I have filed this as a regression bug on bugzilla. We'll see if anyone's interested I guess.

rogan 01-29-2019 05:39 PM

The problem was inode reclaim. Dave (Chinner) solved it.
link: https://bugzilla.kernel.org/show_bug.cgi?id=202441

syg00 01-29-2019 06:22 PM

Excellent. Thanks for sticking with it.

Drakeo 01-29-2019 06:58 PM

You did great, Rogan. Grab a "git" how-to and keep up the great help.

rogan 01-29-2019 08:38 PM

It was actually great fun. I didn't expect the fast response, a very positive surprise.

rogan 01-30-2019 05:24 AM

The drama continues... Mr Gushchin's inode.c commit in 4.19.3 made the _real_ problem, introduced in 4.19-rc5, more generally visible. Memory management debugging is difficult.

syg00 01-30-2019 05:29 AM

Quote:

Originally Posted by rogan (Post 5955387)
Memory management debugging is difficult.

:p

Once upon a time, as a newbie, I decided it might be interesting to have a look at the mm code. Drove me to drink chasing it.

I still drink, but I gave up on becoming a kernel hacker.

allend 01-30-2019 06:35 AM

Quote:

Roger 2019-01-29 21:19:52 UTC
...
Anyhow, after a filthy amount of copying and compiling ...
Quote:

Dave Chinner 2019-01-29 21:41:21 UTC
...
You've been busy!
Quote:

Roger 2019-01-29 03:36:40 UTC
...
Have to get some sleep first.
Quote:

Roger 2019-01-29 09:09:11 UTC
Created attachment 280837
Where was the sleep? :)

A salutary lesson, but very inspiring. My congrats.

Petri Kaukasoina 01-30-2019 08:35 AM

Quote:

Originally Posted by rogan (Post 5955387)
The drama continues...

https://lkml.org/lkml/2019/1/29/1508 and https://lkml.org/lkml/2019/1/28/1865

rogan 01-30-2019 09:34 AM

allend: Some guy called my phone and woke me up; I forgot to shut it off. I could not go back to sleep again. Sucks to be 51 :p
Petri: Thanks a bunch for the links. I was a bit worried that Dave had missed the fact that rc5 already had these issues. Now I know better :)

syg00 01-30-2019 05:25 PM

I can remember sitting in a conference discussing how the (very) latest -rc kernel broke a bunch of stuff. A fella sitting off to the side said he merged fixes for it as he was flying back to Aus, and had pushed out an updated -rc after he landed. Should be available.
Andrew Morton.
If he's involved, it's being looked at seriously.

rogan 01-30-2019 10:40 PM

It's a nasty performance/usability killing bug, and it's been hiding deep because of how fast hardware is these days.
I remember listening to a talk by Theo de Raadt, where he said they keep supporting all these old architectures because they help them discover both new and very old bugs.
I think this whole issue exemplifies that in a way.

rogan 02-05-2019 08:49 AM

4 Attachment(s)
As I'm home on sick leave (influenza), I thought I'd fill you folks in on what's been happening.
If you would like to read the mail conversations, just follow the links posted by Petri above. Here's a short summary:

A commit made in 4.19-rc5, designed to increase cache pressure enough to get rid of stale cgroups, a sort of memory leak, proved to be a bit aggressive, or in DC's own wording "insane". From what I understand, a bit like cleaning up after the horses with a bulldozer...

The effect can be seen when filling the cache with large batches of continuous data: one second it appears full, the next, empty again. Filling it with large sets of highly fragmented data, however, like a sizeable collection of kernel source trees, forces other things out, for instance running compile jobs, which results in a tremendous amount of disk seeking and your computer becoming unresponsive or locking up completely.
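
If you want to watch this happen, the page cache and the inode/dentry slab caches can be observed directly while the copy and the compile are running (just a way of looking at it, nothing to do with the fix; reading /proc/slabinfo needs root):

watch -n1 free -h                                          # buff/cache draining and refilling
watch -n1 'grep -E "^(xfs_inode|dentry)" /proc/slabinfo'   # inode/dentry slab object counts
slabtop                                                    # same information, interactively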

A commit made in 4.19.3 sought to remedy some side effects of the previous patch.
It was basically telling the bulldozer driver to avoid small houses, buses, cars...

The side effect of that "fix" was that filling up the cache with really large batches of continuous data forced important stuff out, and so on...

DC wants these "patches" reverted: horse owners should clean up their own goo.
The patch submitter claims it does the job.

If you don't mind a little bit of horse goo and want a normally behaving (pre-4.19.0) cache, you can apply the reverts that DC suggested. I have tested them on 5.0-{rc3,rc5} and 4.19.{18,19}. They work!

I made patches for both 4.19.19 and 5.0-rc5. They are really simple one-liners. Apply them using "patch [file to patch] [patch file]" as these are normal diffs. Files to patch are mm/vmscan.c and fs/inode.c in the kernel source root directory.
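
In other words, from the kernel source root, something along these lines (the source path and the diff file names are placeholders here, use wherever your tree lives and whatever you saved the attachments as):

cd /usr/src/linux-4.19.19
patch mm/vmscan.c vmscan-revert.diff    # placeholder name for the vmscan.c revert attached above
patch fs/inode.c inode-revert.diff      # placeholder name for the inode.c revert attached above
# then configure, rebuild and install the kernel as usual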

