LinuxQuestions.org
Slackware This Forum is for the discussion of Slackware Linux.

Old 01-26-2019, 08:30 AM   #1
rogan
Member
 
Registered: Aug 2004
Distribution: Slackware
Posts: 121

Rep: Reputation: 35
file system cache issues 4.19.0 and onwards


In my somewhat frustration-fueled, erratic last thread:
https://www.linuxquestions.org/quest...it-4175645595/
I described a file system related problem where a compile job almost stopped while large files were being copied at the same time, regardless of which disk they were copied to or from.
The problem is particularly pronounced on xfs, and it is only present from kernel version 4.19.0 onwards, NOT before. It also affects ext2, ext3 and ext4, but to a lesser extent.
After adding another 16GB ram to the testing machine I noticed that it took much more time before the compile job slowed down and the machine became unresponsive.
This led me to suspect some cache related issue. I made a few test runs and I observed that as soon as cache passed ~ 23G (total 24G) while copying, the compile job slowed down to almost a halt, while the copying also slowed down significantly.
It seemed to me as if these two processes were fighting over cache.
Sure enough, after echo 0 >/proc/sys/vm/vfs_cache_pressure the compilation runs without slowdown all the way through, while copying keeps its steady 100+ MB/sec.
This "solution" is tested on 4.19.17 and a VERY heavily modified 5.0.0-rc3, both on xfs.
Setting vfs_cache_pressure to 0 is probably not advisable, since it tells the kernel never to reclaim dentries and inodes under memory pressure, but it has worked without issues so far.
Maybe someone else has a better solution (maybe the good kernel folks will change back whatever they changed for 4.19.0?).
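
For anyone who wants to try the workaround without leaving it set permanently, here is a minimal sketch that sets the knob, runs one command, and puts the old value back afterwards. It needs root to write the real knob. The function name with_cache_pressure and the overridable KNOB variable are hypothetical, made up for this example so it can be tried against a scratch file first. The kernel default for vfs_cache_pressure is 100.

```shell
#!/bin/sh
# Sketch only: temporarily change vfs_cache_pressure around a single command.
# KNOB can be pointed at a scratch file for a harmless trial run (hypothetical convenience).
KNOB="${KNOB:-/proc/sys/vm/vfs_cache_pressure}"

with_cache_pressure() {
    new_value="$1"; shift
    # Remember the current setting so it can be restored afterwards
    old_value="$(cat "$KNOB")" || return 1
    echo "$new_value" > "$KNOB" || return 1
    # Run the wrapped command and keep its exit status
    status=0
    "$@" || status=$?
    # Restore the previous setting regardless of how the command ended
    echo "$old_value" > "$KNOB"
    return "$status"
}
```

Usage, as root: with_cache_pressure 0 make -j16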

Here's how to hit this bug on a default "current" install using the "generic" kernel on pre-Zen AMD:

1. You need a decent amount of data to copy, probably at least 5-10 times as much as your RAM, and reasonably fast media to copy from and to (Gbit nfs mount, usb3 drive, regular hard drive...).

2. A dedicated xfs-formatted regular rotating hard drive for the compile job (any io-latency sensitive, parallelizable job will do), as it wouldn't be fair to use the drive you're copying to. This problem is most likely present on SSDs as well, but because they are so damn fast, a dysfunctional cache becomes less of an issue and you will probably not notice much.

For a job I recommend a defconfig linux kernel compile (parallelizable, easy to redo).
Now open a few terminals with "top" in one of them and start copying in another (use mc, easy to start and stop). Watch buff/cache grow in top; as it reaches 70-80% of your RAM, start the compilation in another terminal. I use "time make -j16" on my eight core 9590 AMD.

You're probably going to see processors waiting for something to do; watch "wa" and "id" in top while the compile crawls. You can try the hillbilly trick above (echo 0 >/proc/sys/vm/vfs_cache_pressure) and watch what happens, or you can reboot using any previous (pre-4.19.0) kernel and redo the process.
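
Instead of eyeballing top, the "start compiling once cache reaches 70-80% of RAM" step can be scripted roughly as below. This is a sketch under assumptions: it approximates top's buff/cache figure as Buffers + Cached + SReclaimable from /proc/meminfo, and the function names cache_percent and wait_for_cache are hypothetical.

```shell
#!/bin/sh
# Sketch: report buffers + page cache + reclaimable slab as a percentage of total RAM.
# Reads a meminfo-format file; defaults to /proc/meminfo.
cache_percent() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "Buffers:"      { cache += $2 }
        $1 == "Cached:"       { cache += $2 }
        $1 == "SReclaimable:" { cache += $2 }
        END { if (total > 0) printf "%d\n", cache * 100 / total }
    ' "${1:-/proc/meminfo}"
}

# Block until the cache share reaches the given percentage (polls every 5 seconds).
wait_for_cache() {
    while [ "$(cache_percent)" -lt "$1" ]; do
        sleep 5
    done
}
```

Usage, with the copy already running in another terminal: wait_for_cache 75 && time make -j16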

Last edited by rogan; 01-28-2019 at 12:50 AM.
 
Old 01-28-2019, 01:43 PM   #2
Aeterna
Member
 
Registered: Aug 2017
Location: Terra Mater
Distribution: VM Host: Slackware-current, VM Guests: Artix, CRUX, FreeBSD, Funtoo, HardenedBSD, OpenIndiana
Posts: 178

Rep: Reputation: Disabled
I use the number of cores + 1:
make -j9

You could also try a few other vm options:
vm.swappiness
vm.drop_caches
vm.vfs_cache_pressure
vm.dirty_writeback_centisecs
vm.dirty_expire_centisecs
vm.dirty_background_bytes
vm.dirty_bytes

plus some mount option tweaks in fstab, e.g.
data=ordered (I would not use writeback, but you can consider it if you understand the risks involved)
commit=
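
Before changing any of these, it helps to record what they are currently set to. A minimal sketch, assuming the tunables are exposed under /proc/sys/vm as usual; the function name print_vm_knobs and its optional directory argument are hypothetical. vm.drop_caches is left out because it is a one-shot trigger you write to, not a setting that holds a value to tune.

```shell
#!/bin/sh
# Sketch: print the current values of the vm tunables listed above.
# Optional first argument: directory to read from (defaults to /proc/sys/vm).
print_vm_knobs() {
    dir="${1:-/proc/sys/vm}"
    for knob in swappiness vfs_cache_pressure dirty_writeback_centisecs \
                dirty_expire_centisecs dirty_background_bytes dirty_bytes; do
        if [ -r "$dir/$knob" ]; then
            printf 'vm.%s = %s\n' "$knob" "$(cat "$dir/$knob")"
        else
            # Knob missing on this kernel, or not readable by this user
            printf 'vm.%s = (not readable)\n' "$knob"
        fi
    done
}
```

Usage: print_vm_knobs > vm-before.txt, so the old values are on hand if a change makes things worse.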
 
Old 01-28-2019, 03:54 PM   #3
rogan
Member
 
Registered: Aug 2004
Distribution: Slackware
Posts: 121

Original Poster
Rep: Reputation: 35
I have filed this as a regression bug on bugzilla. We'll see if anyone's interested I guess.
 
1 members found this post helpful.
Old 01-29-2019, 06:39 PM   #4
rogan
Member
 
Registered: Aug 2004
Distribution: Slackware
Posts: 121

Original Poster
Rep: Reputation: 35
The problem was inode reclaim. Dave (Chinner) solved it.
link: https://bugzilla.kernel.org/show_bug.cgi?id=202441
 
7 members found this post helpful.
Old 01-29-2019, 07:22 PM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 17,509

Rep: Reputation: 2744
Excellent. Thanks for sticking with it.
 
Old 01-29-2019, 07:58 PM   #6
Drakeo
Senior Member
 
Registered: Jan 2008
Location: Urbana IL
Distribution: Slackware, Slacko,
Posts: 3,575
Blog Entries: 3

Rep: Reputation: 457
You did great, Rogan. Grab a "git" how-to and keep up the great help.
 
Old 01-29-2019, 09:38 PM   #7
rogan
Member
 
Registered: Aug 2004
Distribution: Slackware
Posts: 121

Original Poster
Rep: Reputation: 35
It was actually great fun. I didn't expect the fast response, a very positive surprise.
 
Old 01-30-2019, 06:24 AM   #8
rogan
Member
 
Registered: Aug 2004
Distribution: Slackware
Posts: 121

Original Poster
Rep: Reputation: 35
The drama continues... Mr. Gushchin's inode.c commit in 4.19.3 made the _real_ problem, introduced in 4.19-rc5, more generally visible. Memory management debugging is difficult.

Last edited by rogan; 01-30-2019 at 06:31 AM.
 
Old 01-30-2019, 06:29 AM   #9
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 17,509

Rep: Reputation: 2744
Quote:
Originally Posted by rogan
Memory management debugging is difficult.


Once upon a time, as a newbie, I decided it might be interesting to have a look at the mm code. Drove me to drink chasing it.

I still drink, but I gave up on becoming a kernel hacker.
 
Old 01-30-2019, 07:35 AM   #10
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 5,103

Rep: Reputation: 1792
Quote:
Roger 2019-01-29 21:19:52 UTC
...
Anyhow, after a filthy amount of copying and compiling ...
Quote:
Dave Chinner 2019-01-29 21:41:21 UTC
...
You've been busy!
Quote:
Roger 2019-01-29 03:36:40 UTC
...
Have to get some sleep first.
Quote:
Roger 2019-01-29 09:09:11 UTC
Created attachment 280837
Where was the sleep?

A salutary lesson, but very inspiring. My congrats.
 
Old 01-30-2019, 09:35 AM   #11
Petri Kaukasoina
Member
 
Registered: Mar 2007
Posts: 389

Rep: Reputation: 246
Quote:
Originally Posted by rogan
The drama continues...
https://lkml.org/lkml/2019/1/29/1508 and https://lkml.org/lkml/2019/1/28/1865
 
3 members found this post helpful.
Old 01-30-2019, 10:34 AM   #12
rogan
Member
 
Registered: Aug 2004
Distribution: Slackware
Posts: 121

Original Poster
Rep: Reputation: 35
allend: Some guy called my phone and woke me up; I forgot to shut it off. I could not go back to sleep again. Sucks to be 51.
Petri: Thanks a bunch for the links. I was a bit worried that Dave had missed the fact that rc5 already had these issues. Now I know better.

Last edited by rogan; 01-30-2019 at 10:44 AM.
 
Old 01-30-2019, 06:25 PM   #13
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 17,509

Rep: Reputation: 2744
I can remember sitting in a conference discussing how the (very) latest -rc kernel broke a bunch of stuff. A fella sitting off to the side said he merged fixes for it as he was flying back to Aus, and had pushed out an updated -rc after he landed. Should be available.
Andrew Morton.
If he's involved, it's being looked at seriously.
 
Old 01-30-2019, 11:40 PM   #14
rogan
Member
 
Registered: Aug 2004
Distribution: Slackware
Posts: 121

Original Poster
Rep: Reputation: 35
It's a nasty performance/usability killing bug, and it's been hiding deep because of how fast hardware is these days.
I remember listening to a talk by Theo de Raadt, where he said they keep supporting all these old architectures because they help them discover both new and very old bugs.
I think this whole issue exemplifies that in a way.
 
1 members found this post helpful.
Old 02-05-2019, 09:49 AM   #15
rogan
Member
 
Registered: Aug 2004
Distribution: Slackware
Posts: 121

Original Poster
Rep: Reputation: 35
As I'm home on sick leave (influenza), I thought I'd fill you folks in on what has happened and is still happening.
If you would like to read the mail conversations, just follow the links posted by Petri above. Here's a short summary:

A commit made in 4.19-rc5, designed to increase cache pressure enough to get rid of stale cgroups, a sort of memory leak, proved to be a bit aggressive, or in DC's own wording "insane". From what I understand, a bit like cleaning up after the horses with a bulldozer...

The effect can be seen when filling the cache with large batches of continuous data: one second it appears full, the next, empty again. However, filling it with large sets of highly fragmented data, like a sizeable collection of kernel source trees, forced other things out, for instance the data of running compile jobs. That results in a tremendous amount of disk seeking and your computer becoming unresponsive, or locking up completely.
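
Since the underlying problem was inode reclaim, one thing that can be watched from userspace is the kernel's inode counters in /proc/sys/fs/inode-nr (two numbers: allocated inodes and free ones). The sketch below just prints them; whether the swings described above actually show up there is an assumption, not something this thread confirms, and the function name inode_stats is hypothetical.

```shell
#!/bin/sh
# Sketch: print allocated, free and in-use inode counts.
# Reads a file in the format of /proc/sys/fs/inode-nr: "<nr_inodes> <nr_free_inodes>".
inode_stats() {
    read -r nr_inodes nr_free < "${1:-/proc/sys/fs/inode-nr}" || return 1
    printf 'inodes allocated: %s, unused: %s, in use: %s\n' \
        "$nr_inodes" "$nr_free" "$((nr_inodes - nr_free))"
}
```

Usage, in a spare terminal while copying and compiling: while sleep 2; do inode_stats; done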

A commit made in 4.19.3 sought to remedy some side effects of the previous patch.
It was basically telling the bulldozer driver to avoid small houses, buses, cars...

The side effect of that "fix" was that filling up the cache with really large batches of continuous data forced important stuff out, and so on...

DC wants these "patches" reverted: horse owners should clean up their own goo.
The patch submitter claims it does the job.

If you don't mind a little bit of horse goo and want a normally behaving (pre-4.19.0) cache, you can apply the reverts that DC suggested. I have tested them on 5.0-{rc3,rc5} and 4.19.{18,19}. They work!

I made patches for both 4.19.19 and 5.0-rc5. They are really simple one-liners. Apply them using "patch [file to patch] [patch]", as these are normal diffs. Files to patch are mm/vmscan.c and fs/inode.c in the kernel source root directory.
Attached Files
File Type: txt rev-4.19-fs_inode.c.patch.txt (121 Bytes, 6 views)
File Type: txt rev-4.19-mm_vmscan.c.patch.txt (519 Bytes, 6 views)
File Type: txt rev-5rc5-fs_inode.c.patch.txt (291 Bytes, 0 views)
File Type: txt rev-5rc5-mm_vmscan.c.patch.txt (527 Bytes, 1 views)
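
Applying the attachments can be sketched as follows, with a dry run first so a mismatched kernel version is caught before anything is modified. This assumes GNU patch (for --dry-run); the function name apply_revert and the /usr/src/linux-4.19.19 path in the usage lines are assumptions, so substitute wherever your kernel source lives.

```shell
#!/bin/sh
# Sketch: apply one revert patch to one file, checking with a dry run first.
# Usage: apply_revert <kernel source dir> <file relative to that dir> <patch file>
apply_revert() {
    src="$1"; target="$2"; patchfile="$3"
    # --dry-run (GNU patch) checks whether the patch would apply, without touching the file
    patch --dry-run "$src/$target" "$patchfile" >/dev/null || {
        echo "patch does not apply cleanly to $target" >&2
        return 1
    }
    patch "$src/$target" "$patchfile"
}
```

Usage for the 4.19.19 pair:
apply_revert /usr/src/linux-4.19.19 fs/inode.c rev-4.19-fs_inode.c.patch.txt
apply_revert /usr/src/linux-4.19.19 mm/vmscan.c rev-4.19-mm_vmscan.c.patch.txt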
 
3 members found this post helpful.
  


All times are GMT -5. The time now is 02:33 PM.
