Linux Kernel: page reclaim priority?

DonDoerner · 04-17-2009, 07:39 PM

I am using a Linux 2.6.19 kernel and seeing some page reclamation behavior that I do not understand.

It appears that if I write a lot of data to an EXT3 file system, I get a lot of dirty pages. This is expected. It also appears that if I simultaneously need memory, the system swaps before flushing and reclaiming dirty pages. I did not expect this.

I have never worked in a Linux kernel, and have not worked in any other kernel for a few years, so some answers/pointers would be appreciated...

When a system runs into significant memory pressure, from where does Linux get pages, and in what order?
Is the information in /proc/meminfo accurate in real time? Is the polling mode of vmstat accurate in real time?
If I wanted to go look at the code that implements this, in what kernel file or files would I find the code?

Thanks very much.

sundialsvcs · 04-17-2009, 08:07 PM

You are probably seeing the "write through" behavior of the buffer pool. Normally, there's a surplus of memory so material goes to the pool and is "lazily" written to disk... but not immediately flushed. When memory contention appears, it can't afford to be lazy anymore.

DonDoerner · 04-17-2009, 08:17 PM

But would it swap before it pushed dirty pages out to files? Short term it might not make much (any?) difference, but long term whatever is swapped out has to be swapped back in. That's what is confusing me...

DonDoerner · 04-17-2009, 08:18 PM

And, BTW, thank you.

syg00 · 04-17-2009, 09:47 PM

Linux desktops are biased toward keeping cache in memory - at the expense of potentially swapping out some applications' anonymous memory.
There are a few (crude) controls to alter this. Be aware that what you perceive is not necessarily the whole story - you see the end result, not the decision-making processes (plural).
The most obvious knob is "swappiness" - this is a recommendation to the system about how you want things managed. 100 says always prefer to keep cache in memory - 0 (zero) says always try to minimize swapping. The default is 60, which you can check with

Code:

cat /proc/sys/vm/swappiness

You can adjust its behaviour on the fly by echoing a new value to that control - maybe try 30 ... or 10 ... or 0. Depending on your distro, the setting can be made permanent via sysctl.
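
A minimal sketch of that adjustment, offered as an illustration rather than as part of the original post. The helper name `set_swappiness` and its optional second argument (a scratch file to try it on without root) are assumptions made for the example. The 0-100 range is the one described above for kernels of this era.

```shell
#!/bin/sh
# set_swappiness is a hypothetical helper, not a standard tool.
# $1 = new value, $2 = control file (defaults to the real one).
set_swappiness() {
    value=$1
    target=${2:-/proc/sys/vm/swappiness}
    # Accept only whole numbers; reject empty or non-numeric input.
    case $value in
        ''|*[!0-9]*) echo "swappiness must be an integer 0-100" >&2; return 1 ;;
    esac
    # Assumed upper bound of 100, matching the range discussed in this thread.
    if [ "$value" -gt 100 ]; then
        echo "swappiness must be an integer 0-100" >&2
        return 1
    fi
    # Writing the real control file requires root.
    echo "$value" > "$target"
}

# Typical use (as root):
#   set_swappiness 30
#   cat /proc/sys/vm/swappiness
# The same change through sysctl:
#   sysctl -w vm.swappiness=30
# To keep it across reboots, add this line to /etc/sysctl.conf:
#   vm.swappiness = 30
```

A value set with echo or `sysctl -w` lasts only until the next reboot, which makes it a low-risk way to experiment before committing anything to `/etc/sysctl.conf`.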
In addition to this, dirty file writes are managed by the I/O schedulers - they use different algorithms to consolidate/optimize I/O. This includes delaying writes - for some seconds in some cases. There are individual controls per scheduler, and the scheduler itself can be changed.
Then there is the specific filesystem itself, and its block-level driver.
And any hardware (RAID) or software (LVM, mdadm ...) that happens to interpose itself.

All potentially affect the rate at which I/O is (physically) written, and consequently how fast storage (file cache in this case) can be released to the free pool.
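
As a rough illustration of the scheduler controls mentioned above (not from the original post): on 2.6 kernels the scheduler for each block device is exposed under `/sys/block/<device>/queue/scheduler`, with the active one shown in square brackets. The helper name `active_scheduler` and the device name `sda` are assumptions for the example.

```shell
#!/bin/sh
# active_scheduler is a hypothetical helper: it prints the bracketed
# (currently active) entry from a scheduler line such as
# "noop anticipatory deadline [cfq]".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Typical use; sda is an assumed device name:
#   active_scheduler "$(cat /sys/block/sda/queue/scheduler)"
# Switching the scheduler for that device (as root):
#   echo deadline > /sys/block/sda/queue/scheduler
# Related writeback thresholds and timers live under /proc/sys/vm:
#   cat /proc/sys/vm/dirty_background_ratio /proc/sys/vm/dirty_ratio
#   cat /proc/sys/vm/dirty_expire_centisecs /proc/sys/vm/dirty_writeback_centisecs
```

The `dirty_*` files are worth a look alongside the scheduler, since they control how much dirty data may accumulate and how long it may sit before background writeback starts.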

There are also some "hidden" issues with memory allocation - the slab allocator has had some serious work in recent kernels. Even just upgrading from your current kernel version might help.

As you can see, this is not a trivial "where can I go to look at the code" sort of thing. You will be all over the place.
/proc is a window into kernel structures - it is always current at the instant it is read, as the data only exists if (when) it is read. vmstat reads /proc "files", so the same applies, although it does some averaging.
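
One way to watch this in practice, sketched here as an illustration rather than as part of the original post: sample the `Dirty`, `Writeback` and `SwapFree` lines of `/proc/meminfo` while the write load runs. The helper name `meminfo_kb` is an assumption for the example. It reads meminfo-style text on standard input, so it can also be tried on a saved copy.

```shell
#!/bin/sh
# meminfo_kb is a hypothetical helper: print the numeric value (in kB)
# of one field from /proc/meminfo-style text read on standard input.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# Typical use: sample the counters once a second while the workload runs.
#   while sleep 1; do
#       for f in Dirty Writeback SwapFree; do
#           printf '%s=%s ' "$f" "$(meminfo_kb "$f" < /proc/meminfo)"
#       done
#       echo
#   done
# "vmstat 1" gives a comparable rolling view; its si/so columns show
# swap-in and swap-out activity per interval.
```

Note that the first line printed by `vmstat` reports averages since boot. Only the later lines reflect the sampling interval.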

DonDoerner · 04-18-2009, 01:31 PM

Thanks very much - time to go play!

Final question: where is a good place to read about stuff like "swappiness"? Incanting 'man 5 proc' is a bit of help, but is there a good URL to read up on this?

syg00 · 04-19-2009, 12:29 AM

google.com/linux - first hit was http://kerneltrap.org/node/3000
Any number of Linux news sites or blogs - I usually avoid blogs like the plague, but some are very good for things like this. Now if you don't happen to know what to look for, you'll have to subscribe to one (or more), or lkml to stay up to date.

Edit: should have also mentioned this - I have sat in on Jonathan Corbet talking on kernel issues, and he is very good.