When you map a file, Linux is going to write your changes back to that file fairly quickly, because it knows that it is a file. Dirty pages sit in the page cache only briefly before the kernel's writeback threads flush them to disk, because, "that's what disk files are supposed to do."
If you are incurring very high disk I/O volumes ... which doesn't surprise me in the least ... then you need to change your algorithm; change your approach. Random-access disk I/O is very expensive because it requires two mechanical movements: the read/write head must seek to the right track, then the platter must rotate the right sector under it. At roughly ten milliseconds per random access, a million scattered writes is nearly three hours of pure mechanical delay. Milliseconds add up very fast.
One strategy might be to accumulate the updates that you intend to make, initially in an ordinary (non-mapped) buffer, then apply them to the memory-mapped file in ascending order of their position within the file. This way, the disk drive has far less work to do.
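For concreteness, here is a minimal C sketch of that batching idea. The record layout, payload size, and names (struct update, apply_batch, mapped_base) are all illustrative assumptions, not anything from a particular library:

[code]
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Illustrative update record: where in the file, and what goes there. */
struct update {
    off_t  offset;     /* byte position within the mapped file */
    size_t len;        /* number of payload bytes */
    char   data[64];   /* payload to write at that position */
};

/* Sort ascending by file offset, so the disk sees one sweep, not a scatter. */
static int cmp_offset(const void *a, const void *b)
{
    const struct update *ua = a, *ub = b;
    return (ua->offset > ub->offset) - (ua->offset < ub->offset);
}

/* Apply a whole batch of buffered updates to the mapped region in file order. */
void apply_batch(char *mapped_base, struct update *batch, size_t n)
{
    qsort(batch, n, sizeof batch[0], cmp_offset);
    for (size_t i = 0; i < n; i++)
        memcpy(mapped_base + batch[i].offset, batch[i].data, batch[i].len);
}
[/code]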
More generally, instead of mapping the entire file and treating it like one giant array, map a window into the file, apply every update that falls within that window, then move the window forward. Likewise, accumulate all of the changes destined for a single record before touching that record in the mapped file.
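A sketch of that windowing approach, again with illustrative assumptions: apply_updates_in is a hypothetical callback, and the 16 MiB window size is arbitrary:

[code]
#include <sys/mman.h>
#include <unistd.h>

/* Walk the file one window at a time instead of mapping it whole.
 * apply_updates_in() is a hypothetical callback that performs every
 * pending update whose offset falls in [win_off, win_off + win_len). */
int process_in_windows(int fd, off_t file_size,
        void (*apply_updates_in)(char *win, off_t win_off, size_t win_len))
{
    size_t win_len = (size_t)sysconf(_SC_PAGESIZE) * 4096;  /* 16 MiB on 4 KiB pages */

    for (off_t off = 0; off < file_size; off += win_len) {
        size_t len = win_len;
        if ((off_t)len > file_size - off)
            len = (size_t)(file_size - off);    /* final window may be short */

        char *win = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, off);
        if (win == MAP_FAILED)
            return -1;

        apply_updates_in(win, off, len);  /* everything for this window, then move on */

        msync(win, len, MS_ASYNC);        /* hand the dirty pages to the kernel */
        munmap(win, len);
    }
    return 0;
}
[/code]

Note that mmap() requires the file offset to be page-aligned, which is why the window length is kept a multiple of the page size.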
It is also possible to "go completely retro" and do it the way COBOL programmers did when they had no disks at all, only magnetic tape: if you apply a sorted file of updates to a sorted master file, you produce a new sorted version of that file ... and you do it with purely sequential I/O, which is extremely efficient.
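Assuming fixed-size records sorted on an integer key (purely illustrative), that tape-era master-file update looks like this in C; every file is read and written strictly sequentially, with no seeking at all:

[code]
#include <stdio.h>

struct rec { long key; char payload[56]; };   /* illustrative fixed-size record */

/* Merge a sorted master file with a sorted update file into a new,
 * still-sorted master: matching keys replace the old record, unmatched
 * updates are inserted in order, everything else is copied through. */
void merge_update(FILE *master, FILE *updates, FILE *out)
{
    struct rec m, u;
    int have_m = fread(&m, sizeof m, 1, master)  == 1;
    int have_u = fread(&u, sizeof u, 1, updates) == 1;

    while (have_m || have_u) {
        if (have_m && (!have_u || m.key < u.key)) {
            fwrite(&m, sizeof m, 1, out);     /* unchanged master record */
            have_m = fread(&m, sizeof m, 1, master) == 1;
        } else if (have_m && have_u && m.key == u.key) {
            fwrite(&u, sizeof u, 1, out);     /* update replaces the old record */
            have_m = fread(&m, sizeof m, 1, master)  == 1;
            have_u = fread(&u, sizeof u, 1, updates) == 1;
        } else {
            fwrite(&u, sizeof u, 1, out);     /* new record, inserted in key order */
            have_u = fread(&u, sizeof u, 1, updates) == 1;
        }
    }
}
[/code]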
In a famous-to-me program that I wrote (and sold!) a decade ago, I made an algorithm run more than one hundred times faster by applying these techniques.