When you map a file, Linux is going to write your changes back to that file fairly quickly, because it knows that it is a file. Dirty pages sit in the page cache only briefly before the kernel's writeback threads flush them to disk, because, "that's what disk files are supposed to do."
If you are incurring very high disk I/O volumes ... which doesn't surprise me in the least ... then you need to change your algorithm; change your approach. Random-access disk I/O is very expensive because it requires two mechanical movements: the read/write head must seek to the right track, then the platter must rotate the right sector under it. At roughly ten milliseconds per random access, a million scattered writes is nearly three hours of pure mechanical delay. Milliseconds add up very fast.
One strategy might be to accumulate the updates that you intend to make, initially in an ordinary (non-mapped) buffer, then apply them to the memory-mapped file in ascending order of their position within the file. This way, the disk drive has far less work to do.
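For concreteness, here is a minimal C sketch of that batching idea. The record layout, payload size, and names (struct update, apply_batch, mapped_base) are all illustrative assumptions, not anything from a particular library:

[code]
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Illustrative update record: where in the file, and what goes there. */
struct update {
    off_t  offset;     /* byte position within the mapped file */
    size_t len;        /* number of payload bytes */
    char   data[64];   /* payload to write at that position */
};

/* Sort ascending by file offset, so the disk sees one sweep, not a scatter. */
static int cmp_offset(const void *a, const void *b)
{
    const struct update *ua = a, *ub = b;
    return (ua->offset > ub->offset) - (ua->offset < ub->offset);
}

/* Apply a whole batch of buffered updates to the mapped region in file order. */
void apply_batch(char *mapped_base, struct update *batch, size_t n)
{
    qsort(batch, n, sizeof batch[0], cmp_offset);
    for (size_t i = 0; i < n; i++)
        memcpy(mapped_base + batch[i].offset, batch[i].data, batch[i].len);
}
[/code]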
More generally, instead of mapping the entire file and treating it like one giant array, map a window into the file, apply every update that falls within that window, then move the window forward. Likewise, accumulate all of the changes destined for a single record before touching that record in the mapped file.
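A sketch of that windowing approach, again with illustrative assumptions: apply_updates_in is a hypothetical callback, and the 16 MiB window size is arbitrary:

[code]
#include <sys/mman.h>
#include <unistd.h>

/* Walk the file one window at a time instead of mapping it whole.
 * apply_updates_in() is a hypothetical callback that performs every
 * pending update whose offset falls in [win_off, win_off + win_len). */
int process_in_windows(int fd, off_t file_size,
        void (*apply_updates_in)(char *win, off_t win_off, size_t win_len))
{
    size_t win_len = (size_t)sysconf(_SC_PAGESIZE) * 4096;  /* 16 MiB on 4 KiB pages */

    for (off_t off = 0; off < file_size; off += win_len) {
        size_t len = win_len;
        if ((off_t)len > file_size - off)
            len = (size_t)(file_size - off);    /* final window may be short */

        char *win = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, off);
        if (win == MAP_FAILED)
            return -1;

        apply_updates_in(win, off, len);  /* everything for this window, then move on */

        msync(win, len, MS_ASYNC);        /* hand the dirty pages to the kernel */
        munmap(win, len);
    }
    return 0;
}
[/code]

Note that mmap() requires the file offset to be page-aligned, which is why the window length is kept a multiple of the page size.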
It is also possible to "go completely retro" and do it the way COBOL programmers did when they had no disks at all, only magnetic tape: if you apply a sorted file of updates to a sorted master file, you produce a new sorted version of that file ... and you do it with purely sequential I/O, which is extremely efficient.
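Assuming fixed-size records sorted on an integer key (purely illustrative), that tape-era master-file update looks like this in C; every file is read and written strictly sequentially, with no seeking at all:

[code]
#include <stdio.h>

struct rec { long key; char payload[56]; };   /* illustrative fixed-size record */

/* Merge a sorted master file with a sorted update file into a new,
 * still-sorted master: matching keys replace the old record, unmatched
 * updates are inserted in order, everything else is copied through. */
void merge_update(FILE *master, FILE *updates, FILE *out)
{
    struct rec m, u;
    int have_m = fread(&m, sizeof m, 1, master)  == 1;
    int have_u = fread(&u, sizeof u, 1, updates) == 1;

    while (have_m || have_u) {
        if (have_m && (!have_u || m.key < u.key)) {
            fwrite(&m, sizeof m, 1, out);     /* unchanged master record */
            have_m = fread(&m, sizeof m, 1, master) == 1;
        } else if (have_m && have_u && m.key == u.key) {
            fwrite(&u, sizeof u, 1, out);     /* update replaces the old record */
            have_m = fread(&m, sizeof m, 1, master)  == 1;
            have_u = fread(&u, sizeof u, 1, updates) == 1;
        } else {
            fwrite(&u, sizeof u, 1, out);     /* new record, inserted in key order */
            have_u = fread(&u, sizeof u, 1, updates) == 1;
        }
    }
}
[/code]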
In a famous-to-me program that I wrote (and sold!) a decade ago, I made an algorithm run more than one hundred times faster by applying these techniques.