Programming: This forum is for all programming questions. The question does not have to be directly related to Linux and any language is fair game.
Memory-mapped files push the responsibility for requesting I/O to the virtual-memory manager. However, the ultimate speed of the operation rests upon one thing: was a physical I/O request necessary, or not?
If the answer is "yes," then of course there will be no appreciable speed difference. If, on the other hand, you are making many random requests for data within a particular window of a file, memory-mapping can help significantly, because it leverages the already highly optimized algorithms of the virtual-memory manager.
Last edited by sundialsvcs; 02-28-2012 at 04:14 PM.
I think mmap is more for ease of use than speed. That's why I use it anyway.
Just because you mmap'd a file doesn't mean it's been read into memory.
It will still just page it in when you access it. I imagine it reads in page-sized chunks, just as read() does. You can try adjusting the size of the read buffer, or comparing small and large files. If you stat a file it shows the preferred block size, usually 4096 bytes I think, the size of a memory page.
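The preferred block size mentioned here comes from the st_blksize field reported by stat(2). A minimal sketch of checking it (the function name is mine):

```c
#include <stdio.h>
#include <sys/stat.h>

/* Report the filesystem's preferred I/O block size for a file, as
 * given by stat(2) in st_blksize.  On Linux this is typically 4096
 * bytes, i.e. one memory page.  Returns the size, or -1 on error. */
long preferred_blocksize(const char *path)
{
    struct stat sb;
    if (stat(path, &sb) == -1) {
        perror("stat");
        return -1;
    }
    return (long)sb.st_blksize;
}
```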
That's exactly how it works. The virtual-memory system can "page in" from any source, not just the paging file. (This is used, for example, when managing modules and libraries of various kinds.) When you touch a portion of this mapped memory, a page fault occurs and is resolved by the OS from the specified file. It can be very efficient, especially when many processes need to hit the same file, because the copies can readily be shared using well-developed OS code.
But "fast" is entirely dependent on whether or not the data is present. If it's not, then a disk read is going to take place (as it would with any other form of file I/O), and you're going to pay more or less the same price for the privilege. Under the right set of circumstances, for which it was designed, mmap() is the cat's meow. In other circumstances it is nondescript.
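A minimal sketch of the mechanism described above: map a file read-only, and nothing is read at mmap() time; the first access page-faults and the kernel resolves it from the file (function name and error handling are my own):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file read-only and return its first byte.  mmap() itself
 * reads nothing from disk; the access p[0] below page-faults, and
 * the kernel resolves the fault from the file.  Returns -1 on error
 * or empty file. */
int first_byte(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;

    struct stat sb;
    if (fstat(fd, &sb) == -1 || sb.st_size == 0) { close(fd); return -1; }

    char *p = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid */
    if (p == MAP_FAILED) return -1;

    int b = (unsigned char)p[0];     /* first touch: page fault here */
    munmap(p, sb.st_size);
    return b;
}
```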
There is a bit more to it than I thought.
I had the code straight out of Stevens' Unix programming book. He used it for one single file. mmap is faster when used on one file, so I thought, well...
But for simply copying a list of files it doesn't seem to help because, I now understand, for each file it is certain its contents will be read.
It is a useful tool. Part of programming is to let the OS do the work for you.
It is good for strictly structured files of records.
If you have a load of floating-point values in a file, mmap it and you have an instant array.
No messing about with malloc and all that nonsense. Less chance of error.
Or if you are operating on a file, say encoding it, mmap it and you have a convenient giant char string.
Think of it more as saving programming time than processing time.
That's much more valuable.
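A sketch of the "instant array" idea, assuming the file holds raw doubles written on the same architecture (the helper name is mine; it sums the values just to show ordinary indexing):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file of raw doubles and treat it as an array: data[i] works
 * with no malloc and no read loop.  Returns the sum of all values;
 * *count_out gets the element count.  (Assumes the file was written
 * with the same double representation and endianness.) */
double sum_doubles(const char *path, size_t *count_out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) return 0.0;

    struct stat sb;
    if (fstat(fd, &sb) == -1 || sb.st_size == 0) { close(fd); return 0.0; }

    double *data = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (data == MAP_FAILED) return 0.0;

    size_t n = sb.st_size / sizeof(double);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];              /* ordinary array indexing */

    munmap(data, sb.st_size);
    if (count_out) *count_out = n;
    return sum;
}
```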
Hint: if you extend an mmap'd file, you will need to seek past the end first to establish the new size, then write back at the append position.
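A hedged sketch of appending through a mapping. I use ftruncate(2) to establish the new size, which does the same job as seeking past the end and writing; the key point either way is that the file must be grown before the mapping covers the new bytes, or the write raises SIGBUS (function name is mine):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Append `len` bytes to `path` through a writable mapping.  The file
 * is grown with ftruncate(2) *before* mapping, so the bytes past the
 * old EOF are backed by the file.  Returns 0 on success, -1 on error. */
int mmap_append(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1) return -1;

    off_t old_size = lseek(fd, 0, SEEK_END);
    if (ftruncate(fd, old_size + (off_t)len) == -1) { close(fd); return -1; }

    char *p = mmap(NULL, old_size + len, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED) return -1;

    memcpy(p + old_size, buf, len);  /* write at the append position */
    munmap(p, old_size + len);
    return 0;
}
```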
Last edited by bigearsbilly; 03-05-2012 at 02:13 AM.
and with an allocated 'buffer' of 'buffersize', I also tried:
Compiling with gcc on Linux 2.4.x, both versions are just as fast. Why don't I get a faster copy with mmap, as promised in the books?
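The original code did not survive in the thread, but the two versions being compared presumably looked something like this (function names and buffer handling are my own reconstruction, not the poster's code):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Version 1: plain read()/write() copy with a fixed-size buffer. */
int copy_readwrite(const char *src, const char *dst, size_t bufsize)
{
    int in = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in == -1 || out == -1) return -1;

    char buf[65536];
    if (bufsize > sizeof(buf)) bufsize = sizeof(buf);
    ssize_t n;
    while ((n = read(in, buf, bufsize)) > 0)
        if (write(out, buf, (size_t)n) != n) { n = -1; break; }
    close(in); close(out);
    return n == 0 ? 0 : -1;
}

/* Version 2: mmap() both files and memcpy().  The destination is
 * grown first (here with ftruncate) so the writable mapping is
 * fully backed by the file. */
int copy_mmap(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    int out = open(dst, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (in == -1 || out == -1) return -1;

    struct stat sb;
    if (fstat(in, &sb) == -1 || sb.st_size == 0) return -1;
    if (ftruncate(out, sb.st_size) == -1) return -1;

    char *s = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, in, 0);
    char *d = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, out, 0);
    close(in); close(out);
    if (s == MAP_FAILED || d == MAP_FAILED) return -1;

    memcpy(d, s, sb.st_size);        /* the whole copy in one call */
    munmap(s, sb.st_size);
    munmap(d, sb.st_size);
    return 0;
}
```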
The problem with your code is that lseek() to the required end of file, and then mmap() and memcpy() into it, does not create a really nice file on the physical disk. If you are not on an SSD, your memcpy() will cause serious walking of the disk's head. In fact, if you copied first to an in-memory buffer, and then to the destination memory area, you might have gotten a better time.
This is the advantage of the read()/write() example: you read into memory holding the disk head over the source file, then you write holding the disk head over the destination file; there's no head walking like in the mmap()-to-mmap() example.
Extending the destination file with lseek() is bound to create disk fragmentation, and you may find a better result with mmap() or read() on the source file and a plain write() on the output. In such an example I saw a 50% speedup over open() + read() on a large mailbox.
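A sketch of the suggested hybrid: map the source read-only and stream it out with plain write(), so the destination grows sequentially and there is no writable mapping at all (the function name is mine):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy by mapping the source and write()-ing it to the destination.
 * The destination is written sequentially, so there is no
 * lseek()-induced fragmentation.  Returns 0 on success, -1 on error. */
int copy_mmap_write(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in == -1 || out == -1) return -1;

    struct stat sb;
    if (fstat(in, &sb) == -1 || sb.st_size == 0) return -1;

    char *s = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, in, 0);
    close(in);
    if (s == MAP_FAILED) { close(out); return -1; }

    ssize_t left = sb.st_size;
    char *p = s;
    while (left > 0) {               /* write() may be partial */
        ssize_t n = write(out, p, (size_t)left);
        if (n <= 0) break;
        p += n;
        left -= n;
    }
    munmap(s, sb.st_size);
    close(out);
    return left == 0 ? 0 : -1;
}
```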
In the following example, results are pretty disappointing for mmap() with lseek() to the end of file and memcpy(). mmap() with a buffer roughly compares to a plain read() in huge chunks.
mtodorov@domac:~/c$ time ./mmap-cpy --read /var/mail/mtodorov m1
mtodorov@domac:~/c$ time ./mmap-cpy --mmap /var/mail/mtodorov m1
mtodorov@domac:~/c$ time ./mmap-cpy --mmap+buffer /var/mail/mtodorov m1
I have found in my travails that seeking a lot is very expensive.
Have you tried using madvise()?
In the end you are defeated by disk speed even if you use the Linux-specific sendfile(2). The disk head just doesn't go any faster, and all four methods spend 2.000 to 2.500 seconds of work and 51.0s to 1m06s waiting on the disk. Especially writing.
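A minimal sendfile(2) copy loop for comparison with the other methods. It is Linux-specific, and the data never crosses into userspace, which saves CPU and copies but, as noted above, cannot make the disk head move any faster (file-to-file sendfile needs a reasonably modern kernel; the function name is mine):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy a file with sendfile(2): the kernel moves the data directly
 * between the two file descriptors.  Returns 0 on success, -1 on error. */
int copy_sendfile(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in == -1 || out == -1) return -1;

    struct stat sb;
    if (fstat(in, &sb) == -1) return -1;

    off_t left = sb.st_size;
    while (left > 0) {               /* sendfile() may be partial */
        ssize_t n = sendfile(out, in, NULL, (size_t)left);
        if (n <= 0) break;
        left -= n;
    }
    close(in);
    close(out);
    return left == 0 ? 0 : -1;
}
```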