LinuxQuestions.org - mmap is not faster than read/write ???

Hi

I am trying to use mmap for copying files. However, my mmap programs are not any faster than my ordinary write/read programs.

I am looping over a number of files, and for each I do:

fdin=open(filenamein,O_RDONLY);
fstat(fdin,&statstruct);
fdout=open(filenameout,O_RDWR|O_CREAT|O_TRUNC,S_IRUSR|S_IWUSR);

for the mmap case I do

lseek(fdout,statstruct.st_size-1,SEEK_SET);
write(fdout,"",1);
mapfrom=mmap(0,statstruct.st_size,PROT_READ,MAP_FILE|MAP_SHARED,fdin,0);
mapto=mmap(0,statstruct.st_size,PROT_READ|PROT_WRITE,MAP_FILE|MAP_SHARED,fdout,0);
memcpy(mapto,mapfrom,statstruct.st_size);
munmap(mapto,statstruct.st_size);
munmap(mapfrom,statstruct.st_size);

and with an allocated 'buffer' of 'buffersize', I also tried:
while ((n=read(fdin,buffer,buffersize)) > 0)
    write(fdout,buffer,n);

Compiling with gcc/linux 2.4.X, both versions are just as fast. Why don't I get a faster copy with mmap as promised in the books?
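
For reference, the two approaches described above could be written out with error checks roughly as follows. This is only a sketch, not the poster's actual program: the function names copy_rw and copy_mmap and the 64 KiB buffer size are assumptions made for illustration, and ftruncate() is swapped in for the lseek()/write() trick to size the output file.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy src to dst with a read()/write() loop. Returns 0 on success, -1 on error. */
static int copy_rw(const char *src, const char *dst)
{
    static char buf[1 << 16];          /* 64 KiB; the size is an arbitrary choice */
    ssize_t n;
    int rc = -1;
    int fdin = open(src, O_RDONLY);
    if (fdin < 0)
        return -1;
    int fdout = open(dst, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fdout < 0) {
        close(fdin);
        return -1;
    }
    while ((n = read(fdin, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {              /* write() may write fewer bytes than asked */
            ssize_t w = write(fdout, buf + off, (size_t)(n - off));
            if (w < 0)
                goto out;
            off += w;
        }
    }
    if (n == 0)                        /* 0 means end of file, -1 means error */
        rc = 0;
out:
    close(fdin);
    if (close(fdout) < 0)
        rc = -1;
    return rc;
}

/* Copy src to dst by mapping both files and calling memcpy(). */
static int copy_mmap(const char *src, const char *dst)
{
    struct stat st;
    void *from, *to;
    size_t len;
    int rc = -1;
    int fdin = open(src, O_RDONLY);
    if (fdin < 0)
        return -1;
    /* O_RDWR is required for a writable MAP_SHARED mapping */
    int fdout = open(dst, O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fdout < 0) {
        close(fdin);
        return -1;
    }
    if (fstat(fdin, &st) < 0)
        goto out;
    if (st.st_size == 0) {             /* mmap() rejects a zero length */
        rc = 0;
        goto out;
    }
    len = (size_t)st.st_size;
    if (ftruncate(fdout, st.st_size) < 0)   /* size the output before mapping it */
        goto out;
    from = mmap(NULL, len, PROT_READ, MAP_SHARED, fdin, 0);
    if (from == MAP_FAILED)
        goto out;
    to = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fdout, 0);
    if (to != MAP_FAILED) {
        memcpy(to, from, len);
        rc = munmap(to, len);
    }
    munmap(from, len);
out:
    close(fdin);
    if (close(fdout) < 0)
        rc = -1;
    return rc;
}
```

Calling copy_rw("in.dat", "out.dat") or copy_mmap("in.dat", "out.dat") (file names made up) returns 0 when the copy succeeded. Either way every byte of the source still has to come off the disk once.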

Memory-mapped files push the responsibility for requesting I/O to the virtual-memory manager. However, the ultimate speed of the operation rests upon but one thing: was a physical I/O request necessary, or not?

If the answer is "yes," then of course there will be no appreciable speed difference. If, on the other hand, you are making many random requests for data within a particular window of a file, memory-mapping can help significantly, because it leverages the already highly-optimized algorithms of the virtual memory manager.

I think mmap is more for ease of use than speed. That's why I use it anyway.

Just because you mmap'd a file doesn't mean it's been read into memory.
It will still just be paged in when you access it. I imagine it will read it in page-sized chunks,
just as read does. You can try adjusting the size of the read buffer or comparing small and large files. If you stat a file it shows the preferred block size (st_blksize). Usually about 4096 bytes I think, or
the size of a memory page.

Of course I could be wrong, or you could play about with this:
http://pubs.opengroup.org/onlinepubs...x_madvise.html
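
To make the two suggestions above concrete, here is a small sketch that prints the preferred block size next to the page size, and maps a file with a sequential-access hint through posix_madvise(). The helper names print_io_sizes and map_sequential are made up for this example. The hint is advisory only, so whether it changes anything depends on the kernel and on whether the data is already cached.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Print the block size stat() reports for a file next to the VM page size. */
static void print_io_sizes(const char *path)
{
    struct stat st;
    if (stat(path, &st) == 0)
        printf("st_blksize: %ld bytes, page size: %ld bytes\n",
               (long)st.st_blksize, sysconf(_SC_PAGESIZE));
}

/* Map a file read-only and hint that it will be read front to back.
   Returns the mapping and stores its length in *len, or returns MAP_FAILED. */
static void *map_sequential(const char *path, size_t *len)
{
    struct stat st;
    void *p = MAP_FAILED;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        *len = (size_t)st.st_size;
        p = mmap(NULL, *len, PROT_READ, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED)           /* advisory: the kernel may ignore it */
            (void)posix_madvise(p, *len, POSIX_MADV_SEQUENTIAL);
    }
    close(fd);                         /* the mapping stays valid after close() */
    return p;
}
```

The caller is expected to munmap() the returned pointer with the length stored in *len once it is done with the data.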

That's exactly how it works. The virtual-memory system can "page in" from any source, not just the paging file. (This is used for example when managing modules and libraries of various kinds.) When you touch a portion of this mapped memory, a page-fault occurs and it is resolved by the OS from the specified file. It can be very efficient especially when many processes need to hit the same file because the copies can readily be shared using well-developed OS code. But, "fast" is entirely dependent on whether or not the data is present. If it's not, then a disk read is going to take place (as it would also take place with any other form of file I/O), and you're going to pay more or less the same price for the privilege. Under the right set of circumstances, for which it was designed, mmap() is the cat's meow. In other circumstances it is nondescript.

There is a bit more to it than I thought.
I had the code straight out of Stevens' Unix programming book. He used it for one single file. mmap is faster when used on one file, so I thought, well...
But for simply copying a list of files it doesn't seem to help because, I now understand, each file's contents are certain to be read from disk anyway.

It is a useful tool. Part of programming is to let the OS do the work for you.

It is good for strictly structured files of records.
e.g:
If you have a load of floating-point values in a file and you mmap them, you have an instant array.
No messing about with malloc and all that nonsense. Less chance of error.
Or if you are operating on a file, say encoding it, mmap it and you have a convenient giant char string.
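
A minimal sketch of the "instant array" idea, assuming the file holds raw float values written on the same machine (same byte order and float format). The name map_floats is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Map a file of raw floats read-only. Returns a pointer to the values and
   stores how many there are in *count, or returns NULL on any failure. */
static const float *map_floats(const char *path, size_t *count)
{
    struct stat st;
    const float *v = NULL;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    /* Only accept non-empty files whose size is a whole number of floats. */
    if (fstat(fd, &st) == 0 && st.st_size > 0 &&
        st.st_size % (off_t)sizeof(float) == 0) {
        size_t len = (size_t)st.st_size;
        void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {         /* page-aligned, so safe to read as float */
            v = p;
            *count = len / sizeof(float);
        }
    }
    close(fd);
    return v;
}
```

After a successful call, v[0] through v[count - 1] can be read like any other array, and munmap((void *)v, count * sizeof(float)) releases the mapping.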

Think of it more as saving programming time than processing time.
That is much more valuable.

Hint: if you want to extend an mmap'd file, you first need to grow the file itself (seek past the end and write a byte there) to establish the new size, and only then write the new data back at the append position.

Quote:

Originally Posted by muggabug (Post 4614466)

lseek(fdout,statstruct.st_size-1,SEEK_SET);
write(fdout,"",1);
mapfrom=mmap(0,statstruct.st_size,PROT_READ,MAP_FILE|MAP_SHARED,fdin,0);
mapto=mmap(0,statstruct.st_size,PROT_READ|PROT_WRITE,MAP_FILE|MAP_SHARED,fdout,0);
memcpy(mapto,mapfrom,statstruct.st_size);
munmap(mapto,statstruct.st_size);
munmap(mapfrom,statstruct.st_size);

and with an allocated 'buffer' of 'buffersize', I also tried:
while ((n=read(fdin,buffer,buffersize)) > 0)
    write(fdout,buffer,n);

Compiling with gcc/linux 2.4.X, both versions are just as fast. Why don't I get a faster copy with mmap as promised in the books?

Hi, muggabug,

The problem with your code is that using lseek() to set the required end of file and then mmap() and memcpy() into it does not produce a nicely laid-out file on the physical disk. If you are not on an SSD, your memcpy() will cause serious movement of the disk's head. In fact, if you had copied first to an in-memory buffer and then to the destination memory area, you might have gotten a better time.

This is the advantage of the read()/write() example: you read into memory while the disk head stays over the source file, then you write while it stays over the destination file. There is no head walking back and forth as in the mmap()-to-mmap() example.

Extending the destination file with lseek() is bound to create disk fragmentation, and you may get a better result with mmap() instead of read() on the source file and plain write() on the output. With that approach I saw a 50% speedup over open() + read() in a large mailbox example.

Hope this helps.

Rgdz,
mbarley42
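
The variant suggested in the post above (mmap() on the source, plain write() on the output) might look like the sketch below. It is not the poster's mmap-cpy program, which isn't shown in the thread, and the name copy_mmap_write is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy src to dst: map the source, then write() straight from the mapping.
   Returns 0 on success, -1 on error. */
static int copy_mmap_write(const char *src, const char *dst)
{
    struct stat st;
    char *from;
    size_t len, off = 0;
    int rc = -1;
    int fdin = open(src, O_RDONLY);
    if (fdin < 0)
        return -1;
    int fdout = open(dst, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fdout < 0) {
        close(fdin);
        return -1;
    }
    if (fstat(fdin, &st) < 0)
        goto out;
    if (st.st_size == 0) {             /* nothing to map or copy */
        rc = 0;
        goto out;
    }
    len = (size_t)st.st_size;
    from = mmap(NULL, len, PROT_READ, MAP_SHARED, fdin, 0);
    if (from == MAP_FAILED)
        goto out;
    while (off < len) {                /* write() may be partial, so loop */
        ssize_t w = write(fdout, from + off, len - off);
        if (w < 0)
            break;
        off += (size_t)w;
    }
    if (off == len)
        rc = 0;
    munmap(from, len);
out:
    close(fdin);
    if (close(fdout) < 0)
        rc = -1;
    return rc;
}
```

Because the output goes through write(), the destination file grows sequentially and does not need to be pre-sized with lseek().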

In the following example, the results are pretty disappointing for mmap() with lseek() to end of file and memcpy(). mmap() with a buffer roughly compares to a plain read() in one huge chunk.

Code:

mtodorov@domac:~/c$ time ./mmap-cpy --read /var/mail/mtodorov m1
real    0m46.972s
user    0m0.000s
sys     0m2.580s

mtodorov@domac:~/c$ time ./mmap-cpy --mmap /var/mail/mtodorov m1
real    1m6.064s
user    0m0.320s
sys     0m1.400s

mtodorov@domac:~/c$ time ./mmap-cpy --mmap+buffer /var/mail/mtodorov m1
real    0m47.748s
user    0m0.632s
sys     0m1.600s

I have found in my travails that seeking a lot is very expensive.

Have you tried using madvise?

Note: mmap() is Unix-specific (it is a POSIX call and is not available natively on Windows), so if you want to develop multiplatform programs, don't rely on it directly.

Quote:

Originally Posted by bigearsbilly (Post 4818432)

I have found in my travails that using a seek a lot is very expensive.

Have you tried using madvise

In the end you are defeated by disk speed, even if you use the Linux-specific sendfile(2). The disk head just doesn't go any faster, and all four methods spend 2.0 to 2.5 seconds doing work and 51.0s to 1m06s waiting on the disk. Especially writing.

Rgdz,
mbarley
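
For completeness, a file copy with the sendfile(2) call mentioned above could be sketched like this. It is Linux-only, and using a regular file as the output descriptor is only supported on reasonably recent kernels (2.6.33 or later, per the man page), so it would not have worked on the 2.4 kernel from the original question. The name copy_sendfile is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy src to dst inside the kernel with sendfile().
   Returns 0 on success, -1 on error. */
static int copy_sendfile(const char *src, const char *dst)
{
    struct stat st;
    off_t left;
    int rc = -1;
    int fdin = open(src, O_RDONLY);
    if (fdin < 0)
        return -1;
    int fdout = open(dst, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fdout < 0) {
        close(fdin);
        return -1;
    }
    if (fstat(fdin, &st) < 0)
        goto out;
    left = st.st_size;
    while (left > 0) {
        /* NULL offset: sendfile() reads from fdin's current file position
           and advances it. One call may transfer less than requested. */
        ssize_t s = sendfile(fdout, fdin, NULL, (size_t)left);
        if (s <= 0)
            break;
        left -= s;
    }
    if (left == 0)
        rc = 0;
out:
    close(fdin);
    if (close(fdout) < 0)
        rc = -1;
    return rc;
}
```

This avoids copying the data through a user-space buffer, but as the post says, it cannot make the disk itself any faster.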