mmap() error with length greater than 4G

bayoulinux · 06-11-2012, 12:16 PM

Hello:

Is there a way from user space to invoke mmap() with a length greater than 4G?

I tried playing around with the MAP_NORESERVE mmap() flag as well as /proc/sys/vm/overcommit_memory, without success.

BASH$ ulimit -u -v
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited

Using Fedora 15 with kernel 2.6.42.

Thanks!

NevemTeve · 06-11-2012, 12:28 PM

Is this a 64-bit system?

bayoulinux · 06-11-2012, 12:41 PM

Yes, it's 64-bit... sorry, left that out.

Nominal Animal · 06-11-2012, 02:31 PM

I cannot replicate your problem on a kernel.org 3.3.2 kernel. In fact, I tried with a 4395903221760-byte sparse file (4094 GiB + 4 MiB, so just under 4 TiB) consisting mostly of unallocated zeros, created using

Code:

dd if=/dev/urandom of=test bs=1024 count=4096 seek=4292870144

and I could map it fine using

Code:

map = mmap(NULL, size, PROT_READ, MAP_SHARED | MAP_NORESERVE, descriptor, (off_t)0);

even with a misaligned size (not a multiple of sysconf(_SC_PAGE_SIZE)). I did read it here and there, getting zeros (except in the last 4MB at the end), exactly as one would expect. The mapping was too large for me to wait for a full scan.
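For reference, a fuller version of that call might look like the sketch below. This is not the actual test program; map_whole_file is a hypothetical helper name, and the sketch assumes a 64-bit Linux system where both size_t and off_t are 64 bits wide.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* makes glibc headers expose MAP_NORESERVE */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map an entire file read-only.
   Returns the mapping and stores its length in *size_out,
   or returns NULL on any failure. */
static const unsigned char *map_whole_file(const char *path, size_t *size_out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    /* Keep the length in a size_t so values above 4 GiB are not truncated. */
    size_t size = (size_t)st.st_size;
    void *map = mmap(NULL, size, PROT_READ,
                     MAP_SHARED | MAP_NORESERVE, fd, (off_t)0);
    close(fd);            /* the mapping remains valid after close() */
    if (map == MAP_FAILED)
        return NULL;

    *size_out = size;
    return map;
}
```

The caller is expected to munmap() the returned pointer with the same size once it is done with the data.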

I also tested with a smaller file (4GB + 2MB) created the same way, and it worked just as well. No issues here.

Are you sure your size variable is of correct type (size_t)? Perhaps your size is of unsigned int type?

On the other hand, Fedora and Red Hat developers are known to break stuff. If you show the offending code, I'd be happy to test it on my machine. If you don't want to show a test case which replicates the problem on your machine, I'd say you should file a bug in the Fedora bugzilla, because there is no such issue on a kernel.org 3.3.2 kernel on x86-64.

bayoulinux · 06-11-2012, 04:15 PM

Ugh! Thanks for all the replies... I rebooted and things seem fine.

Is the memory used with mmap persistent, meaning it stays used even when a process ends? I wasn't munmap()'ing at all, so I wonder if I got myself into a bad state...

Regardless, thanks for all the responses. As always, it's very much appreciated!

Nominal Animal · 06-11-2012, 05:50 PM

Quote:

Originally Posted by bayoulinux

Is the memory used with mmap persistent, meaning it stays used even when a process ends?

When a process exits, all its mappings should be removed automatically.

It gets a bit more complicated than that if you're memory-mapping anything other than normal files, like named shared memory segments; the mappings do get torn down, but the shared memory itself may stay persistent. See shm_open() and shmget() for those kinds of shared memory. Just mapping a file MAP_SHARED does not make a mapping persistent that way.

It is possible to request such a large mapping that the kernel has trouble satisfying the memory needs of the kernel structures needed to map so much virtual memory. Negative sizes in particular tend to be problematic if you have lots of RAM, because the kernel will then try to satisfy the request. If, say, you ran a test with such a bad size, it would cause all kinds of memory allocation and usage issues which would take quite a while to settle down -- one of the cases where sudo sh -c 'sync ; echo 3 >/proc/sys/vm/drop_caches ; sync' may be useful; it drops the kernel page cache and the dentry and inode caches, usually clearing up the issues completely. Rebooting of course clears the mess up immediately. If the OOM killer ends up taking down other processes, you'll need to log out and back in to get your full desktop environment running, and reassert the init level to make sure all services are still running (or just restart them) -- although usually it is a lot simpler to just reboot.
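To illustrate why a negative size is dangerous: mmap() takes its length as a size_t, which is unsigned, so a negative value silently turns into an enormous request. Below is a minimal sketch of a guard, assuming the length was computed in a signed 64-bit variable on a 64-bit system. checked_length is a hypothetical helper, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard: convert a signed length to size_t only if it is
   positive and representable. Returns 0 on success, -1 otherwise. */
static int checked_length(int64_t signed_len, size_t *len_out)
{
    if (signed_len <= 0)
        return -1;                    /* zero or negative: refuse to map */
    if ((uint64_t)signed_len > SIZE_MAX)
        return -1;                    /* does not fit in size_t here */
    *len_out = (size_t)signed_len;
    return 0;
}
```

Without such a check, a length of -1 converts to SIZE_MAX, the largest value a size_t can hold.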

My point is that a reboot is unlikely to really fix anything in Linux. It is usually much more useful to find out what actually happened, as then the real problem can be fixed. If it was a programming bug on your part, I wish you would say so, because then I wouldn't be stuck wondering (and running some tests) to find out if there is a kernel bug in there somewhere. I use very large mappings in heavy simulations, so I rely on them working right; if there is a kernel bug, I need to know.

It's not like there is any shame in having bugs in one's code; nobody writes perfect code. We learn most from mistakes, not from successes.

sundialsvcs · 06-11-2012, 10:45 PM

As a rule of thumb I would say don't map such a large space "all at once," even on a 64-bit system. Allocate a more reasonably-sized "window" into the total resource, moving that window around as need be. Don't be a million-pound elephant. Consider how the operating system will go about trying to satisfy your request, and then design your application to be a gracious and well-behaved citizen that is very easy to get along with . . .
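A rough sketch of the "window" idea, assuming a regular file on Linux: map one fixed-size piece at a time, process it, unmap it, then move on. The function name sum_file_windowed and the byte-summing workload are purely illustrative.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical example: add up every byte of a file while mapping at most
   `window` bytes (rounded up to whole pages) at any one time.
   Returns 0 on success, -1 on failure. */
static int sum_file_windowed(const char *path, size_t window, uint64_t *sum_out)
{
    long page = sysconf(_SC_PAGE_SIZE);
    if (page <= 0 || window == 0)
        return -1;
    /* mmap() offsets must be page multiples, so use whole-page windows. */
    window = (window + (size_t)page - 1) / (size_t)page * (size_t)page;

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }

    uint64_t sum = 0;
    off_t offset = 0;
    while (offset < st.st_size) {
        off_t left = st.st_size - offset;
        size_t len = left < (off_t)window ? (size_t)left : window;
        const unsigned char *p =
            mmap(NULL, len, PROT_READ, MAP_SHARED, fd, offset);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (size_t i = 0; i < len; i++)
            sum += p[i];
        munmap((void *)p, len);   /* release this window before the next */
        offset += (off_t)len;
    }
    close(fd);
    *sum_out = sum;
    return 0;
}
```

The result does not depend on the window size; only the amount of address space in use at any moment does.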

bayoulinux · 06-12-2012, 08:26 AM

I wish I could pin down a bug... the only thing I changed was that, as you suspected, I was using 'unsigned int' instead of size_t for the length parameter. However, that didn't seem to make a difference. I rebooted and then the mmap API worked fine when invoked. I dunno... I appreciate you running tests and everyone taking a look, and I'd be very happy to share anything I've learned (that's how I and everyone else reading this thread may gain knowledge), but I can't put my finger on anything specific.

Thank you again everyone for taking a look... if I see the problem again and gain further insight, I will not hesitate to post.

johnsfine · 06-12-2012, 09:34 AM

Quote:

Originally Posted by bayoulinux

I was using 'unsigned int' versus size_t for the length parameter. However, that didn't seem to make a difference.

How could that not make a difference?

On the x86_64 architecture, an unsigned int can't hold a number of 2**32 or larger (size_t can hold such numbers).
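A small sketch of what that truncation does, assuming the usual x86_64 Linux ABI where unsigned int is 32 bits and size_t is 64 bits. length_via_unsigned_int is a made-up name for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the length mmap() actually sees if the requested
   size is first stored in an unsigned int. */
static size_t length_via_unsigned_int(uint64_t requested)
{
    /* Conversion to a 32-bit unsigned type keeps only the low 32 bits. */
    unsigned int narrow = (unsigned int)requested;
    return (size_t)narrow;
}
```

With this, a request for 4 GiB + 4096 bytes turns into a request for just 4096 bytes, and a request for exactly 4 GiB turns into a length of 0, which mmap() rejects with EINVAL.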

bayoulinux · 06-12-2012, 11:54 AM

Okay, I'm keeping my promise and posting...

I retraced my steps, and it seems that my failing case, independent of my original post, is when I did not set the MAP_NORESERVE flag within the call to mmap(). Total pilot error on my part.

So that was the mystery. Does anyone want to elaborate more about the use of this flag, in addition to what I see on the mmap() man pages?

Really sorry for all the confusion... I was spinning myself...

Nominal Animal · 06-12-2012, 02:03 PM

Quote:

Originally Posted by bayoulinux

Does anyone want to elaborate more about the use of this flag, in addition to what I see on the mmap() man pages?

When you mmap() a file without MAP_NORESERVE, the mapping is only allowed if the size does not exceed current memory allocation limits. In other words, without MAP_NORESERVE, you can only mmap() as large a chunk as you could allocate. The reason for this check is that in some situations the file cannot be relied upon for backing, and the contents must stay either in RAM or swap. I do believe these situations are exceptional, things like disk full, write error, filesystem remounted read-only, temporary loss of connection to the server if using a remote filesystem, and so on.

MAP_NORESERVE indicates that the process is willing to rely solely on the file backing, and therefore also avoids the size restrictions. The downside is that if the kernel cannot read a part of the file from disk, or write a modified page back, whenever it needs to, a SIGSEGV or SIGBUS signal is delivered to the process. The signal can be caught, but POSIX leaves the behavior undefined if the handler for such a signal returns normally. Essentially, you can do some limited cleanup, but the signal handler must cause the process to exit.

(There are certain tricks one can do in Linux, but they are rather complicated to implement. The main problem is that even if the signal handler returns, the same instruction causing the problem will be rerun, repeating the problem endlessly. I have found that it is more robust to just allow the process to die if such errors occur, and maybe have a supervisor process that spawns a new process if that happens. After all, it is a rare, exceptional situation. I have wondered whether asynchronous cancellation of the offending thread would work, but haven't tested that: mainly because it is still nontrivial to find out the offending thread, and even if it works, it is completely unportable and not guaranteed to work on different pthread library versions or kernel versions.)
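The "limited cleanup, then exit" approach could be sketched roughly as follows. The names on_mapping_fault and install_mapping_fault_handler are made up for this illustration, and the cleanup is reduced to printing a message.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* makes glibc headers expose sigaction() */
#endif
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: report the failure and terminate. It never
   returns, so the faulting instruction is not rerun. */
static void on_mapping_fault(int signum)
{
    static const char msg[] = "fatal: lost access to a mapped page\n";
    (void)signum;
    /* Only async-signal-safe calls are allowed here; write() and
       _exit() both qualify. */
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    _exit(1);
}

/* Install the handler for both signals mentioned above.
   Returns 0 on success, -1 on failure. */
static int install_mapping_fault_handler(void)
{
    struct sigaction act;
    memset(&act, 0, sizeof act);
    act.sa_handler = on_mapping_fault;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGBUS, &act, NULL) == -1)
        return -1;
    if (sigaction(SIGSEGV, &act, NULL) == -1)
        return -1;
    return 0;
}
```

A supervisor process can then notice the nonzero exit status and spawn a replacement.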

johnsfine · 06-12-2012, 02:10 PM

I am confused by the phrase in the man page:
"In kernels before 2.6, this flag only had effect for private writable mappings."

If the mapping is private writable, the flag obviously matters. Is your mapping private writable?

If the mapping is not private writable, I have no clue why that flag should matter. I would have thought that swap space should not be needed for the mapping if it is not private writable.

Edit: While I was typing that, the answer was provided one post above. I expect that answer should cover whatever was confusing you as well as what was confusing me.