LinuxQuestions.org


PeterWurmsdobler 05-04-2010 07:19 AM

mmap of several GB of reserved memory using
 
Hello,

I am using 64-bit Ubuntu 9.04, kernel 2.6.28, on a PC equipped with 12GB RAM. I would like to capture and store data in a reserved memory area of 8GB starting at 4GB. This memory area is currently being reserved at boot time by passing "mem=4G memmap=8G$4G" to the kernel. (Interestingly enough, "free" reports only 3GB memory, a mismatch I do not quite understand.)

I wrote a small char device driver and mmap'ed the entire reserved physical memory into the user's virtual memory space. From user space I can open the dev file and call mmap, but unfortunately, when I try to read from or write to the mmap'ed area, the kernel complains with a "Corrupted page table at address ..." and a register dump.

This is what my code does. In my rmem_test kernel module I define for 8GB reserved memory at an offset of 4GB:

Code:

#define RAW_DATA_SIZE 0x200000000UL
#define RAW_DATA_OFFSET 0x100000000UL

In init_module, I ioremap the physical memory to kernel virtual memory, and in addition set it to zero:

Code:

rawdataStart = ioremap(RAW_DATA_OFFSET, RAW_DATA_SIZE);
memset(rawdataStart, 0, RAW_DATA_SIZE);

After some parameter checking, the mmap file operation does:

Code:

remap_pfn_range(vma, vma->vm_start,
    (unsigned long) rawdataStart, RAW_DATA_SIZE, PAGE_SHARED);

In user space, the mmap call on the corresponding file descriptor still works:

Code:

    fd = open("/dev/rawdata", O_RDWR | O_SYNC);
    mptr = mmap(0, RAW_DATA_SIZE, PROT_READ | PROT_WRITE, MAP_FILE | MAP_SHARED, fd, 4096);
    mptr[0] = 'a';

but the assignment fails with the dmesg output below.

What am I doing wrong? Do I need to loop through smaller chunks when calling remap_pfn_range?

Help is very much appreciated.

Kind regards,
peter


[ 481.669633] rmem_test: opened
[ 481.669696] rmem_test: mmap
[ 481.717016] rmem_test: mmap OK
[ 481.717157] rmem_map: Corrupted page table at address 7f63179e1000
[ 481.717222] PGD b6471067 PUD b44eb067 PMD b55ed067 PTE 7c20011890000227
[ 481.717434] Bad pagetable: 000d [#3] SMP
[ 481.717567] last sysfs file: /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor
[ 481.717649] CPU 0
[ 481.717741] Modules linked in: rmem_test nfs lockd nfs_acl sunrpc input_polldev video output lp parport snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr psmouse serio_raw iTCO_wdt iTCO_vendor_support usbhid nvidia(P) ohci1394 r8169 mii ieee1394 floppy
[ 481.719030] Pid: 3781, comm: rmem_map Tainted: P D 2.6.28.paw2 #1
[ 481.719094] RIP: 0033:[<00000000004006bc>] [<00000000004006bc>] 0x4006bc
[ 481.719198] RSP: 002b:00007fff1f442ee0 EFLAGS: 00010206
[ 481.719260] RAX: 00007f63179e1000 RBX: 0000000000400730 RCX: 0000000000000002
[ 481.719325] RDX: 00007f6517d4e9c0 RSI: 00007f6517f6e029 RDI: 00007f6517f6e027
[ 481.719390] RBP: 00007fff1f442f00 R08: 0000000000000001 R09: 0000000000000002
[ 481.719455] R10: 0000000000000022 R11: 00000000ffffffff R12: 0000000000400550
[ 481.719520] R13: 00007fff1f442fd0 R14: 0000000000000000 R15: 0000000000000000
[ 481.719586] FS: 00007f6517f666f0(0000) GS:ffffffff80a8f000(0000) knlGS:0000000000000000
[ 481.719667] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 481.719730] CR2: 00007f63179e1000 CR3: 00000000b44e5000 CR4: 00000000000006a0
[ 481.719795] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 481.719860] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 481.719926] Process rmem_map (pid: 3781, threadinfo ffff8800b446e000, task ffff8800b8cf4320)
[ 481.720007]
[ 481.720061] RIP [<00000000004006bc>] 0x4006bc
[ 481.720158] RSP <00007fff1f442ee0>
[ 481.720236] ---[ end trace e47eba847a88b683 ]---
[ 481.756598] rmem_test: released

PeterWurmsdobler 05-04-2010 07:28 AM

Sorry for my post ending up in Member Intro Forum; it was intended for the Linux kernel forum. I thought that I had skipped the intro forum. What an entry.
Perhaps the contents can be moved there?
Cheers, peter

TheIndependentAquarius 05-04-2010 07:32 AM

Hello,

I have reported your post for being moved to some appropriate forum !

XavierP 05-04-2010 12:26 PM

And moved as desired and reported :)

PeterWurmsdobler 05-17-2010 12:51 PM

If memory is reserved at boot time (by passing "mem=4G memmap=8G$4G" to the kernel), does the kernel build page tables for this block of memory?

If so, why can I not mmap the kernel pages to a user space application in my kernel module?

If not, how can I force the kernel to build page tables for the reserved memory later?

Any ideas?
peter

nini09 05-17-2010 04:05 PM

You should pass physical memory address to remap_pfn_range function, not kernel virtual memory address.
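
To illustrate the point: the third argument of remap_pfn_range is a page frame number, which is a physical address divided by the page size. Below is a minimal userspace sketch of that arithmetic, assuming 4 KiB pages (a page shift of 12, as on x86-64). The helper name phys_to_pfn and the SKETCH_PAGE_SHIFT macro are made up for illustration and are not kernel APIs; in a module, PAGE_SHIFT comes from the kernel headers.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86-64. Defined by hand for this sketch. */
#define SKETCH_PAGE_SHIFT 12

/* Physical start of the reserved region, taken from the first post. */
#define RAW_DATA_OFFSET 0x100000000ULL

/* Hypothetical helper: convert a physical address to a page frame number. */
static uint64_t phys_to_pfn(uint64_t phys_addr)
{
    return phys_addr >> SKETCH_PAGE_SHIFT;
}
```

For the 4GB offset this gives PFN 0x100000. The pointer returned by ioremap is a kernel virtual address, so casting it to unsigned long does not produce a PFN.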

PeterWurmsdobler 05-18-2010 04:57 AM

Hello and thanks, I'll try the following:

Code:

remap_pfn_range(vma, vma->vm_start,
    RAW_DATA_OFFSET >> PAGE_SHIFT,
    RAW_DATA_SIZE, PAGE_SHARED);

or perhaps:

Code:

remap_pfn_range(vma, vma->vm_start,
    virt_to_phys(rawdataStart) >> PAGE_SHIFT,
    RAW_DATA_SIZE, PAGE_SHARED);

I'll keep you posted,
peter

PeterWurmsdobler 05-18-2010 12:34 PM

works
 
Code:

remap_pfn_range(vma, vma->vm_start,
    RAW_DATA_OFFSET >> PAGE_SHIFT,
    RAW_DATA_SIZE, PAGE_SHARED);

works fine. So it's the down-shifted physical address, i.e. the page frame number, that has to be passed to remap_pfn_range.
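
A possible addition to the parameter checking: user space chooses both the length and the file offset of the mapping, so the driver could verify that the request fits inside the reserved region and map only the requested part. The following is a sketch in plain C under the assumption of 4 KiB pages. The function request_fits and its arguments are hypothetical and stand in for vma->vm_start, vma->vm_end and vma->vm_pgoff; it is not the code from the actual driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. Constants are those from the first post. */
#define SKETCH_PAGE_SHIFT 12
#define RAW_DATA_SIZE   0x200000000ULL
#define RAW_DATA_OFFSET 0x100000000ULL

/*
 * Hypothetical check: returns true if the requested range lies inside the
 * reserved region, and stores the first page frame number to map.
 * vm_start and vm_end are assumed to be page aligned.
 */
static bool request_fits(uint64_t vm_start, uint64_t vm_end,
                         uint64_t vm_pgoff, uint64_t *first_pfn)
{
    uint64_t region_pages = RAW_DATA_SIZE >> SKETCH_PAGE_SHIFT;
    uint64_t pages;

    if (vm_end <= vm_start)
        return false;

    /* Work in pages so that a huge vm_pgoff cannot overflow a byte offset. */
    pages = (vm_end - vm_start) >> SKETCH_PAGE_SHIFT;
    if (vm_pgoff >= region_pages || pages > region_pages - vm_pgoff)
        return false;

    *first_pfn = (RAW_DATA_OFFSET >> SKETCH_PAGE_SHIFT) + vm_pgoff;
    return true;
}
```

With such a check, a request for the full RAW_DATA_SIZE at file offset 4096 would be rejected, because it runs one page past the end of the region. In a driver, the accepted PFN and vm_end - vm_start would then be the PFN and size arguments of remap_pfn_range.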

peter

jaydesh9 10-08-2012 10:04 PM

Don't you need a page fault handler here for the vm_operations_struct.fault ?

PeterWurmsdobler 10-09-2012 06:19 AM

page fault handler
 
Hello, I have never come across a page fault handler. Tell me more.
peter

jaydesh9 10-09-2012 01:20 PM

You mentioned that it's resolved and it works. I wonder how!

After the mmap is done, whenever the user space application touches a page, doesn't your driver need to handle the address conversion? [1]


If you look at /proc/vmallocinfo, it shows the ioremapped memory chunk. Print the address returned by ioremap in the device driver's init function and it matches the vmallocinfo entry.


I am doing almost the same as what you did here. My platform is 64-bit Linux 3.0 with more than 128 GB of RAM. The only difference is that I use the boot parameter "memmap=8G$64G".

Currently, when my user space application calls mmap, the driver's mmap function hangs the system while doing the remap_pfn_range (a page fault handling issue?).

Please share any other findings you have.


Cheers,
-J

[1]
function : mmap_drv_vmmap at http://www.scs.ch/~frey/linux/memorymap.html

jaydesh9 10-09-2012 09:03 PM

When I ask to reserve 128GB of memory starting at 64GB,
I correctly see the following in /proc/vmallocinfo:

Code:

0xffffc9001f3a8000-0xffffc9201f3a9000 137438957568 0xffffffffa00831c9 phys=1000000000 ioremap

Thus the starting virtual address is: 0xffffc9001f3a8000

But in my driver's mmap routine, when I print vma->vm_start, it is around 4 GB (3.8GB). So I think that when I try to do remap_pfn_range, it starts at the 4GB boundary instead of 64GB and corrupts the page table?

Any ideas ?

PeterWurmsdobler 10-10-2012 05:43 AM

Example code
 
Hello,

Using Ubuntu 12.04 with kernel 3.2.0-30 on a 12GB PC, I have configured /etc/default/grub with

Code:

GRUB_CMDLINE_LINUX="memmap=8G\\\\$4G"

After running update-grub and a reboot, 8GB from 4GB onwards should be reserved. Then I insert the kernel module contained in http://www.wurmsdobler.org/files/resmem.tgz and run the test application also contained in that tgz archive.

The test application uses CUDA; if you haven't got that, you'll need to comment out calls to it. You will be left with a simple executable that opens the device file created by the resmem driver, mmaps the reserved memory and copies data into it.

I am using this code in a more complex application and it has worked so far.

Cheers,
peter

jaydesh9 10-11-2012 10:45 PM

Thanks Peter for the inputs.
Currently, I see two issues :
1. The mmap syscall takes the size as an int, so mmapping anything in the range of a few GB results in glibc complaining about an invalid argument to mmap, because the size is truncated to 0.
2.
Quote:

But in my driver's mmap routine, when I print vma->vm_start, it is around 4 GB (3.8GB).
Here's the dmesg from the driver's mmap routine:

Quote:

Oct 10 19:39:48 Node001 kernel: MymemDriver: mmap: vm start : 3074633728
Oct 10 19:39:48 Node001 kernel: MymemDriver: mmap: vm end : 4148375552
Oct 10 19:39:48 Node001 kernel: MymemDriver: mmap: vm pg offset : 0
and it shows only about 1 GB as part of the VMA: (4148375552 - 3074633728) = 1073741824 bytes.
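
As a quick check of the numbers in the log above, the length of the VMA is vm_end minus vm_start. A small sketch of that arithmetic, assuming 4 KiB pages; the helper names are invented for illustration:

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Length of a mapping in bytes, given its start and end addresses. */
static uint64_t vma_length_bytes(uint64_t vm_start, uint64_t vm_end)
{
    return vm_end - vm_start;
}

/* Length of the same mapping in pages. */
static uint64_t vma_length_pages(uint64_t vm_start, uint64_t vm_end)
{
    return vma_length_bytes(vm_start, vm_end) / SKETCH_PAGE_SIZE;
}
```

For vm_start 3074633728 and vm_end 4148375552 this gives 1073741824 bytes, which is exactly 1 GiB or 262144 pages of 4 KiB.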

I think that even if we start reserving at an offset beyond 4GB (in my example the kernel boot parameter is 128G$64G, so the reservation offset is 64GB), I still see vma->vm_start at around 4GB.

I'll try out reservation starting at 4GB offset. As you mentioned that should work.
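
On the first issue above, one way a multi-GB length can end up as 0 is if it passes through a 32-bit type somewhere, for instance an int variable, or size_t in a 32-bit build. Whether that is what happens in this test program is only a guess. The sketch below just shows the truncation itself; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: what remains of a 64-bit length once it has been
 * squeezed into 32 bits. Conversion to an unsigned type keeps the low
 * 32 bits and discards the rest.
 */
static uint32_t length_as_32bit(uint64_t length)
{
    return (uint32_t)length;
}
```

8 GiB is 2^33 and 128 GiB is 2^37, so the low 32 bits of both are zero and the truncated length is 0, while a 1 GiB length survives unchanged. Keeping the length in a 64-bit size_t or unsigned long all the way to the mmap call avoids this.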

Regards,
-J

PeterWurmsdobler 10-13-2012 03:12 AM

Hello, two quick comments.

When I reserve memory as mentioned earlier, the memory addresses are physical addresses. Since mmap creates virtual addresses, I would not expect them to be the same as the physical addresses.

In my experience, the addresses just below 4GB are cluttered with several small legacy address ranges declared by the BIOS, from the time when 4GB was much more than any software expected to use. You can see these address ranges when the kernel boots. Reserving in this area won't work because of conflicts with the legacy hardware ranges declared by most BIOSes.

Cheers,
peter

