Is it safe to use copy_to_user on a __iomem pointer returned by ioremap?
Hi,
I am currently writing a device driver and I wonder if it is safe to use copy_to_user() on an __iomem pointer returned by ioremap()? Or do I have to copy with memcpy_fromio() to a kmalloc'ed buffer in the driver first and then copy_to_user()?
I think copy_to_user() will work without problems (not 100% confirmed), since ioremap() actually maps the memory region into the virtual address space, which is valid for normal memory operations. Although, if you are doing an atomic ioremap, make sure that the memory functions used do not sleep.
The contents of the address range returned by ioremap() should not be accessed like normal memory; you have to use the readb/w/l()/writeb/w/l() or memcpy_fromio()/memcpy_toio() functions.
On x86(-64), MMIO accesses actually are normal memory accesses, so copy_to_user() will work, but this would not be portable to some other architectures.
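For illustration, a minimal sketch of such a read() path, assuming a hypothetical struct my_dev that holds the ioremap() cookie and the region size (error handling trimmed to the essentials):

#include <linux/fs.h>
#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/uaccess.h>

struct my_dev {                 /* hypothetical driver state */
        void __iomem *regs;     /* cookie returned by ioremap() */
        size_t region_size;     /* size of the mapped region */
};

static ssize_t my_read(struct file *file, char __user *buf,
                       size_t count, loff_t *ppos)
{
        struct my_dev *dev = file->private_data;
        void *tmp;

        if (*ppos >= dev->region_size)
                return 0;
        count = min_t(size_t, count, dev->region_size - *ppos);

        tmp = kmalloc(count, GFP_KERNEL);
        if (!tmp)
                return -ENOMEM;

        /* Bounce through the kernel buffer: the __iomem pointer is only
         * touched with the io accessor, never with a plain memcpy(). */
        memcpy_fromio(tmp, dev->regs + *ppos, count);

        if (copy_to_user(buf, tmp, count)) {
                kfree(tmp);
                return -EFAULT;
        }

        kfree(tmp);
        *ppos += count;
        return count;
}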
Yup, you are right. This is what the comment above that function says. I used these functions many years ago, so I mixed things up. Thanks for reminding me.
OK, thank you for your reply. Do you know if this extra memory copy costs much performance?
There might be some way of copying iomapped data directly to a user-space buffer (just search for this), but if you do this extra copy then obviously it will consume CPU cycles, and how many cycles it consumes depends on the size of the buffer/mapped region.
There might be some way of copying iomapped data directly to a user-space buffer (just search for this)
AFAIK there is no function for this.
Since the intermediate buffer is fully cached, there shouldn't be too much of a performance hit; I'd expect it to be unnoticeable in most cases.
If you have to copy really big amounts of data, you might want to allow the application to mmap() the device's address range, or, if that is not possible, to mmap() your own buffer that you copied the data into. (See chapter 15 of LDD3.)
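As a rough sketch of the second option (letting the application mmap() a driver-owned buffer that was filled from the device), along the lines of LDD3 chapter 15. struct my_dev, BUF_ORDER and the helper names are hypothetical, and note that LDD3 points out remap_pfn_range() on ordinary RAM additionally requires the pages to be marked reserved:

#include <linux/fs.h>
#include <linux/gfp.h>
#include <linux/io.h>
#include <linux/mm.h>

#define BUF_ORDER 4                      /* hypothetical: 16 pages */

struct my_dev {                          /* hypothetical driver state */
        void __iomem *regs;              /* device registers, from ioremap() */
        unsigned long buf;               /* driver-owned bounce buffer */
};

static int my_alloc_buffer(struct my_dev *dev)
{
        /* Physically contiguous pages, so virt_to_phys() below is valid. */
        dev->buf = __get_free_pages(GFP_KERNEL, BUF_ORDER);
        return dev->buf ? 0 : -ENOMEM;
}

/* Fill the bounce buffer from the device with the io accessor. */
static void my_fill_buffer(struct my_dev *dev, size_t len)
{
        memcpy_fromio((void *)dev->buf, dev->regs, len);
}

static int my_mmap(struct file *file, struct vm_area_struct *vma)
{
        struct my_dev *dev = file->private_data;
        unsigned long size = vma->vm_end - vma->vm_start;

        if (size > (PAGE_SIZE << BUF_ORDER))
                return -EINVAL;

        /* See LDD3 ch. 15: ordinary RAM pages should also be marked
         * reserved (SetPageReserved()) for this mapping to work. */
        return remap_pfn_range(vma, vma->vm_start,
                               virt_to_phys((void *)dev->buf) >> PAGE_SHIFT,
                               size, vma->vm_page_prot);
}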
The intermediate buffer _might_ be in the cache, but copying the data will still eat cycles (let's say copying each byte from the cache takes 1 cycle; then copying 1 KB of data will take at least 1000 cycles, right?).
Yes, this mmap approach will be the right way to avoid the extra memcpy. If the amount of data is big, the copying will reduce performance.
let's say copying each byte from the cache takes 1 cycle; then copying 1 KB of data will take at least 1000 cycles, right?
Yes. However, reading one uncached word from memory can take about 150 cycles, and reading one word from a device might take 1000 cycles. Anything you do in the L1 cache just doesn't matter in comparison.
Correct, it's negligible compared to memory or device access, but if high performance is needed then every cycle makes a difference.
Anyway, we can leave this discussion (if you don't mind) and let the thread starter decide whether this extra copy affects him, according to his requirements.
Thank you for all your info. I have tried to mmap() the device address range and just copy the data directly in the user application with memcpy(). But my problem is that the performance is too bad, only 2.5 MB/s reading from the device with mmap. I guess that no burst reads are done, only single accesses. The device, by the way, is a PCI Express device. Unfortunately the device has no DMA on it, and I have to read large amounts of data. It would be enough if I could increase the performance by a factor of 10. Any ideas how to increase the performance? A question about mmap then: the memory that I mmap in the driver is not ioremapped, I only do:
remap_pfn_range(vma, vma->vm_start,
                pci_resource_start(pcie->pci_dev, 0) >> PAGE_SHIFT,
                vma->vm_end - vma->vm_start, vma->vm_page_prot);
Is this the right way to do it? It works, but like I said the performance is hideous.
I used the same remap_pfn_range() function in mmap in one of my projects and it was working fine and very fast (but the mmapped memory in my case was ordinary kernel memory pages, not a device).
How are you accessing this mapped memory from user space? And how are you filling in the data from kernel space?
It would be good if you could provide a code snippet, so that anyone can guide you in a better way.
the performance is too bad, only 2.5 MB/s reading from the device with mmap. I guess that no burst reads are done, only single accesses.
To enable burst reads, set the page protection flags to cacheable.
However, vma->vm_page_prot comes from userspace and is likely to already include those flags. Furthermore, if the device changes the mmap-ed memory, you cannot make it cacheable because you might get stale data.
How fast is the kernel driver able to read from the device into its own buffer?
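If you want to experiment with the protection flags, a hedged sketch of what that could look like in the mmap handler (struct my_pcie is hypothetical; whether write-combining is acceptable depends on the BAR being prefetchable, so treat this as something to measure rather than a definitive fix):

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pci.h>

struct my_pcie {                         /* hypothetical driver state */
        struct pci_dev *pci_dev;
};

static int my_mmap(struct file *file, struct vm_area_struct *vma)
{
        struct my_pcie *pcie = file->private_data;
        unsigned long size = vma->vm_end - vma->vm_start;

        if (size > pci_resource_len(pcie->pci_dev, 0))
                return -EINVAL;

        /* Ask for a write-combining mapping instead of the default
         * uncached one; on x86 this lets the CPU merge accesses into
         * bursts. For a strictly non-prefetchable BAR, keep
         * pgprot_noncached() instead. */
        vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);

        /* io_remap_pfn_range() is the portable variant for device memory. */
        return io_remap_pfn_range(vma, vma->vm_start,
                                  pci_resource_start(pcie->pci_dev, 0) >> PAGE_SHIFT,
                                  size, vma->vm_page_prot);
}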