Is it safe to use copy_to_user on a __iomem pointer returned by ioremap?
Hi,
I am currently writing a device driver and I wonder if it is safe to use copy_to_user() on an __iomem pointer returned by ioremap()? Or do I have to copy with memcpy_fromio() to a kmalloc'ed buffer in the driver first and then copy_to_user()?
I think copy_to_user() will work without problems (not 100% confirmed), since ioremap() actually maps the memory region into the virtual address space, which is valid for normal memory operations. Although, if you are doing an atomic ioremap, make sure that the memory functions used do not sleep.
The contents of the address range returned by ioremap() should not be accessed like normal memory; you have to use the readb/w/l()/writeb/w/l() or memcpy_fromio()/memcpy_toio() functions.
On x86(-64), MMIO accesses actually are normal memory accesses, so copy_to_user() will work, but this would not be portable to some other architectures.
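For illustration, a minimal sketch of such a read() path, assuming a hypothetical struct my_dev that holds the ioremap() cookie and the region size (error handling trimmed to the essentials):

#include <linux/fs.h>
#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/uaccess.h>

struct my_dev {                 /* hypothetical driver state */
        void __iomem *regs;     /* cookie returned by ioremap() */
        size_t region_size;     /* size of the mapped region */
};

static ssize_t my_read(struct file *file, char __user *buf,
                       size_t count, loff_t *ppos)
{
        struct my_dev *dev = file->private_data;
        void *tmp;

        if (*ppos >= dev->region_size)
                return 0;
        count = min_t(size_t, count, dev->region_size - *ppos);

        tmp = kmalloc(count, GFP_KERNEL);
        if (!tmp)
                return -ENOMEM;

        /* Bounce through the kernel buffer: the __iomem pointer is only
         * touched with the io accessor, never with a plain memcpy(). */
        memcpy_fromio(tmp, dev->regs + *ppos, count);

        if (copy_to_user(buf, tmp, count)) {
                kfree(tmp);
                return -EFAULT;
        }

        kfree(tmp);
        *ppos += count;
        return count;
}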
Yup, you are right. This is what the comment above that function says. I used these functions many years ago, so I mixed things up. Thanks for reminding me.
OK, thank you for your reply. Do you know if this extra memory copy costs much performance?
There might be some way of copying iomapped data directly to a user-space buffer (just search for this), but if you do this extra copy then obviously it will consume CPU cycles, and how many cycles it consumes depends on the size of the buffer/mapped region.
There might be some way of copying iomapped data directly to a user-space buffer (just search for this)
AFAIK there is no function for this.
Since the intermediate buffer is fully cached, there shouldn't be too much of a performance hit; I'd expect it to be unnoticeable in most cases.
If you have to copy really big amounts of data, you might want to allow the application to mmap() the device's address range, or, if that is not possible, to mmap() your own buffer that you copied the data into. (See chapter 15 of LDD3.)
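As a rough sketch of the second option (letting the application mmap() a driver-owned buffer that was filled from the device), along the lines of LDD3 chapter 15. struct my_dev, BUF_ORDER and the helper names are hypothetical, and note that LDD3 points out remap_pfn_range() on ordinary RAM additionally requires the pages to be marked reserved:

#include <linux/fs.h>
#include <linux/gfp.h>
#include <linux/io.h>
#include <linux/mm.h>

#define BUF_ORDER 4                      /* hypothetical: 16 pages */

struct my_dev {                          /* hypothetical driver state */
        void __iomem *regs;              /* device registers, from ioremap() */
        unsigned long buf;               /* driver-owned bounce buffer */
};

static int my_alloc_buffer(struct my_dev *dev)
{
        /* Physically contiguous pages, so virt_to_phys() below is valid. */
        dev->buf = __get_free_pages(GFP_KERNEL, BUF_ORDER);
        return dev->buf ? 0 : -ENOMEM;
}

/* Fill the bounce buffer from the device with the io accessor. */
static void my_fill_buffer(struct my_dev *dev, size_t len)
{
        memcpy_fromio((void *)dev->buf, dev->regs, len);
}

static int my_mmap(struct file *file, struct vm_area_struct *vma)
{
        struct my_dev *dev = file->private_data;
        unsigned long size = vma->vm_end - vma->vm_start;

        if (size > (PAGE_SIZE << BUF_ORDER))
                return -EINVAL;

        /* See LDD3 ch. 15: ordinary RAM pages should also be marked
         * reserved (SetPageReserved()) for this mapping to work. */
        return remap_pfn_range(vma, vma->vm_start,
                               virt_to_phys((void *)dev->buf) >> PAGE_SHIFT,
                               size, vma->vm_page_prot);
}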
The intermediate buffer _might_ be in the cache, but copying the data will still eat cycles (let's say copying each byte from the cache takes 1 cycle; then copying 1 KB of data will take at least 1000 cycles, right?).
Yes, this mmap approach will be the right way to avoid the extra memcpy. If the amount of data is big, the copying will reduce performance.
let's say copying each byte from the cache takes 1 cycle; then copying 1 KB of data will take at least 1000 cycles, right?
Yes. However, reading one uncached word from memory can take about 150 cycles, and reading one word from a device might take 1000 cycles. Anything you do in the L1 cache just doesn't matter in comparison.
Correct, it's negligible compared to memory or device access, but if high performance is needed then every cycle makes a difference.
Anyway, we can leave this discussion (if you don't mind) and let the thread starter decide whether this extra copy affects him, according to his requirements.
Thank you for all your info. I have tried to mmap() the device address range and just copy the data directly in the user application with memcpy(). But my problem is that the performance is too bad, only 2.5 MB/s reading from the device with mmap. I guess that no burst reads are done, only single accesses. The device, by the way, is a PCI Express device. Unfortunately the device has no DMA on it, and I have to read large amounts of data. It would be enough if I could increase the performance by a factor of 10. Any ideas how to increase the performance? A question about mmap then: the memory that I mmap in the driver is not ioremapped, I only do:
remap_pfn_range(vma, vma->vm_start,
                pci_resource_start(pcie->pci_dev, 0) >> PAGE_SHIFT,
                vma->vm_end - vma->vm_start, vma->vm_page_prot);
Is this the right way to do it? It works, but like I said the performance is hideous.
I used the same remap_pfn_range() function in mmap in one of my projects and it was working fine and very fast (but the mmapped memory in my case was ordinary kernel memory pages, not a device).
How are you accessing this mapped memory from user space? And how are you filling in the data from kernel space?
It would be good if you could provide a code snippet, so that anyone can guide you in a better way.
the performance is too bad, only 2.5 MB/s reading from the device with mmap. I guess that no burst reads are done, only single accesses.
To enable burst reads, set the page protection flags to cacheable.
However, vma->vm_page_prot comes from userspace and is likely to already include those flags. Furthermore, if the device changes the mmap-ed memory, you cannot make it cacheable because you might get stale data.
How fast is the kernel driver able to read from the device into its own buffer?
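If you want to experiment with the protection flags, a hedged sketch of what that could look like in the mmap handler (struct my_pcie is hypothetical; whether write-combining is acceptable depends on the BAR being prefetchable, so treat this as something to measure rather than a definitive fix):

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pci.h>

struct my_pcie {                         /* hypothetical driver state */
        struct pci_dev *pci_dev;
};

static int my_mmap(struct file *file, struct vm_area_struct *vma)
{
        struct my_pcie *pcie = file->private_data;
        unsigned long size = vma->vm_end - vma->vm_start;

        if (size > pci_resource_len(pcie->pci_dev, 0))
                return -EINVAL;

        /* Ask for a write-combining mapping instead of the default
         * uncached one; on x86 this lets the CPU merge accesses into
         * bursts. For a strictly non-prefetchable BAR, keep
         * pgprot_noncached() instead. */
        vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);

        /* io_remap_pfn_range() is the portable variant for device memory. */
        return io_remap_pfn_range(vma, vma->vm_start,
                                  pci_resource_start(pcie->pci_dev, 0) >> PAGE_SHIFT,
                                  size, vma->vm_page_prot);
}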