Efficient data copy from PCIe device to RAM in kernel
Hello,
I have an FPGA card (4 PCIe v1.0 lanes) that exposes a prefetchable PCI memory window of length pcie_mem_length (16MB) to the system. In the driver I have ioremapped its base address into the kernel's virtual address space as pcie_mem_vaddr. On the other side, I have memory reserved at boot time, which I have also ioremapped into the kernel's virtual address space as reserved_vaddr.
Since the FPGA card does not support DMA (yet), I have measured the time taken by a single memcpy(reserved_vaddr, pcie_mem_vaddr, pcie_mem_length) using ktime_t and the associated hrtimer functions. It quite consistently takes 1.84s for 16MB, i.e. roughly 8MB/s. That is not great by today's standards.
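To put the measurement above in perspective, here is a minimal sketch of the arithmetic in plain user-space C. It assumes that 16MB means 16 * 1024 * 1024 bytes (the post does not say which unit is meant), and the function names are made up for illustration. It also shows what the measured time would imply per read if every read fetched 1, 4 or 8 bytes. The access width the CPU actually used is not stated in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Average throughput in bytes per second for a transfer of `bytes` bytes
 * that took `seconds` seconds. */
double throughput_bytes_per_s(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / seconds;
}

/* Implied average time per read in nanoseconds, assuming (hypothetically)
 * that every read returns exactly `access_bytes` bytes. */
double ns_per_read(double bytes, double seconds, size_t access_bytes)
{
    assert(access_bytes > 0);
    double reads = bytes / (double)access_bytes;
    return seconds * 1e9 / reads;
}
```

With bytes = 16.0 * 1024 * 1024 and seconds = 1.84, the throughput works out to a little over 9 million bytes per second. The implied cost is about 110ns per read for byte-wide reads and about 880ns per read for 8-byte reads.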
The kernel's default implementation of memcpy in kernelsource/lib/string.c seems to run a loop that counts down to zero doing *dest++ = *src++, with both pointers being char*; this does not look too efficient. Is this the function that gets linked in when memcpy is called from a kernel module, or is it a more architecture-specific version found in kernelsource/arch/x86/lib/*.S? There I can find, for example, a page_copy.S which seems to be efficient (it uses prefetch). How can I make use of that?
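For illustration only, the byte-wise loop described above can be compared with a copy that moves 64 bits per access. This is a user-space sketch on ordinary memory, not kernel code, and the names copy_bytes and copy_qwords are hypothetical. The 64-bit variant assumes 8-byte aligned pointers and a length that is a multiple of 8. In a driver, the kernel's memcpy_fromio() is the usual interface for copying out of an ioremapped region. Whether wider accesses actually raise throughput on this particular card is not something the thread establishes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-at-a-time copy, in the style the post describes for lib/string.c. */
void *copy_bytes(void *dest, const void *src, size_t count)
{
    char *d = dest;
    const char *s = src;

    while (count--)
        *d++ = *s++;
    return dest;
}

/* Copy 64 bits per access. Both pointers must be 8-byte aligned and
 * count must be a multiple of 8. The volatile qualifiers keep the
 * compiler from merging or splitting the accesses, loosely mimicking
 * how device memory is read one word at a time. */
void *copy_qwords(void *dest, const void *src, size_t count)
{
    assert((uintptr_t)dest % 8 == 0 && (uintptr_t)src % 8 == 0);
    assert(count % 8 == 0);

    volatile uint64_t *d = dest;
    const volatile uint64_t *s = src;

    for (size_t i = 0; i < count / 8; i++)
        d[i] = s[i];
    return dest;
}
```

For the same buffer, copy_qwords issues one eighth as many loads as copy_bytes. That matters if each load turns into its own bus request, as the later posts in this thread suggest.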
How could I speed up the data transfer in the kernel, but without using DMA?
Many thanks for any hints,
peter
Last edited by PeterWurmsdobler; 07-14-2010 at 05:03 AM.
The throughput is too low; it looks like something is wrong.
Why do you use ioremap to reserve/allocate memory instead of kmalloc? Are you sure there is no memory conflict?
I need to record 500MB/s of data generated by an FPGA card to RAM. Reserving 8GB at boot time and then using ioremap to bring it into the kernel's virtual address space guarantees that I have memory available to record for roughly 16 seconds. kmalloc would return only small chunks, and I cannot be sure that I could claim 8GB in total; in addition, I would need to keep track of the chunks returned by kmalloc.
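The figures in this post can be checked with a short calculation. The sketch below is plain user-space C with hypothetical function names. It assumes decimal units (8GB = 8e9 bytes, 500MB/s = 500e6 bytes per second), which is what the "roughly 16 seconds" in the post suggests.

```c
#include <assert.h>

/* Seconds of recording that fit into `capacity_bytes` at `rate_bytes_per_s`. */
double recording_seconds(double capacity_bytes, double rate_bytes_per_s)
{
    assert(rate_bytes_per_s > 0.0);
    return capacity_bytes / rate_bytes_per_s;
}

/* How many times faster than the measured rate the required rate is. */
double shortfall_factor(double required_bytes_per_s, double measured_bytes_per_s)
{
    assert(measured_bytes_per_s > 0.0);
    return required_bytes_per_s / measured_bytes_per_s;
}
```

recording_seconds(8e9, 500e6) gives 16 seconds. shortfall_factor(500e6, 8e6) gives 62.5, so the measured memcpy rate is roughly 60 times too slow for the stated goal.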
So my question remains unanswered: how can I transfer data from a PCIe card efficiently without using DMA? From what I have tried so far, using memcpy or readq(), I only get 8MB/s, because every read is translated into a PCIe request and a single PCIe transaction packet is returned for every single word, even though PCIe would support 4k packets.
If you want PCIe 4k burst transfers, that is a job for DMA. My point is that even if the CPU can only generate single-word PCI requests, the throughput should not be as low as 8MB/s.
You can try a CISC CPU, such as x86, and use its string-move instruction (rep movs), which copies words from one string to another.
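As a rough sketch of the string-move suggestion, the following user-space C function copies 64-bit words with the x86-64 rep movsq instruction through GCC/Clang inline assembly. The function name is hypothetical, and on other architectures or compilers it simply falls back to memcpy. It only shows how the instruction is used. The thread does not show whether it would improve reads from an uncached PCIe window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `qwords` 64-bit words from src to dst using the x86-64 string-move
 * instruction. The regions must not overlap. The direction flag is
 * assumed clear, as the x86-64 calling convention guarantees on entry. */
void copy_rep_movsq(void *dst, const void *src, size_t qwords)
{
    assert(dst != NULL && src != NULL);
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* RDI = destination, RSI = source, RCX = count; all three are updated. */
    __asm__ volatile("rep movsq"
                     : "+D"(dst), "+S"(src), "+c"(qwords)
                     :
                     : "memory");
#else
    /* Portable fallback when rep movsq is not available. */
    memcpy(dst, src, qwords * sizeof(uint64_t));
#endif
}
```

A call such as copy_rep_movsq(out, in, n) copies n * 8 bytes. In a kernel driver the source would be an __iomem pointer, which should be accessed through the kernel's I/O accessors rather than dereferenced directly.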