LinuxQuestions.org
07-13-2010, 05:36 PM   #1
PeterWurmsdobler
LQ Newbie
 
Registered: May 2010
Location: Cambridge, UK
Distribution: Ubuntu 9.04
Posts: 17

Rep: Reputation: 1
Efficient data copy from PCIe device to RAM in kernel


Hello,

I have got an FPGA card (4 PCIe v1.0 lanes) that exposes a prefetchable PCI memory window of size pcie_mem_length (16MB) to the system. In the driver I have ioremapped its base address into the kernel's virtual memory space as pcie_mem_vaddr. On the other side, I have got memory reserved at boot time, which I have also ioremapped into the kernel's virtual memory space as reserved_vaddr.

Now, since the FPGA card does not support DMA (yet), I have measured the data transfer time of a single memcpy(reserved_vaddr, pcie_mem_vaddr, pcie_mem_length) using ktime_t and the associated hr_timer functions. It quite consistently takes 1.84s for 16MB, i.e. roughly 8MB/s. That's not terrific by today's standards.

The kernel's default implementation of memcpy in kernelsource/lib/string.c seems to carry out a loop counting down to zero with a *dest++ = *src++, both being char*; this does not look too efficient. Is it this function that is linked in by gcc and called from a kernel module, or a more architecture-specific version found in kernelsource/arch/x86/lib/*.S? There I can find, for example, a page_copy.S which seems to be efficient (it uses prefetch). How can I make use of that?
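To make the concern about the byte loop concrete, here is a minimal user-space sketch, not kernel code, that contrasts a byte-at-a-time copy in the style described above with a 64-bit-at-a-time copy and counts how many loads each one issues. The function names copy_bytewise and copy_wordwise are hypothetical and exist only for this illustration. As far as I know, x86 kernels normally build an architecture-specific memcpy from arch/x86/lib instead of the generic loop, so this only shows why access width matters when every load is expensive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time copy in the style the post describes for lib/string.c.
 * Returns the number of load operations issued (one per byte).
 * Hypothetical helper for illustration only. */
static size_t copy_bytewise(void *dest, const void *src, size_t count)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    size_t loads = 0;

    assert(count == 0 || (dest != NULL && src != NULL));
    while (count--) {
        *d++ = *s++;
        loads++;
    }
    return loads;
}

/* 64-bit-at-a-time copy with a byte-wise tail for the remainder.
 * Returns the number of load operations issued. */
static size_t copy_wordwise(void *dest, const void *src, size_t count)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    size_t loads = 0;

    assert(count == 0 || (dest != NULL && src != NULL));
    while (count >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s, sizeof w);   /* alignment-safe 8-byte load */
        memcpy(d, &w, sizeof w);   /* alignment-safe 8-byte store */
        s += sizeof w;
        d += sizeof w;
        count -= sizeof w;
        loads++;
    }
    return loads + copy_bytewise(d, s, count);
}
```

For a 16MB window the byte loop would issue 16,777,216 loads and the 64-bit loop 2,097,152, which is an eightfold difference if each load to the device costs about the same.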

How could I speed up the data transfer in the kernel, but without using DMA?

Many thanks for any hints,
peter

Last edited by PeterWurmsdobler; 07-14-2010 at 06:03 AM.
 
07-13-2010, 06:55 PM   #2
nini09
Senior Member
 
Registered: Apr 2009
Posts: 1,058

Rep: Reputation: 88
What is that memcpy copying: system memory to system memory, or PCIe memory to system memory?
 
07-14-2010, 04:20 AM   #3
PeterWurmsdobler
LQ Newbie
 
Registered: May 2010
Location: Cambridge, UK
Distribution: Ubuntu 9.04
Posts: 17

Original Poster
Rep: Reputation: 1
Hello,

Sorry if it was not clear from my first post. In the kernel driver's init part I have in essence:

Code:
unsigned long pcie_mem_hwaddr = pci_resource_start(pcie_dev, 0);   /* BAR 0 base address */
unsigned long pcie_mem_length = pci_resource_len(pcie_dev, 0);     /* 16MB */
void *pcie_mem_vaddr = ioremap(pcie_mem_hwaddr, pcie_mem_length);
void *reserved_vaddr = ioremap(0x100000000UL, 0x200000000UL);      /* 8GB reserved at boot, starting at 4GB */
In a periodically called function I have in essence:

Code:
ktime_t before = ktime_get_real();
memcpy(reserved_vaddr, pcie_mem_vaddr, pcie_mem_length);
ktime_t after = ktime_get_real();
ktime_t diff = ktime_sub(after, before);  /* after - before */
printk("dT = %lld ns\n", ktime_to_ns(diff));
And this is what quite consistently produces 1.8s. So how could I speed up the transfer without DMA?

Cheers
 
07-14-2010, 03:44 PM   #4
nini09
Senior Member
 
Registered: Apr 2009
Posts: 1,058

Rep: Reputation: 88
The throughput is too low and it looks like something is wrong.
Why do you use ioremap to reserve/allocate memory instead of kmalloc? Are you sure there is no memory conflict?
 
07-15-2010, 05:31 AM   #5
PeterWurmsdobler
LQ Newbie
 
Registered: May 2010
Location: Cambridge, UK
Distribution: Ubuntu 9.04
Posts: 17

Original Poster
Rep: Reputation: 1
Hello,

I need to record 500MB/s of data generated by an FPGA card to RAM. Reserving 8GB at boot time, then using ioremap to bring it into kernel virtual memory space, guarantees that I have memory available to record for roughly 16 seconds. kmalloc would return only small chunks and I cannot be sure that I can claim 8GB in total; in addition, I would need to manage the chunks returned by kmalloc.

So my question remains unanswered: how can I transfer data from a PCIe card efficiently without using DMA? From what I have tried so far, using memcpy or readq(), I only get 8MB/s, as every read is translated into a PCIe request and a single PCIe transaction packet is returned for every single word, even though PCIe would support 4k packets.
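The one-request-per-word explanation can be checked with a rough cost model. The sketch below is plain user-space arithmetic, not driver code, and the helper names ns_per_read and mb_per_s are hypothetical. It assumes the 16MB in 1.84s measurement from earlier in the thread and that each read moves either 4 or 8 bytes; the thread does not say which width the memcpy actually used.

```c
#include <assert.h>
#include <stddef.h>

/* Average cost of one read, in nanoseconds, if total_bytes were fetched in
 * total_seconds using reads of read_bytes each. Hypothetical helper. */
static double ns_per_read(double total_bytes, double total_seconds,
                          size_t read_bytes)
{
    assert(read_bytes > 0 && total_bytes > 0.0 && total_seconds > 0.0);
    double reads = total_bytes / (double)read_bytes;
    return total_seconds * 1e9 / reads;
}

/* Throughput in MB/s (1 MB = 2^20 bytes) when every read of read_bytes
 * takes latency_ns and reads are issued strictly one after another. */
static double mb_per_s(size_t read_bytes, double latency_ns)
{
    assert(read_bytes > 0 && latency_ns > 0.0);
    return (double)read_bytes / (latency_ns * 1e-9) / (1024.0 * 1024.0);
}
```

With these assumptions, ns_per_read(16.0 * 1024 * 1024, 1.84, 8) comes out at about 877 ns per 8-byte read, and about 439 ns if the reads were 4 bytes wide. Under this model the copy is limited by the round-trip time of each individual read rather than by link bandwidth, which matches the behaviour described above.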

Cheers,
peter
 
07-15-2010, 03:43 PM   #6
nini09
Senior Member
 
Registered: Apr 2009
Posts: 1,058

Rep: Reputation: 88
If you want PCIe 4k burst transfers, that is a job for DMA. My point is that even if the CPU can only generate single-word PCI requests, the throughput shouldn't be as low as 8MB/s.
On a CISC CPU such as x86, you can try the string move instruction (rep movs), which moves words from one string to another.
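The string-move suggestion can be tried in user space with a sketch like the following, which assumes GCC or Clang on x86-64 and falls back to memcpy elsewhere. The function name copy_qwords is hypothetical. Nobody in the thread reports whether this helps for reads from a PCIe window; on an uncached mapping each element may still become its own read request, so treat it as an experiment rather than a fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `qwords` 64-bit words from src to dst.
 * Hypothetical helper for illustration only. */
static void copy_qwords(void *dst, const void *src, size_t qwords)
{
    assert(qwords == 0 || (dst != NULL && src != NULL));
#if defined(__x86_64__) && defined(__GNUC__)
    /* RDI = destination, RSI = source, RCX = count.
     * REP MOVSQ copies RCX quadwords and advances both pointers;
     * the ABI guarantees the direction flag is clear on entry. */
    __asm__ volatile("rep movsq"
                     : "+D"(dst), "+S"(src), "+c"(qwords)
                     :
                     : "memory");
#else
    memcpy(dst, src, qwords * sizeof(uint64_t)); /* portable fallback */
#endif
}
```

A call such as copy_qwords(dst, src, len / 8) copies len bytes when len is a multiple of 8; any remainder would need a separate tail copy.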
 
07-16-2010, 11:18 AM   #7
PeterWurmsdobler
LQ Newbie
 
Registered: May 2010
Location: Cambridge, UK
Distribution: Ubuntu 9.04
Posts: 17

Original Poster
Rep: Reputation: 1
Hello,
thanks for the answers. We have now changed the FPGA design to incorporate DMA. We now get 600MB/s on 4 lane v1.0 PCIe.
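For context on the 600MB/s figure, here is a small sketch that compares a measured rate with the raw bandwidth of a PCIe 1.x link: 2.5 GT/s per lane with 8b/10b encoding, i.e. 250MB/s per lane per direction before packet overhead. The helper names are hypothetical, and it assumes the figures in the thread are in units of 10^6 bytes, which the posts do not specify.

```c
#include <assert.h>

/* Raw per-direction bandwidth of a PCIe 1.x link in MB/s (10^6 bytes/s):
 * 2.5 GT/s per lane, 8 payload bits per 10 line bits, 8 bits per byte.
 * Packet and protocol overhead are not included. Hypothetical helper. */
static double pcie_gen1_raw_mb_per_s(unsigned lanes)
{
    assert(lanes > 0);
    return 2.5e9 * (8.0 / 10.0) / 8.0 / 1e6 * (double)lanes;
}

/* Fraction of the raw link bandwidth that a measured rate represents. */
static double link_utilisation(double measured_mb_per_s, unsigned lanes)
{
    assert(measured_mb_per_s >= 0.0);
    return measured_mb_per_s / pcie_gen1_raw_mb_per_s(lanes);
}
```

On these assumptions a 4-lane link has 1000MB/s raw, so 600MB/s with DMA is about 60% of the raw rate, while the earlier 8MB/s with CPU reads was under 1%.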
peter
 
  

