Yes. Your best bet is a trivial character device driver (one that exports open(), close(), and mmap()) for this. You only need request_mem_region() to reserve the address range, then io_remap_pfn_range() to remap it into user space. See
Linux Device Drivers, 3rd Edition, especially chapter 3, for details.
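As a rough sketch of what that driver's mmap path looks like (the aperture address, size, and all names here are placeholders, not something from your hardware; error handling is abbreviated):

```c
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/ioport.h>

/* Hypothetical values -- substitute the real PCI aperture. */
#define APERTURE_BASE 0xf0000000UL
#define APERTURE_SIZE 0x00100000UL

static int ap_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long len = vma->vm_end - vma->vm_start;

    if (len > APERTURE_SIZE)
        return -EINVAL;

    /* Uncached, so stores reach the bus immediately. */
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

    return io_remap_pfn_range(vma, vma->vm_start,
                              APERTURE_BASE >> PAGE_SHIFT,
                              len, vma->vm_page_prot);
}

static const struct file_operations ap_fops = {
    .owner = THIS_MODULE,
    .mmap  = ap_mmap,
};

static int __init ap_init(void)
{
    int major;

    if (!request_mem_region(APERTURE_BASE, APERTURE_SIZE, "aperture"))
        return -EBUSY;
    major = register_chrdev(0, "aperture", &ap_fops); /* dynamic major */
    return major < 0 ? major : 0;
}
module_init(ap_init);
MODULE_LICENSE("GPL");
```

(A real module would also need the matching cleanup in module_exit, and likely open()/release() stubs.)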
I'm still a bit intrigued. The PCI bus is a good choice for something like this.
Perhaps you could set up the IOMMU on each CPU so that each one exposes a separate address range on the PCI bus, mapped to uncached memory on that CPU. In that case, the standard multiprocessor locking primitives (which assert the PCI bus lock -- the LOCK opcode prefix on x86 architectures) will Just Work for you. Instead of a single shared aperture, you'd have three apertures, each one owned by a specific CPU but accessible to all three. Even if one of the CPUs drops out, the other two can continue working (assuming your hardware ensures that no single CPU can hog the bus indefinitely). Your userspace will use your driver to map all three apertures.
The logic for keeping all three in sync (if you intend to have majority rule, like NASA does for satellite stuff -- any one CPU is overruled by the other two, but consensus is expected) is built on top of those locking primitives. The operations you need depend on how you intend to use the CPU triplet: in parallel, sharing the workload, or with each dedicated to its own purpose.