Registered: Mar 2011
Location: Milford, MA. USA
Distribution: MontaVista, Ubuntu, MINT
Short answer: as far as system calls go, semaphores only. And you can read my lengthy discourse if interested.
The address of that shared memory is governed by the hardware, which you already knew.
Further, it sounds like the transfer is governed by hardware rules, which means you are not arbitrating memory access between two CPUs via a scheme of your own design; instead, you said you have a DMA engine which will perform the DMA when it receives data.
Issues you may have to deal with are the endianness of the data, if this differs between the two CPUs, and word or longword alignment of the data, if transfers are best optimized on a longword boundary. It's been a while for me on that (> 10 years), but there are pragma directives you can place in your code to force longword alignment of a variable or array. Further, it depends what the DMA is doing for you, such as how large a buffer you can have and what you do to indicate that it's time to transfer. For instance, it could be just a 32- or 64-bit word which, once written, it will transfer. Or it could be a big buffer up to some maximum size, where you fill however much of it you need and then write to a register to tell it to perform the transfer. In the case of endian differences between CPUs, find out if the DMA will assist and resolve that for you, because what's the benefit of hardware-assisted transfer if you still have to process every byte?
Code management of the DMA, from both the read and write perspectives, I would do through one central set of library functions, or one sub-group of processes whose sole purpose is to perform DMA administration and transfers.
And sorry, I'm not aware of any specific Linux system calls which help facilitate this. I would further expect that if the specific DMA required special handling, the vendor would either provide a set of library functions, or a specification with examples so that you could create them in your program space.
In a very large-scale system, you could protect access to the DMA registers and buffers via semaphores; however, in that case I would create a process or set of processes with public interfaces solely for DMA interfacing in my program architecture. Hence the only processes directly talking to the DMA would be the ones designed to do so.
Again, with the compiler and linker, there are ways to declare one address range as read-only and another as write-only, so that you get software exceptions when code attempts an incorrect operation. Two caveats: (1) while good, you have to process those exceptions and not just ignore them; overall they're bad things to have happen. (2) Making data segments read- or write-only doesn't restrict which processes may do so. If someone writes an additional process for your architecture which writes right over a write-only region, but does so in error because it doesn't understand the mapping and restrictions of that address space, the bad result is still the same, a problem; however, you won't see an exception, because writing to that location is allowed. To help deal with this, the special segment for the DMA address space is typically mapped in its own segment, with the stack and heap placed elsewhere, so that the normal use of variables and the stack happens elsewhere in memory.
A quick thought on pure shared memory, which is different from DMA assist. My experience is actually more with pure shared memory, where the memory was readable by both sides. One side was the Linux host CPU; the other side was a bank of signal processing cores, so it was a shared memory map: one large segment, further sub-segmented for 12 DSP core processors, all identical per processor, just 12 different offset values, with the larger segment being 12 times the size of the "per processor" layout. In other words, say an individual segment was 10 bytes; for 12 DSP cores, that would require 120 bytes. The first one would be at offset 0, the second at offset 10, and so forth.

To further complicate matters, we had FPGA filters which would take ingress and egress data and convert it accordingly, just prior to, and just after, signal processing. That's not too relevant, just me being honest, because those FPGAs used memory, per DSP, from within each processor's sub-segment address space. Say, for instance, you had my example of 12 DSPs with 10-byte address spaces: the ingress FPGA might map channel 0 to offset DSP_0_BASE + 2 bytes and use 2 bytes, and then map channel 1 to DSP_1_BASE + 2 bytes; the egress FPGA might use + 4 as its sub-offset and use a similar amount of space in the per-processor map. These mappings were all known (and much larger than 2 or 10 bytes, by the way); the end result was that, by definition only, programmers knew what address spaces were reserved and how to access those spaces so as to cause their desired result.
In the application side of the world, where I was, we located the large segment separately and used library functions to access locations within it. Further, since it was not a DMA, we had to define a signaling method to indicate to each side when data was placed in there. The FPGA channels just worked on whatever data they were presented with, and that data came from outside the system via high-speed acquisition hardware. Really, what we were doing with the memory-mapped area was two things: (1) making reserved spaces for these transfer mechanisms, which was compiler and linker rules and mappings, and (2) controlling the DSPs and getting status from them. For instance, if we needed to change a coefficient or a matrix, we had that capability; we'd place it in the right location and then modify the master control to say "Hey DSP! You've got a new coefficient from me!" And we left the space for the FPGAs alone, except for reading it during debug.

Debug was of course huge. If we had bad results, we had to have lots of "dump DSP memory" functions, because the FPGAs never stopped and the DSPs never stopped; they just operated on whatever data they were presented with, and they altered their operations based on what they were told through the shared memory map. Therefore, in addition to dumping the shared memory map, we had mirror acknowledgments from each processor which told us what it was last given. The application programmer could swear all they wanted that they sent "A, A, C, Q", but if the mirror showed "A, B, Q, D" ... problem! (And invariably it was the application which didn't know what it was doing.)
Circling all the way back to any potential system calls: you "could" use semaphores to arbitrate multi-thread or multi-process access to those memory locations, and internally in the DSP interface process we did have them. But mainly, all "other" user processes which asked for data, or gave new data or commands, sent that information via public API functions made available by the DSP interface process. Therefore normal Linux IPC mechanisms (pipes, my personal favorite) were used.