On my L138, I have a working SPI driver using per-byte interrupts which I'm starting to convert to use DMA transfers.
The L138 is the SPI slave, and we're using the non-standard 4-wire-with-enable mode which lets us hold the SPI link until data is ready. As the slave, I need to set up the Tx data when the transfer starts, and not before (else we could be transferring stale data). I'm preloading the Tx shift register and TXDATA with zeros, so there are always two bytes ready to go. I get a Tx interrupt after the first zero byte has been transmitted, and then I have the duration of the second zero byte to prepare data and get TXDATA loaded with actual data. The third byte is then the start of actual data. This keeps the SPI peripheral fed at all times, so transfers are efficient, and in the unlikely case that the data isn't ready, the enable output makes the SPI master wait until we are ready. It's working OK with interrupts, but the ISRs take too much time so I need to move to DMA.
Converting this to DMA, I think I'm going to need two buffers on the Tx side - call them "application Tx" and "driver Tx". The "application Tx" buffer will be written periodically by the application with the latest values. When we get our first Tx interrupt, this buffer will be copied to "driver Tx" in its entirety. "Driver Tx" will then be DMA'd to the SPI peripheral one byte at a time, as needed for SPI Tx. The application could change "application Tx" asynchronously to the SPI transfer, and we don't want it stomping over data halfway through a transfer. Using two separate buffers keeps us safe here. On the Rx side, things are much simpler - I'll just be using a single PaRAM set with a callback when it's complete, and job done.
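To make the two-buffer idea concrete, here is a minimal sketch in plain C of what the ISR-based version does today and what the DMA version would have to reproduce. All names (`app_tx`, `drv_tx`, `TX_WORDS`, `tx_snapshot()`, `tx_next_byte()`) are placeholders of mine, not from any TI library, and the buffer length is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

#define TX_WORDS 8u  /* hypothetical buffer length, in 32-bit words */

/* "Application Tx": the application may rewrite this at any time. */
static volatile uint32_t app_tx[TX_WORDS];

/* "Driver Tx": snapshot owned by the driver for one SPI transfer. */
static uint32_t drv_tx[TX_WORDS];
static size_t drv_tx_pos;  /* index of the next byte to send */

/* Run once, on the first Tx event of a transfer.
 * Copies one whole 32-bit word per assignment. */
void tx_snapshot(void)
{
    for (size_t i = 0; i < TX_WORDS; i++) {
        drv_tx[i] = app_tx[i];
    }
    drv_tx_pos = 0;
}

/* Run on every Tx event: returns the next byte of the snapshot,
 * or 0 once the snapshot has been sent completely. */
uint8_t tx_next_byte(void)
{
    const uint8_t *bytes = (const uint8_t *)drv_tx;

    if (drv_tx_pos >= sizeof drv_tx) {
        return 0;
    }
    return bytes[drv_tx_pos++];
}
```

In the DMA version, `tx_snapshot()` would become the buffer-copy transfer and `tx_next_byte()` the byte-per-event transfer; changes to `app_tx` after the snapshot must not show up in the bytes going out.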
I have a few questions about the DMA though.
1) The Tx looks like a job for linked PaRAM sets. From the docs, it looks like I can set up one PaRAM set to do the buffer copy, and a second PaRAM set for copying individual bytes to the SPI peripheral, and the EDMA3 controller will take care of everything. If I set these up and link them, will the first SPI Tx event automatically do the full-buffer copy *and* copy the first byte to the SPI peripheral, and then subsequent SPI Tx events will copy the rest of the bytes one at a time until the buffer is empty? Or is there some extra step needed?
2) Do I need to do anything special to ensure the transfer controller has completed the buffer copy before it starts DMA'ing individual bytes to the SPI Tx?
3) If I set this up to auto-reload the first PaRAM set (buffer copy) on completion of the second PaRAM set (SPI Tx), I think the whole Tx side should run hands-off without needing the processor involved at all. This seems too good to be true. :) Am I missing something?
4) I'm transmitting 32-bit values. I don't care if I have some inconsistency between the values in the buffer, if the transmit starts halfway through an update, but I absolutely must have intact 32-bit values. (Using the upper 16 bits of the last value and the lower 16 bits of the new value is not a good thing!) Writes to "application Tx" by the application will be atomic for each 32-bit value, because it's a 32-bit processor and the buffers are in L1D. But I can't tell from the datasheet whether the DMA copy from "application Tx" to "driver Tx" will be atomic for 32-bit values. If it isn't, this would be a problem. In that case I'd probably need to receive the first SPI Tx interrupt myself and have an ISR which does the copy safely.
5) What limits are there on the PaRAM sets I use? My reading of the EDMA3 LLD is that the first 32 PaRAM sets map one-to-one onto the 32 channels as the "first" PaRAM set for an event, and the remaining PaRAM sets are freely available. Is that correct?
6) And as far as "freely available" goes, do I need to worry about SYS/BIOS or IPC using any of the other PaRAM sets? Is there any way of telling whether a PaRAM set is actually in use, short of trying it and seeing whether it crashes?
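For questions 1 to 3, this is roughly the pair of PaRAM sets I have in mind, sketched as plain C so the field values are explicit. The struct follows the 32-byte PaRAM entry layout as I read it in the EDMA3 documentation (little-endian field order assumed), with PaRAM starting at offset 0x4000 inside the channel controller and 0xFFFF as the null link. The helper names, the set indices and the addresses passed in are hypothetical, and the OPT bits (completion interrupt, TCC) are deliberately left out. Whether the controller actually behaves the way question 1 hopes is exactly what I'm asking; the sketch only shows the configuration.

```c
#include <stdint.h>

/* One 32-byte EDMA3 PaRAM entry (little-endian field order assumed). */
typedef struct {
    uint32_t opt;      /* channel options                          */
    uint32_t src;      /* source address                           */
    uint16_t acnt;     /* bytes per array (1st dimension)          */
    uint16_t bcnt;     /* arrays per frame (2nd dimension)         */
    uint32_t dst;      /* destination address                      */
    int16_t  srcbidx;  /* source step between arrays               */
    int16_t  dstbidx;  /* destination step between arrays          */
    uint16_t link;     /* offset of the PaRAM set to load next     */
    uint16_t bcntrld;  /* BCNT reload value                        */
    int16_t  srccidx;  /* source step between frames               */
    int16_t  dstcidx;  /* destination step between frames          */
    uint16_t ccnt;     /* frames per block (3rd dimension)         */
    uint16_t rsvd;
} edma3_param_t;

#define PARAM_BASE_OFFSET 0x4000u  /* PaRAM offset inside the CC   */
#define PARAM_NULL_LINK   0xFFFFu  /* "no further set to load"     */

/* LINK value that points at PaRAM set number set_index. */
uint16_t param_link_offset(unsigned set_index)
{
    return (uint16_t)(PARAM_BASE_OFFSET + 32u * set_index);
}

/* Set 1: copy the whole "application Tx" buffer to "driver Tx"
 * in response to a single event, then link to next_set. */
void setup_copy_set(edma3_param_t *p, uint32_t app_tx_addr,
                    uint32_t drv_tx_addr, uint16_t nbytes,
                    unsigned next_set)
{
    p->opt = 0;                 /* OPT bits omitted in this sketch */
    p->src = app_tx_addr;
    p->dst = drv_tx_addr;
    p->acnt = nbytes;           /* one array = the whole buffer    */
    p->bcnt = 1;
    p->ccnt = 1;
    p->srcbidx = 0;
    p->dstbidx = 0;
    p->srccidx = 0;
    p->dstcidx = 0;
    p->bcntrld = 0;
    p->rsvd = 0;
    p->link = param_link_offset(next_set);
}

/* Set 2: one byte per event from "driver Tx" to the SPI Tx register. */
void setup_spi_set(edma3_param_t *p, uint32_t drv_tx_addr,
                   uint32_t spi_tx_reg_addr, uint16_t nbytes)
{
    p->opt = 0;                 /* SYNCDIM = 0: A-synchronized     */
    p->src = drv_tx_addr;
    p->dst = spi_tx_reg_addr;
    p->acnt = 1;                /* one byte per event              */
    p->bcnt = nbytes;           /* nbytes events in total          */
    p->ccnt = 1;
    p->srcbidx = 1;             /* walk through the buffer         */
    p->dstbidx = 0;             /* always the same register        */
    p->srccidx = 0;
    p->dstcidx = 0;
    p->bcntrld = 0;
    p->rsvd = 0;
    p->link = PARAM_NULL_LINK;  /* question 3 would change this    */
}
```

For the auto-reload in question 3, I would expect the second set's `link` to point at a spare PaRAM set holding a copy of the first, rather than being the null link.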
Sorry if some of this is obvious to longer-term users of the device, but my experience is that the hardest bugs to fix are the ones coming from basic assumptions (like "of course data gets transferred 32 bits at a time") which turn out not to be true. :)
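To make the concern in question 4 concrete, here is a small host-side illustration in C of the failure I'm worried about. It is not DMA code: `copy_in_halves()` and `copy_whole_word()` are hypothetical stand-ins for a copier that reads a 32-bit value as two 16-bit halves versus one that reads it in a single access, and the `update` argument simulates the application writing a new value partway through the copy.

```c
#include <stdint.h>

/* Copier that reads the word in two 16-bit halves. The application's
 * update lands between the two reads, so the result mixes old and new. */
uint32_t copy_in_halves(volatile uint32_t *src, uint32_t update)
{
    uint32_t lo = *src & 0x0000FFFFu;  /* low half of the old value  */
    *src = update;                     /* application writes here    */
    uint32_t hi = *src & 0xFFFF0000u;  /* high half of the new value */
    return hi | lo;
}

/* Copier that reads the word in one access. The update can only land
 * before or after the read, so the result is a complete value. */
uint32_t copy_whole_word(volatile uint32_t *src, uint32_t update)
{
    uint32_t value = *src;             /* complete old value         */
    *src = update;                     /* application writes here    */
    return value;
}
```

With an old value of 0x0001FFFF and an update to 0x00020000, the half-by-half copier returns 0x0002FFFF, which is neither value the application ever wrote; the whole-word copier returns 0x0001FFFF.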