I have been chasing a memory corruption issue.
We are using the OMAP 138 on a custom board, with Linux on the ARM and DSP/BIOS (5.41.13.42) on the DSP.
The problem first manifested itself as "losing" the MSGQ handle after some of the code had run. On investigation, it turned out that 64K bytes of memory had been overwritten at some arbitrary point in the run. Sometimes the corrupted memory holds the MSGQ handle, sometimes it holds something else, but it always appears to be 64K at a time. The memory is located in the DDR space of the DSP. The corruption is most often seen in the space that is filled with global variables which point to structures declared on the stack (e.g. the MSGQ handle, which points to the actual MSGQ structure allocated at runtime). The data on the stack is not corrupted.
There is plenty of stack assigned for all tasks. In investigating this problem, the code has been whittled down to mainly start-up stuff - initialization of the message queues and the shared memory and launching tasks which do nothing more than return.
The odd thing is that the corruption happens at a certain line of code... until you make a change and recompile. Then it happens at another arbitrary line of code. Sometimes it is a print statement, sometimes it is where a variable gets incremented. The functionality of the code seems completely unrelated to the memory that gets corrupted.
We have inspected the stacks of the running tasks. We have disabled the cache and seen no change. We have watched the EDMA ISR and seen no association with the corruption.
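One further way to bracket the overwrite in time would be a checksum guard over the affected region, checked at numbered points in the start-up code. The following is only a minimal sketch in plain C, assuming the address and size of the region are known and that nothing legitimately writes to it between checks. The names `region_guard`, `guard_arm` and `guard_check` are hypothetical helpers, not DSP/BIOS APIs, and the simple rotate-and-XOR checksum is not collision-free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of DSP/BIOS. */
typedef struct {
    const volatile uint8_t *base; /* start of the watched region          */
    size_t len;                   /* region size in bytes                 */
    uint32_t expected;            /* checksum recorded by guard_arm()     */
    int last_good;                /* last checkpoint that passed, or -1   */
} region_guard;

/* Rotate-and-XOR checksum: cheap, but it can miss some multi-byte changes. */
static uint32_t region_checksum(const volatile uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        sum = ((sum << 1) | (sum >> 31)) ^ p[i];
    }
    return sum;
}

/* Record the current contents of the region as the reference state. */
void guard_arm(region_guard *g, const volatile void *base, size_t len)
{
    assert(g != NULL && base != NULL);
    g->base = (const volatile uint8_t *)base;
    g->len = len;
    g->expected = region_checksum(g->base, len);
    g->last_good = -1;
}

/* Returns 1 if the region still matches the reference state, else 0.
 * On success the checkpoint id is remembered in last_good. */
int guard_check(region_guard *g, int checkpoint_id)
{
    assert(g != NULL);
    if (region_checksum(g->base, g->len) != g->expected) {
        return 0;
    }
    g->last_good = checkpoint_id;
    return 1;
}
```

Calling `guard_check` with increasing ids between initialization steps would place the overwrite between `last_good` and the first failing checkpoint. It would not identify which bus master did the write, only narrow down when it happened.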
Because it seems so closely tied to the code of the DSP, I have assumed the corruption is due to the DSP.
The questions I have are these:
1) Is there some configuration we have overlooked?
2) Is there a method to find out what is writing to the memory when it gets corrupted?
3) Has anyone else seen this issue?
Thanks for your help.