Hello,
We are having issues with corrupted NAND reads. In our setup we have an FPGA (Xilinx xc61x16) and a NAND chip (MT29F2G08) connected to the L138 EMIF bus. We observe NAND read errors when reading from NAND at the same time that the L138 is performing a DMA to retrieve data from an internal FIFO in the FPGA.
We previously posted a similar error that was resolved by lowering the priority of the EDMA. Details can be found here: http://e2e.ti.com/support/dsp/omap_applications_processors/f/42/p/301392/1051072.aspx
The difference in this scenario is that lowering the EDMA priority does not fix the issue.
Our customer did not start experiencing problems until we switched from a 16-bit wide NAND chip (MT29F2G16AADWP) to an 8-bit wide NAND chip (MT29F2G08). The customer was running the L138 at 456 MHz and 32-bit EDMA transfers were being used to retrieve data from the FPGA.
So, we experimented with the 8-bit NAND and observed the following:
L138 Clock Speed = 456 MHz, 32-bit EDMA transfers -> READ ERRORS
L138 Clock Speed = 456 MHz, 16-bit EDMA transfers -> READ ERRORS
L138 Clock Speed = 300 MHz, 32-bit EDMA transfers -> READ ERRORS
L138 Clock Speed = 300 MHz, 16-bit EDMA transfers -> No errors
L138 Clock Speed = 456 MHz, 32-bit DSP memory access transfers -> No errors.
It is important to point out that when the L138 is clocked faster, the EMIF clock is actually slightly slower (about 92 MHz instead of 100 MHz) because of the way the PLLs in the L138 need to be adjusted. The data we captured therefore suggests that the NAND read errors are related to the combination of EDMA use, the 8-bit NAND, and longer EMIF transfers.
Next, we used the FPGA on-chip analyzer to monitor the EMIF bus and try to capture the conditions under which the NAND reads were getting corrupted. We observed that whenever NAND reads were corrupted, read accesses were being attempted while the NAND was still reporting busy. I’ve attached two examples of these scenarios below.
What we are seeing in these captures is the following. A read command is issued to the NAND chip, and the active-low NAND busy line drops low while the NAND chip retrieves the requested data. However, once the busy line returns high, indicating that the NAND data is ready to be read, a second NAND read command is issued. Immediately after this second command, when the NAND chip has driven the busy line low again in response to it, we see output enable strobes being issued to the NAND asking for read data, presumably for the first read request.
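For anyone who wants to check their own captures for the same condition, here is a minimal sketch in C of the check we are effectively doing by eye: count output enable strobes that start while the busy line is low. The struct, its field names, and the function name are hypothetical and only for illustration; it assumes the analyzer capture has been exported as an array of samples of the two active-low signals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One analyzer sample of the two signals of interest.
 * Field names are hypothetical; both signals are active low. */
struct emif_sample {
    bool rb_n; /* NAND ready/busy: false (low) = busy, true (high) = ready */
    bool oe_n; /* EMIF output enable: false (low) = read strobe asserted */
};

/* Count falling edges of oe_n that occur while the NAND reports busy.
 * Each such edge is a read strobe issued before the data is ready. */
size_t count_reads_while_busy(const struct emif_sample *s, size_t n)
{
    size_t violations = 0;

    assert(s != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        bool oe_fell = s[i - 1].oe_n && !s[i].oe_n;

        if (oe_fell && !s[i].rb_n)
            violations++;
    }
    return violations;
}
```

A capture of a healthy read should give a count of zero. The captures described above would give a non-zero count, because the output enable strobes arrive after the second command has already pulled the busy line low.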
We are not sure what to make of this. It almost seems like the first NAND read is getting interrupted somehow by the second NAND read.
We did find a workaround for the moment. We modified nand_base.c in the kernel so that, in the nand_command() and nand_command_lp() functions, the uninterruptible ndelay(100) executed before checking the NAND busy pin is extended to 20 us. Since we never saw the NAND busy pin stay low for longer than 20 us, this forces the kernel to pause in an uninterruptible state until the NAND is no longer busy from the command request. Obviously this makes NAND access a bit less efficient and is not ideal, but it is the only fix we have been able to find.
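To make the effect of the workaround concrete, here is a small user-space model in C. It is not the kernel code: the simulated clock and all function names are hypothetical stand-ins, and the 15 us busy time in the usage note is an assumed value (we only know that it never exceeded 20 us). It simply shows whether the NAND is still busy once the fixed delay has elapsed, which is the point at which the driver would otherwise go on to the busy-pin check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated time and NAND state; all names are hypothetical stand-ins. */
static uint64_t sim_now_ns;      /* simulated clock */
static uint64_t sim_ready_at_ns; /* time at which the simulated NAND is ready */

/* Stands in for the uninterruptible delay in the driver. */
static void sim_delay_ns(uint64_t ns)
{
    sim_now_ns += ns;
}

/* Stands in for sampling the NAND busy pin. */
static bool sim_nand_ready(void)
{
    return sim_now_ns >= sim_ready_at_ns;
}

/* Issue a simulated read command: the NAND goes busy for busy_ns. */
static void sim_issue_read(uint64_t busy_ns)
{
    assert(busy_ns > 0);
    sim_ready_at_ns = sim_now_ns + busy_ns;
}

/* Issue a read, wait the fixed delay, and report whether the NAND is
 * still busy afterwards, i.e. whether the fixed delay alone was too
 * short to cover the busy period. */
bool still_busy_after_fixed_delay(uint64_t busy_ns, uint64_t fixed_delay_ns)
{
    sim_issue_read(busy_ns);
    sim_delay_ns(fixed_delay_ns);
    return !sim_nand_ready();
}
```

With an assumed 15 us busy time, still_busy_after_fixed_delay(15000, 100) reports that the NAND is still busy after the original 100 ns delay, while still_busy_after_fixed_delay(15000, 20000) reports that it is ready. The cost is that every command now pays the full 20 us, even when the NAND becomes ready sooner.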
If anyone has any ideas or insight as to what is going on and how we can fix this without limiting the efficiency at which we can access NAND, it would be greatly appreciated.
Thanks,
Greg