OMAP-L138 ARM9 cache latency

Hello,

I am trying to do some performance testing on the OMAP-L138. The idea is to compare the access time for an ARM data cache hit with that for a miss. The processor clock is set to 456MHz and the mDDR to 150MHz. All clocks were verified through the CLKOUT pin.

Both the I and D caches are enabled, along with the MMU:

/* Enables MMU */
CP15MMUEnable();

/* Enable Instruction Cache */
CP15ICacheEnable();

/* Enable Data Cache */
CP15DCacheEnable();

We create two buffers, Mem1 and Mem2, in mDDR in the cached region:

static int Mem1[1024*20/4];
static int Mem2[1024*20/4];
static int testDst;

Then we read Mem1 to make sure that it (or at least part of it) is in the data cache:

for (i=0; i<16*1024/4; i++)
{
    testDst += Mem1[i];
}

We then toggle the GPIO a few times so that the instructions and data involved are also cached:

for (index=0; index<2; index++) {
    GPIO_BANK01->OUT_DATA &= ~(1<<5); // Set GP0[5] to low
    GPIO_BANK01->OUT_DATA |= (1<<5); // Set GP0[5] to high
}

Then we access the cached buffer:

GPIO_BANK01->OUT_DATA &= ~(1<<5); // Set GP0[5] to low
testDst = Mem1[(16*1024/4)-1];
testDst = Mem1[(16*1024/4)-2];
testDst = Mem1[(16*1024/4)-3];
testDst = Mem1[(16*1024/4)-4];
testDst = Mem1[(16*1024/4)-5];
testDst = Mem1[(16*1024/4)-6];
testDst = Mem1[(16*1024/4)-7];
testDst = Mem1[(16*1024/4)-8];
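
To see how the eight reads above map onto cache lines, the indices can be converted into line numbers relative to the start of Mem1. This is only a sketch under two assumptions that the post does not state: the data cache uses 32-byte lines, and Mem1 starts on a line boundary. The helper names line_of_index and lines_spanned are hypothetical.

```c
#include <stddef.h>

/* ASSUMPTION: 32-byte data cache lines (not stated in the post). */
#define LINE_BYTES 32u

/* Cache-line number, counted from the start of the array, that holds
 * element `index`. ASSUMPTION: the array base is line-aligned. */
size_t line_of_index(size_t index, size_t elem_bytes)
{
    return (index * elem_bytes) / LINE_BYTES;
}

/* Number of distinct cache lines covered by elements first..last
 * (inclusive, first <= last). */
size_t lines_spanned(size_t first, size_t last, size_t elem_bytes)
{
    return line_of_index(last, elem_bytes)
         - line_of_index(first, elem_bytes) + 1;
}
```

With 4-byte ints, lines_spanned(4088, 4095, 4) gives 1, so under these assumptions all eight reads fall in a single line, while the warm-up loop over indices 0 to 4095 covers 512 lines. If Mem1 is not line-aligned, the eight reads may straddle two lines instead.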

And then the buffer that has not been loaded into the cache yet:

GPIO_BANK01->OUT_DATA |= (1<<5); // Set GP0[5] to high - place breakpoint here
GPIO_BANK01->OUT_DATA &= ~(1<<5); // Set GP0[5] to low
testDst = Mem2[0];
GPIO_BANK01->OUT_DATA |= (1<<5); // Set GP0[5] to high

The GP0[5] trace on the oscilloscope shows unexpectedly slow times:

1) Toggling GP0[5] the first time takes 330ns on average.

2) Subsequent toggles take 180ns.

3) Accessing 8 entries of the cached buffer takes 1.14us.

4) Accessing 1 entry of the buffer that is not yet in the cache takes 500ns.

180ns for just toggling a GPIO seems a bit too much, even considering that the peripheral is clocked at 114MHz.

~1us (1140ns - 180ns) to access eight 32-bit variables in the data cache also seems too much. Is it specified anywhere what the I/D cache access time should be?
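
For scale, the measured times can be expressed in CPU cycles at the 456MHz clock quoted above. A minimal sketch of the arithmetic; ns_to_cycles is a hypothetical helper, and it assumes the core really runs at 456MHz during the measurement:

```c
#include <stdint.h>

/* CPU clock quoted in the post: 456 MHz. */
#define CPU_HZ 456000000ULL

/* Convert a duration in nanoseconds to CPU cycles, rounded to nearest. */
uint64_t ns_to_cycles(uint64_t ns)
{
    return (ns * CPU_HZ + 500000000ULL) / 1000000000ULL;
}
```

With the figures above, ns_to_cycles(180) is 82 cycles for one GPIO toggle, and ns_to_cycles(960) is 438 cycles for the eight reads, or roughly 55 cycles (120ns) per read.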

Are these figures realistic? Am I doing something wrong?

Thanks,

Giuseppe

 

