Channel: Processors forum - Recent Threads

OMAP L138 - ARM9 performance


Hi!

This is my first post on this forum. I want to share my measurements of ARM9 execution time. I tested the following loop:

int i;
int max = 90000000;
for( i = 0; i < max; i++ );

with different configurations:
- ICache on/off,
- variables i and max declared as register (register int i; register int max = 90000000;) or not,
- program and variables placed in: shared memory/ARM memory

I use Code Composer Studio 4.2.4. Compiler optimization was switched off. The core runs at 300 MHz.
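For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two variable configurations could be timed. It is not the code used for the measurements in this post. The function names are hypothetical, it assumes a working clock() from the C runtime (on a bare-metal target a hardware timer would likely be needed instead), and 'volatile' is used as a stand-in to force the counter into memory regardless of optimization level.

```c
#include <time.h>

/* Counter kept in memory: 'volatile' forces a load and a store of i on
   every iteration, similar to the non-register case with optimization off. */
static int run_memory_loop(int max)
{
    volatile int i;
    for (i = 0; i < max; i++)
        ;
    return i;
}

/* Counter requested in a register. 'register' is only a hint to the
   compiler, so the generated assembly should still be checked. */
static int run_register_loop(int max)
{
    register int i;
    for (i = 0; i < max; i++)
        ;
    return i;
}

/* Hypothetical helper: CPU time in seconds for one call of 'loop'.
   The resolution depends on the clock() implementation of the runtime. */
static double time_loop(int (*loop)(int), int max)
{
    clock_t start = clock();
    (void)loop(max);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_loop(run_memory_loop, 90000000) and time_loop(run_register_loop, 90000000) under each cache and memory placement setting would give one number per configuration.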

I have not yet tested the influence of the data cache. I'll post an update once I have more free time.

The results are in the attachment.

When the variables are NOT declared as registers, the loop compiles to the following:
$C$L1:
LDR           R0, $C$CON1
LDR           R12, [R0]
ADD           R12, R12, #0x1
STR           R12, [R0]
LDR           R12, $C$CON2
LDR           R0, $C$CON1
LDR           R12, [R12]
LDR           R0, [R0]
CMP           R12, R0
BGT           $C$L1

The variables i and max are loaded from memory on every iteration (and i is stored back), which hurts performance.

When the variables i and max are declared as registers, the loop compiles to:
$C$L1:
ADD           R12, R12, #0x1
CMP           R4, R12
BGT           $C$L1
which eliminates the memory accesses and gives a considerable speedup.
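To put the two listings side by side: the memory version executes 10 instructions per iteration (6 LDR, 1 STR, plus ADD, CMP and BGT), while the register version executes 3. The sketch below turns those counts into an idealized run time. It ASSUMES one clock cycle per instruction, which real hardware will not reach (taken branches, load delays and memory wait states all add cycles), so treat the result as a lower bound only. The names are hypothetical; the iteration count and clock frequency are the ones from this post.

```c
/* Values taken from the post: loop bound and core clock. */
#define ITERATIONS 90000000L
#define CORE_CLOCK_HZ 300e6

/* Instructions per iteration, counted from the two assembly listings. */
#define INSTR_MEMORY_LOOP 10
#define INSTR_REGISTER_LOOP 3

/* Idealized run time in seconds, ASSUMING one cycle per instruction.
   This ignores pipeline stalls, branch penalties and memory latency. */
static double ideal_seconds(long iterations, int instr_per_iter, double clock_hz)
{
    return (double)iterations * instr_per_iter / clock_hz;
}
```

Under that assumption, ideal_seconds(ITERATIONS, INSTR_REGISTER_LOOP, CORE_CLOCK_HZ) gives about 0.9 s and the memory version about 3.0 s, a ratio of roughly 3.3. Instruction count alone therefore does not explain the full spread between the fastest and slowest configurations; the rest presumably comes from where code and data are placed and whether the cache is enabled.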

Conclusions:
1) The execution time between the fastest and slowest configurations differs by a factor of 50!
2) Inspect the generated assembly code to pinpoint bottlenecks.

Best regards
Przemyslaw Baranski

