Hi!
This is my first post on this forum. I wanted to share my experience with ARM9 execution time. I tested the following loop:
int i;
int max = 90000000;
for( i = 0; i < max; i++ );
with different configurations:
- ICache on/off,
- variables i and max declared as registers (register int i; register int max=90000000) or not
- program and variables placed in: shared memory/ARM memory
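For anyone who wants to repeat the test, the two declaration variants could be wrapped roughly as below. This is only a sketch: the function names and the time_variant helper are made up for illustration, and clock() is an assumption (on the target a hardware timer would probably be a better choice). It only shows the difference when compiler optimization is off, as in my setup, and it assumes limit is not negative.

```c
#include <assert.h>
#include <time.h>

/* Variant A: plain locals. With optimization off, the compiler keeps
   i and max in memory and reloads them on every iteration. */
static int count_plain(int limit)
{
    int i;
    int max = limit;
    for (i = 0; i < max; i++)
        ;
    return i;
}

/* Variant B: same loop, but i and max carry the register hint. */
static int count_register(int limit)
{
    register int i;
    register int max = limit;
    for (i = 0; i < max; i++)
        ;
    return i;
}

/* Hypothetical helper: time one variant with clock() and return seconds.
   The assert checks that the loop really ran to completion. */
static double time_variant(int (*fn)(int), int limit)
{
    clock_t start = clock();
    int result = fn(limit);
    clock_t stop = clock();
    assert(result == limit);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_variant(count_plain, 90000000) and time_variant(count_register, 90000000) from main gives the two timings to compare. The cache and memory placement settings are configured outside the C code and are not shown here.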
I use Code Composer Studio 4.2.4. Compiler optimization was switched off. The core runs at 300 MHz.
What I have not tested yet is the influence of the data cache. I'll update once I have more free time.
The results are in the attachment.
When the variables are NOT declared as registers, the loop compiles to:
$C$L1:
LDR R0, $C$CON1
LDR R12, [R0]
ADD R12, R12, #0x1
STR R12, [R0]
LDR R12, $C$CON2
LDR R0, $C$CON1
LDR R12, [R12]
LDR R0, [R0]
CMP R12, R0
BGT $C$L1
The variables i and max are fetched from memory on every iteration, which hurts performance.
When the variables i and max are declared as registers, the loop compiles to:
$C$L1:
ADD R12, R12, #0x1
CMP R4, R12
BGT $C$L1
which eliminates the memory accesses and gives a considerable speedup.
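To put numbers on the two listings, here is a small tally of instructions and data memory accesses per iteration, counted by hand from the assembly above: 10 instructions with 6 LDR and 1 STR for the memory variant (the literal-pool loads of the variable addresses are included), and 3 instructions with no loads or stores for the register variant. The struct and function names are only illustrative.

```c
#include <assert.h>

/* Per-iteration cost of one loop body, counted from the listing. */
struct loop_cost {
    int instructions;   /* instructions executed per iteration */
    int mem_accesses;   /* LDR + STR per iteration */
};

/* Memory variant: 6 LDR, 1 ADD, 1 STR, 1 CMP, 1 BGT. */
static const struct loop_cost MEMORY_LOOP = { 10, 7 };

/* Register variant: 1 ADD, 1 CMP, 1 BGT. */
static const struct loop_cost REGISTER_LOOP = { 3, 0 };

/* Total instructions executed over the whole loop. */
static long long total_instructions(struct loop_cost c, long long iterations)
{
    return (long long)c.instructions * iterations;
}

/* Total data loads and stores over the whole loop. */
static long long total_mem_accesses(struct loop_cost c, long long iterations)
{
    return (long long)c.mem_accesses * iterations;
}
```

For max = 90000000 this gives 900 million instructions and 630 million data accesses for the memory variant, against 270 million instructions and no data accesses for the register variant. The instruction count alone differs only by a factor of about 3.3, so the rest of the measured difference presumably comes from where the code and data are placed and from the cache settings.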
Conclusions:
1) The execution time for extreme cases differs by a factor of 50!!!
2) Inspect the generated assembly code to pinpoint bottlenecks.
Best regards
Przemyslaw Baranski