Hi,
First question: what is the real cycle count of the FFT? Or, how can I reach the FFT cycle count that the reference book gives?
Details follow:
Hardware: LCDK C6748, default CPU clock of 300 MHz.
Data: fairly large, about 5k complex numbers. I need to run 6 FFTs: two on 2048 complex points and four on 256 complex points.
I am using the <dsplib_c674x_3_2_0_1> package. Here are the results with the code of <DSPF_sp_fftSPxSP_d.c> in <dsplib_c674x_3_2_0_1\packages\ti\dsplib\src\DSPF_sp_fftSPxSP\c674\>:
DSPF_sp_fftSPxSP_674LE_LE_ELF (example project from the package, imported unchanged except that MAXN is set to (256*8)):
[C674X_0] DSPF_sp_fftSPxSP Iter#: 1 Result Successful N = 8 radix = 2 natC: 4413 optC: 3405
DSPF_sp_fftSPxSP Iter#: 2 Result Successful N = 16 radix = 4 natC: 4103 optC: 4735
DSPF_sp_fftSPxSP Iter#: 3 Result Successful N = 32 radix = 2 natC: 12753 optC: 12359
DSPF_sp_fftSPxSP Iter#: 4 Result Successful N = 64 radix = 4 natC: 25201 optC: 23207
DSPF_sp_fftSPxSP Iter#: 5 Result Successful N = 128 radix = 2 natC: 68759 optC: 61313
DSPF_sp_fftSPxSP Iter#: 6 Result Successful N = 256 radix = 4 natC: 136177 optC: 117835
DSPF_sp_fftSPxSP Iter#: 7 Result Successful N = 512 radix = 2 natC: 345511 optC: 293227
DSPF_sp_fftSPxSP Iter#: 8 Result Successful N = 1024 radix = 4 natC: 691223 optC: 581577
DSPF_sp_fftSPxSP Iter#: 9 Result Successful N = 2048 radix = 2 natC: 1670179 optC: 1407735
Memory: 928 bytes
Cycles: 61313 (N=128) 117835 (N=256)
// Same code (copied from DSPF_sp_fftSPxSP_d.c into a new project created with the wizard), with kernel_size (used for the memory calculation) turned off.
// Different .cmd file: the default C6748.cmd that the project wizard creates automatically.
[C674X_0] DSPF_sp_fftSPxSP Iter#: 1 Result Successful N = 8 radix = 2 natC: 3959 optC: 769
DSPF_sp_fftSPxSP Iter#: 2 Result Successful N = 16 radix = 4 natC: 4667 optC: 179
DSPF_sp_fftSPxSP Iter#: 3 Result Successful N = 32 radix = 2 natC: 12822 optC: 677
DSPF_sp_fftSPxSP Iter#: 4 Result Successful N = 64 radix = 4 natC: 23703 optC: 2096
DSPF_sp_fftSPxSP Iter#: 5 Result Successful N = 128 radix = 2 natC: 63965 optC: 5158
DSPF_sp_fftSPxSP Iter#: 6 Result Successful N = 256 radix = 4 natC: 125128 optC: 11451
DSPF_sp_fftSPxSP Iter#: 7 Result Successful N = 512 radix = 2 natC: 313733 optC: 24047
DSPF_sp_fftSPxSP Iter#: 8 Result Successful N = 1024 radix = 4 natC: 608097 optC: 47757
DSPF_sp_fftSPxSP Iter#: 9 Result Successful N = 2048 radix = 2 natC: 1492300 optC: 99953
Cycles: 5158 (N=128) 11451 (N=256)
// Code is the same as in the previous run.
// Modified C6748.cmd to place .far in DDR2, in case I need a lot of memory in my own code.
[C674X_0] DSPF_sp_fftSPxSP Iter#: 1 Result Successful N = 8 radix = 2 natC: 7461 optC: 2827
DSPF_sp_fftSPxSP Iter#: 2 Result Successful N = 16 radix = 4 natC: 9860 optC: 3054
DSPF_sp_fftSPxSP Iter#: 3 Result Successful N = 32 radix = 2 natC: 29384 optC: 10440
DSPF_sp_fftSPxSP Iter#: 4 Result Successful N = 64 radix = 4 natC: 56614 optC: 21244
DSPF_sp_fftSPxSP Iter#: 5 Result Successful N = 128 radix = 2 natC: 155644 optC: 59170
DSPF_sp_fftSPxSP Iter#: 6 Result Successful N = 256 radix = 4 natC: 303454 optC: 115866
DSPF_sp_fftSPxSP Iter#: 7 Result Successful N = 512 radix = 2 natC: 777104 optC: 291026
DSPF_sp_fftSPxSP Iter#: 8 Result Successful N = 1024 radix = 4 natC: 1532580 optC: 579496
DSPF_sp_fftSPxSP Iter#: 9 Result Successful N = 2048 radix = 2 natC: 3747348 optC: 1404954
Cycles: 59170 (N=128) 115866 (N=256)
However, these cycle counts are much larger than the benchmark (more than 10 times). The thread linked below asks the same question.
http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/115/t/215681.aspx
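To put the numbers above in terms of my actual workload, here is a small sketch in plain C that adds up two 2048-point and four 256-point FFTs from the measured optC values and converts the total to time at 300 MHz. The function names are my own (they are not part of DSPLIB), and the estimate ignores everything except the FFT calls themselves.

```c
#include <stdio.h>

/* Total cycles for a batch of FFTs: n_big transforms costing cyc_big
   cycles each, plus n_small transforms costing cyc_small cycles each.
   Helper name is hypothetical, not a DSPLIB function. */
static unsigned long fft_batch_cycles(unsigned n_big, unsigned long cyc_big,
                                      unsigned n_small, unsigned long cyc_small)
{
    return n_big * cyc_big + n_small * cyc_small;
}

/* Convert a cycle count to microseconds for a CPU clock given in MHz
   (one cycle at f MHz lasts 1/f microseconds). */
static double cycles_to_us(unsigned long cycles, double cpu_mhz)
{
    return (double)cycles / cpu_mhz;
}

/* Print the budget for one measurement run, using the optC values
   for N = 2048 and N = 256 taken from that run's log. */
static void print_fft_budget(const char *label,
                             unsigned long optc_2048, unsigned long optc_256)
{
    unsigned long total = fft_batch_cycles(2, optc_2048, 4, optc_256);
    printf("%s: %lu cycles, %.1f us at 300 MHz\n",
           label, total, cycles_to_us(total, 300.0));
}
```

For the run with the default C6748.cmd, print_fft_budget("default cmd", 99953, 11451) works out to 245710 cycles, about 0.82 ms. For the run with .far in DDR2, print_fft_budget(".far in DDR2", 1404954, 115866) works out to 3273372 cycles, about 10.9 ms.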
-----------------------------
Second question:
If I want to store data in DDR2, what do I need to do? (I know I can copy it from DDR2 to L1/L2 when I want to run the FFT. I am only asking how to configure the memory or modify C6748.cmd, and how to declare the variables in code.)
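To make the question concrete, this is roughly what I have in mind. It is only a sketch under assumptions: the section name .ddr_data, the buffer names, and the helper stage_in are made up by me, and I am assuming the TI compiler's DATA_SECTION and DATA_ALIGN pragmas together with a linker command file entry are the right mechanism. The pragmas are guarded so the file also builds with a non-TI compiler.

```c
#include <string.h>

#define FFT_N   2048   /* largest FFT size I need (complex points)   */
#define BULK_N  5120   /* about 5k complex samples kept in DDR2      */

/* Assumed matching entry in C6748.cmd (".ddr_data" is a made-up name,
   DDR2 is the memory range the .cmd file already defines):

       SECTIONS
       {
           .ddr_data > DDR2
       }
*/
#ifdef __TI_COMPILER_VERSION__
#pragma DATA_SECTION(bulk_data, ".ddr_data")  /* place bulk buffer in DDR2 */
#pragma DATA_ALIGN(bulk_data, 8)
#pragma DATA_ALIGN(work_buf, 8)
#endif

float bulk_data[2 * BULK_N];  /* interleaved re/im, intended for DDR2        */
float work_buf[2 * FFT_N];    /* FFT input; goes wherever the .cmd file puts
                                 ordinary data, ideally internal RAM         */

/* Copy n_complex samples, starting at complex index `first`, from the
   bulk buffer into the work buffer so the FFT can run on fast memory.
   Returns 0 on success, -1 if the request does not fit. */
int stage_in(size_t first, size_t n_complex)
{
    if (n_complex > FFT_N || first > BULK_N || n_complex > BULK_N - first) {
        return -1;
    }
    memcpy(work_buf, &bulk_data[2 * first], 2 * n_complex * sizeof(float));
    return 0;
}
```

The idea would be to call stage_in(first, 256) or stage_in(first, 2048) and then run the FFT on work_buf. I assume this only helps if the section holding work_buf stays in internal memory, so moving all of .far to DDR2 as I did above would defeat it.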