Part Number:PROCESSOR-SDK-AM57X
Tool/software: Linux
Hi TI,
I am doing OpenCV + DSP acceleration on an AM5718 board; the SDK version is ti-processor-sdk-linux-rt-am57xx-evm-04.01.00.06.
I found that converting a UMat back to a Mat takes too much time.
Here is my test code and the result:
Code:
int main(int argc, char **argv)
{
    unsigned char *yuv_raw_data = NULL;
    int c = -1;
    struct timespec tp0, tp1, tp2, tp3, tp4, tp5;

    cameraOpenDevice(1);

    Mat  mat_yuv(CAMERA_HEIGHT, CAMERA_WIDTH, CV_8UC2);
    Mat  mat_rgb(CAMERA_HEIGHT, CAMERA_WIDTH, CV_8UC3);
    UMat umat_yuv(CAMERA_HEIGHT, CAMERA_WIDTH, CV_8UC2);
    UMat umat_rgb(CAMERA_HEIGHT, CAMERA_WIDTH, CV_8UC3);

    while (1) {
        clock_gettime(CLOCK_MONOTONIC, &tp0);
        DqBuffer(&yuv_raw_data);  /* YUV raw data from the V4L2 camera */
        QBuffer();
        memcpy(mat_yuv.data, yuv_raw_data, CAMERA_WIDTH * CAMERA_HEIGHT * 2);
        clock_gettime(CLOCK_MONOTONIC, &tp1);

        mat_yuv.copyTo(umat_yuv);              /* upload: Mat -> UMat */
        clock_gettime(CLOCK_MONOTONIC, &tp2);

        cvtColor(umat_yuv, umat_rgb, CV_YUV2BGR_YUYV);
        clock_gettime(CLOCK_MONOTONIC, &tp3);

        umat_rgb.copyTo(mat_rgb);              /* download: UMat -> Mat */
        clock_gettime(CLOCK_MONOTONIC, &tp4);

        imshow("frame", mat_rgb);
        clock_gettime(CLOCK_MONOTONIC, &tp5);

        printf("get yuv Mat tdiff=%lf ms \n", tdiff_calc(tp0, tp1));
        printf("Mat2UMat tdiff=%lf ms \n", tdiff_calc(tp1, tp2));
        printf("cvtColor_YUV2BGR tdiff=%lf ms \n", tdiff_calc(tp2, tp3));
        printf("UMat2Mat tdiff=%lf ms \n", tdiff_calc(tp3, tp4));
        printf("imshow tdiff=%lf ms \n", tdiff_calc(tp4, tp5));

        c = waitKey(1);
        if (c == 27 || c == 'q' || c == 'Q')
            break;
    }
    cameraCloseDevice();
    return 0;
}
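For reference, the tdiff_calc() helper is not shown above; it is assumed to be a simple millisecond difference of two CLOCK_MONOTONIC samples, roughly:

```cpp
#include <ctime>

/* Assumed helper: elapsed time between two timespec samples, in milliseconds. */
static double tdiff_calc(struct timespec start, struct timespec end)
{
    return (end.tv_sec  - start.tv_sec)  * 1000.0 +
           (end.tv_nsec - start.tv_nsec) / 1000000.0;
}
```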
Then I compile it into an executable: dsp_accelerate_opencv_opencl.
Result:
root@ok5718-idk:/home/forlinx/qt# on_dsp_accelerate.sh
get yuv Mat tdiff=5.017779 ms
Mat2UMat tdiff=0.672464 ms
cvtColor_YUV2BGR tdiff=0.165757 ms
UMat2Mat tdiff=151.863810 ms
imshow tdiff=1.787381 ms
root@ok5718-idk:/home/forlinx/qt# off_dsp_accelerate.sh
get yuv Mat tdiff=0.509635 ms
Mat2UMat tdiff=0.479541 ms
cvtColor_YUV2BGR tdiff=11.217494 ms
UMat2Mat tdiff=5.070482 ms
imshow tdiff=2.764032 ms
root@ok5718-idk:/home/forlinx/qt# cat on_dsp_accelerate.sh
export TI_OCL_CACHE_KERNELS=Y
export OPENCV_OPENCL_DEVICE='TI AM57:ACCELERATOR:TI Multicore C66 DSP'
echo "OpenCL on"
./dsp_accelerate_opencv_opencl
root@ok5718-idk:/home/forlinx/qt# cat off_dsp_accelerate.sh
export OPENCV_OPENCL_DEVICE='disabled'
echo "OpenCL off"
./dsp_accelerate_opencv_opencl
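To confirm that each script really selects (or disables) the DSP device, a small check with OpenCV's T-API can be called once at program start; this is a sketch using standard cv::ocl functions, not TI-specific code:

```cpp
#include <opencv2/core/ocl.hpp>
#include <cstdio>

/* Print which OpenCL device OpenCV actually picked (if any). */
static void print_ocl_device(void)
{
    if (cv::ocl::haveOpenCL() && cv::ocl::useOpenCL()) {
        cv::ocl::Device dev = cv::ocl::Device::getDefault();
        printf("OpenCL device: %s (%s)\n",
               dev.name().c_str(), dev.version().c_str());
    } else {
        printf("OpenCL is disabled; UMat operations run on the CPU\n");
    }
}
```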
The results show that converting the UMat back to a Mat takes far too long when DSP acceleration is enabled.
I found the TI docs at http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/Foundational_Components_OpenCV.html#alternative-approach-to-add-new-opencl-kernels-at-opencv-application-level :
They say the UMat conversion cost includes the waiting time (for the DSP to finish) plus the actual write and any format conversion (done on the CPU); it also depends a lot on the data types used and whether floating-point operations are involved. This can be accelerated if a DSP-optimized implementation of remap() is created.
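Based on that note about waiting time, the copy itself can be separated from the kernel wait by forcing the queued DSP kernel to complete before stopping the cvtColor timer. A sketch of the adjusted measurement inside the capture loop, assuming the OpenCV 3.x T-API where cv::ocl::finish() blocks until the OpenCL queue drains:

```cpp
#include <opencv2/core/ocl.hpp>

/* Fragment of the loop above, with an explicit synchronization point: */
cvtColor(umat_yuv, umat_rgb, CV_YUV2BGR_YUYV); /* enqueues the DSP kernel (asynchronous) */
cv::ocl::finish();                             /* block until the DSP kernel completes   */
clock_gettime(CLOCK_MONOTONIC, &tp3);          /* tp3 - tp2 is now the real kernel time  */

umat_rgb.copyTo(mat_rgb);                      /* tp4 - tp3 is now only the read-back    */
clock_gettime(CLOCK_MONOTONIC, &tp4);
```

If most of the 151 ms then moves into the cvtColor interval, the bottleneck is the DSP kernel itself rather than the UMat-to-Mat copy.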
So my question is: how do I optimize remap() with DSP acceleration?
I read the chapters from "Creating OpenCL C kernel optimized for C66 core" through "Alternative approach to add new OpenCL kernels at OpenCV application level", but I could not find a way to do it.
Please help me, thank you.