Channel: Processors forum - Recent Threads

Linux/PROCESSOR-SDK-AM57X: OpenCL+DSP accelerate


Part Number:PROCESSOR-SDK-AM57X

Tool/software: Linux

Hi TI,

I am working on OpenCV + DSP acceleration on an AM5718 board; the SDK version is ti-processor-sdk-linux-rt-am57xx-evm-04.01.00.06.

I found that converting a UMat back to a Mat takes too much time.

Here is my test code and result:

code:

#include <stdio.h>              // printf
#include <string.h>             // memcpy
#include <time.h>               // clock_gettime
#include <opencv2/opencv.hpp>   // Mat, UMat, cvtColor, imshow, waitKey

using namespace cv;

// cameraOpenDevice(), DqBuffer(), QBuffer(), cameraCloseDevice() and
// tdiff_calc() are helpers from my V4L2 capture code (not shown here).

int main(int argc, char **argv)
{
    unsigned char *yuv_raw_data = NULL;
    int c = -1;
    struct timespec tp0, tp1, tp2, tp3, tp4, tp5;

    cameraOpenDevice(1);

    Mat mat_yuv(CAMERA_HEIGHT, CAMERA_WIDTH, CV_8UC2);
    Mat mat_rgb(CAMERA_HEIGHT, CAMERA_WIDTH, CV_8UC3);

    UMat umat_yuv(CAMERA_HEIGHT, CAMERA_WIDTH, CV_8UC2);
    UMat umat_rgb(CAMERA_HEIGHT, CAMERA_WIDTH, CV_8UC3);

    while (1) {
        clock_gettime(CLOCK_MONOTONIC, &tp0);
        DqBuffer(&yuv_raw_data);            // YUV raw data from the V4L2 camera
        QBuffer();
        memcpy(mat_yuv.data, yuv_raw_data, CAMERA_WIDTH * CAMERA_HEIGHT * 2);

        clock_gettime(CLOCK_MONOTONIC, &tp1);
        mat_yuv.copyTo(umat_yuv);           // upload: Mat -> UMat

        clock_gettime(CLOCK_MONOTONIC, &tp2);
        cvtColor(umat_yuv, umat_rgb, CV_YUV2BGR_YUYV);  // dispatched to OpenCL when enabled

        clock_gettime(CLOCK_MONOTONIC, &tp3);
        umat_rgb.copyTo(mat_rgb);           // download: UMat -> Mat

        clock_gettime(CLOCK_MONOTONIC, &tp4);
        imshow("frame", mat_rgb);
        clock_gettime(CLOCK_MONOTONIC, &tp5);

        printf("get yuv Mat tdiff=%lf ms \n", tdiff_calc(tp0, tp1));
        printf("Mat2UMat tdiff=%lf ms \n", tdiff_calc(tp1, tp2));
        printf("cvtColor_YUV2BGR tdiff=%lf ms \n", tdiff_calc(tp2, tp3));
        printf("UMat2Mat tdiff=%lf ms \n", tdiff_calc(tp3, tp4));
        printf("imshow tdiff=%lf ms \n", tdiff_calc(tp4, tp5));

        c = waitKey(1);
        if (c == 27 || c == 'q' || c == 'Q')
            break;
    }

    cameraCloseDevice();
    return 0;
}

Then I compile it into an executable binary: dsp_accelerate_opencv_opencl

Result:

root@ok5718-idk:/home/forlinx/qt# on_dsp_accelerate.sh

get yuv Mat tdiff=5.017779 ms
Mat2UMat tdiff=0.672464 ms
cvtColor_YUV2BGR tdiff=0.165757 ms
UMat2Mat tdiff=151.863810 ms
imshow tdiff=1.787381 ms

root@ok5718-idk:/home/forlinx/qt# off_dsp_accelerate.sh

get yuv Mat tdiff=0.509635 ms
Mat2UMat tdiff=0.479541 ms
cvtColor_YUV2BGR tdiff=11.217494 ms
UMat2Mat tdiff=5.070482 ms
imshow tdiff=2.764032 ms

root@ok5718-idk:/home/forlinx/qt# cat on_dsp_accelerate.sh
export TI_OCL_CACHE_KERNELS=Y
export OPENCV_OPENCL_DEVICE='TI AM57:ACCELERATOR:TI Multicore C66 DSP'
echo "OpenCL on"
./dsp_accelerate_opencv_opencl

root@ok5718-idk:/home/forlinx/qt# cat off_dsp_accelerate.sh
export OPENCV_OPENCL_DEVICE='disabled'
echo "OpenCL off"
./dsp_accelerate_opencv_opencl

The results show that the UMat-to-Mat conversion takes too much time when DSP acceleration is enabled.

I found this in the TI docs, http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/Foundational_Components_OpenCV.html#alternative-approach-to-add-new-opencl-kernels-at-opencv-application-level :

It says the UMat copy time includes the waiting time (for the DSP to finish) plus the actual write and any format conversion (done on the CPU). It also depends a lot on the data types used and on whether floating-point operations are involved. This can be accelerated if a DSP-optimized implementation of remap() is created.

So, my question is: how do I optimize remap() with DSP acceleration?

I read the chapters from "Creating OpenCL C kernel optimized for C66 core" through "Alternative approach to add new OpenCL kernels at OpenCV application level", but I could not find a way to do it.
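Based on those chapters, my understanding so far is that the application-level approach means writing my own OpenCL C kernel, compiling it at runtime from the host (the docs mention doing this at the OpenCV application level, e.g. through cv::ocl::ProgramSource and cv::ocl::Kernel), and dispatching it on the DSP. This is how far I got piecing together a kernel for my cvtColor case; everything below (kernel name, data layout, fixed-point coefficients) is my own guess, not taken from the SDK, and it ignores row padding by treating the image as one flat array of YUYV macropixels:

```c
/* Guessed OpenCL C kernel: converts one YUYV macropixel (2 pixels)
 * to two BGR pixels per work-item, using 8.8 fixed-point BT.601. */
__kernel void yuyv2bgr(__global const uchar4 *src, __global uchar *dst)
{
    int gid = get_global_id(0);
    uchar4 p = src[gid];                 /* p = (Y0, U, Y1, V) */
    int u = p.s1 - 128;
    int v = p.s3 - 128;
    int rd = (359 * v) >> 8;             /* 1.402 * 256 ~= 359 */
    int gd = (88 * u + 183 * v) >> 8;    /* 0.344 * 256, 0.714 * 256 */
    int bd = (454 * u) >> 8;             /* 1.772 * 256 ~= 454 */
    int y0 = p.s0;
    int y1 = p.s2;

    __global uchar *out = dst + gid * 6; /* two BGR pixels per macropixel */
    out[0] = (uchar)clamp(y0 + bd, 0, 255);
    out[1] = (uchar)clamp(y0 - gd, 0, 255);
    out[2] = (uchar)clamp(y0 + rd, 0, 255);
    out[3] = (uchar)clamp(y1 + bd, 0, 255);
    out[4] = (uchar)clamp(y1 - gd, 0, 255);
    out[5] = (uchar)clamp(y1 + rd, 0, 255);
}
```

What I still do not see is how to hook a kernel like this into OpenCV's UMat pipeline so that the result stays on the device, and whether that would also fix the slow UMat-to-Mat copy.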

Please help me out. Thank you.

