I'm trying to make ORB-SLAM2 run on Android with OpenCL acceleration.
But some OpenCL code does not work as expected: 'v0 = img[tofs];' reads '0', while 'v1 = img[-tofs];' reads the correct value.
Here is part of the OpenCL program fast.cl (from OpenCV 3, with some modifications at the bottom):
#define UPDATE_MASK(idx, ofs) \
    tofs = ofs; v0 = img[tofs]; v1 = img[-tofs]; \
    if (i==40 && j==19) printf("UPDATE MASK:(%d,%d), after, v0:%d, v1:%d, tofs:%d, img:%p\n", idx, ofs, v0, v1, tofs, img); \
    m0 |= ((v0 < t0) << idx) | ((v1 < t0) << (8 + idx)); \
    m1 |= ((v0 > t1) << idx) | ((v1 > t1) << (8 + idx))

UPDATE_MASK(0, 3);
if( (m0 | m1) == 0 ) {
    return;
}
UPDATE_MASK(2, -step*2+2);
UPDATE_MASK(4, -step*3);
UPDATE_MASK(6, -step*2-2);
.....
{
#if 1
    int s = cornerScore(img, step);
    printf("s:%d\n", s);
#endif
}
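To check the kernel's printf output against known-good values, the same reads can be mirrored on the host. This is a minimal sketch, not OpenCV code: 'FastMask', 'updateMaskRef' and 'firstFourOffsets' are hypothetical names. It assumes the kernel derives the thresholds as t0 = v - threshold and t1 = v + threshold from the center pixel v. Unlike the kernel, it does not return early when the first mask update is zero.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Bit masks built the same way as m0/m1 in the UPDATE_MASK macro.
struct FastMask {
    int m0 = 0;  // bit set: circle pixel darker than t0
    int m1 = 0;  // bit set: circle pixel brighter than t1
};

// Host-side mirror of UPDATE_MASK. 'center' points at the pixel under test in
// an 8-bit single-channel image; 'ofs' is a signed offset in pixels.
void updateMaskRef(const std::uint8_t* center, int idx, int ofs,
                   int t0, int t1, FastMask& mask) {
    int v0 = center[ofs];   // one circle pixel
    int v1 = center[-ofs];  // the diametrically opposite circle pixel
    std::printf("UPDATE MASK:(%d,%d), v0:%d, v1:%d\n", idx, ofs, v0, v1);
    mask.m0 |= ((v0 < t0) << idx) | ((v1 < t0) << (8 + idx));
    mask.m1 |= ((v0 > t1) << idx) | ((v1 > t1) << (8 + idx));
}

// Applies the four offsets shown in the kernel fragment (idx 0, 2, 4, 6).
// Assumption: t0/t1 are computed from the center pixel as in OpenCV's fast.cl.
FastMask firstFourOffsets(const std::uint8_t* center, int step, int threshold) {
    int v = center[0];
    int t0 = v - threshold;
    int t1 = v + threshold;
    FastMask mask;
    updateMaskRef(center, 0, 3, t0, t1, mask);
    updateMaskRef(center, 2, -step * 2 + 2, t0, t1, mask);
    updateMaskRef(center, 4, -step * 3, t0, t1, mask);
    updateMaskRef(center, 6, -step * 2 - 2, t0, t1, mask);
    return mask;
}
```

Feeding it the same image bytes the kernel sees, with 'center' pointing at the pixel the kernel's printf is filtered on, gives v0/v1 lines that can be compared one by one with the kernel output.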
The problem: if the '#if ... #endif' block is enabled, the macro 'UPDATE_MASK' reads wrong image data, returning 0 for one of the two pixels:
UPDATE MASK:(0,3), after, v0:56, v1:0, tofs:3, img:00000007ff8ae9c3
UPDATE MASK:(2,-122), after, v0:0, v1:46, tofs:-122, img:00000007ff8ae9c3
UPDATE MASK:(4,-186), after, v0:0, v1:44, tofs:-186, img:00000007ff8ae9c3
UPDATE MASK:(6,-126), after, v0:0, v1:42, tofs:-126, img:00000007ff8ae9c3
If I remove the call to 'cornerScore', it works fine.
On my Ubuntu 17.10 machine (i7-4790 with an Intel HD 4600 GPU), the same code works fine with or without the call to 'cornerScore'.
Could anyone give me some advice?
Here is the OpenCL information reported by OpenCV 3:
Platform : msm8996_64, adreno 530, Android 7.1
haveOpenCL=1, useOpenCL=1
Total 1 platforms
Platform: Snapdragon(TM)
Vendor: QUALCOMM
Version: OpenCL 2.0 QUALCOMM build: commit #cce9f40 changeid #Ifcb662fdfd Date: 07/26/17 Wed Local Branch: Remote Branch:
Devices: 1
Device[0]: QUALCOMM Adreno(TM)
Vendor: QUALCOMM
Version: OpenCL 2.0 Adreno(TM) 530
OpenCLVersion: cl_khr_3d_image_writes cl_img_egl_image cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_egl_event cl_khr_egl_image cl_khr_fp16 cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_image2d_from_buffer cl_khr_mipmap_image cl_khr_srgb_image_writes cl_khr_subgroups cl_qcom_create_buffer_from_image cl_qcom_ext_host_ptr cl_qcom_ion_host_ptr cl_qcom_perf_hint cl_qcom_read_image_2x2 cl_qcom_android_native_buffer_host_ptr cl_qcom_compressed_yuv_image_read cl_qcom_compressed_image
DriverVersion: OpenCL 2.0 QUALCOMM build: commit #cce9f40 changeid #Ifcb662fdfd Date: 07/26/17 Wed Local Branch: Remote Branch: Compiler E031.31.00.03
Extensions: cl_khr_3d_image_writes cl_img_egl_image cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_egl_event cl_khr_egl_image cl_khr_fp16 cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_image2d_from_buffer cl_khr_mipmap_image cl_khr_srgb_image_writes cl_khr_subgroups cl_qcom_create_buffer_from_image cl_qcom_ext_host_ptr cl_qcom_ion_host_ptr cl_qcom_perf_hint cl_qcom_read_image_2x2 cl_qcom_android_native_buffer_host_ptr cl_qcom_compressed_yuv_image_read cl_qcom_compressed_image
Type: 4
MaxWorkGroupSize: 1024
MaxWorkItemDims: 3
MaxComputeUnits: 4
Available: 1
The original fast.cl can be found here:
https://github.com/opencv/opencv/blob/master/modules/features2d/src/open...
Thanks a lot!
Zhu
A new test confirmed that some OpenCV functions do not work correctly on the Adreno 530, at least one of them being 'cv::FAST', whose OpenCL path uses the 'cornerScore' function from fast.cl (linked in my previous post).
GPU performance is also worse than CPU performance.
Here are the test results (the test images are grayscale PNG pictures):
FAST (1241 x 376), keypoints:4607, time:0.010624 cpu
FAST (1241 x 376), keypoints:1225, time:0.016855 gpu
FAST (1034 x 313), keypoints:3285, time:0.006814 cpu
FAST (1034 x 313), keypoints:1059, time:0.018633 gpu
FAST ( 862 x 261), keypoints:2377, time:0.004954 cpu
FAST ( 862 x 261), keypoints:1001, time:0.018343 gpu
FAST ( 718 x 218), keypoints:1723, time:0.003611 cpu
FAST ( 718 x 218), keypoints: 766, time:0.022926 gpu
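Working through the numbers above: the GPU path is roughly 1.6x to 6.3x slower than the CPU path, and it reports only about 27% to 44% as many keypoints. Since both paths are given the same threshold and non-max suppression setting, the counts would normally be expected to be close, so the GPU output looks wrong and not merely slow. The sketch below only recomputes those ratios from the printed values; 'FastRun', 'gpuSlowdown' and 'gpuKeypointFraction' are illustrative names.

```cpp
#include <cassert>
#include <cstdio>

// Values copied from the test output above; times are in seconds, as printed
// from std::chrono::duration<double>.
struct FastRun {
    int width, height;
    int cpuKeypoints; double cpuSeconds;
    int gpuKeypoints; double gpuSeconds;
};

const FastRun kRuns[] = {
    {1241, 376, 4607, 0.010624, 1225, 0.016855},
    {1034, 313, 3285, 0.006814, 1059, 0.018633},
    { 862, 261, 2377, 0.004954, 1001, 0.018343},
    { 718, 218, 1723, 0.003611,  766, 0.022926},
};

// How many times slower the GPU path was than the CPU path.
double gpuSlowdown(const FastRun& r) { return r.gpuSeconds / r.cpuSeconds; }

// Fraction of the CPU keypoint count that the GPU path reported.
double gpuKeypointFraction(const FastRun& r) {
    return static_cast<double>(r.gpuKeypoints) / r.cpuKeypoints;
}

void printSummary() {
    for (const FastRun& r : kRuns) {
        std::printf("%4d x %3d: GPU %.2fx slower, %.0f%% of CPU keypoints\n",
                    r.width, r.height, gpuSlowdown(r),
                    100.0 * gpuKeypointFraction(r));
    }
}
```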
Here is my test program (using OpenCV 3 from the Android 7.1 AOSP source tree 'external/opencv3', slightly modified to enable OpenCL support on Android):
#include <chrono>
#include <cstdio>
#include <iostream>
#include <vector>
#include <opencv2/opencv.hpp>
#include <opencv2/core/ocl.hpp>

// Run FAST on the CPU (cv::Mat) or through OpenCL (cv::UMat) and print the timing.
void cvFast(const cv::Mat& image, bool useGPU)
{
    std::vector<cv::KeyPoint> keypoints;
    auto start = std::chrono::steady_clock::now();
    if (useGPU) {
        // Upload the image first so that only cv::FAST itself is timed.
        cv::UMat uimage;
        image.copyTo(uimage);
        start = std::chrono::steady_clock::now();
        cv::FAST(uimage, keypoints, 20, true);
    } else {
        cv::FAST(image, keypoints, 20, true);
    }
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> diff1 = end - start;  // seconds
    printf("FAST (%4d x %4d), keypoints:%4zu, time:%6f %s\n",
           image.cols, image.rows,
           keypoints.size(),
           diff1.count(),
           useGPU ? "gpu" : "cpu");
}

int main()
{
    cv::ocl::setUseOpenCL(true);
    if (!cv::ocl::useOpenCL()) {
        std::cerr << "Enable OpenCL failed!\n";
        return -1;
    }
    cv::Mat image = cv::imread("0.png", cv::IMREAD_UNCHANGED);
    if (image.empty()) {
        std::cerr << "Open image 0.png failed" << std::endl;
        return -1;
    }
    cvFast(image, false);
    cvFast(image, true);
    return 0;
}
Again, the same program runs fine on my host PC, which is confusing.
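In the program above, each path is timed with a single call. The first use of an OpenCL kernel in a process can include one-time work such as compiling the OpenCL program, so a single GPU measurement may overstate the steady-state cost. A minimal sketch, assuming that is part of the gap: 'timeMedianSeconds' is a hypothetical helper that does one untimed warm-up run and then returns the median of several timed runs, using only the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper: run 'work' once untimed (warm-up), then time it 'runs'
// times and return the median duration in seconds (upper middle for even counts).
double timeMedianSeconds(const std::function<void()>& work, int runs) {
    assert(runs > 0);
    work();  // warm-up: absorbs one-time setup such as lazy kernel compilation
    std::vector<double> samples;
    samples.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto end = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(end - start).count());
    }
    // The median is less sensitive than the mean to a single slow outlier.
    std::nth_element(samples.begin(), samples.begin() + runs / 2, samples.end());
    return samples[runs / 2];
}
```

For example, 'timeMedianSeconds([&] { cv::FAST(uimage, keypoints, 20, true); }, 10)' could replace the single timed call in cvFast. Because cv::FAST returns its keypoints in a std::vector, the call presumably cannot return before the GPU work it depends on has finished.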
Response from Qualcomm:
"The root reason for this issue is negative offset issue on QUALCOMM's llvm compiler ."
Unfortunately, the Adreno 530 (Snapdragon 820 platform) we are using is a little old, and the Qualcomm engineer said "the resolution couldn't be merged to it".
So in the end we gave up on OpenCL and tried other approaches.
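If the bug really is limited to negative offsets, as the Qualcomm reply says, one workaround worth trying (untested on the Adreno 530) is to make every index non-negative: move the base pointer back by a bias at least as large as any offset on the FAST circle (3 rows plus 3 columns is a safe bound) and add that same bias to every offset. In the kernel this would mean something like 'base = img - (3*step + 3);' followed by 'v0 = base[bias + tofs]; v1 = base[bias - tofs];'. The C++ sketch only checks the index arithmetic on the host; 'fastBias', 'readRebased' and 'readPairRebased' are hypothetical names, and the device compiler could still fold the subtraction back into a negative offset.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on |offset| for the radius-3 FAST circle: 3 rows plus 3 columns.
inline int fastBias(int step) { return 3 * step + 3; }

// Hypothetical rebased read: 'base' points 'bias' pixels before the center, so
// base[bias + ofs] is the same pixel as center[ofs], but the index is never
// negative for offsets within 3 rows and 3 columns of the center.
inline std::uint8_t readRebased(const std::uint8_t* base, int bias, int ofs) {
    int idx = bias + ofs;
    assert(idx >= 0);
    return base[idx];
}

// Equivalent of 'v0 = img[tofs]; v1 = img[-tofs];' from the UPDATE_MASK macro.
inline void readPairRebased(const std::uint8_t* base, int bias, int tofs,
                            int& v0, int& v1) {
    v0 = readRebased(base, bias, tofs);
    v1 = readRebased(base, bias, -tofs);
}
```

The center pixel must lie at least 3 rows and 3 columns inside the image so that 'base' still points into the buffer.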
I've encountered a similar problem, and I finally solved it by passing the build option "-cl-opt-disable" to clBuildProgram, which disables compiler optimizations for the kernel. I believe the compiler's optimizer is to blame.
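A minimal sketch of applying this workaround only on the affected GPU, so that other devices keep full optimization. 'buildOptionsFor' is a hypothetical helper, and matching on "Adreno" in the CL_DEVICE_NAME string (reported as "QUALCOMM Adreno(TM)" in the dump above) is an assumption. The returned string would be passed as the 'options' argument of clBuildProgram, for example 'clBuildProgram(program, 1, &device, opts.c_str(), nullptr, nullptr)'. This only helps where you build the program yourself; disabling optimization will also likely make the kernel slower.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: append -cl-opt-disable (a standard OpenCL build option)
// to the kernel build options when the device name looks like an Adreno GPU.
std::string buildOptionsFor(const std::string& deviceName,
                            const std::string& baseOptions) {
    std::string opts = baseOptions;
    if (deviceName.find("Adreno") != std::string::npos) {
        if (!opts.empty()) opts += ' ';
        opts += "-cl-opt-disable";
    }
    return opts;
}
```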