Forums - Adreno 530: OpenCL get wrong image data

karl.zhu
Join Date: 6 Feb 18
Posts: 3
Posted: Fri, 2018-02-23 00:59

I'm trying to run ORB-SLAM2 on Android with OpenCL acceleration.

But some OpenCL code does not work as expected: 'v0 = img[tofs];' reads '0', while 'v1 = img[-tofs];' reads the correct value.

Here is the relevant part of the OpenCL program fast.cl (from OpenCV3, with some modifications at the bottom):

#define UPDATE_MASK(idx, ofs) \
        tofs = ofs; v0 = img[tofs]; v1 = img[-tofs]; \
        if (i==40 && j==19) printf("UPDATE MASK:(%d,%d), after, v0:%d, v1:%d, tofs:%d, img:%p\n", idx, ofs, v0, v1, tofs, img); \
        m0 |= ((v0 < t0) << idx) | ((v1 < t0) << (8 + idx)); \
        m1 |= ((v0 > t1) << idx) | ((v1 > t1) << (8 + idx))

        UPDATE_MASK(0, 3);
        if( (m0 | m1) == 0 ) {
            return;
        }

        UPDATE_MASK(2, -step*2+2);
        UPDATE_MASK(4, -step*3);
        UPDATE_MASK(6, -step*2-2);
         .....

        {
#if 1
            int s = cornerScore(img, step);
            printf("s:%d\n", s);
#endif
        }
 

The problem is: when the '#if ... #endif' block is enabled, the macro 'UPDATE_MASK' reads wrong image data for 'v0':
UPDATE MASK:(0,3), after, v0:56, v1:0, tofs:3, img:00000007ff8ae9c3
UPDATE MASK:(2,-122), after, v0:0, v1:46, tofs:-122, img:00000007ff8ae9c3
UPDATE MASK:(4,-186), after, v0:0, v1:44, tofs:-186, img:00000007ff8ae9c3
UPDATE MASK:(6,-126), after, v0:0, v1:42, tofs:-126, img:00000007ff8ae9c3
 

If I remove the call to 'cornerScore', it works fine.

And on my Ubuntu machine (17.10, i7-4790, Intel HD 4600 GPU), the same code works fine, with or without the call to 'cornerScore'.

So could you give me some advice?

Here is the OpenCL information reported by OpenCV3:

Platform: msm8996_64, Adreno 530, Android 7.1

haveOpenCL=1, useOpenCL=1
Total 1 platforms
Platform: Snapdragon(TM)
Vendor: QUALCOMM
Version: OpenCL 2.0 QUALCOMM build: commit #cce9f40 changeid #Ifcb662fdfd Date: 07/26/17 Wed Local Branch:  Remote Branch:
Devices: 1
    Device[0]: QUALCOMM Adreno(TM)
    Vendor: QUALCOMM
    Version: OpenCL 2.0 Adreno(TM) 530
    OpenCLVersion: cl_khr_3d_image_writes cl_img_egl_image cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_egl_event cl_khr_egl_image cl_khr_fp16 cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_image2d_from_buffer cl_khr_mipmap_image cl_khr_srgb_image_writes cl_khr_subgroups cl_qcom_create_buffer_from_image cl_qcom_ext_host_ptr cl_qcom_ion_host_ptr cl_qcom_perf_hint cl_qcom_read_image_2x2 cl_qcom_android_native_buffer_host_ptr cl_qcom_compressed_yuv_image_read cl_qcom_compressed_image
    DriverVersion: OpenCL 2.0 QUALCOMM build: commit #cce9f40 changeid #Ifcb662fdfd Date: 07/26/17 Wed Local Branch:  Remote Branch:  Compiler E031.31.00.03
    Extensions: cl_khr_3d_image_writes cl_img_egl_image cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_egl_event cl_khr_egl_image cl_khr_fp16 cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_image2d_from_buffer cl_khr_mipmap_image cl_khr_srgb_image_writes cl_khr_subgroups cl_qcom_create_buffer_from_image cl_qcom_ext_host_ptr cl_qcom_ion_host_ptr cl_qcom_perf_hint cl_qcom_read_image_2x2 cl_qcom_android_native_buffer_host_ptr cl_qcom_compressed_yuv_image_read cl_qcom_compressed_image
    Type: 4
    MaxWorkGroupSize: 1024
    MaxWorkItemDims: 3
    MaxComputeUnits: 4
    Available: 1
 

The original fast.cl can be found here:

https://github.com/opencv/opencv/blob/master/modules/features2d/src/open...

Thanks a lot!

Zhu

karl.zhu
Join Date: 6 Feb 18
Posts: 3
Posted: Sat, 2018-02-24 01:35

A new test confirmed that at least one OpenCV function, 'cv::FAST' (which uses the OpenCL function 'cornerScore' from 'fast.cl', linked in the previous post), does not work correctly on the Adreno 530.

And the GPU performance is worse than the CPU's.

Let's look at the test results first (the test images are grayscale PNG pictures):

FAST (1241 x  376), keypoints:4607, time:0.010624 cpu
FAST (1241 x  376), keypoints:1225, time:0.016855 gpu
FAST (1034 x  313), keypoints:3285, time:0.006814 cpu
FAST (1034 x  313), keypoints:1059, time:0.018633 gpu
FAST ( 862 x  261), keypoints:2377, time:0.004954 cpu
FAST ( 862 x  261), keypoints:1001, time:0.018343 gpu
FAST ( 718 x  218), keypoints:1723, time:0.003611 cpu
FAST ( 718 x  218), keypoints: 766, time:0.022926 gpu
 

Here is my test program (using OpenCV3 from the Android 7.1 AOSP source tree 'external/opencv3', with a small modification to enable OpenCL support on Android):

#include <iostream>
#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>
#include <fstream>
#include <opencv2/opencv.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <ctime>
#include <vector>
#include <chrono>

void cvFast(cv::Mat image, bool useGPU)
{
    std::vector<cv::KeyPoint> keypoints;

    auto start = std::chrono::steady_clock::now();
    if (useGPU) {
        cv::UMat uimage;
        image.copyTo(uimage);
        start = std::chrono::steady_clock::now();
        cv::FAST(uimage, keypoints, 20, true);
    } else {
        cv::FAST(image, keypoints, 20, true);
    }

    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> diff1 = end - start;

    printf("FAST (%4d x %4d), keypoints:%4zu, time:%6f %s\n",
            image.cols, image.rows,
            keypoints.size(),
            diff1.count(),
            useGPU ? "gpu" : "cpu");
}

int main(int argc, char *argv[])
{
    (void)argv;
    (void)argc;

    cv::ocl::setUseOpenCL(true);
    if (!cv::ocl::useOpenCL()) {
        std::cerr << "Enable OpenCL failed!\n";
        return -1;
    }
    cv::Mat image = cv::imread("0.png", cv::IMREAD_UNCHANGED);
    if (image.empty()) {
        std::cerr << "Open image 0.png failed" << std::endl;
        return -1;
    }
    cvFast(image, false);
    cvFast(image, true);
    return 0;

}

Again, the same program runs correctly on my host PC. Confusing....
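
One caveat about the timings: each figure comes from a single call, and the first GPU call may include one-time costs such as OpenCL kernel setup (this is an assumption, the thread does not confirm it). A minimal sketch of a fairer measurement, with an untimed warm-up and an average over several runs, could look like this. `meanSeconds` is a hypothetical helper, not an OpenCV function.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: calls `work` `warmup` times without timing it, then
// returns the mean wall-clock time in seconds over `runs` timed calls.
inline double meanSeconds(const std::function<void()>& work, int warmup, int runs)
{
    for (int i = 0; i < warmup; ++i) {
        work();  // absorbs one-time setup cost
    }
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) {
        work();
    }
    const std::chrono::duration<double> total =
        std::chrono::steady_clock::now() - start;
    return total.count() / runs;
}
```

In the test program it could wrap the existing call, for example `meanSeconds([&] { cv::FAST(uimage, keypoints, 20, true); }, 1, 10)`. This only changes how the time is measured; it does nothing about the wrong keypoint counts.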

karl.zhu
Join Date: 6 Feb 18
Posts: 3
Posted: Wed, 2018-03-14 21:48

Response from Qualcomm:

"The root reason for this issue is negative offset issue on QUALCOMM's llvm compiler ."

Unfortunately, the Adreno 530 (820 platform) we are using is a little old, and the Qualcomm engineer said "the resolution couldn't be merged to it".

So in the end we gave up on OpenCL and tried other approaches.

zk.SPiCa
Join Date: 30 Jul 17
Posts: 1
Posted: Sun, 2018-05-06 06:32

I encountered a similar problem, and I finally solved it by passing the build option "-cl-opt-disable" to the clBuildProgram function, which disables compiler optimization of the kernel. I believe the compiler's optimizer is to blame.
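
A minimal sketch of how that option might be passed when calling the OpenCL C API directly. `withBuildOption` is a hypothetical helper for assembling the options string; the `clBuildProgram` call is shown only as a comment because it needs `<CL/cl.h>` and an already created `program` and `device`, both assumed here.

```cpp
#include <string>

// Hypothetical helper: appends an OpenCL build flag to an options string,
// skipping it if the flag text already appears in the string.
inline std::string withBuildOption(std::string options, const std::string& flag)
{
    if (options.find(flag) != std::string::npos) {
        return options;  // flag text already present
    }
    if (!options.empty() && options.back() != ' ') {
        options += ' ';
    }
    options += flag;
    return options;
}

// Usage with the OpenCL C API (program and device are assumed to exist):
//   const std::string opts = withBuildOption("", "-cl-opt-disable");
//   cl_int err = clBuildProgram(program, 1, &device, opts.c_str(),
//                               nullptr, nullptr);
```

Disabling optimization will probably make the kernel slower, so it is worth re-measuring after applying it.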

