Hello SNPE team,
I am trying snpe-1.23.1.245 on an embedded Linux OS based on the QCS605 SoC.
I tried the native example:
snpe-1.23.1.245/examples/NativeCpp/SampleCode
with the aarch64-oe-linux-gcc6.4 toolchain:
make -f Makefile.aarch64-oe-linux-gcc6.4
However, when I run the sample binary on the target, it cannot use the GPU:
# snpe-sample -b ITENSOR -d ../dlc/bvlc_alexnet.dlc -i target_raw_list.txt -o output_sample -r gpu
SNPE Version: 1.23.1.245
Selected runtime not present. Falling back to CPU.
Batch size for the container is 1
Processing DNN Input: cropped/plastic_cup.raw
Processing DNN Input: cropped/trash_bin.raw
Processing DNN Input: cropped/notice_sign.raw
Processing DNN Input: cropped/handicap_sign.raw
Processing DNN Input: cropped/chairs.raw
It looks like the SNPE GPU runtime is based on OpenCL, and I verified that OpenCL is available on this OS by porting clinfo:
# /data/bin/clinfo -a
Platform #0
Name: Snapdragon(TM)
Vendor: QUALCOMM
Version: OpenCL 2.0 QUALCOMM build: commit #c3dd282 changeid # Date: 01/24/19 Thu Local Branch: Remote Branch:
Profile: FULL_PROFILE
Extensions: (null)
Device #0
Name: QUALCOMM Adreno(TM)
Type: GPU
Vendor: QUALCOMM
Vendor ID: 3209509963
Profile: FULL_PROFILE
Available: Yes
Version: OpenCL 2.0 Adreno(TM) 615
Driver version: OpenCL 2.0 QUALCOMM build: commit #c3dd282 changeid # Date: 01/24/19 Thu Local Branch: Remote Branch: Compiler E031.36.04.01
Compiler available: Yes
Address space size: 64
Little endian: Yes
Error correction support: No
Address alignment (bits): 1024
Smallest alignment (bytes): 128
Resolution of timer (ns): 1000
Max clock frequency (MHz): 1
Max compute units: 1
Max constant args: 8
Max constant buffer size: 64 kB
Max mem alloc size: 256 MB
Max parameter size: 1024
Command-queue supported props: Out of order execution
Profiling
Execution capabilities: OpenCL kernels
Global memory size: 1 GB
Global memory cache size: 64 kB
Global memory line cache size: 64
Local memory size: 32 kB
Local memory type: Local
Global memory cache type: Read write
Max work group size: 1024
Max work item dimensions: 3
Max work item sizes: (1024, 1024, 1024)
Image support: Yes
Max 2D image height: 16384
Max 2D image width: 16384
Max 3D image depth: 2048
Max 3D image height: 16384
Max 3D image width: 16384
Max read image args: 128
Max write image args: 64
Max samplers: 16
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 0
Half precision float capability: Inf and NaNs
Round to nearest even rounding mode
Round to +ve and -ve infinity rounding modes
Single precision float capability: Inf and NaNs
Round to nearest even rounding mode
Round to +ve and -ve infinity rounding modes
Double precision float capability: Not supported
Extensions: cl_khr_3d_image_writes
cl_img_egl_image
cl_khr_byte_addressable_store
cl_khr_depth_images
cl_khr_egl_event
cl_khr_egl_image
cl_khr_fp16
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_image2d_from_buffer
cl_khr_mipmap_image
cl_khr_srgb_image_writes
cl_khr_subgroups
cl_qcom_create_buffer_from_image
cl_qcom_ext_host_ptr
cl_qcom_ion_host_ptr
cl_qcom_perf_hint
cl_qcom_other_image
cl_qcom_subgroup_shuffle
cl_qcom_vector_image_ops
cl_qcom_extract_image_plane
cl_qcom_protected_context
cl_qcom_priority_hint
cl_qcom_compressed_yuv_image_read
cl_qcom_compressed_image
cl_qcom_ext_host_ptr_iocoherent
cl_qcom_accelerated_image_ops
Could you please guide me on how to enable the GPU runtime?
Thanks and best regards,
CDQ
From logcat, I see these errors:
03-08 07:43:01.565 3266 3266 E Adreno-CB: <clGetPlatformIDs:2399>: Fatal: Failed to open libCB from libOpenCL
03-08 07:43:03.469 3266 3266 E Adreno-CB: <clGetPlatformIDs:2399>: Fatal: Failed to open libCB from libOpenCL
But libCB.so is present at /usr/lib64/libCB.so.
It looks like there is a problem with the dynamic loader. I found a workaround: initialize the CL runtime first, before using SNPE.
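
For anyone hitting the same error, here is a minimal sketch of one way to "initialize the CL runtime first". It is a guess at the shape of the workaround, not the exact code I used and not an official SNPE recommendation. The helper name warmUpOpenCL is hypothetical, and the default library name "libOpenCL.so" is an assumption; adjust it to wherever the library lives on your image. The idea is to load the OpenCL library with global symbol visibility and call clGetPlatformIDs once, before any SNPE call:

```cpp
#include <dlfcn.h>
#include <cstdint>
#include <cstdio>

// Signature of clGetPlatformIDs from the OpenCL spec, written with
// fixed-width types (cl_int = int32_t, cl_uint = uint32_t, cl_platform_id
// is an opaque pointer) so the OpenCL headers are not needed for this sketch.
using ClGetPlatformIDsFn = int32_t (*)(uint32_t, void**, uint32_t*);

// Hypothetical helper: loads the OpenCL library with global symbol
// visibility and queries the platform count once.
// Returns true if at least one OpenCL platform was reported.
bool warmUpOpenCL(const char* libName = "libOpenCL.so") {
    // RTLD_GLOBAL makes the library's symbols visible to libraries
    // that are loaded later in the same process.
    void* handle = dlopen(libName, RTLD_NOW | RTLD_GLOBAL);
    if (!handle) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlopen(%s) failed: %s\n", libName,
                     err ? err : "unknown error");
        return false;
    }

    auto getPlatformIDs = reinterpret_cast<ClGetPlatformIDsFn>(
        dlsym(handle, "clGetPlatformIDs"));
    if (!getPlatformIDs) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlsym(clGetPlatformIDs) failed: %s\n",
                     err ? err : "unknown error");
        return false;
    }

    // Ask only for the number of platforms; 0 is CL_SUCCESS.
    // The handle is intentionally never closed so the library stays loaded.
    uint32_t numPlatforms = 0;
    int32_t status = getPlatformIDs(0, nullptr, &numPlatforms);
    return status == 0 && numPlatforms > 0;
}
```

In the sample, this would be called once at the top of main(), before any SNPE object is created. If it returns false, the messages on stderr should show whether the library or the symbol could not be found. Depending on the toolchain, you may need to link with -ldl.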