Snapdragon Neural Processing Engine SDK
Reference Guide
This section provides information about compiling UDO packages for all supported runtimes in SNPE.
As explained in Overview of UDO, a set of registration and implementation libraries is collectively referred to as a UDO package. The user has complete control over building these libraries for their desired runtimes using compatible toolchains. Alternatively, the SNPE SDK offers tools and utilities to easily create and compile a UDO package. For more information about the tool used to create a UDO package, refer to Creating a UDO package. This section explains UDO package compilation based on the directory structure provided by the package generator.
A UDO must be developed using the set of APIs defined in the header files located at $SNPE_ROOT/share/SnpeUdo/include. Each runtime may impose additional requirements and provide options for customizing the implementation to suit that runtime. Details of the UDO APIs can be found in the C++ API documentation.
This section assumes that the UDO package was generated using the UDO package generator tool described in Creating a UDO package, which produces a partial implementation skeleton based on the UDO specification configured by the user.
The UDO package generator tool creates a makefile to compile the package for a specific combination of runtime and target platform. The makefile provides a simple interface for compiling on platforms that use make natively or that require ndk-build. The provided makefile also supports compiling each library individually for various targets.
The general form of each make target is <runtime>_<platform>. A target of the form <runtime> alone builds that runtime for every platform available to it. For instance, running
make cpu
will compile the CPU implementation library for both the x86 and Android platforms. Additionally, for applicable platforms, the PLATFORM make variable can be used to select a specific platform ABI, similar to APP_ABI in ndk-build. By default, PLATFORM is set to both arm64-v8a and armeabi-v7a. A comprehensive table of available make targets is presented below.
Note: Use of the makefile is optional and not required to generate libraries.
Note: For all following examples, the displayed artifacts are for arm64-v8a target.
A CPU UDO implementation library based on the core UDO APIs is required to run a UDO package on the CPU runtime. The UDO package generator tool creates a skeleton containing blank constructs in the required format, but the core logic for creating and executing the operation must be filled in by the user. This can be done by completing the implementation of the finalize(), execute(), and free() functions in the <OpName>.cpp file generated by the UDO package generator tool.
Note: SNPE provides the tensor data for all inputs and outputs of a UDO as an opaque pointer rather than directly inside the tensorData field defined in SnpeUdo_TensorParam_t. The UDO implementation is expected to obtain handles to the raw tensor pointers using the methods of the CustomOp operation object issued by SNPE at execution time. Refer to SnpeUdo_CpuInfrastructure_t for more details on the data structure.
The CPU runtime operates only on floating point activation tensors. Therefore, CPU UDO implementations should receive and produce only floating point tensors, and the data_type field in the config file should be set to FLOAT_32. All other data types will be ignored. Refer to Defining a UDO for more details.
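The UDO API calls themselves are not reproduced here. The following is a minimal sketch of the kind of arithmetic that could go into execute() for a hypothetical ReLU-style UDO. It assumes the raw float pointers have already been obtained through the CustomOp object. The names reluKernel and relu are illustrative only and are not part of the SNPE API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element-wise kernel for a ReLU-style CPU UDO (not an SNPE API).
// Assumes the raw tensor pointers were already retrieved via the CustomOp
// object passed in at execution time. The CPU runtime only hands FLOAT_32
// activations to a UDO, so both buffers are plain float arrays.
void reluKernel(const float* input, float* output, std::size_t numElements) {
    assert(numElements == 0 || (input != nullptr && output != nullptr));
    for (std::size_t i = 0; i < numElements; ++i) {
        output[i] = input[i] > 0.0f ? input[i] : 0.0f;
    }
}

// Convenience wrapper (hypothetical) for exercising the kernel outside SNPE.
std::vector<float> relu(const std::vector<float>& input) {
    std::vector<float> output(input.size());
    reluKernel(input.data(), output.data(), input.size());
    return output;
}
```

Keeping the arithmetic in a plain function that takes raw pointers makes it easy to unit-test on the host before wiring it into the generated execute() body.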
Compiling and running the UDO package on the host is required by the SNPE model quantization tool, snpe-dlc-quantize. To run a UDO layer that has at least one non-float input on the DSP, the model must first be quantized using snpe-dlc-quantize.
Steps to compile the CPU UDO implementation library on host x86 platform are as below:
Set the environment variable $SNPE_UDO_ROOT.
export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>
Run the make instruction below to compile the UDO package:
make cpu_x86
The expected artifacts after compiling for Host CPU are
Note: The command must be run from the package root.
Steps to compile the CPU UDO implementation library on Android platform are as below:
Set the environment variable $SNPE_UDO_ROOT.
export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>
$NDK_BUILD must be set for the Android NDK build toolchain.
export NDK_BUILD=<absolute_path_to_android_ndk_directory>
Run the make instruction below to compile the UDO package:
make cpu_android
The shared C++ standard library is required at runtime by the libraries produced by the NDK build. Make sure libc++_shared.so is present on the device in a directory listed in LD_LIBRARY_PATH.
The expected artifacts after compiling for Android CPU are
Similar to the CPU runtime, a GPU UDO implementation library based on the core UDO APIs is required to run a UDO package on the GPU runtime. The UDO package generator tool creates a skeleton containing blank constructs in the required format, but the core logic for creating and executing the operation must be filled in by the user. This can be done by completing the implementation of the setKernelInfo() and <OpName>Operation() functions, and by adding the GPU kernel implementations in the <OpName>.cpp file generated by the UDO package generator tool.
SNPE GPU UDO supports 16-bit floating point activations in the network. Users should expect the input/output OpenCL buffer memory passed to a GPU UDO to use 16-bit floating point (OpenCL half) as the storage type. For increased accuracy, users may choose to implement the kernel's internal math using 32-bit floating point, converting from half precision when reading input buffers and back to half precision when writing output buffers.
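Inside an OpenCL kernel, this conversion is typically done with the built-in vload_half and vstore_half functions. For host-side work, such as preparing test inputs or checking the outputs of a GPU UDO, the following is a minimal C++ sketch of the same IEEE 754 binary16 conversion. It uses round-to-nearest-even when narrowing. The helper names are illustrative and are not SNPE APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Bit-cast helpers (memcpy avoids strict-aliasing undefined behaviour).
static std::uint32_t floatBits(float f) { std::uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
static float bitsToFloat(std::uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

// Widen a binary16 value (OpenCL half storage) to float. Always exact.
float halfToFloat(std::uint16_t h) {
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t exp = (h >> 10) & 0x1Fu;
    const std::uint32_t mant = h & 0x3FFu;
    if (exp == 0) {                               // zero or subnormal: mant * 2^-24
        const float v = static_cast<float>(mant) * 5.9604644775390625e-8f;
        return sign ? -v : v;
    }
    if (exp == 0x1F) {                            // infinity or NaN
        return bitsToFloat(sign | 0x7F800000u | (mant << 13));
    }
    return bitsToFloat(sign | ((exp + 112u) << 23) | (mant << 13));  // rebias 15 -> 127
}

// Narrow a float to binary16 with round-to-nearest-even.
std::uint16_t floatToHalf(float f) {
    const std::uint32_t x = floatBits(f);
    const std::uint32_t sign = (x >> 16) & 0x8000u;
    const std::uint32_t absx = x & 0x7FFFFFFFu;
    if (absx >= 0x7F800000u) {                    // infinity or NaN
        return static_cast<std::uint16_t>(sign | (absx > 0x7F800000u ? 0x7E00u : 0x7C00u));
    }
    if (absx >= 0x477FF000u) {                    // >= 65520 rounds past 65504 to infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    }
    if (absx >= 0x38800000u) {                    // normal half range (>= 2^-14)
        const std::uint32_t mant = absx & 0x7FFFFFu;
        std::uint32_t h = (((absx >> 23) - 112u) << 10) | (mant >> 13);
        const std::uint32_t rem = mant & 0x1FFFu;  // the 13 discarded mantissa bits
        if (rem > 0x1000u || (rem == 0x1000u && (h & 1u))) ++h;
        return static_cast<std::uint16_t>(sign | h);
    }
    if (absx <= 0x33000000u) {                    // <= 2^-25 rounds to (signed) zero
        return static_cast<std::uint16_t>(sign);
    }
    // Subnormal half: result = value / 2^-24, rounded to nearest even.
    const std::uint32_t m = (absx & 0x7FFFFFu) | 0x800000u;
    const std::uint32_t shift = 126u - (absx >> 23);   // between 14 and 24
    std::uint32_t h = m >> shift;
    const std::uint32_t rem = m & ((1u << shift) - 1u);
    const std::uint32_t halfway = 1u << (shift - 1u);
    if (rem > halfway || (rem == halfway && (h & 1u))) ++h;
    return static_cast<std::uint16_t>(sign | h);
}
```

For example, a reduction over a half buffer would call halfToFloat on each element, accumulate in float, and call floatToHalf once on the final result. This follows the "compute in 32-bit, store in 16-bit" approach described above.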
SNPE GPU gives users the option to cache the OpenCL program shared by multiple GPU UDO instances of the same type. APIs to store and retrieve the OpenCL program are provided through SnpeUdo_GpuInfrastructure_t. Caching reduces the JIT compilation time of the OpenCL program on subsequent invocations in the network.
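SNPE's actual store/retrieve calls in SnpeUdo_GpuInfrastructure_t are not reproduced here. The sketch below only illustrates the caching idea: compile a program once per key and reuse it for later instances. ProgramHandle stands in for a real cl_program, and ProgramCache and the compile callback are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for a compiled OpenCL program handle (cl_program in real code).
struct ProgramHandle { std::string source; };

// Hypothetical per-op-type cache: the expensive JIT compile runs only on a miss.
class ProgramCache {
public:
    using Compiler = std::function<std::shared_ptr<ProgramHandle>(const std::string&)>;
    explicit ProgramCache(Compiler compile) : compile_(std::move(compile)) {}

    std::shared_ptr<ProgramHandle> getOrBuild(const std::string& key, const std::string& source) {
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;   // hit: reuse, no recompilation
        auto program = compile_(source);             // miss: compile once
        cache_.emplace(key, program);
        return program;
    }

    std::size_t size() const { return cache_.size(); }

private:
    Compiler compile_;
    std::unordered_map<std::string, std::shared_ptr<ProgramHandle>> cache_;
};
```

Keying by op type means every UDO instance of that type shares one compiled program. The compile cost is paid on the first invocation only.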
Note: SNPE provides the tensor data for all inputs and outputs of a UDO as an opaque pointer rather than directly inside tensorData. The UDO implementation is expected to convert it to SnpeUdo_GpuTensorData_t, which holds the OpenCL memory pointer for the tensor. Refer to SnpeUdo_GpuTensorData_t for more details. The per-op factory infrastructure object issued by SNPE when the UDO op factory is created provides users with the OpenCL context and OpenCL command queue. Refer to SnpeUdo_GpuOpFactoryInfrastructure_t for more details on the data structure.
Steps to compile the GPU UDO implementation library on Android platform are as below:
Set the environment variable $SNPE_UDO_ROOT.
export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>
$NDK_BUILD must be set for the Android NDK build toolchain.
export NDK_BUILD=<absolute_path_to_android_ndk_directory>
$CL_LIBRARY_PATH must be set to the location of the libOpenCL.so library.
export CL_LIBRARY_PATH=<absolute_path_to_OpenCL_library>
The OpenCL shared library is not distributed as part of the SNPE SDK.
Run the make instruction below to compile the UDO package:
make gpu_android
Note: The shared OpenCL library is target-specific. It should be discoverable in CL_LIBRARY_PATH.
The expected artifacts after compiling for Android GPU are
SNPE utilizes QNN to run UDO layers on DSP. Therefore, a DSP implementation library based on QNN SDK APIs is required to run a UDO package on the DSP runtime. The UDO package generator tool will create the template file <OpName>.cpp, and the user will need to implement the execution logic in the <OpName>_executeOp() function in the template file.
SNPE UDO supports multi-threading of the operation using worker threads, Hexagon Vector Extensions (HVX) code, and VTCM.
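SNPE's DSP worker-thread interface is not reproduced here. The sketch below shows only the generic step that precedes dispatching work to worker threads: splitting an element-wise operation into contiguous, near-equal index ranges, one per worker. The function partitionWork is hypothetical. How each range is handed to a worker depends on the SNPE DSP infrastructure.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split [0, total) into at most numWorkers contiguous
// [begin, end) ranges whose lengths differ by at most one element.
std::vector<std::pair<std::size_t, std::size_t>> partitionWork(std::size_t total,
                                                               std::size_t numWorkers) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (total == 0 || numWorkers == 0) return ranges;
    if (numWorkers > total) numWorkers = total;       // no empty ranges
    const std::size_t base = total / numWorkers;
    const std::size_t extra = total % numWorkers;     // first `extra` ranges get +1
    std::size_t begin = 0;
    for (std::size_t w = 0; w < numWorkers; ++w) {
        const std::size_t len = base + (w < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + len);
        begin += len;
    }
    return ranges;
}
```

When the per-range loop is vectorized with HVX, it is common to additionally round range boundaries to a multiple of the vector width. This variant keeps plain element granularity for clarity.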
The DSP runtime only propagates unsigned 8-bit activation tensors between network layers. However, it can dequantize data to floating point if required. Therefore, users developing DSP kernels can expect either UINT_8 or FLOAT_32 activation tensors in and out of the operation, and can set the data_type field in the config file to either of these two settings. Refer to Defining a UDO for more details.
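When a DSP UDO is configured for UINT_8 data, the kernel typically converts between quantized and real values around its arithmetic. The following is a minimal sketch that assumes a generic affine scheme, real = scale * (q - zeroPoint). The exact encoding SNPE passes to the UDO (parameter names and offset sign convention) may differ, so treat QuantParams, quantize, and dequantize as hypothetical. The templates also accept uint16_t.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>

// ASSUMPTION: illustrative affine encoding, real = scale * (q - zeroPoint).
// Check the SNPE/QNN documentation for the encoding your runtime actually uses.
struct QuantParams {
    float scale;
    std::int32_t zeroPoint;
};

template <typename QT>
float dequantize(QT q, QuantParams p) {
    static_assert(std::is_unsigned<QT>::value, "expects uint8_t or uint16_t");
    return p.scale * static_cast<float>(static_cast<std::int32_t>(q) - p.zeroPoint);
}

// x is assumed finite; out-of-range values saturate to [0, max of QT].
template <typename QT>
QT quantize(float x, QuantParams p) {
    static_assert(std::is_unsigned<QT>::value, "expects uint8_t or uint16_t");
    assert(p.scale > 0.0f);
    const float qMax = static_cast<float>(std::numeric_limits<QT>::max());
    float q = std::round(x / p.scale) + static_cast<float>(p.zeroPoint);
    q = std::min(std::max(q, 0.0f), qMax);    // clamp to the representable range
    return static_cast<QT>(q);
}
```

Saturating instead of wrapping keeps out-of-range results at the nearest representable value.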
This SNPE release supports building UDO DSP implementation libraries using Hexagon-SDK 3.5.x.
Set the environment variable $SNPE_UDO_ROOT.
export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>
Hexagon-SDK needs to be installed and set up. For details, follow the setup instructions on the $HEXAGON_SDK_ROOT/docs/readme.html page, where HEXAGON_SDK_ROOT is the location of the Hexagon-SDK installation. Make sure $HEXAGON_SDK_ROOT is set to use the Hexagon-SDK build toolchain. Also set $HEXAGON_TOOLS_ROOT, $ANDROID_NDK_ROOT, and SDK_SETUP_ENV:
export HEXAGON_SDK_ROOT=<path to hexagon sdk installation>
export HEXAGON_TOOLS_ROOT=$HEXAGON_SDK_ROOT/tools/HEXAGON_Tools/8.3.07
export ANDROID_NDK_ROOT=<path to Android NDK installation>
export SDK_SETUP_ENV=Done
$NDK_BUILD must be set for the Android NDK build toolchain.
export NDK_BUILD=<absolute_path_to_android_ndk_directory>
The target architecture can also be specified when compiling the package. If no target architecture is supplied, both arm64-v8a and armeabi-v7a are targeted.
export UDO_APP_ABI=<target_architecture>
Run the make instruction below to compile the UDO DSP implementation library:
make dsp PLATFORM=$UDO_APP_ABI
Note: For DSP, PLATFORM will only determine the ABI of the registration library.
The expected artifacts after compiling for DSP are
Note: The command must be run from the package root.
SNPE utilizes QNN to run UDO layers on DSP v68 or later. Therefore, a DSP implementation library based on QNN SDK APIs is required to run a UDO package on the DSP runtime. The UDO package generator tool will create the template file <OpName>ImplLibDsp.cpp, and the user will need to implement the execution logic in the <OpName>Impl() function in the template file.
SNPE UDO supports Hexagon Vector Extensions (HVX) code and cost-based scheduling.
The DSP runtime propagates unsigned 8-bit or unsigned 16-bit activation tensors between network layers. However, it can dequantize data to floating point if required. Therefore, users developing DSP kernels can expect UINT_8, UINT_16, or FLOAT_32 activation tensors in and out of the operation, and can set the data_type field in the config file to any of these three settings. Refer to the QNN SDK for more details.
This SNPE release supports building UDO DSP implementation libraries using Hexagon-SDK 4.x and QNN SDK.
Set the environment variable $SNPE_UDO_ROOT.
export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>
Hexagon-SDK 4.0+ needs to be installed and set up. For Hexagon-SDK details, follow the setup instructions on the $HEXAGON_SDK4_ROOT/docs/readme.html page, where HEXAGON_SDK4_ROOT is the location of the Hexagon-SDK installation. Make sure $HEXAGON_SDK4_ROOT is set to use the Hexagon-SDK build toolchain. Also set $HEXAGON_TOOLS_ROOT and SDK_SETUP_ENV. Additionally, an extracted QNN-SDK is needed to build the libraries; the QNN-SDK does not need to be set up. For QNN-SDK details, refer to the QNN documentation at the $QNN_SDK_ROOT/docs/index.html page, where QNN_SDK_ROOT is the location of the QNN-SDK installation. Set $QNN_SDK_ROOT to the unzipped QNN-SDK location:
export HEXAGON_SDK_ROOT=<path to hexagon sdk installation>
export HEXAGON_SDK4_ROOT=<path to hexagon sdk 4.x installation>
export HEXAGON_TOOLS_ROOT=$HEXAGON_SDK_ROOT/tools/HEXAGON_Tools/8.3.07/
export QNN_SDK_ROOT=<path to QNN sdk installation>
export ANDROID_NDK_ROOT=<path to Android NDK installation>
export SDK_SETUP_ENV=Done
$NDK_BUILD must be set for the Android NDK build toolchain.
export NDK_BUILD=<absolute_path_to_android_ndk_directory>
The target architecture can also be specified when compiling the package. If no target architecture is supplied, both arm64-v8a and armeabi-v7a are targeted.
export UDO_APP_ABI=<target_architecture>
Run the make instruction below to compile the UDO DSP implementation library:
make dsp PLATFORM=$UDO_APP_ABI
Run the make instruction below to generate a library for offline cache generation:
make dsp_x86 X86_CXX=<path_to_x86_64_clang>
Run the make instruction below to generate a library that can be used on the Android ARM architecture:
make dsp_aarch64
The expected artifacts after compiling for DSP are
The expected artifact after compiling for offline cache generation is
The expected artifact after compiling for Android ARM architecture is
Note: The command must be run from the package root.
Make Target | Runtime | Platform | Misc. |
---|---|---|---|
all | CPU, GPU, DSP | x86, Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
all_x86 | CPU | x86 | |
all_android | CPU, GPU, DSP | Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
reg | - | x86, Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
reg_x86 | - | x86 | |
reg_android | - | Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
cpu | CPU | x86, Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
cpu_x86 | CPU | x86 | Same as all_x86 |
cpu_android | CPU | Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
gpu | GPU | Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
gpu_android | GPU | Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | Same as gpu |
dsp | DSP | - | |
dsp_android | DSP | - | Same as dsp |
dsp_x86 | DSP | - | Library for offline cache generation |
dsp_aarch64 | DSP | - | Library for Android ARM architecture |
Note: By default, compiling for a runtime also compiles the corresponding registration library.