Introduction

This section provides information about compiling UDO packages for all supported runtimes in SNPE.

As explained in Overview of UDO, a set of registration and implementation libraries is collectively referred to as a UDO package. The user has complete control over building these libraries for their desired runtimes using compatible toolchains. Alternatively, the SNPE SDK offers tools and utilities to create and compile a UDO easily. For more information about the tool used to create a UDO package, refer to Creating a UDO package. This section explains UDO package compilation based on the directory structure provided by the package generator.

Implementing a User-defined operation

A UDO must be developed using the set of APIs defined in the header files located at $SNPE_ROOT/share/SnpeUdo/include. Each runtime may impose additional requirements and provide options for customizing the implementation to suit the runtime. Details of the UDO APIs can be found in the API documentation at C++ API.
This section assumes that a UDO package was generated using the UDO package generator tool described in Creating a UDO package, which produces a partial implementation skeleton based on the UDO specification configured by the user.

Make Targets for Package Compilation

The UDO package generator tool creates a makefile to compile the package for a specific runtime and target platform combination. The makefile is intended to provide a simple interface to compile for platforms that use make natively or require ndk-build. Using the provided makefile also allows for per library compilation for various targets.

The general form of each make target is <runtime>_<platform>. A target that names only the <runtime> builds that runtime for every platform it supports. For instance, running

make cpu

will compile the CPU libraries for both x86 and Android platforms. Additionally, for applicable platforms, the PLATFORM make variable can be used to select a specific platform ABI, similar to APP_ABI in ndk-build. By default, PLATFORM is set to both arm64-v8a and armeabi-v7a. A comprehensive table of available make targets is presented below.
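The naming convention above can be sketched as a small shell helper. The function name udo_make_cmd is hypothetical and not part of the SNPE SDK; it only prints the command line that would be run from the package root, so it is safe to try without a generated package.

```shell
# Hypothetical helper (not part of SNPE): print the make command for a
# runtime/platform pair. It only echoes the command and never runs make.
udo_make_cmd() {
    runtime="${1:-}"     # cpu, gpu, dsp, reg, or all
    platform="${2:-}"    # x86, android, or empty for every platform of the runtime
    abi="${3:-}"         # optional ABI passed through the PLATFORM make variable

    target="$runtime"
    if [ -n "$platform" ]; then
        target="${runtime}_${platform}"
    fi

    if [ -n "$abi" ]; then
        echo "make $target PLATFORM=$abi"
    else
        echo "make $target"
    fi
}

udo_make_cmd cpu                      # prints: make cpu
udo_make_cmd cpu android arm64-v8a    # prints: make cpu_android PLATFORM=arm64-v8a
```

Restricting PLATFORM to a single ABI in this way avoids building both default ABIs when only one is needed on the device.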

Note: Use of the makefile is optional; the libraries can also be built without it.

Note: For all following examples, the displayed artifacts are for the arm64-v8a target.

Implementing a UDO for CPU

A CPU UDO implementation library based on core UDO APIs is required to run a UDO package on the CPU runtime. The UDO package generator tool will create a skeleton containing blank constructs in the required format, but the core logic for creation and execution of the operation needs to be filled in by the user. This can be done by completing the implementation of the finalize(), execute(), and free() functions in the <OpName>.cpp file generated by the UDO package generator tool.

Note: SNPE does not provide the tensor data for the inputs and outputs of a UDO directly inside the tensorData field defined in SnpeUdo_TensorParam_t, but as an opaque pointer. The UDO implementation is expected to get a handle to the raw tensor pointers using the methods in the CustomOp operation object issued by SNPE at the time of execution. Refer to SnpeUdo_CpuInfrastructure_t for more details on the data structure.

The CPU runtime operates only with floating point activation tensors. Therefore, CPU UDO implementations should receive and produce only floating point tensors, and the field data_type in the config file should be set to FLOAT_32. All other data types will be ignored. Refer to Defining a UDO for more details.

Compiling and running the UDO package on the host is required by the SNPE model quantization tool, snpe-dlc-quantize. A model must be quantized with snpe-dlc-quantize in order to run a UDO layer that has at least one non-float input on the DSP.

Compiling a UDO for CPU on host

The steps to compile the CPU UDO implementation library on the host x86 platform are as follows:

Set the environment variable $SNPE_UDO_ROOT.

export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

Run the make instruction below to compile the UDO package:
```
make cpu_x86
```

The expected artifacts after compiling for the host CPU are:

The UDO CPU implementation library: <UDO-Package>/libs/x86-64_linux_clang/libUdo<UDO-Package>ImplCpu.so
The UDO package registration library: <UDO-Package>/libs/x86-64_linux_clang/libUdo<UDO-Package>Reg.so

Note: The command must be run from the package root.
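After a build, the artifact list above can be checked with a short script. This is a minimal sketch: the function name udo_check_libs and the package name MyUdoPackage are hypothetical, and the library file names simply follow the libUdo<UDO-Package>... pattern listed above.

```shell
# Hypothetical helper (not part of SNPE): report which of the expected
# libraries are missing from a libs subdirectory.
# Returns 0 when every file exists and 1 otherwise.
udo_check_libs() {
    dir="$1"
    shift
    rc=0
    for lib in "$@"; do
        if [ ! -f "$dir/$lib" ]; then
            echo "missing: $dir/$lib"
            rc=1
        fi
    done
    return "$rc"
}

# Example, run from the package root after `make cpu_x86`
# (MyUdoPackage is a placeholder package name):
# udo_check_libs libs/x86-64_linux_clang \
#     libUdoMyUdoPackageImplCpu.so libUdoMyUdoPackageReg.so
```

The same helper can be pointed at any other libs subdirectory, since it takes the directory and file names as arguments.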

Compiling a UDO for CPU on device

The steps to compile the CPU UDO implementation library on the Android platform are as follows:

Set the environment variable $SNPE_UDO_ROOT.

export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

$NDK_BUILD must be set for the Android NDK build toolchain.
```
export NDK_BUILD=<absolute_path_to_android_ndk_directory>
```
Run the make instruction below to compile the UDO package:
```
make cpu_android
```
The shared C++ standard library is required for the NDK build to run. Make sure libc++_shared.so is present on the device in a directory listed in LD_LIBRARY_PATH.

The expected artifacts after compiling for Android CPU are:

The UDO CPU implementation library: <UDO-Package>/libs/arm64-v8a/libUdo<UDO-Package>ImplCpu.so
The UDO package registration library: <UDO-Package>/libs/arm64-v8a/libUdo<UDO-Package>Reg.so
A copy of shared standard C++ library: <UDO-Package>/libs/arm64-v8a/libc++_shared.so
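Getting these artifacts onto a device could look like the sketch below. The helper udo_push_cmds is hypothetical, and the device directory /data/local/tmp/udo is an assumption chosen only for illustration. The function prints adb push commands rather than running them, so the list can be reviewed first.

```shell
# Hypothetical helper (not part of SNPE): print the adb commands that would
# copy the Android CPU artifacts to a device. Nothing is executed.
udo_push_cmds() {
    pkg="$1"                            # UDO package name
    abi="${2:-arm64-v8a}"               # ABI subdirectory under libs/
    dest="${3:-/data/local/tmp/udo}"    # assumed device directory
    for lib in "libUdo${pkg}ImplCpu.so" "libUdo${pkg}Reg.so" "libc++_shared.so"; do
        echo "adb push libs/$abi/$lib $dest/"
    done
}

# Example, run from the package root (MyUdoPackage is a placeholder name):
udo_push_cmds MyUdoPackage
```

Whatever directory is chosen, it has to be included in LD_LIBRARY_PATH on the device so that libc++_shared.so can be found at load time.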

Implementing a UDO for GPU

Similar to the CPU runtime, a GPU UDO implementation library based on core UDO APIs is required to run a UDO package on the GPU runtime. The UDO package generator tool will create a skeleton containing blank constructs in the required format, but the core logic for creation and execution of the operation needs to be filled in by the user. This can be done by completing the implementation of the setKernelInfo() and <OpName>Operation() functions, and adding the GPU kernel implementations in the <OpName>.cpp file generated by the UDO package generator tool.

SNPE GPU UDO supports 16-bit floating point activations in the network. Users should expect input/output OpenCL buffer memory from SNPE GPU UDO to use the 16-bit floating point (OpenCL half) data format as the storage type. For increased accuracy, users may choose to implement the internal math operations of the kernel using 32-bit floating point data and convert to half precision when reading input buffers or writing output buffers from the UDO kernel.

SNPE GPU gives users the option to cache the OpenCL program associated with multiple GPU UDO instances of a similar type. It provides APIs to retrieve and store the OpenCL program through SnpeUdo_GpuInfrastructure_t. Caching reduces the JIT compilation time of the OpenCL program during subsequent invocations in the network.

Note: SNPE provides the tensor data for the inputs and outputs of a UDO not directly inside tensorData but as an opaque pointer. The UDO implementation is expected to convert it to SnpeUdo_GpuTensorData_t, which holds the OpenCL memory pointer for the tensor. Refer to SnpeUdo_GpuTensorData_t for more details. The per-op factory infrastructure object issued by SNPE when the UDO op factory is created provides the OpenCL context and OpenCL command queue. Refer to SnpeUdo_GpuOpFactoryInfrastructure_t for more details on the data structure.

Compiling a UDO for GPU on device

The steps to compile the GPU UDO implementation library on the Android platform are as follows:

Set the environment variable $SNPE_UDO_ROOT.

export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

$NDK_BUILD must be set for the Android NDK build toolchain.
```
export NDK_BUILD=<absolute_path_to_android_ndk_directory>
```
$CL_LIBRARY_PATH must be set to the location of the libOpenCL.so library.
```
export CL_LIBRARY_PATH=<absolute_path_to_OpenCL_library>
```
The OpenCL shared library is not distributed as part of the SNPE SDK.
Run the make instruction below to compile the UDO package:
```
make gpu_android
```

Note: The shared OpenCL library is target-specific. It must be discoverable in CL_LIBRARY_PATH.

The expected artifacts after compiling for Android GPU are:

The UDO GPU implementation library: <UDO-Package>/libs/arm64-v8a/libUdo<UDO-Package>ImplGpu.so
The UDO package registration library: <UDO-Package>/libs/arm64-v8a/libUdo<UDO-Package>Reg.so
A copy of shared standard C++ library: <UDO-Package>/libs/arm64-v8a/libc++_shared.so
A copy of shared OpenCL library: <UDO-Package>/libs/arm64-v8a/libOpenCL.so

Implementing a UDO for DSP V65 and V66

SNPE utilizes QNN to run UDO layers on DSP. Therefore, a DSP implementation library based on QNN SDK APIs is required to run a UDO package on DSP runtime. The UDO package generator tool will create the template file <OpName>.cpp and the user will need to implement the execution logic in the <OpName>_executeOp() function in the template file.

SNPE UDO supports multi-threading of the operation using worker threads, Hexagon Vector Extensions (HVX) code, and VTCM.

The DSP runtime only propagates unsigned 8-bit activation tensors between the network layers. However, it can de-quantize data to floating point if required. Users developing DSP kernels can therefore expect either UINT_8 or FLOAT_32 activation tensors in and out of the operation, and can set the field data_type in the config file to one of these two settings. Refer to Defining a UDO for more details.

Compiling a UDO for DSP V65 and V66 on device

This SNPE release supports building UDO DSP implementation libraries using Hexagon-SDK 3.5.x.

Set the environment variable $SNPE_UDO_ROOT.

export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

Hexagon-SDK needs to be installed and set up. For details, follow the setup instructions on the $HEXAGON_SDK_ROOT/docs/readme.html page, where HEXAGON_SDK_ROOT is the location of the Hexagon-SDK installation. Make sure $HEXAGON_SDK_ROOT is set to use the Hexagon-SDK build toolchain. Also set $HEXAGON_TOOLS_ROOT and SDK_SETUP_ENV:
```
export HEXAGON_SDK_ROOT=<path to hexagon sdk installation>
export HEXAGON_TOOLS_ROOT=$HEXAGON_SDK_ROOT/tools/HEXAGON_Tools/8.3.07
export ANDROID_NDK_ROOT=<path to Android NDK installation>
export SDK_SETUP_ENV=Done
```
$NDK_BUILD must be set for the Android NDK build toolchain.
```
export NDK_BUILD=<absolute_path_to_android_ndk_directory>
```
The target architecture can also be specified when compiling the package. If no target architecture is supplied, both arm64-v8a and armeabi-v7a are targeted.
```
export UDO_APP_ABI=<target_architecture>
```
Run the make instruction below to compile the UDO DSP implementation library:
```
make dsp PLATFORM=$UDO_APP_ABI
```

Note: For DSP, PLATFORM will only determine the ABI of the registration library.

The expected artifacts after compiling for DSP are:

The UDO DSP implementation library: <UDO-Package>/libs/dsp_<dsp_arch_type>/libUdo<UDO-Package>ImplDsp.so
The UDO package registration library: <UDO-Package>/libs/$UDO_APP_ABI/libUdo<UDO-Package>Reg.so

Note: The command must be run from the package root.
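Because the DSP build depends on several environment variables, a pre-flight check before invoking make can save a failed build. The sketch below is an assumption-labeled convenience, not part of the SNPE SDK: the function name udo_require_env is hypothetical, and the variable names passed to it in the usage comment are the ones set in the steps above.

```shell
# Hypothetical helper (not part of SNPE): verify that every named
# environment variable is set and non-empty.
# Prints one line per missing variable and returns 1 if any is missing.
udo_require_env() {
    rc=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name"
            rc=1
        fi
    done
    return "$rc"
}

# Example, run from the package root before building for DSP:
# udo_require_env SNPE_UDO_ROOT HEXAGON_SDK_ROOT HEXAGON_TOOLS_ROOT \
#     ANDROID_NDK_ROOT SDK_SETUP_ENV NDK_BUILD && make dsp
```

Chaining the check with `&&` means make only starts once every variable is in place.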

Implementing a UDO for DSP V68 or later

SNPE utilizes QNN to run UDO layers on DSP v68 or later. Therefore, a DSP implementation library based on QNN SDK APIs is required to run a UDO package on DSP runtime. The UDO package generator tool will create the template file <OpName>ImplLibDsp.cpp and the user will need to implement the execution logic in the <OpName>Impl() function in the template file.

SNPE UDO supports Hexagon Vector Extensions (HVX) code and cost-based scheduling.

The DSP runtime propagates unsigned 8-bit or unsigned 16-bit activation tensors between the network layers. However, it can de-quantize data to floating point if required. Users developing DSP kernels can therefore expect UINT_8, UINT_16, or FLOAT_32 activation tensors in and out of the operation, and can set the field data_type in the config file to one of these three settings. Refer to the QNN SDK documentation for more details.

Compiling a UDO for DSP V68 or later on device

This SNPE release supports building UDO DSP implementation libraries using Hexagon-SDK 4.x and QNN SDK.

Set the environment variable $SNPE_UDO_ROOT.

export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

Hexagon-SDK 4.0+ needs to be installed and set up. For Hexagon-SDK details, follow the setup instructions on the $HEXAGON_SDK4_ROOT/docs/readme.html page, where HEXAGON_SDK4_ROOT is the location of the Hexagon-SDK installation. Make sure $HEXAGON_SDK4_ROOT is set to use the Hexagon-SDK build toolchain. Also set $HEXAGON_TOOLS_ROOT and SDK_SETUP_ENV. Additionally, an extracted QNN-SDK is needed to build the libraries; it does not need to be set up. For QNN-SDK details, refer to the QNN documentation at $QNN_SDK_ROOT/docs/index.html, where QNN_SDK_ROOT is the location of the QNN-SDK installation. Set $QNN_SDK_ROOT to the unzipped QNN-SDK location.
```
export HEXAGON_SDK_ROOT=<path to hexagon sdk installation>
export HEXAGON_SDK4_ROOT=<path to hexagon sdk 4.x installation>
export HEXAGON_TOOLS_ROOT=$HEXAGON_SDK_ROOT/tools/HEXAGON_Tools/8.3.07/
export QNN_SDK_ROOT=<path to QNN sdk installation>
export ANDROID_NDK_ROOT=<path to Android NDK installation>
export SDK_SETUP_ENV=Done
```
$NDK_BUILD must be set for the Android NDK build toolchain.
```
export NDK_BUILD=<absolute_path_to_android_ndk_directory>
```
The target architecture can also be specified when compiling the package. If no target architecture is supplied, both arm64-v8a and armeabi-v7a are targeted.
```
export UDO_APP_ABI=<target_architecture>
```
Run the make instruction below to compile the UDO DSP implementation library:
```
make dsp PLATFORM=$UDO_APP_ABI
```
Run the make instruction below to generate a library for offline cache generation:
```
make dsp_x86 X86_CXX=<path_to_x86_64_clang>
```
Run the make instruction below to generate a library that can be used on the Android ARM architecture:
```
make dsp_aarch64
```

The expected artifacts after compiling for DSP are:

The UDO DSP implementation library: <UDO-Package>/libs/dsp_v68/libUdo<UDO-Package>ImplDsp.so
The UDO package registration library: <UDO-Package>/libs/$UDO_APP_ABI/libUdo<UDO-Package>Reg.so

The expected artifact after compiling for offline cache generation is:

The UDO DSP implementation library: <UDO-Package>/libs/x86-64_linux_clang/libUdo<UDO-Package>ImplDsp.so

The expected artifact after compiling for the Android ARM architecture is:

The UDO DSP implementation library: <UDO-Package>/libs/$UDO_APP_ABI/libUdo<UDO-Package>ImplDsp_AltPrep.so

Note: The command must be run from the package root.
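The three DSP make targets each produce a differently named implementation library, which the following sketch summarizes as a lookup. The function name udo_dsp_artifact is hypothetical, the paths are relative to the package root, and the default ABI of arm64-v8a is an assumption that mirrors the examples in this section.

```shell
# Hypothetical helper (not part of SNPE): map a DSP make target to the
# implementation library it is expected to produce, relative to the
# package root.
udo_dsp_artifact() {
    target="$1"              # dsp, dsp_x86, or dsp_aarch64
    pkg="$2"                 # UDO package name
    abi="${3:-arm64-v8a}"    # ABI directory, only used for dsp_aarch64
    case "$target" in
        dsp)         echo "libs/dsp_v68/libUdo${pkg}ImplDsp.so" ;;
        dsp_x86)     echo "libs/x86-64_linux_clang/libUdo${pkg}ImplDsp.so" ;;
        dsp_aarch64) echo "libs/${abi}/libUdo${pkg}ImplDsp_AltPrep.so" ;;
        *)           echo "unknown target: $target" >&2; return 1 ;;
    esac
}

# Example (MyUdoPackage is a placeholder package name):
udo_dsp_artifact dsp_x86 MyUdoPackage
# prints: libs/x86-64_linux_clang/libUdoMyUdoPackageImplDsp.so
```

Note that the dsp and dsp_x86 targets use the same file name in different directories, so the directory is what distinguishes the on-device library from the one used for offline cache generation.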

Table of Make Targets

| Make Target | Runtime | Platform | Misc. |
|---|---|---|---|
| all | CPU, GPU, DSP | x86, Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
| all_x86 | CPU | x86 | |
| all_android | CPU, GPU, DSP | Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
| reg | - | x86, Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
| reg_x86 | - | x86 | |
| reg_android | - | Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
| cpu | CPU | x86, Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
| cpu_x86 | CPU | x86 | Same as all_x86 |
| cpu_android | CPU | Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
| gpu | GPU | Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
| gpu_android | GPU | Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | Same as gpu |
| dsp | DSP | - | |
| dsp_android | DSP | - | Same as dsp |
| dsp_x86 | DSP | - | |
| dsp_aarch64 | DSP | - | |

Note: By default, compiling for a runtime additionally compiles the corresponding registration library.