Snapdragon Neural Processing Engine SDK Reference Guide
Compiling a UDO package

# Introduction

This section provides information about compiling UDO packages for all supported runtimes in SNPE.

As explained in Overview of UDO, a set of registration and implementation libraries is collectively referred to as a UDO package. The user has complete control over building these libraries for their desired runtimes using compatible toolchains. Alternatively, the SNPE SDK offers tools and utilities to create and compile a UDO easily. For more information about the tool used to create a UDO package, refer to Creating a UDO package. This section explains UDO package compilation based on the directory structure provided by the package generator.

Fundamentally, a UDO must be developed using the set of APIs defined in the header files located at $SNPE_ROOT/share/SnpeUdo/include. Each runtime may impose additional requirements and provide options for customizing the implementation to suit the runtime. Details of the UDO APIs can be found in the API documentation at C++ API. This section assumes that the UDO package was generated using the UDO package generator tool described in Creating a UDO package, which produces a partial implementation skeleton based on the UDO specification configured by the user.

# Make Targets for Package Compilation

The UDO package generator tool creates a makefile to compile the package for a specific runtime and target platform combination. The makefile is intended to provide a simple interface to compile for platforms that use make natively or require ndk-build. Using the provided makefile also allows per-library compilation for various targets.

The general form of each make target is <runtime>_<platform>. Targets of the form <runtime> alone include all possible platforms for that runtime. For instance, running make cpu will compile the CPU implementation library for both x86 and Android platforms.

Additionally, for applicable platforms the PLATFORM make variable can be used to select a specific platform ABI, similar to APP_ABI in ndk-build. By default PLATFORM is set to both arm64-v8a and armeabi-v7a. A comprehensive table of available make targets is presented in Table of Make Targets below.

Note: Use of the makefile is optional and not required to generate libraries.

Note: For all following examples, the displayed artifacts are for the arm64-v8a target.

# Implementing a UDO for CPU

A CPU UDO implementation library based on core UDO APIs is required to run a UDO package on the CPU runtime. The UDO package generator tool will create a skeleton containing blank constructs in the required format, but the core logic for creating and executing the operation needs to be filled in by the user. This can be done by completing the implementation of the createOp() and snpeUdoExecute() functions in the <UDO-type>ImplLibCpu.cpp file generated by the UDO package generator tool.

Note: SNPE provides the tensor data corresponding to all the inputs and outputs of a UDO not directly inside the tensorData field defined in SnpeUdo_TensorParam_t, but as an opaque pointer. The UDO implementation is expected to get a handle to the raw tensor pointers using the methods in the per-op factory infrastructure object issued by SNPE at the time of creating the UDO op factory. Refer to SnpeUdo_CpuInfrastructure_t for more details on the data structure.

The CPU runtime operates only with floating point activation tensors. Therefore, CPU UDO implementations should receive and produce only floating point tensors, and the field data_type in the config file should be set to FLOAT_32. All other data types will be ignored. Refer to Defining a UDO for more details.

Compiling and running the UDO package on the host is required for the SNPE model quantization tool, snpe-dlc-quantize. A model must be quantized using snpe-dlc-quantize in order to run a UDO layer that has at least one non-float input on the DSP.

# Compiling a UDO for CPU on host

Steps to compile the CPU UDO implementation library on the host x86 platform are as below:

1. Set the environment variable $SNPE_UDO_ROOT.

export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

2. Run the make instruction below to compile the UDO package:

make cpu_x86

The expected artifacts after compiling for host CPU are:

• The UDO CPU implementation library: <UDO-Package>/libs/x86-64_linux_clang/libUdo<UDO-Package>ImplCpu.so
• The UDO package registration library: <UDO-Package>/libs/x86-64_linux_clang/libUdo<UDO-Package>Reg.so

Note: The command must be run from the package root.
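After the build, the artifact paths listed above can be checked from a script. The following is a minimal sketch and not part of the SNPE SDK: the function name check_udo_artifacts and its arguments are hypothetical, and the paths simply follow the libUdo<UDO-Package>Impl<Runtime>.so and libUdo<UDO-Package>Reg.so pattern shown in the artifact list.

```shell
# Hypothetical helper (not part of the SNPE SDK): verify that the libraries
# listed in this section exist after a build.
#   $1 = path to the UDO package root
#   $2 = UDO package name, as used in the library file names
#   $3 = library subdirectory under libs/ (e.g. x86-64_linux_clang)
#   $4 = runtime suffix in the implementation library name (e.g. Cpu)
check_udo_artifacts() {
    pkg_root=$1
    pkg_name=$2
    lib_dir=$3
    runtime=$4
    status=0
    for lib in "libUdo${pkg_name}Impl${runtime}.so" "libUdo${pkg_name}Reg.so"; do
        path="${pkg_root}/libs/${lib_dir}/${lib}"
        if [ -f "$path" ]; then
            echo "found:   $path"
        else
            echo "missing: $path" >&2
            status=1
        fi
    done
    # Non-zero when at least one expected library is absent.
    return "$status"
}
```

For the host CPU build, this might be called from the package root as check_udo_artifacts . <UDO-Package> x86-64_linux_clang Cpu once make cpu_x86 has finished. The same helper can be pointed at libs/arm64-v8a for the Android builds described later.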

# Compiling a UDO for CPU on device

Steps to compile the CPU UDO implementation library on Android platform are as below:

1. Set the environment variable $SNPE_UDO_ROOT.

export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

2. $NDK_BUILD must be set for the Android NDK build toolchain.

export NDK_BUILD=<absolute_path_to_android_ndk_directory>

3. Run the make instruction below to compile the UDO package:

make cpu_android

The shared C++ standard library is required for the NDK build to run. Make sure libc++_shared.so is present on the device in a directory listed in LD_LIBRARY_PATH.

The expected artifacts after compiling for Android CPU are:

• The UDO CPU implementation library: <UDO-Package>/libs/arm64-v8a/libUdo<UDO-Package>ImplCpu.so
• The UDO package registration library: <UDO-Package>/libs/arm64-v8a/libUdo<UDO-Package>Reg.so
• A copy of shared standard C++ library: <UDO-Package>/libs/arm64-v8a/libc++_shared.so
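The Android build depends on $SNPE_UDO_ROOT and $NDK_BUILD being set before make is invoked. A pre-flight check along the following lines can catch a missing variable early. This is only a sketch: require_env and build_cpu_android are hypothetical names, not tools shipped with the SDK.

```shell
# Hypothetical pre-flight helper (not part of the SNPE SDK): fail early if any
# of the named environment variables is unset or empty.
require_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name, portable to POSIX sh.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "error: $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Run the Android CPU build only when both prerequisites are in place.
# Must be called from the package root, like the plain make command.
build_cpu_android() {
    require_env SNPE_UDO_ROOT NDK_BUILD && make cpu_android
}
```

The same require_env call can be extended with further variable names for other runtimes, for example CL_LIBRARY_PATH for the GPU build.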

# Implementing a UDO for GPU

Similar to the CPU runtime, a GPU UDO implementation library based on core UDO APIs is required to run a UDO package on the GPU runtime. The UDO package generator tool will create a skeleton containing blank constructs in the required format, but the core logic for creating and executing the operation needs to be filled in by the user. This can be done by completing the implementation of the createOp() function and adding the GPU kernel implementations in the <UDO-type>ImplLibGpu.cpp file generated by the UDO package generator tool.

SNPE GPU UDO supports 16-bit floating point activations in the network. Users should expect input/output OpenCL buffer memory from SNPE GPU UDO to use 16-bit floating point (OpenCL half) as the storage type. For increased accuracy, users may choose to implement the internal math operations of the kernel using 32-bit floating point data, converting from half precision when reading input buffers and to half precision when writing output buffers in the UDO kernel.

SNPE GPU gives users the option to cache the OpenCL program associated with multiple GPU UDO instances of a similar type. It provides APIs to retrieve and store the OpenCL program through SnpeUdo_GpuInfrastructure_t. Caching reduces the JIT compilation time of the OpenCL program during subsequent invocations in the network.

Note: SNPE provides the tensor data corresponding to all the inputs and outputs of a UDO not directly inside tensorData, but as an opaque pointer. The UDO implementation is expected to convert it to SnpeUdo_GpuTensorData_t, which holds the OpenCL memory pointer for the tensor. Refer to SnpeUdo_GpuTensorData_t for more details. The per-op factory infrastructure object issued by SNPE at the time of creating the UDO op factory provides the OpenCL context and the OpenCL command queue. Refer to SnpeUdo_GpuOpFactoryInfrastructure_t for more details on the data structure.

# Compiling a UDO for GPU on device

Steps to compile the GPU UDO implementation library on Android platform are as below:

1. Set the environment variable $SNPE_UDO_ROOT.

export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

2. $NDK_BUILD must be set for the Android NDK build toolchain.

export NDK_BUILD=<absolute_path_to_android_ndk_directory>

3. $CL_LIBRARY_PATH must be set for the libOpenCL.so library location.

export CL_LIBRARY_PATH=<absolute_path_to_OpenCL_library>

The OpenCL shared library is not distributed as part of the SNPE SDK.

4. Run the make instruction below to compile the UDO package:

make gpu_android

Note: The shared OpenCL library is target specific. It should be discoverable in CL_LIBRARY_PATH.

The expected artifacts after compiling for Android GPU are:

• The UDO GPU implementation library: <UDO-Package>/libs/arm64-v8a/libUdo<UDO-Package>ImplGpu.so
• The UDO package registration library: <UDO-Package>/libs/arm64-v8a/libUdo<UDO-Package>Reg.so
• A copy of shared standard C++ library: <UDO-Package>/libs/arm64-v8a/libc++_shared.so
• A copy of shared OpenCL library: <UDO-Package>/libs/arm64-v8a/libOpenCL.so

# Implementing a UDO for DSP

SNPE utilizes Hexagon-NN to run UDO layers on DSP. Therefore, a DSP implementation library based on core UDO APIs for Hexagon-NN is required to run a UDO package on the DSP runtime. The UDO package generator tool will create the template files <UDO-type>ImplLibDsp.c and <UDO-type>ImplLibDsp.h. The user needs to implement the execution logic in the SnpeUdo_executeOp() function in the .c file, as well as the <UDO-type>OpInfo struct in the header file, which stores the operation's execution information. SNPE UDO supports multi-threading of the operation using worker threads, Hexagon Vector Extensions (HVX) code, and VTCM.

The DSP runtime only propagates unsigned 8-bit activation tensors between the network layers. However, it can de-quantize data to floating point if required. Therefore, users developing DSP kernels can expect either UINT_8 or FLOAT_32 activation tensors in and out of the operation, and can set the field data_type in the config file to one of these two settings. Refer to Defining a UDO for more details.

Note: Only C files are supported for UDO on the DSP runtime.

# Compiling a UDO for DSP on device

This SNPE release supports building UDO DSP implementation libraries using Hexagon-SDK 3.5.1/3.5.2.

1. Set the environment variable $SNPE_UDO_ROOT.

export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

2. Hexagon-SDK needs to be installed and set up. For details, follow the setup instructions on the $HEXAGON_SDK_ROOT/docs/readme.html page, where HEXAGON_SDK_ROOT is the location of the Hexagon-SDK installation. Make sure $HEXAGON_SDK_ROOT is set to use the Hexagon-SDK build toolchain. Also set $HEXAGON_TOOLS_ROOT and $SDK_SETUP_ENV.

export HEXAGON_SDK_ROOT=<path to hexagon sdk installation>
export HEXAGON_TOOLS_ROOT=$HEXAGON_SDK_ROOT/tools/HEXAGON_Tools/8.3.07
export SDK_SETUP_ENV=Done

3. Run the make instruction below to compile the UDO DSP implementation library:

make dsp

The expected artifacts after compiling for DSP are:

• The UDO DSP implementation library: <UDO-Package>/libs/dsp/libUdo<UDO-Package>ImplDsp.so
• The UDO package registration library: <UDO-Package>/libs/arm64-v8a/libUdo<UDO-Package>Reg.so

Note: The command must be run from the package root.
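The three Hexagon-related variables from step 2 can also be set through a small wrapper that confirms the tools directory exists before the build starts. The sketch below is an illustration only: setup_hexagon_env is a hypothetical name, and the default tools version 8.3.07 is taken from the export line in step 2 and may differ for other Hexagon-SDK installations.

```shell
# Hypothetical helper (not part of the SNPE SDK or Hexagon-SDK): export the
# variables from step 2 and check that the tools directory actually exists.
#   $1 = Hexagon-SDK installation directory
#   $2 = Hexagon tools version (defaults to 8.3.07, as in step 2)
setup_hexagon_env() {
    sdk_root=$1
    tools_version=${2:-8.3.07}
    tools_root="${sdk_root}/tools/HEXAGON_Tools/${tools_version}"
    if [ ! -d "$tools_root" ]; then
        echo "error: Hexagon tools not found at $tools_root" >&2
        return 1
    fi
    export HEXAGON_SDK_ROOT="$sdk_root"
    export HEXAGON_TOOLS_ROOT="$tools_root"
    export SDK_SETUP_ENV=Done
}
```

A typical use would be setup_hexagon_env <path to hexagon sdk installation> && make dsp, run from the package root. Because the function exports variables, it has to be sourced or defined in the same shell that runs make rather than executed as a separate script.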

# Table of Make Targets

| Make Target | Runtime | Platform | Misc. |
|---|---|---|---|
| all | CPU, GPU, DSP | x86, Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
| all_x86 | CPU | x86 | |
| all_android | CPU, GPU, DSP | Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
| reg | - | x86, Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
| reg_x86 | - | x86 | |
| reg_android | - | Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
| cpu | CPU | x86, Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
| cpu_x86 | CPU | x86 | Same as all_x86 |
| cpu_android | CPU | Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
| gpu | GPU | Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | |
| gpu_android | GPU | Specified by PLATFORM (default arm64-v8a and armeabi-v7a) | Same as gpu |
| dsp | DSP | - | |
| dsp_android | DSP | - | Same as dsp |

Note: By default, compiling for a runtime additionally compiles the corresponding registration library.
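To illustrate how a target from the table combines with the PLATFORM variable, the sketch below composes the make command line for a single ABI. It assumes the generated makefile accepts PLATFORM as an ordinary command-line variable assignment, which is standard make behavior but is not confirmed here for every generated makefile. The function name udo_make_command is hypothetical.

```shell
# Hypothetical helper (not generated by the UDO tools): compose the make
# command line for a target from the table, optionally restricted to one ABI.
#   $1 = make target (e.g. cpu_android)
#   $2 = optional PLATFORM value (e.g. arm64-v8a); omit to use the default
udo_make_command() {
    target=$1
    platform=${2:-}
    if [ -n "$platform" ]; then
        # Assumption: PLATFORM is passed as a command-line make variable.
        echo "make $target PLATFORM=$platform"
    else
        echo "make $target"
    fi
}
```

Calling udo_make_command cpu_android arm64-v8a prints the command for an arm64-v8a-only CPU build, while udo_make_command dsp prints the plain make dsp invocation. The printed command can be reviewed and then run from the package root.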