Snapdragon Neural Processing Engine SDK
Reference Guide
C++ Tutorial - Build the Sample

Prerequisites

  • The SNPE SDK has been set up following the SNPE Setup chapter.
  • The tutorial setup for Running the AlexNet Model or Running the Inception v3 Model has been completed.

Introduction

This tutorial demonstrates how to build a C++ sample application that can execute neural network models on the PC or target device.

Note: While this sample code does not do any error checking, it is strongly recommended that users check for errors when using the SNPE APIs.

Most applications follow this pattern when using a neural network:

  1. Get Available Runtime
  2. Load Network
  3. Load UDO
  4. Set Network Builder Options
  5. Load Network Inputs
    1. Using ITensors
    2. Using User Buffers
  6. Execute the Network & Process Output
    1. Using ITensors
    2. Using User Buffers
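
In code, these steps correspond to the following top-level calls from the sample:
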
static zdl::DlSystem::Runtime_t runtime = checkRuntime();
std::unique_ptr<zdl::DlContainer::IDlContainer> container = loadContainerFromFile(dlc);
std::unique_ptr<zdl::SNPE::SNPE> snpe = setBuilderOptions(container, runtime, useUserSuppliedBuffers);
std::unique_ptr<zdl::DlSystem::ITensor> inputTensor = loadInputTensor(snpe, fileLine); // ITensor
loadInputUserBuffer(applicationInputBuffers, snpe, fileLine); // User Buffer
executeNetwork(snpe, inputTensor, OutputDir, inputListNum); // ITensor
executeNetwork(snpe, inputMap, outputMap, applicationOutputBuffers, OutputDir, inputListNum); // User Buffer

The sections below describe how to implement each step described above. For more details, please refer to the collection of source code files located at $SNPE_ROOT/examples/NativeCpp/SampleCode/jni.

Get Available Runtime

The code excerpt below illustrates how to check if a specific runtime is available using the native APIs (the GPU runtime is used as an example).

zdl::DlSystem::Runtime_t checkRuntime()
{
    static zdl::DlSystem::Version_t Version = zdl::SNPE::SNPEFactory::getLibraryVersion();
    static zdl::DlSystem::Runtime_t Runtime;
    std::cout << "SNPE Version: " << Version.asString().c_str() << std::endl; //Print Version number
    if (zdl::SNPE::SNPEFactory::isRuntimeAvailable(zdl::DlSystem::Runtime_t::GPU)) {
        Runtime = zdl::DlSystem::Runtime_t::GPU;
    } else {
        Runtime = zdl::DlSystem::Runtime_t::CPU;
    }
    return Runtime;
}
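
The same check can be performed for other runtimes, e.g. zdl::DlSystem::Runtime_t::DSP or zdl::DlSystem::Runtime_t::CPU, by passing the corresponding value to zdl::SNPE::SNPEFactory::isRuntimeAvailable().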

Load Network

The code excerpt below illustrates how to load a network from the SNPE container file (DLC).

std::unique_ptr<zdl::DlContainer::IDlContainer> loadContainerFromFile(std::string containerPath)
{
    std::unique_ptr<zdl::DlContainer::IDlContainer> container;
    container = zdl::DlContainer::IDlContainer::open(containerPath);
    return container;
}
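
The sample itself omits error checking; a minimal sketch of validating the returned container (the dlc variable holds the container path, as in the overview above):

std::unique_ptr<zdl::DlContainer::IDlContainer> container = loadContainerFromFile(dlc);
if (container == nullptr)
{
    std::cerr << "Error while opening the container file." << std::endl;
    std::exit(EXIT_FAILURE);
}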

Load UDO

The code excerpt below illustrates how to load UDO package(s).

bool loadUDOPackage(const std::string& UdoPackagePath)
{
    std::vector<std::string> udoPkgPathsList;
    split(udoPkgPathsList, UdoPackagePath, ',');
    for (const auto& u : udoPkgPathsList)
    {
        // register each UDO package with SNPE
        if (false == zdl::SNPE::SNPEFactory::addOpPackage(u))
        {
            std::cerr << "Error while loading UDO package: " << u << std::endl;
            return false;
        }
    }
    return true;
}

SNPE can execute networks containing user-defined operations (UDOs). Please refer to the UDO Tutorial for how to implement a UDO.

To execute the network with UDOs, add the "-u" option to the snpe-sample command line, as shown below.
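
For example (the file names are illustrative):

snpe-sample -d model.dlc -i input_list.txt -o output -u <path to UDO registration library>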

Set Network Builder Options

The following code demonstrates how to instantiate a SNPE Builder object, which will be used to execute the network with the given parameters.

std::unique_ptr<zdl::SNPE::SNPE> setBuilderOptions(std::unique_ptr<zdl::DlContainer::IDlContainer>& container,
                                                   zdl::DlSystem::Runtime_t runtime,
                                                   bool useUserSuppliedBuffers)
{
    std::unique_ptr<zdl::SNPE::SNPE> snpe;
    zdl::SNPE::SNPEBuilder snpeBuilder(container.get());
    // build a runtime list from the selected runtime
    zdl::DlSystem::RuntimeList runtimeList;
    runtimeList.add(runtime);
    snpe = snpeBuilder.setOutputLayers({})
                      .setRuntimeProcessorOrder(runtimeList)
                      .setUseUserSuppliedBuffers(useUserSuppliedBuffers)
                      .build();
    return snpe;
}
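
As elsewhere in the sample, the result should be checked before use; a minimal sketch (runtime and useUserSuppliedBuffers are assumed to come from the earlier steps):

std::unique_ptr<zdl::SNPE::SNPE> snpe = setBuilderOptions(container, runtime, useUserSuppliedBuffers);
if (snpe == nullptr)
{
    std::cerr << "Error while building SNPE object." << std::endl;
    std::exit(EXIT_FAILURE);
}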

Load Network Inputs

Network inputs and outputs can be either user-backed buffers or ITensors (built-in SNPE buffers), but not both. The advantage of using user-backed buffers is that they eliminate the extra copy needed to create ITensors from user data. Both methods of loading network inputs are shown below.

Using User Buffers

SNPE can create its network inputs and outputs from user-backed buffers. Note that SNPE expects the values of the buffers to be present and valid during the duration of its execution.

Here is a function for creating a SNPE UserBuffer from a user-backed buffer and storing it in a zdl::DlSystem::UserBufferMap. These maps are a convenient collection of all input or output user buffers that can be passed to SNPE to execute the network.

Disclaimer: The strides of the buffer should already be known by the user and should not be calculated as shown below. The calculation shown is solely used for executing the example code.

void createUserBuffer(zdl::DlSystem::UserBufferMap& userBufferMap,
                      std::unordered_map<std::string, std::vector<uint8_t>>& applicationBuffers,
                      std::vector<std::unique_ptr<zdl::DlSystem::IUserBuffer>>& snpeUserBackedBuffers,
                      std::unique_ptr<zdl::SNPE::SNPE>& snpe,
                      const char* name)
{
    // get attributes of buffer by name
    auto bufferAttributesOpt = snpe->getInputOutputBufferAttributes(name);
    if (!bufferAttributesOpt) throw std::runtime_error(std::string("Error obtaining attributes for input tensor ") + name);
    // calculate the size of buffer required by the input tensor
    const zdl::DlSystem::TensorShape& bufferShape = (*bufferAttributesOpt)->getDims();
    // Calculate the stride based on buffer strides, assuming tightly packed.
    // Note: Strides = Number of bytes to advance to the next element in each dimension.
    // For example, if a float tensor of dimension 2x4x3 is tightly packed in a buffer of 96 bytes, then the strides would be (48,12,4).
    // Note: Buffer stride is usually known and does not need to be calculated.
    std::vector<size_t> strides(bufferShape.rank());
    strides[strides.size() - 1] = sizeof(float);
    size_t stride = strides[strides.size() - 1];
    for (size_t i = bufferShape.rank() - 1; i > 0; i--)
    {
        stride *= bufferShape[i];
        strides[i - 1] = stride;
    }
    const size_t bufferElementSize = (*bufferAttributesOpt)->getElementSize();
    size_t bufSize = calcSizeFromDims(bufferShape.getDimensions(), bufferShape.rank(), bufferElementSize);
    // set the buffer encoding type
    zdl::DlSystem::UserBufferEncodingFloat userBufferEncodingFloat;
    // create user-backed storage to load input data onto it
    applicationBuffers.emplace(name, std::vector<uint8_t>(bufSize));
    // get the SNPE user buffer factory
    zdl::DlSystem::IUserBufferFactory& ubFactory = zdl::SNPE::SNPEFactory::getUserBufferFactory();
    // create SNPE user buffer from the user-backed buffer
    snpeUserBackedBuffers.push_back(ubFactory.createUserBuffer(applicationBuffers.at(name).data(),
                                                               bufSize,
                                                               strides,
                                                               &userBufferEncodingFloat));
    // add the user-backed buffer to the inputMap, which is later fed to the network for execution
    userBufferMap.add(name, snpeUserBackedBuffers.back().get());
}
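
In the sample, a helper like createUserBuffer() is invoked once per tensor name to populate the UserBufferMaps. A minimal sketch for the inputs, assuming the snpe object built above (the surrounding variable names are illustrative):

zdl::DlSystem::UserBufferMap inputMap;
std::vector<std::unique_ptr<zdl::DlSystem::IUserBuffer>> snpeUserBackedInputBuffers;
std::unordered_map<std::string, std::vector<uint8_t>> applicationInputBuffers;
const auto& inputNamesOpt = snpe->getInputTensorNames();
if (!inputNamesOpt) throw std::runtime_error("Error obtaining input tensor names");
const zdl::DlSystem::StringList& inputNames = *inputNamesOpt;
// create a SNPE UserBuffer for every network input
for (const char* name : inputNames)
{
    createUserBuffer(inputMap, applicationInputBuffers, snpeUserBackedInputBuffers, snpe, name);
}

The output UserBufferMap can be populated the same way by iterating over snpe->getOutputTensorNames().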

The following function then shows how to load input data from file(s) to user buffers. Note that the input values are simply loaded onto user-backed buffers, on top of which SNPE can create SNPE UserBuffers, as shown above.

void loadInputUserBuffer(std::unordered_map<std::string, std::vector<uint8_t>>& applicationBuffers,
                         std::unique_ptr<zdl::SNPE::SNPE>& snpe,
                         const std::string& fileLine)
{
    // get input tensor names of the network that need to be populated
    const auto& inputNamesOpt = snpe->getInputTensorNames();
    if (!inputNamesOpt) throw std::runtime_error("Error obtaining input tensor names");
    const zdl::DlSystem::StringList& inputNames = *inputNamesOpt;
    assert(inputNames.size() > 0);
    // treat each line as a space-separated list of input files
    std::vector<std::string> filePaths;
    split(filePaths, fileLine, ' ');
    if (inputNames.size()) std::cout << "Processing DNN Input: " << std::endl;
    for (size_t i = 0; i < inputNames.size(); i++)
    {
        const char* name = inputNames.at(i);
        std::string filePath(filePaths[i]);
        // print out which file is being processed
        std::cout << "\t" << i + 1 << ") " << filePath << std::endl;
        // load file content onto application storage buffer,
        // on top of which SNPE has created a user buffer
        loadByteDataFile(filePath, applicationBuffers.at(name));
    }
}

Using ITensors

std::unique_ptr<zdl::DlSystem::ITensor> loadInputTensor(std::unique_ptr<zdl::SNPE::SNPE>& snpe, std::string& fileLine)
{
    std::unique_ptr<zdl::DlSystem::ITensor> input;
    const auto& strList_opt = snpe->getInputTensorNames();
    if (!strList_opt) throw std::runtime_error("Error obtaining input tensor names");
    const auto& strList = *strList_opt;
    // Make sure the network requires only a single input
    assert(strList.size() == 1);
    // If the network has a single input, each line represents the input file to be loaded for that input
    std::string filePath(fileLine);
    std::cout << "Processing DNN Input: " << filePath << "\n";
    std::vector<float> inputVec = loadFloatDataFile(filePath);
    /* Create an input tensor that is correctly sized to hold the input of the network.
       Dimensions that have no fixed size will be represented with a value of 0. */
    const auto& inputDims_opt = snpe->getInputDimensions(strList.at(0));
    const auto& inputShape = *inputDims_opt;
    /* Calculate the total number of elements that can be stored in the tensor so that we can
       check that the input contains the expected number of elements.
       With the input dimensions computed, create a tensor to convey the input into the network. */
    input = zdl::SNPE::SNPEFactory::getTensorFactory().createTensor(inputShape);
    /* Copy the loaded input file contents into the network's input tensor.
       SNPE's ITensor supports C++ STL functions like std::copy() */
    std::copy(inputVec.begin(), inputVec.end(), input->begin());
    return input;
}
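
The comment above mentions checking that the file provided the expected number of elements. A minimal sketch of such a check, assuming all input dimensions are fixed (non-zero):

// compute the number of elements implied by the input dimensions
size_t expectedElements = 1;
for (size_t i = 0; i < inputShape.rank(); i++) expectedElements *= inputShape[i];
if (inputVec.size() != expectedElements)
{
    std::cerr << "Size of input does not match network. Expecting " << expectedElements
              << " elements, got " << inputVec.size() << std::endl;
    std::exit(EXIT_FAILURE);
}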

Execute the Network & Process Output

The following snippets of code use the native API to execute the network (in UserBuffer or ITensor mode) and show how to iterate through the newly populated output tensor.

Using User Buffers

void executeNetwork(std::unique_ptr<zdl::SNPE::SNPE>& snpe,
                    zdl::DlSystem::UserBufferMap& inputMap,
                    zdl::DlSystem::UserBufferMap& outputMap,
                    std::unordered_map<std::string, std::vector<uint8_t>>& applicationOutputBuffers,
                    const std::string& outputDir,
                    int num)
{
    // Execute the network and store the outputs in the user buffers specified in outputMap
    snpe->execute(inputMap, outputMap);
    // Get all output buffer names from the network
    const zdl::DlSystem::StringList& outputBufferNames = outputMap.getUserBufferNames();
    // Iterate through output buffers and print each output to a raw file
    std::for_each(outputBufferNames.begin(), outputBufferNames.end(), [&](const char* name)
    {
        std::ostringstream path;
        path << outputDir << "/Result_" << num << "/" << name << ".raw";
        SaveUserBuffer(path.str(), applicationOutputBuffers.at(name));
    });
}
// The following is a partial snippet of the function
void SaveUserBuffer(const std::string& path, const std::vector<uint8_t>& buffer)
{
    ...
    std::ofstream os(path, std::ofstream::binary);
    if (!os)
    {
        std::cerr << "Failed to open output file for writing: " << path << "\n";
        std::exit(EXIT_FAILURE);
    }
    if (!os.write(reinterpret_cast<const char*>(buffer.data()), buffer.size()))
    {
        std::cerr << "Failed to write data to: " << path << "\n";
        std::exit(EXIT_FAILURE);
    }
}

Using ITensors

void executeNetwork(std::unique_ptr<zdl::SNPE::SNPE>& snpe,
                    std::unique_ptr<zdl::DlSystem::ITensor>& input,
                    std::string OutputDir,
                    int num)
{
    // Execute the network and store the outputs that were specified when creating the network in a TensorMap
    static zdl::DlSystem::TensorMap outputTensorMap;
    snpe->execute(input.get(), outputTensorMap);
    zdl::DlSystem::StringList tensorNames = outputTensorMap.getTensorNames();
    // Iterate through the output tensor map, and save each output tensor to a raw file
    std::for_each(tensorNames.begin(), tensorNames.end(), [&](const char* name)
    {
        std::ostringstream path;
        path << OutputDir << "/Result_" << num << "/" << name << ".raw";
        auto tensorPtr = outputTensorMap.getTensor(name);
        SaveITensor(path.str(), tensorPtr);
    });
}
// The following is a partial snippet of the function
void SaveITensor(const std::string& path, const zdl::DlSystem::ITensor* tensor)
{
    ...
    std::ofstream os(path, std::ofstream::binary);
    if (!os)
    {
        std::cerr << "Failed to open output file for writing: " << path << "\n";
        std::exit(EXIT_FAILURE);
    }
    for (auto it = tensor->cbegin(); it != tensor->cend(); ++it)
    {
        float f = *it;
        if (!os.write(reinterpret_cast<char*>(&f), sizeof(float)))
        {
            std::cerr << "Failed to write data to: " << path << "\n";
            std::exit(EXIT_FAILURE);
        }
    }
}

Using IOBufferDataTypeMap

  • The IOBufferDataTypeMap is used to specify the intended data type for input/output of a network. The data type values include zdl::DlSystem::IOBufferDataType_t::FLOATING_POINT_32, zdl::DlSystem::IOBufferDataType_t::FIXED_POINT_8 and zdl::DlSystem::IOBufferDataType_t::FIXED_POINT_16.
  • If the output of a network is of type FIXED_POINT_8 but the user intends to access the output in FLOATING_POINT_32 format, the dequantization operation is normally performed on the ARM (CPU) side. By specifying the data type as FLOATING_POINT_32 using the IOBufferDataTypeMap API, the dequantization operation is instead added directly to the graph.

The following snippet of code shows how to specify the data type for a buffer using the native API.

void setBufferDataType(zdl::DlSystem::IOBufferDataTypeMap& bufferDataTypeMap, std::string bufferName, zdl::DlSystem::IOBufferDataType_t dataType)
{
    bufferDataTypeMap.add(bufferName.c_str(), dataType);
}
zdl::DlSystem::IOBufferDataTypeMap bufferDataTypeMap;
setBufferDataType(bufferDataTypeMap, "output_1", zdl::DlSystem::IOBufferDataType_t::FLOATING_POINT_32);
zdl::SNPE::SNPEBuilder snpeBuilder(container.get());
snpeBuilder.setBufferDataType(bufferDataTypeMap);

Building the C++ Application

Building and Running on x86 Linux and Embedded Linux

Start by going to the snpe-sample base directory.

cd $SNPE_ROOT/examples/NativeCpp/SampleCode

Note the different makefiles associated with the different Linux platforms. $CXX needs to be set according to the target platform. Here is a table of the supported targets, their corresponding settings for $CXX, and the Makefiles to use.

Target                       Makefile                            Possible CXX value     Output Location
arm-oe-linux (gcc 6.4hf)     Makefile.arm-oe-linux-gcc6.4hf      arm-oe-linux-g++       arm-oe-linux-gcc6.4hf
aarch64-oe-linux (gcc 6.4)   Makefile.aarch64-oe-linux-gcc6.4    aarch64-oe-linux-g++   aarch64-oe-linux-gcc6.4
x86_64-linux                 Makefile.x86_64-linux-clang         g++                    x86_64-linux-clang

export CXX=<Name of c++ cross compiler>
make -f <Makefile for the target>

Note: Ensure that the path to the compiler binary is already set in $PATH.
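
For example, to build for aarch64-oe-linux (assuming the cross compiler is on $PATH):

export CXX=aarch64-oe-linux-g++
make -f Makefile.aarch64-oe-linux-gcc6.4

The resulting executable is placed under obj/local/<Output Location>/snpe-sample, e.g. obj/local/x86_64-linux-clang/snpe-sample for the x86_64-linux target.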

Along with the sample executable, all other required libraries need to be pushed onto their respective targets. $LD_LIBRARY_PATH may also need to be updated to point to the support libraries. You can run the executable with the -h argument to see its description.

snpe-sample -h

The description should look like the following:

DESCRIPTION:
------------
Example application demonstrating how to load and execute a neural network
using the SNPE C++ API.


REQUIRED ARGUMENTS:
-------------------
  -d  <FILE>   Path to the DL container containing the network.
  -i  <FILE>   Path to a file listing the inputs for the network.
  -o  <PATH>   Path to directory to store output results.

OPTIONAL ARGUMENTS:
-------------------
  -b  <TYPE>    Type of buffers to use [USERBUFFER, ITENSOR] (default is USERBUFFER).
  -u  <VAL,VAL> Path to UDO package with registration library for UDOs.
                Optionally, user can provide multiple packages as a comma-separated list.

Running snpe-sample assumes that one of the examples, Running the AlexNet Model or Running the Inception v3 Model, has previously been set up.

Run snpe-sample with the AlexNet model:

cd $SNPE_ROOT/models/alexnet/data
$SNPE_ROOT/examples/NativeCpp/SampleCode/obj/local/x86_64-linux-clang/snpe-sample -b ITENSOR -d ../dlc/bvlc_alexnet.dlc -i target_raw_list.txt -o output

The results are stored in the output directory. To process the output, run the following script to generate the classification results.

python $SNPE_ROOT/models/alexnet/scripts/show_alexnet_classifications.py -i target_raw_list.txt -o output/ -l ilsvrc_2012_labels.txt
Classification results
cropped/trash_bin.raw     0.949348 412 ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin
cropped/chairs.raw        0.365685 831 studio couch, day bed
cropped/plastic_cup.raw   0.749103 647 measuring cup
cropped/notice_sign.raw   0.722709 458 brass, memorial tablet, plaque
cropped/handicap_sign.raw 0.188248 919 street sign

Building and Running on ARM Android

Prerequisite: You will need the Android NDK to build the Android C++ executable. The tutorial assumes that you can invoke 'ndk-build' from the shell.

First move to snpe-sample's base directory.

cd $SNPE_ROOT/examples/NativeCpp/SampleCode

To build snpe-sample with clang/libc++ SNPE binaries (i.e., arm-android-clang6.0 and aarch64-android-clang6.0), use the following command:

cd $SNPE_ROOT/examples/NativeCpp/SampleCode
ndk-build NDK_TOOLCHAIN_VERSION=clang APP_STL=c++_shared

The ndk-build command will build both armeabi-v7a and arm64-v8a binaries of snpe-sample.

  • $SNPE_ROOT/examples/NativeCpp/SampleCode/obj/local/armeabi-v7a/snpe-sample
  • $SNPE_ROOT/examples/NativeCpp/SampleCode/obj/local/arm64-v8a/snpe-sample

To run the Android C++ executable, push the appropriate SNPE libraries and the executable onto the Android target.

export SNPE_TARGET_ARCH=arm-android-clang6.0
export SNPE_TARGET_ARCH_OBJ_DIR=armeabi-v7a
adb shell "mkdir -p /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin"
adb shell "mkdir -p /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib"
adb shell "mkdir -p /data/local/tmp/snpeexample/dsp/lib"
adb push $SNPE_ROOT/lib/$SNPE_TARGET_ARCH/ /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
adb push $SNPE_ROOT/lib/dsp/ /data/local/tmp/snpeexample/dsp/lib
adb push $SNPE_ROOT/examples/NativeCpp/SampleCode/obj/local/$SNPE_TARGET_ARCH_OBJ_DIR/snpe-sample /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin

Run snpe-sample with the AlexNet model on the target. This assumes that you have completed the setup steps in Run on Android Target, which push all the sample data files and the AlexNet model to the target.

adb shell
export SNPE_TARGET_ARCH=arm-android-clang6.0
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
export PATH=$PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
cd /data/local/tmp/alexnet
snpe-sample -b ITENSOR -d bvlc_alexnet.dlc -i target_raw_list.txt -o output_sample
exit

Pull the target output into a host side output directory.

cd $SNPE_ROOT/models/alexnet/
adb pull /data/local/tmp/alexnet/output_sample output_sample

Again, we can run the interpretation script to see the classification results.

python $SNPE_ROOT/models/alexnet/scripts/show_alexnet_classifications.py -i data/target_raw_list.txt -o output_sample/ -l data/ilsvrc_2012_labels.txt
Classification results
cropped/trash_bin.raw     0.949348 412 ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin
cropped/chairs.raw        0.365685 831 studio couch, day bed
cropped/plastic_cup.raw   0.749103 647 measuring cup
cropped/notice_sign.raw   0.722709 458 brass, memorial tablet, plaque
cropped/handicap_sign.raw 0.188248 919 street sign

Similar example results can also be obtained using the Inception v3 model from Running the Inception v3 Model.