Qualcomm Neural Processing SDK for AI

The Qualcomm Neural Processing SDK for AI (also known as the Snapdragon Neural Processing Engine, SNPE) is a software-accelerated, inference-only runtime engine for executing deep neural networks. With the SDK, users can:

  • Execute an arbitrarily deep neural network
  • Execute the network on the Kryo CPU, Adreno GPU, or Hexagon DSP with the Hexagon Tensor Accelerator (HTA)
  • Debug network execution on x86 Ubuntu Linux
  • Convert Caffe, Caffe2, ONNX, and TensorFlow models to a Deep Learning Container (DLC) file
  • Quantize DLC files to 8-bit fixed point for running on the Hexagon DSP
  • Debug and analyze the performance of the network with Qualcomm Neural Processing SDK tools
  • Integrate a network into applications and other code via C++ or Java

A generic workflow involving the Qualcomm Neural Processing SDK runtime engine is shown in the diagram. In this workflow:

  1. A developer/data scientist develops a model and trains it to meet requirements (training phase).
  2. Once the model is trained and frozen, the model, with its static weights and biases, is converted to the Qualcomm Neural Processing SDK native format, the Deep Learning Container (DLC).
  3. Once the DLC model file is generated, the optional offline optimization tools can be used to optimize it. These optimizations may include quantization and compression techniques.
  4. A developer writes an ML application (a Qualcomm Neural Processing SDK-enabled app) using the Qualcomm Neural Processing SDK C++/Java API, or uses the GStreamer plugin (qtimlesnpe) to quickly execute the converted model.
  5. The model's performance and accuracy may need to be debugged. Debug tools are provided as part of the Qualcomm Neural Processing SDK.
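
Steps 2, 3, and 5 above can be sketched with the SDK's command-line tools. The following is a minimal dry-run sketch, not a definitive recipe: the file names (model.onnx, model.dlc, model_quantized.dlc, input_list.txt) are hypothetical, an ONNX source model is assumed, and the tool and flag names reflect commonly documented SDK usage that should be checked against the installed SDK version. By default the script only prints the commands; it executes them only when DRY_RUN=0.

```shell
#!/bin/sh
# Dry-run sketch: prints each command by default. Set DRY_RUN=0 to execute
# on a host where the SDK tools are installed and on PATH.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical file names, chosen for illustration only.
MODEL_ONNX="model.onnx"
MODEL_DLC="model.dlc"
MODEL_QUANT_DLC="model_quantized.dlc"
INPUT_LIST="input_list.txt"   # text file listing raw input files, one per line

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 2: convert the trained, frozen model to a DLC file.
convert_model() {
    run snpe-onnx-to-dlc --input_network "$MODEL_ONNX" --output_path "$MODEL_DLC"
}

# Step 3 (optional): quantize the DLC to 8-bit fixed point using sample inputs.
quantize_model() {
    run snpe-dlc-quantize --input_dlc "$MODEL_DLC" --input_list "$INPUT_LIST" --output_dlc "$MODEL_QUANT_DLC"
}

# Step 5: run the quantized model on the sample inputs to inspect its outputs.
check_model() {
    run snpe-net-run --container "$MODEL_QUANT_DLC" --input_list "$INPUT_LIST"
}

convert_model
quantize_model
check_model
```

Keeping each step in its own function makes it easy to swap in a different converter (for example, the TensorFlow one) or to skip quantization when the model will run on the CPU or GPU.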

A complete reference guide to the Qualcomm Neural Processing SDK and its development workflow can be found on the QTI website.

Developers can quickly integrate an ML model by passing the converted DLC file to the available GStreamer plugin. The plugin handles the following directly:
  • Loading the .dlc file
  • Preprocessing and postprocessing video frames
  • Configuring the Qualcomm Neural Processing SDK to run on the DSP, CPU, GPU, or HTA
  • Abstracting the application from the Qualcomm Neural Processing SDK

Upon GStreamer launch, frames for inference from either a camera source (YUV) or a file source are delivered to the GST-SNPE sink (the GStreamer-SNPE plugin) along with a model DLC (generated by following the Qualcomm Neural Processing SDK workflow). The GST-SNPE sink in turn uses the Qualcomm Neural Processing SDK runtime to offload model computation to the requested runtime (DSP, GPU, or CPU). Inference results are gathered back in the GST-SNPE sink for postprocessing (for example, overlaying bounding boxes and class IDs on detected objects in the frame if the model is an object detection model).

The following example shows inferencing on a live camera stream (the command requests 1280x720 NV12 at 30 fps) with a bounding box overlay applied inline on the YUV stream. The output is rendered on a Weston display, which is why XDG_RUNTIME_DIR is set. Before running, push the label and model files to the folders referenced by the labels and config variables, respectively.

export XDG_RUNTIME_DIR=/usr/bin/weston_socket && gst-launch-1.0 qtiqmmfsrc ! video/x-raw, format=NV12, width=1280, height=720, framerate=30/1, camera=0 ! qtimlesnpe config=/data/misc/camera/mle_snpe.config postprocessing=detection ! queue ! qtioverlay ! waylandsink x=960 y=0 width=960 height=540 async=true sync=false enable-last-sample=false
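
The one-line command above is easier to adapt when its stages are named. The following is a minimal sketch that rebuilds the same pipeline from variables. The variable and function names (CAM_CAPS, SNPE_CONFIG, SINK_OPTS, build_pipeline) are hypothetical and introduced only for illustration; every value is copied from the command above. The script only prints the command, so it can be inspected on a host without GStreamer.

```shell
#!/bin/sh
# Hypothetical variable names; all values are copied from the command above.

# Camera source caps: NV12 frames, 1280x720 at 30 fps from camera 0.
CAM_CAPS="video/x-raw, format=NV12, width=1280, height=720, framerate=30/1, camera=0"

# Config file passed to the qtimlesnpe plugin.
SNPE_CONFIG="/data/misc/camera/mle_snpe.config"

# Placement and sync options for the Wayland sink.
SINK_OPTS="x=960 y=0 width=960 height=540 async=true sync=false enable-last-sample=false"

# Assemble the pipeline: camera -> SNPE inference -> queue -> overlay -> display.
build_pipeline() {
    printf '%s' "qtiqmmfsrc ! $CAM_CAPS ! qtimlesnpe config=$SNPE_CONFIG postprocessing=detection ! queue ! qtioverlay ! waylandsink $SINK_OPTS"
}

# Print the full command for inspection instead of running it.
echo "gst-launch-1.0 $(build_pipeline)"
```

To run it on the target, export XDG_RUNTIME_DIR as in the command above and invoke gst-launch-1.0 $(build_pipeline) with the substitution left unquoted, so that the shell splits the pipeline into the same arguments as the original command.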