AI Resources Overview
Introduction
Below are a few terms and concepts that you may come across while using Artificial Intelligence tools and resources.
Deep Learning Container (DLC) File
The latest Snapdragon® mobile platforms support various ML frameworks such as TensorFlow, TFLite, PyTorch, ONNX, Caffe, and Caffe2. To run models from these frameworks on the hardware, model files such as .pb, .onnx, and .pth must be converted to .dlc files using the conversion tools shipped with the SDKs. The conversion process also applies many layer-level optimizations so that the models run efficiently on our platforms.
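As a rough illustration, conversion is typically a single CLI invocation. Below is a minimal sketch, assuming the `snpe-onnx-to-dlc` converter shipped with the Qualcomm Neural Processing SDK; tool and flag names may vary across SDK versions, and the model paths are hypothetical.

```python
# Sketch: converting an ONNX model to a DLC file with the SDK's converter CLI.
# Flag names follow Qualcomm Neural Processing SDK conventions but should be
# checked against your SDK version's documentation.
import subprocess

subprocess.run(
    [
        "snpe-onnx-to-dlc",              # converter shipped with the SDK
        "--input_network", "model.onnx", # hypothetical source model
        "--output_path", "model.dlc",    # resulting Deep Learning Container
    ],
    check=True,  # raise if the conversion fails
)
```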
Quantization
Quantization techniques are applied to a trained model to reduce its size and improve its performance. DLC files are converted from FP32 precision to lower precisions such as INT4, INT8, and FP16, which shrinks the model and makes it faster to execute. Static quantization of weights, biases, and activations is performed, with support for asymmetric dynamic ranges and arbitrary step sizes. Quantization is required for running the model on the AI Accelerator. The tools for this are shipped with the SDKs.
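To make the mechanics concrete, here is a minimal NumPy sketch of asymmetric quantization: a step size (scale) and a zero-point are derived from the tensor's observed dynamic range. This illustrates the concept only; the SDK tools perform quantization of actual DLC files for you.

```python
# Sketch of asymmetric INT8 quantization: derive a scale (step size) and
# zero-point from the tensor's min/max, then quantize and dequantize.
import numpy as np

def quantize_asymmetric(x: np.ndarray, num_bits: int = 8):
    qmin, qmax = 0, 2**num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)        # arbitrary step size
    zero_point = int(round(qmin - x.min() / scale))    # asymmetric offset
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_asymmetric(weights)
# Reconstruction error is bounded by roughly half the step size.
print("max abs error:", np.abs(weights - dequantize(q, scale, zp)).max())
```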
User Defined Operations (UDO)
Developers can also define their own ops and compile them to run on the CPU, GPU, or AI Accelerator (HTP). For the CPU, an op can be written in a language like C/C++; for the GPU, OpenCL can be used; for the HTP, the assembly instructions of the Hexagon DSP are used. Executing a UDO can trigger a context switch between the accelerator core and the CPU/GPU, so developers should choose the UDO's compilation target according to their performance requirements.
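As an illustration of the workflow, developers typically start from a host-side reference implementation of the custom op and validate each device kernel against it. Below is a minimal NumPy sketch using a hypothetical "hard swish" op; the actual UDO registration goes through the SDK's UDO configuration and code-generation tooling, which is not shown here.

```python
# Sketch only: a NumPy reference of a hypothetical custom op. In the real UDO
# workflow this logic would be implemented in C/C++ (CPU), OpenCL (GPU), or
# Hexagon assembly (HTP) and registered through the SDK's UDO tooling.
import numpy as np

def hard_swish(x: np.ndarray) -> np.ndarray:
    # x * relu6(x + 3) / 6
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

x = np.linspace(-6, 6, 7, dtype=np.float32)
print(hard_swish(x))  # reference outputs to validate the device kernel against
```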
Operations (Ops)
Ops are nodes in the graph of an ML model. Examples of ops include ArgMax and Conv2D.
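For example, loading an ONNX model with the `onnx` Python package and printing each node's op type makes this concrete (the model path here is hypothetical):

```python
# Sketch: ops really are graph nodes. Each node in an ONNX graph carries an
# op_type identifying the operation it performs.
import onnx

model = onnx.load("model.onnx")     # hypothetical model path
for node in model.graph.node:
    print(node.op_type, node.name)  # e.g. Conv, Relu, ArgMax
```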
AI Hardware Cores/Accelerator overview
The Qualcomm Innovators Development Kit supports running AI/ML models on the following three hardware cores/accelerators:
- Qualcomm® Hexagon™ Tensor Processor (HTP)
The HTP is an AI accelerator suited for running computationally intensive AI workloads. To get improved performance and run an AI/ML model on the HTP, the model must be quantized to one of the supported precisions: INT4, INT8, INT16, or FP16.
- Qualcomm® Adreno™ GPU
The GPU can run unquantized FP32/FP16 models with higher throughput than the CPU. It can also run UDOs implemented in OpenCL.
- Qualcomm® Kryo™ CPU
The CPU supports unquantized models with FP32 precision. It can run UDOs and ops that are not optimized for execution on the HTP, and it can also be used for model benchmarking.
The table below summarizes the properties of the hardware accelerators available on Snapdragon for executing AI/ML models:
Accelerator | Supported Data Types | Quantization | Power | Throughput
---|---|---|---|---
HTP | INT4, INT8, INT16, FP16 | Needed | Low | High
GPU | INT8, INT16, FP16, FP32 | Not Needed | Medium | Medium
CPU | INT8, INT16, FP16, FP32 | Not Needed | High | Low

Table-1: Guide to selecting a HW accelerator
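In practice, the accelerator is chosen at execution time. Below is a minimal sketch assuming the SDK's `snpe-net-run` reference tool; the runtime flag names follow past SDK releases and may differ in yours.

```python
# Sketch: selecting the target accelerator when executing a DLC.
# Verify flag names against your SDK version's documentation.
import subprocess

runtime_flag = "--use_dsp"  # HTP; "--use_gpu" selects Adreno, default is CPU
subprocess.run(
    [
        "snpe-net-run",
        "--container", "model_quantized.dlc",  # hypothetical DLC path
        "--input_list", "inputs.txt",          # list of raw input tensors
        runtime_flag,
    ],
    check=True,
)
```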
AI Software Accelerator Framework
Qualcomm Technologies, Inc. provides the following SDKs and tools to support hardware acceleration:
- Qualcomm Neural Processing SDK
- AI Model Efficiency Toolkit (AIMET)
Qualcomm Neural Processing SDK:
The Qualcomm Neural Processing SDK is designed to let developers quickly integrate AI/ML models into their Android apps. It abstracts the hardware complexities, enabling fast, portable AI application development.
The Qualcomm Neural Processing SDK can be used to:
- Convert Caffe, Caffe2, TensorFlow, PyTorch and TFLite models to a Deep Learning Container (DLC) file
- Quantize DLC files to 8-bit/16-bit fixed point for execution on the Hexagon Tensor Processor (see the sketch after this list)
- Integrate a network into Android apps via C++ or Java
- Execute the network on the Kryo CPU, the Adreno GPU, or the HTP
- Debug and analyze the performance of the ML model
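For example, the quantization step above is a single CLI call. Here is a minimal sketch, assuming the `snpe-dlc-quantize` tool shipped with the SDK; flag names may vary across SDK versions, and the file paths are hypothetical.

```python
# Sketch: quantizing a float DLC to fixed point using a calibration input
# list. Check flag names against your SDK version's documentation.
import subprocess

subprocess.run(
    [
        "snpe-dlc-quantize",
        "--input_dlc", "model.dlc",                # float DLC from the converter
        "--input_list", "calibration_inputs.txt",  # representative raw inputs
        "--output_dlc", "model_quantized.dlc",
    ],
    check=True,
)
```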
The figure below shows a typical workflow for AI development: model training is performed in any popular deep learning framework supported by the SDK. Once training is complete, the trained model is converted into a DLC file that can be loaded by the SDK runtime running on the target device.

AIMET - AI Model Efficiency Toolkit
AIMET is a library that provides advanced model quantization and compression techniques for trained neural network models.
Its features have been proven to improve the run-time performance of deep learning neural network models, with lower compute and memory requirements and minimal impact on task accuracy.
AIMET is designed to work with both PyTorch and TensorFlow models.
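As a brief illustration, a post-training quantization-simulation pass with AIMET's PyTorch API looks roughly like the sketch below. The calls follow AIMET's documented QuantizationSimModel workflow, but argument names may differ across AIMET releases; treat this as an outline, not a drop-in script.

```python
# Sketch: simulating quantization of a PyTorch model with aimet_torch,
# then exporting the model plus encodings for downstream SDK tools.
import os
import torch
from torchvision.models import resnet18
from aimet_torch.quantsim import QuantizationSimModel

model = resnet18().eval()  # randomly initialized for brevity; use trained weights
dummy_input = torch.randn(1, 3, 224, 224)

# Wrap the model with simulated quantization ops.
sim = QuantizationSimModel(model, dummy_input=dummy_input)

def forward_pass(model, _):
    # Run representative data through the model so AIMET can compute
    # quantization encodings (a single random batch here for brevity).
    with torch.no_grad():
        model(dummy_input)

sim.compute_encodings(forward_pass_callback=forward_pass,
                      forward_pass_callback_args=None)

os.makedirs("./output", exist_ok=True)
sim.export(path="./output", filename_prefix="resnet18_quant",
           dummy_input=dummy_input)
```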
Qualcomm Innovation Center also maintains an open-source repository called the AIMET Model Zoo at https://github.com/quic/aimet-model-zoo
The AIMET Model Zoo is a collection of popular neural network models optimized for 8-bit inference. It also provides recipes for quantizing floating-point models using AIMET.
For more details on AIMET, please visit: https://github.com/quic/aimet
For more details on the AIMET Model Zoo, please visit: https://github.com/quic/aimet-model-zoo
Snapdragon, Qualcomm Kryo, Qualcomm Adreno, Qualcomm Hexagon, Qualcomm Neural Processing SDK, and Qualcomm Innovators Development Kit are products of Qualcomm Technologies, Inc. and/or its subsidiaries. AIMET and AIMET Model Zoo are products of Qualcomm Innovation Center, Inc.