AI hardware cores/accelerators

The Qualcomm® Innovators Development Kit supports running AI/ML models on the following three hardware cores/accelerators:

  • Qualcomm® Hexagon™ Tensor Processor (HTP): The HTP is an AI accelerator suited to running computationally intensive AI workloads. To run an AI/ML model on the HTP and get improved performance, the model must be quantized to one of the supported precisions: INT4, INT8, INT16, or FP16.
  • Qualcomm® Adreno™ GPU: The GPU can run unquantized FP32/FP16 models with higher throughput than the CPU. It can also run user-defined operations (UDOs) implemented in OpenCL.
  • Qualcomm® Kryo™ CPU: The CPU supports unquantized models with FP32 precision. It can run UDOs or ops that are not optimized for execution on the HTP, and it can also be used for model benchmarking.

The following table summarizes the properties of the hardware accelerators available on Snapdragon for executing AI/ML models:
Accelerator | Supported data types    | Quantization | Power consumption | Throughput
HTP         | INT4, INT8, INT16, FP16 | Needed       | Low               | High
GPU         | INT8, INT16, FP16, FP32 | Not needed   | Medium            | Medium
CPU         | INT8, INT16, FP16, FP32 | Not needed   | High              | Low

Features:

HTP
  • Dedicated to AI applications.
  • Hardware-accelerated convolution engine.
GPU
  • Suitable for use cases that require high accuracy but have low usage.
  • Can run unquantized models with FP32/FP16 precision.
  • UDOs written in OpenCL can be compiled for the GPU.
CPU
  • Reference for accuracy verification and debugging.
  • Used for the quantization process.
  • Can run ops that are not supported on the HTP, as well as UDOs implemented in languages such as C/C++ or Java.
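As an illustration only, the accelerator table above can be encoded as data and used to shortlist accelerators for a model's precision. The names `Accelerator`, `ACCELERATORS`, and `candidate_accelerators` are hypothetical and are not part of any Qualcomm SDK; the property values are transcribed from the table, and the ranking rule (highest throughput first) is an assumption made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """One row of the accelerator table (hypothetical helper type)."""
    name: str
    data_types: frozenset
    needs_quantization: bool
    power: str
    throughput: str


# Values transcribed from the table above.
ACCELERATORS = [
    Accelerator("HTP", frozenset({"INT4", "INT8", "INT16", "FP16"}), True, "Low", "High"),
    Accelerator("GPU", frozenset({"INT8", "INT16", "FP16", "FP32"}), False, "Medium", "Medium"),
    Accelerator("CPU", frozenset({"INT8", "INT16", "FP16", "FP32"}), False, "High", "Low"),
]

# Ordinal ranking of the qualitative "Low/Medium/High" labels.
_RANK = {"Low": 0, "Medium": 1, "High": 2}


def candidate_accelerators(data_type: str) -> list[str]:
    """Return accelerators that support data_type, highest throughput first."""
    matches = [a for a in ACCELERATORS if data_type.upper() in a.data_types]
    matches.sort(key=lambda a: _RANK[a.throughput], reverse=True)
    return [a.name for a in matches]


print(candidate_accelerators("INT8"))  # ['HTP', 'GPU', 'CPU']
print(candidate_accelerators("FP32"))  # ['GPU', 'CPU']
```

Ranking purely by throughput is a simplification: in practice accuracy requirements, power budget, and op coverage also matter, which is why ops unsupported on the HTP fall back to the CPU.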

Data type | Details
INT4      | 4-bit weights + 8-bit activations
INT8      | 8-bit weights + 8-bit activations
INT16     | 8-bit weights + 16-bit activations
FP16      | 16-bit floating-point precision
FP32      | 32-bit floating-point precision
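Because the HTP requires a quantized model, it can help to see how floating-point values map to 8-bit integers. The sketch below assumes per-tensor affine (scale and zero-point) quantization, a common scheme, and is not necessarily the exact algorithm used by Qualcomm's quantization tools. It also estimates weight storage from the weight bit widths in the data type table (activation bits are not counted). All function names and the 7-million-parameter model size are hypothetical.

```python
# Weight bit widths from the data type table (INT16 uses 8-bit weights).
WEIGHT_BITS = {"INT4": 4, "INT8": 8, "INT16": 8, "FP16": 16, "FP32": 32}


def quantize_int8(values: list[float]) -> tuple[list[int], float, int]:
    """Per-tensor affine quantization of floats to signed 8-bit integers."""
    qmin, qmax = -128, 127
    # Extend the range to include 0.0 so zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Step size between adjacent integer levels; guard against an all-zero tensor.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    # Round to the nearest level and clamp into the signed 8-bit range.
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q: list[int], scale: float, zero_point: int) -> list[float]:
    """Map quantized integers back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


def weight_megabytes(num_params: int, data_type: str) -> float:
    """Approximate weight storage in MB (10**6 bytes) for a given data type."""
    return num_params * WEIGHT_BITS[data_type] / 8 / 1e6


weights = [-0.5, 0.0, 0.25, 1.0]
q, scale, zp = quantize_int8(weights)
print(q, scale, zp)
print(dequantize(q, scale, zp))
print(weight_megabytes(7_000_000, "INT8"))  # 7.0
print(weight_megabytes(7_000_000, "FP32"))  # 28.0
```

Each dequantized value differs from the original by at most about one quantization step (`scale`), which is the accuracy cost traded for the smaller, integer-only representation the HTP runs efficiently.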

Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.