AI hardware cores/accelerators

Qualcomm® Innovators Development Kit supports running AI/ML models on the following three hardware cores/accelerators:

  • Qualcomm® Hexagon™ Tensor Processor (HTP) - The Qualcomm HTP is an AI accelerator suited for running computationally intensive AI workloads. To get improved performance when running an AI/ML model on the HTP, the model must be quantized to one of the supported precisions: INT4, INT8, INT16, or FP16.
  • Qualcomm® Adreno™ GPU - The GPU can run unquantized FP32/FP16 models with higher throughput than the CPU. It can also run UDOs (user-defined operations) implemented in OpenCL.
  • Qualcomm® Kryo™ CPU - The CPU supports unquantized models with FP32 precision. It can run UDOs or ops that are not optimized for execution on the HTP, and it can also be used for model benchmarking. The table below summarizes the properties of the hardware accelerators available on Snapdragon for executing AI/ML models:
| Accelerator | Supported data types | Quantization | Power | Throughput | Features |
|---|---|---|---|---|---|
| HTP | INT4, INT8, INT16, FP16 | Required | Low | High | Dedicated for AI applications; hardware-accelerated convolution engine |
| GPU | INT8, INT16, FP16, FP32 | Not required | Medium | Medium | Suitable for use cases that require high accuracy; can run unquantized models with FP32/FP16 precision; UDOs written in OpenCL can be compiled for the GPU |
| CPU | INT8, INT16, FP16, FP32 | Not required | High | Low | Reference for accuracy verification and debugging; used during the quantization process; can run ops not supported on HTP and UDOs implemented in languages such as C/C++ and Java |
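The precision-to-accelerator mapping in the table above can be captured in a short runtime-selection sketch. This is illustrative Python only, not part of any Qualcomm SDK; the helper name and fallback order are assumptions drawn directly from the table:

```python
# Hypothetical runtime-selection helper; precision sets mirror the
# "Supported data types" column of the table above.
HTP_PRECISIONS = {"INT4", "INT8", "INT16", "FP16"}
GPU_PRECISIONS = {"INT8", "INT16", "FP16", "FP32"}

def pick_runtime(precision: str, has_htp: bool = True, has_gpu: bool = True) -> str:
    """Prefer HTP (low power, high throughput), then GPU, then CPU."""
    if has_htp and precision in HTP_PRECISIONS:
        return "HTP"
    if has_gpu and precision in GPU_PRECISIONS:
        return "GPU"
    # CPU supports unquantized FP32 and acts as the universal fallback.
    return "CPU"

print(pick_runtime("INT8"))                  # quantized model -> HTP
print(pick_runtime("FP32"))                  # unquantized model -> GPU
print(pick_runtime("FP32", has_gpu=False))   # no GPU available -> CPU
```

The fallback order (HTP, then GPU, then CPU) mirrors the power/throughput trade-offs listed in the table; real applications would select the runtime through the SDK's own runtime options rather than a helper like this.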



Datatype Details

  • INT4: 4-bit weights + 8-bit activations
  • INT8: 8-bit weights + 8-bit activations
  • INT16: 8-bit weights + 16-bit activations
  • FP16: 16-bit floating-point precision
  • FP32: 32-bit floating-point precision
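To make the weight bit widths above concrete, here is a minimal NumPy sketch of symmetric per-tensor quantization, a scheme commonly used for weights. It is illustrative only, not Qualcomm tooling, and the function names are hypothetical:

```python
import numpy as np

def quantize_symmetric(x: np.ndarray, n_bits: int = 8):
    """Map floats onto a symmetric signed integer grid (per-tensor scale)."""
    qmax = 2 ** (n_bits - 1) - 1            # 127 for INT8, 7 for INT4
    scale = np.max(np.abs(x)) / qmax        # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

weights = np.array([-1.2, 0.05, 0.73, 1.5], dtype=np.float32)
q8, s8 = quantize_symmetric(weights, n_bits=8)   # INT8 weights
q4, s4 = quantize_symmetric(weights, n_bits=4)   # INT4 weights

# Fewer bits -> coarser grid -> larger reconstruction error.
err8 = float(np.max(np.abs(dequantize(q8, s8) - weights)))
err4 = float(np.max(np.abs(dequantize(q4, s4) - weights)))
print(f"INT8 max error: {err8:.4f}, INT4 max error: {err4:.4f}")
```

Activations are quantized with a separate scale (derived during calibration), which is why the formats above pair, for example, 4-bit weights with 8-bit activations.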


Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.