Snapdragon Neural Processing Engine SDK
Reference Guide
AIP Runtime

Overview

The AIP (AI Processor) Runtime is a software abstraction of Q6, HVX and HTA into a single entity (AIP) for the execution of a model across all three.
A user who loads a model into Snapdragon NPE and selects the AIP runtime as the target will have parts of the model running on the HTA and parts on HVX, orchestrated by the Q6.
Note: In order to execute parts of the model on HTA, the model needs to be analyzed offline, and binaries for the relevant parts need to be embedded into the DLC. See Adding HTA sections for details.

Snapdragon NPE loads a library onto the DSP that communicates with the AIP runtime.
This DSP library contains an executor (which manages the execution of models across HTA and HVX), the HTA driver for running subnets on the HTA, and Hexagon NN for running subnets using HVX.

The executor uses a model description which also contains partitioning information - a description of which parts of the model will run on HTA and which on HVX. These partitions are referred to below as "subnets".

The DSP executor runs the subnets on their respective cores and coordinates buffer exchanges and format conversions (including dequantization, if needed) as necessary to return the proper outputs to the Snapdragon NPE runtime running on the ARM CPU.
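To make the dequantization step concrete, here is a minimal sketch, assuming SNPE's TF-style 8-bit encoding in which a tensor's float range [min, max] is divided into 255 steps. The function name and list-based representation are illustrative, not part of the SNPE API.

```python
# Illustrative sketch (not SNPE internals): mapping 8-bit quantized
# values back to floats given the tensor's encoding range.
def dequantize(quantized, enc_min, enc_max):
    """Dequantize uint8 values using a [min, max] encoding with 255 steps."""
    step = (enc_max - enc_min) / 255.0
    return [enc_min + q * step for q in quantized]

# An encoding range of [0.0, 25.5] gives a step size of 0.1, so the
# quantized values 0, 128, 255 map to roughly 0.0, 12.8, 25.5.
print(dequantize([0, 128, 255], 0.0, 25.5))
```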


Model execution on AIP Runtime

Consider the following illustrative model, embedded inside a DL Container created by one of the Snapdragon NPE snpe-framework-to-dlc conversion tools.

  • The circles represent operations in the model
  • The rectangles represent layers, which contain and implement these operations

[Figure: example model, with operations grouped into layers]


The top-level Snapdragon NPE runtime breaks the execution of a model into subnets that will run on different cores, based on layer affinity.
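The partitioning step can be sketched as grouping consecutive layers that share the same target runtime. This is a simplified illustration of the idea, not SNPE source code; the layer names and the is_aip_capable predicate are hypothetical.

```python
# Hypothetical sketch of breaking a model into contiguous subnets by
# runtime affinity. Each subnet is a (target, layers) pair.
def partition(layers, is_aip_capable):
    """Group consecutive layers that share the same target runtime."""
    subnets = []
    for layer in layers:
        target = "AIP" if is_aip_capable(layer) else "CPU"
        if subnets and subnets[-1][0] == target:
            subnets[-1][1].append(layer)  # extend the current subnet
        else:
            subnets.append((target, [layer]))  # start a new subnet
    return subnets

# A layer unsupported on AIP splits the model into three subnets:
print(partition(["conv1", "relu1", "custom_op", "conv2"],
                lambda l: l != "custom_op"))
# → [('AIP', ['conv1', 'relu1']), ('CPU', ['custom_op']), ('AIP', ['conv2'])]
```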

In one case, the entire network may execute on the AIP runtime, as shown below:

[Figure: entire network executed on the AIP runtime]


Alternatively, the Snapdragon NPE runtime may create multiple partitions: some are executed on the AIP runtime while the rest fall back to the CPU runtime, as shown below:

[Figure: model partitioned between the AIP and CPU runtimes]


The Snapdragon NPE runtime automatically adds the CPU runtime to execute the sections identified for CPU fallback.

For simplicity, let us examine AIP runtime execution more closely using the case above in which the entire model is executed on the AIP runtime.

The AIP Runtime further decomposes the AIP subnet into the following:

  • HTA subnets: the parts of the subnet that were compiled by the HTA compiler, and whose metadata (generated by the HTA compiler) appears in the HTA sections of the DLC.
  • HNN subnets: the rest of the subnet, which can run on the DSP using the Hexagon NN library, and whose metadata appears in the HVX sections of the DLC.

Several possible combinations can arise from partitioning within the AIP runtime. Here are some representative cases:

A. The AIP subnet can run completely on HTA

[Figure: AIP subnet running entirely on HTA]


In this case, the entire AIP subnet is compatible with HTA.
When the DLC is loaded into Snapdragon NPE with the AIP runtime selected, the runtime identifies an HTA section with a single HTA subnet that spans the entire AIP subnet.

B. Part of the AIP subnet can run on HTA, and the rest can run on HNN

[Figure: AIP subnet split between HTA and HNN]


There may be cases when the entire AIP subnet cannot be processed on HTA. In such cases the HTA compiler generates HTA sections for only a smaller subset of the layers in the network.
Alternatively, users may manually partition the network to pick the subnet they want processed on the HTA by providing additional options to the snpe-dlc-quantize tool (see Adding HTA sections for details on partitioning a network for HTA).
In both cases, the smaller HTA subnet is processed by the HTA compiler and the corresponding HTA section for that range is embedded in the DLC.
When the DLC is loaded into Snapdragon NPE with the AIP runtime selected, the runtime identifies an HTA section with a single HTA subnet covering only part of the AIP subnet, and determines that the rest can run using Hexagon NN.

C. The AIP subnet is broken into multiple partitions

[Figure: AIP subnet broken into multiple HTA and HNN partitions]


Extending the previous case, the offline HTA compiler may be able to process only certain portions of the identified AIP subnet, leaving the remaining sections to be covered by multiple HNN subnets as shown above. Alternatively, users may manually partition a network into several HTA subnets by providing additional options to the snpe-dlc-quantize tool (see Adding HTA sections for details on partitioning a network for HTA).
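Cases A through C can all be described by one rule: the HTA sections in the DLC cover certain layer ranges of the AIP subnet, and the gaps between them become HNN subnets. The sketch below illustrates that rule under stated assumptions (layers identified by index, HTA sections given as sorted, non-overlapping inclusive ranges); it is not SNPE's actual data model.

```python
# Illustrative sketch (not SNPE internals): split an AIP subnet into
# HTA subnets and the HNN subnets that fill the gaps between them.
def decompose_aip_subnet(num_layers, hta_ranges):
    """hta_ranges: sorted, non-overlapping (start, end) inclusive ranges."""
    subnets = []
    cursor = 0
    for start, end in hta_ranges:
        if cursor < start:  # gap before this HTA section → HNN subnet
            subnets.append(("HNN", cursor, start - 1))
        subnets.append(("HTA", start, end))
        cursor = end + 1
    if cursor < num_layers:  # trailing gap → HNN subnet
        subnets.append(("HNN", cursor, num_layers - 1))
    return subnets

# Case A: one HTA section spans the whole AIP subnet.
print(decompose_aip_subnet(6, [(0, 5)]))
# → [('HTA', 0, 5)]

# Case C: two HTA sections, with HNN subnets filling the gaps.
print(decompose_aip_subnet(8, [(2, 3), (6, 7)]))
# → [('HNN', 0, 1), ('HTA', 2, 3), ('HNN', 4, 5), ('HTA', 6, 7)]
```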