Snapdragon Neural Processing Engine SDK
Reference Guide
Revision History

Version Date Description
2.10.0 April 2023 GPU Runtime: Added support for the Pack operation with one input.
Core: Updated C API documentation for ITensor/Userbuffer creation indicating data size.
Core: setLogLevel() API hooked up to the runtimes so the logging level can be updated after creating the logger handle (see the sketch after this version's entries).
Tools: snpe-throughput-net-run now supports the --userbuffer_auto option (similar to snpe-net-run) for automatic IO tensor data type detection.
Tools: Converters: Added a new optimization sequence to squash BatchNorm into FullyConnected.
HTP: Fixed issue with ElementwiseSin.
Tools: Fixed a converter issue for the GRU op.
SNPE AIP: Fixed perf profile setting for multithread scenario.
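A minimal C++ sketch of the logging flow the setLogLevel() entry above refers to, using the zdl::SNPE::SNPEFactory logging entry points; the header paths and LogLevel_t enum spellings here are assumptions, not taken from this history.

    #include "SNPE/SNPEFactory.hpp"   // assumed header path
    #include "DlSystem/DlEnums.hpp"   // assumed header for LogLevel_t

    int main() {
        // Create the logger handle first; the level passed here is the initial one.
        zdl::SNPE::SNPEFactory::initializeLogging(zdl::DlSystem::LogLevel_t::LOGLEVEL_WARN);

        // ... build and execute networks ...

        // As of 2.10.0, setLogLevel() propagates to the runtimes, so the
        // level can be changed after the logger handle has been created.
        zdl::SNPE::SNPEFactory::setLogLevel(zdl::DlSystem::LogLevel_t::LOGLEVEL_VERBOSE);

        zdl::SNPE::SNPEFactory::terminateLogging();
        return 0;
    }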
2.9.0 March 2023 Core: Added new C API Snpe_SNPE_GetInputDimensionsOfFirstTensor() to facilitate retrieving input dimensions without the input tensor name (see the sketch after this version's entries).
Tools: ONNX converter: Added support for NonMaxSuppression op.
Tools: snpe-dlc-graph-prepare: Fixed a benign error message during offline prepare for v68-based SoCs (--htp_socs sm8350, sm7350, etc.).
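A hedged sketch of the new call (callable from C or C++); only the function name Snpe_SNPE_GetInputDimensionsOfFirstTensor() comes from the entry above, while the header path, return type, and Snpe_TensorShape_* accessors are assumptions about the SNPE C API.

    #include <cstddef>
    #include <cstdio>
    #include "SNPE/SNPE.h"   // assumed header for the SNPE C API

    // 'snpe' is an already-built Snpe_SNPE_Handle_t.
    void printFirstInputDims(Snpe_SNPE_Handle_t snpe) {
        // New in 2.9.0: no input tensor name is needed.
        Snpe_TensorShape_Handle_t dims =
            Snpe_SNPE_GetInputDimensionsOfFirstTensor(snpe);

        for (size_t i = 0; i < Snpe_TensorShape_Rank(dims); ++i) {
            std::printf("dim[%zu] = %zu\n", i, Snpe_TensorShape_At(dims, i));
        }
        Snpe_TensorShape_Delete(dims);   // caller releases the handle (assumed)
    }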
2.8.0 February 2023 Tools: Converters: Onnx: Added support for Sign.
HTP: Fixed a VTCM overflow that occurred when changing data layout from uint8 flat to uint8 crouton in TCM.
Tools: ONNX Converter: Fixed a TransposeOp input axis format (NT) issue.
2.7.0 January 2023 Tools: Converters: Fixed a bug in the optimization that merges Matmul + Reshape + Add to FC Op that would incorrectly insert the FC Op before the Constant Bias Op.
2.6.0 December 2022 Tools: ONNX converter: Added support for Conv input data supplied as an Initializer.
DSP: Improve execute time of dynamic depthwise convolution with uint8 weights.
Core: Added error handling based on buffer data size in execute().
2.5.0 November 2022 Tools: Added new options for snpe-net-run and snpe-parallel-run, --use_native_input_files and --use_native_output_files, to support inputs in their native format as opposed to the default float32 format.
Tools: Added new flag --userbuffer_auto in snpe-parallel-run to automatically detect and use the right buffer type based on tensor data type in the model.
Documentation: Added the SNPE1-to-SNPE2 migration guide.
Tools: snpe-throughput-net-run: Captured the status of lost threads in the result summary.
Tools: snpe-dlc-quant: Fixed abnormal DLC size increase when axis quantization is used.
Tools: TensorFlow Converter: Fixed issues with per-channel quantization of weights: set is_symmetric = true by default, and added the "axis" and "is_symmetric" params to weight encodings info.
HTP: Fixed a VTCM overflow for TransposeConv2d layers with groups > 1, input depth = output depth, padding = 0, and groups != input depth.
2.4.1 October 2022 Tools: Added new tools: snpe-architecture-checker and snpe-quantization-checker.
snpe-net-run: Added new flag --userbuffer_auto to automatically detect and use the right buffer type based on tensor data type in the model.
SNPE Core: Enabled logging in Op validation.
SDK: Added missing documentation files for snpe-quantization-checker.
GPU Runtime: Improved network initialization time in subsequent runs on GPU when using setInitCacheMode (see the sketch after this version's entries).
Tools: ONNX Converter: Fixed an issue related to a missing Cast operation.
Tools: Missing files for snpe-quantization-checker have been added to the SDK.
Tools: Fixed a functional failure in snpe-architecture-checker.
Tools: Quantizer: Improved error handling to remove 'uncaught exception' errors.
Tools: Fixed a bug in snpe-dlc-quantize with the options --axis_quant and --enable_htp when multiple SoCs are passed using --htp_socs.
GPU Runtime: Fixed validation errors for Concat op with large dimensions.
GPU Runtime: Improved accuracy in models having Concat op with large dimensions.
DSP Runtime: Fixed a bug in running HTP FP16 networks on SoCs without FP16 support (e.g., sm8350, sm7350).
GPU Runtime: Fixed verifier issue in Softmax2UdoPackage.
GPU Runtime: Improved network initialization time in subsequent runs on GPU when using the snpe-net-run --storage_dir option.
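A brief C++ sketch of enabling the init cache mentioned in the setInitCacheMode entry above; setInitCacheMode() is named in this history, while the open() overload, the setRuntimeProcessor() call, and the header paths are assumptions about the C++ API.

    #include <memory>
    #include <string>
    #include "DlContainer/IDlContainer.hpp"   // assumed header paths
    #include "SNPE/SNPE.hpp"
    #include "SNPE/SNPEBuilder.hpp"

    // Build a network with the init cache enabled so that subsequent loads
    // of the same DLC can reuse the cached preparation.
    std::unique_ptr<zdl::SNPE::SNPE> buildWithInitCache(const std::string& dlcPath) {
        auto container = zdl::DlContainer::IDlContainer::open(dlcPath);
        if (!container) return nullptr;

        return zdl::SNPE::SNPEBuilder(container.get())
            .setRuntimeProcessor(zdl::DlSystem::Runtime_t::GPU)   // GPU, per the entry above
            .setInitCacheMode(true)                               // named in this history
            .build();
    }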
2.3.1 September 2022 Tools: Converters: Onnx: Added 5D tensor support for PoolMax3d.
Tools: GoogleNAS: Added support for utilizing the GoogleNAS service with SNPE hardware in the loop (HIL).
Tools: Quantizer: Added a fix to use the default activation bitwidth for static tensors instead of the default parameter bitwidth, except for static tensors that are known to be parameters, such as convolution weights and biases.
SNPE Core: Fixed online dequantization of int4 axis-quantized DLCs when run on CPU/GPU.
SNPE Core: Fixed stability issues in concurrency use cases.
GPU Runtime: Fixed accuracy issues related to tensor memory optimization.
Tools: Quantizer: Fixed issue observed with int4 weight override support.
2.2.1 August 2022 Core: Added user logs (--userlogs=warn) for Op validation failures in both offline and online prepare, making it easier to track fallback.
Core: HTP offline cache blob backward compatibility: the SNPE version check is relaxed from SNPE 2.2.1 onwards.
Tool: Converters: Added a DepthToSpace DCR/CRD pattern that matches reshape, transpose, reshape nodes.
Core: Fixed snpe-dlc-info to display per-axis encoding information for axis-quantized DLCs.
Tools: Quantizer: Added support for CLE quantization algorithm.
Core: snpe-dlc-graph-prepare bug fixes:

  • Bound --vtcm_override to the maximum VTCM size for each requested SoC chipset instead of a hardcoded 8MB.
  • Limited the DLC to one cache record per SoC.

Core: Fixed runtime de-quantization of weights and biases for axis-quantized DLCs when executing on floating-point backends (CPU/GPU).
Tool: Onnx Converter: Added axis tracking edge case fixes for Concat and MatMul operations.
Core: Added protection against loading malicious DLC files.
Converter: Changed the output dimensions to follow the node's output axis format order.
Core: SNPE::Execute() API updated to validate input/output buffer map sizes before proceeding (see the sketch after this version's entries).
Core: snpe-dlc-quantize: Fixed an error in handling % in the input list.
Tools: snpe-dlc-quantize: Miscellaneous bug fixes with the --output_dlc option.
Tools: Converter: Resolved bug that caused failure to override weight encodings for Conv Ops.
Tools: Quantizer: Fixed issues related to axis quantization when the model contains TransposeConv2D.
Tools: Converter: Fixed bug in elementwise min and max sequence optimization.
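A short C++ sketch of the user-buffer execute path that the validation above guards, assuming the zdl:: C++ API; buffer creation is omitted, and the tensor names shown are hypothetical.

    #include "DlSystem/IUserBuffer.hpp"     // assumed header paths
    #include "DlSystem/UserBufferMap.hpp"
    #include "SNPE/SNPE.hpp"

    // 'snpe' is a built network; 'inBuf'/'outBuf' were created earlier via
    // the user-buffer factory.
    bool run(zdl::SNPE::SNPE& snpe,
             zdl::DlSystem::IUserBuffer& inBuf,
             zdl::DlSystem::IUserBuffer& outBuf) {
        zdl::DlSystem::UserBufferMap inputs, outputs;
        inputs.add("input:0", &inBuf);      // hypothetical tensor name
        outputs.add("output:0", &outBuf);   // hypothetical tensor name

        // As of 2.2.1, execute() validates the buffer map sizes against the
        // network's inputs and outputs before running.
        return snpe.execute(inputs, outputs);
    }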
2.1.1 July 2022 Core: Re-enabled LSTM support for CPU and GPU (HTP will follow).
DSP Runtime: Implemented rules for coexistence and selection of multiple cache records for HTP based on VTCM size, DSP architecture, and SoC.
Tools: Converter: Added optimization to fold scalar min + max to ReluMinMax.
Tools: Quantizer: Re-enabled support for overriding activation quantization (overriding weight quantization will follow).
Tools: Quantizer: Fixed missing skip_quantization command line argument in the new snpe-dlc-quantize shell script.
Tools: Quantizer: Fixed axis quantization failure.
Tools: Quantizer: Fixed issues with quantizing inputs to the gather op.
Tools: Converter & Quantizer: Update converter and quantizer to persist the command used in the DLC that can be displayed in snpe-dlc-info.
Tools: DLC Viewer: Fixed to support the new DLC Format.
C API: Added new Snpe_DlContainer_OpenBuffer() to support loading a model from a buffer.
Docs: Fixed C API documentation related to creating a User Buffer.
Core: Changed the default option for SNPEFactory::isRuntimeAvailable() to UNSIGNEDPD_CHECK from NORMAL_CHECK. Note that this also affects the C API (see the sketch after this version's entries).
Core: Re-enabled NV21 input processing support.
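A short C++ sketch of the availability check affected by the default change above; the two-argument overload and the RuntimeCheckOption_t spellings are assumptions about the C++ API.

    #include "SNPE/SNPEFactory.hpp"   // assumed header path
    #include "DlSystem/DlEnums.hpp"

    bool dspUsable() {
        // Single-argument form: as of 2.1.1 this defaults to UNSIGNEDPD_CHECK
        // instead of NORMAL_CHECK (the C API behaves the same way).
        bool unsignedPdOk = zdl::SNPE::SNPEFactory::isRuntimeAvailable(
            zdl::DlSystem::Runtime_t::DSP);

        // To keep the pre-2.1.1 behavior, request NORMAL_CHECK explicitly.
        bool normalOk = zdl::SNPE::SNPEFactory::isRuntimeAvailable(
            zdl::DlSystem::Runtime_t::DSP,
            zdl::DlSystem::RuntimeCheckOption_t::NORMAL_CHECK);

        return unsignedPdOk || normalOk;
    }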
2.0.1 June 2022 Added support for SM8550.
Added a new C API. This API is in addition to the C++ API. Note that the APIs cannot be mixed; all code should use one or the other (a hedged sketch follows these entries).
Updated the DLC internal format to use ‘ops’ rather than ‘layers’ to more closely align the graph definition with QNN.
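A hedged sketch of loading a DLC from memory with the C API; Snpe_DlContainer_OpenBuffer() is named in the 2.1.1 entries above, but its exact signature, the handle type, and the delete call are assumptions. Per the note above, C and C++ API calls must not be mixed in one program.

    #include <cstddef>
    #include "DlContainer/DlContainer.h"   // assumed header for the C API

    // 'data'/'size' describe a DLC image already resident in memory.
    int loadFromMemory(const unsigned char* data, size_t size) {
        Snpe_DlContainer_Handle_t container =
            Snpe_DlContainer_OpenBuffer(data, size);   // signature assumed
        if (!container) return -1;

        // ... create an SNPE builder from 'container' using the C API only ...

        Snpe_DlContainer_Delete(container);            // release (assumed)
        return 0;
    }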
1.64.0 June 2022 Tools: ONNX Converter: Re-enabled command-line input dtype to take precedence over the dtype specified in the model.
GPU: Improved accuracy for the DeepSORT model; resolved issues with Conv + Elu op fusion.
Tools: Quantizer: Fixed issue observed with applying 8-bit overrides using 16-bit default activation quantization encodings.
SNPE Core: Fixed failure to select HTP offline cache for certain multi-subnet network topologies.
1.63.0 May 2022 SNPE Core: Added support for PReLU bias broadcasting in SNPE.
SNPE Core: snpe-diagview tool updated to display actual units (such as cycles) instead of usec by default.
SNPE Core: OpenGL buffers supported for the GPU backend.
SNPE Core: Fixed the Zip utility's std::istream index into its internal extensible array to be const for every container (DLC) load.
1.62.0 April 2022 DSP Runtime: Perf improvement for FP16 models on HTP.
Added GatherV2 support for SNPE-QNN-DSP.
Tools: Converters: Added support for the 5D tensor annotations NCDHW and NDHWC.
Tools: Converters: TF: Fixed issue with translating explicit padding from Conv Op.
Tools: Converters: Onnx: Fixed ONNX Concat axis handling.
Tools: ONNX Converters: Fixed implementation details for Conv1D and Pool1D Ops.
Tools: Converters: Onnx: Added optimization folding continuous reshapes.
1.61.0 March 2022 Tools: Converters: Onnx: Enabled support to handle custom op inputs correctly when the default values are provided.
Tools: ONNX Converter: Added support to resolve static ONNX Cast operation as Constant.
CPU Runtime: Added CRD mode support for DepthToSpace (PixelShuffle).
Improved performance of loading DLC from a memory buffer.
Fixed scale calculation for the ONNX Resize operator in align_corner mode; also overrides the Resize input axis format per the source axis order.
1.60.0 February 2022 Tools: Converter: Added ONNX Gemm transA and transB support.
Native sample code is updated to take static quantization parameters for quantized input buffers.
libSNPE.so, libcalculator.so, libplatformValidatorShared.so, libnpe_dsp_domains_v2.so - libraries generated with gcc8.2 and gcc9.3 toolchain - are now compiled with additional read-only relocation compiler flags.
HTP: Fixed issue with Cast op usage in certain configurations.
ONNX Converter: Improvements to handle different input axis layouts.
1.59.0 January 2022 DSP Runtime: Added support for edge padding from the SNPE side.
Tools: ONNX Converter: Limited support for Expand operator when it can be interpreted as a noop from operator attributes.
Tools: ONNX Converter: Added support for ScatterND.
Tool: Quantizer: Fixed duplicate Convert layer Id issue observed in generated DLC when multiple Convert layers feed into a single layer.
Tool: ONNX Converter: Fixed handling of models with inputs of unknown shape.
Tools: ONNX Converter: Resolves issue where Shape operator translation could fail if the input was part of the initializer list.
1.58.0 December 2021 Tools: Converter: Enabled broadcasting of weights and bias for BatchNorm layer to match channel dimensions.
1.57.0 November 2021 Tool: Onnx Converter: Added support in dry-run mode for reporting ops that are not in the ONNX schema domain.
Tool: Converter: Updated inaccurate MACs/params calculations for Ops per re-analysis.
CPU Runtime: Set the max detections to keep top K for the Caffe SSD network.
Tools: Converter: Removed the obsolete ssd_permute_param parameter in the Caffe converter permute translation.
SNPE DSP: Fixed axis quantization not adding all of the output's fixedPointParams to bufferDeltas.
Tools: Converter: Fixed a coefficient input broadcasting issue for the ONNX Prelu operation.
Tools: Converter: Fixed an axis tracking bug for permute when the input is in BTF format.
1.56.2 October 2021 DSP Runtime: Caffe SSD models can now run fully on HTP, but show some performance issues.
Tool: Converter: Added a new layernorm sequence for pattern matching, and added a constraint to enforce that a MatMul layer's constant second input is an 8-bit tensor in quantized models.
1.55.0 September 2021 Added support for the OneHot operation across the SNPE converters with runtime support available on SNPE CPU.
Tool: ONNX Converter: Added support for LSTM & CRNN.
DSP Runtime: Added support for LSTM.
Tools: Converters: Added support for Caffe Power scale/shift parameters.
SNPE DSP: Fixed the issue of invalid cache record added to DLC while doing offline prepare for HTP.
Tools: Converters: Fixed Softmax and Reduction Ops to have default case for output_buf axis format.
1.54.0 August 2021 Tool: TF Converter: Added support for detecting eltwise pattern for batchnorm layer with fakequant inputs.
Tools: Converters: Added support for the Caffe Reduction layer Sum and Mean ops.
Tool: Quantizer: Added support to make Convert Operator upscale and downscale quantization parameters loss free.
ONNX Converter: Added support for LSTM & CRNN in converters.
DSP Runtime: Added support for LSTM.
Tool: Converters: Added a batch dimension to anchor input data conversion from the TensorFlow corner style to center style for the DetectionOutput operation optimization.
Tool: ONNX Converter: Added support to pre-apply ONNX batchnorm scale and bias quantization encodings before they are consumed by the converter to compute weights and bias.
Added support for reverse-engineering the SAME padding mode from the explicit pad values.
1.53.2 July 2021 Tool: Quantizer: Added support for fake quant operators in snpe-dlc-quantize.
Tools: TF Converter: Support for logical_and, equal, greater, greater_equal, less, less_equal, not_equal, logical_or, select.
Tool: TensorFlow Converter: Added support for Identity nodes that act as graph output nodes.
Tool: ONNX Converter: Fixed an incorrect default bias shape for the ConvTranspose translation.
1.52.0 June 2021 Tools: Converters: Removed pre-broadcasting of constant tensors, resulting in smaller converter output files.
Tool: Converter: Added Converter support for Nd Reshape layer.
Tool: Converter: Added CastOp support for TF.
Tool: Converter: Added support for static subgraph resolutions at conversion time.
Tool: Converter: Added support for tensor dtype for TF fill op translation.
SNPE DSP: Fixed variance accuracy loss in InstanceNormalization on HTP.
SNPE GPU: Added an optimized kernel for the ReduceMean operation.
Tool: Converter: Fixed bug in TF fullyconnected translation where input was intermittently out-of-order.
SNPE DSP: Fixed the freeing of an uninitialized pointer that led to random crashes.
SNPE DSP: Optimized specific unpack->elementwise sequences for certain models on HTP.
1.51.0 May 2021 Tool: Converter: Added support for the ONNX Where op.
Added support for edge padding type for pad operation in GPU runtime.
SNPE DSP: Enabled support for ElementWiseUnary abs layer on HTP.
GPU Runtime: Added support for asymmetric reflect padding for pad operation.
UDO: Allow users to specify a different datatype for each core in single config file.
UDO: Updated the HTML documentation and sample app to provide an example of loading a UDO package.
DSP Runtime: Fixed the context leak on HTP targets during repeated init/deinit scenarios.
SNPE: Optimized the init stage to complete faster.
SNPE DSP: Optimized maxpool with stride 2x1 on HTP.
SNPE DSP: Optimized large Concat ops to fit into memory.
SNPE DSP: Optimized the init on HTP.
SNPE DSP: Optimized graph prepare on HTP targets to run larger graphs.
SNPE DSP: Fixed the issue with CDSP not going to sleep when the model is de-initialized.
1.50.0 April 2021 Tool: Quantizer: Added SNPE Quantizer support for is_symmetric field used in updated AIMET specification.
DSP Runtime: Improved instance norm op accuracy when input size is big.
DSP Runtime: Enabled edge padding support for v65/v66 targets.
Tool: TensorFlow Converter: Resolved a Xiaomi-reported issue where TF Mul was not being translated correctly.
1.49.0 March 2021 ONNX Converter: Added support for ONNX 1.6 (Opset 11).
TF Converter: Added support for TF2.3 models.
TFLite Converter: Added an initial TFLite converter.
ONNX Converter: Added support for YOLOv2, YOLOv3, tiny-YOLOv3, and YOLOv5.
DSP Runtime: Optimized conversion performance to/from 16-bit quantized values on HTP.
Converters: Improved detection and removal of unconnected nodes.
ONNX Converter: Added support for the DETR model.
AIP Runtime: Optimized the input and output data format conversion times for specific depth configurations for models having 16bit activations.
DSP Runtime: Enabled support for Matmul on HTP.
snpe-throughput-net-run: Fixed input_list processing when using multiple batches.
TF Converter: Fixed inconsistent network topology flow differing between runs for larger models with forking nodes.
DLC Quantizer: Fixed a race condition that might result in integer overflow.
Android Sample App: Fixed to work correctly when multiple models are packaged, with only some requiring UDO.
snpe-diagview: Fixed a crash when using AIP runtime networks with UB_FLOAT and UB_TF8 buffer modes with init caching.
DSP Runtime: Additional model support with offline prepare.
1.48.0 February 2021 SDK: Migrated to use Ubuntu 18.04 as the host platform.
SDK: Updated dependencies.sh and check_python_dependencies.sh for the transition to Ubuntu 18.04 - Python 3.6, and libc++9.
SDK: Removed the system variants for the DSP stub libraries.
Tool: Switched diagnostic logging (SNPEDiag.log) to a new file format.
Added static buffer mapping support for frequently used buffers in DSP runtime.
Improved instance norm op accuracy when input size is big in DSP runtime.
Added support for Unsigned PD with the AIP runtime.
Tools: Converters: Fixed a bug which might prevent applying quantization overrides to a model.
Fixed NMS op code in HTP core.
Optimized the pattern of an input node followed by a Concat node on HTP.
SNPE DSP: Fixed Unpack Layer indexing error on HTP.
DSP Core: Fixed overflow issue in instance norm op when variance is too small.
1.47.0 January 2021 Added support for TF 1.15 for NonMaxSuppressionV3 translation in TF converter.
Added support for Normalize layer translation in Caffe converter.
1.46.0 December 2020 Improved CDSP power voting by using client specific context id.
SNPE DSP: Improved argmax op performance by optimizing l2 cache prefetch and replacing int to float cast op.
Improved the input/output data conversion times on AIP runtime for specific depth configurations.
Enabled support for random inputs for networks having more than one input layer in the SDK benchmarking scripts.
Tool: qnn-tensorflow-converter: Removed the --allow_unconsumed_nodes option from the TF converter, as it is now the default.
Enabled elementwise sub and div on HTP.
1.45.0 November 2020 Beginning with SNPE 1.45.0, users must install libc++1-8 (using apt-get or another package manager) in order to perform offline cache generation for HTP.
Optimized shallow convolution (depth <= 4) in inputsupernode for v66 DSP.
Remapped previous converter translation of Caffe Tile layer as ConcatOp to TileOp in Caffe Converter.
Fixed small accuracy regression on VGG and Flownet models in DSP runtime.
Named the HTA threads created on CDSP appropriately.
Improved logging when libcdsprpc cannot be found.
Fixed the issue with AIP runtime being unavailable on the Android R platforms.
Fixed the issues with HTA metadata generation of conv2d op.
1.44.0 October 2020 Optimized concat performance when input size is very big in DSP runtime.
Optimized slice performance when splitting 3-channel RGB input in the DSP runtime.
Removed support for proposal layer in DSP runtime.
Added SNPE converter support for Softmax axis parameter.
Added support for consuming AIMET/custom quantization encodings to override quantizer generated encodings.
Fixed an issue on graphs where the final node in a graph was an elementwise operation with more than two inputs.
Fixed bug where output_shapes were calculated as float values for DepthToSpace and SpaceToDepth Ops.
Changed ONNX converter to not allow negative or placeholder dimensions.
Fixed potential issues with some models where QAT nodes may not get propagated properly to the final converted model.
1.43.0 September 2020 Improved the input/output conversion times for models having depth as 4 on AIP runtime.
Enabled initial support for constant layers along with elementwise Op on HTA.
Added support for opaque float concat operation in SNPE DSP concat layer.
Added support for Caffe's "Clip" layer in the caffe converter.
Added int16 example to snpe-sample app.
Fixed the crash while running multi-threading applications with user buffer mode on AIP runtime.
Fixed bug in ONNX converter that used a hard-coded name for the sequence length input of the LSTM operator.
Fixed bug in ONNX converter for Unsqueeze layer, which got a key-error with static inputs.
Fixed the bug in l2_fetch usage during output conversion which improved the performance significantly for some models running on AIP runtime.
Fixed the issue with generation of an HTA-enabled DLC for the denoise model.
Fixed a segmentation fault during DLC generation with specific inputs on HTA.
Fixed issue with PlatformValidator.hpp reference to non-existent #include.
1.42.2 September 2020 Fixed the bug in l2_fetch usage during output conversion which improved the performance significantly for some models running on AIP runtime.
1.42.0 August 2020 Removed V60 DSP libs from SNPE SDK.
Enabled the AIP runtime support for generating the intermediate outputs from HTA with online compiler.
Enabled multithreading for the re-quantize process in the DSP runtime.
Added an optional parameter to set the hysteresis period for the sustained high and burst profiles in the DSP runtime.
Added support for opaque float concat operation in SNPE DSP concat layer.
Fixed bug in UserBufferTF8 where retrieving the encoding would always return null.
Fixed box decoder performance issue on mobilenet v2 ssd model for DSP runtime.
Fixed tanh performance issue by replacing QuantizedTanh_8_ref with QuantizedTanh_8 op in DSP runtime.
1.41.0 July 2020 Added MatMul support on the CPU runtime.
Added support for new version of 7250 with integrated PMIC module.
User-Defined Operations (UDO) with weight parameters have been added to demonstrate both quantization and network execution on the CPU and DSP runtime cores, respectively.
Optimized the Tile op in the DSP runtime, using 2D memcpy for w-d plane tiling and HVX for tiling along depth.
Fixed stack overflow issue in concat layer in DSP runtime.
Fixed an issue with inputs for multi-batch in the DSP runtime.
Fixed issue in TF converter that prevented FusedBatchNorm operations from being merged into previous Convolution layer.
Fixed a DSP crash due to stack overflow in Concat layer preparation.
1.40.0 June 2020 Added DSP Graph Caching support for AIP models with HVX subnets.
Upgraded DSP to use Hexagon SDK 3.5.2 toolchain.
Added support for 16 bit UDO layers in DSP.
Added support for large average pooling and the reduce_mean layer, and improved elementwise_mul support for larger tensor sizes.
Fixed the issue with buffer ordering during the execution of batched models on the AIP runtime.
Fixed an issue with SsdDetectionOut when the number of classes is only 1.
Fixed accuracy issue with Correlation 1D op.
Fixed improper processing when 16bit input quantization is used in certain cases.
Fixed scaling logic in convert_16 op.
1.39.1 May 2020 Fixed the performance regression of Mobilenet SSD model on AIP runtime.
1.39.0 May 2020 The SNPE license (LICENSE.pdf) has been updated; please review it for more details. Additionally, REDIST.txt has been removed, as redistribution is covered in the license.
Added graph caching support, which improves init times for DSP & AIP networks. (A DSP subnet within AIP is not supported.)
Optimized PReLU in the AIP runtime, using a cubic approximation to reduce saturation loss during re-quantization.
Fixed the input conversions to allocate the required buffers during initialization itself, improving inference time for the AIP runtime.
Fixed potential bug with freeing threads in DSP runtime.
Added additional logging messages for debugging in DSP runtime.
Added support for the AIP runtime in the SNPE sample "snpe-sample".
Added support for BBox transform layer in Caffe2 converter.
Added new opset support in the ONNX converter: ArgMax, ArgMin, Concat, PRelu, ReduceMean, ReduceMax, ReduceMin, ReduceSum, Squeeze, Unsqueeze, MatMul, Flatten, Max, Split, Clip.
Added support for the fixed-point version of the MobileNetV3 model with H-Swish neuron in TF converter.
Improved support of resizing in Crop layer for TF and Caffe converter by introducing new “counts” parameter.
Fixed issue of incorrect UDO tensor datatype in quantizer.
Fixed issue with setting the performance profile mode for HTA from AIP runtime in multi-threading use cases that could cause performance to drop.
Fixed issue with snpe_bench.py memory profiling.
1.38.0 April 2020 Enabled FC/MatMul to use VTCM if available in DSP.
Optimized 16-bit MeanVarianceNormalize in DSP runtime.
Added support for batchwise scalar divide in the DSP runtime.
Optimized the Hard-swish operator for MobileNetV3.
Added support for EltwiseMin layer for ONNX converter and CPU runtime.
Added support for Onnx BatchNorm layer (OpVer 9, 12) in Onnx Converters.
Added the Caffe preprocessing subtract_mean layer. If specified, the converter enables the preprocessing specified by a data layer's transform_param subtract_mean.
ONNX Softmax converter support previously existed only for rank <= 2; support for tensors of rank <= 4 was added.
Enabled the end-user / developer to request the use of an unsigned process domain to avoid the requirement of signed libraries for SNPE execution on 8250 and newer devices.
Removed autoquantization for classes output in MultiClassNMS layer and added support for float addition in ElementwiseOp layer to handle this case.
Fixed the issue with enabling stats for the AIP runtime on models where the HTA subnet has more layers than SNPE layers.
Fixed the output conversions to allocate the required buffers during initialization itself in the AIP runtime, improving inference time.
Enabled honoring of padding information from the HTA driver, pre-computed earlier by the AIP runtime, to unblock execution of more models.
Fixed the issue with output buffer id while converting depth2space to deconv on HTA.
Fixed a bug during graph transformation while folding the batchnorm on HTA.
Increased the DCVS relaxed sleep latency duration; this lets the power system know that the CDSP can go into a deeper sleep state. If there is no active inference request, it is better for the system to go into a deeper sleep state.
1.37.0 March 2020 Enabled the online compiler support for HTA 1.x family of devices.
AIP performance profile behavior is aligned with the DSP runtime for reduced power consumption in case of inference inactivity.
ONNX Converter: Added support for Onnx Pad layer (OpVer 11).
Added support for the h-swish layer used by MobileNet V3.
Removed support for the Generate Proposals, ROI Align, and ROI Proposal layers.
Added improved support for the reporting of Exceptions in the Java API.
Updated the DSP UDO header file to be compatible with SNPE 1.37.0.
The DSP UDO support is updated to be compatible with Hexagon SDK 3.5.1.
The network creation action was moved onto another thread to avoid impacting the affinity for the main thread of the calling program.
snpe-dlc-info: Fixed a MACs calculation error for the deconvolution layer.
Avoided a crash on SDM845 and other v65 targets when unable to retrieve VTCM memory.
Fixed an issue in the TensorFlow converter where the weights in the Fully Connected layer were incorrectly transposed.
Fixed the support for using DSP UDO with the AIP runtime. Previously, the UDO packages would not be properly loaded in the AIP runtime.
Fixed DiagLog data for a UDO on GPU, where it did not report proper values for start and stop.
Enabled support for Keras batchnorm with empty mean and variance by applying default values.
Fixed a memory leak when using IsRuntimeAvailable() with the VOLATILE_CHECK for the DSP runtime.
1.36.0 February 2020 Added Java API extension to register UDO package with SNPE.
snpe-dlc-info now prints the command-line that was used to quantize the DLC if applicable.
Added support to handle UDO layers with multiple TF8 outputs with different quantization parameters.
Added support for an additional profiling level (moderate) for SNPE benchmarking script and associated snpe-net-run executable for tracking initialization time metrics.
Upgraded DSP to use Hexagon SDK 3.5.1 toolchain.
Extended Platform Validator to detect the HTA API version.
Added a VOLATILE_CHECK mode for SNPE DSP runtime checking, to query runtime availability on each call instead of returning a cached result.
Added performance modes LOW_POWER_SAVER, HIGH_POWER_SAVER, and LOW_BALANCED for the CPU runtime.
Fixed bug with propagation of model version during conversion.
Fixed the issue with selecting the correct output shape during graph transformation while inserting 1x1 conv2d for different input formats.
Fixed the issue with allocation of layer descriptor while loading the network on HTA.
1.35.0 January 2020 Introduced the User-Defined Operations (UDO) feature.
Added support for SDM720G/SM7125.
Added support to snpe-throughput-net-run for UserBuffer input tensors (both INT8 and INT16).
Input batching support is added for networks that can run completely on AIP runtime.
Added support for the tf.stack and tf.unstack ops to the DSP and CPU runtimes.
Added support for tf.stack, tf.unstack, tf.floor, and tf.minimum to the TF converter.
Fixed some small memory leaks that are seen when repeatedly calling dlopen()/dlclose() on libSNPE.so.
Updated the Deconvolution operation on DSP with a new kernel that improves performance on various kernel sizes and strides.
Fixed an ssd_detection CDSP crash in the DSP runtime.
Updated the HTA to partition the input layer, if it has a connection to a layer that is not included in the same partition.
Improved the tiling configuration support for the depthwise convolution layer.
1.34.0 January 2020 Initial support for ops with 16-bit activations using HTA in both snpe-dlc-quantize and in the SNPE AIP runtime.
New option for snpe-net-run to automatically turn unconsumed tensors of the network (tensors that are not inputs to a layer) into network outputs.
Fixed inconsistent results on SM8250 in certain cases for depthwise convolutions.
Added support for the depth2space operation on the GPU.
Using optimized Softmax implementation in AIP networks when input activation has more than 5000 elements.
Truncated detection output on DSP to return only valid data.
Ensured weights are properly flushed to DDR for use during inference in the DSP runtime.
Fixed support for NV21 encoding in the DSP runtime.
1.33.2 November 2019 Address accuracy issues for Deconvolution in the AIP runtime.
Changed behavior of Crop layer resize, so it retains the number of copied elements on each dimension.
Made the quantizer --override_params option work for AIP.
Reordered PerformanceProfile_t to be ABI compatible with 1.32.0.
Using optimized Softmax implementation in AIP networks when input activation has more than 5000 elements.
1.33.1 November 2019 Fixed a build issue that incorrectly removed Symphony.
1.33.0 November 2019 New performance modes have been added:

  • LOW_POWER_SAVER: Run in lower clock than POWER_SAVER, at the expense of performance.
  • HIGH_POWER_SAVER: Run in higher clock and provides better performance than POWER_SAVER.
  • LOW_BALANCED: Run in lower balanced mode, provides lower performance than BALANCED.

snpe-dlc-info adds a summary of the layer types in use in the model.
Updated to use new BLAS functionality that leverages OpenMP. This adds a new dependency on the OpenMP shared library for Linux platforms.
Added 32-bit bias support.
Added init caching support for the SSD output layer on DSP.
Fixed a memory leak causing increasing init time on DSP.
Added converter support for dilated convolution when used with fakequant nodes.
Fixed multiple bugs in snpe-onnx-to-dlc that were causing errors for models having the torch.Mul op.
Extended TF converter support to the NMSv1 op, in addition to existing support for the v2 and v3 NMS ops.
Fixed a TensorFlow conversion bug in infer_shape for the StridedSlice op: output_shape should be the shape of the single output, not a list of shapes.
Fixed a bug with propagation of the model version during conversion.
If burst mode is set, thread affinity is set to big cores during init and de-init, and restored to the previous setting after these actions complete.
Fixed a segfault when using user buffers with a resizable dimension.

1.32.0 Oct 2019 Added Caffe MVN layer support in the Caffe converter, CPU runtime, and DSP runtime.
snpe-dlc-quantize: Enabled the use of quantization parameters calculated during training when using the DLC quantizer. To override the SNPE-generated quantization parameters, pass --override_params to snpe-dlc-quantize.
Removed deprecated command-line arguments from converters. All three converters now require passing -i/--input_network for model input paths.
snpe-dlc-diff: Added the command-line option [--diff_by_id/-i] to snpe-dlc-diff. This option allows users to compare two models in order (sorted by id).
Added support for the L2Norm layer to the TensorFlow converter.
Optimized DSP performance for the 'Space To Depth' layer.
Added support in the Java API for setInitCacheEnabled() and setStorageDirectory() to enable DLC caching support.
Allowed graceful recovery after a FastRPC error: the userPD is recreated after the cDSP crashes so that the user can continue the SNPE process with subsequent instances, instead of having to close the SNPE process. Note: all instances associated with the previous userPD will be lost.
snpe-dlc-viewer: Associated each layer type with a fixed color for consistency when using snpe-dlc-viewer.
Split the SNPE isRuntimeAvailable method into two separate functions to improve backward compatibility with existing client binaries that were built against the older signature.
TF Converter: Fixed elementwise broadcast support.
ONNX Converter: Fixed a bug where the output dimension was incorrect when the keep_dims parameter was set to False for Argmax, ReduceSum, and ReduceMax.
ONNX Converter: Fixed a bug where the pad attribute was not properly parsed for the Deconv op.
Caffe Converter: Fixed a bug when converting SSD-based models using Python 3.
TF Converter: Fixed a bug where the converter was removing a const op input to a reshape op when passed through identity op(s), i.e., const -> identity -> reshape.
Fixed a bug where getOutputSize() would give the wrong result on output tensors in UserBuffer mode.
1.31.0 September 2019 New patterns were added to enable running the CLE algorithm on more op patterns and model architectures.
Added Tensorflow converter support for Caffe-style SSD networks.
Added support for HeatmapMaxKeypoint layer in the CPU runtime.
Added support for ROI Align layer in CPU runtime.
Added initial L2Norm layer support in CPU runtime. No support for axis parameter yet: normalization is performed along the inner-most dimension of the input tensor.
Support for single-input Concatenation layers was added to CPU, GPU and DSP.
Changed determination of number of batch dimensions in the Fully Connected layer so rank greater than 1 is always assumed to mean that there is 1 batch dimension.
Removed constraint on the LSTM layer in the GPU runtime that prevented batch mode operation.
Added support for Leaky-RELU in the TensorFlow converter. Both the actual Leaky-Relu op and the elementwise op representation are supported and map to SNPE's Prelu op.
Added Argmax support to the Caffe converter, and optimized performance on the DSP runtime.
Added new column to snpe-dlc-info that displays the supported runtimes for each layer.
Fixed an edge case where in certain conditions OpenCL would return CL_INVALID_WORK_GROUP_SIZE.
Made isRuntimeAvailable Java API thread-safe.
Replaced an unstable image in the sample Android classifier application data set with a more consistent image.
1.30.0 August 2019 Documentation has been added to reflect the new common converter command-line options for input processing.
Converters now propagate the batchnorm information required for performing quantization optimizations.
Added support for the new bias-correction quantization optimization, which adjusts biases by analyzing float vs. quantized activation errors and adjusting the model to compensate.
ONNX converter now filters single-input Concats as no-ops, as SNPE didn't support them.
Converter input processing now uniformly handles different input types and encodings.
ONNX converter now supports the ConvTranspose 'output_padding' attribute by adding an additional pad layer after the ConvTranspose op.
Integrated the latest FlatBuffers 1.11 library, which brings speed improvements and options for model size reduction.
GPU size limitations with the ArgMax op (when setting the keepDims op attribute to false) can be worked around by enabling CPU fallback.
Fixed a DSP error with MobileNet SSD on QCS403 and QCS405.
Fixed an issue with partitioning of the deconv layer in HTA.
1.29.0 July 2019 Added support for the DLC reorder tool.
Optimized HTA d32 conversions.
Added the TF space_to_depth op for the SNPE CPU and DSP runtimes.
Enhanced the benchmarking scripts to show a further breakdown of execution time across various components.
Added support for additional ONNX binary element-wise ops.
Optimized the deconv layer to improve performance.
Fixed an issue related to a runtime error in the DSP runtime.
Optimized SNPE GPU runtime performance for ShuffleNet V2 by using the profiling level config.
1.28.0 June 2019 Added an optional argument to isRuntimeAvailable for the DSP runtime so that it doesn't activate the DSP.
Allowed UB_T8 and UB_FLOAT output for snpe-net-run.
Added a new command-line option for snpe-dlc-diff to check layer names.
Updated the --dlc argument to --output_path for snpe-caffe-to-dlc to align with the ONNX converter.
Added the --dry_run argument to snpe-onnx-to-dlc to allow evaluation for successful conversion on an ONNX model.
Added support for the gather op in the DSP runtime.
Added support to convert the TF MobileNet-V1-FPN-SSD model.
Fixed a memory leak in the DSP runtime that was seen when repeatedly loading and unloading a network.
Addressed issues on V66 DSPs related to acquiring VTCM memory.
Fixed an issue related to multiple inputs for the Caffe converter.
Fixed an issue in the TF converter related to element-wise sum and the atrous parameter.
Fixed an issue in the TF converter related to tf.crop_and_resize when there are only 2 inputs.
Fixed additional cases of uncaught exceptions with the aarch64-android-clang6.0 platform.
1.27.0 May 2019 Added new API support for setting output tensor names on snpeBuilder and for fetching output tensor names for a given output layer name.
Improved peak memory usage with the DLC v3 format.
Fixed a few issues with performance and runtime failures in the DSP runtime.
Fixed a few issues and improved error handling for the platform validator.
Fixed issues with the Pooling and Instance Norm layers of the TensorFlow converter.
Removed *-android-gcc4.9 platform support. This compiler has been retired for the Android NDK, so all support is transitioning to using Clang for Android.
Removed the arm-linux-gcc4.8hf platform. The development platform has been retired.
1.26.0 Apr 2019 Added support for the ONNX Gather op in the ONNX converter and CPU runtime.
Optimized the Deconvolution layer for the DSP runtime.
Added support for tf.nn.moments in the TF converter, CPU, and DSP runtimes.
Added TF Reflect Pad support for the DSP runtime.
Added a symmetric quantizer option in snpe-dlc-quantize.
Added support for batch > 1 when using the Scale layer on the DSP runtime.
Updated the Platform Validator Python script to be OS-independent.
Added additional optimizations for HTA input conversion.
1.25.0 Mar 2019 Updated the DLC format to improve load time performance and memory consumption. Old DLCs will continue to work as is, but new DLCs generated from 1.25 will use the new format.
Added support for optimized MultiClassNms and ArgMax ops on the DSP runtime.
Added an option to request larger memory allocations on the DSP for improved init time, at the expense of more memory use.
Improved concurrency for multiple SNPE objects running simultaneously on DSP.
Improvements when using priority control on DSP.
Added support for channel shuffle and ArgMax in the ONNX converter.
Added support for multiple subnets within the AIP runtime.
1.24.0 Feb 2019 Added setProfilingLevel API support for the AIP and CPU runtimes.
Addressed various stability issues in the AIP runtime.
Added support for Snapdragon 712.
Added support for multiple inputs and multiple outputs on each SNPE AIP subnet.
1.23.0 Jan 2019 Upgraded to Android NDK r17c to build SNPE; improved initialization and de-initialization times; various DSP timing fixes; addressed some DSP concurrency edge cases that could impact output values; TF converter support for non-max suppression and crop-and-resize ops
1.22.0 Nov 2018 Support for several new ops on DSP runtime; Upgrade to Android NDK r16b to build SNPE; setProfilingLevel API support in DSP runtime; Added new tool snpe-throughput-net-run
1.21.0 Oct 2018 Tensorflow converter and CPU runtime support for various ops; DSP runtime support for Eltwise Realdiv and Square ops; GPU support for resize_align_corners layer
1.20.0 Sep 2018 Support for QCS605 LE platform; NDK version upgrade to r14b; Tensorflow converter support for elementwise sqrt and softmax with dimension > 2; Platform validation command line tool
1.19.0 Aug 2018 ELU op support for TensorFlow/ONNX converters and CPU/GPU runtimes; BoxWithNMSLimit and BBoxTransform ops support in the Caffe2 converter; support for the Caffe Power layer on GPU
1.18.0 Jul 2018 Support for pad and elementwise subtraction on GPU; ONNX converter support for shape and pad ops; Tensorflow converter support for additional ops
1.17.0 Jun 2018 Support for Scale Layer in Caffe converter and DSP runtime, DSP support for batch>1 and ChannelShuffle, Updated SDK examples for Inception v3 2016 model
1.16.2 May 2018 Remove linkage to libstdc++.so in DSP loader libraries
1.16.1 May 2018 Remove linkage to libstdc++.so, DSP runtime fixes, fix for 1D BatchNorm
1.16.0 May 2018 Batch>1 support (except DSP runtime); layer optimizations for DSP runtime; Caffe2 ChannelShuffle support (except DSP runtime)
1.15.2 Mar 2018 Fix for GPU runtime memory leak and reshape to/from 1D
1.15.1 Apr 2018 Fix for converter for instance normalization followed by scale
1.15.0 Apr 2018 Support for instance normalization for Caffe and Caffe2, MobilenetSSD (Caffe)
1.14.1 Mar 2018 Minor fixes
1.14.0 Mar 2018 ONNX converter (alpha), multiple enhancements and fixes
1.13.0 Feb 2018 GPU and DSP v65 performance improvements. GPU floating point 16 support.
1.12.0 Jan 2018 Support for Android LLVM/libc++, MobilenetSSD (TensorFlow)
1.10.1 Dec 2017 Fix a bug in the DSP runtime when using mixed userbuffer input types
1.10.0 Dec 2017 Support for Mobilenet on DSP, enhanced DSP runtime, Snapdragon Flight Board, updates for UserBuffers
1.8.0 Nov 2017 Mobilenet support on CPU, GPU, Support for Snapdragon 636 and Android 64 bit
1.6.0 Oct 2017 Support for Snapdragon 450, minor updates and fixes
1.4.0 Aug 2017 Support for Snapdragon 630, FasterRCNN and ADSP on AGL
1.2.2 July 2017 QDN release
1.2.0 June 2017 Beta Caffe2 Converter
1.0.2 May 2017 Support for 820AGL platform, Snapdragon 660, and Compute DSP on Android
1.0.1 Apr 2017 Documentation update only
1.0 Apr 2017