Archives

What's in Qualcomm Neural Processing SDK v1.61.0?

NEW FEATURES
  • ONNX Converter: Enabled correct handling of custom op inputs when default values are provided
  • ONNX Converter: Added support to resolve static ONNX Cast operations as Constant
  • CPU Runtime: Added support for CRD mode in DepthToSpace (PixelShuffle)
  • ONNX Converter: Fixed simplifier behavior when input dimensions are given
  • DSP Runtime: Added support for LayerNorm for V65/V66
  • Converters: Added new pattern to fold ReduceL2 + Div as L2Norm
  • Converters: Added support for Relay IR's requantize op that can be seen in framework quantized models
BUG FIXES
  • Core: Improved performance of loading DLC from a memory buffer
  • ONNX Converter: Fixed scale calculation for the ONNX Resize operator in align_corners mode; also overrides the Resize input axis format per the source axis order
  • Caffe Converter: Added support for Caffe Scale where the scale weights are of shape [batch,channels] and axis == 0
  • ONNX Converter: Fixed issues for Axis Tracking related to L2 Norm
  • SDK: Updated the sample code to demonstrate handling multiple ITensor inputs (see the sketch after this list)
  • AIP Runtime: Fixed a low-accuracy issue on a MobileNet variant for the Multi-class NMS layer
  • ONNX Converter: Added support for the combination of Nearest and Half_pixel modes for ResizeOp
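
For reference, a minimal sketch of feeding multiple ITensor inputs through the C++ API, following the pattern used in the sample code; the input names and shapes here are hypothetical:

      #include <memory>
      #include "SNPE/SNPE.hpp"
      #include "SNPE/SNPEFactory.hpp"
      #include "DlSystem/ITensorFactory.hpp"
      #include "DlSystem/TensorMap.hpp"

      void executeWithTwoInputs(zdl::SNPE::SNPE& snpe) {
          // One ITensor per network input; names and shapes are illustrative.
          zdl::DlSystem::TensorShape shapeA({1, 224, 224, 3});
          zdl::DlSystem::TensorShape shapeB({1, 10});
          auto tensorA = zdl::SNPE::SNPEFactory::getTensorFactory().createTensor(shapeA);
          auto tensorB = zdl::SNPE::SNPEFactory::getTensorFactory().createTensor(shapeB);
          // ... fill tensorA and tensorB with input data ...

          zdl::DlSystem::TensorMap inputMap;
          inputMap.add("input_a", tensorA.get());
          inputMap.add("input_b", tensorB.get());

          zdl::DlSystem::TensorMap outputMap;
          snpe.execute(inputMap, outputMap);  // outputs are populated into outputMap
      }
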
Known Issues:
  • SNPE DSP: An error is observed if the second input to the Scale layer has rank 1
  • Higher de-init time is observed on the QRB5165 platform with the CPU runtime for models like MobileNet

What's in Qualcomm Neural Processing SDK v1.60.0?

NEW FEATURES
  • Tools: Converter: Added ONNX Gemm transA and transB support
  • Native sample code is updated to take static quantization parameters for quantized input buffers (see the sketch after this list)
  • libSNPE.so, libcalculator.so, libplatformValidatorShared.so, and libnpe_dsp_domains_v2.so (generated with the gcc7.5, gcc8.2, and gcc9.3 toolchains) are now compiled with additional read-only relocation compiler flags
  • Documentation update: User Logging API documentation added in Application Tips section
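
As an illustration of passing static quantization parameters, a sketch of creating a TF8 user buffer with a fixed scale/offset via the C++ API; the parameter values are placeholders and the encoding constructor arguments reflect my reading of the API:

      #include <cstdint>
      #include <memory>
      #include <vector>
      #include "SNPE/SNPEFactory.hpp"
      #include "DlSystem/IUserBuffer.hpp"
      #include "DlSystem/TensorShape.hpp"

      std::unique_ptr<zdl::DlSystem::IUserBuffer> makeTf8Buffer(std::vector<uint8_t>& backing) {
          // Static quantization parameters supplied up front (placeholder values)
          // rather than derived at run time.
          const uint64_t stepEquivalentTo0 = 128;  // offset: quantized value that maps to 0.0
          const float quantizedStepSize = 0.5f;    // scale: real-valued size of one step
          zdl::DlSystem::UserBufferEncodingTf8 encoding(stepEquivalentTo0, quantizedStepSize);
          // Byte strides for a uint8 NHWC tensor of shape {1, 224, 224, 3} (illustrative).
          zdl::DlSystem::TensorShape strides({224 * 224 * 3, 224 * 3, 3, 1});
          return zdl::SNPE::SNPEFactory::getUserBufferFactory().createUserBuffer(
              backing.data(), backing.size(), strides, &encoding);
      }
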
BUG FIXES
  • HTP: Fixed issue with Cast op usage in certain configurations
  • ONNX Converter: Improvements to handle different input axis layouts
Known Issues:
  • Minor reduction in accuracy for VGG16 is observed
  • Error: Model Validation fails for FC layer with error that there is a mismatch between weights and input dimensions
    • Characteristic: Typically seen with ONNX models where the FC layer (with 4D input A and 2D input B) input follows Reshape layer either immediately or after some trivial eltwise layers
    • Workaround: Insert a Reshape op on input A before the FC layer, reshaping to (orig_4D_shape[0], -1)
  • Error: ONNX models with LSTM layer will have validation error related to input shape or will cause significant drop in accuracy
    • Characteristic: LSTM models that have initial h/c input tensors will generally fail due to this issue
    • Workaround: Provide command line argument "--input_layout NONTRIVIAL" for each initial h/c input tensor for every LSTM Op
  • Error: AssertionError: LSTM h/c input buffer needs to have format NONTRIVIAL, got NFC
    • Characteristic: Failure seen with bidirectional LSTM layers
    • Workaround: Provide the command line argument "--input_layout NONTRIVIAL" for each initial h/c input tensor for every LSTM Op (see the example after this list)
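
For example, assuming an ONNX model whose initial hidden/cell state inputs are named h0 and c0 (hypothetical names), and the converter's two-argument --input_layout <name> <layout> form:

      snpe-onnx-to-dlc -i lstm_model.onnx \
          --input_layout h0 NONTRIVIAL \
          --input_layout c0 NONTRIVIAL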

What's in Qualcomm Neural Processing SDK v1.59.0?

NEW FEATURES
  • DSP Runtime: Added support for edge padding from SNPE side
  • TensorFlow Converter: Added support for the beta and gamma parameters for InstanceNorm
  • ONNX Converter: Limited support for the Expand operator when its attributes allow it to be interpreted as a no-op
  • ONNX Converter: Added support for ScatterND
  • DSP Runtime: Added graph identifier into minidm logs to enhance debugging
BUG FIXES
  • Quantizer: Fixed duplicate Convert layer Id issue observed in generated DLC when multiple Convert layers feed into a single layer
  • ONNX Converter: Fixed handling of models with inputs of unknown shape
  • UDO: Fixed a typo in the generated UDO template code
  • ONNX Converter: Resolved issue where Shape operator translation could fail if the input was part of the initializer list
Known Issues:
  • Minor reduction in accuracy for VGG16 is observed

What's in Qualcomm Neural Processing SDK v1.58.0?

NEW FEATURES
  • Converter: Enabled broadcasting of weights and bias for BatchNorm layer to match channel dimensions
  • DSP: Enabled the support for Elementwise Log and Neg Ops on HTP
  • DSP: Enabled support for all axis values in reduce mean with axis size of 2
Known Issues:
  • Minor reduction in accuracy for VGG16 is observed

What's in Qualcomm Neural Processing SDK v1.57.0?

NEW FEATURES
  • ONNX Converter: Added support in dry-run mode for reporting ops that are not in the ONNX schema domain
  • Converters: Corrected inaccurate MACs/params calculations for several ops following re-analysis
  • CPU Runtime: Set the max detections to keep top K for the Caffe SSD network
  • Converters: Fixed an axis-tracking bug for Permute when the input is in BTF format
  • Converters: Removed obsolete ssd_permute_param parameter in caffe converter permute translation
  • Converters: Added command line argument to override data type for inputs
  • TFLite Converter: Enabled conversion of BERT-style models
BUG FIXES
  • Converters: Fixed coefficient input broadcasting issue for ONNX Prelu operation
  • DSP Runtime: Optimized performance for strided_slice operator for V65/V66
  • DSP Runtime: Fixed axis quantization not adding all of the output's fixedPointParams to bufferDeltas
  • DSP Runtime: Accuracy Fixes and Improvements for PReLU on V65/V66
  • ONNX Converter: Fixed issue with improper handling of Elementwise Div during conversion
  • DSP Runtime: Improved handling of Pad ops during prepare for HTP
Known Issues:
  • Graph preparation fails when the --enable_init_cache option is used in the DSP Runtime for HTP

What's in Qualcomm Neural Processing SDK v1.56.2?

NEW FEATURES
  • Clarified the documentation/example for specifying UDO data types
BUG FIXES
  • Converter: Added a new LayerNorm sequence for pattern matching, and added a constraint to enforce that the MatMul layer's constant second input is an 8-bit tensor in quantized models
  • ONNX Converter: Added support for scale/offset quantization overrides
  • ONNX Converter: Fixed a warning for the Const operator from Opset 11
  • ONNX Converter: Fixed a scale factor calculation error caused by mixing height and width dimensions
  • CPU Runtime: Added support for the LayerNorm layer
  • CPU Runtime: Set the max detections to keep top K for Caffe SSD networks

What's in Qualcomm Neural Processing SDK v1.55.0?

NEW FEATURES
  • Added support for the OneHot operation across the SNPE converters with runtime support available on SNPE CPU
  • ONNX Converter: Added support for LSTM & CRNN
  • ONNX Converter: Added support for Exp layer
  • ONNX Converter: Change to stop parsing at specified output nodes
  • Caffe Converter: Changed the conversion of SSD models to use the DetectionOutput layer. Re-converting these models is strongly recommended as the old layer will be removed in the future.
  • Caffe Converter: Added support for Caffe Power scale/shift parameters
  • Caffe Converter: Warn for uninitialized weight/bias parameters in the BatchNorm layer and initialize them to default values
  • Converters: Enabled support for conversion of the Cast op in the TFLite and PyTorch converters
  • DSP Runtime: Added support for LSTM
  • DSP Runtime: Optimized mirror padding for better VTCM utilization on HTP
BUG FIXES
  • DSP Runtime: Fixed the issue of invalid cache record added to DLC while doing offline prepare for HTP
  • Converters: Fixed Softmax and Reduction Ops to have default case for output_buf axis format
  • ONNX Converter: Fixed an issue with the Slice layer when the end is set to INT_MIN
Known Issues:
  • A slight accuracy drop is observed for Caffe MobileNet SSD on the GPU runtime
  • Performance regressions are observed on HTP for Caffe-based detection models
  • Intermittent stability issues are observed on the CPU runtime

What's in Qualcomm Neural Processing SDK v1.54.2?

NEW FEATURES
  • TF Converter: Added support for detecting eltwise pattern for batchnorm layer with fakequant inputs
  • PyTorch Converter: Added the initial PyTorch converter, along with documentation for it
  • Converters: Added support for the Caffe Reduction layer's Sum and Mean ops
  • Quantizer: Added support to make the Convert operator's upscale and downscale quantization parameters loss free
  • ONNX Converter: Added support for LSTM & CRNN in converters
  • Converters: Identity ops at network outputs are still stripped, but the name of the Identity's input is now updated to the original Identity output name, so the network retains its original output names. Previously the output name defaulted to the name of the Identity op's input. This change only impacts customers who use network outputs that are Identity ops; in that case, the output name now matches the original framework model's output name rather than the previous node's output name
  • DSP Runtime: Added support for LSTM
BUG FIXES
  • Converters: Added batch dimension to anchor input data conversion from tensorflow corner style to center style for DetectionOutput operation optimization
  • ONNX Converter: Added support to pre-apply ONNX batchnorm scale and bias quantization encodings before they are consumed by the converter to compute weights and bias
  • Converters: Added support for reverse-engineering the SAME padding mode from explicit pad values
  • DSP Runtime: Fixed a scratch buffer over-access issue in RoiAlign
  • DSP Runtime: Fixed graph prepare issue for reshape const node
  • DSP Runtime: Optimized reduce mean performance when reduced on channel
  • Java API: Added protection when removing tensors to avoid crashing in a multithreaded application
Known Issues:
  • Slight accuracy regressions are observed for Mobilenet_v1_quantaware and Mobilenet_V2_SSD_quantaware on the HTP runtime
  • Performance regression is seen on DeeplabV3 model with online graph preparation

What's in Qualcomm Neural Processing SDK v1.53.2?

NEW FEATURES
  • Tool: Quantizer: Added support for fake quant operators in snpe-dlc-quantize
  • Tools: TF Converter: Support for logical_and, equal, greater, greater_equal, less, less_equal, not_equal, logical_or, select
  • Tool: TensorFlow Converter: Added support for Identity nodes that act as graph output nodes
BUG FIXES
  • Tool: ONNX Converter: Fixed an incorrect default bias shape in the ConvTranspose translation
  • Tool: TF Converter: Fixed an issue where TF model conversion produced a single static node even though the start of the network was provided as a command-line input
Known Issues:
  • Accuracy regressions on VGG16, Alexnet and Mobilenet_v1 on HTP Runtime are observed. Mobilenet_v1 accuracy regression can be recovered by using Enhanced Quantization

What's in Qualcomm Neural Processing SDK v1.52.0?

NEW FEATURES
  • Tool: Converters: Removed pre-broadcasting of constant tensors, resulting in smaller converter output files
  • Tool: Converter: Added Converter support for Nd Reshape layer
  • Tool: Converter: Added CastOp support for TF
  • Tool: Converter: Added support for static subgraph resolutions at conversion time
  • Tool: Converter: Added support for tensor dtype for TF fill op translation
  • NPE HTP: Enabled the support for float input of elementwise binary op in offline and online graph preparation
  • Converters: ONNX: Added support for NonMaxSuppression and updated the Cast op to ensure proper type tracking
  • Converters: Common: Updated op-squashing logic to attempt a squash into the subsequent op when a node's input buffer has multiple consumers
BUG FIXES
  • NPE DSP: Fixed variance accuracy loss in InstanceNormalization on HTP
  • NPE GPU: Added an optimized kernel for the ReduceMean operation
  • Tool: Converter: Fixed a bug in the TF FullyConnected translation where inputs were intermittently out of order
  • NPE DSP: Fixed the freeing of an uninitialized pointer that led to random crashes
  • NPE DSP: Optimized specific unpack->elementwise sequences for certain models on HTP
  • NPE AIP: Optimized the input conversion for the models involving padding along width dimension

What's in Qualcomm Neural Processing SDK v1.51.0?

NEW FEATURES
  • Converter Tool: Added support for Onnx WhereOp
  • Added support for edge padding type for pad operation in GPU runtime
  • Neural Processing Engine DSP: Enabled support for ElementWiseUnary abs layer on HTP
  • GPU Runtime: Added support for asymmetric reflect padding for pad operation
  • UDO: Allow users to specify a different datatype for each core in single config file
  • UDO: HTML documentation & sample app are updated to provide an example of loading a UDO package
BUG FIXES
  • DSP Runtime: Fixed the context leak on HTP targets during repeated init/deinit scenarios
  • Neural Processing Engine: Optimized the init stage to complete faster
  • Neural Processing Engine DSP: Optimized maxpool with stride 2x1 on HTP
  • Neural Processing Engine DSP: Optimized large Concat ops to fit into memory
  • Neural Processing Engine DSP: Optimized the init on HTP
  • Neural Processing Engine DSP: Optimized graph prepare for HTP targets to support running larger graphs
  • Neural Processing Engine DSP: Fixed the issue with CDSP not going to sleep when the model is de-initialized
  • Fixed issues related to HMX hysteresis management on HTP, including correct timer-expiry handling and deadlock avoidance when a hysteresis timeout and de-init happen at about the same time

What's in Qualcomm Neural Processing SDK v1.50.0?

NEW FEATURES
  • Tool: Quantizer: Added SNPE Quantizer support for is_symmetric field used in updated AIMET specification
  • DSP Runtime: Improved instance norm op accuracy when input size is big
  • DSP Runtime: Enabled edge padding support for v65/v66 targets
BUG FIXES
  • Tool: TensorFlow Converter: Resolved an issue where TF Mul was not being translated correctly
  • Fixed issues with offline prepare of DeepLabV3 for SOCs with 2MB VTCM
  • Fixed issue in HTP prepare with certain combinations of Conv followed by other layers
  • Improved Convolution performance on HTP when horizontal and vertical stride are not equal
  • Improved accuracy of Instance Norm on DSP
  • Fixed DSP clock drop issue by adding clock vote hysteresis support
  • Fixed an issue with quantization of ArgMax layers for certain input types
  • Fixed a bug that caused a failure when running an INT8 network followed by an INT16 network in the DSP runtime
  • Tool: TensorFlow Converter: Fixed an issue with constant coefficient input to multiple Prelu layers
  • Enhanced split logic for ConvLayer for certain input types
  • Fixed an issue with elementwise add for certain input types in the DSP runtime
  • Fixed an issue in HTP prepare with certain combinations of addsub_op followed by other layers
  • Resolved performance issue of multiple concurrent executions using common HW resource in DSP runtime
  • Fixed HTP prepare issue with MobileBERT

What's in Qualcomm Neural Processing SDK v1.49.0?

  • Optimized 16-bit quantization performance with l2 cache prefetch and aligned buffer load/save in DSP runtime
  • Enabled Matmul support in SNPE HTP
  • ONNX Converter: Added support for edge padding

What's in Qualcomm Neural Processing SDK v1.48.0?

  • IMPORTANT: Neural Processing SDK migrated to use Ubuntu 18.04 as the host platform
  • Updated dependencies.sh and check_python_dependencies.sh for the transition to Ubuntu 18.04 - Python 3.6, and libc++9
  • Switched diagnostic logging (SNPEDiag.log) to a new file format
  • Removed use of and dependency on OpenMP in CPU runtime
  • Resolved issues with layer dump feature using --debug flag on HTP
  • Optimized performance of the PreLU layer on HTP
  • Optimized Fast RPC performance by statically mapping frequently used buffers in DSP runtime
  • Improved instance norm op accuracy when the input size is big in the DSP runtime
  • Added support for using Unsigned PD in AIP runtime
  • Added support for the MobileDetEdgeTPU SSD model
BUG FIXES
  • Added support for models with UDO in the Android Sample App
  • Fixed an issue where the bias encoding could not be overridden by quantization_overrides in the ONNX/TF converters
  • Fixed support for processing tf.layers.Dense in TF Converter
  • Fixed the issues with UDO fallback to CPU on HTP
  • Fixed a shape issue with certain structures including FC in the ONNX Converter
  • Fixed an Unpack layer indexing error on HTP
  • Fixed an overflow issue in the instance norm op when the variance is very small in the DSP runtime
  • Optimized input node followed by concat on HTP
  • Added session reset logic to handle the case when DSP goes to SSR
  • Improved the performance of 7x7 Depthwise Conv2D op on HTP
  • Enabled keep dims support for Reduce min/max/sum layers on HTP

What's in Qualcomm Neural Processing SDK v1.47.0?

  • Added a new matching pattern for ResizeNearestNeighbor to the TF Converter
  • Added support for TF 1.15 NonMaxSuppressionV3 translation
  • Added a necessary restriction when optimizing graphs containing MatMul in the DSP runtime
  • Added quantization parameters for scale and offset to resolve zero outputs in the DSP runtime
  • Added the scale layer to offline prepare in the DSP runtime
  • Updated Embedding layer support on HTP to handle inputs with a range greater than 255
  • Enabled Normalize layer (part of the caffe_ssd fork) translation support in the Caffe converter
  • Added opset 11 support for the ConvTranspose op in the ONNX converter
BUG FIXES
  • Fixed the inputs of elementwise Sub not being broadcast in SNPE CPU
  • Fixed a problem with TensorFlow conversion of PReLU layers that contain the Addv2 op
  • Fixed a bug where the buffer attribute for element size returned the wrong value for 16-bit tensors
  • Fixed a 16-bit dequantization issue when the output data length does not align to 128

What's in Qualcomm Neural Processing SDK v1.46.0?

  • Optimized ArgMax op L2 cache prefetch in the DSP runtime
BUG FIXES
  • Fixed an issue where the Lrn_d32 op fails for window size 1 in the DSP runtime
  • Fixed an issue where InputSupernode fails in an edge case in the DSP runtime

What's in Qualcomm Neural Processing SDK v1.45.3?

  • Accuracy fixes for various Layers on HTP
  • Init/De-init time improvements
  • Inference Performance Improvements

What's in Qualcomm Neural Processing SDK v1.43.0?

  • Improved the input/output conversion times for models having depth as 4 on AIP runtime
  • Enabled initial support for constant layers along with elementwise Op on HTA
  • Added support for opaque float concat operation in SNPE DSP concat layer
  • Added support for Caffe's "Clip" layer in the caffe converter
  • Added int16 example to snpe-sample app
BUG FIXES
  • Fixed the crash while running multi-threading applications with user buffer mode on AIP runtime
  • Fixed bug in ONNX converter that used a hard-coded name for the sequence length input of the LSTM operator
  • Fixed a bug in the ONNX converter for the Unsqueeze layer, which hit a key error with static inputs
  • Fixed the bug in l2_fetch usage during output conversion which improved the performance significantly for some models running on AIP runtime
  • Fixed the issue with generation of an HTA-enabled DLC for the denoise model
  • Fixed a segmentation fault during DLC generation with specific inputs on HTA
  • Fixed an issue with PlatformValidator.hpp referencing a non-existent #include

What's in Qualcomm Neural Processing SDK v1.42.2?

  • Fixed the bug in l2_fetch usage during output conversion which improved the performance significantly for some models running on AIP runtime.

What's in Qualcomm Neural Processing SDK v1.42.0?

  • Removed V60 DSP libs from SNPE SDK
  • Enabled the AIP runtime support for generating the intermediate outputs from HTA with online compiler
  • Enabled multithread for re-quantize process in DSP runtime
  • Added an optional parameter to set the hysteresis period for the sustained high and burst profiles in the DSP runtime
BUG FIXES
  • Added support for opaque float concat operation in SNPE DSP concat layer
  • Fixed bug in UserBufferTF8 where retrieving the encoding would always return null
  • Fixed box decoder performance issue on mobilenet v2 ssd model for DSP runtime
  • Fixed tanh performance issue by replacing QuantizedTanh_8_ref with QuantizedTanh_8 op in DSP runtime

What's in Qualcomm Neural Processing SDK v1.41.0?

  • Added MatMul support on the CPU runtime
  • Added support for new version of 7250 with integrated PMIC module
  • User-Defined Operations (UDO) with weight parameters have been added to demonstrate quantization and network execution on the CPU and DSP runtime cores, respectively

What's in Qualcomm Neural Processing SDK v1.40.0?

  • Added DSP Graph Caching support for AIP models with HVX subnets
  • Upgraded DSP to use Hexagon SDK 3.5.2 toolchain
  • Added support for 16bit UDO layers in DSP
  • Added support for large average pooling and the reduce_mean layer, and improved elementwise_mul support for larger tensor sizes
BUG FIXES
  • Fixed the issue with buffer ordering during the execution of batched models on AIP runtime
  • Fixed issue with SsdDetectionOut when number of classes is only 1
  • Fixed accuracy issue with Correlation 1D op
  • Fixed improper processing when 16-bit input quantization is used in certain cases
  • Fixed scaling logic in convert_16 op

What's in Qualcomm Neural Processing SDK v1.39.1?

  • An update to v1.39.0 that addresses a performance regression of the MobileNet SSD model on the AIP runtime

What's in Qualcomm Neural Processing SDK v1.39.0?

  • Added graph caching support, which improves init times for DSP & AIP networks (DSP subnets within AIP are not supported); see the example at the end of this entry
  • Optimized Prelu using a cubic approximation to reduce saturation loss during re-quantization
  • Added additional logging messages for debugging in DSP runtime
BUG FIXES
  • Fixed the issue with setting the performance profile for AIP runtime in multithreading scenarios
  • Fixed an issue where incorrect DLCs were generated when multiple instances of snpe-dlc-quantize ran in parallel for the AIP runtime
  • Fixed potential bug with freeing threads in DSP runtime
  • Fixed issue of incorrect UDO tensor datatype in quantizer
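
As a sketch of exercising the cache with snpe-net-run (file names are placeholders; --enable_init_cache is the option referenced elsewhere in these notes): the first run prepares the graph and stores a cache record in the DLC, and subsequent loads of the same DLC skip that preparation:

      snpe-net-run --container model.dlc --input_list inputs.txt --use_dsp --enable_init_cache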

What's in Qualcomm Neural Processing SDK v1.38.0?

  • Enabled FC/MatMul to use VTCM if available in DSP.
  • Optimized 16-bit MeanVarianceNormalize in DSP runtime.
  • Added support for batchwise scalar divide operation in DSP runtime.
  • Optimized the Hard-swish operator for MobileNetV3.
  • Added support for EltwiseMin layer for ONNX converter and CPU runtime.
  • Added support for Onnx BatchNorm layer (OpVer 9, 12) in Onnx Converters.
  • Added the Caffe preprocessing subtract_mean layer. If specified, the converter enables the preprocessing given by a data layer's transform_param subtract_mean.
  • ONNX Softmax converter support previously existed only for tensors of rank <= 2; support for tensors of rank <= 4 was added.
  • Enabled the end user / developer to request the use of an unsigned process domain, avoiding the requirement of signed libraries for SNPE execution on 8250 and newer devices (see the example below).
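
With snpe-net-run, the unsigned process domain can be requested through a platform option; a sketch, where the unsignedPD:ON spelling is an assumption based on later SNPE documentation:

      snpe-net-run --container model.dlc --input_list inputs.txt --use_dsp --platform_options unsignedPD:ON
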
BUG FIXES
  • Removed autoquantization for classes output in MultiClassNMS layer and added support for float addition in ElementwiseOp layer to handle this case.
  • Fixed the issue with enabling stats for the AIP runtime on models where the HTA subnet has more layers than SNPE layers.
  • Fixed the output conversions to allocate the required buffers during initialization itself in the AIP runtime, to improve inference time.
  • Enabled honoring padding information from the HTA driver that is pre-computed by the AIP runtime, to unblock execution of more models.
  • Fixed the issue with output buffer id while converting depth2space to deconv on HTA.
  • Fixed a bug during graph transformation while folding the batchnorm on HTA.
  • Increased the DCVS relaxed sleep latency duration; this lets the power system know that the CDSP can go into a deeper sleep state. If there is no active inference request, it is better for the system to go into a deeper sleep state.

What's in Qualcomm Neural Processing SDK v1.37.0?

  • Enabled the online compiler support for HTA 1.x family of devices
  • AIP performance profile behavior is now aligned with the DSP runtime's for reduced power consumption in case of inference inactivity
  • ONNX Converter: Added support for the ONNX Pad layer (OpVer 11)
  • Bug fix: snpe-dlc-info: Fixed a MACs calculation error for the deconvolution layer

What's in Qualcomm Neural Processing SDK v1.36.0?

  • Added Java API extension to register UDO package with SNPE
  • snpe-dlc-info now prints the command-line that was used to quantize the DLC if applicable
  • Added support to handle UDO layers with multiple TF8 outputs with different quantization parameters
  • Added support for an additional profiling level (moderate) for SNPE benchmarking script and associated snpe-net-run executable for tracking initialization time metrics
  • Upgraded DSP to use Hexagon SDK 3.5.1 toolchain
  • Extended the Platform Validator to detect the HTA API version
  • Added a VOLATILE_CHECK mode for SNPE DSP runtime checking, to query runtime availability on each call instead of returning a cached result
  • Added performance modes such as LOW_POWER_SAVER, HIGH_POWER_SAVER, and LOW_BALANCED for the CPU runtime
  • Fixed a bug with propagation of the model version during conversion
  • Fixed the issue with selecting the correct output shape during graph transformation while inserting 1x1 conv2d for different input formats
  • Fixed the issue with allocation of layer descriptor while loading the network on HTA

What's in Qualcomm Neural Processing SDK v1.35.0?

  • Introduced the User-Defined Operations (UDO) feature
  • Added support for SDM720G/SM7125
  • Added support to snpe-throughput-net-run for UserBuffer input tensors (both INT8 and INT16)
  • Input batching support is added for networks that can run completely on AIP runtime
  • Added support for the tf.stack and tf.unstack ops to the DSP and CPU runtimes
  • Added support for the tf.stack, tf.unstack, tf.floor, and tf.minimum ops to the TF converter
  • Fixed some small memory leaks that are seen when repeatedly calling dlopen()/dlclose() on libSNPE.so
  • Updated the Deconvolution operation on DSP with a new kernel that improves performance on various kernel sizes and strides
  • Fixed an ssd_detection CDSP crash in the DSP runtime
  • Updated the HTA to partition the input layer, if it has a connection to a layer that is not included in the same partition
  • Improved the tiling configuration support for the depthwise convolution layer

What's in Qualcomm Neural Processing SDK v1.34.0?

  • Initial support for ops with 16-bit activations using HTA in both snpe-dlc-quantize and in the SNPE AIP runtime.
  • New option for snpe-net-run to automatically turn unconsumed tensors of the network (tensors that are not inputs to a layer) into network outputs.
  • Fixed inconsistent results on SM8250 in certain cases for depthwise convolutions.
  • Added support for the depth2space operation on the GPU.
  • Using optimized Softmax implementation in AIP networks when input activation has more than 5000 elements.
  • Truncated detection output on DSP to return valid data only.
  • Ensured weights are properly flushed to DDR for use during inference in the DSP runtime.
  • Fixed support for NV21 encoding in the DSP runtime.

What's in Qualcomm Neural Processing SDK v1.33.2?

  • Address accuracy issues for Deconvolution in the AIP runtime
  • Changed behavior of Crop layer resize, so it retains the number of copied elements on each dimension
  • Made the quantizer's --override_params option work for AIP
  • Reordered PerformanceProfile_t to be ABI compatible with 1.32.0
  • Using optimized Softmax implementation in AIP networks when input activation has more than 5000 elements

What's in Qualcomm Neural Processing SDK v1.33.1?

  • New performance modes have been added (see the sketch after this list):
    • LOW_POWER_SAVER: Runs at a lower clock than POWER_SAVER, at the expense of performance
    • HIGH_POWER_SAVER: Runs at a higher clock and provides better performance than POWER_SAVER
    • LOW_BALANCED: Runs in a lower balanced mode, providing lower performance than BALANCED
  • snpe-dlc-info adds a summary of the layer types in use in the model
  • Updated to use new BLAS functionality that leverages OpenMP. This adds a new dependency on the OpenMP shared library for Linux platforms
  • Added 32-bit bias support
  • Added init caching support for the SSD output layer on DSP
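
A minimal sketch of selecting one of the new modes through the C++ API, assuming matching PerformanceProfile_t enum values:

      #include <memory>
      #include "DlContainer/IDlContainer.hpp"
      #include "SNPE/SNPE.hpp"
      #include "SNPE/SNPEBuilder.hpp"
      #include "DlSystem/DlEnums.hpp"

      std::unique_ptr<zdl::SNPE::SNPE> buildWithMode(zdl::DlContainer::IDlContainer* container) {
          // Select the performance mode at network-build time.
          return zdl::SNPE::SNPEBuilder(container)
              .setPerformanceProfile(zdl::DlSystem::PerformanceProfile_t::HIGH_POWER_SAVER)
              .build();
      }
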
Bugs:
  • Fixed a memory leak causing increasing init time for DSP
  • Added converter support for dilated convolution when used with fakequant nodes
  • Fixed multiple bugs in snpe-onnx-to-dlc that caused errors for models containing the torch.Mul op
  • Extended TF converter support to the NMSv1 op, in addition to existing support for the v2 and v3 NMS ops
  • Fixed a TensorFlow conversion bug in infer_shape for the StridedSlice op: output_shape should be the shape of the single output, not a list of shapes
  • Fixed a bug with propagation of the model version during conversion
  • If burst mode is set, thread affinity is now set to the big cores during init and de-init, and restored to the previous setting after these actions complete
  • Fixed a segfault when using user buffers with a resizable dimension

What's in Qualcomm Neural Processing SDK v1.32?

  • Add Caffe MVN Layer support in the Caffe Converter, CPU Runtime, and DSP Runtime
  • snpe-dlc-quantize: Enabled the use of quantization parameters calculated during training. To override the SNPE-generated quantization parameters, simply pass --override_params to snpe-dlc-quantize (see the example at the end of this entry).
  • Removed deprecated command line arguments from converters. All three converters now require passing -i/--input_network for model input paths. Help menus are updated for each converter
  • snpe-dlc-diff: Added a command-line option [--diff_by_id/-i] to snpe-dlc-diff. This option allows users to compare 2 models in order (sorted by ID), as opposed to only diffing common layers
  • Added support for L2Norm layer to TensorFlow converter
  • Optimized the DSP performance for the 'Space To Depth' layer
  • Added support in the Java API for setInitCacheEnabled() and setStorageDirectory() to enable DLC caching support.
  • Allow graceful recovery after a FastRPC error: recreate the userPD after the cDSP crashes so that the user can continue the SNPE process with subsequent instances instead of having to close it. Note: all the instances associated with the previous userPD will be lost.
  • snpe-dlc-viewer: Associate each layer type to a fixed color for consistency when using snpe-dlc-viewer
  • Split the SNPE isRuntimeAvailable method into two separate functions to improve backward compatibility with existing client binaries that were built against the older signature.
Bugs:
  • TF Converter: Fixed Elementwise Broadcast support
  • ONNX Converter: Fixed bug where output dimension was incorrect when keep_dims parameter was set to False for Argmax, ReduceSum and ReduceMax.
  • ONNX Converter: Fixed bug where pad attribute was not properly parsed for Deconv Op.
  • Caffe Converter: Fixed bug when converting SSD-based models when using Python 3.
  • TF Converter: Fixed a bug where the converter removed a const op input to a Reshape op when passed through identity op(s), i.e. const -> identity -> reshape.
  • Fixed a bug where getOutputSize() would give the wrong result on output tensors in UserBuffer mode
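
For example, a typical snpe-dlc-quantize invocation with the override enabled (file names are placeholders):

      snpe-dlc-quantize --input_dlc model.dlc --input_list input_list.txt \
          --output_dlc model_quantized.dlc --override_params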

What's in Qualcomm Neural Processing SDK v1.31?

  • New patterns were added to enable running the CLE algorithm on more op patterns and model architectures
  • Added support for HeatmapMaxKeypoint and the ROI Align layer in the CPU runtime
  • Added initial L2Norm layer support in CPU runtime. No support for axis parameter yet: normalization is performed along the inner-most dimension of the input tensor
  • Support for single-input Concatenation layers was added to CPU, GPU and DSP
  • Added support for Detection Output layer on DSP runtime. Currently, only a batch of 1 is supported
  • Changed determination of number of batch dimensions in the Fully Connected layer so rank greater than 1 is always assumed to mean that there is 1 batch dimension
  • Removed a constraint on the LSTM layer in the GPU runtime that prevented batch mode operation
  • Added Tensorflow converter support for Caffe-style SSD networks
  • Added support for Leaky-RELU in the TensorFlow converter. Both the actual Leaky-Relu op and the elementwise op representation are supported and map to SNPE's Prelu op.
  • Added Argmax support to the Caffe converter, and optimized performance on the DSP runtime
  • Added a new column to snpe-dlc-info that displays the supported runtimes for each layer
  • Initial support for per-layer statistics from AIP/HTA subnets

What's in Qualcomm Neural Processing SDK v1.30?

  • Documentation has been added to reflect the new common converter command line options for input processing
  • Converters now propagate required batchnorm information for performing quantization optimizations
  • Support for the new bias correction quantization optimization which adjusts biases by analyzing float vs quantized activation errors and adjusting the model to compensate
  • ONNX converter now filters single-input Concats as no-ops, since SNPE didn't support them
  • Converter input processing now uniformly handles different input types and encodings
  • ONNX converter now supports the ConvTranspose ‘output_padding’ attribute by adding an additional pad layer after the ConvTranspose op
  • Integrates the latest flatbuffer 1.11 library which brings speed improvements and options for model size reduction
  • GPU size limitations with the ArgMax op (when setting the keepDims op attribute to false) can be worked around by enabling CPU fallback
  • Fixed DSP error with MobileNet SSD on QCS403 and QCS405
  • Fixed the issue with partitioning of deconv layer in HTA

What's in Qualcomm Neural Processing SDK v1.29?

  • Added support for the DLC reorder tool
  • Optimized HTA d32 conversions
  • Added the TF space_to_depth op for the SNPE CPU and DSP runtimes
  • Enhanced benchmarking scripts to show a further breakdown of execution time across various components
  • Added support for additional ONNX binary element-wise ops
  • Optimized the deconv layer to improve performance
  • Fixed a runtime error in the DSP runtime
  • Optimized SNPE GPU runtime performance for ShuffleNet V2 by using the profiling level config

What's in Qualcomm Neural Processing SDK v1.28?

  • Added an optional argument to isRuntimeAvailable for the DSP runtime so that it doesn't activate the DSP (see the sketch at the end of this entry)
  • Allow UB_T8 and UB_FLOAT output for snpe-net-run
  • Added a new command line option for snpe-dlc-diff to check layer names
  • Updated the --dlc argument to --output_path for snpe-caffe-to-dlc to align with the ONNX converter
  • Added --dry_run argument to snpe-onnx-to-dlc to allow evaluation for successful conversion on an ONNX model
  • Added support for the gather op in the DSP runtime
  • Added support to convert the TF MobileNet-V1-FPN-SSD model
  • Fixed a memory leak in the DSP runtime that is seen when repeatedly loading and unloading a network
  • Addressed issues on V66 DSPs related to acquiring VTCM memory
  • Fixed an issue related to multiple inputs for the Caffe converter
  • Fixed an issue in the TF converter related to element-wise sum and the atrous parameter
  • Fixed an issue in the TF converter related to tf.crop_and_resize when there are only 2 inputs
  • Fixed additional cases of uncaught exceptions with the aarch64-android-clang6.0 platform
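
A sketch of the runtime check with the optional argument; the RuntimeCheckOption_t value name (BASIC_CHECK) is an assumption for the non-activating check:

      #include "SNPE/SNPEFactory.hpp"
      #include "DlSystem/DlEnums.hpp"

      // Query DSP availability without fully activating the DSP.
      bool dspAvailable = zdl::SNPE::SNPEFactory::isRuntimeAvailable(
          zdl::DlSystem::Runtime_t::DSP,
          zdl::DlSystem::RuntimeCheckOption_t::BASIC_CHECK);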

What's in Qualcomm Neural Processing SDK v1.27.2?

  • Added support for SM8150P
  • Fixed memory leak issue on AIP runtime
  • Fixed additional cases of uncaught exceptions with the aarch64-android-clang6.0 platform

What's in Qualcomm Neural Processing SDK v1.27.1?

  • Updated the AIP runtime to support new features and fix critical bugs related to HTA. On new Android builds, HTA can support new layers: Bilinear Resize and Prelu
  • Fixed issues relating to uncaught exceptions on the aarch64-android-clang6.0 platform

What's in Qualcomm Neural Processing SDK v1.27?

  • Added new APIs for setting output tensor names on the SNPEBuilder and for fetching output tensor names for a given output layer name (see the sketch at the end of this entry)
  • Improved the peak memory usage with DLC v3 format
  • Fixed a few issues with performance and runtime failures on the DSP runtime
  • Fixed a few issues and improved error handling in the Platform Validator
  • Fixed issues with the Pooling and Instance Norm layers in the TensorFlow converter
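
A minimal sketch of the new APIs as described above; the tensor and layer names are hypothetical:

      #include <memory>
      #include "DlContainer/IDlContainer.hpp"
      #include "SNPE/SNPE.hpp"
      #include "SNPE/SNPEBuilder.hpp"
      #include "DlSystem/StringList.hpp"

      std::unique_ptr<zdl::SNPE::SNPE> buildWithOutputs(zdl::DlContainer::IDlContainer* container) {
          // Request a specific output tensor by name.
          zdl::DlSystem::StringList outputTensors;
          outputTensors.append("logits");
          return zdl::SNPE::SNPEBuilder(container)
              .setOutputTensors(outputTensors)
              .build();
      }

      // Later, fetch the output tensor names behind a given output layer:
      //   auto names = snpe->getOutputTensorNamesByLayerName("output_layer");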

What's in Qualcomm Neural Processing SDK v1.26?

  • Added support for the ONNX Gather Op in the ONNX Converter and CPU runtime
  • Optimized DeConvolution Layer for the DSP runtime
  • Support for tf.nn.moments in the TF converter, CPU and DSP runtimes
  • Added TF Reflect Pad support for the DSP runtime
  • Added symmetric quantizer option in snpe-dlc-quantize
  • Added support for batch > 1 when using the Scale Layer on the DSP runtime
  • Updated Platform Validator python script to be OS-independent
  • Added additional optimizations for HTA input conversion

What's in Qualcomm Neural Processing SDK v1.25.1?

This release focuses on a few key bug fixes for the AIP runtime.

  • Fixed accuracy issues on the AIP runtime
  • Added support for UB_TF8 with the AIP runtime
  • Added support for dilated depthwise convolution on GPU runtime

What's in Qualcomm Neural Processing SDK v1.25?

This release focuses on adding support for multiple subnets within the AIP runtime and on upgrading the DLC format to improve load-time performance and memory consumption. In addition, this release fixes critical issues in the DSP runtime and adds support for new operations in the TensorFlow and ONNX converters and the DSP runtime.

  • There is a known MobileNet benchmark performance regression due to variance in benchmarks and changes made to improve accuracy
  • Added option to request larger memory allocations on DSP for improved init time, at the expense of more memory use
  • The AIP runtime does not currently support the ub_tf8 data mode
  • Support for Android GCC build variants will be discontinued after the 1.25.0 release
  • The last release for the Qualcomm Flight platform (arm-linux-gcc4.8hf) will be the 1.25.0 release
  • x86 architecture support will move to Ubuntu 16.04 OS from Ubuntu 14.04 after the 1.27.0 release
  • The x86 binaries will move to clang 7 after the 1.26.0 release
  • A few DSP performance numbers have improved, as measurements are reported on quantized DLCs starting with the 1.25.0 release

What's in Qualcomm Neural Processing SDK v1.24?

This release focuses on adding support for multiple inputs and multiple outputs on each subnet of the AIP runtime, and adds setProfilingLevel API support for the AIP and CPU runtimes.

  • There is a known conversion issue with the snpe-caffe-to-dlc-udl tool for converting a custom UDL layer, which will be resolved in the next release
  • Support for Android GCC build variants will be discontinued after the 1.25.0 release
  • x86 architecture support will move to Ubuntu 16.04 OS from Ubuntu 14.04 after the 1.27.0 release
  • The x86 binaries will move to clang 7 after the 1.26.0 release

What's in Qualcomm Neural Processing SDK v1.23.1?

This release focuses on improving the initialization/de-initialization times along with adding important timing/accuracy fixes for various Ops.

  • Added support for non-max suppression and crop-and-resize layers in the TensorFlow converter
  • Fixed output inconsistency when multiple instances run concurrently on the DSP runtime
  • Support for Android GCC build variants will be discontinued after the 1.25.0 release
  • x86 architecture support will move to Ubuntu 16.04 OS from Ubuntu 14.04 after the 1.27.0 release

What's in Qualcomm Neural Processing SDK v1.22.0?

This is a major release that adds support for two new Snapdragon Mobile Platforms, Snapdragon 855 and Snapdragon 675. We introduce support for the Qualcomm® Hexagon™ Tensor Accelerator (“HTA”) through the new “AIP” runtime, which executes neural networks on HTA and falls back to HVX where necessary. The following major features round out the usual collection of bug fixes and smaller features:

  • Support for the Snapdragon 855 mobile platform on the Hexagon DSP with Tensor Accelerator and Vector eXtensions, Adreno GPU and CPU
  • Support for the Snapdragon 675 mobile platform on the Hexagon DSP, Adreno GPU and CPU
  • Added new AIP runtime for 855
  • Added priority control for DSP workloads
  • Support for manually setting quantization ranges
  • Added the new ‘snpe-throughput-net-run’ tool with support for simultaneous execution on different cores (sketched below)
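
An illustrative invocation running a model concurrently on two cores; the flag spellings are assumptions based on later SNPE documentation:

      snpe-throughput-net-run --container model.dlc --duration 60 \
          --use_dsp --use_gpu --perf_profile burst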

What's in Qualcomm Neural Processing SDK v1.19.2?

The focus of this release is to add new operations, fill gaps in operator support, and optimize existing operations such as Deconvolution.

  • Support for the Qualcomm QCS605 SoC on the Hexagon DSP (Android, Linux) and on Adreno GPU and CPU
  • Added support for the ELU operation for TensorFlow and ONNX on GPU and CPU
  • Added support for the Power operation for Caffe2 on GPU
  • Added support for Python 3.4
  • Optimized the Deconvolution, Slice and large Softmax operations on DSP

What's in Qualcomm Neural Processing SDK v1.18.0?

This release brings in support for three Snapdragon Mobile Platforms, broadens compatibility with MobileNet SSD networks and expands the supported operations on TensorFlow and ONNX converters. In addition, this release optimizes support for batching, especially when executing MobileNets on the DSP runtime.

  • Support for the Snapdragon 632 mobile platform on the Hexagon DSP, Adreno GPU and CPU
  • Support for the Snapdragon 439 and 429 mobile platforms on Adreno GPU and CPU
  • Improved compatibility of MobileNets networks, including extended support for MobileNet SSD variations
  • Support for the TensorFlow ‘pad’ and elementwise subtraction on Adreno GPUs
  • Added support for ChannelShuffle to the TensorFlow converter
  • Added support for Shape and Pad to the ONNX converter

What's in Qualcomm Neural Processing SDK v1.17.0?

This release completes a few features and focuses on quality and stability while bringing some minor optimizations with it.

  • Added batching support to DSP. All runtimes have basic batching support now.
  • Extended batching support to the ChannelShuffle layer
  • Extended Caffe Scale layer support to Snapdragon DSPs
  • Optimizations around effective utilization of the DSPs
  • Updated SDK examples

What's in Qualcomm Neural Processing SDK v1.16.0?

The major addition of this release is support for input batching, which means being able to process input tensors with more than one element on the ‘batch’ dimension. This applies to Caffe, Caffe2, TensorFlow, and ONNX models when run on the Snapdragon GPU and CPU cores.

  • Input batching on Snapdragon GPU and CPU
  • Support for a new layer: ChannelShuffle (on GPU and CPU, for Caffe2 models)
  • Optimized the Sigmoid, Batch Normalization and Instance Normalization layers
  • Added the Inception-v3 model to the example APP

What's in Qualcomm Neural Processing SDK v1.15.0?

This release adds support for Caffe-based MobileNet SSD networks and introduces accelerated Instance Normalization, along with initial support for Grouped Deconvolutions, per-channel Batch Normalization, and a Power layer. See the Layers and Limitations sections of the Reference Guide (available online and in the SDK) for more details.

  • Support for Caffe-based MobileNet SSD
  • Support for new layers: Instance Normalization
  • Extended support for Grouped Deconvolution and 1D Batch Normalization
  • MobileNet SSD is 49% faster on GPU 16-bit
  • On average networks are 9% faster across supported chipsets and acceleration cores

What's new in Qualcomm Neural Processing SDK v1.14.0?

The ONNX 1.0 open format for deep learning models is welcomed in our March SDK release. For the list of supported operations please refer to the documentation in the SDK, or to the Documentation section of this website. This release also adds support for two new layers and a new performance profile mode.

  • Support for ONNX 1.0 models (Beta)
  • Support for new layers: Generate Proposals, and RoIAlign
  • Added a manual performance mode

What's new in Qualcomm Neural Processing SDK v1.13.0?

This update increases inference performance, and in particular adds support for the new digital signal processor included in the Snapdragon 845 mobile platform. This release also adds optimization to the 16-bit floating point runtime.

  • Support for the digital signal processor in the Snapdragon 845 mobile platform
  • Performance increase on the 16-bit floating point runtime
  • Performance improvements on the GPU runtimes
  • Initial support for Generate Proposals and RoiAlign layers for Caffe2, on the DSP runtime

What's new in Qualcomm Neural Processing SDK v1.12.0?

This large update introduces a fully new accelerated runtime for 16-bit GPU computation, and support for a TensorFlow-style SSD network with MobileNets. We also introduce new library variations and optimizations.

  • Support for MobileNet SSD on CPU and GPU
  • Added a GPU 16-bit floating-point runtime
  • Optimizations to the DSP runtime for the Snapdragon 845 mobile platform
  • Added Android LLVM libraries
  • Support for shared Symphony System Manager SDK libraries

What's new in Qualcomm Neural Processing SDK v1.10.1?

This release adds support for new Snapdragon platforms, deploys a fully new DSP runtime, fixes bugs and completes MobileNets support.

  • Initial support for the Snapdragon 845 mobile platform
  • Support for MobileNets on DSP; note that 8-bit quantization may not work well on this network structure
  • Upgraded the DSP acceleration runtime for greater performance and broader compatibility
  • Fixed Faster R-CNN UserBuffers operation
  • Support for Snapdragon Flight boards