Release Notes

What's in Qualcomm Neural Processing SDK v1.58.0?

  • Converter: Enabled broadcasting of weights and bias for BatchNorm layer to match channel dimensions
  • DSP: Enabled the support for Elementwise Log and Neg Ops on HTP
  • DSP: Enabled support for all axis values in reduce mean with axis size of 2
Known Issues:
  • Minor reduction in accuracy for VGG16 is observed
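
The BatchNorm broadcasting item above can be illustrated generically. The sketch below is plain NumPy, not the SNPE converter API; it simply shows a scalar (or size-1) weight/bias being broadcast up to the channel count so the parameter shapes match the channel dimension:

```python
import numpy as np

def broadcast_bn_params(weight, bias, num_channels):
    # Hypothetical helper (not SNPE API): tile a scalar or size-1
    # BatchNorm weight/bias across all channels so the parameter
    # shapes match the channel dimension.
    weight = np.asarray(weight, dtype=np.float32)
    bias = np.asarray(bias, dtype=np.float32)
    weight = np.broadcast_to(weight, (num_channels,)).copy()
    bias = np.broadcast_to(bias, (num_channels,)).copy()
    return weight, bias

w, b = broadcast_bn_params(0.5, [0.25], num_channels=4)
print(w)  # [0.5 0.5 0.5 0.5]
print(b)  # [0.25 0.25 0.25 0.25]
```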

What's in Qualcomm Neural Processing SDK v1.57.0?

  • ONNX Converter: Added dry-run-mode support for reporting ops that are not in the ONNX schema domain
  • Converters: Corrected inaccurate MACs/params calculations for several ops following re-analysis
  • CPU Runtime: Set the max detections to keep top K for Caffe SSD network
  • Converters: Fixed an axis-tracking bug in Permute when the input is in BTF format
  • Converters: Removed obsolete ssd_permute_param parameter in caffe converter permute translation
  • Converters: Added command line argument to override data type for inputs
  • TFLite Converter: Enabled conversion of BERT-style models
  • Converters: Fixed coefficient input broadcasting issue for ONNX Prelu operation
  • DSP Runtime: Optimized performance for strided_slice operator for V65/V66
  • DSP Runtime: Fixed axis quantization not adding all of the output's fixedPointParams to bufferDeltas
  • DSP Runtime: Accuracy Fixes and Improvements for PReLU on V65/V66
  • ONNX Converter: Fixed issue with improper handling of Elementwise Div during conversion
  • DSP Runtime: Improved handling of Pad ops during prepare for HTP
Known Issues:
  • Graph preparation fails when the --enable_init_cache option is used in DSP Runtime for HTP

What's in Qualcomm Neural Processing SDK v1.56.2?

  • Clarified the documentation/example for specifying UDO data types
  • Converter: Added new layernorm sequence for pattern matching and added a constraint to enforce MatMul layer's constant second input to 8-bit tensor in quantized model
  • ONNX Converter: Added support for scale/offset quantization overrides
  • ONNX Converter: Fixed a warning for the Const operator from Opset 11
  • ONNX Converter: Fixed a scale-factor calculation error caused by mixing the height and width dimensions
  • CPU Runtime: Added support for the LayerNorm layer
  • CPU Runtime: Set the max detections to keep top K for Caffe SSD networks
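
As background for the scale/offset quantization overrides mentioned above, the sketch below illustrates the common asymmetric scale/offset mapping that such overrides control. The function names and the sign convention are assumptions for illustration, not SNPE's implementation or override-file format:

```python
import numpy as np

def quantize(x, scale, offset, bits=8):
    # Asymmetric quantization: map real values onto [0, 2^bits - 1].
    # `offset` here plays the role of the zero point (a convention
    # assumed for this sketch; SNPE's own convention may differ).
    qmin, qmax = 0, (1 << bits) - 1
    return np.clip(np.round(x / scale) + offset, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, offset):
    return scale * (q.astype(np.float32) - offset)

x = np.array([-0.5, 0.0, 0.5, 1.0], dtype=np.float32)
scale, offset = 1.5 / 255.0, 85           # covers roughly [-0.5, 1.0]
q = quantize(x, scale, offset)
x_hat = dequantize(q, scale, offset)
print(q)                                  # [  0  85 170 255]
print(np.max(np.abs(x - x_hat)))          # round-trip error < scale/2
```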

What's in Qualcomm Neural Processing SDK v1.55.0?

  • Added support for the OneHot operation across the SNPE converters with runtime support available on SNPE CPU
  • ONNX Converter: Added support for LSTM & CRNN
  • ONNX Converter: Added support for Exp layer
  • ONNX Converter: Change to stop parsing at specified output nodes
  • Caffe Converter: Changed the conversion of SSD models to use the DetectionOutput layer. Re-converting these models is strongly recommended as the old layer will be removed in the future.
  • Caffe Converter: Added support for Caffe Power scale/shift parameters
  • Caffe Converter: Warn for uninitialized weight/bias parameters in the BatchNorm layer and initialize them to default values
  • Converters: Enabled conversion of the Cast op in the TFLite and PyTorch converters
  • DSP Runtime: Added support for LSTM
  • DSP Runtime: Optimized mirror padding for better vtcm utilization on HTP
  • DSP Runtime: Fixed the issue of invalid cache record added to DLC while doing offline prepare for HTP
  • Converters: Fixed Softmax and Reduction Ops to have default case for output_buf axis format
  • ONNX Converter: Fixed an issue with the Slice layer when the end is set to INT_MIN
Known Issues:
  • Slight accuracy drop is observed for Caffe Mobilenet SSD for GPU runtime
  • Performance regressions are observed on HTP for Caffe based detection models
  • Intermittent stability issue is observed on the CPU runtime

What's in Qualcomm Neural Processing SDK v1.54.2?

  • TF Converter: Added support for detecting eltwise pattern for batchnorm layer with fakequant inputs
  • PyTorch Converter: Added the initial PyTorch converter and its documentation
  • Converters: Added support for the Caffe Reduction layer's Sum and Mean ops
  • Quantizer: Added support to make Convert Operator upscale and downscale quantization parameters loss free
  • ONNX Converter: Added support for LSTM & CRNN
  • Converters: When a network's outputs are represented by Identity ops, the network now retains the original output names. The Identity ops are still stripped, as before, but the name of each Identity's input is updated to the Identity's output name; previously the output name defaulted to the name of the Identity's input. This only affects networks whose outputs are Identity ops: their output names now match the original framework model's output names rather than the preceding node's output names.
  • DSP Runtime: Added support for LSTM
  • Converters: Added batch dimension to anchor input data conversion from tensorflow corner style to center style for DetectionOutput operation optimization
  • ONNX Converter: Added support to pre-apply ONNX batchnorm scale and bias quantization encodings before getting consumed by Converter to compute weights and bias
  • Converters: Added support for reverse-engineering the SAME padding mode from explicit pad values
  • DSP Runtime: Fixed a scratch-buffer over-access issue in RoiAlign
  • DSP Runtime: Fixed graph prepare issue for reshape const node
  • DSP Runtime: Optimized reduce mean performance when reduced on channel
  • Java API: Added protection when removing tensors to avoid crashing in a multithreaded application
Known Issues:
  • Slight accuracy regressions are observed for Mobilenet_v1_quantaware and Mobilenet_V2_SSD_quantaware on the HTP runtime
  • Performance regression is seen on DeeplabV3 model with online graph preparation
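
The SAME-padding item in the list above (reverse-engineering SAME padding from explicit pad values) can be sketched as follows. This is a hedged illustration built on TensorFlow's published SAME rule, not SNPE's actual converter code:

```python
import math

def is_same_padding(in_size, kernel, stride, pad_before, pad_after):
    # TensorFlow's SAME rule: output size is ceil(in/stride); the total
    # padding needed to produce it is split smaller-before / larger-after.
    out_size = math.ceil(in_size / stride)
    total = max((out_size - 1) * stride + kernel - in_size, 0)
    return pad_before == total // 2 and pad_after == total - total // 2

# 224-wide input, 3-wide kernel, stride 2: SAME implies pads (0, 1).
print(is_same_padding(224, 3, 2, 0, 1))  # True
print(is_same_padding(224, 3, 2, 1, 1))  # False
```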

What's in Qualcomm Neural Processing SDK v1.53.2?

  • Tool: Quantizer: Added support for fake quant operators in snpe-dlc-quantize
  • Tools: TF Converter: Support for logical_and, equal, greater, greater_equal, less, less_equal, not_equal, logical_or, select
  • Tool: TensorFlow Converter: Added support for Identity nodes that act as graph output nodes
  • Tool: ONNX Converter: Fixed an incorrect default bias shape in the ConvTranspose translation
  • Tool: TF Converter: Fixed an issue where TF model conversion produced a single static node even when the start of the network was provided as a command-line input
Known Issues:
  • Accuracy regressions on VGG16, Alexnet and Mobilenet_v1 on HTP Runtime are observed. Mobilenet_v1 accuracy regression can be recovered by using Enhanced Quantization

What's in Qualcomm Neural Processing SDK v1.52.0?

  • Tool: Converters: Removed pre-broadcasting of constant tensors, resulting in smaller converter output files
  • Tool: Converter: Added Converter support for Nd Reshape layer
  • Tool: Converter: Added CastOp support for TF
  • Tool: Converter: Added support for static subgraph resolutions at conversion time
  • Tool: Converter: Added support for tensor dtype for TF fill op translation
  • NPE HTP: Enabled the support for float input of elementwise binary op in offline and online graph preparation
  • Converters: ONNX: Added support for NonMaxSuppression and updated the Cast op to ensure proper type tracking
  • Converters: Common: Updated op-squashing logic to attempt a squash into the subsequent op when a node's input buffer has multiple consumers
  • NPE DSP: Fixed variance accuracy loss in InstanceNormalization on HTP
  • NPE GPU: Added optimized kernel for ReduceMean Operation
  • Tool: Converter: Fixed a bug in the TF FullyConnected translation where inputs were intermittently out of order
  • NPE DSP: Fixed an issue where freeing an uninitialized pointer led to random crashes
  • NPE DSP: Optimized specific unpack->elementwise sequences for certain models on HTP
  • NPE AIP: Optimized the input conversion for the models involving padding along width dimension

What's in Qualcomm Neural Processing SDK v1.51.0?

  • Converter Tool: Added support for Onnx WhereOp
  • Added support for edge padding type for pad operation in GPU runtime
  • Neural Processing Engine DSP: Enabled support for ElementWiseUnary abs layer on HTP
  • GPU Runtime: Added support for asymmetric reflect padding for pad operation
  • UDO: Allow users to specify a different datatype for each core in single config file
  • UDO: Updated the HTML documentation and sample app to provide an example of loading a UDO package
  • DSP Runtime: Fixed the context leak on HTP targets during repeated init/deinit scenarios
  • Neural Processing Engine: Optimized the init stage to complete faster
  • Neural Processing Engine DSP: Optimized maxpool with stride 2x1 on HTP
  • Neural Processing Engine DSP: Optimized large concat ops to fit into memory
  • Neural Processing Engine DSP: Optimized the init on HTP
  • Neural Processing Engine DSP: Graph prepare is optimized for HTP targets to be able to run bigger graphs
  • Neural Processing Engine DSP: Fixed the issue with CDSP not going to sleep when the model is de-initialized
  • Fixed HMX hysteresis-management issues on HTP, including correct timer-expiry handling and deadlock avoidance when a hysteresis timeout and de-init occur at around the same time

What's in Qualcomm Neural Processing SDK v1.50.0?

  • Tool: Quantizer: Added SNPE Quantizer support for is_symmetric field used in updated AIMET specification
  • DSP Runtime: Improved instance norm op accuracy when input size is big
  • DSP Runtime: Enabled edge padding support for v65/v66 targets
  • Tool: Tensorflow Converter: Resolved Xiaomi issue where TF Mul was not being translated correctly
  • Fixed issues with offline prepare of DeepLabV3 for SOCs with 2MB VTCM
  • Fixed issue in HTP prepare with certain combinations of Conv followed by other layers
  • Improved Convolution performance on HTP when horizontal and vertical stride are not equal
  • Improved accuracy of Instance Norm on DSP
  • Fixed DSP clock drop issue by adding clock vote hysteresis support
  • Fixed issue with quantization of ArgMax layers of certain input type
  • Bug fixed that caused failure when running an INT8 network followed by an INT16 network in DSP runtime
  • Tool: Tensorflow Converter: Fixed an issue with constant coeff input to multiple Prelu layers
  • Enhanced split logic for ConvLayer for certain input type
  • Fixed issue with elementwise add for certain input type in DSP runtime
  • Fixed issue in HTP prepare with certain combinations of addsub_op followed by other layers
  • Resolved performance issue of multiple concurrent executions using common HW resource in DSP runtime
  • Fixed HTP prepare issue with MobileBERT
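
Regarding the is_symmetric field mentioned above for the quantizer: the sketch below illustrates how a symmetric flag typically changes the derived scale/offset in AIMET-style encodings. The formulas and field semantics here are illustrative assumptions, not the authoritative AIMET specification:

```python
def derive_encoding(x_min, x_max, bits=8, is_symmetric=False):
    # Illustrative only: derive a scale/offset pair from an observed
    # [min, max] range; the symmetric branch pins the range around zero.
    if is_symmetric:
        abs_max = max(abs(x_min), abs(x_max))
        scale = (2 * abs_max) / (2 ** bits - 2)   # 254 steps for 8 bits
        offset = -(2 ** (bits - 1) - 1)           # -127 for 8 bits
    else:
        scale = (x_max - x_min) / (2 ** bits - 1)
        offset = round(x_min / scale)
    return scale, offset

print(derive_encoding(-1.0, 1.0, is_symmetric=True))   # scale 2/254, offset -127
print(derive_encoding(-0.5, 1.0, is_symmetric=False))  # scale 1.5/255, offset -85
```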

What's in Qualcomm Neural Processing SDK v1.49.0?

  • Optimized 16-bit quantization performance with l2 cache prefetch and aligned buffer load/save in DSP runtime
  • Enabled Matmul support in SNPE HTP
  • ONNX Converter: Added support for edge padding

What's in Qualcomm Neural Processing SDK v1.48.0?

  • IMPORTANT: Neural Processing SDK migrated to use Ubuntu 18.04 as the host platform
  • Updated dependencies for the transition to Ubuntu 18.04: Python 3.6 and libc++9
  • Switched diagnostic logging (SNPEDiag.log) to a new file format
  • Removed use of and dependency on OpenMP in CPU runtime
  • Resolved issues with layer dump feature using --debug flag on HTP
  • Optimized performance of the PreLU layer on HTP
  • Optimized Fast RPC performance by statically mapping frequently used buffers in DSP runtime
  • Improved instance norm op accuracy for large input sizes in the DSP runtime
  • Added support for using Unsigned PD in AIP runtime
  • Added support for the MobileDetEdgeTPU SSD model
  • Added support for models with UDO in the Android Sample App
  • Fixed an issue where bias encodings could not be overridden by quantization_overrides in the Onnx/TF converters
  • Fixed support for processing tf.layers.Dense in TF Converter
  • Fixed the issues with UDO fallback to CPU on HTP
  • Fixed a shape issue with certain structures including FC in Onnx Converter
  • Fixed an Unpack layer indexing error on HTP
  • Fixed an overflow issue in the instance norm op when the variance is very small in the DSP runtime
  • Optimized input node followed by concat on HTP
  • Added session reset logic to handle the case when DSP goes to SSR
  • Improved the performance of 7x7 Depthwise Conv2D op on HTP
  • Enabled keep dims support for Reduce min/max/sum layers on HTP

What's in Qualcomm Neural Processing SDK v1.47.0?

  • Added a new matching pattern for ResizeNearestNeighbor to the TF Converter
  • Added support for TF 1.15 NonMaxSuppressionV3 translation
  • Added necessary restriction when optimizing graph having matmul in DSP runtime
  • Added quantization parameters for scale and offset to resolve 0 output in DSP runtime
  • Added scale layer in offline prepare in DSP runtime
  • Updated the Embedding layer support on HTP to handle inputs with a range greater than 255
  • Enabled Normalize layer (part of the caffe_ssd fork) translation support in the Caffe converter
  • Added opset 11 support for the ConvTranspose op in Onnx converter
  • Fixed an issue where the inputs of eltwise sub were not being broadcast in SNPE CPU
  • Fixed a problem with TensorFlow conversion of PReLU layers that contain the Addv2 op
  • Fixed bug where buffer attribute for element size was returning the wrong value for 16-bit tensors
  • Fixed a 16-bit dequantization issue when the output data length does not align to 128