Release Notes

What's in Qualcomm Neural Processing SDK v1.54.2?

    NEW FEATURES
  • TF Converter: Added support for detecting the eltwise pattern for batchnorm layers with FakeQuant inputs
  • PyTorch Converter: Added an initial PyTorch converter, along with documentation for it
  • Converters: Added support for the Caffe Reduction layer's Sum and Mean ops
  • Quantizer: Added support to make Convert operator upscale and downscale quantization parameters lossless
  • ONNX Converter: Added support for LSTM and CRNN in the converters
  • Converters: When a network's outputs are represented by Identity ops, the network now retains the original output names even though those ops are removed. Identity ops are still stripped, as before, but the name of each Identity's input is updated to the original Identity output name; previously the output name defaulted to the name of the Identity's input. This only affects customers whose network outputs are Identity ops: those output names now match the original framework model's output names rather than the previous node's output name (see the sketch after this list)
  • DSP Runtime: Added support for LSTM
    BUG FIXES
  • Converters: Added the batch dimension to the anchor input data conversion from TensorFlow corner style to center style for the DetectionOutput operation optimization
  • ONNX Converter: Added support to pre-apply ONNX batchnorm scale and bias quantization encodings before they are consumed by the converter to compute weights and bias
  • Converters: Added support for reverse-engineering the SAME padding mode from explicit pad values
  • DSP Runtime: Fixed a scratch buffer over-access issue in RoiAlign
  • DSP Runtime: Fixed a graph prepare issue for const Reshape nodes
  • DSP Runtime: Optimized reduce mean performance when reducing along the channel dimension
  • Java API: Added protection when removing tensors to avoid crashes in multithreaded applications
    KNOWN ISSUES
  • Slight accuracy regressions are observed for Mobilenet_v1_quantaware and Mobilenet_V2_SSD_quantaware on the HTP runtime
  • A performance regression is seen on the DeeplabV3 model with online graph preparation
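
For illustration, here is a minimal sketch of the Identity output-naming change described above. The graph, the layer, and the names "input" and "scores" are hypothetical, and TF 1.x-style graph construction is assumed:

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    # Toy graph whose declared output is an Identity op.
    x = tf.placeholder(tf.float32, shape=[1, 128], name="input")
    logits = tf.layers.dense(x, 10)           # producer op, e.g. "dense/BiasAdd"
    out = tf.identity(logits, name="scores")  # Identity marks the network output

During conversion the Identity op is still stripped, but the converted model's output is now named "scores" (the Identity's name) rather than "dense/BiasAdd" (the name of the Identity's input).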

What's in Qualcomm Neural Processing SDK v1.53.2?

    NEW FEATURES
  • Tool: Quantizer: Added support for FakeQuant operators in snpe-dlc-quantize
  • Tool: TF Converter: Added support for logical_and, equal, greater, greater_equal, less, less_equal, not_equal, logical_or, and select (see the sketch after this list)
  • Tool: TensorFlow Converter: Added support for Identity nodes that act as graph output nodes
    BUG FIXES
  • Tool: ONNX Converter: Fixed an incorrect default bias shape in the ConvTranspose translation
  • Tool: TF Converter: Fixed an issue where TF model conversion produced a single static node even though the start of the network was provided as a command-line input
    KNOWN ISSUES
  • Accuracy regressions are observed for VGG16, Alexnet, and Mobilenet_v1 on the HTP runtime. The Mobilenet_v1 accuracy regression can be recovered by using enhanced quantization
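
A minimal sketch of a graph exercising some of the newly supported comparison and logical ops; the placeholder names are hypothetical and TF 1.x-style graph construction is assumed:

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    a = tf.placeholder(tf.float32, [1, 8], name="a")
    b = tf.placeholder(tf.float32, [1, 8], name="b")

    # greater / not_equal / logical_and / select, as listed above
    mask = tf.logical_and(tf.greater(a, b), tf.not_equal(a, 0.0))
    out = tf.where(mask, a, b, name="selected")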

What's in Qualcomm Neural Processing SDK v1.52.0?

    NEW FEATURES
  • Tool: Converters: Removed pre-broadcasting of constant tensors, resulting in smaller converter output files
  • Tool: Converter: Added converter support for the N-D Reshape layer
  • Tool: Converter: Added Cast op support for TF
  • Tool: Converter: Added support for static subgraph resolution at conversion time
  • Tool: Converter: Added tensor dtype support for the TF Fill op translation (see the sketch after this list, which exercises both Cast and Fill)
  • NPE HTP: Enabled support for float inputs to elementwise binary ops in offline and online graph preparation
  • Converters: ONNX: Added support for NonMaxSuppression and updated the Cast op to ensure proper type tracking
  • Converters: Common: Updated op squashing logic to attempt squashing into the subsequent op when a node's input buffer has multiple consumers
    BUG FIXES
  • NPE DSP: Fixed variance accuracy loss in InstanceNormalization on HTP
  • NPE GPU: Added an optimized kernel for the ReduceMean operation
  • Tool: Converter: Fixed a bug in the TF FullyConnected translation where inputs were intermittently out of order
  • NPE DSP: Fixed an issue where freeing an uninitialized pointer led to random crashes
  • NPE DSP: Optimized specific unpack->elementwise sequences for certain models on HTP
  • NPE AIP: Optimized input conversion for models involving padding along the width dimension
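
A minimal sketch of a TF graph exercising the Cast and Fill translations noted above; the graph and names are hypothetical:

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    x = tf.placeholder(tf.float32, [1, 4], name="input")
    idx = tf.cast(x, tf.int32, name="as_int")               # Cast op translation
    fill = tf.fill([1, 4], tf.constant(2, dtype=tf.int32))  # dtype tracked from the fill value
    out = tf.add(idx, fill, name="output")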

What's in Qualcomm Neural Processing SDK v1.51.0?

    NEW FEATURES
  • Converter Tool: Added support for the ONNX Where op (see the sketch after this list)
  • Added support for the edge padding type for the Pad operation in the GPU runtime
  • Neural Processing Engine DSP: Enabled support for the ElementWiseUnary Abs layer on HTP
  • GPU Runtime: Added support for asymmetric reflect padding for the Pad operation
  • UDO: Allowed users to specify a different data type for each core in a single config file
  • UDO: Updated the HTML documentation and sample app to provide an example of loading a UDO package
    BUG FIXES
  • DSP Runtime: Fixed a context leak on HTP targets during repeated init/deinit scenarios
  • Neural Processing Engine: Optimized the init stage to complete faster
  • Neural Processing Engine DSP: Optimized maxpool with stride 2x1 on HTP
  • Neural Processing Engine DSP: Optimized large concat ops to fit into memory
  • Neural Processing Engine DSP: Optimized init on HTP
  • Neural Processing Engine DSP: Optimized graph prepare on HTP targets so that bigger graphs can run
  • Neural Processing Engine DSP: Fixed an issue where the CDSP did not go to sleep when the model was de-initialized
  • Fixed issues related to HMX hysteresis management on HTP, including correct timer-expiry handling and deadlock avoidance when a hysteresis timeout and de-init happen around the same time
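
A minimal sketch producing an ONNX model that exercises the Where op; PyTorch is assumed here only as a convenient way to emit ONNX, and all names are hypothetical:

    import torch

    class WhereNet(torch.nn.Module):
        def forward(self, cond, a, b):
            # torch.where exports as the ONNX Where op
            return torch.where(cond, a, b)

    cond = torch.rand(1, 8) > 0.5
    a, b = torch.rand(1, 8), torch.rand(1, 8)
    torch.onnx.export(WhereNet(), (cond, a, b), "where_net.onnx", opset_version=11)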

What's in Qualcomm Neural Processing SDK v1.50.0?

    NEW FEATURES
  • Tool: Quantizer: Added SNPE Quantizer support for the is_symmetric field used in the updated AIMET specification (see the sketch after this list)
  • DSP Runtime: Improved instance norm op accuracy for large input sizes
  • DSP Runtime: Enabled edge padding support for v65/v66 targets
    BUG FIXES
  • Tool: TensorFlow Converter: Resolved a Xiaomi-reported issue where TF Mul was not being translated correctly
  • Fixed issues with offline prepare of DeepLabV3 for SoCs with 2MB VTCM
  • Fixed an issue in HTP prepare with certain combinations of Conv followed by other layers
  • Improved Convolution performance on HTP when horizontal and vertical strides are not equal
  • Improved accuracy of Instance Norm on DSP
  • Fixed a DSP clock drop issue by adding clock vote hysteresis support
  • Fixed an issue with quantization of ArgMax layers for certain input types
  • Fixed a bug that caused failures when running an INT8 network followed by an INT16 network in the DSP runtime
  • Tool: TensorFlow Converter: Fixed an issue with a constant coefficient input shared by multiple PReLU layers
  • Enhanced split logic for the Conv layer for certain input types
  • Fixed an issue with elementwise add for certain input types in the DSP runtime
  • Fixed an issue in HTP prepare with certain combinations of addsub_op followed by other layers
  • Resolved a performance issue with multiple concurrent executions sharing a common HW resource in the DSP runtime
  • Fixed an HTP prepare issue with MobileBERT
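
A hedged sketch of an AIMET-style encodings file carrying the is_symmetric field noted above; the exact schema is defined by the AIMET specification, and the tensor name and values below are made up for illustration:

    import json

    overrides = {
        "activation_encodings": {},
        "param_encodings": {
            "conv1/weights": [{            # hypothetical tensor name
                "bitwidth": 8,
                "min": -0.5,
                "max": 0.5,
                "scale": 0.00392156862,
                "offset": -128,
                "is_symmetric": "True",    # field now honored by the quantizer
            }],
        },
    }

    with open("overrides.json", "w") as f:
        json.dump(overrides, f, indent=2)

A file like this can be supplied to the converters through the quantization_overrides mechanism mentioned under v1.48.0 below.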

What's in Qualcomm Neural Processing SDK v1.49.0?

  • Optimized 16-bit quantization performance with L2 cache prefetch and aligned buffer load/save in the DSP runtime
  • Enabled MatMul support in SNPE HTP
  • ONNX Converter: Added support for edge padding

What's in Qualcomm Neural Processing SDK v1.48.0?

  • IMPORTANT: Neural Processing SDK migrated to use Ubuntu 18.04 as the host platform
  • Updated dependencies.sh and check_python_dependencies.sh for the transition to Ubuntu 18.04 (Python 3.6 and libc++9)
  • Switched diagnostic logging (SNPEDiag.log) to a new file format
  • Removed use of and dependency on OpenMP in CPU runtime
  • Resolved issues with the layer dump feature using the --debug flag on HTP
  • Optimized performance of the PreLU layer on HTP
  • Optimized Fast RPC performance by statically mapping frequently used buffers in DSP runtime
  • Improved instance norm op accuracy for large input sizes in the DSP runtime
  • Added support for using Unsigned PD in AIP runtime
  • Added support for the MobileDetEdgeTPU SSD model
    BUG FIXES
  • Added support for models with UDO in the Android Sample App
  • Fixed an issue where bias encodings could not be overridden by quantization_overrides in the ONNX/TF converters
  • Fixed support for processing tf.layers.Dense in the TF Converter
  • Fixed issues with UDO fallback to CPU on HTP
  • Fixed a shape issue with certain structures, including FC, in the ONNX Converter
  • Fixed an Unpack layer indexing error on HTP
  • Fixed an overflow issue in the instance norm op when variance is very small in the DSP runtime
  • Optimized input node followed by concat on HTP
  • Added session reset logic to handle the case when the DSP goes into SSR
  • Improved the performance of 7x7 Depthwise Conv2D op on HTP
  • Enabled keep dims support for Reduce min/max/sum layers on HTP

What's in Qualcomm Neural Processing SDK v1.47.0?

  • Added a new matching pattern for ResizeNearestNeighbor to the TF Converter
  • Added support for TF 1.15 NonMaxSuppressionV3 translation
  • Added a necessary restriction when optimizing graphs containing MatMul in the DSP runtime
  • Added quantization parameters for scale and offset to resolve zero outputs in the DSP runtime
  • Added the Scale layer to offline prepare in the DSP runtime
  • Updated the Embedding layer support on HTP to handle inputs with a range greater than 255
  • Enabled translation support for the Normalize layer (part of the caffe_ssd fork) in the Caffe converter
  • Added opset 11 support for the ConvTranspose op in the ONNX converter (see the sketch after this list)
    BUG FIXES
  • Fixed the inputs of elementwise Sub not being broadcast in SNPE CPU
  • Fixed a problem with TensorFlow conversion of PReLU layers that contain the AddV2 op
  • Fixed a bug where the buffer attribute for element size returned the wrong value for 16-bit tensors
  • Fixed a 16-bit dequantization issue when the output data length is not aligned to 128
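
A minimal sketch exporting a ConvTranspose op at opset 11 for the converter to consume; PyTorch is assumed here only as a convenient way to emit ONNX:

    import torch

    # ConvTranspose2d exports as the ONNX ConvTranspose op
    deconv = torch.nn.ConvTranspose2d(in_channels=8, out_channels=4,
                                      kernel_size=2, stride=2)
    x = torch.rand(1, 8, 16, 16)
    torch.onnx.export(deconv, (x,), "deconv.onnx", opset_version=11)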

What's in Qualcomm Neural Processing SDK v1.46.0?

  • Optimized ArgMax op L2 cache prefetch in the DSP runtime
    BUG FIXES
  • Fixed an issue where the Lrn_d32 op failed for window size 1 in the DSP runtime
  • Fixed an issue where InputSupernode failed in an edge case in the DSP runtime

What's in Qualcomm Neural Processing SDK v1.45.3?

  • Accuracy fixes for various layers on HTP
  • Init/De-init time improvements
  • Inference Performance Improvements

What's in Qualcomm Neural Processing SDK v1.43.0?

  • Improved input/output conversion times for models with a depth of 4 on the AIP runtime
  • Enabled initial support for constant layers along with elementwise ops on HTA
  • Added support for the opaque float concat operation in the SNPE DSP concat layer
  • Added support for Caffe's "Clip" layer in the Caffe converter
  • Added an int16 example to the snpe-sample app
    BUG FIXES
  • Fixed a crash when running multithreaded applications with user buffer mode on the AIP runtime
  • Fixed a bug in the ONNX converter that used a hard-coded name for the sequence length input of the LSTM operator
  • Fixed a bug in the ONNX converter's Unsqueeze translation, which raised a key error with static inputs
  • Fixed a bug in l2_fetch usage during output conversion, significantly improving performance for some models running on the AIP runtime
  • Fixed an issue with generation of an HTA-enabled DLC for the denoise model
  • Fixed a segmentation fault during DLC generation with specific inputs on HTA
  • Fixed an issue with PlatformValidator.hpp referencing a non-existent #include

What's in Qualcomm Neural Processing SDK v1.42.2?

  • Fixed a bug in l2_fetch usage during output conversion, significantly improving performance for some models running on the AIP runtime