Release Notes

What's in Qualcomm Neural Processing SDK v1.49.0?

  • Optimized 16-bit quantization performance with L2 cache prefetch and aligned buffer load/save in DSP runtime
  • Enabled Matmul support in SNPE HTP
  • ONNX Converter: Added support for edge padding
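
For readers unfamiliar with the term, edge padding (the "edge" mode of the ONNX Pad operator) extends a tensor by repeating its border values. The snippet below is a minimal pure-Python sketch of that behavior on 1-D and 2-D lists. The helper names edge_pad_1d and edge_pad_2d are hypothetical and are not part of the SDK or its converters.

```python
def edge_pad_1d(values, before, after):
    """Pad a 1-D list by repeating its first and last elements."""
    if not values:
        raise ValueError("cannot edge-pad an empty sequence")
    return [values[0]] * before + list(values) + [values[-1]] * after


def edge_pad_2d(rows, top, bottom, left, right):
    """Pad a 2-D list (list of rows) by repeating its border rows and columns."""
    if not rows:
        raise ValueError("cannot edge-pad an empty matrix")
    # Pad each row horizontally first, then replicate the first/last padded row.
    padded = [edge_pad_1d(row, left, right) for row in rows]
    top_rows = [list(padded[0]) for _ in range(top)]
    bottom_rows = [list(padded[-1]) for _ in range(bottom)]
    return top_rows + padded + bottom_rows


if __name__ == "__main__":
    print(edge_pad_1d([1, 2, 3], 2, 1))
    for row in edge_pad_2d([[1, 2], [3, 4]], 1, 1, 1, 1):
        print(row)
```

This only illustrates the padding semantics; how the converter maps the operator onto SNPE layers is not described in these notes.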

What's in Qualcomm Neural Processing SDK v1.48.0?

  • IMPORTANT: Neural Processing SDK migrated to use Ubuntu 18.04 as the host platform
  • Updated dependencies.sh and check_python_dependencies.sh for the transition to Ubuntu 18.04, Python 3.6, and libc++9
  • Switched diagnostic logging (SNPEDiag.log) to a new file format
  • Removed use of and dependency on OpenMP in CPU runtime
  • Resolved issues with layer dump feature using --debug flag on HTP
  • Optimized performance of the PReLU layer on HTP
  • Optimized Fast RPC performance by statically mapping frequently used buffers in DSP runtime
  • Improved instance norm op accuracy for large input sizes in DSP runtime
  • Added support for using Unsigned PD in AIP runtime
  • Added support for the MobileDetEdgeTPU SSD model
  • BUG FIXES
  • Added support for models with UDO in the Android Sample App
  • Fixed an issue where bias encoding could not be overridden by quantization_overrides in the ONNX/TF converters
  • Fixed support for processing tf.layers.Dense in TF Converter
  • Fixed the issues with UDO fallback to CPU on HTP
  • Fixed a shape issue with certain structures including FC in ONNX Converter
  • Fixed Unpack layer indexing error on HTP
  • Fixed overflow issue in instance norm op when variance is too small in DSP runtime
  • Optimized input node followed by concat on HTP
  • Added session reset logic to handle the case when DSP goes to SSR
  • Improved the performance of 7x7 Depthwise Conv2D op on HTP
  • Enabled keep dims support for Reduce min/max/sum layers on HTP
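
As a brief illustration of the keep dims option mentioned for the Reduce min/max/sum layers above: with keep dims enabled, the reduced axis is retained with size 1 instead of being dropped. The sketch below shows this on a 2-D list reduced along its last axis. The function reduce_last_axis is a hypothetical helper written for this illustration only and does not reflect the HTP implementation.

```python
def reduce_last_axis(matrix, op, keep_dims):
    """Reduce each row of a 2-D list with `op` (e.g. min, max, sum).

    With keep_dims=True the result keeps the reduced axis with size 1,
    so an N x M input yields an N x 1 output; otherwise it yields N values.
    """
    reduced = [op(row) for row in matrix]
    if keep_dims:
        return [[value] for value in reduced]
    return reduced


if __name__ == "__main__":
    data = [[1, 5, 2], [7, 0, 3]]
    print(reduce_last_axis(data, max, keep_dims=True))
    print(reduce_last_axis(data, max, keep_dims=False))
```

Keeping the reduced axis is mainly useful when the result must broadcast against the original tensor in a later elementwise op.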

What's in Qualcomm Neural Processing SDK v1.47.0?

  • Added a new matching pattern for ResizeNearestNeighbor to the TF Converter
  • Added support for TF 1.15 NonMaxSuppressionV3 translation
  • Added a necessary restriction when optimizing graphs containing MatMul in DSP runtime
  • Added quantization parameters for scale and offset to resolve 0 output in DSP runtime
  • Added scale layer in offline prepare in DSP runtime
  • Updated the Embedding layer support on HTP to handle inputs with a range greater than 255
  • Enabled Normalize layer (part of caffe_ssd fork) translation support in Caffe converter
  • Added opset 11 support for the ConvTranspose op in ONNX converter
  • BUG FIXES
  • Fixed the inputs of eltwise sub not being broadcast in SNPE CPU
  • Fixed problem with TensorFlow conversion of PReLU layers that contain the AddV2 op
  • Fixed bug where buffer attribute for element size was returning the wrong value for 16-bit tensors
  • Fixed 16-bit dequantization issue when output data length is not aligned to 128

What's in Qualcomm Neural Processing SDK v1.46.0?

  • Optimized argmax op L2 cache prefetch in DSP runtime
  • BUG FIXES
  • Fixed an issue where the Lrn_d32 op failed for window size 1 in DSP runtime
  • Fixed an issue where InputSupernode failed in an edge case in DSP runtime

What's in Qualcomm Neural Processing SDK v1.45.3?

  • Accuracy fixes for various Layers on HTP
  • Init/De-init time improvements
  • Inference Performance Improvements

What's in Qualcomm Neural Processing SDK v1.43.0?

  • Improved the input/output conversion times for models with a depth of 4 on AIP runtime
  • Enabled initial support for constant layers along with elementwise Op on HTA
  • Added support for opaque float concat operation in SNPE DSP concat layer
  • Added support for Caffe's "Clip" layer in the caffe converter
  • Added int16 example to snpe-sample app
  • BUG FIXES
  • Fixed the crash while running multi-threading applications with user buffer mode on AIP runtime
  • Fixed bug in ONNX converter that used a hard-coded name for the sequence length input of the LSTM operator
  • Fixed bug in ONNX converter for Unsqueeze layer, which got a key-error with static inputs
  • Fixed a bug in l2_fetch usage during output conversion, which significantly improved performance for some models running on AIP runtime
  • Fixed the issue with generation of HTA-enabled DLC for denoise model
  • Fixed the segmentation fault during DLC generation with specific inputs on HTA
  • Fixed PlatformValidator.hpp referencing a non-existent #include

What's in Qualcomm Neural Processing SDK v1.42.2?

  • Fixed a bug in l2_fetch usage during output conversion, which significantly improved performance for some models running on AIP runtime

What's in Qualcomm Neural Processing SDK v1.42.0?

  • Removed V60 DSP libs from SNPE SDK
  • Enabled the AIP runtime support for generating the intermediate outputs from HTA with online compiler
  • Enabled multithreading for the re-quantize process in DSP runtime
  • Added optional parameter to set the hysteresis period for sustained high and burst profiles in DSP runtime
  • BUG FIXES
  • Added support for opaque float concat operation in SNPE DSP concat layer
  • Fixed bug in UserBufferTF8 where retrieving the encoding would always return null
  • Fixed box decoder performance issue on mobilenet v2 ssd model for DSP runtime
  • Fixed tanh performance issue by replacing QuantizedTanh_8_ref with QuantizedTanh_8 op in DSP runtime

What's in Qualcomm Neural Processing SDK v1.41.0?

  • Added MatMul support on the CPU runtime
  • Added support for new version of 7250 with integrated PMIC module
  • User Defined Operations (UDO) with weight parameters have been added to demonstrate both quantization and network execution on CPU and DSP runtime cores respectively

What's in Qualcomm Neural Processing SDK v1.40.0?

  • Added DSP Graph Caching support for AIP models with HVX subnets
  • Upgraded DSP to use Hexagon SDK 3.5.2 toolchain
  • Added support for 16-bit UDO layers in DSP
  • Added support for large average pooling, reduce_mean layer and improved elementwise_mul support for larger tensor sizes
  • BUG FIXES
  • Fixed the issue with buffer ordering during the execution of batched models on AIP runtime
  • Fixed issue with SsdDetectionOut when number of classes is only 1
  • Fixed accuracy issue with Correlation 1D op
  • Fixed improper processing when 16-bit input quantization is used in certain cases
  • Fixed scaling logic in convert_16 op

What's in Qualcomm Neural Processing SDK v1.39.1?

  • Update to v1.39.0 to address a performance regression of the Mobilenet SSD model on AIP runtime