Release Notes

What's in Qualcomm Neural Processing SDK v1.52.0?

    NEW FEATURES
  • Tool: Converters: Removed pre-broadcasting of constant tensors, resulting in smaller converter output files
  • Tool: Converter: Added Converter support for Nd Reshape layer
  • Tool: Converter: Added CastOp support for TF
  • Tool: Converter: Added support for static subgraph resolutions at conversion time
  • Tool: Converter: Added support for tensor dtype for TF fill op translation
  • NPE HTP: Enabled support for float inputs to elementwise binary ops in offline and online graph preparation
  • Converters: ONNX: Added support for NonMaxSuppression and updated the Cast op to ensure proper type tracking
  • Converters: Common: Updated op squashing logic to attempt squashing into the subsequent op when a node's input buffer has multiple consumers
  • BUG FIXES
  • NPE DSP: Fixed variance accuracy loss in InstanceNormalization on HTP
  • NPE GPU: Added optimized kernel for the ReduceMean operation
  • Tool: Converter: Fixed bug in TF fully-connected translation where inputs were intermittently out of order
  • NPE DSP: Fixed an issue where an uninitialized pointer was freed, leading to random crashes
  • NPE DSP: Optimized specific unpack->elementwise sequences for certain models on HTP
  • NPE AIP: Optimized input conversion for models involving padding along the width dimension
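
The NonMaxSuppression support noted above refers to the standard greedy NMS algorithm from the ONNX operator set. As a minimal illustrative sketch (not the SDK's implementation), assuming boxes in (x1, y1, x2, y2) corner format:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-max suppression; returns indices of kept boxes,
    highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept
```

Here the second box overlaps the first heavily and is suppressed, while the distant third box survives: `nms([(0,0,10,10), (1,1,11,11), (50,50,60,60)], [0.9, 0.8, 0.7])` keeps indices 0 and 2.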

What's in Qualcomm Neural Processing SDK v1.51.0?

    NEW FEATURES
  • Converter Tool: Added support for Onnx WhereOp
  • Added support for edge padding type for pad operation in GPU runtime
  • Neural Processing Engine DSP: Enabled support for ElementWiseUnary abs layer on HTP
  • GPU Runtime: Added support for asymmetric reflect padding for pad operation
  • UDO: Allow users to specify a different datatype for each core in single config file
  • UDO: Updated HTML documentation and sample app to provide an example of loading a UDO package
  • BUG FIXES
  • DSP Runtime: Fixed the context leak on HTP targets during repeated init/deinit scenarios
  • Neural Processing Engine: Optimized the init stage to complete faster
  • Neural Processing Engine DSP: Optimized maxpool with stride 2x1 on HTP
  • Neural Processing Engine DSP: Optimized large concat ops to fit into memory
  • Neural Processing Engine DSP: Optimized the init on HTP
  • Neural Processing Engine DSP: Optimized graph prepare on HTP targets to support larger graphs
  • Neural Processing Engine DSP: Fixed an issue where the CDSP did not go to sleep after the model was de-initialized
  • Fixed issues related to HMX hysteresis management on HTP, including correct timer-expiry handling and deadlock avoidance when a hysteresis timeout and de-init happen around the same time
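
The edge and asymmetric reflect padding modes mentioned in this release differ in how border values are generated. A 1-D pure-Python sketch of the two modes (illustrative only, not the GPU kernel; the before/after counts may differ, which is what "asymmetric" means here):

```python
def pad_edge(seq, before, after):
    """Edge padding: replicate the border value on each side."""
    return [seq[0]] * before + list(seq) + [seq[-1]] * after

def pad_reflect(seq, before, after):
    """Reflect padding: mirror around the border element without
    repeating it; before/after may differ (asymmetric padding)."""
    return ([seq[i] for i in range(before, 0, -1)]
            + list(seq)
            + [seq[-1 - i] for i in range(1, after + 1)])
```

For example, `pad_edge([1, 2, 3], 2, 1)` yields `[1, 1, 1, 2, 3, 3]`, while `pad_reflect([1, 2, 3, 4], 2, 1)` yields `[3, 2, 1, 2, 3, 4, 3]`.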

What's in Qualcomm Neural Processing SDK v1.50.0?

    NEW FEATURES
  • Tool: Quantizer: Added SNPE Quantizer support for is_symmetric field used in updated AIMET specification
  • DSP Runtime: Improved instance norm op accuracy for large inputs
  • DSP Runtime: Enabled edge padding support for v65/v66 targets
  • BUG FIXES
  • Tool: Tensorflow Converter: Resolved Xiaomi issue where TF Mul was not being translated correctly
  • Fixed issues with offline prepare of DeepLabV3 for SOCs with 2MB VTCM
  • Fixed issue in HTP prepare with certain combinations of Conv followed by other layers
  • Improved Convolution performance on HTP when horizontal and vertical stride are not equal
  • Improved accuracy of Instance Norm on DSP
  • Fixed DSP clock drop issue by adding clock vote hysteresis support
  • Fixed issue with quantization of ArgMax layers for certain input types
  • Fixed a bug that caused failure when running an INT8 network followed by an INT16 network in the DSP runtime
  • Tool: Tensorflow Converter: Fixed issue with constant coeff input to multiple Prelu layers
  • Enhanced split logic for ConvLayer for certain input types
  • Fixed issue with elementwise add for certain input types in the DSP runtime
  • Fixed issue in HTP prepare with certain combinations of addsub_op followed by other layers
  • Resolved performance issue of multiple concurrent executions using common HW resource in DSP runtime
  • Fixed HTP prepare issue with MobileBERT
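
The is_symmetric field noted above distinguishes symmetric from asymmetric quantization ranges. As a rough illustration of the difference (an assumption-laden sketch, not the SNPE/AIMET implementation), computing (scale, offset) for an 8-bit grid:

```python
def quant_params(xmin, xmax, is_symmetric, num_bits=8):
    """Compute (scale, offset) for quantizing [xmin, xmax] to num_bits.

    Symmetric: zero-point fixed at 0, range +/-max(|xmin|, |xmax|).
    Asymmetric: the full [xmin, xmax] range is mapped onto the grid.
    Illustrative only -- real quantizers differ in rounding details.
    """
    levels = 2 ** num_bits - 1
    if is_symmetric:
        amax = max(abs(xmin), abs(xmax))
        scale = 2 * amax / levels
        offset = 0
    else:
        xmin, xmax = min(xmin, 0.0), max(xmax, 0.0)  # grid must include 0
        scale = (xmax - xmin) / levels
        offset = round(-xmin / scale) if scale else 0
    return scale, offset
```

Symmetric mode wastes part of the grid when the range is lopsided but keeps the zero-point at 0, which is cheaper for hardware; asymmetric mode uses the whole grid at the cost of a nonzero offset.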

What's in Qualcomm Neural Processing SDK v1.49.0?

  • Optimized 16-bit quantization performance with L2 cache prefetch and aligned buffer load/save in the DSP runtime
  • Enabled Matmul support in SNPE HTP
  • ONNX Converter: Added support for edge padding
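
For context on the 16-bit path above, affine quantization maps floats to a uint16 grid via a scale and offset. A minimal sketch of that mapping, assuming unsigned 16-bit storage (illustrative only, not the DSP runtime's code):

```python
def quantize_u16(values, scale, offset):
    """Map floats to uint16: q = clamp(round(v / scale) + offset, 0, 65535)."""
    return [max(0, min(65535, round(v / scale) + offset)) for v in values]

def dequantize_u16(qvals, scale, offset):
    """Inverse mapping: v = (q - offset) * scale."""
    return [(q - offset) * scale for q in qvals]
```

A round trip through the grid reproduces the input up to one quantization step (scale), which is why 16-bit tensors are markedly more accurate than 8-bit ones.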

What's in Qualcomm Neural Processing SDK v1.48.0?

  • IMPORTANT: Neural Processing SDK migrated to use Ubuntu 18.04 as the host platform
  • Updated dependencies.sh and check_python_dependencies.sh for the transition to Ubuntu 18.04 - Python 3.6, and libc++9
  • Switched diagnostic logging (SNPEDiag.log) to a new file format
  • Removed use of and dependency on OpenMP in CPU runtime
  • Resolved issues with layer dump feature using --debug flag on HTP
  • Optimized performance of the PreLU layer on HTP
  • Optimized Fast RPC performance by statically mapping frequently used buffers in DSP runtime
  • Improved instance norm op accuracy for large inputs in the DSP runtime
  • Added support for using Unsigned PD in AIP runtime
  • Added support for the MobileDetEdgeTPU SSD model
  • BUG FIXES
  • Added support for models with UDO in the Android Sample App
  • Fixed issue where bias encoding could not be overridden by quantization_overrides in the Onnx/TF converters
  • Fixed support for processing tf.layers.Dense in TF Converter
  • Fixed the issues with UDO fallback to CPU on HTP
  • Fixed a shape issue with certain structures including FC in Onnx Converter
  • Fixed Unpack layer indexing error on HTP
  • Fixed overflow issue in the instance norm op when variance is very small in the DSP runtime
  • Optimized input node followed by concat on HTP
  • Added session reset logic to handle the case when DSP goes to SSR
  • Improved the performance of 7x7 Depthwise Conv2D op on HTP
  • Enabled keep dims support for Reduce min/max/sum layers on HTP
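
The keep-dims option on the Reduce layers above controls whether a reduced axis is dropped or retained as a size-1 dimension. A 2-D pure-Python sketch of the semantics (illustrative, not the HTP kernel):

```python
def reduce_sum_2d(matrix, axis, keep_dims):
    """Reduce-sum of a 2-D list along `axis`; keep_dims retains a
    size-1 dimension where the reduction happened."""
    if axis == 0:
        sums = [sum(row[c] for row in matrix) for c in range(len(matrix[0]))]
        return [sums] if keep_dims else sums
    else:  # axis == 1
        sums = [sum(row) for row in matrix]
        return [[s] for s in sums] if keep_dims else sums
```

Keeping the size-1 dimension preserves the tensor's rank, which lets the reduced result broadcast back against the original tensor without an explicit reshape.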

What's in Qualcomm Neural Processing SDK v1.47.0?

  • Added a new matching pattern for ResizeNearestNeighbor to the TF Converter
  • Added support for TF 1.15 NonMaxSuppressionV3 translation
  • Added necessary restriction when optimizing graph having matmul in DSP runtime
  • Added quantization parameters for scale and offset to resolve 0 output in DSP runtime
  • Added scale layer in offline prepare in DSP runtime
  • Updated Embedding layer support on HTP to handle inputs with a range greater than 255
  • Enabled Normalize layer (part of the caffe_ssd fork) translation support in the Caffe converter
  • Added opset 11 support for the ConvTranspose op in Onnx converter
  • BUG FIXES
  • Fixed the inputs of eltwise sub not being broadcast in SNPE CPU
  • Fixed problem with TensorFlow conversion of PReLU layers that contain the AddV2 op
  • Fixed bug where buffer attribute for element size was returning the wrong value for 16-bit tensors
  • Fixed 16-bit dequantization issue when the output data length is not aligned to 128
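
The eltwise-sub broadcasting fix above concerns the usual NumPy-style rule: a scalar or a lower-rank tensor is stretched to match its partner before the elementwise op. A small pure-Python sketch of that behavior for 2-D inputs (illustrative only, not the SNPE CPU kernel):

```python
def broadcast_sub(a, b):
    """Elementwise subtract with broadcasting of a scalar or a 1-D row
    vector against a 2-D list; full 2-D inputs subtract directly."""
    if isinstance(b, (int, float)):              # scalar broadcast
        return [[x - b for x in row] for row in a]
    if isinstance(b[0], (int, float)):           # 1-D row broadcast
        return [[x - y for x, y in zip(row, b)] for row in a]
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

For example, subtracting the row `[1, 2]` from `[[1, 2], [3, 4]]` applies it to every row, yielding `[[0, 0], [2, 2]]`.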

What's in Qualcomm Neural Processing SDK v1.46.0?

  • Optimized L2 cache prefetch for the argmax op in the DSP runtime
  • BUG FIXES
  • Fixed issue where the Lrn_d32 op fails for window size 1 in the DSP runtime
  • Fixed issue where InputSupernode fails in an edge case in the DSP runtime

What's in Qualcomm Neural Processing SDK v1.45.3?

  • Accuracy fixes for various Layers on HTP
  • Init/De-init time improvements
  • Inference Performance Improvements

What's in Qualcomm Neural Processing SDK v1.43.0?

  • Improved input/output conversion times for models with a depth of 4 on the AIP runtime
  • Enabled initial support for constant layers along with elementwise Op on HTA
  • Added support for opaque float concat operation in SNPE DSP concat layer
  • Added support for Caffe's "Clip" layer in the caffe converter
  • Added int16 example to snpe-sample app
  • BUG FIXES
  • Fixed crash when running multi-threaded applications with user buffer mode on the AIP runtime
  • Fixed bug in ONNX converter that used a hard-coded name for the sequence length input of the LSTM operator
  • Fixed bug in ONNX converter for Unsqueeze layer, which got a key-error with static inputs
  • Fixed bug in l2_fetch usage during output conversion, significantly improving performance for some models on the AIP runtime
  • Fixed the issue with generation of HTA enabled dlc for denoise model
  • Fixed the segmentation fault issue during dlc generation with specific inputs, on HTA
  • Fixed issue with PlatformValidator.hpp reference to non-existent #include
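
Caffe's "Clip" layer, newly translated by the converter above, simply clamps each element to a [min, max] interval. A one-line sketch of the semantics (illustrative, not the converter's output):

```python
def clip(values, lo, hi):
    """Caffe-style Clip: clamp each element to the interval [lo, hi]."""
    return [min(max(v, lo), hi) for v in values]
```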

What's in Qualcomm Neural Processing SDK v1.42.2?

  • Fixed bug in l2_fetch usage during output conversion, significantly improving performance for some models on the AIP runtime

What's in Qualcomm Neural Processing SDK v1.42.0?

  • Removed V60 DSP libs from SNPE SDK
  • Enabled the AIP runtime support for generating the intermediate outputs from HTA with online compiler
  • Enabled multithread for re-quantize process in DSP runtime
  • Added optional parameter to set the hysteresis period for sustained high and burst profiles in the DSP runtime
  • BUG FIXES
  • Added support for opaque float concat operation in SNPE DSP concat layer
  • Fixed bug in UserBufferTF8 where retrieving the encoding would always return null
  • Fixed box decoder performance issue on mobilenet v2 ssd model for DSP runtime
  • Fixed tanh performance issue by replacing QuantizedTanh_8_ref with QuantizedTanh_8 op in DSP runtime