Release Notes

What's in Qualcomm Neural Processing SDK v1.38.0?

  • Enabled FC/MatMul to use VTCM on the DSP, when available.
  • Optimized 16-bit MeanVarianceNormalize in DSP runtime.
  • Added support for batchwise scalar divide operation in DSP runtime.
  • Optimized the Hard-swish operator for MobileNetV3.
  • Added support for EltwiseMin layer for ONNX converter and CPU runtime.
  • Added support for the ONNX BatchNorm layer (OpVer 9, 12) in the ONNX converter.
  • Added the Caffe preprocessing subtract_mean layer. If specified, the converter enables the preprocessing specified by a data layer's transform_param subtract_mean.
  • Extended ONNX Softmax converter support, which previously existed only for a single input rank.
  • Enabled the end user or developer to request an unsigned process domain, avoiding the requirement for signed libraries when executing SNPE on SM8250 and newer devices (see the sketch after this list).
  • Bug fixes:
  • Removed auto-quantization for the classes output in the MultiClassNMS layer and added support for float addition in the ElementwiseOp layer to handle this case.
  • Fixed an issue with enabling stats for the AIP runtime on models where the HTA subnet contains more layers than the SNPE layers.
  • Fixed the output conversions in the AIP runtime to allocate the required buffers during initialization, improving inference time.
  • Enabled honoring of padding information from the HTA driver, which the AIP runtime previously pre-computed, to unblock execution of more models.
  • Fixed an issue with the output buffer ID when converting depth2space to deconv on HTA.
  • Fixed a bug in graph transformation when folding batchnorm on HTA.
  • Increased the DCVS relaxed sleep latency duration. This lets the power system know that the cDSP can go into a deeper sleep state; when there is no active inference request, it is better for the system to sleep more deeply.
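
The unsigned-PD request can also be made from application code. Below is a minimal Java sketch, assuming the Java API mirrors the C++ RuntimeCheckOption UNSIGNEDPD_CHECK option; verify the exact constant and method names against the Javadoc shipped with your SDK version.

    import java.io.File;
    import java.io.IOException;

    import android.app.Application;

    import com.qualcomm.qti.snpe.NeuralNetwork;
    import com.qualcomm.qti.snpe.SNPE;

    final class UnsignedPdExample {
        // Builds a DSP-targeted network while requesting an unsigned process
        // domain, so signed skel libraries are not required on SM8250 and
        // newer devices. UNSIGNEDPD_CHECK is an assumed enum constant.
        static NeuralNetwork build(Application app, File dlc) throws IOException {
            return new SNPE.NeuralNetworkBuilder(app)
                    .setModel(dlc)
                    .setRuntimeOrder(NeuralNetwork.Runtime.DSP)
                    .setRuntimeCheckOption(NeuralNetwork.RuntimeCheckOption.UNSIGNEDPD_CHECK)
                    .build();
        }
    }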

What's in Qualcomm Neural Processing SDK v1.37.0?

  • Enabled online compiler support for the HTA 1.x family of devices
  • Aligned AIP performance profile behavior with the DSP runtime, reducing power consumption during inference inactivity
  • ONNX Converter: Added support for the ONNX Pad layer (OpVer 11)
  • Bug fix: snpe-dlc-info: Fixed an error in the MACs calculation for the deconvolution layer (see the note after this list)
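
As a note on the MACs fix: these notes do not spell out the accounting snpe-dlc-info uses, but a common convention for a transposed (deconvolution) layer with input size H_in x W_in, kernel k_h x k_w, C_in input channels, C_out output channels, and g groups is:

    MACs = H_in * W_in * k_h * k_w * C_in * C_out / g

The input spatial size (not the output size) appears because each input pixel is multiplied against a full kernel window per output channel; reusing the forward-convolution formula with the output size would miscount whenever stride > 1.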

What's in Qualcomm Neural Processing SDK v1.36.0?

  • Added a Java API extension to register a UDO package with SNPE (see the sketch after this list)
  • snpe-dlc-info now prints the command line that was used to quantize the DLC, if applicable
  • Added support to handle UDO layers with multiple TF8 outputs with different quantization parameters
  • Added support for an additional profiling level (moderate) for SNPE benchmarking script and associated snpe-net-run executable for tracking initialization time metrics
  • Upgraded DSP to use Hexagon SDK 3.5.1 toolchain
  • Extended the Platform Validator to detect the HTA API version
  • Added a VOLATILE_CHECK mode for SNPE DSP runtime checking, which queries runtime availability on each call instead of returning a cached result
  • Added LOW_POWER_SAVER, HIGH_POWER_SAVER, and LOW_BALANCED performance modes for the CPU runtime
  • Fixed a bug with propagation of the model version during conversion
  • Fixed an issue with selecting the correct output shape during graph transformation when inserting a 1x1 conv2d for different input formats
  • Fixed an issue with layer descriptor allocation while loading the network on HTA
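
For the UDO registration mentioned above, the exact Java entry point is not named in these notes; the sketch below is only illustrative, and registerUdoPackage is a hypothetical placeholder for the actual extension (consult the SDK Javadoc).

    import android.app.Application;

    import com.qualcomm.qti.snpe.SNPE;

    final class UdoRegistrationExample {
        // Registers a UDO package with SNPE before building any network that
        // contains UDO layers. The method name below is hypothetical; the
        // release notes only state that a Java API extension was added.
        static void register(Application app, String udoPackagePath) {
            SNPE.NeuralNetworkBuilder builder = new SNPE.NeuralNetworkBuilder(app);
            builder.registerUdoPackage(udoPackagePath); // hypothetical API name
        }
    }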

What's in Qualcomm Neural Processing SDK v1.35.0?

  • Introduced the User-Defined Operations (UDO) feature
  • Added support for SDM720G/SM7125
  • Added support to snpe-throughput-net-run for UserBuffer input tensors (both INT8 and INT16)
  • Added input batching support for networks that can run completely on the AIP runtime
  • Added support for the tf.stack and tf.unstack ops to the DSP and CPU runtimes
  • Added support for tf.stack, tf.unstack, tf.floor, and tf.minimum to the TF converter
  • Fixed some small memory leaks seen when repeatedly calling dlopen()/dlclose() on libSNPE.so
  • Updated the Deconvolution operation on DSP with a new kernel that improves performance for various kernel sizes and strides
  • Fixed an ssd_detection cDSP crash in the DSP runtime
  • Updated the HTA to partition the input layer if it has a connection to a layer that is not included in the same partition
  • Improved tiling configuration support for the depthwise convolution layer

What's in Qualcomm Neural Processing SDK v1.34.0?

  • Initial support for ops with 16-bit activations using HTA, in both snpe-dlc-quantize and the SNPE AIP runtime.
  • New option for snpe-net-run to automatically turn unconsumed tensors of the network (tensors that are not inputs to a layer) into network outputs.
  • Fixed inconsistent results on SM8250 in certain cases for depthwise convolutions.
  • Added support for the depth2space operation on the GPU.
  • Used an optimized Softmax implementation in AIP networks when the input activation has more than 5000 elements.
  • Truncated detection output on DSP to return only valid data.
  • Ensured weights are properly flushed to DDR for use during inference in the DSP runtime.
  • Fixed support for NV21 encoding in the DSP runtime.

What's in Qualcomm Neural Processing SDK v1.33.2?

  • Addressed accuracy issues for Deconvolution in the AIP runtime
  • Changed the behavior of Crop layer resize so that it retains the number of copied elements in each dimension
  • Made the quantizer's --override_params option work for AIP
  • Reordered PerformanceProfile_t to be ABI-compatible with 1.32.0
  • Used an optimized Softmax implementation in AIP networks when the input activation has more than 5000 elements

What's in Qualcomm Neural Processing SDK v1.33.1?

  • New performance modes have been added (see the sketch after this list):
  • LOW_POWER_SAVER: Runs at a lower clock than POWER_SAVER, at the expense of performance
  • HIGH_POWER_SAVER: Runs at a higher clock and provides better performance than POWER_SAVER
  • LOW_BALANCED: Runs in a lower balanced mode, providing lower performance than BALANCED
  • snpe-dlc-info now includes a summary of the layer types used in the model
  • Updated to use new BLAS functionality that leverages OpenMP; this adds a new dependency on the OpenMP shared library for Linux platforms
  • Added 32-bit bias support
  • Added init caching support for the SSD output layer on DSP
  • Bugs:
  • Fixed a memory leak that caused increasing init times for DSP
  • Added converter support for dilated convolution when used with fakequant nodes
  • Fixed multiple bugs in snpe-onnx-to-dlc that caused errors for models containing a torch.Mul op
  • Extended TF converter support to the NMS v1 op, in addition to the existing support for the v2 and v3 NMS ops
  • Fixed a TensorFlow conversion bug in infer_shape for the StridedSlice op: output_shape should be the shape of the single output, not a list of shapes
  • Fixed a bug with propagation of the model version during conversion
  • If burst mode is set, thread affinity is set to big cores during init and de-init, and restored to the previous setting after these actions complete
  • Fixed a segfault when using user buffers with a resizable dimension
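
A minimal Java sketch of selecting one of the new modes, assuming the Java API exposes a setPerformanceProfile builder method and a PerformanceProfile enum mirroring the C++ names; verify both against the Javadoc for your SDK version.

    import java.io.File;
    import java.io.IOException;

    import android.app.Application;

    import com.qualcomm.qti.snpe.NeuralNetwork;
    import com.qualcomm.qti.snpe.SNPE;

    final class LowPowerSaverExample {
        // Builds a network pinned to the new LOW_POWER_SAVER profile, which
        // runs at a lower clock than POWER_SAVER at the expense of speed.
        // setPerformanceProfile and the enum constant are assumptions here.
        static NeuralNetwork build(Application app, File dlc) throws IOException {
            return new SNPE.NeuralNetworkBuilder(app)
                    .setModel(dlc)
                    .setRuntimeOrder(NeuralNetwork.Runtime.DSP)
                    .setPerformanceProfile(NeuralNetwork.PerformanceProfile.LOW_POWER_SAVER)
                    .build();
        }
    }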

What's in Qualcomm Neural Processing SDK v1.32?

  • Added Caffe MVN layer support in the Caffe converter, CPU runtime, and DSP runtime
  • snpe-dlc-quantize: Enabled the use of quantization parameters calculated during training. To override the SNPE-generated quantization parameters, simply pass --override_params to snpe-dlc-quantize.
  • Removed deprecated command-line arguments from the converters. All three converters now require passing -i/--input_network for model input paths. Help menus have been updated for each converter
  • snpe-dlc-diff: Added the command-line option [--diff_by_id/-i] to snpe-dlc-diff. This option allows users to compare two models in order (sorted by ID), as opposed to only diffing common layers
  • Added support for the L2Norm layer to the TensorFlow converter
  • Optimized DSP performance for the 'Space To Depth' layer
  • Added support in the Java API for setInitCacheEnabled() and setStorageDirectory() to enable DLC caching (see the sketch after this list).
  • Allowed graceful recovery after a FastRPC error: the userPD is recreated after the cDSP crashes so that the user can continue the SNPE process with subsequent instances, instead of having to close it. Note: all instances associated with the previous userPD will be lost.
  • snpe-dlc-viewer: Associated each layer type with a fixed color for consistency when using snpe-dlc-viewer
  • Split the SNPE isRuntimeAvailable method into two separate functions to improve backward compatibility with existing client binaries that were built against the older signature.
  • Bugs:
  • TF Converter: Fixed Elementwise broadcast support
  • ONNX Converter: Fixed a bug where the output dimension was incorrect when the keep_dims parameter was set to False for Argmax, ReduceSum, and ReduceMax.
  • ONNX Converter: Fixed a bug where the pad attribute was not properly parsed for the Deconv op.
  • Caffe Converter: Fixed a bug when converting SSD-based models using Python 3.
  • TF Converter: Fixed a bug where the converter removed a const op input to a reshape op when it passed through identity op(s), i.e. const -> identity -> reshape.
  • Fixed a bug where getOutputSize() would give the wrong result on output tensors in UserBuffer mode
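
A minimal Java sketch of the DLC caching calls named above. setInitCacheEnabled() and setStorageDirectory() are the methods these notes name; the boolean and File argument types, and the rest of the builder usage, are assumptions to check against the Javadoc.

    import java.io.File;
    import java.io.IOException;

    import android.app.Application;

    import com.qualcomm.qti.snpe.NeuralNetwork;
    import com.qualcomm.qti.snpe.SNPE;

    final class InitCacheExample {
        // Enables init caching so repeat loads of the same DLC can skip part
        // of initialization; cached data is written to the app's files dir.
        static NeuralNetwork build(Application app, File dlc) throws IOException {
            return new SNPE.NeuralNetworkBuilder(app)
                    .setModel(dlc)
                    .setRuntimeOrder(NeuralNetwork.Runtime.DSP)
                    .setInitCacheEnabled(true)                 // named in these notes
                    .setStorageDirectory(app.getFilesDir())    // argument type assumed
                    .build();
        }
    }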