Release Notes

What's in Qualcomm Neural Processing SDK v1.36.0?

  • Added Java API extension to register UDO package with SNPE
  • snpe-dlc-info now prints the command line that was used to quantize the DLC, if applicable
  • Added support to handle UDO layers with multiple TF8 outputs with different quantization parameters
  • Added support for an additional profiling level (moderate) for the SNPE benchmarking script and the associated snpe-net-run executable, for tracking initialization time metrics
  • Upgraded DSP to use Hexagon SDK 3.5.1 toolchain
  • Extended the Platform Validator to detect the HTA API version
  • Added a VOLATILE_CHECK mode for SNPE DSP runtime checking, which queries runtime availability on each call instead of returning a cached result (see the sketch after this list)
  • Added the LOW_POWER_SAVER, HIGH_POWER_SAVER, and LOW_BALANCED performance modes for the CPU runtime
  • Fixed a bug with propagation of the model version during conversion
  • Fixed an issue with selecting the correct output shape during graph transformation when inserting a 1x1 conv2d for different input formats
  • Fixed an issue with allocation of the layer descriptor while loading a network on HTA
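
A minimal sketch of the new check mode from the C++ API. VOLATILE_CHECK is the mode named above; the overload and the exact enumerator spelling should be verified against SNPEFactory.hpp and DlEnums.hpp in this SDK drop.

    #include "SNPE/SNPEFactory.hpp"
    #include "DlSystem/DlEnums.hpp"

    // Sketch: ask whether the DSP runtime is usable right now, re-evaluating
    // availability on every call rather than returning the cached answer.
    // The enumerator name follows the release note above.
    bool isDspAvailableNow()
    {
        return zdl::SNPE::SNPEFactory::isRuntimeAvailable(
            zdl::DlSystem::Runtime_t::DSP,
            zdl::DlSystem::RuntimeCheckOption_t::VOLATILE_CHECK);
    }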

What's in Qualcomm Neural Processing SDK v1.35.0?

  • Introduced the User-Defined Operations (UDO) feature
  • Added support for SDM720G/SM7125
  • Added support to snpe-throughput-net-run for UserBuffer input tensors (both INT8 and INT16)
  • Added input batching support for networks that can run completely on the AIP runtime
  • Added support for the tf.stack and tf.unstack ops to the DSP and CPU runtimes
  • Added support for the tf.stack, tf.unstack, tf.floor, and tf.minimum ops to the TF converter
  • Fixed some small memory leaks that were seen when repeatedly calling dlopen()/dlclose() on libSNPE.so
  • Updated the Deconvolution operation on DSP with a new kernel that improves performance on various kernel sizes and strides
  • Fixed an ssd_detection cDSP crash in the DSP runtime
  • Updated the HTA to partition the input layer if it has a connection to a layer that is not included in the same partition
  • Improved the tiling configuration support for depthwise convolution layers

What's in Qualcomm Neural Processing SDK v1.34.0?

  • Initial support for ops with 16-bit activations using HTA, in both snpe-dlc-quantize and the SNPE AIP runtime.
  • New option for snpe-net-run to automatically turn unconsumed tensors of the network (tensors that are not inputs to a layer) into network outputs.
  • Fixed inconsistent results on SM8250 in certain cases for depthwise convolutions.
  • Added support for the depth2space operation on the GPU.
  • Now uses an optimized Softmax implementation in AIP networks when the input activation has more than 5000 elements.
  • Truncated the detection output on DSP to return only valid data.
  • Ensured weights are properly flushed to DDR for use during inference in the DSP runtime.
  • Fixed support for NV21 encoding in the DSP runtime.

What's in Qualcomm Neural Processing SDK v1.33.2?

  • Addressed accuracy issues for Deconvolution in the AIP runtime
  • Changed the behavior of Crop layer resize so that it retains the number of copied elements in each dimension
  • Made the quantizer --override_params option work for AIP
  • Reordered PerformanceProfile_t to be ABI compatible with 1.32.0
  • Now uses an optimized Softmax implementation in AIP networks when the input activation has more than 5000 elements

What's in Qualcomm Neural Processing SDK v1.33.1?

  • New performance modes have been added (see the sketch after this list):
      • LOW_POWER_SAVER: runs at a lower clock than POWER_SAVER, at the expense of performance
      • HIGH_POWER_SAVER: runs at a higher clock and provides better performance than POWER_SAVER
      • LOW_BALANCED: runs in a lower balanced mode, providing lower performance than BALANCED
  • snpe-dlc-info adds a summary of the layer types in use in the model
  • Updated to use new BLAS functionality that leverages OpenMP. This adds a new dependency on the OpenMP shared library for Linux platforms
  • Added 32-bit bias support
  • Added init caching support for the SSD output layer on DSP
  • Bugs:
      • Fixed a memory leak that caused increasing init time for DSP
      • Added converter support for dilated convolution when used with fakequant nodes
      • Fixed multiple bugs in snpe-onnx-to-dlc that caused errors for models containing the torch.Mul op
      • Extended TF converter support to the NMSv1 op, in addition to the existing support for the v2 and v3 NMS ops
      • Fixed a TensorFlow conversion bug in infer_shape for the StridedSlice op: output_shape should be the shape of the single output, not a list of shapes
      • Fixed a bug with propagation of the model version during conversion
      • If burst mode is set, thread affinity is now set to the big cores during init and de-init, and restored to the previous setting after those actions complete
      • Fixed a segfault when using user buffers with a resizable dimension
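
The new profiles are selected the same way as the existing ones, at network build time. A minimal C++ sketch, assuming the standard SNPEBuilder flow; the enumerator spellings follow the list above, and zdl::DlSystem::PerformanceProfile_t in DlEnums.hpp is the authoritative set.

    #include <memory>
    #include <string>

    #include "DlContainer/IDlContainer.hpp"
    #include "DlSystem/DlEnums.hpp"
    #include "SNPE/SNPE.hpp"
    #include "SNPE/SNPEBuilder.hpp"

    // Sketch: build a network that trades speed for power by selecting one of
    // the new profiles. Everything except the profile value is the usual
    // container-open / builder / build sequence.
    std::unique_ptr<zdl::SNPE::SNPE> buildLowPower(const std::string& dlcPath)
    {
        auto container = zdl::DlContainer::IDlContainer::open(dlcPath);
        if (!container) return nullptr;

        zdl::SNPE::SNPEBuilder builder(container.get());
        return builder
            .setPerformanceProfile(zdl::DlSystem::PerformanceProfile_t::LOW_POWER_SAVER)
            .build();
    }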

What's in Qualcomm Neural Processing SDK v1.32?

  • Added Caffe MVN layer support in the Caffe converter, CPU runtime, and DSP runtime
  • snpe-dlc-quantize: Enabled the use of quantization parameters calculated during training. To override the SNPE-generated quantization parameters, pass --override_params to snpe-dlc-quantize.
  • Removed deprecated command-line arguments from the converters. All three converters now require passing -i/--input_network for model input paths. Help menus are updated for each converter
  • snpe-dlc-diff: Added the command-line option [--diff_by_id/-i]. This option allows users to compare two models in order (sorted by id), as opposed to only diffing common layers
  • Added support for L2Norm layer to TensorFlow converter
  • Optimized the DSP performance for the 'Space To Depth' layer
  • Added support in the Java API for setInitCacheEnabled() and setStorageDirectory() to enable DLC caching (a rough C++ counterpart is sketched after this list)
  • Allowed graceful recovery after a FastRPC error: the userPD is recreated after the cDSP crashes, so the user can continue in the same SNPE process with subsequent instances instead of having to close it. Note: all instances associated with the previous userPD are lost.
  • snpe-dlc-viewer: Associated each layer type with a fixed color for consistency when using snpe-dlc-viewer
  • Split the SNPE isRuntimeAvailable method into two separate functions to improve backward compatibility with existing client binaries that were built against the older signature.
  • Bugs:
      • TF Converter: Fixed Elementwise Broadcast support
      • ONNX Converter: Fixed a bug where the output dimension was incorrect when the keep_dims parameter was set to False for Argmax, ReduceSum and ReduceMax.
      • ONNX Converter: Fixed a bug where the pad attribute was not properly parsed for the Deconv op.
      • Caffe Converter: Fixed a bug when converting SSD-based models using Python 3.
      • TF Converter: Fixed a bug where the converter was removing a const op input to a reshape op when passed through identity op(s), i.e. const -> identity -> reshape.
      • Fixed a bug where getOutputSize() would give the wrong result on output tensors in UserBuffer mode
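
The release note above names the Java entry points for DLC caching. A rough C++ counterpart is sketched below; the method names used here (setSNPEStorageLocation on the factory, setInitCacheMode on the builder) are assumptions based on the Java names and should be verified against SNPEFactory.hpp and SNPEBuilder.hpp.

    #include <memory>
    #include <string>

    #include "DlContainer/IDlContainer.hpp"
    #include "SNPE/SNPE.hpp"
    #include "SNPE/SNPEBuilder.hpp"
    #include "SNPE/SNPEFactory.hpp"

    // Rough sketch of DLC init caching from C++. The two cache-related calls
    // below are assumed names mirroring the Java API; check the SDK headers.
    std::unique_ptr<zdl::SNPE::SNPE> buildWithCache(const std::string& dlcPath,
                                                    const std::string& cacheDir)
    {
        // Directory where the cached init data is kept between runs.
        zdl::SNPE::SNPEFactory::setSNPEStorageLocation(cacheDir.c_str());

        auto container = zdl::DlContainer::IDlContainer::open(dlcPath);
        if (!container) return nullptr;

        zdl::SNPE::SNPEBuilder builder(container.get());
        return builder
            .setInitCacheMode(true)   // reuse cached init data on later loads
            .build();
    }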

What's in Qualcomm Neural Processing SDK v1.31?

  • New patterns were added to enable running the CLE algorithm on more op patterns and model architectures
  • Added support for HeatmapMaxKeypoint and the ROI Align layer in the CPU runtime
  • Added initial L2Norm layer support in the CPU runtime. No support for the axis parameter yet: normalization is performed along the inner-most dimension of the input tensor (see the sketch after this list)
  • Support for single-input Concatenation layers was added to CPU, GPU and DSP
  • Added support for Detection Output layer on DSP runtime. Currently, only a batch of 1 is supported
  • Changed how the number of batch dimensions is determined in the Fully Connected layer: a rank greater than 1 is now always assumed to mean there is one batch dimension
  • Enhanced the dlc-info tool to report the runtimes available per layer
  • Removed a constraint on the LSTM layer in the GPU runtime that prevented batch mode operation
  • Added Tensorflow converter support for Caffe-style SSD networks
  • Added support for Leaky-RELU in the TensorFlow converter. Both the actual Leaky-Relu op and the elementwise op representation are supported and map to SNPE's Prelu op.
  • Added Argmax support to the Caffe converter, and optimized performance on the DSP runtime
  • Added a new column to snpe-dlc-info that displays the supported runtimes for each layer
  • Initial support for per-layer statistics from AIP/HTA subnets
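
To make the inner-most-dimension behavior of the new L2Norm layer concrete, here is a small standalone sketch (not SNPE code) of the computation: for an input viewed as rows of length C, each row is divided by its L2 norm. The epsilon guard is illustrative.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Standalone illustration: L2-normalize a flat buffer along its innermost
    // dimension. For a tensor of shape [..., C], every consecutive run of C
    // elements is scaled by the inverse of its L2 norm.
    void l2normInnermost(std::vector<float>& data, std::size_t innerDim,
                         float epsilon = 1e-12f)
    {
        for (std::size_t start = 0; start + innerDim <= data.size(); start += innerDim) {
            float sumSq = 0.0f;
            for (std::size_t i = 0; i < innerDim; ++i)
                sumSq += data[start + i] * data[start + i];
            const float invNorm = 1.0f / std::sqrt(sumSq + epsilon);
            for (std::size_t i = 0; i < innerDim; ++i)
                data[start + i] *= invNorm;
        }
    }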

What's in Qualcomm Neural Processing SDK v1.30?

  • Documentation has been added to reflect the new common converter command line options for input processing
  • Converters now propagate required batchnorm information for performing quantization optimizations
  • Support for the new bias correction quantization optimization, which adjusts biases by analyzing float vs. quantized activation errors and adjusting the model to compensate (see the sketch after this list)
  • The ONNX converter now filters single-input Concat ops as no-ops, since SNPE does not support them
  • Converter input processing now uniformly handles different input types and encodings
  • ONNX converter now supports the ConvTranspose ‘output_padding’ attribute by adding an additional pad layer after the ConvTranspose op
  • Integrates the latest FlatBuffers 1.11 library, which brings speed improvements and options for model size reduction
  • GPU size limitations with the ArgMax op (when setting the keepDims op attribute to false) can be worked around by enabling CPU fallback
  • Fixed DSP error with MobileNet SSD on QCS403 and QCS405
  • Fixed an issue with partitioning of the deconv layer on HTA
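
The bias correction optimization is, at its core, a per-channel adjustment: the mean error between float and quantized activations is estimated and folded back into the layer bias. A simplified standalone sketch of that idea follows (not the SDK's implementation; the data layout and averaging here are illustrative).

    #include <cstddef>
    #include <vector>

    // Simplified illustration of bias correction: given per-channel activations
    // collected from the float model and from the quantized model, estimate the
    // mean per-channel error and subtract it from the bias so the quantized
    // layer's expected output lines up with the float layer's.
    void correctBias(const std::vector<std::vector<float>>& floatActs,  // [channel][sample]
                     const std::vector<std::vector<float>>& quantActs,  // [channel][sample]
                     std::vector<float>& bias)                          // [channel]
    {
        for (std::size_t c = 0; c < bias.size(); ++c) {
            if (floatActs[c].empty()) continue;
            double meanErr = 0.0;
            for (std::size_t s = 0; s < floatActs[c].size(); ++s)
                meanErr += quantActs[c][s] - floatActs[c][s];   // quantization-induced shift
            meanErr /= static_cast<double>(floatActs[c].size());
            bias[c] -= static_cast<float>(meanErr);             // fold the correction into the bias
        }
    }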