Release Notes

What's in Qualcomm Neural Processing SDK v1.43.0?

  • Improved input/output conversion times on the AIP runtime for models with a depth of 4
  • Enabled initial support for constant layers along with elementwise Op on HTA
  • Added support for opaque float concat operation in SNPE DSP concat layer
  • Added support for Caffe's "Clip" layer in the Caffe converter
  • Added int16 example to snpe-sample app
  • BUG FIXES
  • Fixed a crash when running multithreaded applications with user-buffer mode on the AIP runtime
  • Fixed a bug in the ONNX converter that used a hard-coded name for the sequence length input of the LSTM operator
  • Fixed a bug in the ONNX converter for the Unsqueeze layer, which raised a key error with static inputs
  • Fixed a bug in l2_fetch usage during output conversion, significantly improving performance for some models running on the AIP runtime
  • Fixed the issue with generation of HTA-enabled DLCs for the denoise model
  • Fixed a segmentation fault during DLC generation with specific inputs on HTA
  • Fixed a non-existent #include referenced by PlatformValidator.hpp
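The LSTM and Unsqueeze converter fixes above are instances of the same bug class: reading an ONNX node's optional inputs by a hard-coded tensor name instead of by position. A minimal sketch of positional resolution (hypothetical helper, not the SDK's actual converter code):

```python
def lstm_sequence_lens_name(node_inputs):
    """Return the name of an ONNX LSTM node's sequence_lens input, or None.

    Per the ONNX operator spec, sequence_lens is the optional input at
    index 4 (after X, W, R, B); an empty string means it was omitted.
    Resolving it positionally avoids assuming any particular tensor name.
    """
    if len(node_inputs) > 4 and node_inputs[4]:
        return node_inputs[4]
    return None
```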

What's in Qualcomm Neural Processing SDK v1.42.2?

  • Fixed a bug in l2_fetch usage during output conversion, significantly improving performance for some models running on the AIP runtime.

What's in Qualcomm Neural Processing SDK v1.42.0?

  • Removed V60 DSP libs from SNPE SDK
  • Enabled AIP runtime support for generating intermediate outputs from HTA with the online compiler
  • Enabled multithreading for the re-quantize process in the DSP runtime
  • Added an optional parameter to set the hysteresis period for the sustained high and burst profiles in the DSP runtime
  • BUG FIXES
  • Added support for opaque float concat operation in SNPE DSP concat layer
  • Fixed bug in UserBufferTF8 where retrieving the encoding would always return null
  • Fixed box decoder performance issue on mobilenet v2 ssd model for DSP runtime
  • Fixed tanh performance issue by replacing QuantizedTanh_8_ref with QuantizedTanh_8 op in DSP runtime
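Replacing a reference op with an optimized quantized op, as in the tanh fix above, typically exploits the fact that an 8-bit activation can be evaluated through a 256-entry lookup table. A generic sketch of the idea (illustrative only, not the DSP runtime's actual implementation):

```python
import numpy as np

def build_tanh_lut(in_min, in_max):
    """Precompute tanh for every possible uint8 input code."""
    codes = np.arange(256, dtype=np.float32)
    real = in_min + codes * (in_max - in_min) / 255.0
    out = np.tanh(real)
    # tanh output lies in [-1, 1]; quantize it to uint8 over that fixed range
    return np.clip(np.round((out + 1.0) * 255.0 / 2.0), 0, 255).astype(np.uint8)

def quantized_tanh(x_u8, lut):
    """Apply tanh to a quantized tensor with a single table gather."""
    return lut[x_u8]
```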

What's in Qualcomm Neural Processing SDK v1.41.0?

  • Added MatMul support on the CPU runtime
  • Added support for new version of 7250 with integrated PMIC module
  • Added User-Defined Operations (UDO) with weight parameters to demonstrate quantization and network execution on the CPU and DSP runtime cores, respectively

What's in Qualcomm Neural Processing SDK v1.40.0?

  • Added DSP Graph Caching support for AIP models with HVX subnets
  • Upgraded DSP to use Hexagon SDK 3.5.2 toolchain
  • Added support for 16-bit UDO layers in DSP
  • Added support for large average pooling and the reduce_mean layer, and improved elementwise_mul support for larger tensor sizes
  • BUG FIXES
  • Fixed the issue with buffer ordering during the execution of batched models on AIP runtime
  • Fixed issue with SsdDetectionOut when number of classes is only 1
  • Fixed accuracy issue with Correlation 1D op
  • Fixed improper processing when 16-bit input quantization is used in certain cases
  • Fixed scaling logic in convert_16 op
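The convert_16 scaling fix concerns the relationship between 8-bit and 16-bit uniform encodings of the same real range: the 16-bit step size is finer by a factor of 65535/255 = 257. A hedged sketch of the underlying arithmetic (illustrative, not the runtime's code):

```python
def step_size(real_min, real_max, bits):
    """Quantization step for a uniform encoding over [real_min, real_max]."""
    return (real_max - real_min) / (2 ** bits - 1)

def convert_8_to_16(q8, real_min, real_max):
    """Re-encode an 8-bit code as a 16-bit code over the same real range."""
    real = real_min + q8 * step_size(real_min, real_max, 8)
    return round((real - real_min) / step_size(real_min, real_max, 16))
```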

What's in Qualcomm Neural Processing SDK v1.39.1?

  • Updates v1.39.0 to address a performance regression of the MobileNet SSD model on the AIP runtime

What's in Qualcomm Neural Processing SDK v1.39.0?

  • Added graph caching support, which improves init times for DSP and AIP networks. (DSP subnets within AIP are not supported)
  • Optimized PReLU with a cubic approximation to reduce saturation loss during re-quantization
  • Added additional logging messages for debugging in DSP runtime
  • BUG FIXES
  • Fixed the issue with setting the performance profile for AIP runtime in multithreading scenarios
  • Fixed incorrect DLC generation when multiple instances of snpe-dlc-quantize run in parallel for the AIP runtime
  • Fixed potential bug with freeing threads in DSP runtime
  • Fixed issue of incorrect UDO tensor datatype in quantizer

What's in Qualcomm Neural Processing SDK v1.38.0?

  • Enabled FC/MatMul to use VTCM if available in DSP.
  • Optimized 16-bit MeanVarianceNormalize in DSP runtime.
  • Added support for batchwise scalar divide operation in DSP runtime.
  • Optimized Hard-swish operator for mobilenetV3.
  • Added support for EltwiseMin layer for ONNX converter and CPU runtime.
  • Added support for the ONNX BatchNorm layer (OpVer 9, 12) in the ONNX converter.
  • Added Caffe preprocessing subtract_mean layer support: if specified, the converter enables the preprocessing given by a data layer's transform_param subtract_mean.
  • ONNX softmax converter support only existed for rank
  • Enabled the end user/developer to request the use of an unsigned process domain, avoiding the requirement of signed libraries for SNPE execution on 8250 and newer devices.
  • BUG FIXES
  • Removed autoquantization for classes output in MultiClassNMS layer and added support for float addition in ElementwiseOp layer to handle this case.
  • Fixed the issue with enabling stats for the AIP runtime on models where the HTA subnet has more layers than SNPE layers.
  • Changed output conversions in the AIP runtime to allocate the required buffers during initialization, improving inference time.
  • Enabled honoring of padding information from the HTA driver, which was previously pre-computed by the AIP runtime, to unblock execution of more models.
  • Fixed the issue with output buffer id while converting depth2space to deconv on HTA.
  • Fixed a bug during graph transformation while folding the batchnorm on HTA.
  • Increased the DCVS relaxed sleep latency duration, letting the power system know that the CDSP can go to a deeper sleep state; when there is no active inference request, it is better for the system to enter a deeper sleep state.
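The batchnorm-folding fix above refers to the standard graph transformation that absorbs a BatchNorm into the preceding convolution's weights and bias: y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta becomes a plain convolution with rescaled weights. A generic per-channel sketch of the arithmetic (not the HTA compiler's code):

```python
import math

def fold_batchnorm_channel(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold a BatchNorm channel into one output channel's conv parameters.

    weights: the channel's conv weights (flattened); bias, gamma, beta,
    mean, var: this channel's scalars from the BatchNorm.
    """
    scale = gamma / math.sqrt(var + eps)
    folded_weights = [w * scale for w in weights]
    folded_bias = (bias - mean) * scale + beta
    return folded_weights, folded_bias
```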

What's in Qualcomm Neural Processing SDK v1.37.0?

  • Enabled the online compiler support for HTA 1.x family of devices
  • Aligned AIP performance profile behavior with the DSP runtime, reducing power consumption during inference inactivity
  • ONNX Converter: Added support for Onnx Pad layer (OpVer 11)
  • Bug fix: snpe-dlc-info: fixed a MACs calculation error for the deconvolution layer
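For context on the MACs fix: the multiply-accumulate count of a deconvolution (transposed convolution) is driven by the output spatial size, which is where per-layer statistics most often go wrong. A hedged sketch of the usual accounting (assumed conventions, not necessarily snpe-dlc-info's exact formula):

```python
def deconv_macs(out_h, out_w, out_c, in_c, k_h, k_w, groups=1):
    """Multiply-accumulates for a transposed convolution: each output
    element accumulates k_h * k_w * (in_c / groups) products."""
    return out_h * out_w * out_c * k_h * k_w * (in_c // groups)
```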

What's in Qualcomm Neural Processing SDK v1.36.0?

  • Added Java API extension to register UDO package with SNPE
  • snpe-dlc-info now prints the command-line that was used to quantize the DLC if applicable
  • Added support to handle UDO layers with multiple TF8 outputs with different quantization parameters
  • Added support for an additional profiling level (moderate) for SNPE benchmarking script and associated snpe-net-run executable for tracking initialization time metrics
  • Upgraded DSP to use Hexagon SDK 3.5.1 toolchain
  • Extended Platform Validator to detect the HTA API version
  • Added VOLATILE_CHECK mode for SNPE DSP runtime checking, which queries runtime availability on each call instead of returning a cached result
  • Added performance modes LOW_POWER_SAVER, HIGH_POWER_SAVER, and LOW_BALANCED for the CPU runtime
  • BUG FIXES
  • Fixed a bug with propagation of the model version during conversion
  • Fixed the issue with selecting the correct output shape during graph transformation while inserting 1x1 conv2d for different input formats
  • Fixed the issue with allocation of the layer descriptor while loading the network on HTA
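The VOLATILE_CHECK mode above contrasts with a cached availability check, where the first probe's answer is replayed on later calls. The general pattern, sketched generically (hypothetical class, not the SNPE API):

```python
class RuntimeChecker:
    """Illustrates cached vs. volatile availability queries.

    A cached check probes the runtime once and reuses the stored answer;
    a volatile check re-probes on every call, so availability changes
    (e.g. a DSP service restarting) are observed immediately.
    """
    def __init__(self, probe):
        self._probe = probe   # callable returning current availability
        self._cached = None

    def is_available(self, volatile=False):
        if volatile:
            return self._probe()          # volatile: query on each call
        if self._cached is None:
            self._cached = self._probe()  # default: probe once
        return self._cached
```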