What's in Qualcomm Neural Processing SDK v1.41.0?
- Added MatMul support on the CPU runtime
- Added support for a new version of 7250 with an integrated PMIC module
- Added User-Defined Operations (UDO) with weight parameters to demonstrate quantization and network execution on the CPU and DSP runtimes, respectively
What's in Qualcomm Neural Processing SDK v1.40.0?
- Added DSP Graph Caching support for AIP models with HVX subnets
- Upgraded DSP to use Hexagon SDK 3.5.2 toolchain
- Added support for 16bit UDO layers in DSP
- Added support for large average pooling and the reduce_mean layer, and improved elementwise_mul support for larger tensor sizes
BUG FIXES
- Fixed the issue with buffer ordering during the execution of batched models on AIP runtime
- Fixed issue with SsdDetectionOut when the number of classes is only 1
- Fixed accuracy issue with Correlation 1D op
- Fixed improper processing when 16bit input quantization is used in certain cases
- Fixed scaling logic in convert_16 op
What's in Qualcomm Neural Processing SDK v1.39.1?
- An update to v1.39.0 that addresses a performance regression of the MobileNet SSD model on the AIP runtime
What's in Qualcomm Neural Processing SDK v1.39.0?
- Added graph caching support, which improves init times for DSP and AIP networks (DSP subnets within AIP networks are not supported)
- Optimized PReLU using a cubic approximation to reduce saturation loss during re-quantization
- Added additional logging messages for debugging in the DSP runtime
BUG FIXES
- Fixed the issue with setting the performance profile for AIP runtime in multithreading scenarios
- Fixed incorrect DLC generation when multiple instances of snpe-dlc-quantize run in parallel for the AIP runtime
- Fixed potential bug with freeing threads in DSP runtime
- Fixed issue of incorrect UDO tensor datatype in quantizer
What's in Qualcomm Neural Processing SDK v1.38.0?
- Enabled FC/MatMul to use VTCM if available in DSP.
- Optimized 16-bit MeanVarianceNormalize in DSP runtime.
- Added support for batchwise scalar divide operation in DSP runtime.
- Optimized the Hard-swish operator for MobileNetV3.
- Added support for EltwiseMin layer for ONNX converter and CPU runtime.
- Added support for Onnx BatchNorm layer (OpVer 9, 12) in Onnx Converters.
- Added the Caffe subtract_mean preprocessing layer. If specified, the converter enables the preprocessing defined by the data layer's transform_param subtract_mean.
- ONNX softmax converter support only existed for rank
- Enabled end users and developers to request the use of an unsigned process domain, avoiding the requirement of signed libraries for SNPE execution on 8250 and newer devices
BUG FIXES
- Removed autoquantization for classes output in MultiClassNMS layer and added support for float addition in ElementwiseOp layer to handle this case.
- Fixed the issue with enabling stats for the AIP runtime on models where the number of layers in the HTA subnet exceeds the number of SNPE layers.
- Fixed output conversions in the AIP runtime to allocate the required buffers during initialization, improving inference time.
- Enabled honoring of padding information from the HTA driver, which the AIP runtime previously pre-computed itself, unblocking execution of more models.
- Fixed the issue with output buffer id while converting depth2space to deconv on HTA.
- Fixed a bug during graph transformation while folding the batchnorm on HTA.
- Increased the DCVS relaxed sleep latency duration. This lets the power system know that the cDSP can go into a deeper sleep state; when there is no active inference request, it is better for the system to enter deeper sleep.
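For reference, the Hard-swish activation optimized in v1.38.0 (used in MobileNetV3) is commonly defined as x * ReLU6(x + 3) / 6. A minimal NumPy sketch of that formula, purely illustrative and not the SNPE kernel:

```python
import numpy as np

def hard_swish(x):
    # Hard-swish as defined for MobileNetV3: x * ReLU6(x + 3) / 6,
    # a piecewise-polynomial approximation of x * sigmoid(x) (swish)
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0
```

The clip to [0, 6] is the ReLU6; outside [-3, 3] the function is exactly 0 or exactly x, which is what makes it cheap to implement with fixed-point arithmetic.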
What's in Qualcomm Neural Processing SDK v1.37.0?
- Enabled the online compiler support for HTA 1.x family of devices
- Aligned AIP performance profile behavior with the DSP runtime to reduce power consumption during inference inactivity
- ONNX Converter: Added support for Onnx Pad layer (OpVer 11)
- Bug fix: Fixed MACs calculation error for the deconvolution layer in snpe-dlc-info
What's in Qualcomm Neural Processing SDK v1.36.0?
- Added Java API extension to register UDO package with SNPE
- snpe-dlc-info now prints the command-line that was used to quantize the DLC if applicable
- Added support to handle UDO layers with multiple TF8 outputs with different quantization parameters
- Added support for an additional profiling level (moderate) for SNPE benchmarking script and associated snpe-net-run executable for tracking initialization time metrics
- Upgraded DSP to use Hexagon SDK 3.5.1 toolchain
- Extended the Platform Validator to detect the HTA API version
- Added VOLATILE_CHECK mode for SNPE DSP runtime checking, which queries runtime availability on each call instead of returning a cached result
- Added performance modes LOW_POWER_SAVER, HIGH_POWER_SAVER, and LOW_BALANCED for the CPU runtime
- Fixed bug with propagation of model version during conversion
- Fixed the issue with selecting the correct output shape during graph transformation when inserting 1x1 conv2d for different input formats
- Fixed the issue with allocation of layer descriptor while loading the network on HTA
What's in Qualcomm Neural Processing SDK v1.35.0?
- Introduced the User-Defined Operations (UDO) feature
- Added support for SDM720G/SM7125
- Added support to snpe-throughput-net-run for UserBuffer input tensors (both INT8 and INT16)
- Added input batching support for networks that can run completely on the AIP runtime
- Added support for the tf.stack and tf.unstack ops to the DSP and CPU runtimes
- Added support for the tf.stack, tf.unstack, tf.floor, and tf.minimum ops to the TF converter
- Fixed some small memory leaks that are seen when repeatedly calling dlopen()/dlclose() on libSNPE.so
- Updated the Deconvolution operation on DSP with a new kernel that improves performance on various kernel sizes and strides
- Fixed an ssd_detection cDSP crash in the DSP runtime
- Updated the HTA to partition the input layer, if it has a connection to a layer that is not included in the same partition
- Improved the tiling configuration support for the depthwise convolution layer
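The tf.stack/tf.unstack semantics added in v1.35.0 can be sketched in NumPy (illustrative only, not the SNPE implementation): stack packs N rank-R tensors into one rank-(R+1) tensor along a new axis, and unstack inverts that.

```python
import numpy as np

def stack(tensors, axis=0):
    # tf.stack semantics: pack a list of rank-R arrays into one
    # rank-(R+1) array along a new axis
    return np.stack(tensors, axis=axis)

def unstack(tensor, axis=0):
    # tf.unstack semantics: split along `axis` and drop that axis,
    # returning tensor.shape[axis] arrays of rank R-1
    return [np.squeeze(s, axis=axis)
            for s in np.split(tensor, tensor.shape[axis], axis=axis)]
```

For example, stacking two (2, 3) arrays along axis 0 yields a (2, 2, 3) array, and unstacking it recovers the original pair.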
What's in Qualcomm Neural Processing SDK v1.34.0?
- Initial support for ops with 16-bit activations using HTA in both snpe-dlc-quantize and in the SNPE AIP runtime.
- New option for snpe-net-run to automatically turn unconsumed tensors of the network (tensors that are not inputs to a layer) into network outputs.
- Fixed inconsistent results on SM8250 in certain cases for depthwise convolutions.
- Added support for the depth2space operation on the GPU.
- Used an optimized Softmax implementation in AIP networks when the input activation has more than 5000 elements.
- Truncated detection output on DSP to return only valid data.
- Ensured weights are properly flushed to DDR for use during inference in the DSP runtime.
- Fixed support for NV21 encoding in the DSP runtime.
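The "unconsumed tensors" notion behind the new snpe-net-run option in v1.34.0 (tensors produced by some layer but never fed to another) can be illustrated with a small graph walk. This is a hypothetical sketch, not SNPE code; the layer/tensor representation is invented for illustration.

```python
def unconsumed_tensors(layers):
    # layers: iterable of (input_names, output_names) pairs describing the graph.
    # A tensor produced by some layer but never consumed as another layer's
    # input is a candidate to be promoted to a network output.
    produced, consumed = set(), set()
    for inputs, outputs in layers:
        consumed.update(inputs)
        produced.update(outputs)
    return produced - consumed
```

For a graph with layers [(["in"], ["a"]), (["a"], ["b", "aux"]), (["b"], ["out"])], this returns {"aux", "out"}: both the final output and the dangling "aux" tensor would be surfaced as network outputs.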