Release Notes

What's in Qualcomm Neural Processing SDK v2.07.0?

BUG FIXES
  • Tools: Converters: Fixed a bug in the optimization that merges MatMul + Reshape + Add into an FC op, which would incorrectly insert the FC op before the constant bias op
  • DSP Runtime: Fixed an uninitialized variable in the performance settings on HTP
Known Issues:
  • DSP Runtime: LSTM is not supported for HTP. It will be addressed in a future release.
  • AIP Runtime: An error log is printed on Ubuntu and LE platforms; it can be safely ignored
  • GPU Runtime: Some networks show accuracy issues on SM8550 due to an OpenCL bug that requires a fix from the META build
  • TF Converter: Some models containing no-op nodes fail to convert
  • DSP Runtime: Some ResNet models fail to prepare on targets with 2MB of VTCM
  • DSP Runtime: Performance regressions are observed on HTP FP16 for elementwise Add and Mul ops that do not fit in VTCM
  • DSP Runtime: Accuracy drops are observed on some models on V66 compared to SNPE1
  • Converters: Models containing LSTMs with multiple timesteps generate large DLCs. This will be addressed in a future release.
  • DSP Runtime: Performance profiles do not currently work properly on V66, leading to performance issues. This will be addressed in a future release.
  • AIP Runtime: Some models show performance regressions in init, de-init, and inference
  • AIP Runtime: EfficientDet Lite has an accuracy issue when running on the AIP runtime
  • SDK Documentation: Images are missing for the new quantization and architecture checker documentation
  • ONNX Converter: In some cases, an Add following a MatMul is not properly fused, which can lead to accuracy issues. This will be addressed in a future release.
  • Core: In some cases internal exceptions are not properly caught after an SSR
  • Core: There are some memory leaks in snpe-net-run and the SNPE Builder. These will be addressed in a future release.
  • GPU Runtime: A change to the ONNX converter's handling of Softmax updates tensor dimensions in a way that the GPU runtime does not currently support. A fix will be made in a future release.
  • Core: The C API call Snpe_SNPE_GetModelVersion always returns the version of the first model loaded in a process
  • DSP Runtime: Squeezenet currently has accuracy issues on HTP
  • DSP Runtime: YOLO networks show some accuracy issues for HTP
  • AIP Runtime: Performance modes are not working; all networks run in Burst mode.

What's in Qualcomm Neural Processing SDK v2.05?

IMPORTANT NOTES
  • SNPE now defaults to unsigned PD, and the delivered skels are not signed.
  • To use signed PD, sign the skels and enable signed PD in the platform config options (see the sketch after this list).
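As an illustration, a minimal sketch of re-enabling signed PD at run time. The platform option name "unsignedPD:OFF" and the use of --platform_options in snpe-net-run are assumptions here; verify both against the platform config documentation for your release, and sign the skels for the target first.

    # Hypothetical invocation: run on the DSP with signed PD re-enabled
    # (assumes the skels delivered with the SDK have already been signed).
    snpe-net-run --container model.dlc \
                 --input_list input_list.txt \
                 --use_dsp \
                 --platform_options unsignedPD:OFF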
NEW FEATURES
  • Tools: Added new options --use_native_input_files and --use_native_output_files to snpe-net-run and snpe-parallel-run, supporting inputs and outputs in their native format rather than the default float32 format (see the sketch after this list).
  • Tools: Added a new flag --userbuffer_auto to snpe-parallel-run to automatically detect and use the right buffer type based on the tensor data type in the model.
  • Documentation: Added an SNPE1-to-SNPE2 migration guide.
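For illustration, a hedged sketch of how these flags combine. The model and input-list names are placeholders, and the remaining flags are assumed to follow the usual snpe-net-run / snpe-parallel-run usage.

    # Feed inputs in their native data type (e.g., uint8) rather than
    # float32, and write outputs in native format as well.
    snpe-net-run --container model.dlc --input_list raw_inputs.txt \
                 --use_native_input_files --use_native_output_files

    # Let snpe-parallel-run pick the user-buffer type from each tensor's
    # data type in the model.
    snpe-parallel-run --container model.dlc --input_list raw_inputs.txt \
                      --userbuffer_auto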
BUG FIXES
  • Tools: snpe-throughput-net-run now captures the status of lost threads in the result summary.
  • Tools: snpe-dlc-quant: Fixed abnormal DLC size increase when axis quantization is used.
  • Tools: TensorFlow Converter: Fixed issues with per-channel quantization of weights: is_symmetric now defaults to true, and the "axis" and "is_symmetric" parameters are added to the weight encodings info.
  • HTP: Fixed a VTCM overflow for TransposeConv2d layers with groups > 1, input depth equal to output depth, padding of 0, and groups not equal to input depth.
Known Issues:
  • DSP Runtime: Int4 models can see higher accuracy degradation than expected.
  • DSP Runtime: Observing accuracy issues on HTP for some networks using FP16.
  • Tools: Quantizer: The bc (bias correction) algorithm is not currently functional.
  • GPU Runtime: Some networks show accuracy issues due to an OpenCL bug that requires a fix from the META build.
  • Tools: Platform Validator will hang on some newer metabuilds. This is still being investigated with the platform team.
  • TF Converter: Some models containing no-op nodes fail to convert.
  • AIP Runtime: The --debug option in snpe-net-run is not functional.
  • AIP Runtime: init_cache is not working.
  • AIP Runtime: Some models show inference performance regressions.
  • AIP Runtime: Some models show accuracy drops.
  • AIP Runtime: Some models with partitions between HTP and DSP fail during model initialization.
  • ONNX Converter: Quantization fails when multiple MatMul ops with different weight dimensions are connected to a common input; the constant bias op is shared among them and therefore does not reflect the correct shape for each.
  • DSP Runtime: On HTP, when using lower performance profiles, de-init may take more time than in previous releases. This is because the DSP clock was artificially high in the earlier releases.
  • DSP Runtime: LSTM is not supported for HTP. It will be addressed in an upcoming release.
  • DSP Runtime: Squeezenet currently has accuracy issues on HTP.
  • DSP Runtime: YOLO networks show some accuracy issues for HTP.
  • DSP Runtime: Some models show inference regressions for HTP FP16.
  • Tools: Offline Prepare: Some models show issues related to offline prepare.
  • DSP Runtime: Some models are showing accuracy issues on HTP.
  • GPU Runtime: Some networks are showing accuracy issues.
  • Converters: TFLite Converter: Converter incorrectly handles weights for TransposeConv2D ops.
  • SNPE Core: LRN - Alpha scaling, Generate Proposals (Caffe2) and CropAndResize layers are not supported in this release.
  • GPU Runtime: The Softmax layer does not support large tensor sizes in the channel dimension.

What's in Qualcomm Neural Processing SDK v1.68.0?

NEW FEATURES
  • This release uses Android NDK 19c for building the Android code
  • On HTP targets, the mechanism for handling floating point inputs and outputs has changed. For best performance, please specify the --use_float_io argument to the quantizer for offline prepare or the --buffer_data_type argument to both the quantizer and the runtime (see the sketch after this list).
  • The HTP stub and skel artifacts have been renamed to libSnpeHtpV68Stub/Skel and libSnpeHtpV69Stub/Skel. Also, there is a separate libSnpeHtpPrepare.so for performing online prepare.
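As a hedged illustration of the offline-prepare flow with float I/O: the --use_float_io flag is named above, while the remaining snpe-dlc-quantize flags (including --enable_htp) are assumptions based on typical usage and should be checked against the tool's help output.

    # Quantize with float inputs/outputs preserved, then generate the
    # HTP offline-prepare cache in the same step.
    snpe-dlc-quantize --input_dlc model.dlc \
                      --input_list calibration_list.txt \
                      --output_dlc model_quantized.dlc \
                      --use_float_io \
                      --enable_htp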
New Limitations
  • Core: Relaxed validation criteria for constant tensor for GPU backend.
  • SNPE GPU Runtime: VGG-16 and VGG-19 networks are not supported on SM6115, SM4250, SM6225, QRB5165, QCS610LE, QCS605
  • SNPE GPU Runtime: Some networks are showing minor mAP variations: Inception, Mobilenet, Resnet and VGG variants.
  • GPU Runtime: This release shows some performance regressions that will be addressed in the next release.
  • Tools: Android Sample App: UDO support in the Android Sample App is temporarily broken, and will be fixed in the next release.
  • DSP Runtime: Observing a slight regression in accuracy for the MobileNet V2 SSD and Inception v3 models on SM8350 and SM7325.
  • ONNX models like DETR with rank-3 inputs to MatMul followed by BiasAdd fail during conversion.
  • SNPE Core: LRN - Alpha scaling, Generate Proposals (Caffe2) and CropAndResize layers are not supported in this release.
  • SNPE Core: Support for Caffe2 BboxTransform and Caffe2 BoxWithNMSLimit is retired from this release.

What's in Qualcomm Neural Processing SDK v1.67.0?

NEW FEATURES
  • This release uses Android NDK 19c for building the Android code
  • On HTP targets, the mechanism for handling floating point inputs and outputs has changed. For best performance, please specify the --use_float_io argument to the quantizer for offline prepare or the --buffer_data_type argument to both the quantizer and the runtime.
  • The HTP stub and skel artifacts have been renamed to libSnpeHtpV68Stub/Skel and libSnpeHtpV69Stub/Skel. Also, there is a separate libSnpeHtpPrepare.so for performing online prepare.
Known Issues:
  • SNPE GPU Runtime: VGG-16 and VGG-19 networks are not supported on SM6115, SM4250, SM6225, QRB5165, QCS610LE, QCS605
  • SNPE GPU Runtime: Some networks are showing minor mAP variations: Inception, Mobilenet, Resnet and VGG variants.
  • GPU Runtime: This release shows some performance regressions that will be addressed in the next release.
  • Tools: Android Sample App: UDO support in the Android Sample App is temporarily broken, and will be fixed in the next release.
  • DSP Runtime: Observing a slight regression in accuracy for the MobileNet V2 SSD and Inception v3 models on SM8350 and SM7325.
  • ONNX models like DETR with rank-3 inputs to MatMul followed by BiasAdd fail during conversion.
  • SNPE Core: LRN - Alpha scaling, Generate Proposals (Caffe2) and CropAndResize layers are not supported in this release.
  • SNPE Core: Support for Caffe2 BboxTransform and Caffe2 BoxWithNMSLimit is retired from this release.

What's in Qualcomm Neural Processing SDK v1.66.0?

IMPORTANT INFORMATION
  • This release uses Android NDK 19c for building the Android code
  • On HTP targets, the mechanism for handling floating point inputs and outputs has changed. For best performance, please specify the --use_float_io argument to the quantizer for offline prepare or the --buffer_data_type argument to both the quantizer and the runtime.
  • The HTP stub and skel artifacts have been renamed to libSnpeHtpV68Stub/Skel and libSnpeHtpV69Stub/Skel. Also, there is a separate libSnpeHtpPrepare.so for performing online prepare.
NEW FEATURES
  • Core: Added protection against loading malicious DLC files
Known Issues:
  • SNPE GPU Runtime: VGG-16 and VGG-19 networks are not supported on SM6115, SM4250, SM6225, QRB5165, QCS610LE, QCS605
  • SNPE GPU Runtime: Some networks are showing minor mAP variations: Inception, Mobilenet, Resnet and VGG variants.
  • GPU Runtime: This release shows some performance regressions that will be addressed in the next release.
  • Tools: Android Sample App: UDO support in the Android Sample App is temporarily broken, and will be fixed in the next release.
  • DSP Runtime: Observing a slight regression in accuracy for the MobileNet V2 SSD and Inception v3 models on SM8350 and SM7325.
  • ONNX models like DETR with rank-3 inputs to MatMul followed by BiasAdd fail during conversion
  • SNPE Core: LRN - Alpha scaling, Generate Proposals (Caffe2) and CropAndResize layers are not supported in this release.
  • SNPE Core: Support for Caffe2 BboxTransform and Caffe2 BoxWithNMSLimit is retired from this release.

What's in Qualcomm Neural Processing SDK v1.65.0?

NEW FEATURES
  • This release uses Android NDK 19c for building the Android code
  • On HTP targets, the mechanism for handling floating point inputs and outputs has changed. For best performance, please specify the --use_float_io argument to the quantizer for offline prepare or the --buffer_data_type argument to both the quantizer and the runtime
  • The HTP stub and skel artifacts have been renamed to libSnpeHtpV68Stub/Skel and libSnpeHtpV69Stub/Skel. Also, there is a separate libSnpeHtpPrepare.so for performing online prepare
  • Core: Re-enabled LSTM support for CPU and GPU (HTP support will follow)
  • DSP Runtime: Implemented rules for coexistence and selection of multiple cache records for HTP based on VTCM size, DSP Architecture, and SoC
  • Tools: Converters: Added an optimization to fold scalar Min + Max into ReluMinMax
BUG FIXES
  • Tools: Offline Prepare: Fixed some issues for offline prepare for Depthwise Convolution with Dilation
Known Issues:
  • SNPE GPU Runtime: VGG-16 and VGG-19 networks are not supported on SM6115, SM4250, SM6225, QRB5165, QCS610LE, QCS605
  • SNPE GPU Runtime: Some networks are showing minor mAP variations: Inception, Mobilenet, Resnet and VGG variants
  • GPU Runtime: This release shows some performance regressions that will be addressed in the next release
  • Tools: Android Sample App: UDO support in the Android Sample App is temporarily broken, and will be fixed in the next release
  • DSP Runtime: Observing a slight regression in accuracy for the MobileNet V2 SSD and Inception v3 models on SM8350 and SM7325
  • ONNX models like DETR with rank-3 inputs to MatMul followed by BiasAdd fail during conversion
  • SNPE Core: LRN - Alpha scaling, Generate Proposals (Caffe2) and CropAndResize layers are not supported in this release
  • SNPE Core: Support for Caffe2 BboxTransform and Caffe2 BoxWithNMSLimit is retired from this release

What's in Qualcomm Neural Processing SDK v1.64.0?

NEW FEATURES
  • This release uses Android NDK 19c for building the Android code
  • On HTP targets, the mechanism for handling floating point inputs and outputs has changed. For best performance, please specify the --use_float_io argument to the quantizer for offline prepare or the --buffer_data_type argument to both the quantizer and the runtime
  • The HTP stub and skel artifacts have been renamed to libSnpeHtpV68Stub/Skel and libSnpeHtpV69Stub/Skel. Also, there is a separate libSnpeHtpPrepare.so for performing online prepare
  • ONNX Converter: Re-enabled the converter's command-line input dtype to take precedence over the dtype specified in the model
  • GPU: Improved accuracy of the DeepSORT model; resolved issues with Conv + Elu op fusion
BUG FIXES
  • Quantizer: Fixed issue observed with applying 8-bit overrides using 16-bit default activation quantization encodings
  • SNPE Core: Fixed failure to select HTP offline cache for certain multi-subnet network topologies
  • Tools: Transform Sub into AddSub even when the second input is identical to the first
Known Issues:
  • SNPE GPU Runtime: VGG-16 and VGG-19 networks are not supported on SM6115, SM4250, SM6225, QRB5165, QCS610LE, QCS605
  • SNPE GPU Runtime: Some networks are showing minor mAP variations: Inception, Mobilenet, Resnet and VGG variants
  • GPU Runtime: This release shows some performance regressions that will be addressed in the next release
  • Android Sample App: UDO support in the Android Sample App is temporarily broken, and will be fixed in the next release
  • DSP Runtime: Observing a slight regression in accuracy for the MobileNet V2 SSD and Inception v3 models on SM8350 and SM7325
  • ONNX models like DETR with rank-3 inputs to MatMul followed by BiasAdd fail during conversion
  • SNPE Core: LRN - Alpha scaling, Generate Proposals (Caffe2) and CropAndResize layers are not supported in this release
  • SNPE Core: Support for Caffe2 BboxTransform and Caffe2 BoxWithNMSLimit is retired from this release

What's in Qualcomm Neural Processing SDK v1.63.0?

IMPORTANT INFORMATION
  • This release uses Android NDK 19c for building the Android code
  • Previously supported LE platforms that were not supported in 1.62.0 are re-enabled in 1.63.0
  • On HTP targets, the mechanism for handling floating point inputs and outputs has changed. For best performance, please specify the --use_float_io argument to the quantizer for offline prepare or the --buffer_data_type argument to both the quantizer and the runtime
  • The HTP stub and skel artifacts have been renamed to libSnpeHtpV68Stub/Skel and libSnpeHtpV69Stub/Skel. Also, there is a separate libSnpeHtpPrepare.so for performing online prepare
  • When using the isRuntimeAvailable API with the SNPE DSP runtime for HTP, the same process domain must be used when calling SNPEBuilder
NEW FEATURES
  • SNPE Core: Added support for PReLU bias broadcasting
  • SNPE Core: The snpe-diagview tool now displays actual units (such as cycles) instead of usec by default
  • SNPE Core: OpenGL buffers are now supported for the GPU backend
BUG FIXES
  • SNPE Core: Fixed the Zip utility's std::istream index into its internal extensible array to be const for every container (DLC) load
Known Issues:
  • GPU Runtime: VGG-16 and VGG-19 networks are not supported on SM6115, SM4250, SM6225, QRB5165, QCS610LE, QCS605
  • GPU Runtime: Some networks are showing minor mAP variations: Inception, Mobilenet, Resnet and VGG variants
  • GPU Runtime: This release shows some performance regressions that will be addressed in the next release
  • GPU Runtime: UDO is currently not supported for the GPU
  • Tools: Android Sample App: UDO support in the Android Sample App is temporarily broken, and will be fixed in upcoming releases
  • DSP Runtime: Observing a slight regression in accuracy for the MobileNet V2 SSD and Inception v3 models on SM8350 and SM7325
  • ONNX models like DETR with rank-3 inputs to MatMul followed by BiasAdd fail during conversion
  • SNPE Core: LRN - Alpha scaling, Generate Proposals (Caffe2) and CropAndResize layers are not supported in this release
  • SNPE Core: Support for Caffe2 BboxTransform and Caffe2 BoxWithNMSLimit is retired from this release

What's in Qualcomm Neural Processing SDK v1.62.0?

IMPORTANT INFORMATION
  • This release uses Android NDK 19c for building the Android code
  • This release supports only Android targets; LE targets will return in SNPE 1.63.0
  • On HTP targets, the mechanism for handling floating point inputs and outputs has changed. For best performance, please specify the --use_float_io argument to the quantizer for offline prepare or the --buffer_data_type argument to both the quantizer and the runtime.
  • The HTP stub and skel artifacts have been renamed to libSnpeHtpV68Stub/Skel and libSnpeHtpV69Stub/Skel. Also, there is a separate libSnpeHtpPrepare.so for performing online prepare.
NEW FEATURES
  • DSP Runtime: Perf improvement for FP16 models on HTP
  • SNPE Core: Upgraded SNPE's archiving library zlib from version 1.2.11 to 1.2.12
  • SNPE Core: Validation results are now persisted in the offline cache, reducing init time for an offline-prepared DLC
  • SNPE Core: Relaxed dimension constraints for the PRelu layer in SNPE to support broadcasting
  • DSP Runtime: Optimized performance of the Elementwise Div layer for V65 and V66
  • DSP Runtime: Added GatherV2 support
  • Tools: Converters: Added an optimization that merges low-level Ops into Prelu Op
  • Tools: Converters: Added an optimization to squash ReduceL2 and Div Op into L2Norm Op
BUG FIXES
  • Tools: Converters: TF: Fixed issue with translating explicit padding from Conv Op
  • Tools: Converters: Onnx: Fixed handling of the ONNX Concat axis
  • Tools: Converters: Onnx: Fixed implementation details for Conv1D and Pool1D Ops
  • Tools: Converters: Onnx: Added an optimization that folds consecutive Reshapes
Known Issues:
  • This release supports Android targets only. LE platforms will return in SNPE 1.63.0
  • SNPE GPU Runtime: OpenGL buffer is not supported
  • SNPE GPU Runtime: VGG-16 and VGG-19 networks are not supported on SM6115, SM4250, SM6225, QRB5165, QCS610LE, QCS605
  • SNPE GPU Runtime: Some networks are showing minor mAP variations: Inception, Mobilenet, Resnet and VGG variants
  • GPU Runtime: This release shows some performance regressions that will be addressed in the next release
  • GPU Runtime: UDO is currently not supported for the GPU
  • Tools: Android Sample App: UDO support in the Android Sample App is temporarily broken, and will be fixed in the next release
  • DSP Runtime: Observing a slight regression in accuracy for the MobileNet V2 SSD and Inception v3 models on SM8350 and SM7325
  • ONNX models like DETR with rank-3 inputs to MatMul followed by BiasAdd fail during conversion
  • SNPE Core: LRN - Alpha scaling, Generate Proposals (Caffe2) and CropAndResize layers are not supported in this release
  • SNPE Core: Support for Caffe2 BboxTransform and Caffe2 BoxWithNMSLimit is retired from this release

What's in Qualcomm Neural Processing SDK v1.61.0?

NEW FEATURES
  • Converters: Onnx: Enabled support to handle custom op inputs correctly when the default values are provided
  • ONNX Converter: Added support to resolve static ONNX Cast operation as Constant
  • CPU Runtime: Added support for CRD mode in DepthToSpace (PixelShuffle)
  • ONNX Converter: Fixed simplifier behavior when input dimensions are given
  • DSP Runtime: Added support for LayerNorm for V65/V66
  • Converters: Added new pattern to fold ReduceL2 + Div as L2Norm
  • Converters: Added support for Relay IR's requantize op that can be seen in framework quantized models
BUG FIXES
  • Core: Improved performance of loading DLC from a memory buffer
  • ONNX Converter: Fixed scale calculation for the ONNX Resize operator in align_corners mode; also overrides the Resize input axis format according to the source axis order
  • Caffe Converter: Added support for Caffe Scale where the scale weights are of shape [batch,channels] and axis == 0
  • ONNX Converter: Fixed issues for Axis Tracking related to L2 Norm
  • SDK: Updated the sample code to demonstrate handling multiple ITensor inputs
  • AIP Runtime: Fixed low accuracy issue on mobilenet variant for Multi-class NMS layer
  • ONNX Converters: Added support for combination of Nearest and Half_pixel modes for ResizeOp
Known Issues:
  • SNPE DSP: An error is observed if the second input to the Scale layer has rank 1
  • Higher de-init time is observed on the QRB5165 platform with the CPU runtime for models like MobileNet

What's in Qualcomm Neural Processing SDK v1.60.0?

NEW FEATURES
  • Tools: Converter: Added ONNX Gemm transA and transB support
  • Native sample code is updated to take static quantization parameters for quantized input buffers
  • libSNPE.so, libcalculator.so, libplatformValidatorShared.so, and libnpe_dsp_domains_v2.so (generated with the gcc 7.5, gcc 8.2, and gcc 9.3 toolchains) are now compiled with additional read-only relocation compiler flags
  • Documentation update: User Logging API documentation added in Application Tips section
BUG FIXES
  • HTP: Fixed issue with Cast op usage in certain configurations
  • ONNX Converter: Improvements to handle different input axis layouts
Known Issues:
  • Minor reduction in accuracy for VGG16 is observed
  • Error: Model validation fails for the FC layer with an error that there is a mismatch between weights and input dimensions
    • Characteristic: Typically seen with ONNX models where the FC layer (with 4D input A and 2D input B) follows a Reshape layer either immediately or after some trivial eltwise layers
    • Workaround: Insert a Reshape op on input A before the FC layer, with shape (orig_4D_shape[0], -1)
  • Error: ONNX models with LSTM layers fail validation with an input-shape error or show a significant drop in accuracy
    • Characteristic: LSTM models that have initial h/c input tensors generally fail due to this issue
    • Workaround: Provide the command line argument "--input_layout NONTRIVIAL" for each initial h/c input tensor of every LSTM op (see the sketch after this list)
  • Error: AssertionError: 'LSTM h/c input buffer needs to have format NONTRIVIAL, got NFC'
    • Characteristic: Failure is seen with bidirectional LSTM layers
    • Workaround: Provide the command line argument "--input_layout NONTRIVIAL" for each initial h/c input tensor of every LSTM op
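A hedged sketch of the NONTRIVIAL workaround at conversion time. It assumes the converter's --input_layout option takes a tensor name followed by a layout, and the tensor names initial_h / initial_c are placeholders for the model's actual initial-state inputs.

    # Mark each initial h/c state tensor of every LSTM op as NONTRIVIAL
    # so the converter does not force an NFC layout onto it.
    snpe-onnx-to-dlc --input_network lstm_model.onnx \
                     --output_path lstm_model.dlc \
                     --input_layout initial_h NONTRIVIAL \
                     --input_layout initial_c NONTRIVIAL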