Release Notes

What's in Qualcomm Neural Processing SDK v1.63.0?

Important Information
  • This release uses Android NDK 19c for building the Android code
  • Previously supported LE platforms that were disabled in 1.62.0 are re-enabled in 1.63.0
  • On HTP targets, the mechanism for handling floating-point inputs and outputs has changed. For best performance, specify the --use_float_io argument to the quantizer for offline prepare, or the --buffer_data_type argument to both the quantizer and the runtime (see the example after this list)
  • The HTP stub and skel artifacts have been renamed to libSnpeHtpV68Stub/Skel and libSnpeHtpV69Stub/Skel. Also, online prepare is now performed by a separate libSnpeHtpPrepare.so
  • When using the isRuntimeAvailable API with the SNPE DSP runtime for HTP, the same process domain must also be used when calling the SNPEBuilder (see the sketch after this list)
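
For illustration, an offline-prepare quantizer invocation with float I/O might look like the following (file names are placeholders; the flags other than --use_float_io are the quantizer's usual input/output arguments):

    snpe-dlc-quantize --input_dlc model.dlc --input_list input_list.txt \
                      --use_float_io --output_dlc model_float_io.dlc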
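
A minimal C++ sketch of keeping the availability check and the build in the same (here, unsigned) process domain, assuming the 1.x API's UNSIGNEDPD_CHECK runtime-check option and the "unsignedPD:ON" platform option; verify both against your SDK headers:

    #include <memory>

    #include "DlContainer/IDlContainer.hpp"
    #include "DlSystem/DlEnums.hpp"
    #include "DlSystem/PlatformConfig.hpp"
    #include "DlSystem/RuntimeList.hpp"
    #include "SNPE/SNPE.hpp"
    #include "SNPE/SNPEBuilder.hpp"
    #include "SNPE/SNPEFactory.hpp"

    std::unique_ptr<zdl::SNPE::SNPE> buildOnHtpUnsignedPD(
        std::unique_ptr<zdl::DlContainer::IDlContainer>& container)
    {
        // Check the DSP (HTP) runtime in the unsigned process domain...
        if (!zdl::SNPE::SNPEFactory::isRuntimeAvailable(
                zdl::DlSystem::Runtime_t::DSP,
                zdl::DlSystem::RuntimeCheckOption_t::UNSIGNEDPD_CHECK))
            return nullptr;

        // ...and build in that same domain. Checking one domain and
        // building in the other is the mismatch this note warns against.
        zdl::DlSystem::RuntimeList runtimeList;
        runtimeList.add(zdl::DlSystem::Runtime_t::DSP);

        zdl::DlSystem::PlatformConfig platformConfig;
        platformConfig.setPlatformOptions("unsignedPD:ON");

        return zdl::SNPE::SNPEBuilder(container.get())
            .setRuntimeProcessorOrder(runtimeList)
            .setPlatformConfig(platformConfig)
            .build();
    }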
NEW FEATURES
  • SNPE Core: Added support for PReLU bias broadcasting
  • SNPE Core: snpe-diagview updated to display native units (e.g., cycles) instead of usec by default (see the example after this list)
  • SNPE Core: OpenGL buffers are now supported for the GPU backend
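
For reference, the timing report is generated and viewed the same way as before; only its default units change. A typical invocation (the DiagLog file name below is illustrative of what snpe-net-run produces):

    snpe-diagview --input_log output/SNPEDiag_0.log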
BUG FIXES
  • SNPE Core: Fixed the Zip utility so its std::istream index into the internal extensible array is const for every container (DLC) load
Known Issues:
  • GPU Runtime: VGG-16 and VGG-19 networks are not supported on SM6115, SM4250, SM6225, QRB5165, QCS610LE, and QCS605
  • GPU Runtime: Minor mAP variations are observed for some networks: Inception, MobileNet, ResNet, and VGG variants
  • GPU Runtime: This release shows some performance regressions that will be addressed in the next release
  • GPU Runtime: UDO is currently not supported for the GPU
  • Tools: Android Sample App: UDO support in the Android Sample App is temporarily broken and will be fixed in upcoming releases
  • DSP Runtime: A slight accuracy regression is observed for MobileNet v2 SSD and Inception v3 models on SM8350 and SM7325
  • ONNX models like DETR, with rank-3 inputs to MatMul followed by BiasAdd, fail during conversion
  • SNPE Core: LRN alpha scaling, GenerateProposals (Caffe2), and CropAndResize layers are not supported in this release
  • SNPE Core: Support for Caffe2 BboxTransform and Caffe2 BoxWithNMSLimit is retired as of this release

What's in Qualcomm Neural Processing SDK v1.62.0?

Important Information
  • This release uses Android NDK 19c for building the Android code
  • This release supports only Android targets; LE targets will return in SNPE 1.63.0
  • On HTP targets, the mechanism for handling floating-point inputs and outputs has changed. For best performance, specify the --use_float_io argument to the quantizer for offline prepare, or the --buffer_data_type argument to both the quantizer and the runtime
  • The HTP stub and skel artifacts have been renamed to libSnpeHtpV68Stub/Skel and libSnpeHtpV69Stub/Skel. Also, online prepare is now performed by a separate libSnpeHtpPrepare.so
NEW FEATURES
  • DSP Runtime: Performance improvements for FP16 models on HTP
  • SNPE Core: Upgraded SNPE's archiving library zlib from version 1.2.11 to 1.2.12
  • SNPE Core: Validation results are now persisted in the offline cache, reducing init time for an offline-prepared DLC
  • SNPE Core: Relaxed dimension constraints for the PReLU layer in SNPE to support broadcasting
  • DSP Runtime: Optimized performance of the Elementwise Div layer for V65 and V66
  • DSP Runtime: Added GatherV2 support
  • Tools: Converters: Added an optimization that merges low-level ops into a PReLU op
  • Tools: Converters: Added an optimization to squash ReduceL2 and Div ops into an L2Norm op
BUG FIXES
  • Tools: Converters: TF: Fixed an issue with translating explicit padding from the Conv op
  • Tools: Converters: Onnx: Fixed Concat axis handling
  • Tools: Converters: Onnx: Fixed implementation details for Conv1D and Pool1D ops
  • Tools: Converters: Onnx: Added an optimization that folds consecutive reshapes
Known Issues:
  • This release supports Android targets only. LE platforms will return in SNPE 1.63.0
  • SNPE GPU Runtime: OpenGL buffer is not supported
  • SNPE GPU Runtime: VGG-16 and VGG-19 networks are not supported on SM6115, SM4250, SM6225, QRB5165, QCS610LE, and QCS605
  • SNPE GPU Runtime: Minor mAP variations are observed for some networks: Inception, MobileNet, ResNet, and VGG variants
  • GPU Runtime: This release shows some performance regressions that will be addressed in the next release
  • GPU Runtime: UDO is currently not supported for the GPU
  • Tools: Android Sample App: UDO support in the Android Sample App is temporarily broken and will be fixed in the next release
  • DSP Runtime: A slight accuracy regression is observed for MobileNet v2 SSD and Inception v3 models on SM8350 and SM7325
  • ONNX models like DETR, with rank-3 inputs to MatMul followed by BiasAdd, fail during conversion
  • SNPE Core: LRN alpha scaling, GenerateProposals (Caffe2), and CropAndResize layers are not supported in this release
  • SNPE Core: Support for Caffe2 BboxTransform and Caffe2 BoxWithNMSLimit is retired as of this release

What's in Qualcomm Neural Processing SDK v1.61.0?

NEW FEATURES
  • Converters: Onnx: Enabled correct handling of custom op inputs when default values are provided
  • ONNX Converter: Added support to resolve static ONNX Cast operations as Constant
  • CPU Runtime: Added support for CRD mode in DepthToSpace (PixelShuffle)
  • ONNX Converter: Fixed simplifier behavior when input dimensions are given
  • DSP Runtime: Added support for LayerNorm for V65/V66
  • Converters: Added new pattern to fold ReduceL2 + Div as L2Norm
  • Converters: Added support for Relay IR's requantize op that can be seen in framework quantized models
BUG FIXES
  • Core: Improved performance of loading DLC from a memory buffer
  • ONNX Converter: Fixed scale calculation for the ONNX Resize operator in align_corners mode. Also overrides the Resize input axis format per the source axis order
  • Caffe Converter: Added support for Caffe Scale where the scale weights are of shape [batch,channels] and axis == 0
  • ONNX Converter: Fixed issues for Axis Tracking related to L2 Norm
  • SDK: Updated sample code to demonstrate handling multiple ITensor inputs (see the sketch after this list)
  • AIP Runtime: Fixed low accuracy issue on mobilenet variant for Multi-class NMS layer
  • ONNX Converter: Added support for the combination of nearest and half_pixel modes for the Resize op
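
For reference, a minimal sketch of feeding multiple named ITensor inputs through the C++ API (the input names and the tensorA/tensorB/snpe variables are placeholders for an application's own tensors and built SNPE instance):

    #include "DlSystem/ITensor.hpp"
    #include "DlSystem/TensorMap.hpp"
    #include "SNPE/SNPE.hpp"

    // One TensorMap entry per network input; the names must match the
    // model's input tensor names.
    zdl::DlSystem::TensorMap inputMap;
    inputMap.add("input_a", tensorA.get());   // tensorA, tensorB:
    inputMap.add("input_b", tensorB.get());   //   std::unique_ptr<zdl::DlSystem::ITensor>

    zdl::DlSystem::TensorMap outputMap;       // populated by execute()
    bool ok = snpe->execute(inputMap, outputMap);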
Known Issues:
  • SNPE DSP: An error is observed if the second input to the Scale layer has rank 1
  • Higher de-init time is observed on the QRB5165 platform with the CPU runtime for models like MobileNet

What's in Qualcomm Neural Processing SDK v1.60.0?

NEW FEATURES
  • Tools: Converter: Added ONNX Gemm transA and transB support
  • Native sample code is updated to take static quantization parameters for quantized input buffers (see the sketch after this list)
  • libSNPE.so, libcalculator.so, libplatformValidatorShared.so, and libnpe_dsp_domains_v2.so (generated with the gcc 7.5, gcc 8.2, and gcc 9.3 toolchains) are now compiled with additional read-only relocation compiler flags
  • Documentation update: User Logging API documentation added in Application Tips section
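
A minimal sketch of supplying static quantization parameters for a TF8 input user buffer in the 1.x C++ API (the scale/offset values and the appBuffer/strides variables are placeholders):

    #include "DlSystem/IUserBuffer.hpp"
    #include "SNPE/SNPEFactory.hpp"

    // Static TF8 encoding: the application fixes stepEquivalentTo0 (the
    // quantized value that represents 0.0) and quantizedStepSize (the scale)
    // up front instead of letting them be derived at runtime.
    zdl::DlSystem::UserBufferEncodingTf8 encoding(/*stepEquivalentTo0=*/128,
                                                  /*quantizedStepSize=*/0.00784f);

    auto& ubFactory = zdl::SNPE::SNPEFactory::getUserBufferFactory();
    auto userBuffer = ubFactory.createUserBuffer(appBuffer.data(),  // std::vector<uint8_t>
                                                 appBuffer.size(),
                                                 strides,           // zdl::DlSystem::TensorShape
                                                 &encoding);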
BUG FIXES
  • HTP: Fixed issue with Cast op usage in certain configurations
  • ONNX Converter: Improvements to handle different input axis layouts
Known Issues:
  • Minor reduction in accuracy for VGG16 is observed
  • Error: Model validation fails for the FC layer, reporting a mismatch between weights and input dimensions
    • Characteristic: Typically seen with ONNX models where the FC layer's input (4D input A and 2D input B) follows a Reshape layer, either immediately or after some trivial eltwise layers
    • Workaround: Insert a Reshape op on input A before the FC layer, reshaping it to (orig_4D_shape[0], -1)
  • Error: ONNX models with an LSTM layer fail validation with an input-shape error or show a significant drop in accuracy
    • Characteristic: LSTM models that have initial h/c input tensors will generally fail due to this issue
    • Workaround: Provide the command line argument "--input_layout NONTRIVIAL" for each initial h/c input tensor of every LSTM op (see the example after this list)
  • Error: AssertionError: 'LSTM h/c input buffer needs to have format NONTRIVIAL, got NFC'
    • Characteristic: Failure seen with bidirectional LSTM layers
    • Workaround: Same as above; provide "--input_layout NONTRIVIAL" for each initial h/c input tensor
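
For illustration, the workaround applied to an ONNX model with initial_h/initial_c inputs might look like the following (file and tensor names are placeholders; the converter's --input_layout option takes the input name followed by the layout):

    snpe-onnx-to-dlc --input_network lstm_model.onnx \
                     --input_layout initial_h NONTRIVIAL \
                     --input_layout initial_c NONTRIVIAL \
                     --output_path lstm_model.dlc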

What's in Qualcomm Neural Processing SDK v1.59.0?

NEW FEATURES
  • DSP Runtime: Added support for edge padding from SNPE side
  • TensorFlow Converter: Added support for the beta and gamma parameters for InstanceNorm
  • ONNX Converter: Added limited support for the Expand operator when its attributes allow it to be interpreted as a no-op
  • ONNX Converter: Added support for ScatterND
  • DSP Runtime: Added graph identifier into minidm logs to enhance debugging
BUG FIXES
  • Quantizer: Fixed duplicate Convert layer Id issue observed in generated DLC when multiple Convert layers feed into a single layer
  • ONNX Converter: Fixed handling of models with inputs of unknown shape
  • UDO: Fixed a typo in the generation of UDO template code
  • ONNX Converter: Resolved issue where Shape operator translation could fail if the input was part of the initializer list
Known Issues:
  • Minor reduction in accuracy for VGG16 is observed

What's in Qualcomm Neural Processing SDK v1.58.0?

NEW FEATURES
  • Converter: Enabled broadcasting of weights and bias for BatchNorm layer to match channel dimensions
  • DSP: Enabled the support for Elementwise Log and Neg Ops on HTP
  • DSP: Enabled support for all axis values in ReduceMean with an axis size of 2
Known Issues:
  • Minor reduction in accuracy for VGG16 is observed

What's in Qualcomm Neural Processing SDK v1.57.0?

NEW FEATURES
  • ONNX Converter: Added support in dry-run mode for reporting ops that are not in the ONNX schema domain
  • Converters: Corrected inaccurate MACs/params calculations for ops per re-analysis
  • CPU Runtime: Set the max detections to keep top K for the Caffe SSD network
  • Converters: Fixed an axis tracking bug for Permute when the input is in BTF format
  • Converters: Removed the obsolete ssd_permute_param parameter in the Caffe converter's Permute translation
  • Converters: Added a command line argument to override the data type for inputs
  • TFLite Converter: Enabled conversion of BERT-style models
BUG FIXES
  • Converters: Fixed a coefficient input broadcasting issue for the ONNX PRelu operation
  • DSP Runtime: Optimized performance of the strided_slice operator for V65/V66
  • DSP Runtime: Fixed axis quantization not adding all of the output's fixedPointParam entries to bufferDeltas
  • DSP Runtime: Accuracy fixes and improvements for PReLU on V65/V66
  • ONNX Converter: Fixed improper handling of Elementwise Div during conversion
  • DSP Runtime: Improved handling of Pad ops during prepare for HTP
Known Issues:
  • Graph preparation fails when the --enable_init_cache option is used in the DSP runtime for HTP

What's in Qualcomm Neural Processing SDK v1.56.2?

NEW FEATURES
  • Clarified the documentation/example for specifying UDO data types
BUG FIXES
  • Converter: Added a new LayerNorm sequence for pattern matching, and added a constraint enforcing that the MatMul layer's constant second input is an 8-bit tensor in quantized models
  • ONNX Converter: Added support for scale/offset quantization overrides
  • ONNX Converter: Fixed a warning for the Const operator from opset 11
  • ONNX Converter: Fixed a scale factor calculation error caused by mixing height and width dimensions
  • CPU Runtime: Added support for the LayerNorm layer
  • CPU Runtime: Set the max detections to keep top K for Caffe SSD networks

What's in Qualcomm Neural Processing SDK v1.55.0?

NEW FEATURES
  • Added support for the OneHot operation across the SNPE converters with runtime support available on SNPE CPU
  • ONNX Converter: Added support for LSTM & CRNN
  • ONNX Converter: Added support for Exp layer
  • ONNX Converter: Changed to stop parsing at the specified output nodes
  • Caffe Converter: Changed the conversion of SSD models to use the DetectionOutput layer. Re-converting these models is strongly recommended as the old layer will be removed in the future.
  • Caffe Converter: Added support for Caffe Power scale/shift parameters
  • Caffe Converter: Warn for uninitialized weight/bias parameters in the BatchNorm layer and initialize them to default values
  • Converters: Enabled support for conversion of Cast Op in TFlite and Pytorch converters
  • DSP Runtime: Added support for LSTM
  • DSP Runtime: Optimized mirror padding for better VTCM utilization on HTP
BUG FIXES
  • DSP Runtime: Fixed an issue where an invalid cache record was added to the DLC during offline prepare for HTP
  • Converters: Fixed Softmax and Reduction ops to have a default case for the output_buf axis format
  • ONNX Converter: Fixed an issue with the Slice layer when the end is set to INT_MIN
Known Issues:
  • Slight accuracy drop is observed for Caffe Mobilenet SSD for GPU runtime
  • Performance regressions are observed on HTP for Caffe based detection models
  • An intermittent stability issue is observed on the CPU runtime

What's in Qualcomm Neural Processing SDK v1.54.2?

NEW FEATURES
  • TF Converter: Added support for detecting the eltwise pattern for the BatchNorm layer with FakeQuant inputs
  • Pytorch Converter: Added the initial PyTorch converter and documentation for it
  • Converters: Added support for the Caffe Reduction layer's Sum and Mean ops
  • Quantizer: Added support to make the Convert operator's upscale and downscale quantization parameters loss-free
  • ONNX Converter: Added support for LSTM & CRNN in converters
  • Converters: When a network's outputs are produced by Identity ops, the original output names are now retained even though the Identity ops themselves are still stripped: the input to a stripped Identity is renamed to the Identity's output name. Previously, the output name defaulted to the name of the Identity's input. This only affects networks whose outputs are Identity ops; for those, the output name now matches the original framework model's output name rather than the previous node's output name
  • DSP Runtime: Added support for LSTM
BUG FIXES
  • Converters: Added the batch dimension to the anchor input data conversion from TensorFlow corner style to center style for the DetectionOutput operation optimization
  • ONNX Converter: Added support to pre-apply ONNX BatchNorm scale and bias quantization encodings before they are consumed by the converter to compute weights and bias
  • Converters: Added support for inferring the SAME padding mode from explicit pad values
  • DSP Runtime: Fixed a scratch buffer over-access issue in RoiAlign
  • DSP Runtime: Fixed a graph prepare issue for the Reshape const node
  • DSP Runtime: Optimized ReduceMean performance when reducing on channel
  • Java API: Added protection when removing tensors to avoid crashing in a multithreaded application
Known Issues:
  • Slight accuracy regressions are observed for Mobilenet_v1_quantaware and Mobilenet_V2_SSD_quantaware on the HTP runtime
  • Performance regression is seen on DeeplabV3 model with online graph preparation

What's in Qualcomm Neural Processing SDK v1.53.2?

NEW FEATURES
  • Tool: Quantizer: Added support for fake quant operators in snpe-dlc-quantize
  • Tools: TF Converter: Added support for logical_and, equal, greater, greater_equal, less, less_equal, not_equal, logical_or, select
  • Tool: TensorFlow Converter: Added support for Identity nodes that act as graph output nodes
BUG FIXES
  • Tool: ONNX Converter: Fixed an incorrect default bias shape for ConvTranspose translation
  • Tool: TF Converter: Fixed an issue where TF model conversion resulted in a single static node even though the start of the network was provided as input on the command line
Known Issues:
  • Accuracy regressions on VGG16, Alexnet, and Mobilenet_v1 on the HTP runtime are observed. The Mobilenet_v1 accuracy regression can be recovered by using Enhanced Quantization

What's in Qualcomm Neural Processing SDK v1.52.0?

NEW FEATURES
  • Tool: Converters: Removes pre-broadcasting of constant tensors resulting in smaller file sizes in converter output
  • Tool: Converter: Added Converter support for Nd Reshape layer
  • Tool: Converter: Added CastOp support for TF
  • Tool: Converter: Added support for static subgraph resolutions at conversion time
  • Tool: Converter: Added support for tensor dtype for TF fill op translation
  • NPE HTP: Enabled the support for float input of elementwise binary op in offline and online graph preparation
  • Converters: ONNX: Added support for NonMaxSuppression and updated the Cast op to ensure proper type tracking
  • Converters: Common: Updated op squashing logic to attempt a squash into the subsequent op when a node's input buffer has multiple consumers
BUG FIXES
  • NPE DSP: Fixed variance accuracy loss in InstanceNormalization on HTP
  • NPE GPU: Added an optimized kernel for the ReduceMean operation
  • Tool: Converter: Fixed a bug in TF FullyConnected translation where the input was intermittently out of order
  • NPE DSP: Fixed an issue where freeing an uninitialized pointer led to random crashes
  • NPE DSP: Optimized specific unpack->elementwise sequences for certain models on HTP
  • NPE AIP: Optimized the input conversion for models involving padding along the width dimension