Archives

What's in Qualcomm Neural Processing SDK v1.61.0?

NEW FEATURES
  • ONNX Converter: Enabled correct handling of custom op inputs when default values are provided
  • ONNX Converter: Added support to resolve static ONNX Cast operations as Constant
  • CPU Runtime: Added support for CRD mode in DepthToSpace (PixelShuffle)
  • ONNX Converter: Fixed simplifier behavior when input dimensions are given
  • DSP Runtime: Added support for LayerNorm for V65/V66
  • Converters: Added new pattern to fold ReduceL2 + Div as L2Norm
  • Converters: Added support for Relay IR's requantize op that can be seen in framework quantized models
BUG FIXES
  • Core: Improved performance of loading DLC from a memory buffer
  • ONNX Converter: Fixed scale calculation for the ONNX Resize operator in align_corners mode; also overrides the Resize input axis format per the source axis order
  • Caffe Converter: Added support for Caffe Scale where the scale weights are of shape [batch,channels] and axis == 0
  • ONNX Converter: Fixed issues for Axis Tracking related to L2 Norm
  • SDK: Updated the sample code to demonstrate handling multiple ITensor inputs (see the sketch after this list)
  • AIP Runtime: Fixed a low-accuracy issue on a MobileNet variant for the Multi-class NMS layer
  • ONNX Converter: Added support for the combination of Nearest and Half_pixel modes for ResizeOp
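
For reference, a minimal sketch of feeding multiple ITensor inputs through the C++ API, following the pattern used in the sample code; the input names and shapes here are hypothetical:

      #include <memory>
      #include "SNPE/SNPE.hpp"
      #include "SNPE/SNPEFactory.hpp"
      #include "DlSystem/ITensorFactory.hpp"
      #include "DlSystem/TensorMap.hpp"

      void executeWithTwoInputs(zdl::SNPE::SNPE& snpe) {
          // One ITensor per network input; names and shapes are illustrative.
          zdl::DlSystem::TensorShape shapeA({1, 224, 224, 3});
          zdl::DlSystem::TensorShape shapeB({1, 10});
          auto tensorA = zdl::SNPE::SNPEFactory::getTensorFactory().createTensor(shapeA);
          auto tensorB = zdl::SNPE::SNPEFactory::getTensorFactory().createTensor(shapeB);
          // ... fill tensorA and tensorB with input data ...

          zdl::DlSystem::TensorMap inputMap;
          inputMap.add("input_a", tensorA.get());
          inputMap.add("input_b", tensorB.get());

          zdl::DlSystem::TensorMap outputMap;
          snpe.execute(inputMap, outputMap);  // outputs are populated into outputMap
      }
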
Known Issues:
  • SNPE DSP: An error is observed if the second input to the Scale layer has rank 1
  • Higher de-init time is observed on the QRB5165 platform with the CPU runtime for models like MobileNet

What's in Qualcomm Neural Processing SDK v1.60.0?

NEW FEATURES
  • Tools: Converter: Added ONNX Gemm transA and transB support
  • Native sample code is updated to take static quantization parameters for quantized input buffers (see the sketch after this list)
  • libSNPE.so, libcalculator.so, libplatformValidatorShared.so, and libnpe_dsp_domains_v2.so (generated with the gcc7.5, gcc8.2, and gcc9.3 toolchains) are now compiled with additional read-only relocation compiler flags
  • Documentation update: User Logging API documentation added in Application Tips section
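
As an illustration of passing static quantization parameters, a sketch of creating a TF8 user buffer with a fixed scale/offset via the C++ API; the parameter values are placeholders and the encoding constructor arguments reflect my reading of the API:

      #include <cstdint>
      #include <memory>
      #include <vector>
      #include "SNPE/SNPEFactory.hpp"
      #include "DlSystem/IUserBuffer.hpp"
      #include "DlSystem/TensorShape.hpp"

      std::unique_ptr<zdl::DlSystem::IUserBuffer> makeTf8Buffer(std::vector<uint8_t>& backing) {
          // Static quantization parameters supplied up front (placeholder values)
          // rather than derived at run time.
          const uint64_t stepEquivalentTo0 = 128;  // offset: quantized value that maps to 0.0
          const float quantizedStepSize = 0.5f;    // scale: real-valued size of one step
          zdl::DlSystem::UserBufferEncodingTf8 encoding(stepEquivalentTo0, quantizedStepSize);
          // Byte strides for a uint8 NHWC tensor of shape {1, 224, 224, 3} (illustrative).
          zdl::DlSystem::TensorShape strides({224 * 224 * 3, 224 * 3, 3, 1});
          return zdl::SNPE::SNPEFactory::getUserBufferFactory().createUserBuffer(
              backing.data(), backing.size(), strides, &encoding);
      }
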
BUG FIXES
  • HTP: Fixed issue with Cast op usage in certain configurations
  • ONNX Converter: Improvements to handle different input axis layouts
Known Issues:
  • Minor reduction in accuracy for VGG16 is observed
  • Error: Model Validation fails for FC layer with error that there is a mismatch between weights and input dimensions
    • Characteristic: Typically seen with ONNX models where the FC layer (with 4D input A and 2D input B) input follows Reshape layer either immediately or after some trivial eltwise layers
    • Workaround: Insert a Reshape op on input A before the FC layer, reshaping to (orig_4D_shape[0], -1)
  • Error: ONNX models with LSTM layer will have validation error related to input shape or will cause significant drop in accuracy
    • Characteristic: LSTM models that have initial h/c input tensors will generally fail due to this issue
    • Workaround: Provide command line argument "--input_layout NONTRIVIAL" for each initial h/c input tensor for every LSTM Op
  • Error: AssertionError: LSTM h/c input buffer needs to have format NONTRIVIAL, got NFC
    • Characteristic: Failure seen with bidirectional LSTM layers
    • Workaround: Provide the command line argument "--input_layout NONTRIVIAL" for each initial h/c input tensor for every LSTM Op (see the example after this list)
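
For example, assuming an ONNX model whose initial hidden/cell state inputs are named h0 and c0 (hypothetical names), and the converter's two-argument --input_layout <name> <layout> form:

      snpe-onnx-to-dlc -i lstm_model.onnx \
          --input_layout h0 NONTRIVIAL \
          --input_layout c0 NONTRIVIAL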

What's in Qualcomm Neural Processing SDK v1.59.0?

NEW FEATURES
  • DSP Runtime: Added support for edge padding from SNPE side
  • TensorFlow Converter: Added support for the beta and gamma parameters for InstanceNorm
  • ONNX Converter: Limited support for the Expand operator when its attributes allow it to be interpreted as a no-op
  • ONNX Converter: Added support for ScatterND
  • DSP Runtime: Added graph identifier into minidm logs to enhance debugging
BUG FIXES
  • Quantizer: Fixed duplicate Convert layer Id issue observed in generated DLC when multiple Convert layers feed into a single layer
  • ONNX Converter: Fixed handling of models with inputs of unknown shape
  • UDO: Fixed a typo in the generated UDO template code
  • ONNX Converter: Resolved issue where Shape operator translation could fail if the input was part of the initializer list
Known Issues:
  • Minor reduction in accuracy for VGG16 is observed

What's in Qualcomm Neural Processing SDK v1.58.0?

NEW FEATURES
  • Converter: Enabled broadcasting of weights and bias for BatchNorm layer to match channel dimensions
  • DSP: Enabled the support for Elementwise Log and Neg Ops on HTP
  • DSP: Enabled support for all axis values in reduce mean with axis size of 2
Known Issues:
  • Minor reduction in accuracy for VGG16 is observed

What's in Qualcomm Neural Processing SDK v1.57.0?

NEW FEATURES
  • ONNX Converter: Added support in dry-run mode for reporting ops that are not in the ONNX schema domain
  • Converters: Corrected inaccurate MACs/params calculations for several ops following re-analysis
  • CPU Runtime: Set the max detections to keep top K for the Caffe SSD network
  • Converters: Fixed an axis-tracking bug for Permute when the input is in BTF format
  • Converters: Removed obsolete ssd_permute_param parameter in caffe converter permute translation
  • Converters: Added command line argument to override data type for inputs
  • TFLite Converter: Enabled conversion of BERT-style models
BUG FIXES
  • Converters: Fixed coefficient input broadcasting issue for ONNX Prelu operation
  • DSP Runtime: Optimized performance for strided_slice operator for V65/V66
  • DSP Runtime: Fixed axis quantization not adding all of the output's fixedPointParams to bufferDeltas
  • DSP Runtime: Accuracy Fixes and Improvements for PReLU on V65/V66
  • ONNX Converter: Fixed issue with improper handling of Elementwise Div during conversion
  • DSP Runtime: Improved handling of Pad ops during prepare for HTP
Known Issues:
  • Graph preparation fails when the --enable_init_cache option is used in the DSP Runtime for HTP

What's in Qualcomm Neural Processing SDK v1.56.2?

NEW FEATURES
  • Clarified the documentation/example for specifying UDO data types
BUG FIXES
  • Converter: Added a new LayerNorm sequence for pattern matching, and added a constraint to enforce that the MatMul layer's constant second input is an 8-bit tensor in quantized models
  • ONNX Converter: Added support for scale/offset quantization overrides
  • ONNX Converter: Fixed a warning for the Const operator from Opset 11
  • ONNX Converter: Fixed a scale factor calculation error caused by mixing height and width dimensions
  • CPU Runtime: Added support for the LayerNorm layer
  • CPU Runtime: Set the max detections to keep top K for Caffe SSD networks

What's in Qualcomm Neural Processing SDK v1.55.0?

NEW FEATURES
  • Added support for the OneHot operation across the SNPE converters with runtime support available on SNPE CPU
  • ONNX Converter: Added support for LSTM & CRNN
  • ONNX Converter: Added support for Exp layer
  • ONNX Converter: Change to stop parsing at specified output nodes
  • Caffe Converter: Changed the conversion of SSD models to use the DetectionOutput layer. Re-converting these models is strongly recommended as the old layer will be removed in the future.
  • Caffe Converter: Added support for Caffe Power scale/shift parameters
  • Caffe Converter: Warn for uninitialized weight/bias parameters in the BatchNorm layer and initialize them to default values
  • Converters: Enabled support for conversion of the Cast op in the TFLite and PyTorch converters
  • DSP Runtime: Added support for LSTM
  • DSP Runtime: Optimized mirror padding for better VTCM utilization on HTP
BUG FIXES
  • DSP Runtime: Fixed the issue of invalid cache record added to DLC while doing offline prepare for HTP
  • Converters: Fixed Softmax and Reduction Ops to have default case for output_buf axis format
  • ONNX Converter: Fixed an issue with the Slice layer when the end is set to INT_MIN
Known Issues:
  • A slight accuracy drop is observed for Caffe MobileNet SSD on the GPU runtime
  • Performance regressions are observed on HTP for Caffe-based detection models
  • Intermittent stability issues are observed on the CPU runtime

What's in Qualcomm Neural Processing SDK v1.54.2?

NEW FEATURES
  • TF Converter: Added support for detecting eltwise pattern for batchnorm layer with fakequant inputs
  • PyTorch Converter: Added the initial PyTorch converter, along with documentation for it
  • Converters: Added support for the Caffe Reduction layer's Sum and Mean ops
  • Quantizer: Added support to make the Convert operator's upscale and downscale quantization parameters loss free
  • ONNX Converter: Added support for LSTM & CRNN in converters
  • Converters: Identity ops at network outputs are still stripped, but the name of the Identity's input is now updated to the original Identity output name, so the network retains its original output names. Previously the output name defaulted to the name of the Identity op's input. This change only impacts customers who use network outputs that are Identity ops; in that case, the output name now matches the original framework model's output name rather than the previous node's output name
  • DSP Runtime: Added support for LSTM
BUG FIXES
  • Converters: Added batch dimension to anchor input data conversion from tensorflow corner style to center style for DetectionOutput operation optimization
  • ONNX Converter: Added support to pre-apply ONNX batchnorm scale and bias quantization encodings before they are consumed by the converter to compute weights and bias
  • Converters: Added support for reverse-engineering the SAME padding mode from explicit pad values
  • DSP Runtime: Fixed a scratch buffer over-access issue in RoiAlign
  • DSP Runtime: Fixed graph prepare issue for reshape const node
  • DSP Runtime: Optimized reduce mean performance when reduced on channel
  • Java API: Added protection when removing tensors to avoid crashing in a multithreaded application
Known Issues:
  • Slight accuracy regressions are observed for Mobilenet_v1_quantaware and Mobilenet_V2_SSD_quantaware on the HTP runtime
  • Performance regression is seen on DeeplabV3 model with online graph preparation

What's in Qualcomm Neural Processing SDK v1.53.2?

NEW FEATURES
  • Tool: Quantizer: Added support for fake quant operators in snpe-dlc-quantize
  • Tools: TF Converter: Support for logical_and, equal, greater, greater_equal, less, less_equal, not_equal, logical_or, select
  • Tool: TensorFlow Converter: Added support for Identity nodes that act as graph output nodes
BUG FIXES
  • Tool: ONNX Converter: Fixed an incorrect default bias shape in the ConvTranspose translation
  • Tool: TF Converter: Fixed an issue where TF model conversion produced a single static node even though the start of the network was provided as a command-line input
Known Issues:
  • Accuracy regressions on VGG16, Alexnet and Mobilenet_v1 on HTP Runtime are observed. Mobilenet_v1 accuracy regression can be recovered by using Enhanced Quantization

What's in Qualcomm Neural Processing SDK v1.52.0?

NEW FEATURES
  • Tool: Converters: Removed pre-broadcasting of constant tensors, resulting in smaller converter output files
  • Tool: Converter: Added Converter support for Nd Reshape layer
  • Tool: Converter: Added CastOp support for TF
  • Tool: Converter: Added support for static subgraph resolutions at conversion time
  • Tool: Converter: Added support for tensor dtype for TF fill op translation
  • NPE HTP: Enabled the support for float input of elementwise binary op in offline and online graph preparation
  • Converters: ONNX: Added support for NonMaxSuppression and updated the Cast op to ensure proper type tracking
  • Converters: Common: Updated op-squashing logic to attempt a squash into the subsequent op when a node's input buffer has multiple consumers
BUG FIXES
  • NPE DSP: Fixed variance accuracy loss in InstanceNormalization on HTP
  • NPE GPU: Added an optimized kernel for the ReduceMean operation
  • Tool: Converter: Fixed a bug in the TF FullyConnected translation where inputs were intermittently out of order
  • NPE DSP: Fixed the freeing of an uninitialized pointer that led to random crashes
  • NPE DSP: Optimized specific unpack->elementwise sequences for certain models on HTP
  • NPE AIP: Optimized the input conversion for the models involving padding along width dimension

What's in Qualcomm Neural Processing SDK v1.51.0?

NEW FEATURES
  • Converter Tool: Added support for Onnx WhereOp
  • Added support for edge padding type for pad operation in GPU runtime
  • Neural Processing Engine DSP: Enabled support for ElementWiseUnary abs layer on HTP
  • GPU Runtime: Added support for asymmetric reflect padding for pad operation
  • UDO: Allow users to specify a different datatype for each core in single config file
  • UDO: HTML documentation & sample app are updated to provide an example of loading a UDO package
BUG FIXES
  • DSP Runtime: Fixed the context leak on HTP targets during repeated init/deinit scenarios
  • Neural Processing Engine: Optimized the init stage to complete faster
  • Neural Processing Engine DSP: Optimized maxpool with stride 2x1 on HTP
  • Neural Processing Engine DSP: Optimized large Concat ops to fit into memory
  • Neural Processing Engine DSP: Optimized the init on HTP
  • Neural Processing Engine DSP: Optimized graph prepare for HTP targets to support running larger graphs
  • Neural Processing Engine DSP: Fixed the issue with CDSP not going to sleep when the model is de-initialized
  • Fixed issues related to HMX hysteresis management on HTP, including correct timer-expiry handling and deadlock avoidance when a hysteresis timeout and de-init happen at about the same time

What's in Qualcomm Neural Processing SDK v1.50.0?

NEW FEATURES
  • Tool: Quantizer: Added SNPE Quantizer support for is_symmetric field used in updated AIMET specification
  • DSP Runtime: Improved instance norm op accuracy when input size is big
  • DSP Runtime: Enabled edge padding support for v65/v66 targets
BUG FIXES
  • Tool: TensorFlow Converter: Resolved an issue where TF Mul was not being translated correctly
  • Fixed issues with offline prepare of DeepLabV3 for SOCs with 2MB VTCM
  • Fixed issue in HTP prepare with certain combinations of Conv followed by other layers
  • Improved Convolution performance on HTP when horizontal and vertical stride are not equal
  • Improved accuracy of Instance Norm on DSP
  • Fixed DSP clock drop issue by adding clock vote hysteresis support
  • Fixed an issue with quantization of ArgMax layers for certain input types
  • Fixed a bug that caused a failure when running an INT8 network followed by an INT16 network in the DSP runtime
  • Tool: TensorFlow Converter: Fixed an issue with constant coefficient input to multiple Prelu layers
  • Enhanced split logic for ConvLayer for certain input types
  • Fixed an issue with elementwise add for certain input types in the DSP runtime
  • Fixed an issue in HTP prepare with certain combinations of addsub_op followed by other layers
  • Resolved performance issue of multiple concurrent executions using common HW resource in DSP runtime
  • Fixed HTP prepare issue with MobileBERT

What's in Qualcomm Neural Processing SDK v1.49.0?

  • Optimized 16-bit quantization performance with l2 cache prefetch and aligned buffer load/save in DSP runtime
  • Enabled Matmul support in SNPE HTP
  • ONNX Converter: Added support for edge padding

What's in Qualcomm Neural Processing SDK v1.48.0?

  • IMPORTANT: Neural Processing SDK migrated to use Ubuntu 18.04 as the host platform
  • Updated dependencies.sh and check_python_dependencies.sh for the transition to Ubuntu 18.04 - Python 3.6, and libc++9
  • Switched diagnostic logging (SNPEDiag.log) to a new file format
  • Removed use of and dependency on OpenMP in CPU runtime
  • Resolved issues with layer dump feature using --debug flag on HTP
  • Optimized performance of the PreLU layer on HTP
  • Optimized Fast RPC performance by statically mapping frequently used buffers in DSP runtime
  • Improved instance norm op accuracy when the input size is big in the DSP runtime
  • Added support for using Unsigned PD in AIP runtime
  • Added support for the MobileDetEdgeTPU SSD model
BUG FIXES
  • Added support for models with UDO in the Android Sample App
  • Fixed an issue where the bias encoding could not be overridden by quantization_overrides in the ONNX/TF converters
  • Fixed support for processing tf.layers.Dense in TF Converter
  • Fixed the issues with UDO fallback to CPU on HTP
  • Fixed a shape issue with certain structures including FC in the ONNX Converter
  • Fixed an Unpack layer indexing error on HTP
  • Fixed an overflow issue in the instance norm op when the variance is very small in the DSP runtime
  • Optimized input node followed by concat on HTP
  • Added session reset logic to handle the case when DSP goes to SSR
  • Improved the performance of 7x7 Depthwise Conv2D op on HTP
  • Enabled keep dims support for Reduce min/max/sum layers on HTP

What's in Qualcomm Neural Processing SDK v1.47.0?

  • Added a new matching pattern for ResizeNearestNeighbor to the TF Converter
  • Added support for TF 1.15 NonMaxSuppressionV3 translation
  • Added a necessary restriction when optimizing graphs containing MatMul in the DSP runtime
  • Added quantization parameters for scale and offset to resolve zero outputs in the DSP runtime
  • Added the scale layer to offline prepare in the DSP runtime
  • Updated Embedding layer support on HTP to handle inputs with a range greater than 255
  • Enabled Normalize layer (part of the caffe_ssd fork) translation support in the Caffe converter
  • Added opset 11 support for the ConvTranspose op in the ONNX converter
BUG FIXES
  • Fixed the inputs of elementwise Sub not being broadcast in SNPE CPU
  • Fixed a problem with TensorFlow conversion of PReLU layers that contain the Addv2 op
  • Fixed a bug where the buffer attribute for element size returned the wrong value for 16-bit tensors
  • Fixed a 16-bit dequantization issue when the output data length does not align to 128

What's in Qualcomm Neural Processing SDK v1.46.0?

  • Optimized ArgMax op L2 cache prefetch in the DSP runtime
BUG FIXES
  • Fixed an issue where the Lrn_d32 op fails for window size 1 in the DSP runtime
  • Fixed an issue where InputSupernode fails in an edge case in the DSP runtime

What's in Qualcomm Neural Processing SDK v1.45.3?

  • Accuracy fixes for various Layers on HTP
  • Init/De-init time improvements
  • Inference Performance Improvements

What's in Qualcomm Neural Processing SDK v1.43.0?

  • Improved the input/output conversion times for models having depth as 4 on AIP runtime
  • Enabled initial support for constant layers along with elementwise Op on HTA
  • Added support for opaque float concat operation in SNPE DSP concat layer
  • Added support for Caffe's "Clip" layer in the caffe converter
  • Added int16 example to snpe-sample app
BUG FIXES
  • Fixed the crash while running multi-threading applications with user buffer mode on AIP runtime
  • Fixed bug in ONNX converter that used a hard-coded name for the sequence length input of the LSTM operator
  • Fixed a bug in the ONNX converter for the Unsqueeze layer, which hit a key error with static inputs
  • Fixed the bug in l2_fetch usage during output conversion which improved the performance significantly for some models running on AIP runtime
  • Fixed the issue with generation of an HTA-enabled DLC for the denoise model
  • Fixed a segmentation fault during DLC generation with specific inputs on HTA
  • Fixed an issue with PlatformValidator.hpp referencing a non-existent #include

What's in Qualcomm Neural Processing SDK v1.42.2?

  • Fixed the bug in l2_fetch usage during output conversion which improved the performance significantly for some models running on AIP runtime.

What's in Qualcomm Neural Processing SDK v1.42.0?

  • Removed V60 DSP libs from SNPE SDK
  • Enabled the AIP runtime support for generating the intermediate outputs from HTA with online compiler
  • Enabled multithread for re-quantize process in DSP runtime
  • Added an optional parameter to set the hysteresis period for the sustained high and burst profiles in the DSP runtime
BUG FIXES
  • Added support for opaque float concat operation in SNPE DSP concat layer
  • Fixed bug in UserBufferTF8 where retrieving the encoding would always return null
  • Fixed box decoder performance issue on mobilenet v2 ssd model for DSP runtime
  • Fixed tanh performance issue by replacing QuantizedTanh_8_ref with QuantizedTanh_8 op in DSP runtime

What's in Qualcomm Neural Processing SDK v1.41.0?

  • Added MatMul support on the CPU runtime
  • Added support for new version of 7250 with integrated PMIC module
  • User-Defined Operations (UDO) with weight parameters have been added to demonstrate quantization and network execution on the CPU and DSP runtime cores, respectively

What's in Qualcomm Neural Processing SDK v1.40.0?

  • Added DSP Graph Caching support for AIP models with HVX subnets
  • Upgraded DSP to use Hexagon SDK 3.5.2 toolchain
  • Added support for 16bit UDO layers in DSP
  • Added support for large average pooling and the reduce_mean layer, and improved elementwise_mul support for larger tensor sizes
BUG FIXES
  • Fixed the issue with buffer ordering during the execution of batched models on AIP runtime
  • Fixed issue with SsdDetectionOut when number of classes is only 1
  • Fixed accuracy issue with Correlation 1D op
  • Fixed improper processing when 16-bit input quantization is used in certain cases
  • Fixed scaling logic in convert_16 op

What's in Qualcomm Neural Processing SDK v1.39.1?

  • An update to v1.39.0 that addresses a performance regression of the MobileNet SSD model on the AIP runtime

What's in Qualcomm Neural Processing SDK v1.39.0?

  • Added graph caching support, which improves init times for DSP & AIP networks (DSP subnets within AIP are not supported); see the example at the end of this entry
  • Optimized Prelu using a cubic approximation to reduce saturation loss during re-quantization
  • Added additional logging messages for debugging in DSP runtime
BUG FIXES
  • Fixed the issue with setting the performance profile for AIP runtime in multithreading scenarios
  • Fixed an issue where incorrect DLCs were generated when multiple instances of snpe-dlc-quantize ran in parallel for the AIP runtime
  • Fixed potential bug with freeing threads in DSP runtime
  • Fixed issue of incorrect UDO tensor datatype in quantizer
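
As a sketch of exercising the cache with snpe-net-run (file names are placeholders; --enable_init_cache is the option referenced elsewhere in these notes): the first run prepares the graph and stores a cache record in the DLC, and subsequent loads of the same DLC skip that preparation:

      snpe-net-run --container model.dlc --input_list inputs.txt --use_dsp --enable_init_cache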

What's in Qualcomm Neural Processing SDK v1.38.0?

  • Enabled FC/MatMul to use VTCM if available in DSP.
  • Optimized 16-bit MeanVarianceNormalize in DSP runtime.
  • Added support for batchwise scalar divide operation in DSP runtime.
  • Optimized the Hard-swish operator for MobileNetV3.
  • Added support for EltwiseMin layer for ONNX converter and CPU runtime.
  • Added support for Onnx BatchNorm layer (OpVer 9, 12) in Onnx Converters.
  • Added the Caffe preprocessing subtract_mean layer. If specified, the converter enables the preprocessing given by a data layer's transform_param subtract_mean.
  • ONNX Softmax converter support previously existed only for tensors of rank <= 2; support for tensors of rank <= 4 was added.
  • Enabled the end user / developer to request the use of an unsigned process domain, avoiding the requirement of signed libraries for SNPE execution on 8250 and newer devices (see the example below).
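
With snpe-net-run, the unsigned process domain can be requested through a platform option; a sketch, where the unsignedPD:ON spelling is an assumption based on later SNPE documentation:

      snpe-net-run --container model.dlc --input_list inputs.txt --use_dsp --platform_options unsignedPD:ON
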
BUG FIXES
  • Removed autoquantization for classes output in MultiClassNMS layer and added support for float addition in ElementwiseOp layer to handle this case.
  • Fixed the issue with enabling stats for the AIP runtime on models where the HTA subnet has more layers than SNPE layers.
  • Fixed the output conversions to allocate the required buffers during initialization itself in the AIP runtime, to improve inference time.
  • Enabled honoring padding information from the HTA driver that is pre-computed by the AIP runtime, to unblock execution of more models.
  • Fixed the issue with output buffer id while converting depth2space to deconv on HTA.
  • Fixed a bug during graph transformation while folding the batchnorm on HTA.
  • Increased the DCVS relaxed sleep latency duration; this lets the power system know that the CDSP can go into a deeper sleep state. If there is no active inference request, it is better for the system to go into a deeper sleep state.

What's in Qualcomm Neural Processing SDK v1.37.0?

  • Enabled the online compiler support for HTA 1.x family of devices
  • AIP performance profile behavior is now aligned with the DSP runtime's for reduced power consumption in case of inference inactivity
  • ONNX Converter: Added support for the ONNX Pad layer (OpVer 11)
  • Bug fix: snpe-dlc-info: Fixed a MACs calculation error for the deconvolution layer

What's in Qualcomm Neural Processing SDK v1.36.0?

  • Added Java API extension to register UDO package with SNPE
  • snpe-dlc-info now prints the command-line that was used to quantize the DLC if applicable
  • Added support to handle UDO layers with multiple TF8 outputs with different quantization parameters
  • Added support for an additional profiling level (moderate) for SNPE benchmarking script and associated snpe-net-run executable for tracking initialization time metrics
  • Upgraded DSP to use Hexagon SDK 3.5.1 toolchain
  • Extended the Platform Validator to detect the HTA API version
  • Added a VOLATILE_CHECK mode for SNPE DSP runtime checking, to query runtime availability on each call instead of returning a cached result
  • Added performance modes such as LOW_POWER_SAVER, HIGH_POWER_SAVER, and LOW_BALANCED for the CPU runtime
  • Fixed a bug with propagation of the model version during conversion
  • Fixed the issue with selecting the correct output shape during graph transformation while inserting 1x1 conv2d for different input formats
  • Fixed the issue with allocation of layer descriptor while loading the network on HTA

What's in Qualcomm Neural Processing SDK v1.35.0?

  • Introduced the User-Defined Operations (UDO) feature
  • Added support for SDM720G/SM7125
  • Added support to snpe-throughput-net-run for UserBuffer input tensors (both INT8 and INT16)
  • Input batching support is added for networks that can run completely on AIP runtime
  • Added support for the tf.stack and tf.unstack ops to the DSP and CPU runtimes
  • Added support for the tf.stack, tf.unstack, tf.floor, and tf.minimum ops to the TF converter
  • Fixed some small memory leaks that are seen when repeatedly calling dlopen()/dlclose() on libSNPE.so
  • Updated the Deconvolution operation on DSP with a new kernel that improves performance on various kernel sizes and strides
  • Fixed an ssd_detection CDSP crash in the DSP runtime
  • Updated the HTA to partition the input layer, if it has a connection to a layer that is not included in the same partition
  • Improved the tiling configuration support for the depthwise convolution layer

What's in Qualcomm Neural Processing SDK v1.34.0?

  • Initial support for ops with 16-bit activations using HTA in both snpe-dlc-quantize and in the SNPE AIP runtime.
  • New option for snpe-net-run to automatically turn unconsumed tensors of the network (tensors that are not inputs to a layer) into network outputs.
  • Fixed inconsistent results on SM8250 in certain cases for depthwise convolutions.
  • Added support for the depth2space operation on the GPU.
  • Using optimized Softmax implementation in AIP networks when input activation has more than 5000 elements.
  • Truncated detection output on DSP to return valid data only.
  • Ensured weights are properly flushed to DDR for use during inference in the DSP runtime.
  • Fixed support for NV21 encoding in the DSP runtime.

What's in Qualcomm Neural Processing SDK v1.33.2?

  • Address accuracy issues for Deconvolution in the AIP runtime
  • Changed behavior of Crop layer resize, so it retains the number of copied elements on each dimension
  • Made the quantizer's --override_params option work for AIP
  • Reordered PerformanceProfile_t to be ABI compatible with 1.32.0
  • Using optimized Softmax implementation in AIP networks when input activation has more than 5000 elements

What's in Qualcomm Neural Processing SDK v1.33.1?

  • New performance modes have been added (see the sketch after this list):
    • LOW_POWER_SAVER: Runs at a lower clock than POWER_SAVER, at the expense of performance
    • HIGH_POWER_SAVER: Runs at a higher clock and provides better performance than POWER_SAVER
    • LOW_BALANCED: Runs in a lower balanced mode, providing lower performance than BALANCED
  • snpe-dlc-info adds a summary of the layer types in use in the model
  • Updated to use new BLAS functionality that leverages OpenMP. This adds a new dependency on the OpenMP shared library for Linux platforms
  • Added 32-bit bias support
  • Added init caching support for the SSD output layer on DSP
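
A minimal sketch of selecting one of the new modes through the C++ API, assuming matching PerformanceProfile_t enum values:

      #include <memory>
      #include "DlContainer/IDlContainer.hpp"
      #include "SNPE/SNPE.hpp"
      #include "SNPE/SNPEBuilder.hpp"
      #include "DlSystem/DlEnums.hpp"

      std::unique_ptr<zdl::SNPE::SNPE> buildWithMode(zdl::DlContainer::IDlContainer* container) {
          // Select the performance mode at network-build time.
          return zdl::SNPE::SNPEBuilder(container)
              .setPerformanceProfile(zdl::DlSystem::PerformanceProfile_t::HIGH_POWER_SAVER)
              .build();
      }
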
Bugs:
  • Fixed a memory leak causing increasing init time for DSP
  • Added converter support for dilated convolution when used with fakequant nodes
  • Fixed multiple bugs in snpe-onnx-to-dlc that caused errors for models containing the torch.Mul op
  • Extended TF converter support to the NMSv1 op, in addition to existing support for the v2 and v3 NMS ops
  • Fixed a TensorFlow conversion bug in infer_shape for the StridedSlice op: output_shape should be the shape of the single output, not a list of shapes
  • Fixed a bug with propagation of the model version during conversion
  • If burst mode is set, thread affinity is now set to the big cores during init and de-init, and restored to the previous setting after these actions complete
  • Fixed a segfault when using user buffers with a resizable dimension

What's in Qualcomm Neural Processing SDK v1.32?

  • Add Caffe MVN Layer support in the Caffe Converter, CPU Runtime, and DSP Runtime
  • snpe-dlc-quantize: Enabled the use of quantization parameters calculated during training. To override the SNPE-generated quantization parameters, simply pass --override_params to snpe-dlc-quantize (see the example at the end of this entry).
  • Removed deprecated command line arguments from converters. All three converters now require passing -i/--input_network for model input paths. Help menus are updated for each converter
  • snpe-dlc-diff: Added a command-line option [--diff_by_id/-i] to snpe-dlc-diff. This option allows users to compare 2 models in order (sorted by ID), as opposed to only diffing common layers
  • Added support for L2Norm layer to TensorFlow converter
  • Optimized the DSP performance for the 'Space To Depth' layer
  • Added support in the Java API for setInitCacheEnabled() and setStorageDirectory() to enable DLC caching support.
  • Allow graceful recovery after a FastRPC error: recreate the userPD after the cDSP crashes so that the user can continue the SNPE process with subsequent instances instead of having to close it. Note: all the instances associated with the previous userPD will be lost.
  • snpe-dlc-viewer: Associate each layer type to a fixed color for consistency when using snpe-dlc-viewer
  • Split the SNPE isRuntimeAvailable method into two separate functions to improve backward compatibility with existing client binaries that were built against the older signature.
Bugs:
  • TF Converter: Fixed Elementwise Broadcast support
  • ONNX Converter: Fixed bug where output dimension was incorrect when keep_dims parameter was set to False for Argmax, ReduceSum and ReduceMax.
  • ONNX Converter: Fixed bug where pad attribute was not properly parsed for Deconv Op.
  • Caffe Converter: Fixed bug when converting SSD-based models when using Python 3.
  • TF Converter: Fixed a bug where the converter removed a const op input to a Reshape op when passed through identity op(s), i.e. const -> identity -> reshape.
  • Fixed a bug where getOutputSize() would give the wrong result on output tensors in UserBuffer mode
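
For example, a typical snpe-dlc-quantize invocation with the override enabled (file names are placeholders):

      snpe-dlc-quantize --input_dlc model.dlc --input_list input_list.txt \
          --output_dlc model_quantized.dlc --override_params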

What's in Qualcomm Neural Processing SDK v1.31?

  • New patterns were added to enable running the CLE algorithm on more op patterns and model architectures
  • Added support for HeatmapMaxKeypoint and the ROI Align layer in the CPU runtime
  • Added initial L2Norm layer support in CPU runtime. No support for axis parameter yet: normalization is performed along the inner-most dimension of the input tensor
  • Support for single-input Concatenation layers was added to CPU, GPU and DSP
  • Added support for Detection Output layer on DSP runtime. Currently, only a batch of 1 is supported
  • Changed determination of number of batch dimensions in the Fully Connected layer so rank greater than 1 is always assumed to mean that there is 1 batch dimension
  • Removed a constraint on the LSTM layer in the GPU runtime that prevented batch mode operation
  • Added Tensorflow converter support for Caffe-style SSD networks
  • Added support for Leaky-RELU in the TensorFlow converter. Both the actual Leaky-Relu op and the elementwise op representation are supported and map to SNPE's Prelu op.
  • Added Argmax support to the Caffe converter, and optimized performance on the DSP runtime
  • Added a new column to snpe-dlc-info that displays the supported runtimes for each layer
  • Initial support for per-layer statistics from AIP/HTA subnets

What's in Qualcomm Neural Processing SDK v1.30?

  • Documentation has been added to reflect the new common converter command line options for input processing
  • Converters now propagate required batchnorm information for performing quantization optimizations
  • Support for the new bias correction quantization optimization which adjusts biases by analyzing float vs quantized activation errors and adjusting the model to compensate
  • ONNX converter now filters single-input Concats as no-ops, since SNPE didn't support them
  • Converter input processing now uniformly handles different input types and encodings
  • ONNX converter now supports the ConvTranspose ‘output_padding’ attribute by adding an additional pad layer after the ConvTranspose op
  • Integrates the latest flatbuffer 1.11 library which brings speed improvements and options for model size reduction
  • GPU size limitations with the ArgMax op (when setting the keepDims op attribute to false) can be worked around by enabling CPU fallback
  • Fixed DSP error with MobileNet SSD on QCS403 and QCS405
  • Fixed the issue with partitioning of deconv layer in HTA

What's in Qualcomm Neural Processing SDK v1.29?

  • Added support for the DLC reorder tool
  • Optimized HTA d32 conversions
  • Added the TF space_to_depth op for the SNPE CPU and DSP runtimes
  • Enhanced benchmarking scripts to show a further breakdown of execution time across various components
  • Added support for additional ONNX binary element-wise ops
  • Optimized the deconv layer to improve performance
  • Fixed a runtime error in the DSP runtime
  • Optimized SNPE GPU runtime performance for ShuffleNet V2 by using the profiling level config

What's in Qualcomm Neural Processing SDK v1.28?

  • Added an optional argument to isRuntimeAvailable for the DSP runtime so that it doesn't activate the DSP (see the sketch at the end of this entry)
  • Allow UB_T8 and UB_FLOAT output for snpe-net-run
  • Added a new command line option for snpe-dlc-diff to check layer names
  • Updated the --dlc argument to --output_path for snpe-caffe-to-dlc to align with the ONNX converter
  • Added --dry_run argument to snpe-onnx-to-dlc to allow evaluation for successful conversion on an ONNX model
  • Added support for the gather op in the DSP runtime
  • Added support to convert the TF MobileNet-V1-FPN-SSD model
  • Fixed a memory leak in the DSP runtime that is seen when repeatedly loading and unloading a network
  • Addressed issues on V66 DSPs related to acquiring VTCM memory
  • Fixed an issue related to multiple inputs for the Caffe converter
  • Fixed an issue in the TF converter related to element-wise sum and the atrous parameter
  • Fixed an issue in the TF converter related to tf.crop_and_resize when there are only 2 inputs
  • Fixed additional cases of uncaught exceptions with the aarch64-android-clang6.0 platform
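
A sketch of the runtime check with the optional argument; the RuntimeCheckOption_t value name (BASIC_CHECK) is an assumption for the non-activating check:

      #include "SNPE/SNPEFactory.hpp"
      #include "DlSystem/DlEnums.hpp"

      // Query DSP availability without fully activating the DSP.
      bool dspAvailable = zdl::SNPE::SNPEFactory::isRuntimeAvailable(
          zdl::DlSystem::Runtime_t::DSP,
          zdl::DlSystem::RuntimeCheckOption_t::BASIC_CHECK);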

What's in Qualcomm Neural Processing SDK v1.27.2?

  • Added support for SM8150P
  • Fixed memory leak issue on AIP runtime
  • Fixed additional cases of uncaught exceptions with the aarch64-android-clang6.0 platform

What's in Qualcomm Neural Processing SDK v1.27.1?

  • Updated the AIP runtime to support new features and fix critical bugs related to HTA. On new Android builds, HTA can support new layers: Bilinear Resize and Prelu
  • Fixed issues relating to uncaught exceptions on the aarch64-android-clang6.0 platform

What's in Qualcomm Neural Processing SDK v1.27?

  • Added new APIs for setting output tensor names on the SNPEBuilder and for fetching output tensor names for a given output layer name (see the sketch at the end of this entry)
  • Improved the peak memory usage with DLC v3 format
  • Fixed a few issues with performance and runtime failures on the DSP runtime
  • Fixed a few issues and improved error handling in the Platform Validator
  • Fixed issues with the Pooling and Instance Norm layers in the TensorFlow converter
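
A minimal sketch of the new APIs as described above; the tensor and layer names are hypothetical:

      #include <memory>
      #include "DlContainer/IDlContainer.hpp"
      #include "SNPE/SNPE.hpp"
      #include "SNPE/SNPEBuilder.hpp"
      #include "DlSystem/StringList.hpp"

      std::unique_ptr<zdl::SNPE::SNPE> buildWithOutputs(zdl::DlContainer::IDlContainer* container) {
          // Request a specific output tensor by name.
          zdl::DlSystem::StringList outputTensors;
          outputTensors.append("logits");
          return zdl::SNPE::SNPEBuilder(container)
              .setOutputTensors(outputTensors)
              .build();
      }

      // Later, fetch the output tensor names behind a given output layer:
      //   auto names = snpe->getOutputTensorNamesByLayerName("output_layer");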

What's in Qualcomm Neural Processing SDK v1.26?

  • Added support for the ONNX Gather Op in the ONNX Converter and CPU runtime
  • Optimized DeConvolution Layer for the DSP runtime
  • Support for tf.nn.moments in the TF converter, CPU and DSP runtimes
  • Added TF Reflect Pad support for the DSP runtime
  • Added symmetric quantizer option in snpe-dlc-quantize
  • Added support for batch > 1 when using the Scale Layer on the DSP runtime
  • Updated Platform Validator python script to be OS-independent
  • Added additional optimizations for HTA input conversion

What's in Qualcomm Neural Processing SDK v1.25.1?

This release focuses on a few key bug fixes for the AIP runtime.

  • Fixed accuracy issues on the AIP runtime
  • Added support for UB_TF8 with the AIP runtime
  • Added support for dilated depthwise convolution on GPU runtime

What's in Qualcomm Neural Processing SDK v1.25?

This release focuses on adding support for multiple subnets within the AIP runtime and on upgrading the DLC format to improve load-time performance and memory consumption. In addition, this release fixes critical issues in the DSP runtime and adds support for new operations in the TensorFlow and ONNX converters and the DSP runtime.

  • There is a known MobileNet benchmark performance regression due to variance in benchmarks and changes made to improve accuracy
  • Added option to request larger memory allocations on DSP for improved init time, at the expense of more memory use
  • The AIP runtime does not currently support the ub_tf8 data mode
  • Support for Android GCC build variants will be discontinued after the 1.25.0 release
  • The last release for the Qualcomm Flight platform (arm-linux-gcc4.8hf) will be the 1.25.0 release
  • x86 architecture support will move to Ubuntu 16.04 OS from Ubuntu 14.04 after the 1.27.0 release
  • The x86 binaries will move to clang 7 after the 1.26.0 release
  • A few DSP performance numbers have improved, as measurements are reported on quantized DLCs starting with the 1.25.0 release

What's in Qualcomm Neural Processing SDK v1.24?

This release focuses on adding support for multiple inputs and multiple outputs on each subnet of the AIP runtime, and adds setProfilingLevel API support for the AIP and CPU runtimes.

  • There is a known conversion issue with the snpe-caffe-to-dlc-udl tool for converting a custom UDL layer, which will be resolved in the next release
  • Support for Android GCC build variants will be discontinued after the 1.25.0 release
  • x86 architecture support will move to Ubuntu 16.04 OS from Ubuntu 14.04 after the 1.27.0 release
  • The x86 binaries will move to clang 7 after the 1.26.0 release

What's in Qualcomm Neural Processing SDK v1.23.1?

This release focuses on improving the initialization/de-initialization times along with adding important timing/accuracy fixes for various Ops.

  • Added support for non-max suppression and crop-and-resize layers in the TensorFlow converter
  • Fixed output inconsistency when multiple instances run concurrently on the DSP runtime
  • Support for Android GCC build variants will be discontinued after the 1.25.0 release
  • x86 architecture support will move to Ubuntu 16.04 OS from Ubuntu 14.04 after the 1.27.0 release

What's in Qualcomm Neural Processing SDK v1.22.0?

This is a major release that adds support for two new Snapdragon Mobile Platforms, Snapdragon 855 and Snapdragon 675. We introduce support for the Qualcomm® Hexagon™ Tensor Accelerator (“HTA”) through the new “AIP” runtime, which executes neural networks on HTA and falls back to HVX where necessary. The following major features round out the usual collection of bug fixes and smaller features:

  • Support for the Snapdragon 855 mobile platform on the Hexagon DSP with Tensor Accelerator and Vector eXtensions, Adreno GPU and CPU
  • Support for the Snapdragon 675 mobile platform on the Hexagon DSP, Adreno GPU and CPU
  • Added new AIP runtime for 855
  • Added priority control for DSP workloads
  • Support for manually setting quantization ranges
  • Added the new ‘snpe-throughput-net-run’ tool with support for simultaneous execution on different cores (sketched below)
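
An illustrative invocation running a model concurrently on two cores; the flag spellings are assumptions based on later SNPE documentation:

      snpe-throughput-net-run --container model.dlc --duration 60 \
          --use_dsp --use_gpu --perf_profile burst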

What's in Qualcomm Neural Processing SDK v1.19.2?

The focus of this release is to add new operations, fill gaps in operator support, and optimize existing operations such as Deconvolution.

  • Support for the Qualcomm QCS605 SoC on the Hexagon DSP (Android, Linux) and on Adreno GPU and CPU
  • Added support for the ELU operation for TensorFlow and ONNX on GPU and CPU
  • Added support for the Power operation for Caffe2 on GPU
  • Added support for Python 3.4
  • Optimized the Deconvolution, Slice and large Softmax operations on DSP

What's in Qualcomm Neural Processing SDK v1.18.0?

This release brings in support for three Snapdragon Mobile Platforms, broadens compatibility with MobileNet SSD networks and expands the supported operations on TensorFlow and ONNX converters. In addition, this release optimizes support for batching, especially when executing MobileNets on the DSP runtime.

  • Support for the Snapdragon 632 mobile platform on the Hexagon DSP, Adreno GPU and CPU
  • Support for the Snapdragon 439 and 429 mobile platforms on Adreno GPU and CPU
  • Improved compatibility of MobileNets networks, including extended support for MobileNet SSD variations
  • Support for the TensorFlow ‘pad’ and elementwise subtraction on Adreno GPUs
  • Added support for ChannelShuffle to the TensorFlow converter
  • Added support for Shape and Pad to the ONNX converter

What's in Qualcomm Neural Processing SDK v1.17.0?

This release completes a few features and focuses on quality and stability while bringing some minor optimizations with it.

  • Added batching support to DSP. All runtimes have basic batching support now.
  • Extended batching support to the ChannelShuffle layer
  • Extended Caffe Scale layer support to Snapdragon DSPs
  • Optimizations around effective utilization of the DSPs
  • Updated SDK examples

What's in Qualcomm Neural Processing SDK v1.16.0?

The major addition of this release is support for input batching, which means being able to process input tensors with more than one element on the ‘batch’ dimension. This applies to Caffe, Caffe2, TensorFlow, and ONNX models when run on the Snapdragon GPU and CPU cores.

  • Input batching on Snapdragon GPU and CPU
  • Support for a new layer: ChannelShuffle (on GPU and CPU, for Caffe2 models)
  • Optimized the Sigmoid, Batch Normalization and Instance Normalization layers
  • Added the Inception-v3 model to the example APP

What's in Qualcomm Neural Processing SDK v1.15.0?

This release adds support for Caffe-based MobileNet SSD networks and introduces accelerated Instance Normalization, along with initial support for Grouped Deconvolutions, per-channel Batch Normalization, and a Power layer. See the Layers and Limitations sections of the Reference Guide (available online and in the SDK) for more details.

  • Support for Caffe-based MobileNet SSD
  • Support for new layers: Instance Normalization
  • Extended support for Grouped Deconvolution and 1D Batch Normalization
  • MobileNet SSD is 49% faster on GPU 16-bit
  • On average networks are 9% faster across supported chipsets and acceleration cores

What's new in Qualcomm Neural Processing SDK v1.14.0?

The ONNX 1.0 open format for deep learning models is welcomed in our March SDK release. For the list of supported operations please refer to the documentation in the SDK, or to the Documentation section of this website. This release also adds support for two new layers and a new performance profile mode.

  • Support for ONNX 1.0 models (Beta)
  • Support for new layers: Generate Proposals, and RoIAlign
  • Added a manual performance mode

What's new in Qualcomm Neural Processing SDK v1.13.0?

This update increases inference performance, and in particular adds support for the new digital signal processor included in the Snapdragon 845 mobile platform. This release also adds optimization to the 16-bit floating point runtime.

  • Support for the digital signal processor in the Snapdragon 845 mobile platform
  • Performance increase on the 16-bit floating point runtime
  • Performance improvements on the GPU runtimes
  • Initial support for Generate Proposals and RoiAlign layers for Caffe2, on the DSP runtime

What's new in Qualcomm Neural Processing SDK v1.12.0?

This large update introduces a fully new accelerated runtime for 16-bit GPU computation, and support for a TensorFlow-style SSD network with MobileNets. We also introduce new library variations and optimizations.

  • Support for MobileNet SSD on CPU and GPU
  • Added a GPU 16-bit floating-point runtime
  • Optimizations to the DSP runtime for the Snapdragon 845 mobile platform
  • Added Android LLVM libraries
  • Support for shared Symphony System Manager SDK libraries

What's new in Qualcomm Neural Processing SDK v1.10.1?

This release adds support for new Snapdragon platforms, deploys a fully new DSP runtime, fixes bugs and completes MobileNets support.

  • Initial support for the Snapdragon 845 mobile platform
  • Support for MobileNets on DSP; note that 8-bit quantization may not work well on this network structure
  • Upgraded the DSP acceleration runtime for greater performance and broader compatibility
  • Fixed Faster R-CNN UserBuffers operation
  • Support for Snapdragon Flight boards