Snapdragon Neural Processing Engine SDK
Reference Guide
SNPE supports the network layer types listed in the table below.
See Limitations for the constraints that apply to the supported runtimes and to individual layer types.
All layers supported by the GPU runtime are valid in both GPU modes: GPU_FLOAT32_16_HYBRID and GPU_FLOAT16.
GPU_FLOAT32_16_HYBRID - data storage is done in half float and computation is done in full float.
GPU_FLOAT16 - both data storage and computation are done in half float.
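
To make the difference between the two modes concrete, here is a minimal sketch in plain Python. It is not SNPE code and does not touch the GPU. It emulates half and full float rounding with the standard `struct` module. The helper names (`to_half`, `to_single`, `dot_hybrid`, `dot_half`) are hypothetical. The sketch assumes that hybrid mode keeps intermediate results in full float until the final store, which this guide does not spell out.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (binary16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (binary32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_hybrid(a, b):
    """Emulates the hybrid idea: operands stored in half, arithmetic in full float."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_single(to_half(x) * to_half(y))  # full-float multiply
        acc = to_single(acc + prod)                # full-float accumulate
    return to_half(acc)                            # result stored as half


def dot_half(a, b):
    """Emulates the all-half idea: every intermediate is rounded to half."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_half(to_half(x) * to_half(y))    # half multiply
        acc = to_half(acc + prod)                  # half accumulate
    return acc


if __name__ == "__main__":
    a = [0.3] * 4096
    b = [1.0] * 4096
    print("hybrid:", dot_hybrid(a, b))
    print("half  :", dot_half(a, b))
```

In this emulation the all-half accumulator stops growing once the spacing between representable half values exceeds twice the addend, while the hybrid variant loses precision only when operands and the final result are stored. Real GPU behavior depends on the kernels and hardware, so treat this only as an intuition for the accuracy trade-off between the two modes.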
A list of supported ONNX operations can be found at ONNX Operator Support.
Note: this table is outdated and does not reflect the current state of supported layers/backends.
Layer Type | Description | Caffe Equivalent | Caffe2 Equivalent | TensorFlow Equivalent | TFLite Equivalent | ONNX Equivalent | PyTorch Equivalent | CPU | GPU | DSP/AIP |
---|---|---|---|---|---|---|---|---|---|---|
ArgMax | Returns the index with the largest value across axes of a tensor. | n/a | n/a | ArgMax | n/a | ✔ | ✔ | ✔ (?) | ||
Batch normalization (+ Scaling) | Batch normalization (optionally followed by scaling operation). | Maps to the combination of batch_norm_layer followed immediately by scale_layer. (?) batch_norm_layer.cpp scale_layer.cpp | spatial_batch_norm_op.cc | batch_normalization | | BatchNormalization | torch.nn.BatchNorm2d | ✔ | ✔ | ✔ (?) |
Channel Shuffle | Interleaves the channels in groups. The number of channels must be divisible by the number of groups. At least 4 channels are required for this layer to have any effect. | n/a | channel_shuffle_op.h | n/a | n/a | n/a | torch.nn.PixelShuffle | ✔ | ✔ | ✔ |
Color space conversion | Converts input image color format (encoding type) into SNPE native color space. Color space conversion parameters are provided as an option to the model converter tool. | There is no such Caffe layer by itself. This functionality is technically part of the Caffe data provider. data_layer.cpp | n/a | n/a | n/a | n/a | n/a | ✔ | ✔ | ✔ |
Concatenation | This layer concatenates multiple inputs into a single output. | concat_layer.cpp | concat_split_op.cc | concat | Concatenation | Concat | torch.cat | ✔ | ✔ | ✔ |
Convolution | Computes dot products between the entries of the filter and the input at any position. | Includes support for groups and dilation. conv_layer.cpp | Includes support for spatial reflection padding. conv_op.cc | conv2d | conv_2d | Conv | torch.nn.Conv2d | ✔ | ✔ | ✔ |
Crop | Crops one layer to the dimensions of a reference layer. | crop_layer.cpp | utility_ops.cc | n/a | ✔ | ✔ | ✔ | |||
CropAndResize | Crops normalized regions from a batch of images and scales them to a desired output size. | crop_and_resize | n/a | ✔ | ✘ | ✘ | ||||
CrossMap Response Normalization | This is an option within LRN layer. | lrn_layer.cpp | n/a | local_response_normalization | n/a | LRN | | ✔ | ✔ | ✔ |
Deconvolution | Performs deconvolution operation. | deconv_layer.cpp | conv_transpose_op.cc | conv2d_transpose | transpose_conv | ConvTranspose | torch.nn.ConvTranspose2d | ✔ | ✔ | ✔ |
Depthwise Convolution | Performs a 2D depthwise convolution. | Equivalent to Convolution with 'num_output' = input channels and 'group' = 'num_output' conv_layer.cpp | Equivalent to Convolution with 'num_output' = input channels and 'group' = 'num_output' conv_op.cc | tf.nn.depthwise_conv2d | tfl.depthwise_conv_2d | n/a | n/a | ✔ | ✔ | ✔ |
Detection Output | Generate the detection output based on location and confidence predictions by doing non maximum suppression. Typically used in SSD networks. | Note that this layer is not available on the tip of Caffe. It requires a compatible branch of Caffe. detection_output_layer.cpp | n/a | n/a | n/a | n/a | n/a | ✔ | ✘ | ✔ |
Dropout | Layer is used for training only. Converters remove this layer from DLC creation. | dropout_layer.cpp | dropout_op.cc | dropout | n/a | Dropout | torch.nn.Dropout | n/a | n/a | n/a |
Elementwise | Supports SUM, DIV, PROD, MAX, MIN, and SUB mode with coefficients. | Support for MAX, SUM, and PROD eltwise_layer.cpp | Support for SUM and MAX utility_ops.cc | add add_n mul maximum minimum subtract | add sum div mul maximum minimum sub | Add Mul Max Min Sub Sum | torch.sum torch.mul torch.max torch.min | ✔ | ✔ | ✔ |
Elementwise Unary | Supports ABS, CEIL, EXP, FLOOR, LOG, NEG, ROUND, SIN, and SQRT. | floor sqrt | abs exp floor sqrt | Abs Ceil Exp Floor Log Neg Round Sin Sqrt | ✔ | ✔ | ✔ | |||
Elu | activation function: elu [ i.e., f(x) = max(0,x) + a*(exp(min(0,x))-1) ] | elu_layer.cpp | elu_op.cc | elu | elu | ✔ | ✔ | ✘ | ||
Flatten | Flatten an input to a layer | flatten_layer.cpp | utility_ops.cc | flatten | Flatten | torch.flatten | ✔ | ✔ | ✔ | |
Fully connected | Similar to convolution, but with connections to full input region, i.e., with filter size being exactly the size of the input volume. | inner_product_layer.cpp | fully_connected_op.cc | dense Tensordot | fully_connected | MatMul | torch.nn.Linear | ✔ | ✔ | ✔ |
HeatmapMaxKeypoint | Keypoint detection | n/a | inputn/a | n/a | n/a | n/a | ✔ | ✔ | ✘ | |
Input | This is an input layer to the network. | input_layer.cpp | input | n/a | n/a | n/a | ✔ | ✔ | ✔ | |
InstanceNorm | Instance normalization | Supported as batch_norm_layer with 'use_global_stats' = false. | instance_norm_op.cc | InstanceNormalization | ✔ | ✔ | ✔ | |||
L2Norm | L2 normalization along innermost dimension. | n/a | n/a | L2 Normalize | ✔ | ✔ | ✘ | |||
Local Response Normalization (LRN) | Performs a lateral inhibition by normalizing over local input regions. | lrn_layer.cpp | local_response_normalization_op.cc | LRN | ✔ | ✔ | ✔ | |||
LSTM | LSTM recurrent network cell | lstm_layer.cpp | tf.contrib.rnn.BasicLSTMCell tf.nn.static_rnn | LSTM | ✔ | ✔ | ✔ | |||
Normalize | Instance normalization using RMS instead of mean/variance. | Note that this layer is not available on the tip of Caffe. It requires a compatible branch of Caffe. | n/a | n/a | n/a | ✔ | ✔ | ✘ | ||
Output | There is no explicit output layer as the results from any layer in the network can be specified as an output when loading a network. | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
Pack | Packs a list of tensors of rank "r" into a single rank (r+1) tensor. | n/a | n/a | stack | n/a | n/a | ✔ | ✘ | ✔ | |
Pad | Pads the edges of the input tensor in any or all dimensions, as specified. Padding values can be given explicitly ("CONSTANT" mode in TensorFlow) or generated by mirror padding ("REFLECT" mode in TensorFlow). | n/a | n/a | pad | pad | n/a | torch.nn.functional.pad | ✔ | ✘ | ✔ |
Permute | Permute is used to rearrange the dimensions of a tensor. | Note that this layer is not available on the tip of Caffe. It requires a compatible branch of Caffe. permute_layer.cpp | n/a | transpose | transpose | Transpose | torch.transpose | ✔ | ✔ | ✔ |
Pooling | Pooling operation down samples the input volume spatially. Both average and max pooling are supported. | pooling_layer.cpp | pool_op.cc | average_pooling2d max_pooling2d | average_pool_2d max_pool_2d | MaxPool AveragePool GlobalMaxPool GlobalAveragePool | torch.nn.MaxPool2d torch.nn.AvgPool2d torch.nn.AdaptiveAvgPool2d | ✔ | ✔ | ✔ |
Power | Power layer computes (shift + scale * x) ^ power for input x. (?) | power_layer.cpp | n/a | n/a | n/a | n/a | ✔ | ✔ | ✔ (?) | |
Prelu | activation function: prelu [ i.e., y = max(0, x) + a*min(0,x) ] | prelu_layer.cpp | prelu_op.cc | PReLU | PRelu | ✔ | ✔ | ✔ | ||
Prior Box | Generate the prior boxes of designated sizes and aspect ratios. Typically used in SSD networks. This layer is handled (folded in) and removed from the DLC model during conversion. | Note that this layer is not available on the tip of Caffe. It requires a compatible branch of Caffe. prior_box_layer.cpp | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
Proposal | Outputs region proposals, usually for consumption of an ROIPooling layer. Typically used in Faster RCNN. | Note that this layer is not available on the tip of Caffe. It requires a compatible branch of Caffe. proposal_layer.py | n/a | n/a | n/a | n/a | ✔ | ✘ | ✔ | |
Relu | activation function: relu [ i.e., y = max(0,x) ] | relu_layer.cpp | relu_op.cc | relu | relu | Relu | torch.nn.ReLU | ✔ | ✔ | ✔ |
Reshape | Change dimensions of the input to a layer | reshape_layer.cpp | utility_ops.cc | reshape | reshape | Reshape | torch.reshape | ✔ | ✔ | ✔ |
Sigmoid | activation function: sigmoid [ i.e., y = 1/(1 + exp(-x)) ] | sigmoid_layer.cpp | sigmoid_op.cc | sigmoid | Sigmoid | ✔ | ✔ | ✔ | ||
Scale (Image) | Input image scaling, maintains aspect ratio. This function is primarily intended for images, but technically any 2D input data can be processed if it makes sense. Scaling parameters are provided as an option to the model converter tool. | There is no such Caffe layer by itself. This functionality is technically part of the Caffe data provider. data_layer.cpp | Resize Nearest Neighbor (Not supported on DSP) resize_op.cc | Resize Bilinear (does not support align_corners=True) tf.image.resize_bilinear Resize Nearest Neighbor (Not supported on DSP) tf.image.resize_nearest_neighbor | resize_bilinear resize_nearest_neighbor | n/a | torch.nn.UpsamplingBilinear2d | ✔ | ✔ | ✔ |
Scale | Elementwise scale, optionally add biases. | scale_layer.cpp | n/a | n/a | n/a | n/a | n/a | ✔ | ✘ | ✔ |
Silence | Silence is handled and removed from the model during conversion, similar to Dropout. | silence_layer.cpp | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
Slice | Slices an input layer into multiple output layers. | slice_layer.cpp | concat_split_op.cc | split | Slice | ✔ | ✔ | ✔ | ||
Softmax | Supports 1D and 2D modes. | softmax_layer.cpp | softmax_op.cc | softmax | softmax | Softmax | torch.nn.Softmax | ✔ | ✔ | ✔ |
Space to Depth | Rearranges blocks of spatial data into depth. | n/a | n/a | tf.nn.space_to_depth | n/a | ✔ | ✘ | ✔ | ||
Strided Slice | Extracts a slice of size (end-begin)/stride from the given input_tensor. Starting at the location specified by begin the slice continues by adding stride to the index until all dimensions are not less than end. Note that a stride can be negative, which causes a reverse slice. | n/a | n/a | tf.strided_slice | n/a | ✔ | ✔ | ✔ | ||
Tanh | activation function: tanh [ i.e., y = tanh(x) ] | tanh_layer.cpp | tanh_op.cc | tanh | tanh | Tanh | torch.nn.Tanh | ✔ | ✔ | ✔ |
Tile | Copies a blob along the specified dimensions | tile_layer.cpp | tile_op.cc | n/a | ✔ | ✔ | ✔ | |||
Unpack | Unpacks "n" tensors from a single packed tensor splitting along the axis dimension. | n/a | n/a | unstack | n/a | n/a | ✔ | ✘ | ✔ | |
Cross Correlation Layer | Computes dot products between the entries of the filter and the input at any position and then rotates the result by 180 degrees. | correlation_layer.cpp | n/a | n/a | n/a | n/a | n/a | ✔ | ✔ | ✘ |
Embedding Layer | Turns positive integers (indexes) into dense vectors of fixed size. | n/a | n/a | embedding_layer | n/a | ✔ | ✘ | ✘ | ||
ExtractGlimpse | Returns a set of windows called glimpses extracted at location offsets from the input tensor. | n/a | n/a | extract_glimpse | n/a | n/a | n/a | ✔ | ✘ | ✘ |
Gather | Gather slices from params axis according to indices. | n/a | n/a | gather | Gather | ✔ | ✘ | ✔ | ||
Image Projective Transform | Applies the projective transform to the image represented by a tensor of shape (num_images, num_rows, num_columns, num_channels). | n/a | n/a | image_projective_transform | n/a | n/a | n/a | ✔ | ✘ | ✔ |
Moments | Calculate the mean and variance of input tensor. | n/a | n/a | moments | n/a | n/a | n/a | ✔ | ✘ | ✔ |
Depth To Space | Rearranges data from depth into blocks of spatial data. This is the reverse transformation of SpaceToDepth. | n/a | n/a | depth_to_space | depth_to_space | n/a | torch.nn.PixelShuffle | ✔ | ✘ | ✔ |
Bbox Transform | Transform proposal bounding boxes to target bounding box using bounding box regression deltas. | n/a | bbox_transform_op.cc | n/a | n/a | n/a | n/a | ✘ | ✘ | ✔ |
Reduce (max, min, sum, product, mean) | Computes the maximum, minimum, sum, product, mean of elements across dimensions of a tensor. | n/a | n/a | reduce_max reduce_min reduce_sum reduce_prod reduce_mean | n/a | ✔ | ✔ | ✔ |
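
The Channel Shuffle and Strided Slice descriptions in the table can be illustrated with a small plain-Python sketch. It is not SNPE code. `channel_shuffle` and `strided_slice_1d` are hypothetical helper names. The shuffle assumes the common reshape, transpose, flatten formulation, and the slice covers only the one-dimensional case without any of the TensorFlow mask options.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (reshape -> transpose -> flatten)."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    # View the list as [groups][per_group], then read it column by column.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def strided_slice_1d(values, begin, end, stride):
    """Start at begin and keep adding stride until end is reached (1D only)."""
    return values[begin:end:stride]


if __name__ == "__main__":
    print(channel_shuffle([0, 1, 2, 3], groups=2))            # 4 channels: order changes
    print(channel_shuffle([0, 1], groups=2))                  # 2 channels: no effect
    print(strided_slice_1d(list(range(10)), 1, 8, 3))         # forward slice
    print(strided_slice_1d(list(range(10)), 8, 1, -3))        # negative stride, reverse slice
```

Under this formulation, 2 or 3 channels come back unchanged for every valid group count, which is consistent with the table's remark that at least 4 channels are needed for the layer to have any effect.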
Note: AIP Runtime supports all layers supported by the DSP runtime, because layers not supported by HTA run on HVX.
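
The scalar formulas quoted in the table for the Relu, Prelu, Elu, Sigmoid, and Power layers can be written out directly. The following is a minimal reference sketch in plain Python for checking values by hand. The function and parameter names are hypothetical, and it says nothing about how SNPE implements these layers on any runtime.

```python
import math


def relu(x):
    """y = max(0, x)"""
    return max(0.0, x)


def prelu(x, a):
    """y = max(0, x) + a * min(0, x)"""
    return max(0.0, x) + a * min(0.0, x)


def elu(x, a=1.0):
    """f(x) = max(0, x) + a * (exp(min(0, x)) - 1)"""
    return max(0.0, x) + a * (math.exp(min(0.0, x)) - 1.0)


def sigmoid(x):
    """y = 1 / (1 + exp(-x))"""
    return 1.0 / (1.0 + math.exp(-x))


def power_layer(x, exponent=1.0, scale=1.0, shift=0.0):
    """(shift + scale * x) ^ power, as described for the Power layer."""
    return (shift + scale * x) ** exponent


if __name__ == "__main__":
    for x in (-2.0, 0.0, 3.0):
        print(x, relu(x), prelu(x, 0.25), elu(x), sigmoid(x))
    print(power_layer(3.0, exponent=2.0, scale=2.0, shift=1.0))
```

Values computed this way can serve as a float reference when comparing outputs across runtimes, keeping in mind that half-float and quantized runtimes will not match them bit for bit.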