Snapdragon Neural Processing Engine SDK Reference Guide
Image Preprocessing

SNPE supports common image preprocessing operations such as color space conversion (e.g. NV21 to BGR format), scaling, cropping and mean subtraction on all supported runtimes. These operations are added as layers to the network and are performed as part of the forward propagation pipeline. Any data required for an operation, such as a mean image, is embedded into the network DLC.

These image preprocessing operations are currently only supported for DLC networks converted from a Caffe model.

# Order of Operations

The image preprocessing operations are added to the network in a predefined order. This order is fixed and independent of the order in which the options are specified in the converter or in the network prototxt file.

The order of image preprocessing operations is:

# Supported Operations

## Image Color Space Conversion

SNPE supports converting input images of various pixel formats to BGR, the format required by Caffe networks.

This operation is added by passing the input image pixel format option (--encoding) to the snpe-caffe-to-dlc conversion tool.

The following source encoding formats are supported:

NV21

NV21 is the Android version of YUV, also known as YUV420SP. The chrominance is downsampled with a subsampling ratio of 4:2:0. Note that this image format has 3 channels, but the U and V channels are subsampled: for every four Y pixels there is one U and one V pixel.

SNPE supports the JPEG File Interchange Format's YUV pixel specification. The equations governing conversion between BGR and YUV pixels are given at https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion
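
As a rough illustration of the layout and conversion described above, the following Python sketch decodes an NV21 buffer into BGR pixels. It assumes the usual NV21 layout (a full-resolution Y plane followed by an interleaved V,U plane, one pair per 2x2 block of Y pixels) and the full-range JFIF equations from the linked page. The function names, rounding and clamping are illustrative assumptions and are not taken from SNPE.

```python
def clamp_byte(value):
    """Round and clamp a float to the 0..255 byte range."""
    return max(0, min(255, int(round(value))))


def nv21_to_bgr(buf, width, height):
    """Decode an NV21 buffer into rows of (B, G, R) tuples.

    Assumes even width and height, a width*height Y plane, then
    interleaved V,U bytes, one pair per 2x2 block of Y pixels.
    """
    y_size = width * height
    rows = []
    for row in range(height):
        out_row = []
        for col in range(width):
            y = buf[row * width + col]
            # Each V,U pair covers a 2x2 block of luma samples.
            vu_index = y_size + (row // 2) * width + (col // 2) * 2
            v = buf[vu_index] - 128      # Cr
            u = buf[vu_index + 1] - 128  # Cb
            # JFIF full-range YCbCr -> RGB equations.
            r = clamp_byte(y + 1.402 * v)
            g = clamp_byte(y - 0.344136 * u - 0.714136 * v)
            b = clamp_byte(y + 1.772 * u)
            out_row.append((b, g, r))
        rows.append(out_row)
    return rows


# A 2x2 mid-gray image: four Y bytes followed by one V,U pair.
gray = bytes([128, 128, 128, 128, 128, 128])
print(nv21_to_bgr(gray, 2, 2))
```

With neutral chroma (U = V = 128) every output pixel has equal B, G and R values, which is a quick sanity check for the channel arithmetic.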

ARGB32

The ARGB32 format consists of 4 bytes per pixel: one byte for Red, one for Green, one for Blue and one for the alpha channel. The alpha channel is ignored. For little endian CPUs, the byte order is BGRA. For big endian CPUs, the byte order is ARGB.
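
The endian dependence can be seen with a small Python sketch. It assumes that an ARGB32 pixel is a 32-bit word laid out as 0xAARRGGBB, which is consistent with the byte orders stated above; the pixel value itself is a made-up example.

```python
import struct

# A hypothetical pixel packed as 0xAARRGGBB: A=0xFF, R=0x11, G=0x22, B=0x33.
pixel = 0xFF112233

little = struct.pack("<I", pixel)  # byte order in memory on little-endian CPUs
big = struct.pack(">I", pixel)     # byte order in memory on big-endian CPUs

print(little.hex())  # 332211ff -> B, G, R, A
print(big.hex())     # ff112233 -> A, R, G, B
```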

RGBA

The RGBA format consists of 4 bytes per pixel: one byte for Red, one for Green, one for Blue and one for the alpha channel. The alpha channel is ignored. The byte ordering is endian independent and is always RGBA byte order.

BGR

The BGR format consists of 3 bytes per pixel: one byte for Blue, one for Green and one for Red. The byte ordering is endian independent and is always BGR byte order.

### Image Color Space Conversion Example

The following prototxt and converter options describe a network where an input image of size 256x256 in the NV21 format is converted to an image also of size 256x256 in BGR format.

layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 256 dim: 256 } }
}

snpe-caffe-to-dlc -c net.prototxt --encoding nv21 -d net.dlc
snpe-dlc-info -i net.dlc


The output from snpe-dlc-info verifies the details of this encoding conversion:

--------------------------------------------------------------------------------
| Id | Name | Type | Inputs | Outputs | Out Dims    | Parameters               |
--------------------------------------------------------------------------------
| 0  | data | data | data   | data    | 1x256x256x3 | input_encoding_in: nv21  |
|    |      |      |        |         |             | input_encoding_out: bgr  |
|    |      |      |        |         |             | input_type: image        |
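
Note that the prototxt lists the input dimensions as batch, channels, height, width (1, 3, 256, 256), while the Out Dims column reports 1x256x256x3, with channels last. A minimal Python sketch of that reordering, using a hypothetical helper name, might look like this:

```python
def nchw_to_nhwc(shape):
    """Reorder (batch, channels, height, width) to (batch, height, width, channels)."""
    n, c, h, w = shape
    return (n, h, w, c)


prototxt_shape = (1, 3, 256, 256)
print("x".join(str(d) for d in nchw_to_nhwc(prototxt_shape)))  # 1x256x256x3
```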


## Image Scaling

SNPE supports scaling the input image size as a preprocessing operation. The interpolation algorithm is Unit Square Bilinear. See https://en.wikipedia.org/wiki/Bilinear_interpolation#Unit_Square for more details.
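
For reference, the unit square form of bilinear interpolation from the linked page can be sketched in Python as follows. The function name is hypothetical, and how SNPE maps each destination pixel to fractional source coordinates is not specified here, so the sketch covers only the interpolation formula itself.

```python
def bilinear_unit_square(f00, f10, f01, f11, x, y):
    """Interpolate inside the unit square.

    f00, f10, f01, f11 are the values at (0,0), (1,0), (0,1), (1,1);
    x and y are fractional offsets in the range 0..1.
    """
    return (f00 * (1 - x) * (1 - y)
            + f10 * x * (1 - y)
            + f01 * (1 - x) * y
            + f11 * x * y)


# Sample halfway between four neighbouring pixel values.
print(bilinear_unit_square(0, 10, 20, 30, 0.5, 0.5))  # 15.0
```

At the corners (x and y equal to 0 or 1) the function returns the corresponding input value unchanged.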

This operation is added by passing the input size option (--input_size) to the snpe-caffe-to-dlc conversion tool. The source image size is specified to the converter, while the target image size is determined from the input layer in the network prototxt. See snpe-caffe-to-dlc for details about this option.

### Image Scaling Example

The following prototxt and converter options describe a network where an input image of size 256x256 in the BGR encoding is scaled to a size 227x227 also in BGR encoding.

layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } }
}

snpe-caffe-to-dlc -c net.prototxt --input_size 256 256 -d net.dlc
snpe-dlc-info -i net.dlc


The output from snpe-dlc-info verifies the details of this input image scaling:

-------------------------------------------------------------------------------------------------------
| Id | Name       | Type    | Inputs | Outputs    | Out Dims    | Parameters                          |
-------------------------------------------------------------------------------------------------------
| 0  | data       | data    | data   | data       | 1x256x256x3 | input_preprocessing: passthrough    |
|    |            |         |        |            |             | input_type: default                 |
| 1  | data_scale | scaling | data   | data_scale | 1x227x227x3 | pad_value: 0                        |
|    |            |         |        |            |             | maintain_aspect_ratio: 0            |
|    |            |         |        |            |             | input_dim: [1, 256, 256, 3]         |
|    |            |         |        |            |             | output_dim: [1, 227, 227, 3]        |


## Image Cropping

SNPE supports cropping the input image as a preprocessing operation. The cropped image is extracted from the center of the source image.

This operation is added by modifying the network prototxt. To specify a crop operation, add a transform_param block with a crop_size option to the Input layer in the network prototxt. Caffe's Input layer does not support cropping, and this transform parameter is meaningless to Caffe. However, snpe-caffe-to-dlc will read this parameter and add a preprocessing image cropping layer to the DLC file. The source image size is taken from the input_param shape and the destination or cropped image size is taken from the transform_param block.

The modification to the network prototxt resembles:

layer {
  ...
  type: "Input"
  input_param { shape: { <source image dimensions> } }
  transform_param { crop_size: <cropped image dimensions> }
}

### Image Cropping Example

The following prototxt and converter options describe a network where an input image of size 256x256 in the BGR encoding is center cropped to a size 227x227 also in BGR encoding.

layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 256 dim: 256 } }
  transform_param { crop_size: 227 }
}

snpe-caffe-to-dlc -c net.prototxt -d net.dlc
snpe-dlc-info -i net.dlc


The output from snpe-dlc-info verifies the details of this input image cropping:

-----------------------------------------------------------------------------------------------
| Id | Name      | Type | Inputs | Outputs   | Out Dims    | Parameters                       |
-----------------------------------------------------------------------------------------------
| 0  | data      | data | data   | data      | 1x256x256x3 | input_preprocessing: passthrough |
|    |           |      |        |           |             | input_type: default              |
| 1  | data_crop | crop | data   | data_crop | 1x227x227x3 | offsets[0]: 0                    |
|    |           |      |        |           |             | offsets[1]: 14                   |
|    |           |      |        |           |             | offsets[2]: 14                   |
|    |           |      |        |           |             | offsets[3]: 0                    |
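
The height and width offsets of 14 in the table are consistent with a centered crop window whose offset is rounded down when the size difference is odd. The following Python sketch reproduces that arithmetic; the helper name is hypothetical and the floor division is inferred from the table rather than from a documented rule.

```python
def center_crop_offset(source_size, crop_size):
    """Offset of a centered crop window along one spatial axis."""
    if crop_size > source_size:
        raise ValueError("crop_size must not exceed source_size")
    return (source_size - crop_size) // 2


offset = center_crop_offset(256, 227)
print(offset)  # 14
# The crop then spans [14, 241) along height and width.
```

The batch and channel axes are not cropped, which matches the zero values shown for offsets[0] and offsets[3].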


## Image Mean Subtraction

SNPE supports image mean subtraction as a preprocessing operation. As with Caffe, SNPE supports both a mean image file in Caffe's binaryproto format, and also mean subtraction with constant channel values. Note that only one of these operations can be specified for a network.

Mean Subtraction With Mean Image

This operation is added by modifying the network prototxt. To specify an image mean subtraction, add a transform_param block with a mean_file option to the Input layer in the network prototxt. As in Caffe, the mean file must be a binaryproto file. This file is usually generated during database creation in Caffe. Note that the path to the mean file must be an absolute path.

Caffe's Input layer does not support mean subtraction, and this transform parameter is meaningless to Caffe. However, snpe-caffe-to-dlc will read this parameter and add a preprocessing mean subtraction layer to the DLC file.

Additionally, SNPE supports specifying a mean image that is larger than the input image size. In this case SNPE will use a center-crop from the mean image to do the mean subtraction.
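
A minimal Python sketch of subtracting a center-cropped mean image is shown below. It uses single-channel toy data for brevity, the helper names are hypothetical, and rounding the crop offset down for odd size differences is an assumption rather than documented SNPE behavior.

```python
def center_crop(image, out_h, out_w):
    """Return the centered out_h x out_w window of a 2-D list of rows."""
    top = (len(image) - out_h) // 2
    left = (len(image[0]) - out_w) // 2
    return [row[left:left + out_w] for row in image[top:top + out_h]]


def subtract_mean_image(image, mean_image):
    """Subtract a (possibly larger) mean image from image, element-wise."""
    mean = center_crop(mean_image, len(image), len(image[0]))
    return [[p - m for p, m in zip(img_row, mean_row)]
            for img_row, mean_row in zip(image, mean)]


# Single-channel toy data: a 2x2 input and a 4x4 mean image.
image = [[10, 10], [10, 10]]
mean_image = [[0, 0, 0, 0],
              [0, 1, 2, 0],
              [0, 3, 4, 0],
              [0, 0, 0, 0]]
print(subtract_mean_image(image, mean_image))  # [[9, 8], [7, 6]]
```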

The modification to the network prototxt resembles:

layer {
  ...
  type: "Input"
  ...
  transform_param {
    mean_file: "<absolute path to binaryproto file>"
  }
}


Mean Subtraction With Channel Values

This operation is added by modifying the network prototxt. To specify a constant channel mean subtraction, add a transform_param block with mean_value options to the Input layer in the network prototxt. As in Caffe, the order of the mean values is blue channel, then green channel, then red channel.

Caffe's Input layer does not support mean subtraction, and this transform parameter is meaningless to Caffe. However, snpe-caffe-to-dlc will read this parameter and add a preprocessing mean subtraction layer to the DLC file.

The modification to the network prototxt resembles:

layer {
  ...
  type: "Input"
  ...
  transform_param {
    mean_value: <blue value>
    mean_value: <green value>
    mean_value: <red value>
  }
}
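
The effect of constant channel mean subtraction on BGR pixels can be sketched in Python as follows. The helper name and the sample pixel and mean values are illustrative assumptions; only the blue, green, red ordering is taken from the description above.

```python
def subtract_channel_means(pixels, means_bgr):
    """Subtract per-channel constants from a list of (B, G, R) pixels."""
    return [tuple(value - mean for value, mean in zip(pixel, means_bgr))
            for pixel in pixels]


# Example values only: means listed blue, green, red.
means_bgr = (104, 117, 123)
pixels = [(104, 117, 123), (255, 255, 255)]
print(subtract_channel_means(pixels, means_bgr))
# [(0, 0, 0), (151, 138, 132)]
```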


### Image Mean Subtraction Examples

Mean Subtraction With Mean Image

The following prototxt and converter options describe a network where a mean image is subtracted from the input image of size 256x256.

layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 256 dim: 256 } }
  transform_param {
    mean_file: "/absolute/path/to/net.binaryproto"
  }
}

snpe-caffe-to-dlc -c net.prototxt -d net.dlc
snpe-dlc-info -i net.dlc


The output from snpe-dlc-info verifies the details of this mean subtraction operation:

--------------------------------------------------------------------------------------------------------------------------
| Id | Name               | Type          | Inputs | Outputs            | Out Dims    | Parameters                       |
--------------------------------------------------------------------------------------------------------------------------
| 0  | data               | data          | data   | data               | 1x256x256x3 | input_preprocessing: passthrough |
|    |                    |               |        |                    |             | input_type: default              |
| 1  | data_subtract_mean | subtract_mean | data   | data_subtract_mean | 1x256x256x3 |                                  |


Mean Subtraction With Channel Values

The following prototxt and converter options describe a network where constant channel values are subtracted from an input image of size 256x256. A value of 104 is subtracted from the blue channel, 117 from the green channel, and 123 from the red channel.

layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 256 dim: 256 } }
  transform_param {
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
}

snpe-caffe-to-dlc -c net.prototxt -d net.dlc
snpe-dlc-info -i net.dlc


The output from snpe-dlc-info verifies the details of this mean subtraction operation:

--------------------------------------------------------------------------------------------------------------------------
| Id | Name               | Type          | Inputs | Outputs            | Out Dims    | Parameters                       |
--------------------------------------------------------------------------------------------------------------------------
| 0  | data               | data          | data   | data               | 1x256x256x3 | input_preprocessing: passthrough |
|    |                    |               |        |                    |             | input_type: default              |
| 1  | data_subtract_mean | subtract_mean | data   | data_subtract_mean | 1x256x256x3 |                                  |


# Multiple Preprocessing Operations Example

Image preprocessing operations can be chained in the order described above.

The following prototxt and converter options describe a network where the following preprocessing operations occur in order:

1. An NV21 image of size 800x600 is converted to a BGR image of size 800x600
2. The 800x600 BGR image is scaled down to a 227x227 BGR image
3. Constant channel values are subtracted from the 227x227 BGR image. A value of 104 is subtracted from the blue channel, 117 from the green channel, and 123 from the red channel.

layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } }
  transform_param {
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
}

snpe-caffe-to-dlc -c net.prototxt --encoding nv21 --input_size 800 600 -d net.dlc
snpe-dlc-info -i net.dlc


The output from snpe-dlc-info verifies the details of these chained preprocessing operations:

--------------------------------------------------------------------------------------------------------------------------
| Id | Name               | Type          | Inputs     | Outputs            | Out Dims    | Parameters                   |
--------------------------------------------------------------------------------------------------------------------------
| 0  | data               | data          | data       | data               | 1x600x800x3 | input_encoding_in: nv21      |
|    |                    |               |            |                    |             | input_encoding_out: bgr      |
|    |                    |               |            |                    |             | input_type: image            |
| 1  | data_scale         | scaling       | data       | data_scale         | 1x227x227x3 | pad_value: 0                 |
|    |                    |               |            |                    |             | maintain_aspect_ratio: 0     |
|    |                    |               |            |                    |             | input_dim: [1, 600, 800, 3]  |
|    |                    |               |            |                    |             | output_dim: [1, 227, 227, 3] |
| 2  | data_subtract_mean | subtract_mean | data_scale | data_subtract_mean | 1x227x227x3 |                              |
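
The data layer reports 1x600x800x3 for --input_size 800 600, which suggests the option is read as width then height. Under that assumption, and assuming the standard NV21 layout of 12 bits per pixel, the following Python sketch traces the buffer size and layer shapes for this chain. The helper names are hypothetical and the sketch only mirrors the table above; it does not call any SNPE tool.

```python
def nv21_buffer_size(width, height):
    """Bytes in an NV21 frame: a full Y plane plus a half-size VU plane."""
    return width * height * 3 // 2


def trace_shapes(input_width, input_height, net_height, net_width):
    """List (layer name, NHWC dims) for the chain in this example."""
    source = (1, input_height, input_width, 3)
    target = (1, net_height, net_width, 3)
    return [
        ("data", source),                # NV21 -> BGR keeps the spatial size
        ("data_scale", target),          # scaled to the network input size
        ("data_subtract_mean", target),  # mean subtraction keeps the size
    ]


print(nv21_buffer_size(800, 600))  # 720000
for name, dims in trace_shapes(800, 600, 227, 227):
    print(name, "x".join(str(d) for d in dims))
```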