Snapdragon Neural Processing Engine SDK
Reference Guide
User-Defined Layers (UDL) Tutorial


Prerequisites

  • The SNPE SDK has been set up following the SNPE Setup chapter.
  • The Tutorials Setup has been completed.
  • Caffe cloned and built from source. Follow the instructions at Caffe & Caffe2 Setup to fetch and compile Caffe from source.


This tutorial goes through the complete user-defined layer workflow from training a model with a newly-defined layer to implementing the custom layer with SNPE's UDL APIs.


Note: SNPE currently supports user-defined layers only in Caffe.

UDL Workflow

The following figure shows an overview of the modifications needed to support user-defined layers that are unknown to the SNPE runtime and converters.

There are two main pieces that need to be implemented to add support for user-defined layers:

  • Extend the model converter to handle the weights and parameters for the layer
  • Implement the code needed to execute the layer with SNPE runtime

UDL workflow

Basing a UDL on the Scale Layer in Caffe

For this tutorial, the existing Scale layer implementation in Caffe will be used as the basis. The ScaleParameter prototxt definition will be re-used to carry the parameters for the custom layer.

  1. Create copies of the Scale layer code

    cp $CAFFE_HOME/include/caffe/layers/scale_layer.hpp $CAFFE_HOME/include/caffe/layers/mycustomscale_layer.hpp
    cp $CAFFE_HOME/src/caffe/layers/scale_layer.cpp $CAFFE_HOME/src/caffe/layers/mycustomscale_layer.cpp
    cp $CAFFE_HOME/src/caffe/layers/ $CAFFE_HOME/src/caffe/layers/

  2. Edit $CAFFE_HOME/include/caffe/layers/mycustomscale_layer.hpp with the following modifications:

    • Rename the class ScaleLayer to MyCustomScaleLayer
    • Rename the class constructor
    • Edit the type() method to return "MyCustomScale" instead of "Scale"

  3. Edit $CAFFE_HOME/src/caffe/layers/mycustomscale_layer.cpp with the following modifications:

    • Change the #include directive to include mycustomscale_layer.hpp instead of scale_layer.hpp
    • Change all the method definitions to be of class MyCustomScaleLayer instead of ScaleLayer
    • Change the STUB_GPU and INSTANTIATE_CLASS macros to pass MyCustomScaleLayer instead of ScaleLayer
    • Change the REGISTER_LAYER_CLASS macro to pass MyCustomScale instead of Scale

  4. Edit $CAFFE_HOME/src/caffe/layers/ with the following modifications:

    • Change the #include directive to include mycustomscale_layer.hpp instead of scale_layer.hpp
    • Change all the method definitions to be of class MyCustomScaleLayer instead of ScaleLayer
    • Change the INSTANTIATE_LAYER_GPU_FUNCS macro to pass MyCustomScaleLayer instead of ScaleLayer

  5. Rebuild Caffe with the new layer:
    cd $CAFFE_HOME
    make all
    make distribute
    make pycaffe

Training MNIST with Modified LeNet Model that Includes MyCustomScale Layer

Next, modify the MNIST LeNet model to incorporate the MyCustomScale layer and train it to generate a caffemodel file.

  1. Follow the LeNet on MNIST Caffe tutorial to train the unmodified model. This step is also required to fetch the MNIST training and testing datasets.

  2. Create copies of the LeNet model and training scripts

    cd $CAFFE_HOME/examples/mnist
    cp lenet.prototxt mycustomlenet.prototxt
    cp lenet_train_test.prototxt mycustomlenet_train_test.prototxt
    cp lenet_solver.prototxt mycustomlenet_solver.prototxt

  3. Edit mycustomlenet.prototxt with the following modifications:

    • Modify the batch dimension of data layer to 1. The modified definition should resemble:
      layer {
        name: "data"
        type: "Input"
        top: "data"
        input_param { shape: { dim: 1 dim: 1 dim: 28 dim: 28 } }
      }
    • Insert MyCustomScale layer after relu1. The layer definition looks as follows:
      layer {
        bottom: "ip1"
        top: "scale"
        name: "scale"
        type: "MyCustomScale"
        scale_param {
          bias_term: false
        }
      }
    • Modify the bottom parameter in the following ip2 layer from ip1 to scale. The modified definition should resemble:
      layer {
        name: "ip2"
        type: "InnerProduct"
        bottom: "scale"
        top: "ip2"
        ...
      }

  4. Edit mycustomlenet_train_test.prototxt with the following modifications:

    • Insert MyCustomScale layer after relu1. For the initialization of scale weights, use a filler of zero weights. This ensures that the final trained model is highly sensitive to the scale layer weights. The layer definition should look as follows:
      layer {
        bottom: "ip1"
        top: "scale"
        name: "scale"
        type: "MyCustomScale"
        scale_param {
          bias_term: false
          filler: { value: 0 }
        }
      }
      Note that scale_param is re-used to carry MyCustomScale's parameters.
    • Modify the bottom parameter in the following ip2 layer from ip1 to scale. The modified definition should resemble:
      layer {
        name: "ip2"
        type: "InnerProduct"
        bottom: "scale"
        top: "ip2"
        ...
      }

  5. Edit mycustomlenet_solver.prototxt with the following modifications:

    • Modify the 'net' parameter to point to mycustomlenet_train_test.prototxt
    • Modify the 'snapshot_prefix' parameter to be "examples/mnist/mycustomlenet"

  6. Edit the training script to take in the mycustomlenet_solver.prototxt solver file.

  7. Finally, train the modified model.
    cd $CAFFE_HOME
    The training will generate the weights in the mycustomlenet_iter_10000.caffemodel file. This, in combination with the mycustomlenet.prototxt model definition file, will be used to generate the SNPE DLC model.

Model Conversion Tool Extension

Caffe models are converted to DLC models using the snpe-caffe-to-dlc executable. Support for user-defined layers is added by passing the --udl parameter with the filename and factory function of your UDL Python module.

To convert the Caffe model generated earlier with the MyCustomScale layer, run the following command:
Note: Follow setup instructions for setting SNPE_ROOT with CAFFE before running the following.

export PYTHONPATH=$PYTHONPATH:$SNPE_ROOT/examples/Python/UdlExample
python snpe-caffe-to-dlc \
    --input_network $CAFFE_HOME/examples/mnist/mycustomlenet.prototxt \
    --caffe_bin $CAFFE_HOME/examples/mnist/mycustomlenet_iter_10000.caffemodel \
    --udl my_udl_layers,udl_factory_func \
    --output_path mycustomlenet.dlc

View the model details with snpe-dlc-info.

snpe-dlc-info -i mycustomlenet.dlc

Note the details of the user-defined layer.

| Id | Name  | Type          | Input     | Output    | Out Dims | Parameters                       |
|----|-------|---------------|-----------|-----------|----------|----------------------------------|
| 5  | ip1   | fc            | pool2     | ip1       | 1x500    | param count: 400k (92.9%)        |
|    |       |               |           |           |          | MACs: 400k (17.4%)               |
| 6  | relu1 | neuron        | ip1       | relu1.ip1 | 1x500    | a: 0                             |
|    |       |               |           |           |          | b: 0                             |
|    |       |               |           |           |          | min_clamp: 0                     |
|    |       |               |           |           |          | max_clamp: 0                     |
|    |       |               |           |           |          | func: relu                       |
| 7  | scale | user_defined  | relu1.ip1 | scale     | 1x500    | blob_size: 2017                  |
| 8  | ip2   | fc            | scale     | ip2       | 1x10     | param count: 5k (1.16%)          |
|    |       |               |           |           |          | MACs: 5k (0.218%)                |
| 9  | prob  | softmax       | ip2       | prob      | 1x10     |                                  |

How the --udl parameter works with snpe-caffe-to-dlc to support MyCustomScale

Examine the example code under $SNPE_ROOT/examples/Python/UdlExample/

  1. udl_factory_func: the --udl parameter expects users to pass Filename,Function name.

    • Filename: Name of the Python module to load for registering the custom UDLs (note: it must be in PYTHONPATH). If the file is part of a package, use package.filename as you would in a Python import.
    • Function name: Name of the UDL factory function that returns a dictionary with layer_type as key and an object of the UDL class as value. After instantiating the converter, this dictionary is used to set the UDL mapping from layer types to their handlers.

    For our example, the Filename is my_udl_layers and Function name is udl_factory_func.

  2. The UDL handler map (udl_supported_types) is a dictionary where the key is the layer type and the value is the handler object for that layer.

    # UDL layer name to UDL class map
    udl_supported_types = {
        'MyCustomScale': udl_mycustomscale
    }

  3. The UDL handler instance (udl_mycustomscale) needs a callback function and the expected input/output axes order of your custom layer implementation.

    # Instance of Udl class for mycustomscale layer
    udl_mycustomscale = snpe_udl_utils.Udl(
        layer_callback=udl_mycustomscale_func,
        expected_axes_orders=[
            # First supported input/output axes order (4D: NSC input, NSC output)
            ( # input dims
              [AxisTracker.AxisAnnotations.BATCH, AxisTracker.AxisAnnotations.HEIGHT,
               AxisTracker.AxisAnnotations.WIDTH, AxisTracker.AxisAnnotations.CHANNEL],
              # output dims
              [AxisTracker.AxisAnnotations.BATCH, AxisTracker.AxisAnnotations.HEIGHT,
               AxisTracker.AxisAnnotations.WIDTH, AxisTracker.AxisAnnotations.CHANNEL]
            ),
            # Second supported input/output axes order (3D: NS input, NS output)
            ( # input dims
              [AxisTracker.AxisAnnotations.BATCH, AxisTracker.AxisAnnotations.HEIGHT,
               AxisTracker.AxisAnnotations.WIDTH],
              # output dims
              [AxisTracker.AxisAnnotations.BATCH, AxisTracker.AxisAnnotations.HEIGHT,
               AxisTracker.AxisAnnotations.WIDTH]
            ),
            # Third supported input/output axes order (2D: NF input, NF output)
            ( # input dims
              [AxisTracker.AxisAnnotations.BATCH, AxisTracker.AxisAnnotations.FEATURE],
              # output dims
              [AxisTracker.AxisAnnotations.BATCH, AxisTracker.AxisAnnotations.FEATURE]
            )
        ])

  4. # Conversion callback function for MyCustomScale
    def udl_mycustomscale_func(layer, input_dims):
        blob = UdlBlobMyCustomScale(layer)
        return snpe_udl_utils.UdlBlobOutput(blob=blob, out_dims=input_dims)

    The handler callback function for a custom layer receives the following parameters:

    • layer: The prototxt definition of the Caffe layer (does not include weights and biases)
    • input_dims: Dimensions of the inputs to the custom layer

    The handler function returns:

    • blob: A wrapper object that wraps all the layer weights and params in a single packed buffer
    • out_dims: Dimensions of the outputs of the custom layer

    For MyCustomScale layer, the output dimensions are the same as the input dimensions.

  5. The UdlBlobMyCustomScale is a helper class used by udl_mycustomscale_func to generate a packed buffer that contains all the layer params and weights. UdlBlobMyCustomScale delegates the packing to a nested class.

    # UdlBlobMyCustomScale.__init__()
    snpe_params = UdlBlobMyCustomScale.MyCustomScaleLayerParam()

    The remainder of the UdlBlobMyCustomScale class's __init__ method fills out the fields that need to be packed. For MyCustomScale, the only layer parameter is bias_term. Note that the handler function did not receive any layer weights; they come from the converter through weight_provider. The blob2arr() utility method provided by SNPE converts the Caffe blob to a linear array of floats.

    caffe_weights = snpeUtils.blob2arr(weight_provider.weights_map[layer.name][0])

    Finally, call serialize() to generate the packed data.

    self._blob = snpe_params.serialize()
    self._size = len(self._blob)

  6. The MyCustomScaleLayerParam class, defined as a nested class inside UdlBlobMyCustomScale, packs all the layer data into a single buffer. Its serialize() method defines the format in which the native code must load the layer data.

    # MyCustomScaleLayerParam.serialize()
    packed = struct.pack('i', self.type)

    The first integer indicates the layer type.

    packed += struct.pack('?', self.bias_term)

    Next, the bias_term parameter is packed as a boolean.

    packed += struct.pack('I%sI' % len(self.weights_dim),
                          len(self.weights_dim), *self.weights_dim)

    Next comes the array containing the dimensions of the weights. This is done by first packing an integer marking the number of elements in the array, followed by the array values, which are also integers.

    packed += struct.pack('I%sf' % len(self.weights_data),
                          len(self.weights_data), *self.weights_data)

    Finally, the array containing the weight values: again an integer count first, followed by the array values, which are floats.
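Putting the four packing steps together, the buffer layout can be checked end-to-end in standalone Python. With a single weight dimension and 500 weights (the values below are dummies; only the struct format strings come from serialize()), the size works out to 4 + 1 + (4 + 4) + (4 + 500*4) = 2017 bytes, matching the blob_size that snpe-dlc-info reported for the scale layer earlier.

```python
import struct

# Stand-ins for the MyCustomScaleLayerParam fields; the values are
# hypothetical, the format strings are the ones used in serialize()
layer_type = 0                 # enum value shared with the native parser
bias_term = False              # the only scale_param carried by MyCustomScale
weights_dim = [500]            # the weight blob has a single dimension of 500
weights_data = [0.0] * 500     # dummy weight values

packed = struct.pack('i', layer_type)              # 4 bytes: layer type
packed += struct.pack('?', bias_term)              # 1 byte: bias_term flag
packed += struct.pack('I%sI' % len(weights_dim),   # count + dims (uints)
                      len(weights_dim), *weights_dim)
packed += struct.pack('I%sf' % len(weights_data),  # count + weights (floats)
                      len(weights_data), *weights_data)

print(len(packed))  # 2017
```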

Changes for SNPE Runtime

This section outlines the code changes necessary to implement a user-defined layer called by SNPE runtime.

A native example that implements the MyCustomScale layer is located at $SNPE_ROOT/examples/NativeCpp/UdlExample. This example extends the native BatchRun utility to support MyCustomScale layer via SNPE's UDL APIs.


// Specify UDL bundle prior to creating a SNPE object
zdl::DlSystem::UDLBundle udlBundle;
udlBundle.cookie = (void*)0xdeadbeaf;
udlBundle.func = udlexample::MyUDLFactory;
zdl::DlSystem::RuntimeList runtimeList;
snpe = snpeBuilder.setOutputLayers({})
                  .setRuntimeProcessorOrder(runtimeList)
                  .setUdlBundle(udlBundle)
                  .build();

The setUdlBundle() method in SNPEBuilder is passed a UDLBundle object. This bundle object contains two fields:

  • cookie: Pointer to an opaque data type. This pointer is returned to the user during layer construction and execution.
  • func: User-implemented factory method to handle the construction of user-defined layers.
// Function signature of the UDL Factory function
zdl::DlSystem::IUDL* MyUDLFactory(void* cookie, const zdl::DlSystem::UDLContext* c)

The UDL factory method receives the opaque data passed in earlier (cookie) and a UDLContext object. The UDLContext object contains all the available information about the custom layer.

// MyUDLFactory()
const void* blob = c->getBlob();
size_t size = c->getSize();
myudl::CommonLayerParams params;
if (!myudl::ParseCommonLayerParams(blob, size, params)) {
    PrintErrorStringAndExit("Failed to parse common layer params");
}
The UDL factory method parses the layer data to retrieve the layer type. Once the layer type is known the method instantiates and returns a pointer to the layer object.

The function for parsing the layer type from the packed layer data is defined in MyUdlLayers.hpp. It reads the first integer from the buffer, which denotes the layer type.

// MyUDLFactory()
switch (params.type) {
    case myudl::MY_CUSTOM_SCALE_LAYER:
        return new myudl::UdlMyCustomScale(*c);
    default:
        PrintErrorStringAndExit("Unknown layer type");
}

Finally the factory method instantiates the requested layer.


A UDL implementation must extend the interface zdl::DlSystem::IUDL. This interface defines the following methods, which must be implemented by the user:

  • Setup: Gives the layer an opportunity to setup any internal state.
  • Close: The layer should release any handles and deallocate any resources.
  • Execute: A single forward pass of the layer. The layer should compute the outputs given the inputs.

Further, in this example a private method ParseMyCustomLayerParams() is used to handle the parsing of the packed data from the DLC model.


The UdlMyCustomScale::ParseMyCustomLayerParams() method handles the parsing of the packed layer data to initialize the parameters and weights for the MyCustomScale layer. The unpacking is performed in the same order as the packing in the converter's serialize() method.
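That order can be mirrored in a short standalone Python sketch of the unpacking; the native code does the equivalent with pointer arithmetic, and the parse_params() helper below is illustrative, not part of the SDK.

```python
import struct

def parse_params(blob):
    """Unpack type, bias_term, weight dims and weight values in the
    exact order the converter's serialize() packed them."""
    offset = 0
    (layer_type,) = struct.unpack_from('i', blob, offset)
    offset += struct.calcsize('i')
    (bias_term,) = struct.unpack_from('?', blob, offset)
    offset += struct.calcsize('?')
    (ndims,) = struct.unpack_from('I', blob, offset)
    offset += struct.calcsize('I')
    dims = list(struct.unpack_from('%sI' % ndims, blob, offset))
    offset += struct.calcsize('%sI' % ndims)
    (nweights,) = struct.unpack_from('I', blob, offset)
    offset += struct.calcsize('I')
    weights = list(struct.unpack_from('%sf' % nweights, blob, offset))
    return layer_type, bias_term, dims, weights

# Round-trip a small hypothetical blob (one dimension, three weights)
blob = struct.pack('i', 0) + struct.pack('?', False)
blob += struct.pack('I1I', 1, 3)
blob += struct.pack('I3f', 3, 0.5, 1.0, 2.0)
print(parse_params(blob))  # (0, False, [3], [0.5, 1.0, 2.0])
```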

The UdlMyCustomScale::Setup() method performs layer initialization. It receives the following parameters:

  • size_t insz: Number of inputs
  • const size_t* indim[]: Dimensions for each of the inputs. There are insz number of pointers, where each points to an array containing the input dimensions for the corresponding input
  • const size_t indimsz[]: The size of each array in indim[]
  • size_t outsz: Number of outputs
  • const size_t* outdim[]: Dimensions for each of the outputs. There are outsz number of pointers, where each points to an array containing the output dimensions for the corresponding output
  • const size_t outdimsz[]: The size of each array in outdim[]
if (insz != 1 or outsz != 1) {
    return false;
}

It first checks the number of inputs and outputs. In this example a single-input single-output scale layer is implemented.

size_t inszdim = getSizeByDim(
    std::vector<size_t>(indim[0], indim[0] + indimsz[0]));
m_OutSzDim = getSizeByDim(
    std::vector<size_t>(outdim[0], outdim[0] + outdimsz[0]));
if (inszdim != m_OutSzDim) {
    return false;
}

Next, it verifies that the total size of the input matches the total size of the output, as the scale operation is one-to-one.
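getSizeByDim() is simply the product of a dimension vector. The size check can be expressed in Python terms as follows (get_size_by_dim() is an illustrative stand-in, not the SDK function):

```python
from functools import reduce
import operator

def get_size_by_dim(dims):
    # Total element count is the product of all dimensions
    return reduce(operator.mul, dims, 1)

# The scale layer in this tutorial sees a 1x500 input and a 1x500 output,
# so the input and output sizes match and Setup() proceeds
print(get_size_by_dim([1, 500]))  # 500
```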

if (!ParseMyCustomLayerParams(blob, m_Context.getSize(), m_Params)) {
    return false;
}

Then it calls the ParseMyCustomLayerParams() method to read the packed layer data and initialize the custom layer.

Finally, the method verifies that bias_term is set to false since this example only implements the multiplicative scale operation.

The UdlMyCustomScale::Execute() method implements the forward pass for the custom layer.

for (size_t i = 0; i < m_OutSzDim; i++) {
    output[0][i] = input[0][i] * m_Params.weights_data[i];
}

It scales each input value by the corresponding weight.
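In other words, the forward pass is an elementwise multiply of the flattened input buffer with the weight vector. A minimal Python sketch with hypothetical 3-element buffers (execute_scale() is illustrative, not the SDK API):

```python
def execute_scale(input_buf, weights):
    """Elementwise scale, mirroring the loop in UdlMyCustomScale::Execute():
    out[i] = in[i] * w[i]."""
    assert len(input_buf) == len(weights)
    return [x * w for x, w in zip(input_buf, weights)]

print(execute_scale([1.0, 2.0, 3.0], [0.5, 1.0, 2.0]))  # [0.5, 2.0, 6.0]
```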

Executing the Model

With all the runtime pieces in place, BatchRun can be built with support for MyCustomScale user-defined layer.

First, change to the UDL example base directory.

cd $SNPE_ROOT/examples/NativeCpp/UdlExample

Run the following command to compile for x86 Linux targets.

make -f Makefile.x86_64-linux-clang

To build with clang/libc++ SNPE binaries (i.e., arm-android-clang6.0 and aarch64-android-clang6.0), use the following command:

ndk-build NDK_TOOLCHAIN_VERSION=clang APP_STL=c++_shared

The UDL-enabled BatchRun can be run on a Linux Host with the UDL model generated earlier. Sample inputs of hand-written characters are provided in $SNPE_ROOT/models/mnist/data.

cd $SNPE_ROOT/models/mnist/data
$SNPE_ROOT/examples/NativeCpp/UdlExample/obj/local/x86_64-linux-clang/snpe-net-run-udl \
  --container $SNPE_ROOT/examples/Python/UdlExample/mycustomlenet.dlc \
  --input_list image_list.txt

The MyCustomScale layer's Setup() method should output information about the layer parameters and weights:

UdlMyCustomScale::Setup() of name scale
UdlMyCustomScale::Setup() input size dim: 500, output: 500
UdlMyCustomScale::Setup() got blob size 2017
UdlMyCustomScale::Setup() bias_term=0
UdlMyCustomScale::Setup() weight dimensions: (500,)
UdlMyCustomScale::Setup() # weights=500

The outputs can be verified with the following script:

python ../scripts/ output/Result_0/prob.raw
python ../scripts/ output/Result_1/prob.raw
python ../scripts/ output/Result_2/prob.raw
python ../scripts/ output/Result_3/prob.raw

The classification results should be 0, 3, 5 and 9, respectively.