Snapdragon Neural Processing Engine SDK
Reference Guide
Using MobilenetSSD

Tensorflow MobilenetSSD model
Caffe MobilenetSSD model

Tensorflow MobilenetSSD model

Tensorflow Mobilenet SSD frozen graphs come in two flavors: the standard frozen graph and a quantization aware frozen graph. The following example uses the quantization aware frozen graph to ensure accurate results on the SNPE runtimes.

Prerequisites

The quantization aware model conversion process was tested using Tensorflow v1.11; however, other versions may also work. The CPU version of Tensorflow was used to avoid out-of-memory issues observed across various GPU cards during conversion.
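
For example, the CPU-only Tensorflow package can be installed into a Python environment as follows (a minimal sketch; the exact pip invocation and version pin depend on your environment):

# Install the CPU-only Tensorflow build used for testing; other 1.x versions may also work
pip install tensorflow==1.11.0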

Setup the Tensorflow Object Detection Framework

The quantization aware model is provided as a TFLite frozen graph. However, SNPE requires a Tensorflow frozen graph (.pb). To convert the quantized model, the Tensorflow Object Detection framework is used to export a Tensorflow frozen graph.

Follow the Tensorflow Object Detection framework installation instructions to clone and set up the framework.
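
As a rough outline, the standard installation steps look like the following (a sketch only; it assumes git, protoc and pip are available, and <path_to> is a placeholder for your chosen clone location):

# Clone the Tensorflow models repository, which contains the object detection framework
cd <path_to>
git clone https://github.com/tensorflow/models.git
cd models/research
# Compile the object detection protocol buffers
protoc object_detection/protos/*.proto --python_out=.
# Make the object detection framework and slim importable by the export script
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim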

Download the quantization aware model

A specific version of the Tensorflow MobilenetSSD model has been tested: ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03.tar.gz

wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03.tar.gz

After downloading the model, extract the contents to a directory.

tar xzvf ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03.tar.gz
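
The extracted directory should contain, among other files, the pipeline configuration and trained checkpoint referenced in the export step below (file names reflect this Tensorflow model zoo release and may differ for other releases):

ls ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03
# Expected contents include pipeline.config, model.ckpt.data-00000-of-00001,
# model.ckpt.index, model.ckpt.meta and tflite_graph.pb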

Export a trained graph from the object detection framework

Follow the object detection framework's instructions to export the Tensorflow graph, or modify and execute the sample script below.

Create this file, export_train.sh, using your favorite editor. Modify the paths to point to the directory containing the downloaded quantization aware model files.

#!/bin/bash
INPUT_TYPE=image_tensor
PIPELINE_CONFIG_PATH=<path_to>/ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03/pipeline.config
TRAINED_CKPT_PREFIX=<path_to>/ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03/model.ckpt
EXPORT_DIR=<path_to>/exported
pushd <path_to>/models/research
python object_detection/export_inference_graph.py \
--input_type=${INPUT_TYPE} \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--trained_checkpoint_prefix=${TRAINED_CKPT_PREFIX} \
--output_directory=${EXPORT_DIR}
popd

Make the script executable

  • chmod u+x export_train.sh

Run the script

  • ./export_train.sh

This should generate a frozen graph at <path_to>/exported/frozen_inference_graph.pb.

Convert the frozen graph using the snpe-tensorflow-to-dlc converter.

snpe-tensorflow-to-dlc --input_network <path_to>/exported/frozen_inference_graph.pb --input_dim Preprocessor/sub 1,300,300,3 --out_node detection_classes --out_node detection_boxes --out_node detection_scores --output_path mobilenet_ssd.dlc --allow_unconsumed_nodes

After SNPE conversion, you should have a mobilenet_ssd.dlc that can be loaded and run in the SNPE runtimes.

The output layers for the model are:

  • Postprocessor/BatchMultiClassNonMaxSuppression
  • add

The output buffer names are:

  • (classes) detection_classes:0 (+1 index offset)
  • (classes) Postprocessor/BatchMultiClassNonMaxSuppression_classes (0 index offset)
  • (boxes) Postprocessor/BatchMultiClassNonMaxSuppression_boxes
  • (scores) Postprocessor/BatchMultiClassNonMaxSuppression_scores
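
As a quick check, the converted DLC can be exercised with the snpe-net-run tool (a minimal sketch; raw_list.txt and the preprocessed 300x300x3 raw input it lists are hypothetical and must be prepared separately):

# Run the converted model on the default CPU runtime; output buffers matching the
# names above are written as raw files under the output directory
snpe-net-run --container mobilenet_ssd.dlc --input_list raw_list.txt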


Caffe MobilenetSSD model

A specific version of the Caffe MobilenetSSD model has been tested: Caffe MobilenetSSD.

There is a pre-trained Caffe model you can download from Caffe MobilenetSSD.
Download the following two files:

wget https://github.com/chuanqi305/MobileNet-SSD/blob/master/MobileNetSSD_deploy.prototxt
wget https://github.com/chuanqi305/MobileNet-SSD/blob/master/MobileNetSSD_deploy.caffemodel


Convert the model using the snpe-caffe-to-dlc converter.

snpe-caffe-to-dlc --input_network MobileNetSSD_deploy.prototxt --caffe_bin MobileNetSSD_deploy.caffemodel --output_path caffe_mobilenet_ssd.dlc

The input and output layers:

The input layer is specified in the MobileNetSSD_deploy.prototxt file, via input_shape.
By default, the output layer is the last layer specified in the MobileNetSSD_deploy.prototxt file. In this case, that is the detection_out (DetectionOutput) layer.
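
For example, the input definition can be inspected directly in the deploy prototxt (a sketch; the expected shape for this model is 1x3x300x300):

# Print the input definition from the deploy prototxt
grep -A 5 "input_shape" MobileNetSSD_deploy.prototxt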


To see information about the converted DLC model, use the snpe-dlc-info tool:

snpe-dlc-info -i caffe_mobilenet_ssd.dlc

The PriorBox layer is folded by the converter for model/performance optimization reasons. Consequently, the PriorBox layer is not written into the DLC file and will not be listed in the DLC info for the model.

Training the model

To train the model in Caffe, follow the instructions at Caffe MobilenetSSD.

Running the model in SNPE

The following are limitations and suggestions for running the DLC model in SNPE:

  • Batch dimension > 1 is not supported.
  • The DetectionOutput layer is supported on the CPU runtime processor only.
    To run the model using a different runtime processor, such as GPU or DSP, CPU fallback mode must be enabled in the Runtime List (see the SNPEBuilder::setRuntimeProcessorOrder() description in the C++ API).
    If using the snpe-net-run tool, use the --runtime_order option (see the example after this list).
  • It is recommended to have all DetectionOutput layers in the network listed at the end of the .prototxt file.
    This minimizes the runtime overhead incurred by CPU fallback.
    The Caffe MobilenetSSD .prototxt has the DetectionOutput layer at the end by default, but if the network has more than one detection output branch, that may not be the case.
    Simply edit the .prototxt file, locate all DetectionOutput layer definitions, and move them to the end of the file.
  • Configure the DetectionOutput layer reasonably.
    The performance of the DetectionOutput layer (i.e. its processing time) is a function of the layer parameters top_k, keep_top_k and confidence_threshold.
    For example, the top_k parameter has a practically exponential impact on processing time; e.g. top_k=100 results in a much smaller processing time than top_k=1000. A smaller confidence_threshold results in a larger number of output boxes, and vice versa.
  • Resizing input dimensions at SNPE object creation/build time is not allowed.
    Note that input dimensions are embedded into the DLC model during conversion and, in some cases, can be overridden via SNPEBuilder::setInputDimensions() (see the description in the C++ API) at SNPE object creation/build time. However, due to PriorBox layer folding in the model converter, input/network resizing is not possible for this model.
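
For example, to run the Caffe model on the DSP with CPU fallback enabled using snpe-net-run (a sketch; runtimes are passed as a comma-separated list and raw_list.txt is a hypothetical input list):

# Run on the DSP, falling back to the CPU for layers such as DetectionOutput
snpe-net-run --container caffe_mobilenet_ssd.dlc --input_list raw_list.txt --runtime_order dsp,cpu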