Working with Machine Learning Models in the Qualcomm Neural Processing SDK for AI

Tips for developers working on machine learning apps on Android

1. Model training and conversion

Machine learning frameworks have specific formats for storing neural network models. The Qualcomm® Neural Processing SDK includes tools for converting pre-trained models to the Deep Learning Container (DLC) format. The Qualcomm® Neural Processing Engine (NPE) runtime then uses the .dlc file to execute the neural network.

Network details like the input layer name, output layer name and input shape are required before converting a model. The SDK includes tools for retrieving those details, which the application needs in order to load and run the network.

The SDK also includes tools, such as snpe-tensorflow-to-dlc and snpe-onnx-to-dlc, for converting models from the TensorFlow and ONNX frameworks to .dlc format:
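For illustration, converting a frozen TensorFlow graph or an ONNX model might look like the following. The flag names reflect recent SDK releases and can vary between versions, so check each tool's --help output before relying on them:

# Illustrative invocations; flag names vary by SDK release (see --help).
snpe-tensorflow-to-dlc --input_network inception_v3.pb \
                       --input_dim input "1,299,299,3" \
                       --out_node "InceptionV3/Predictions/Reshape_1" \
                       --output_path inception_v3.dlc

snpe-onnx-to-dlc --input_network model.onnx \
                 --output_path model.dlc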

Once the model conversion is done, the next step is to note the network's input shape and its outputs so that the Android application can feed the network data and interpret its results.

2. Quantizing a model

By default, the conversion tools in the Qualcomm Neural Processing SDK convert non-quantized models into a non-quantized .dlc file. All network parameters remain in the 32-bit floating-point representation used by the original model.

For converted models (.dlc files) that are too large, the SDK includes a quantization tool, snpe-dlc-quantize, which optimizes the model to an 8-bit fixed-point representation with minimal loss of accuracy.
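A typical invocation looks like the following sketch; inputs.txt is a text file listing paths to representative raw input samples, which the tool runs through the network to calibrate quantization ranges (flag names may vary by SDK release):

snpe-dlc-quantize --input_dlc inception_v3.dlc \
                  --input_list inputs.txt \
                  --output_dlc inception_v3_quantized.dlc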

For information on when to use a quantized model, see Quantized vs Non-Quantized Models in the SDK documentation.

3. Setting up the runtime environment

It is necessary to set up the runtime hardware (core) on which the converted model will run.

The Snapdragon SoC consists of the CPU, the Qualcomm® Adreno™ GPU and the Qualcomm® Hexagon™ DSP. The choice of cores allows for faster processing and, therefore, faster prediction. The Adreno GPU and Hexagon DSP are designed to accelerate on-device inference, but depending on the neural network, some layers may not be supported in those runtime environments.

By examining the layers and design of the neural network, the Neural Processing API determines which runtime environments (CPU, GPU, DSP) are supported.
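As a sketch, availability can be probed before building the network, falling back to the CPU, which is always supported. The isRuntimeSupported call below is assumed from recent SDK releases; verify the method name against the API reference for your SDK version:

// Probe which runtimes are available on this device before choosing a
// runtime order. isRuntimeSupported is an assumption here; confirm it
// against your SDK version's API reference.
final SNPE.NeuralNetworkBuilder builder = new SNPE.NeuralNetworkBuilder(mApplicationContext);
NeuralNetwork.Runtime runtime = NeuralNetwork.Runtime.CPU; // CPU always works
if (builder.isRuntimeSupported(NeuralNetwork.Runtime.DSP)) {
    runtime = NeuralNetwork.Runtime.DSP;
} else if (builder.isRuntimeSupported(NeuralNetwork.Runtime.GPU)) {
    runtime = NeuralNetwork.Runtime.GPU;
}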

4. Loading the model

Once the runtime environment is set, the next step is to load the model (converted into a .dlc file) into it.

The following code sets up the environment and loads the model through the API:

final SNPE.NeuralNetworkBuilder builder = new SNPE.NeuralNetworkBuilder(mApplicationContext)
        // Allows selecting a runtime order for the network.
        // In the example below, use the DSP and fall back, in order, to the GPU
        // then the CPU, depending on which of the runtimes are available.
        .setRuntimeOrder(DSP, GPU, CPU)
        // Load the model from a DLC file
        .setModel(new File("<model-path>"));

// Build the network
network = builder.build();

5. Processing input frames for real-time prediction

In the following example of a mobile app, the device processes camera frames continuously using the Camera2 API:

private class CameraSession extends android.hardware.camera2.CameraCaptureSession.CaptureCallback {
    @Override
    public void onCaptureCompleted(@NonNull CameraCaptureSession session,
                                   @NonNull CaptureRequest request,
                                   @NonNull TotalCaptureResult result) {
        super.onCaptureCompleted(session, request, result);
        // Get the bitmap, scaled to 299x299
        Bitmap mBitmap = mTextureView.getBitmap(299, 299);
        // Compress it into JPEG format
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        mBitmap.compress(Bitmap.CompressFormat.JPEG, 50, stream);
        // Convert the compressed image into a byte array
        byte[] byteArray = stream.toByteArray();
        Bitmap compressedBitmap = BitmapFactory.decodeByteArray(byteArray, 0, byteArray.length);
    }
}

In an image-based deep learning mobile application, getting the camera frames is not the only task. The most important task is to convert each frame into the input shape the model expects. For example, the Inception_v3 model requires an input with the shape [1, 299, 299, 3]. A frame that does not already match those spatial dimensions can be rescaled first, as sketched below.
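A minimal sketch of that rescaling with standard Android APIs, where frameBitmap stands for the bitmap captured from the camera preview:

// Scale an arbitrary camera frame to the 299x299 spatial size that
// Inception_v3 expects. The batch and channel dimensions are handled
// later, when the pixel values are written into the input tensor.
Bitmap scaled = Bitmap.createScaledBitmap(frameBitmap, 299, 299, true);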

6. Image classification using input frames and model object

Before the bitmap is handed off for prediction, it must undergo basic image processing, such as conversion to RGB or grayscale, depending on the input shape the model requires. Examples (a grayscale-conversion sketch follows the list):

  • Inception network input image size: 299x299x3 (3 channel image input)
  • MobileNet input image size: 224x224x3 (3 channel image input)
  • VGG16/19 input image size: 224x224x3 (3 channel image input)
  • fer2013 network input: 48x48x1 (1 channel image input)
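For the single-channel case, a minimal grayscale conversion could look like the following sketch; toGrayscale is an illustrative helper, not part of the SDK:

// Collapse an ARGB pixel array to one luminance value per pixel for a
// 1-channel model such as fer2013, using ITU-R BT.601 weights.
private float[] toGrayscale(int[] pixels) {
    final float[] gray = new float[pixels.length];
    for (int i = 0; i < pixels.length; i++) {
        final int r = (pixels[i] >> 16) & 0xFF;
        final int g = (pixels[i] >> 8) & 0xFF;
        final int b = pixels[i] & 0xFF;
        gray[i] = (0.299f * r + 0.587f * g + 0.114f * b) / 255f;
    }
    return gray;
}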

Next, the processed image must be converted into a tensor. The prediction API requires input tensors of type float.
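A minimal sketch of that conversion for a three-channel 299x299 input follows. The createFloatTensor and FloatTensor.write calls come from the SDK's Java API; the mean/scale normalization shown is only illustrative and should match whatever preprocessing the model was trained with:

// Convert a 299x299 ARGB bitmap into a [1, 299, 299, 3] float tensor.
// The (value - 128) / 128 normalization is Inception-style and is only
// an example; use the preprocessing your model was trained with.
private FloatTensor bitmapToTensor(NeuralNetwork network, Bitmap bitmap) {
    final int w = bitmap.getWidth();
    final int h = bitmap.getHeight();
    final int[] pixels = new int[w * h];
    bitmap.getPixels(pixels, 0, w, 0, 0, w, h);

    final float[] values = new float[w * h * 3];
    for (int i = 0; i < pixels.length; i++) {
        values[i * 3]     = (((pixels[i] >> 16) & 0xFF) - 128f) / 128f; // R
        values[i * 3 + 1] = (((pixels[i] >> 8) & 0xFF) - 128f) / 128f;  // G
        values[i * 3 + 2] = ((pixels[i] & 0xFF) - 128f) / 128f;         // B
    }

    final FloatTensor tensor = network.createFloatTensor(1, h, w, 3);
    tensor.write(values, 0, values.length);
    return tensor;
}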

After all the processing is complete, the tensor goes to the neural network API for prediction, as shown in the following code:

// mNeuralNetwork is an instance of the NeuralNetwork class.
final Map<String, FloatTensor> outputs = mNeuralNetwork.execute(inputs);

where mNeuralNetwork is an instance of the NeuralNetwork class, and inputs is a map from input layer names to tensors.
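Putting the pieces together, an end-to-end call could look like the following sketch. The input layer name "input" is a placeholder for the name reported by your converted model, and bitmapToTensor is the illustrative helper sketched earlier:

// Build the inputs map, run inference and read back the results.
final Map<String, FloatTensor> inputs = new HashMap<>();
inputs.put("input", bitmapToTensor(mNeuralNetwork, compressedBitmap));

final Map<String, FloatTensor> outputs = mNeuralNetwork.execute(inputs);
for (Map.Entry<String, FloatTensor> output : outputs.entrySet()) {
    final FloatTensor tensor = output.getValue();
    final float[] values = new float[tensor.getSize()];
    tensor.read(values, 0, values.length);
    // values now holds this output layer's scores, e.g. class probabilities.
}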

References

In the snpe-sdk/examples/android/ directory of the Qualcomm Neural Processing SDK, refer to the demo Android application for image classification based on static images.

Qualcomm Neural Processing Engine, Qualcomm Adreno, Qualcomm Hexagon and Qualcomm Neural Processing SDK are products of Qualcomm Technologies, Inc. and/or its subsidiaries.