DeepLab-v3 using Qualcomm Neural Processing SDK for AI on Android

Running a DeepLab model for image segmentation on the mobile device

This article describes an Android application based on the machine learning capabilities of the Qualcomm® Neural Processing SDK for AI, deep learning software for Snapdragon® mobile platforms. The SDK is used to convert trained models from ONNX and TensorFlow to the Deep Learning Container (.dlc) format supported on Snapdragon. The following exercises describe how to use trained models in an Android application to perform semantic segmentation using DeepLab-v3 support from the SDK.

How does it work?

The DeepLab-enabled Android application opens a camera preview, takes a picture and converts it to a bitmap. The network is built with NeuralNetworkBuilder when the .dlc file is passed as input. The bitmap goes to the model for inference, which returns FloatTensor output. The output goes to post-processing for manipulation, which changes the background of the original image from color to black and white.
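As a rough end-to-end sketch of that flow (captureBitmapFromPreview and applyGrayscaleBackground are illustrative helper names, not from the application; inferenceOnBitmap is shown later in this article):

// Illustrative pipeline, matching the steps described above
Bitmap frame = captureBitmapFromPreview();                    // camera preview -> bitmap
Map<String, FloatTensor> outputs = inferenceOnBitmap(frame);  // run the DLC model on the bitmap
Bitmap result = applyGrayscaleBackground(frame, outputs);     // post-process: grayscale background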

Prerequisites

It is helpful to have experience in developing Android applications.

Follow the instructions for setting up the Neural Processing SDK.

Runtime permissions

To capture live frames, the application requires runtime permission for camera access. On Android 6.0 and later (API level 23 and above), the user must grant such permissions at runtime before the application performs any action that requires access to a protected user or system resource.

In the Android project, declare the following code in AndroidManifest.xml:

<uses-permission android:name="android.permission.CAMERA"/>

To use the Camera2 API from the Android SDK, declare the camera hardware feature in AndroidManifest.xml (the feature name remains android.hardware.camera even when the Camera2 API is used):

<uses-feature android:name="android.hardware.camera" />

Check whether the camera permission has been granted, and request it if not:

if (ContextCompat.checkSelfPermission(mApplicationContext, Manifest.permission.CAMERA)
        != PackageManager.PERMISSION_GRANTED) {
    // The permission has not been granted yet, so request it from the user
    requestCameraPermission();
}

If the camera permission has not been granted, a system dialog prompts the user to grant access to the camera.
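The result of that dialog arrives in onRequestPermissionsResult. A minimal sketch of one way to handle it (REQUEST_CAMERA_PERMISSION is an assumed request-code constant, not from the original application):

@Override
public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions,
                                       @NonNull int[] grantResults) {
    if (requestCode == REQUEST_CAMERA_PERMISSION) {
        if (grantResults.length == 0 || grantResults[0] != PackageManager.PERMISSION_GRANTED) {
            // The user denied the permission; the camera preview cannot start
            Toast.makeText(getActivity(), "Camera permission is required", Toast.LENGTH_SHORT).show();
        }
    } else {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults);
    }
}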

Loading the model

The following code builds the neural network and loads the model from the .dlc file (it runs inside an AsyncTask, since building the network can take time):

@Override
protected NeuralNetwork doInBackground(File... params) {
    final SNPE.NeuralNetworkBuilder builder = new SNPE.NeuralNetworkBuilder(mApplicationContext)
            // Set the preferred runtime order for the network: DSP first, then GPU, then CPU
            .setRuntimeOrder(DSP, GPU, CPU)
            // Load the model from the DLC file
            .setModel(new File("<model-path>"));
    // Build the network
    final NeuralNetwork network = builder.build();
    return network;
}
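With setRuntimeOrder(DSP, GPU, CPU), the SDK tries the DSP runtime first and falls back to the GPU and then the CPU if a runtime is unavailable. A hedged usage sketch (the task class name LoadNetworkTask is illustrative, not from the article):

// Illustrative only: build the network off the UI thread, then reuse it for inference
new LoadNetworkTask().execute(new File("<model-path>"));
// When the network is no longer needed, free its native resources:
// mNeuralnetwork.release();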

Capturing preview using Camera2 API

TextureView from the Android SDK is used to render the camera preview in the application. TextureView.SurfaceTextureListener is an interface used to notify when the surface texture is available.

private final TextureView.SurfaceTextureListener mSurfaceTextureListener
        = new TextureView.SurfaceTextureListener() {
    @Override
    public void onSurfaceTextureAvailable(SurfaceTexture surfaceTexture, int width, int height) {
        // Check the camera runtime permission before opening the camera
        if (ContextCompat.checkSelfPermission(getActivity(), Manifest.permission.CAMERA)
                != PackageManager.PERMISSION_GRANTED) {
            requestCameraPermission();
            return;
        }
        CameraManager manager = (CameraManager) getActivity().getSystemService(Context.CAMERA_SERVICE);
        try {
            if (!mCameraOpenCloseLock.tryAcquire(2500, TimeUnit.MILLISECONDS)) {
                throw new RuntimeException("Time out waiting to lock camera opening.");
            }
            // mCameraId: "0" selects the rear camera, "1" the front camera
            manager.openCamera(mCameraId, mStateCallback, mBackgroundHandler);
        } catch (CameraAccessException | InterruptedException e) {
            e.printStackTrace();
        }
    }

    // The remaining SurfaceTextureListener callbacks (onSurfaceTextureSizeChanged,
    // onSurfaceTextureDestroyed, onSurfaceTextureUpdated) are omitted for brevity
};
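The requestCameraPermission() helper used above is not shown in the article; a minimal sketch for a fragment (the request-code constant is an assumption) follows. The result arrives in onRequestPermissionsResult, as shown in the runtime-permissions section above.

private static final int REQUEST_CAMERA_PERMISSION = 1;

private void requestCameraPermission() {
    // Ask the user for the camera permission at runtime
    requestPermissions(new String[]{Manifest.permission.CAMERA}, REQUEST_CAMERA_PERMISSION);
}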

Camera callbacks

CameraDevice.StateCallback is used for receiving updates about the state of a camera device. In the following overridden method, the preview surface is attached to a capture request so that preview frames can be obtained.

@Override
public void onOpened(@NonNull CameraDevice cameraDevice) {
    // The camera is now open; prepare the preview request here
    mCameraOpenCloseLock.release();
    mCameraDevice = cameraDevice;
    try {
        Surface surface = mPreview.getSurface();
        mPreviewRequestBuilder = mCameraDevice.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW);
        mPreviewRequestBuilder.addTarget(surface);
    } catch (CameraAccessException e) {
        e.printStackTrace();
    }
}
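Building the request does not start the preview by itself; a capture session must also be configured. A minimal sketch of that step (mImageReader, mCaptureSession, and the surrounding error handling are assumptions, not from the article):

try {
    // Create a session that targets both the preview surface and the ImageReader
    mCameraDevice.createCaptureSession(Arrays.asList(surface, mImageReader.getSurface()),
            new CameraCaptureSession.StateCallback() {
                @Override
                public void onConfigured(@NonNull CameraCaptureSession session) {
                    mCaptureSession = session;
                    try {
                        // Start streaming preview frames
                        session.setRepeatingRequest(mPreviewRequestBuilder.build(),
                                null, mBackgroundHandler);
                    } catch (CameraAccessException e) {
                        e.printStackTrace();
                    }
                }

                @Override
                public void onConfigureFailed(@NonNull CameraCaptureSession session) {
                    // Session configuration failed; the preview cannot start
                }
            }, mBackgroundHandler);
} catch (CameraAccessException e) {
    e.printStackTrace();
}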

Getting image data from ImageReader

The ImageReader class allows direct application access to image data rendered into a surface. The application uses this class to save the captured image to a file, as follows:

private final ImageReader.OnImageAvailableListener mOnImageAvailableListener
= new ImageReader.OnImageAvailableListener() {
@Override
public void onImageAvailable(ImageReader reader) {
mBackgroundHandler.post(new ImageSaver(reader.acquireNextImage(), mFile));
}
};
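The ImageSaver runnable posted above is not shown in the article. A minimal sketch of what it might look like, based on the standard Camera2 pattern (so treat it as an assumption), copies the JPEG bytes from the Image into the destination file:

private static class ImageSaver implements Runnable {
    private final Image mImage;
    private final File mFile;

    ImageSaver(Image image, File file) {
        mImage = image;
        mFile = file;
    }

    @Override
    public void run() {
        // Copy the JPEG bytes from the first image plane into the file
        ByteBuffer buffer = mImage.getPlanes()[0].getBuffer();
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        try (FileOutputStream output = new FileOutputStream(mFile)) {
            output.write(bytes);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            mImage.close();
        }
    }
}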

Running object inference

As shown below, the bitmap is first converted to an RGBA byte array (513×513×4 bytes) and then pre-processed into the 513×513×3 float tensor the model expects; the exact pre-processing depends on the input shape required by the model. The prediction API takes a FloatTensor as input and returns the prediction as a Map<String, FloatTensor> object.

private Map<String, FloatTensor> inferenceOnBitmap(Bitmap scaledBitmap) {
    final Map<String, FloatTensor> outputs;
    try {
        if (mNeuralnetwork == null || mInputTensorReused == null
                || scaledBitmap.getWidth() != getInputTensorWidth()
                || scaledBitmap.getHeight() != getInputTensorHeight()) {
            Logger.d("SNPEHelper", "No NN loaded, or image size different than tensor size");
            return null;
        }
        // Bitmap to RGBA byte array (size: 513*513*4 bytes, RGBA)
        mBitmapToFloatHelper.bitmapToBuffer(scaledBitmap);
        // Pre-processing: Bitmap (513,513,4 ints) -> Float Input Tensor (513,513,3 floats)
        final float[] inputFloatsHW3 = mBitmapToFloatHelper.bufferToNormalFloatsBGR();
        if (mBitmapToFloatHelper.isFloatBufferBlack())
            return null;
        mInputTensorReused.write(inputFloatsHW3, 0, inputFloatsHW3.length, 0, 0);
        // Execute the inference
        outputs = mNeuralnetwork.execute(mInputTensorsMap);
    } catch (Exception e) {
        e.printStackTrace();
        Logger.d("SNPEHelper", e.getCause() + "");
        return null;
    }
    return outputs;
}
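The caller is expected to scale the captured bitmap to the model's 513×513 input size before invoking this method; an illustrative call site (capturedBitmap is a hypothetical variable):

// Scale the captured frame to the model's input dimensions, then run inference
Bitmap scaled = Bitmap.createScaledBitmap(capturedBitmap, 513, 513, true);
Map<String, FloatTensor> outputs = inferenceOnBitmap(scaled);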

Processing pixels from RGBA (0...255) to BGR (-1...1)

public float[] bufferToNormalFloatsBGR() {
    final byte[] inputArrayHW4 = mByteBufferHW4.array();
    final int area = mFloatBufferHW3.length / 3;
    long sumG = 0;
    int srcIdx = 0, dstIdx = 0;
    // inputScale = 2/255, so that inputScale * pixel - 1 maps [0, 255] to [-1, 1]
    final float inputScale = 0.00784313771874f;
    for (int i = 0; i < area; i++) {
        // NOTE: & 0xFF "casts" the signed byte to an unsigned int
        // (otherwise bright colors would come out as negative numbers)
        final int pixelR = inputArrayHW4[srcIdx] & 0xFF;
        final int pixelG = inputArrayHW4[srcIdx + 1] & 0xFF;
        final int pixelB = inputArrayHW4[srcIdx + 2] & 0xFF;
        // Write the channels in BGR order, normalized to [-1, 1]
        mFloatBufferHW3[dstIdx] = inputScale * (float) pixelB - 1;
        mFloatBufferHW3[dstIdx + 1] = inputScale * (float) pixelG - 1;
        mFloatBufferHW3[dstIdx + 2] = inputScale * (float) pixelR - 1;
        srcIdx += 4;
        dstIdx += 3;
        sumG += pixelG;
    }
    // The buffer is considered black if, on average, Green < 13/255 (about 5%)
    mIsFloatBufferBlack = sumG < (area * 13);
    return mFloatBufferHW3;
}
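To see why inputScale works: 0.00784313771874 ≈ 2/255, so each channel value p in [0, 255] maps to (2/255)·p − 1, sending 0 to −1 and 255 to +1, the [−1, 1] range the model expects. Likewise, the black-frame test sumG < area × 13 checks whether the average green value is below 13/255, roughly 5%.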

Performing image segmentation

In the following code, the output FloatTensor is further processed to change the background in an image.

The class index for a person in the DeepLab-v3 output is 15, so the float matrix holds 15 for pixels where a person is detected and other values (such as 0) for the background. Based on that matrix, an output bitmap is created in which background pixels are converted to grayscale (black and white), while the pixels belonging to the detected person keep their color.

MNETSSD_NUM_BOXES = mOutputs.get(MNETSSD_OUTPUT_LAYER).getSize();
// Read the entire output tensor upfront (faster than per-element reads)
mOutputs.get(MNETSSD_OUTPUT_LAYER).read(floatOutput, 0, MNETSSD_NUM_BOXES);
// Convert background pixels to black and white
int w = mScaledBitmap.getWidth();
int h = mScaledBitmap.getHeight();
int b = 0xFF;
int out = 0xFF;
for (int y = 0; y < h; y++) {
    for (int x = 0; x < w; x++) {
        // Keep the low (blue) byte of the pixel as the gray level
        b = b & mScaledBitmap.getPixel(x, y);
        // For background pixels (class != 15), replicate the gray level into R, G and B
        for (int i = 1; i <= 3 && floatOutput[y * w + x] != 15; i++) {
            out = out << 8 | b;
        }
        mScaledBitmap.setPixel(x, y, floatOutput[y * w + x] != 15 ? out : mScaledBitmap.getPixel(x, y));
        out = 0xFF;
        b = 0xFF;
    }
}
mOutputBitmap = Bitmap.createScaledBitmap(mScaledBitmap, originalBitmapW, originalBitmapH, true);

[Sample before and after images: the original color background is converted to black and white, while the detected person keeps their color.]

Snapdragon and Qualcomm Neural Processing SDK are products of Qualcomm Technologies, Inc. and/or its subsidiaries.