Snapdragon and Qualcomm branded products are products of
Qualcomm Technologies, Inc. and/or its subsidiaries.
“It’s a good model,” you say, thinking about the model you’ve trained in the cloud for your machine learning application. “I just wish we could fine-tune it on the user’s device.”
Now you can do just that.
We’ve published a toolkit for transfer learning in the Qualcomm Adreno OpenCL Machine Learning SDK v.3.0. Following the example in the SDK, you can take a pretrained MobileNet V1 model and retrain it after deployment to a mobile device. In this post, I’ll describe transfer learning and show you how to test a useful variation called fine-tuning.

Transfer learning on mobile devices
The last time I posted about ML training at the edge, I mentioned that the OpenCL ML SDK allows you to train on mobile devices. It gives you the option of fine-tuning a pretrained model by updating the weights through training passes. The core functionality that enables transfer learning and fine-tuning is part of the cl_qcom_ml_ops extension to the OpenCL driver for Adreno GPUs.
I also mentioned transfer learning, in which you start out with a model trained in the cloud, then update it by adjusting the weights for specific layers through edge training. It’s called “transfer learning” because it enables the transfer of model knowledge between data domains.
Fine-tuning: Adapting the model for new data without retraining every layer
The new model can be trained via transfer learning with fine-tuning in a variety of ways. A typical approach is to freeze the weights for the layers from the pretrained model and update only the weights for the newly added layers during training.
With fine-tuning, you freeze the weights for all the pretrained layers except for one. Then, through edge training, the weights for that one unfrozen pretrained layer are updated along with the weights for the newly added layers. Through fine-tuning, you can adapt your model for data that was not available during the initial training. The new data might be available only at the network edge where you deploy the model, far from where you trained it.
For example, you could start by training a model in the cloud to classify images into many classes. Then, as your users are traveling around the Australian Outback with their mobile devices, they could update the model (that is, transfer knowledge in) to classify local wildflowers.
The example we’ve included in the toolkit employs transfer learning with fine-tuning.
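To make the idea of freezing concrete, here is a minimal, framework-free sketch in Python. It is not the OpenCL ML SDK API. The layer names, the weight and gradient values, and the sgd_step helper are all hypothetical, chosen only to show that a training step changes the unfrozen weights and leaves the frozen ones untouched.

```python
# Toy illustration of freezing: only layers marked trainable are updated.
# Layer names, weights, and gradients are made up for illustration.

def sgd_step(weights, grads, trainable, learning_rate):
    """Return new weights after one gradient-descent step on trainable layers only."""
    updated = {}
    for name, values in weights.items():
        if trainable[name]:
            # Unfrozen layer: move each weight against its gradient.
            updated[name] = [w - learning_rate * g for w, g in zip(values, grads[name])]
        else:
            # Frozen layer: copy the weights through unchanged.
            updated[name] = list(values)
    return updated


weights = {
    "pretrained_early": [0.5, -0.25],
    "pretrained_last": [1.0, 2.0],
    "new_classifier": [0.0, 0.0],
}
grads = {
    "pretrained_early": [1.0, 1.0],
    "pretrained_last": [2.0, -2.0],
    "new_classifier": [-4.0, 4.0],
}
# Fine-tuning: freeze every pretrained layer except the last one.
trainable = {
    "pretrained_early": False,
    "pretrained_last": True,
    "new_classifier": True,
}

new_weights = sgd_step(weights, grads, trainable, learning_rate=0.5)
for name, values in new_weights.items():
    print(name, values)
```

In a real framework the same effect usually comes from a per-layer trainable flag, with the optimizer skipping frozen weights, rather than from a hand-written update loop like this one.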
Our example of fine-tuning
This example illustrates transfer learning and fine-tuning with the OpenCL ML SDK. The objective is to start with a pretrained MobileNet V1 model, then retrain it at the edge to enable classification of flowers.
- The flower image data used for training is available from TensorFlow. It consists of labeled images for five classes of flowers: daisy, dandelion, roses, sunflowers, and tulips.
- Our example is adapted from a transfer learning and fine-tuning model based on the pretrained Keras MobileNet V1 model. This pretrained model was trained on the ImageNet dataset, which comprises 1,281,167 training images in 1,000 object classes.
- Keras provides an API for loading the MobileNet V1 pretrained ImageNet model without the terminal layers that perform final classification of input images into ImageNet object classes. The portion of the MobileNet V1 model without the classification layers is called the feature extractor (FE).
So, our model starts with the MobileNet V1 FE. But for our fine-tuning example, we replace the classification layers with four different layers, as shown below: average_pooling2d, flatten, dense, and softmax.
_____________________________________________________________________
Layer (type)                           Output Shape          Param #
=====================================================================
mobilenet_1.00_224 (Functional)        (None, 7, 7, 1024)    3228864
average_pooling2d (AveragePooling2D)   (None, 1, 1, 1024)    0
flatten (Flatten)                      (None, 1024)          0
dense (Dense)                          (None, 5)             5125
softmax (Softmax)                      (None, 5)             0
=====================================================================
Total params: 3,233,989
Trainable params: 1,053,701
Non-trainable params: 2,180,288
The first layer listed, mobilenet_1.00_224, is the base MobileNet V1 model (FE) without the ImageNet classifier. The other four layers (average_pooling2d, flatten, dense, and softmax) are newly added layers that make up the classifier for the five classes in the flower dataset.
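The parameter counts for the new classifier can be checked by hand. The short Python sketch below recomputes them from the shapes in the summary. The variable names are illustrative, and the only numbers used are the ones listed above.

```python
# Recompute the classifier parameter counts from the model summary.
feature_channels = 1024  # channels produced by the MobileNet V1 FE
num_classes = 5          # daisy, dandelion, roses, sunflowers, tulips

# Pooling, flatten, and softmax have no weights, so they add 0 parameters.
# Dense layer: one weight per (input channel, class) pair plus one bias per class.
dense_params = feature_channels * num_classes + num_classes

fe_params = 3_228_864  # MobileNet V1 FE, taken from the summary
total_params = fe_params + dense_params

print("dense params:", dense_params)
print("total params:", total_params)
```

The dense layer accounts for every parameter that the new classifier adds on top of the feature extractor.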
Freezing layers
Here, fine-tuning freezes the layers in the MobileNet V1 FE except for the final convolution layer (conv_pw_13). The dense layer is initialized with new values and is trained along with conv_pw_13 from the MobileNet V1 FE.
Here is a summary of the relevant portions of the MobileNet V1 FE that include the conv_pw_13 layer:
Layer (type)                           Output Shape          Param #
=====================================================================
conv_dw_13_relu (ReLU)                 (None, 7, 7, 1024)    0
conv_pw_13 (Conv2D)                    (None, 7, 7, 1024)    1048576
conv_pw_13_bn (BatchNormalization)     (None, 7, 7, 1024)    4096
conv_pw_13_relu (ReLU)                 (None, 7, 7, 1024)    0
=====================================================================
Total params: 3,228,864
Trainable params: 1,048,576
Non-trainable params: 2,180,288
_____________________________________________________________________
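The trainable and non-trainable totals follow directly from which layers are unfrozen. The Python sketch below is a hand check, not SDK code. It assumes conv_pw_13 is a 1x1 convolution with no bias term, which is what the listed count of 1,048,576 implies, and that batch normalization stores four values per channel.

```python
# Recompute the trainable / non-trainable split from the layer shapes.
channels = 1024

# conv_pw_13: 1x1 (pointwise) convolution, 1024 channels in, 1024 out, no bias.
conv_pw_13_params = 1 * 1 * channels * channels

# Batch normalization: gamma, beta, moving mean, moving variance per channel.
conv_pw_13_bn_params = 4 * channels

dense_params = channels * 5 + 5  # new dense classifier layer (weights + biases)
fe_total = 3_228_864             # feature extractor total, from the summary

# Only conv_pw_13 and the dense layer are updated during edge training.
trainable = conv_pw_13_params + dense_params
# Everything else in the FE stays frozen, including conv_pw_13_bn.
non_trainable = fe_total - conv_pw_13_params

print("conv_pw_13:", conv_pw_13_params)
print("conv_pw_13_bn:", conv_pw_13_bn_params)
print("trainable:", trainable)
print("non-trainable:", non_trainable)
```

Under these assumptions, roughly a third of the full model's 3,233,989 parameters are updated on the device, and the rest keep their pretrained values.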
Adaptation and results
The TensorFlow Flowers Dataset contains 3670 labeled data points, a relatively small dataset. After an 80/20 split, we ended up with 2934 training samples and 736 testing samples.
Initial testing accuracy was 147/736, or 19.9728%. With five classes, that accuracy is equivalent to uniform random guessing, which would score 20%.
We divided the 2934 training samples into 18 batches per epoch, for a batch size of 163. We then trained for 100 epochs with data shuffling and a learning rate of 0.003. After training, we achieved a testing accuracy of 670/736, or 91.0326%.
For more details, refer to the clml_mobilenet_transfer_learning sample in the OpenCL ML SDK.
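The figures in this section can be reproduced with a few lines of arithmetic. The Python sketch below uses only the numbers reported above. The variable names are illustrative, and the total step count is derived here rather than stated by the SDK sample.

```python
# Reproduce the dataset split, batch size, and accuracy figures.
total_samples = 3670
train_samples = 2934
test_samples = 736

batches_per_epoch = 18
epochs = 100
batch_size = train_samples // batches_per_epoch   # samples per batch
total_steps = batches_per_epoch * epochs          # weight updates over the run

num_classes = 5
chance_accuracy = 100 / num_classes               # uniform random guessing, in %
initial_accuracy = 100 * 147 / test_samples       # before edge training, in %
final_accuracy = 100 * 670 / test_samples         # after edge training, in %

print("batch size:", batch_size)
print("total training steps:", total_steps)
print(f"chance: {chance_accuracy:.4f}%")
print(f"initial: {initial_accuracy:.4f}%")
print(f"final: {final_accuracy:.4f}%")
```

The initial accuracy sits within a few hundredths of a percentage point of chance, which is expected for a dense layer that has just been initialized with new values.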
Other use cases
To recap the approach we’re highlighting, you can develop and train baseline models on high-performance, server-based GPUs or in the cloud. With the OpenCL ML SDK, you can then deploy transfer learning models. Applications running on your users’ mobile devices will incrementally update the models with a wide variety of data generated by device sensors, including cameras. As demonstrated in this example, the incremental, ongoing updates take place without the need to retrain every layer in the new model.
Fine-tuning allows personalization of the baseline model with data unique to a particular user. My previous post also mentioned a use case for personalization in video conferencing. Imagine using transfer learning for a model that can identify the participant who is speaking at any given time so you can blur out their background. Other promising use cases involve voice recognition, lighting conditions and natural language processing (NLP). Those are among the areas ripe for transfer learning because you’re adapting model behavior to data available only at the edge, where the device is being used.
Next steps
Download the latest Adreno OpenCL Machine Learning SDK. It contains examples, detailed documentation and the toolkit for the demonstration in this post. Try it for yourself as the first step to fine-tuning your models after deployment on mobile devices.
And for a deeper look (with dogs and cats instead of wildflowers), read the Keras guide “Transfer learning & fine-tuning.”