Implementing Machine Learning and Operations (MLOps)

Thursday 6/11/20 09:03am
Posted By Felix Baum

Snapdragon and Qualcomm branded products are products of
Qualcomm Technologies, Inc. and/or its subsidiaries.

The lifecycle of traditional software is arguably quite straightforward. At its simplest, you develop, test, and deploy the software, and then release a new version with features, updates, and/or fixes as needed. To facilitate this, traditional software development often relies on DevOps, which involves Continuous Integration (CI), Continuous Delivery (CD), and Continuous Testing (CT) to reduce development time while continuously delivering new releases and maintaining quality.

When it comes to machine learning (ML) modeling, it’s easy to think that the ML workflow follows a similar pattern. After all, it should just be a matter of creating and training the ML model, deploying it, and releasing a new version as required. But the environment in which ML systems operate complicates things. For starters, ML systems themselves are inherently different from traditional software due to their data-driven, non-deterministic behavior. And as recent global events have highlighted, our world is constantly changing, so ML practitioners must anticipate that the real-world data on which production models run inference will inevitably change too.

As a result, a special kind of DevOps for ML has emerged, called MLOps (short for machine learning and operations), to help manage this constant change and the subsequent need for model redeployments. MLOps embraces DevOps’ continuous integration and continuous delivery, but replaces the continuous testing phase with continuous training. This continuous training of new models, which includes redeployment of those new models and all of the technical efforts that go along with it, aims to address three notable aspects of ML projects:

  • The need for “explainability” as to how and why a model makes certain predictions. This is especially important for auditing purposes to meet regulations and/or certain levels of predictive performance.
  • Model decay, which is the reduction of the production model’s predictive performance over time, due to new and changing real-world data encountered by the model.
  • The continuous development and enhancement of the model, driven by business requirements.

Continuous training, and indeed MLOps in general, embraces the idea that the model will constantly and inevitably change. How thoroughly organizations implement MLOps strategies and tactics to manage that change varies.

As MLOps has evolved, a number of organizations have put forth frameworks for best practices. One prominent example is Google’s MLOps guidelines, which describe three levels of MLOps implementation adopted by organizations:

  • MLOps level 0: Manual process: the need for model training and deployment is formally recognized, but both are performed manually, often in an ad-hoc fashion through scripts and interactive processes. This level generally lacks continuous integration and continuous delivery.
  • MLOps level 1: ML pipeline automation: this level introduces a pipeline for continuous training. Data and model validation are automated, and triggers are in place to retrain the model with fresh data when its performance degrades.
  • MLOps level 2: CI/CD pipeline automation: the ML workflow has been automated to the point where data scientists are able to update both the model and pipeline with reduced intervention from developers.

Organizations that have implemented MLOps level 1 or MLOps level 2 may even make use of so-called shadow models, which are models that are trained in parallel while the production model runs, using a different training dataset for each. When a new production model is needed, a shadow model can be quickly selected and deployed.

Figure 1 - A visual depiction of parallel training of shadow models on different datasets. Each model is a potential candidate to become the new production model.
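As a rough illustration of how a shadow model might be chosen, the sketch below scores each candidate on a recent labeled sample and picks the most accurate one. Everything in it is an assumption made for illustration: the Candidate class, the select_shadow_model helper, the toy threshold "models", and the sample data are hypothetical, and a real pipeline would likely weigh more than a single accuracy figure.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# For this sketch, a "model" is any callable mapping one input to a label.
Model = Callable[[float], int]


@dataclass
class Candidate:
    """A shadow model together with a human-readable name (hypothetical)."""
    name: str
    model: Model


def accuracy(model: Model, inputs: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of inputs for which the model predicts the expected label."""
    correct = sum(1 for x, y in zip(inputs, labels) if model(x) == y)
    return correct / len(labels)


def select_shadow_model(
    candidates: Sequence[Candidate],
    inputs: Sequence[float],
    labels: Sequence[int],
) -> Tuple[Candidate, float]:
    """Return the candidate with the highest accuracy on recent data."""
    scored = [(accuracy(c.model, inputs, labels), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Toy stand-ins for shadow models trained on different datasets.
candidates = [
    Candidate("shadow_a", lambda x: int(x > 0.5)),
    Candidate("shadow_b", lambda x: int(x > 0.2)),
]

# Illustrative "recent production" sample with known labels.
recent_inputs = [0.1, 0.3, 0.4, 0.6, 0.9]
recent_labels = [0, 1, 1, 1, 1]

best, score = select_shadow_model(candidates, recent_inputs, recent_labels)
print(best.name, score)
```

Scoring on recent data rather than on each model's own training set is the point of the exercise: the candidate that best matches what production currently sees is the one worth promoting.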

To better understand how and when these different levels of MLOps can be implemented, consider the following examples.

Example 1: Packaging Robots

A simple example is a robot at the end of an assembly line that uses computer vision powered by ML to analyze and package up products. The ML model may have been trained to recognize square and rectangular boxes of a limited range of sizes. However, the business will now introduce new shapes and sizes for its packages, so a new ML model must be created and deployed. In this scenario a development team at MLOps level 0 would manually create, train, and deploy a new model before the new types of packages start rolling down the assembly line. Given the limited domain of the robot’s capabilities, this level of MLOps may suffice.

Example 2: Speech Recognition

Consider a mobile app for speech recognition that can sense or identify context (e.g., tone or emotion) based on how people speak. Over time, new phrases and slang come along, and the general style of how people talk changes. In this scenario the ML model will likely exhibit model decay over a long period of time.

This is an example where the guidelines of MLOps level 1 could potentially help. Under MLOps level 1, a system could run on the device to monitor the model’s predictive performance. If performance approaches or falls below a threshold, the system alerts the team that a new model should be trained using fresh data and then deployed to replace the production model.
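One way such a monitor might work is to track accuracy over a sliding window of recent predictions and flag the model once that accuracy drops below a threshold. The sketch below assumes that ground-truth labels eventually become available (for example, through user corrections). The PerformanceMonitor class, its window size, and its threshold are hypothetical choices for illustration, and how the alert reaches the team is left to the caller.

```python
from collections import deque


class PerformanceMonitor:
    """Tracks accuracy over the most recent `window` predictions (hypothetical helper)."""

    def __init__(self, window: int, threshold: float) -> None:
        # deque with maxlen drops the oldest outcome automatically.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, prediction: str, actual: str) -> None:
        """Store whether a single prediction matched the ground truth."""
        self.outcomes.append(prediction == actual)

    def accuracy(self) -> float:
        """Accuracy over the current window; 1.0 when nothing is recorded yet."""
        if not self.outcomes:
            return 1.0
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        """True once the window is full and accuracy is below the threshold."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.threshold


# Illustrative (prediction, actual) pairs for a tone classifier.
monitor = PerformanceMonitor(window=4, threshold=0.75)
for predicted, actual in [
    ("happy", "happy"),
    ("sad", "sad"),
    ("angry", "sad"),
    ("happy", "angry"),
]:
    monitor.record(predicted, actual)

if monitor.needs_retraining():
    print(f"accuracy {monitor.accuracy():.2f} is below threshold, retraining advised")
```

Waiting for a full window before raising the flag avoids alerting on the first few predictions, at the cost of reacting more slowly to sudden changes.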

Example 3: Sudden Outliers in the Stock Market

Recently the price of oil entered negative territory. If the ML model makes predictions based on the price of oil but has only been trained with positive prices, how will it perform when it suddenly encounters a negative price? In this situation, team members will need to be alerted to the problem immediately, and must be in a position to quickly train and redeploy a new model.

Here, an implementation of MLOps level 2 could be beneficial. At this level, much of the pipeline for training and deploying a new model has been automated, so data scientists should be able to handle most or all of the pipeline without developer assistance. Moreover, the degree of automation should allow them to focus on updating, training, and deploying a new model as quickly and as reliably as possible.
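The "immediate alert" part of this scenario could be approximated by checking each incoming value against the range seen during training. The sketch below is a deliberately simple version of that idea. The TrainingRange class, the check_input helper, and all prices are hypothetical and illustrative, and a min/max check is only one of many possible outlier tests.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class TrainingRange:
    """Smallest and largest value of a feature seen during training (hypothetical)."""
    low: float
    high: float

    @classmethod
    def from_samples(cls, samples: Iterable[float]) -> "TrainingRange":
        values = list(samples)
        return cls(min(values), max(values))

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def check_input(
    value: float,
    seen: TrainingRange,
    alert: Callable[[str], None],
) -> bool:
    """Return True if the value is in range; otherwise call `alert` and return False."""
    if seen.contains(value):
        return True
    alert(f"input {value} is outside the training range [{seen.low}, {seen.high}]")
    return False


# Illustrative training prices (all positive) and two incoming values.
alerts = []
oil_range = TrainingRange.from_samples([18.5, 42.0, 61.3, 107.9])

ok_normal = check_input(55.0, oil_range, alerts.append)
ok_negative = check_input(-5.0, oil_range, alerts.append)

print(ok_normal, ok_negative, alerts)
```

In an automated pipeline, the alert callback is where a retraining job or a page to the team could be triggered instead of appending to a list.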

Integrating Snapdragon into an MLOps Pipeline

Snapdragon® mobile platforms, along with the Qualcomm® Neural Processing SDK for artificial intelligence (AI), are well positioned for MLOps. First, devices based on Snapdragon not only host the model, but also run inference on the most suitable built-in compute resource. Second, our Neural Processing SDK, which developers use to optimize and load models onto Snapdragon mobile platforms, can be integrated into the MLOps pipeline to facilitate the model deployment process.

The general process for working with the SDK in the context of the ML workflow is as follows:

  1. The ML model is designed and trained using a framework such as TensorFlow or Caffe2.
  2. The DLC converter tool in the Qualcomm Neural Processing SDK is used to convert that model to a DLC format for execution on Snapdragon.
  3. API calls from the Qualcomm Neural Processing SDK are added to the app to load the DLC data onto the Qualcomm® Hexagon™ processor and to run inference. We often refer to an app that uses the SDK’s API as a Qualcomm® AI Engine supported app.
  4. The Qualcomm AI Engine supported app and the model are loaded onto the device powered by Snapdragon and deployed for inference on real-world data.

Figure 2 - The basic workflow for integrating the Qualcomm Neural Processing SDK into the ML app.

In the context of MLOps, steps 1, 2, and 4 from above would be repeated throughout the product’s lifecycle as model updates are required. And depending on the level of MLOps that has been implemented, developers may build additional components, such as monitoring tools that run on Snapdragon platforms to gauge the model’s predictive performance. They may also build some sort of alert and/or trigger that can start a new iteration of the ML workflow.

Using the SDK for MLOps

The Qualcomm Neural Processing SDK includes command-line tools to convert models from TensorFlow, Caffe2, and ONNX to DLC format, and these can be integrated into the model deployment scripts of an MLOps pipeline. Additional command-line tools from our SDK can also be integrated into the deployment pipeline to further optimize the converted model.

The following diagram illustrates where these tools can be deployed in the ML workflow:

Figure 3 - A more detailed view of the role of the Qualcomm® Neural Processing Engine (NPE) SDK in the ML workflow.

In the upper half of the workflow, the ML model is trained in the ML framework. In the lower half of the workflow, the SDK’s command-line tools are used to convert the model to DLC format, and optionally, optimize the model for the Snapdragon-powered device. In the context of MLOps, it’s these command-line tools that would be invoked either manually or as part of automated build scripts.
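A build script might wrap the conversion step along the following lines. This is a sketch under stated assumptions, not the SDK's documented interface: the converter name snpe-onnx-to-dlc and the --input_network and --output_path flags are given as recalled from the SDK's tooling and should be checked against the documentation for the SDK version in use. The build_convert_command and convert helpers and the file names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(converter: str, model_path: Path, dlc_path: Path) -> List[str]:
    """Assemble the converter invocation as an argument list.

    ASSUMPTION: the flag names below should be verified against the
    documentation of the SDK version in use.
    """
    return [
        converter,
        "--input_network", str(model_path),
        "--output_path", str(dlc_path),
    ]


def convert(converter: str, model_path: Path, dlc_path: Path) -> None:
    """Run the converter and raise CalledProcessError if it fails."""
    command = build_convert_command(converter, model_path, dlc_path)
    # check=True makes a failed conversion stop the pipeline.
    subprocess.run(command, check=True)


# ASSUMPTION: converter executable name; file names are placeholders.
command = build_convert_command(
    "snpe-onnx-to-dlc",
    Path("model.onnx"),
    Path("model.dlc"),
)
print(" ".join(command))
```

Keeping command construction separate from execution lets the pipeline log or unit test the exact invocation without needing the SDK installed on the build machine.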

Developers can also implement different methods to reload a new model into the Qualcomm AI Engine supported app running on the device. One method is to run services on the device that automatically pull down the new model and restart the Qualcomm AI Engine supported app (or even restart the whole device) with a new model, but that could disrupt service while the restart is in progress. Alternatively, the app itself could watch for the presence of new DLC files at runtime, and re-invoke the Qualcomm NPE APIs to load the new model. Yet another option is to use a push approach, in which over-the-air updates initiated by a server push a new model to the device.

Depending on the app’s architecture, each of these options may incur different levels of downtime as the model is deployed and the app switches to using it. Developers will therefore need to weigh the complexity of implementing each technique against the potential downtime required to deploy a new model.
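The second option, in which the app watches for new DLC files at runtime, could look roughly like the polling sketch below. It compares a content hash of the file between polls and invokes a reload callback when the content changes. The ModelWatcher class is hypothetical, and reload_model is a placeholder for whatever code in the app calls the SDK's model-loading APIs. A production app would run poll() on a timer or background thread.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Optional


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used to detect a changed model."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class ModelWatcher:
    """Reloads the model when the DLC file at `dlc_path` changes (hypothetical)."""

    def __init__(self, dlc_path: Path, reload_model: Callable[[Path], None]) -> None:
        self.dlc_path = dlc_path
        self.reload_model = reload_model  # placeholder for the app's loader
        self.current_digest: Optional[str] = None

    def poll(self) -> bool:
        """Check once; return True if a new model was loaded."""
        if not self.dlc_path.exists():
            return False
        digest = file_digest(self.dlc_path)
        if digest == self.current_digest:
            return False
        self.reload_model(self.dlc_path)
        # Record the digest only after a successful reload.
        self.current_digest = digest
        return True


# Demonstration with a temporary file standing in for a real DLC.
loaded = []
with tempfile.TemporaryDirectory() as tmp:
    dlc = Path(tmp) / "model.dlc"
    watcher = ModelWatcher(dlc, loaded.append)

    results = [watcher.poll()]       # no file yet
    dlc.write_bytes(b"model v1")
    results.append(watcher.poll())   # first model appears
    results.append(watcher.poll())   # unchanged file
    dlc.write_bytes(b"model v2")
    results.append(watcher.poll())   # updated model

print(results)
```

Hashing the file avoids reloading when a file is rewritten with identical content. Because the old model stays loaded until the callback runs, downtime is limited to the time the reload itself takes.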


ML systems are inherently different from traditional software because they are non-deterministic and operate in a world of constantly changing data. But thanks to formal methodologies like MLOps, developers have the tools they need to plan and implement robust ML solutions.

With ML gaining such prominence in edge computing, and our Snapdragon mobile platforms powering ML on so many devices, it’s exciting to see how our hardware and SDKs can support developers in implementing MLOps.

For developers looking for more information, be sure to check out the Qualcomm Neural Processing SDK page as well as the SDK’s documentation. And for those looking to build the foundation of a Snapdragon-powered device, be sure to check out our hardware development kits such as the Snapdragon 865 mobile hardware development kit.

Snapdragon, Qualcomm Neural Processing SDK, Qualcomm Neural Processing Engine, Qualcomm AI Engine and Qualcomm Hexagon are products of Qualcomm Technologies, Inc. and/or its subsidiaries.