Training ML Models at the Edge with Federated Learning

Monday 6/7/21 08:41am
Posted By Hsin-I Hsu

Snapdragon and Qualcomm branded products are products of
Qualcomm Technologies, Inc. and/or its subsidiaries.

Centralized machine learning (ML) is the ML workflow that most of us are familiar with today, where training is allocated to powerful servers which update model parameters using large datasets. The trained model may then be deployed to edge devices over the cloud for inference at the edge, or edge devices may collect and send data to the server for inference and receive back the server’s prediction(s).

But thanks to today’s powerful mobile processors and some clever distributed computing designs, training no longer has to be limited to servers sitting in the cloud. Enter federated learning, a technology first introduced by Google in 2016 that effectively takes the model to the data, rather than bringing the data to the model. In doing so, it unlocks some interesting possibilities.

In this blog, we’ll take a closer look at how federated learning works, explore some of the issues with centralized ML which federated learning can overcome, and check out some technologies that can power federated learning.

Federated Learning in Four Steps

The goal of federated learning is to take advantage of data from different locations. This is accomplished by having devices (e.g., smartphones, IoT devices, etc.) at those locations each train a local copy of a global ML model using local data. Collectively, these devices then contribute their training updates to the global ML model on a central server, and each can obtain a copy of the updated global model with these combined changes.

Here’s a summary of how it works:

  1. Each device gets a copy of the global ML model either by asking the server if a model is available or receiving one that the server pushes to the device. This model may be an initial version with just random weights or could be one that has been trained in the past.
  2. The device collects local data and trains its local copy of the model using this data.
  3. The device then sends its model changes, or deltas (e.g., updates to the model’s weights and biases), to the cloud. These deltas represent the differences between the initial model and the trained model, which means that the underlying training data is never sent outside the device.
  4. The server combines the deltas with those received from other edge devices to update the global model, using aggregation algorithms such as federated stochastic gradient descent (FedSGD) or Federated Averaging (FedAvg). After the combined changes are integrated into the global model, the new and improved version of the global model is ready for inference. This global model can then be sent back to the edge devices and/or used for inference in the cloud.
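
Steps 3 and 4 can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production implementation: the function names (`compute_delta`, `federated_average`), the made-up weight values, and the choice to weight each device by its number of local training examples are all assumptions for this example, and a real model’s weights would be tensors rather than flat lists.

```python
from typing import List, Tuple

Weights = List[float]


def compute_delta(initial: Weights, trained: Weights) -> Weights:
    """Step 3: the device reports only the change in weights, not its data."""
    return [t - i for i, t in zip(initial, trained)]


def federated_average(global_weights: Weights,
                      updates: List[Tuple[Weights, int]]) -> Weights:
    """Step 4: combine deltas, weighting each by its local example count."""
    total_examples = sum(n for _, n in updates)
    if total_examples == 0:
        return list(global_weights)  # no device reported any training examples
    averaged = [0.0] * len(global_weights)
    for delta, n in updates:
        for k, d in enumerate(delta):
            averaged[k] += d * n / total_examples
    return [w + a for w, a in zip(global_weights, averaged)]


global_model = [0.0, 0.0]
# (locally trained weights, number of local examples); hypothetical values.
device_results = [([1.0, 2.0], 10), ([3.0, 0.0], 30)]
updates = [(compute_delta(global_model, trained), n)
           for trained, n in device_results]
new_global = federated_average(global_model, updates)
print(new_global)
```

Because the second device trained on three times as many examples, its delta pulls the global model three times as hard. An unweighted mean is also possible; which is appropriate depends on how the data is distributed across devices.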

This entire process is continually repeated to iteratively improve the model, and each iteration is known as a round. Developers can also use federated analytics, which applies similar techniques to analyze data that stays on the devices, for example to measure the quality of federated learning models against real-world data.
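
To make the idea of rounds concrete, here is a small, self-contained simulation. It is only a sketch under simplifying assumptions: the "model" is a single weight `w` fitted to y = 2x, the three devices and their noisy data are synthetic, every device participates in every round, and the helper names (`local_train`, `run_round`, `make_device_data`) are hypothetical.

```python
import random


def local_train(w, data, lr=0.05, epochs=5):
    """Run a few epochs of SGD on y ~ w * x using only this device's data."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)^2 w.r.t. w
            w -= lr * grad
    return w


def run_round(global_w, devices):
    """One round: each device trains locally, the server averages the deltas."""
    deltas = [local_train(global_w, data) - global_w for data in devices]
    return global_w + sum(deltas) / len(deltas)


def make_device_data(rng, n=20):
    """Synthetic private data for one device: y = 2x plus a little noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, 2 * x + rng.gauss(0, 0.05)))
    return data


rng = random.Random(0)
devices = [make_device_data(rng) for _ in range(3)]

global_w = 0.0  # initial global model
history = []
for round_number in range(10):
    global_w = run_round(global_w, devices)
    history.append(global_w)
    print(f"round {round_number + 1}: w = {global_w:.3f}")
```

Only the scalar deltas cross the device boundary in `run_round`; the `(x, y)` pairs never leave `local_train`. Over successive rounds the global weight should settle near 2, the slope used to generate the data.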

Types of Federated Learning

There are two general types of federated learning. The first is Cross-device federated learning, which involves multiple devices training a local copy of the model as described above. This setup can range from just a few devices to thousands or even millions of devices. Cross-device federated learning is ideal for use cases such as incrementally improving user experiences. For example, it can be used to improve keyword search predictions across large numbers of smartphones. In this example, the history of queries is kept on the device, while on-device training improves the local model. The device then sends its training deltas to the server for integration into the global model. As the model improves over time, the search functionality for all users improves.

The second type is Cross-silo federated learning, which is similar to cross-device federated learning, but takes place across organizations rather than individual devices like users’ smartphones, IoT devices, etc. Here, the goal is for organizations with large datasets to collaborate on improving ML models. For example, a group of hospitals may want to collaborate on an ML model to detect a rare medical condition. In this case, training data may be hard to acquire due to the rarity of the condition and issues around data privacy, but the hospitals can each take advantage of their siloed data to contribute model deltas and mutually benefit from the improved global model.

Benefits and Key Features of Federated Learning

Federated learning has several key features and benefits. But to truly appreciate them, it’s important first to understand some of the drawbacks of traditional, centralized ML training.

A primary issue with centralized ML, where edge devices depend on a model in the cloud for inference, is latency (i.e., the time spent sending a request for a prediction and waiting for a response). Latency has many causes, including limited bandwidth, server downtime/availability, and resource limits such as data plan caps and battery life. Such factors can also make it difficult to transfer models to the edge for inference. In addition, privacy concerns can arise if data is sent outside of the device and/or combined with data from other users/devices. This can be a particular obstacle to sharing data across organizations. Finally, centralized ML models are inherently trained with limited data compared to the vast amount of real-world data in different locations that distributed devices can potentially train with.

Federated learning solves many of these challenges and provides numerous advantages. First, it takes advantage of the fact that there is a vast amount of data everywhere, and there are now billions of devices out there with mobile processors powerful enough to train with it. This allows data to be captured and used for training right at the edge across many locations, effectively training on quantities of data that far exceed those available with centralized ML. The processing power required to train on such a large amount of data is also distributed across many devices. This, in turn, can potentially reduce or simplify the required cloud-processing hardware infrastructure, since the server is now only responsible for combining and integrating model deltas. Furthermore, ML models can be trained without reliance on the compute resources owned by giant AI companies.

From a security standpoint, federated learning can be seen as more secure because the only data leaving the device is the training deltas. It’s also been suggested that federated learning can potentially overcome legal restrictions that prevent data from being transferred to another country, since transferring model deltas rather than raw user data could streamline compliance. Cross-silo federated learning can also facilitate vertical federated learning, in which organizations (even competitors) that hold different attributes about the same users or entities form alliances and benefit from collaborative model training.

Of course, realizing these benefits requires a few implementation considerations. Developers must build communication protocols to send training deltas to the cloud and coordinate transfers of updated global models to devices. Developers must also plan for varying levels of device participation, which can be affected by connection quality, device battery life, and other wireless communication intricacies. Thankfully, there are resources to help with this.
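
As an illustration of planning for uneven participation, the sketch below shows one way a server might choose devices for a round. Everything here is an assumption made for the example: the `DeviceStatus` fields, the eligibility policy (unmetered network, and either charging or at least 50% battery), and the rule of skipping a round when too few devices qualify are hypothetical and not taken from any particular framework.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeviceStatus:
    device_id: str
    battery_percent: int
    is_charging: bool
    on_unmetered_network: bool


def is_eligible(d: DeviceStatus, min_battery: int = 50) -> bool:
    """Hypothetical policy: train only when it will not hurt the user."""
    has_power = d.is_charging or d.battery_percent >= min_battery
    return d.on_unmetered_network and has_power


def select_participants(devices: List[DeviceStatus], target: int,
                        min_required: int,
                        rng: random.Random) -> Optional[List[DeviceStatus]]:
    """Sample up to `target` eligible devices; skip the round if too few."""
    eligible = [d for d in devices if is_eligible(d)]
    if len(eligible) < min_required:
        return None  # not enough devices to run a meaningful round
    return rng.sample(eligible, min(target, len(eligible)))


fleet = [
    DeviceStatus("a", 90, False, True),
    DeviceStatus("b", 20, True, True),    # low battery but charging
    DeviceStatus("c", 80, False, False),  # on a metered connection
    DeviceStatus("d", 30, False, True),   # low battery, not charging
]
chosen = select_participants(fleet, target=2, min_required=2,
                             rng=random.Random(0))
print([d.device_id for d in chosen] if chosen else "round skipped")
```

Random sampling from the eligible pool, rather than always picking the same devices, helps keep any single device from dominating the global model. A real deployment would also need timeouts for devices that drop out partway through a round.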

Resources for Federated Learning at the Edge

Implementing federated learning requires a strong development framework and edge devices with powerful processors. Developers should start by checking out TensorFlow Federated (TFF), which provides an API for federated training, distributed communications, and other federated learning tasks.

On the device side, we offer a number of powerful SoCs like our flagship Snapdragon® 888 5G Mobile Platform and Snapdragon 865+ 5G Mobile Platform. Equipped with 5G connectivity and our Qualcomm® Hexagon™ digital signal processor (DSP), these and other offerings in our lineup of devices based on our Snapdragon mobile platform are poised to power both training and inference at the edge.

The Hexagon DSP, when utilized with our Hexagon DSP SDK, provides a valuable foundation for high-performance training at the edge. When used in conjunction with Snapdragon optimization tools such as our Snapdragon Profiler and our Snapdragon Power Optimization SDK, developers can also analyze performance issues while improving power efficiency. Developers can also take advantage of the Qualcomm® Neural Processing SDK for artificial intelligence (AI) to convert models into Snapdragon’s deep learning container (DLC) format for optimal inference on Snapdragon SoCs.


Federated learning effectively takes the model to the data by allowing for training at the edge. This provides numerous benefits including taking advantage of distributed mobile processing power to train on potentially vast amounts of data available at different locations. And federated learning can potentially overcome data privacy and legal challenges if implemented correctly.

For additional information about federated learning, be sure to check out the following resources:

Snapdragon, Qualcomm Hexagon, and Qualcomm Neural Processing SDK are products of Qualcomm Technologies, Inc. and/or its subsidiaries.