Accelerating PyTorch Models on Mobile Devices with the PyTorch Edge Delegate

Tuesday 10/17/23 11:52pm
Posted By Felix Baum

Snapdragon and Qualcomm branded products are products of
Qualcomm Technologies, Inc. and/or its subsidiaries.

At Qualcomm, we’re excited to announce our collaboration with the Meta PyTorch Edge team and the availability of ExecuTorch, a new on-device AI stack that allows PyTorch models to run in an accelerated, power-efficient manner on Qualcomm Snapdragon mobile platforms. The new delegate allows developers to offload inference of machine learning models to the Qualcomm Hexagon NPU, our AI engine optimized for inference at the edge. This is another effort from our AI software team to make it easier for developers to access our powerful NPU.

As pioneers driving on-device AI, we want to empower developers with performance, efficiency, and a seamless workflow across our Qualcomm AI stack. Our engineering teams have directly contributed to the ExecuTorch codebase to create the delegate bindings that connect PyTorch to our AI optimizations on Snapdragon.

For the first time, PyTorch developers can accelerate models on mobile neural processing units without needing to convert them into proprietary AI software frameworks. By providing mobile developers with direct access to the Qualcomm AI Engine through the edge delegate, we are streamlining the process of leveraging our NPU hardware acceleration. This simplified workflow unlocks the full potential of the Snapdragon platform for PyTorch mobile development.

The ExecuTorch API offers flexibility that is beneficial from Qualcomm’s perspective: the entire graph does not need to be compiled at once, and support can be added progressively over time. This is especially useful for large and complex graphs. This flexibility has direct benefits for PyTorch users as well. With the consistent ExecuTorch API, users can compile and deploy their models, and under the hood, as much of the graph as possible is delegated to our powerful Hexagon NPU while the rest still executes on general-purpose CPUs. This improves the developer experience by allowing fast iteration and broad model coverage, while still delivering meaningful performance without requiring any changes to user code.
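The partitioning idea above can be sketched with a toy example. This is not the real ExecuTorch or Qualcomm delegate API; the `partition` function, op names, and "supported" set are purely illustrative of how a graph might be split into contiguous segments, with supported ops delegated to the NPU and the remainder falling back to the CPU:

```python
# Toy illustration (not the actual ExecuTorch API): splitting a model's
# ordered operator list into contiguous segments, each tagged with the
# backend that will execute it. Op names below are hypothetical.

def partition(ops, npu_supported):
    """Group ops into contiguous (backend, [ops]) segments."""
    segments = []
    for op in ops:
        backend = "NPU" if op in npu_supported else "CPU"
        if segments and segments[-1][0] == backend:
            segments[-1][1].append(op)  # extend the current segment
        else:
            segments.append((backend, [op]))  # start a new segment
    return segments

# An unsupported custom op transparently falls back to the CPU,
# while the surrounding ops are still delegated to the NPU.
graph = ["conv2d", "relu", "custom_op", "linear", "softmax"]
supported = {"conv2d", "relu", "linear", "softmax"}
print(partition(graph, supported))
# → [('NPU', ['conv2d', 'relu']), ('CPU', ['custom_op']),
#    ('NPU', ['linear', 'softmax'])]
```

The key property this models is that delegation is per-segment rather than all-or-nothing, so a single unsupported op does not prevent the rest of the graph from being accelerated.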

At launch, the Qualcomm delegate for ExecuTorch will go through an initial testing phase with the following models supported:

  • MobileNet v2
  • Inception v4
  • DeepLab v3
  • MobileBERT

The long-term roadmap is to expand coverage across network architectures and use cases. We aim to provide a unified on-device AI workflow – from prototyping in PyTorch to optimized deployment on our intelligent edge platforms.

To get started, check out ExecuTorch at:

We look forward to seeing the solutions the PyTorch developer community creates using this new edge acceleration capability. The potential for on-device intelligence is growing rapidly thanks to optimizations across the AI stack, from model development to deployment. The PyTorch Edge delegate is an addition to our existing portfolio, which already supports TFLite and ONNX. Learn more at: Qualcomm AI Stack | Unified AI Software Portfolio | Qualcomm

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.