Results for: ai inferencing

How to Quadruple LLM Decoding Performance with Speculative Decoding (SpD) and Microscaling (MX) Formats on Qualcomm® Cloud AI 100

Co-written with Apoorva Gokhale. Figure 1: Generation on Qualcomm Cloud AI 100 Ultra: (1) baseline with FP16 weights, (2) acceleration with MX6, (3) acceleration with MX6 and SpD. Speculative...
https://developer.qualcomm.com/blog/how-quadruple-llm-decoding-performance-speculative-decoding-spd-and-microscaling-mx-formats
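The MX6 + SpD speedup in this first entry comes from speculative decoding: a small draft model proposes a few tokens cheaply, and the large target model verifies all of them in a single forward pass. Below is a minimal Python sketch of the greedy accept/verify loop; draft_next and target_argmax are hypothetical stand-ins for the two models, and this illustrates the general technique, not Qualcomm's implementation.

```python
# Illustrative sketch of greedy speculative decoding (not the Cloud AI 100 SDK API).
# draft_next(ids) -> next token from a small draft model (hypothetical stub).
# target_argmax(ids) -> the large model's greedy next-token prediction for each
#                       of the last k+1 prefixes of ids (hypothetical stub).

def speculative_decode(prompt_ids, draft_next, target_argmax, k=4, max_new=64):
    out = list(prompt_ids)
    while len(out) - len(prompt_ids) < max_new:
        # 1. Draft model proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target model scores all proposed positions in ONE forward pass,
        #    yielding k+1 greedy tokens (one per verified prefix).
        verified = target_argmax(out + proposal)
        # 3. Accept the longest prefix where draft and target agree.
        n_accept = 0
        for i in range(k):
            if proposal[i] != verified[i]:
                break
            n_accept += 1
        out += proposal[:n_accept]
        # 4. The target's own prediction at the first mismatch is always kept,
        #    so every verification step emits at least one token.
        out.append(verified[n_accept])
    return out
```

When the draft agrees often, each target forward pass emits several tokens instead of one, which is what lifts decode throughput on top of the MX6 memory savings.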

Train anywhere, Infer on Qualcomm Cloud AI 100

Co-written with Nitin Jain. In this blog post we walk through the journey of taking a model from any framework, trained on any GPU or AI accelerator, and deploying it on the DL2q instance that hosts...
https://developer.qualcomm.com/blog/train-anywhere-infer-qualcomm-cloud-ai-100
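"Train anywhere" flows like the one above typically start from a framework-neutral export. A sketch of that first hand-off using PyTorch's ONNX exporter follows; the model choice is an arbitrary placeholder, and the subsequent compile-and-run steps belong to the Qualcomm Cloud AI SDK and are not shown here.

```python
import torch
import torchvision

# Hypothetical starting point: any trained PyTorch model (ResNet-50 as a stand-in).
model = torchvision.models.resnet50(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)

# Export to ONNX, the framework-neutral format an accelerator toolchain can
# ingest. Compiling and running on Cloud AI 100 is handled by the Qualcomm
# Cloud AI SDK and is intentionally omitted from this sketch.
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},
)
```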

Qualcomm Cloud AI 100 Accelerates Large Language Model Inference by ~2x Using Microscaling (MX) Formats

MXFP, defined by the Microscaling Formats (MX) Alliance, is enabled on the DL2q instance of AWS EC2 and evaluated on several large language models. Large Language Model Challenges: When performing...
https://developer.qualcomm.com/blog/qualcomm-cloud-ai-100-accelerates-large-language-model-inference-2x-using-microscaling-mx
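The ~2x claim above rests on the core idea of MX formats: a block of values shares one power-of-two scale while each element keeps only a few bits. A toy NumPy quantizer showing that shared-scale idea follows; the 32-element block and 6-bit elements are illustrative assumptions, not the exact MX6 bit layout.

```python
import numpy as np

def mx_quantize(x, elem_bits=6, block=32):
    """Toy block quantizer illustrating the shared-scale idea behind MX
    formats: each block of `block` values shares one power-of-two scale,
    and each element is stored in `elem_bits` signed bits. Bit widths and
    block size are illustrative, not the exact MX6 layout."""
    x = np.asarray(x, dtype=np.float32)
    assert x.size % block == 0, "pad input to a multiple of the block size"
    x = x.reshape(-1, block)
    steps = 2 ** (elem_bits - 1) - 1                 # e.g. 31 levels for 6 bits
    maxabs = np.abs(x).max(axis=1, keepdims=True) + 1e-30
    scale = 2.0 ** np.ceil(np.log2(maxabs / steps))  # shared power-of-two scale
    q = np.clip(np.round(x / scale), -steps, steps)  # each q fits in elem_bits
    return q * scale                                 # dequantize to inspect error
```

Because LLM decoding is memory-bandwidth bound, cutting weight storage from 16 bits to roughly 6 bits per element cuts DRAM traffic accordingly, which is where the decode speedup comes from.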

Power-efficient acceleration for large language models – Qualcomm Cloud AI SDK

Want to accelerate your large language model (LLM) inference workloads without blowing your power budget? Or your cooling budget? The Qualcomm Cloud AI 100 performs AI inference on the edge cloud...
https://developer.qualcomm.com/blog/power-efficient-acceleration-large-language-models-qualcomm-cloud-ai-sdk