How to Quadruple LLM Decoding Performance with Speculative Decoding (SpD) and Microscaling (MX) Formats on Qualcomm® Cloud AI 100
Co-written with Apoorva Gokhale
Figure 1: Generation on Qualcomm Cloud AI 100 Ultra (1) Baseline with FP16 weights (2) Acceleration with MX6 (3) Acceleration with MX6 and SpD
Speculative...
https://developer.qualcomm.com/blog/how-quadruple-llm-decoding-performance-speculative-decoding-spd-and-microscaling-mx-formats
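The entry above pairs MX formats with speculative decoding (SpD), where a cheap draft model proposes several tokens and the large target model verifies them in one pass. A minimal sketch of the accept/reject loop, using hypothetical `draft_probs`/`target_probs` toy distributions in place of real models:

```python
import random

# Toy vocabulary; draft_probs and target_probs are stand-ins for a
# small draft model and a large target model (assumptions, not the
# models used in the blog post).
VOCAB = [0, 1, 2, 3]

def draft_probs(context):
    # Hypothetical cheap draft model: near-uniform distribution.
    return [0.25, 0.25, 0.25, 0.25]

def target_probs(context):
    # Hypothetical large target model: strongly prefers token 0.
    return [0.7, 0.1, 0.1, 0.1]

def speculate_step(context, gamma=4, rng=random):
    """Draft gamma tokens, then verify each against the target model.

    A drafted token t is accepted with probability
    min(1, p_target(t) / p_draft(t)); on the first rejection we
    resample from the residual max(0, p_target - p_draft)."""
    drafted = []
    for _ in range(gamma):
        p = draft_probs(context + drafted)
        drafted.append(rng.choices(VOCAB, weights=p)[0])

    accepted = []
    for t in drafted:
        p_d = draft_probs(context + accepted)
        p_t = target_probs(context + accepted)
        if rng.random() < min(1.0, p_t[t] / p_d[t]):
            accepted.append(t)  # token verified, keep it and continue
        else:
            # First rejection: resample from the residual distribution,
            # then stop; remaining drafted tokens are discarded.
            residual = [max(0.0, pt - pd) for pt, pd in zip(p_t, p_d)]
            weights = residual if sum(residual) > 0 else p_t
            accepted.append(rng.choices(VOCAB, weights=weights)[0])
            break
    return accepted

random.seed(0)
out = speculate_step([], gamma=4)
```

Because verification batches several positions through the target model at once, each accepted draft token saves one full target-model decode step, which is where the speedup comes from.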
Train anywhere, Infer on Qualcomm Cloud AI 100
Co-written with Nitin Jain.
In this blog post, we go through the journey of taking a model from any framework, trained on any GPU or AI accelerator, and deploying it on the DL2q instance that hosts...
https://developer.qualcomm.com/blog/train-anywhere-infer-qualcomm-cloud-ai-100
Qualcomm Cloud AI 100 Accelerates Large Language Model Inference by ~2x Using Microscaling (Mx) Formats
MxFP, defined by the Microscaling Formats (Mx) Alliance, is enabled on the DL2q instance of AWS EC2 and is evaluated on several large language models.
Large Language Model Challenges
When performing...
https://developer.qualcomm.com/blog/qualcomm-cloud-ai-100-accelerates-large-language-model-inference-2x-using-microscaling-mx
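The core idea behind microscaling formats is that a block of elements shares one power-of-two scale while each element is stored at very low precision. A simplified sketch of that shared-scale quantization (signed integers with a shared exponent, for illustration; not the exact MX6/MxFP bit layout):

```python
import math

def quantize_block_shared_scale(block, elem_bits=4):
    """Simplified microscaling sketch: one power-of-two scale is shared
    by the whole block, and each element is rounded to a small signed
    integer (elem_bits wide). Illustrative only, not the MX6 layout."""
    qmax = 2 ** (elem_bits - 1) - 1            # e.g. 7 for 4-bit signed
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0] * len(block)
    # Smallest power-of-two scale so that amax / scale fits in qmax.
    exp = math.ceil(math.log2(amax / qmax))
    scale = 2.0 ** exp
    q = [max(-qmax - 1, min(qmax, round(x / scale))) for x in block]
    return exp, q

def dequantize_block(exp, q):
    scale = 2.0 ** exp
    return [v * scale for v in q]

block = [0.1, -0.5, 0.25, 0.9]
exp, q = quantize_block_shared_scale(block)
restored = dequantize_block(exp, q)
```

Storing one shared exponent per block instead of a full-precision scale per element is what cuts the memory footprint, and lower memory traffic per weight is the main lever behind the reported inference speedup.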
Power-efficient acceleration for large language models – Qualcomm Cloud AI SDK
Want to accelerate your large language model (LLM) inference workloads without blowing your power budget? Or your cooling budget?
The Qualcomm Cloud AI 100 performs AI inference on the edge cloud...
https://developer.qualcomm.com/blog/power-efficient-acceleration-large-language-models-qualcomm-cloud-ai-sdk