Forums - Mixed Precision Inference on HTP

Mixed Precision Inference on HTP
rajupalem.r
Join Date: 6 Nov 22
Posts: 2
Posted: Mon, 2023-01-16 21:39

Hi,

We are trying to run inference on SM8450's HTP processor using mixed precision. 

By mixed precision, we mean that some layers have to run in int16/int8 and some layers in fp16 (since the SM8450 HTP supports fp16 inference).

So we wanted to know whether this scenario is supported on HTP. If not on HTP, does any other delegate on the SM8450, such as GPU or CPU, support this mixed-precision inference?

If the above scenario is not supported, we are also thinking of creating a User-Defined Operation (UDO) for the few operations that we want to run in fp32. The rest of the operations in the network can run in int16 using the default implementation.

Any thoughts/opinions on this idea?

SNPE version: 1.68

Hexagon SDK version: 4.0.0

 

Regards,

Pratesh

 

sanjjey.a.sanjjey
Join Date: 17 May 22
Posts: 67
Posted: Tue, 2023-01-17 06:14

Hi,

The method for processing floating-point inputs and outputs on the HTP target has changed. Specify the --use_float_io parameter to the quantizer for offline preparation, or the --buffer_data_type argument to the runtime, to get the best possible performance.
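For example, a quantizer invocation along these lines (the model and input-list paths are placeholders, and the exact option names can vary between SNPE releases, so please check them against your SDK documentation):

    # Offline-prepare the quantized DLC for HTP while keeping float input/output
    snpe-dlc-quantize --input_dlc model.dlc \
                      --input_list input_list.txt \
                      --output_dlc model_quantized.dlc \
                      --enable_htp \
                      --use_float_io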

For the CPU & GPU runtimes, a quantized model will be dequantized by the runtime, which increases network initialization time, and accuracy may be impacted. A non-quantized model is already in the native format for the CPU & GPU runtimes, so it can be passed directly to the runtime and may be more accurate than the quantized model. Non-quantized models use floating-point representations of the network parameters.
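To illustrate the difference, you can try the same network on different runtimes with snpe-net-run (the container and input-list names below are placeholders):

    # Non-quantized (floating-point) model runs natively on CPU/GPU
    snpe-net-run --container model.dlc --input_list input_list.txt --use_gpu

    # Quantized model is intended for the DSP/HTP runtime
    snpe-net-run --container model_quantized.dlc --input_list input_list.txt --use_dsp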

Thanks.

rajupalem.r
Join Date: 6 Nov 22
Posts: 2
Posted: Sun, 2023-02-05 23:20

Thanks for the response, but this doesn't exactly answer the above query.

Anyhow, I recently came across the quantization overrides parameter in the snpe-tensorflow-to-dlc tool. Does SNPE support mixed precision using this overrides option? Can I override the default quantization precision and have some particular layers run in fp16?
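Something like the following is what I have in mind (the tensor names, field names, and option names are just my guesses from the documentation, so please correct me if this is not the intended format):

    # Hypothetical overrides file: mark one layer's output and weights as fp16
    cat > overrides.json <<'EOF'
    {
      "activation_encodings": {
        "conv2d_5/BiasAdd:0": [ { "bitwidth": 16, "dtype": "float" } ]
      },
      "param_encodings": {
        "conv2d_5/weights:0": [ { "bitwidth": 16, "dtype": "float" } ]
      }
    }
    EOF

    # Pass the overrides during conversion
    snpe-tensorflow-to-dlc --input_network frozen_model.pb \
                           --input_dim input "1,224,224,3" \
                           --out_node output \
                           --output_path model.dlc \
                           --quantization_overrides overrides.json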

BR.

