Hi,
We are trying to run inference on SM8450's HTP processor using mixed precision.
By mixed precision, we mean that a few layers have to run in int16/int8 and a few layers in fp16 (since the SM8450 HTP supports fp16 inference).
We wanted to know whether this scenario is supported on HTP. If not on HTP, does any other delegate on SM8450, such as GPU or CPU, support this kind of mixed-precision inference?
If the above scenario is not supported, we are also considering creating a User-Defined Operation (UDO) for the few operations we want to run in fp32, while the rest of the operations in the network run in int16 using the default implementation.
Any thoughts/opinions on this idea?
SNPE version: 1.68
Hexagon SDK version: 4.0.0
Regards,
Pratesh
Hi,
The method for processing floating-point inputs and outputs on the HTP target has changed. For the best possible performance, specify the --use_float_io parameter to the quantizer for offline preparation, or the --buffer_data_type argument to the runtime.
For the CPU and GPU runtimes, a quantized model will be dequantized by the runtime, which increases network initialization time and may impact accuracy. A non-quantized model is already in the native format for the CPU and GPU runtimes, so it can be passed directly to the runtime and may be more accurate than the quantized model. Non-quantized models use floating-point representations of the network parameters.
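As a rough sketch of the offline-preparation path (the model and input-list file names are placeholders, and exact flags may differ in SNPE 1.68, so please check the tool's --help output):

```shell
# Offline preparation: quantize the model for HTP while keeping
# floating-point inputs/outputs (placeholder file names).
snpe-dlc-quantize --input_dlc model.dlc \
                  --input_list input_list.txt \
                  --output_dlc model_quantized.dlc \
                  --use_float_io
# Alternatively, pass the --buffer_data_type argument to the runtime
# instead of preparing offline, as noted above.
```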
Thanks.
Thanks for the response, but this doesn't exactly answer the query above.
Anyhow, I recently came across the quantization overrides parameter in the snpe-tensorflow-to-dlc tool. Does SNPE support mixed precision through this overrides option? Can I override the default quantization precision and make particular layers run in fp16?
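For reference, my understanding is that the overrides file passed via --quantization_overrides is a JSON encodings file; a minimal sketch might look like the following (the tensor name is hypothetical, and the exact schema should be checked against the SNPE documentation for your version):

```json
{
  "activation_encodings": {
    "conv2d_output:0": [
      {
        "bitwidth": 16,
        "dtype": "float"
      }
    ]
  },
  "param_encodings": {}
}
```

The intent here would be to mark one layer's output tensor as fp16 while leaving the rest of the network to the default quantization, but please confirm whether SNPE 1.68 honors such per-tensor float overrides on HTP.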
BR.