Dear Qualcomm team,
we are experiencing large numerical and visual differences when using the QNN SDK to run a fully convolutional model in Float32 (libQnnGpu.so) versus Float16 (also libQnnGpu.so, or libQnnHtp.so).
The models are converted from the same TF Lite checkpoint; the only difference is the "--float_bw 16" versus "--float_bw 32" conversion option.
As a sanity check, we ran the same model in PyTorch with 16-bit and 32-bit precision, and also used the SNPE SDK to run with both precisions on the GPU. All of these tests showed only minor numerical differences, and the outputs looked visually identical.
Can you help us find and resolve this issue? Running on the HTP backend with Float16 precision is necessary for us to continue using Qualcomm at a large scale; otherwise we would have to abandon it.
Please let me know if you can help in any way!
With best regards,
Manuel
Dear developer,
I don't fully understand your question. We assume the FP16 accuracy is worse than FP32. Do you know which ops' accuracy dropped, after a layer-by-layer analysis?
BR.
Wei
Hi Wei,
Thank you for getting back to me!
So we have a very simple model, Conv2D-ReLU-Conv2D-ReLU-Conv2D-ReLU-Conv2D, trained in PyTorch.
If we run it in PyTorch with Float32 and with Float16, the outputs show only minimal differences.
The same holds if we run it using SNPE on a phone GPU with Float32 and with Float16: both outputs show only minimal differences.
BUT, if we run it using QNN on a phone GPU with Float32 and Float16, the outputs suddenly show a very large difference.
Do you have any idea why that is? What makes the difference between SNPE and QNN here?
Hi Manuel,
I have encountered the same problem as you: when I convert my Float32 model to Float16 via the SNPE SDK with "--float_bw 16", I also observe a large difference between the two precisions. May I know how you convert your Float32 model using SNPE?
Also sent you a DM regarding this problem. Looking forward to hearing from you soon!
Hi,
yes, of course. My command to convert for SNPE is
snpe-tflite-to-dlc \
Note that in SNPE you don't specify the float bit-width during conversion, but later by selecting the "GPU" or "GPU16" backend.
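The command above is truncated; for reference, a full invocation might look roughly like the sketch below. The file names, input name, and dimensions are placeholders, and the exact flags may differ between SDK versions, so please check `snpe-tflite-to-dlc --help` rather than relying on this:

```shell
# Hypothetical sketch -- model name, input name, and dimensions are placeholders.
snpe-tflite-to-dlc \
    --input_network model.tflite \
    --input_dim input "1,512,512,3" \
    --output_path model.dlc
```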
Hi Manuel,
Thanks for your prompt reply. As far as I know, doesn't SNPE also support Float16 inference on the DSP? Have you managed to run inference on the SNPE DSP with Float16 precision? Besides, I found that by simply converting the TF Lite model to .dlc or .so files with SNPE and QNN respectively, I was able to get results on the HTP. Have you tried this before?
Hi Xarius,
Thank you for the great exchange!
All our conversion (except for testing) is done from TF Lite to HTP using the QNN toolkit, which means I still run into these errors when starting from TF Lite.
I have not run Float16 on the DSP using SNPE; how would I do that? If I convert a model using the command above, it fails to run. Do I need to specify Float16 during conversion somehow?
Cheers,
Manuel