Hi, I used AIMET to train a 16-bit model (16-bit activations + 8-bit weights) with per-axis quantization. When I try to run the model on the DSP, the results I get are bad. I create a raw file that contains the image in float32 format, and the output is very poor. When I run the exact same model on the CPU, I get good results.
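Since CPU-vs-DSP mismatches often trace back to input formatting, it may help to double-check the raw file itself. As a hedged sketch (the file names, 1x224x224x3 shape, and normalization are assumptions, not from this thread), an SNPE raw input is just the flattened little-endian float32 bytes of the tensor, with no header:

```shell
# Hypothetical sketch: write a float32 NHWC tensor as an SNPE-style raw file.
# The shape and values are stand-ins; they must match whatever the model
# was trained and quantized with.
python3 - <<'EOF'
import numpy as np
img = np.random.rand(1, 224, 224, 3).astype(np.float32)  # stand-in for a real preprocessed image
img.tofile("input.raw")  # raw little-endian float32 bytes, no header
EOF
# Sanity check: 1*224*224*3 floats * 4 bytes = 602112 bytes
wc -c input.raw
```

If the byte count does not equal element-count times 4, the DSP will read garbage even though the same preprocessing pipeline may happen to work on CPU.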
So my question is: how should I run a 16-bit model on the DSP?
I'm using a Snapdragon 888 with SNPE 2.5.*.
The command I used for generating the DLC:
snpe-onnx-to-dlc --input_network fld_20230417_183923_quant.onnx --output fld_20230417_183923_quant.dlc --quantization_overrides fld_20230417_183923_quant.encodings
Quantize the model:
snpe-onnx-to-dlc --input_network fld_20230417_183923_quant.onnx --output fld_20230417_183923_quant.dlc --quantization_overrides fld_20230417_183923_quant.encodings
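Note that the command pasted under "Quantize the model" repeats the conversion command, which looks like a paste error. As a hedged sketch only (tool and flag names taken from SNPE's documented snpe-dlc-quantize options; verify them against your SNPE 2.5 install, and note the poster's actual input list is not shown in this thread), a W8A16 quantization step typically looks like:

```shell
# Hypothetical sketch, not the poster's actual command.
# input_list.txt (assumed name) lists the calibration raw files.
snpe-dlc-quantize \
  --input_dlc fld_20230417_183923_quant.dlc \
  --input_list input_list.txt \
  --output_dlc fld_20230417_183923_quant_w8a16.dlc \
  --act_bitwidth 16 \
  --weights_bitwidth 8 \
  --bias_bitwidth 32 \
  --override_params
```

`--override_params` tells the quantizer to keep the AIMET-produced encodings rather than recomputing them; without the explicit bitwidth flags the tool defaults to 8-bit activations, which would silently discard the 16-bit training.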
And for running the model:
Thanks.
Dear developer,
Good to see that AIMET has been integrated into your project.
You can raise this issue on the AIMET GitHub repo for more help.
GitHub - quic/aimet: AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
BR.
Wei
Hi SNPE team,
This is Abhi from the AIMET team. I reviewed the script that @almogdavid used for applying AIMET AutoQuant and then QAT. His approach seems correct.
I am guessing that there are missing arguments when invoking SNPE:
- Should we not specify 8 bits for weights and 16 bits for activations?
- Is there a flag to specify 32-bit bias? We should set that.
- Any others?
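On the question of additional flags: as a hedged sketch (flag names from SNPE's documented snpe-net-run options; verify against SNPE 2.5, and the file names are assumptions), the run step on the DSP also has to request 16-bit fixed-point user buffers, otherwise the runtime may fall back to a different I/O bitwidth than the one the model was quantized for:

```shell
# Hypothetical sketch: run the quantized DLC on the DSP with 16-bit
# fixed-point user buffers. File names are assumptions.
snpe-net-run \
  --container fld_20230417_183923_quant.dlc \
  --input_list input_list.txt \
  --use_dsp \
  --userbuffer_tfN 16
```

Since the original post feeds float32 raw files, it is also worth checking how the chosen run flags expect the input to be encoded (float vs. quantized fixed-point) for the 16-bit path.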