Hi all.
I trained a model using torch and converted it to onnx and from there to DLC.
I quantized the DLC using snpe-dlc-quantize tool with 16 bit activations and 8 bit weights.
Afterwards I used snpe-net-run to run the model. With the --use_dsp flag the model gives bad (seemingly random) results, while removing the flag yields normal results. I'm not using any flags other than the input/output parameters and --use_dsp.
How can we run a model with 16 bit activations and 8 bit weights on DSP correctly?
I'm using SNPE v2.5.0.4052.
Dear developer,
You can take a look at the commands below, which convert to 16 bits.
BR.
Wei
Thanks for your comment. I still get large differences when running the exact same model on DSP vs. CPU.
The command you shared makes no difference; I still get large differences between the 2 runs.
Is there some memory limitation on the DSP? Is there a way I can debug it?
Thanks,
David.
To make it more specific and to reproduce this issue fully:
I took a model from: https://google.github.io/mediapipe/solutions/face_mesh.html
Specifically, this model: https://storage.googleapis.com/mediapipe-assets/face_landmark.tflite
To convert the tflite model I used the following command:
snpe-tflite-to-dlc --input_network face_landmark.tflite --output_path face_landmark.dlc --input_dim input_1 "1,192,192,3" --out_node conv2d_21
(BTW, the output shape is not the same as in the original model, but let's leave that aside; it does not affect the problem I'm describing.)
Next, I quantized the model using the following command:
snpe-dlc-quantize --input_dlc face_landmark.dlc --output_dlc face_landmark_quant.dlc --weights_bitwidth=8 --act_bitwidth=16 --input_list /home/almog/workspace/docker_share/sample_data_for_fld_raw/image_list_short_full_path.txt --use_enhanced_quantizer --enable_htp --override_params
And ran it on the device using the following 2 commands:
/data/local/tmp/snpeexample/snpe-net-run --container $qat_model_loc --input_list /data/local/tmp/sample_data_for_fld_raw/image_list_short_full_path.txt --output_dir /data/local/tmp/precision_test/qat_dsp --use_dsp
/data/local/tmp/snpeexample/snpe-net-run --container $qat_model_loc --input_list /data/local/tmp/sample_data_for_fld_raw/image_list_short_full_path.txt --output_dir /data/local/tmp/precision_test/qat_dsp --use_dsp
Then I compared the outputs of the 2 runs, which are significantly different!
What is the cause of this? Am I doing something wrong?
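For anyone reproducing this, here is a minimal sketch of how the outputs of two runs could be compared numerically instead of by eye. It assumes each output tensor was written as a flat float32 .raw file in the host's byte order; the function names and the example paths are hypothetical and not part of SNPE. It uses only the Python standard library.

```python
import math
from array import array


def read_raw_float32(path):
    """Read a flat float32 tensor from a .raw file (assumption: native byte order)."""
    values = array("f")
    with open(path, "rb") as f:
        # frombytes raises ValueError if the size is not a multiple of 4 bytes.
        values.frombytes(f.read())
    return values


def compare_outputs(path_a, path_b):
    """Return (max absolute difference, cosine similarity) of two raw tensors."""
    a = read_raw_float32(path_a)
    b = read_raw_float32(path_b)
    if len(a) == 0 or len(a) != len(b):
        raise ValueError(f"size mismatch or empty tensor: {len(a)} vs {len(b)}")

    max_abs = max(abs(x - y) for x, y in zip(a, b))

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Cosine similarity is undefined when either tensor is all zeros.
    if norm_a == 0.0 or norm_b == 0.0:
        cosine = float("nan")
    else:
        cosine = dot / (norm_a * norm_b)
    return max_abs, cosine


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever the two runs were pulled to.
    max_abs, cosine = compare_outputs(
        "cpu_out/Result_0/output.raw",
        "dsp_out/Result_0/output.raw",
    )
    print(f"max abs diff = {max_abs:.6f}, cosine similarity = {cosine:.6f}")
```

A cosine similarity close to 1 with a small maximum difference would point to ordinary quantization noise, while a value near 0 would match the "random" results described above. Note that the two runs need separate output directories, otherwise the second run overwrites the first.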