Dear Qualcomm Team,
Architecture: SM7450
QNN version: 2.12.0.230626
I have a simple Conv2D TFLite model whose architecture is Input -> Conv2D -> Output. The input and output have the same shape, (299, 299, 3). The Conv2D layer has no bias, and I set all of the kernel weights to 10.
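For reference, I build the model roughly like this (a minimal sketch in TensorFlow/Keras; the exact script and file names are illustrative):

import tensorflow as tf

# Input -> Conv2D -> Output with 'same' padding so input and output
# shapes match; 3 filters, no bias, every kernel weight set to 10.
inp = tf.keras.Input(shape=(299, 299, 3))
out = tf.keras.layers.Conv2D(
    filters=3, kernel_size=3, padding="same", use_bias=False,
    kernel_initializer=tf.keras.initializers.Constant(10.0))(inp)
model = tf.keras.Model(inp, out)

# Export to TFLite.
with open("test.tflite", "wb") as f:
    f.write(tf.lite.TFLiteConverter.from_keras_model(model).convert())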
To check the precision, I created dummy data: a 299x299x3 tensor filled with 1.0.
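The dummy data is written out as a raw float32 file, roughly (the file name is illustrative):

import numpy as np

# 1x299x299x3 tensor of ones, saved as raw float32 bytes.
np.ones((1, 299, 299, 3), dtype=np.float32).tofile("input.raw")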
Command to generate FP32 model:
${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-tflite-converter --input_network test.tflite --input_dim "serving_default_input_1:0" 1,299,299,3 --output_path test.cpp
Command to generate FP16 model:
${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-tflite-converter --input_network test.tflite --input_dim "serving_default_input_1:0" 1,299,299,3 --float_bw 16 --output_path test.cpp
Then I generated .so files for the FP16 and FP32 models respectively.
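For completeness, the library generation step is roughly (output directory is illustrative):
${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-model-lib-generator -c test.cpp -b test.bin -t aarch64-android -o model_libs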
I pushed the .so files and dummy data to the SM7450 phone.
Command to run inference on the phone:
./qnn-net-run --backend <libQnnGpu.so or libQnnHtp.so> --model <fp32.so or fp16.so> --input_list target_raw_list.txt
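Here target_raw_list.txt lists the raw input for each inference pass; if I understand the input-list format correctly, a single line such as the following should work:
serving_default_input_1:0:=input.raw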
With the FP32 model on both GPU and HTP, I get a result of 270, which is correct: each output point of the convolution is kernel value * kernel height * kernel width * input channels * input value = 10 * 3 * 3 * 3 * 1.0 = 270.
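As a quick sanity check of that number:

import numpy as np

# One output point: a 3x3x3 all-tens kernel applied to a 3x3x3 patch of ones.
print(np.sum(np.full((3, 3, 3), 10.0) * np.ones((3, 3, 3))))  # 270.0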
However, the FP16 model's result is close to zero, around 1e-27.
May I know whether the above commands are the correct way to generate the FP16 model and run inference? How can I solve this precision problem?
Thanks.
Dear Chan, dear Qualcomm Team,
Just wanted to highlight again that we are running into the same problem when using Float16 on the HTP; see my post from a few days ago.
Any help here would be highly appreciated!
Best regards,
Manuel
Hi Manuel,
Are you using the same commands to convert the model to FP16 precision and run inference?
Regards,
Yi Xuan.
Hi Yi Xuan,
Yes, I use the same kind of commands. My model is named differently, of course, but otherwise everything is the same.