Can't understand why there's a difference between the reported total inference time and the sum of per-layer computation times.
Join Date: 3 Feb 22
Posts: 3
Posted: Fri, 2023-12-08 03:28
I have created an ONNX model with 7 layers: 2 convolution layers, 2 sigmoid layers, 2 max-pool layers, and a softmax layer. I converted it with qnn-onnx-converter, which produced the .cpp and .bin files, and then used qnn-model-lib-generator to build a shared library (lib*.so).
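For reference, the conversion steps were roughly the following (flag names written from memory, so please check them against the SDK documentation for your version):
qnn-onnx-converter --input_network model.onnx --output_path model.cpp
qnn-model-lib-generator -c model.cpp -b model.bin -t aarch64-ubuntu-gcc7.5 -o model_libs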
After generating the model shared library, I used the qnn_bench.py script from the QNN SDK to run detailed layer-wise profiling on a QRB5165 target. The command used for this profiling was:
python3 qnn_bench.py -c model.json -t aarch64-ubuntu-gcc7.5 -l detailed -v ADA7D246
Here are the per-layer computation times reported in the CSV file:
Total Inference Time [_Conv_0] 11018 us
Total Inference Time [_Sigmoid_1] 1328 us
Total Inference Time [_MaxPool_2] 183 us
Total Inference Time [_Conv_3] 6721 us
Total Inference Time [_Sigmoid_4] 197 us
Total Inference Time [_MaxPool_5] 53 us
Total Inference Time [__12_nchw] 157 us
Total Inference Time [_Gemm_7] 1704 us
Total Inference Time [_Softmax_8] 4 us
Total Inference Time [NetRun] 21401 usec
The sum of the individual per-layer timings, 11018 + 1328 + 183 + 6721 + 197 + 53 + 157 + 1704 + 4 = 21365 us, does not match the reported total inference time of 21401 us: there is a discrepancy of 36 us.
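To rule out an arithmetic mistake on my side, here is a minimal Python sketch of the check (per-layer values copied by hand from the CSV above):

# Per-layer times in microseconds, copied from the detailed-profiling CSV.
layer_times_us = {
    "_Conv_0": 11018,
    "_Sigmoid_1": 1328,
    "_MaxPool_2": 183,
    "_Conv_3": 6721,
    "_Sigmoid_4": 197,
    "_MaxPool_5": 53,
    "__12_nchw": 157,
    "_Gemm_7": 1704,
    "_Softmax_8": 4,
}
total_reported_us = 21401  # NetRun total from the same CSV

layer_sum_us = sum(layer_times_us.values())
print("sum of layers :", layer_sum_us, "us")                       # 21365 us
print("reported total:", total_reported_us, "us")                  # 21401 us
print("discrepancy   :", total_reported_us - layer_sum_us, "us")   # 36 us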
I would appreciate your assistance in understanding the cause of this difference.
Best,
Lavanya varikuppala