Hello,
I ran all three runtimes, CPU (float32, not quantized), GPU (float32, not quantized), and DSP (8-bit quantized), with the Inception-v1 and Inception-v3 graphs on a Galaxy S8. For the Inception-v3 graph, the results meet expectations: DSP has 1.5x higher performance and 2.72x higher inferences/s/W than GPU.
However, for Inception-v1, GPU has 1.7x higher performance and 1.46x higher inferences/s/W than DSP. Why is that the case? Is it because the GPU has more internal memory and can better keep the data on-chip for the relatively smaller Inception-v1 graph?
The result table is provided below. The measured power is the load power.
| S8 (performance mode) | Measured unit | CPU (32f) | DSP (8-bit) | GPU (32f) |
|---|---|---|---|---|
| Inception-v1 | Img/sec | 6.22 | 21.94 | 37.30 |
| Inception-v1 | Img/sec/W | 9.80 | 38.48 | 56.28 |
| Inception-v3 | Img/sec | 1.33 | 13.51 | 9.04 |
| Inception-v3 | Img/sec/W | 1.11 | 24.24 | 8.92 |
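The speedup and efficiency ratios quoted above can be reproduced directly from the table; a minimal sketch (the dictionary layout and `ratio` helper are my own, only the numbers come from the table):

```python
# Raw numbers from the results table (Galaxy S8, performance mode).
results = {
    "inception_v1": {"img_per_sec":   {"cpu": 6.22, "dsp": 21.94, "gpu": 37.30},
                     "img_per_sec_w": {"cpu": 9.80, "dsp": 38.48, "gpu": 56.28}},
    "inception_v3": {"img_per_sec":   {"cpu": 1.33, "dsp": 13.51, "gpu": 9.04},
                     "img_per_sec_w": {"cpu": 1.11, "dsp": 24.24, "gpu": 8.92}},
}

def ratio(model, metric, a, b):
    """Return how many times runtime a outperforms runtime b on a metric."""
    m = results[model][metric]
    return round(m[a] / m[b], 2)

# Inception-v3: DSP ahead of GPU.
print(ratio("inception_v3", "img_per_sec", "dsp", "gpu"))    # 1.49 (~1.5x)
print(ratio("inception_v3", "img_per_sec_w", "dsp", "gpu"))  # 2.72
# Inception-v1: GPU ahead of DSP.
print(ratio("inception_v1", "img_per_sec", "gpu", "dsp"))    # 1.7
print(ratio("inception_v1", "img_per_sec_w", "gpu", "dsp"))  # 1.46
```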
Here are similar results.
I've tested AlexNet and Inception V3 on Snapdragon 820 and 835 with the model files in SNPE SDK 1.2.2.
Surprisingly, the average execution times revealed unexpected results.
So, I would like to add some questions to this post:
Hi,
Kindly find the responses to your queries below.
1. Why does execution time on the DSP differ across network models?
A. The statement "the higher processing power of the DSP will make my model perform better than on the GPU" is not always true.
We worked with a Face Expression Recognition (FER) model built with Keras and converted to a DLC file.
Comparing total inference time on GPU and DSP leads to the conclusion that DSP performance is about 60% of the GPU's. Before drawing that conclusion, we also accounted for the time consumed by RPC Execute (which acts as a communicator between the CPU/GPU and the DSP), SNPE Accelerator, and Accelerator. Taking these parameters into account, the GPU performs better than the DSP for a single prediction or a small number of predictions. The DSP is the better runtime choice only when making a large number of predictions with the FER model.
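The break-even reasoning above can be sketched with a toy cost model: the DSP pays a fixed RPC/setup overhead but has a lower per-inference time, so it only wins once that overhead is amortized over enough predictions. All millisecond figures here are illustrative assumptions, not SNPE measurements:

```python
# Illustrative cost model: DSP pays a fixed dispatch overhead (RPC Execute,
# accelerator setup), GPU does not, but the DSP's per-inference time is lower.
GPU_MS_PER_INFERENCE = 10.0   # assumed
DSP_MS_PER_INFERENCE = 6.0    # assumed: faster raw compute on the DSP
DSP_FIXED_OVERHEAD_MS = 50.0  # assumed one-time RPC/setup cost

def gpu_total_ms(n):
    return GPU_MS_PER_INFERENCE * n

def dsp_total_ms(n):
    return DSP_FIXED_OVERHEAD_MS + DSP_MS_PER_INFERENCE * n

# Break-even at 10n = 50 + 6n, i.e. n = 12.5 inferences with these numbers:
# below it the GPU finishes first, above it the DSP does.
for n in (1, 5, 13, 50):
    winner = "GPU" if gpu_total_ms(n) < dsp_total_ms(n) else "DSP"
    print(f"n={n}: {winner} wins")
```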
2. Why is 820 DSP faster than 835 DSP?
A. The post below answers your question with a detailed explanation from the Qualcomm team on the problem you are facing:
https://developer.qualcomm.com/forum/qdn-forums/software/snapdragon-neural-processing-engine-sdk/34559
3. Does Qualcomm have any guide document for designing a neural network model for DSP?
A. No. The Snapdragon Neural Processing Engine does not require different model designs for different runtimes.
4. Can I get python source files for the Inception V3 used to generate the frozen graph in SDK?
A. No. The intermediate files generated while creating a DLC file are not made available.