Forums - Benchmarking Inception-v1: GPU has higher performance than DSP (SD832)

atul.rahman
Join Date: 14 Nov 16
Posts: 3
Posted: Tue, 2017-08-08 19:14

Hello,

I ran all three runtimes, CPU (not quantized, float32), GPU (not quantized, float32), and DSP (quantized, 8-bit), with the Inception-v1 and Inception-v3 graphs on a Galaxy S8. In the case of the Inception-v3 graph, the results meet expectations: the DSP delivers 1.5x higher performance and 2.72x higher inferences/s/W than the GPU.

However, in the case of Inception-v1, the GPU delivers 1.7x higher performance and 1.46x higher inferences/s/W than the DSP. Why is that the case? Is it because the GPU has more internal memory and can keep more of the data on-chip for the relatively smaller Inception-v1 graph?

The result table is below. The measured power is the load power.

S8 (performance mode)   Measured unit   CPU (32f)   DSP (8bit)   GPU (32f)
---------------------   -------------   ---------   ----------   ---------
Inception-v1            Img/sec              6.22        21.94       37.30
Inception-v1            Img/sec/W            9.80        38.48       56.28
Inception-v3            Img/sec              1.33        13.51        9.04
Inception-v3            Img/sec/W            1.11        24.24        8.92
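As a sanity check, the speedup ratios quoted above can be recomputed from the table. A small standalone Python sketch (the numbers come from this post's table; nothing here is SNPE-specific):

```python
# Speedup ratios derived from the benchmark table above.
# Values: (CPU, DSP, GPU) for each model/metric pair.
table = {
    ("Inception-v1", "Img/sec"):   (6.22, 21.94, 37.30),
    ("Inception-v1", "Img/sec/W"): (9.80, 38.48, 56.28),
    ("Inception-v3", "Img/sec"):   (1.33, 13.51, 9.04),
    ("Inception-v3", "Img/sec/W"): (1.11, 24.24, 8.92),
}

for (model, metric), (cpu, dsp, gpu) in table.items():
    faster = "GPU" if gpu > dsp else "DSP"
    ratio = max(gpu, dsp) / min(gpu, dsp)
    print(f"{model} {metric}: {faster} leads by {ratio:.2f}x")
# First line prints: Inception-v1 Img/sec: GPU leads by 1.70x
```

This reproduces the ratios cited in the post: GPU leads by 1.70x (Img/sec) and 1.46x (Img/sec/W) on Inception-v1, while DSP leads by 1.49x and 2.72x on Inception-v3.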

hong.choi
Join Date: 26 Jul 17
Posts: 1
Posted: Thu, 2017-08-10 18:15

Here are similar results.

I've tested AlexNet and Inception V3 on Snapdragon 820 and 835 with the model files in SNPE SDK 1.2.2.

Surprisingly, the average execution times revealed unexpected results:

  1. AlexNet was faster on the GPU, but Inception V3 was faster on the DSP.
  2. AlexNet ran faster on the 820 DSP than on the 835 DSP (average execution time: 102 ms vs. 128 ms).

So, I would like to add some questions to this post:

  1. Why is execution time on DSP different for network models?
  2. And, why is 820 DSP faster than 835 DSP?
  3. Does Qualcomm have any guide document for designing a neural network model for DSP?
  4. Can I get python source files for the Inception V3 used to generate the frozen graph in SDK?
gesqdn-forum
Join Date: 4 Nov 18
Posts: 184
Posted: Tue, 2019-12-03 05:27

Hi,
Please find responses to your queries below.

1. Why is execution time on DSP different for network models?
A. The assumption "the higher processing power of the DSP will make my model perform better than on the GPU" is not always true. We worked with a Facial Expression Recognition (FER) model built using Keras and converted it to a DLC file.

Comparing total inference time on GPU and DSP led to the conclusion that DSP performance was about 60% of the GPU's. Before drawing that conclusion, we also accounted for the time consumed by RPC Execute (which acts as a communicator between the CPU/GPU and the DSP), the SNPE accelerator, and the accelerator itself. Considering these parameters, the GPU performs better than the DSP for a single prediction or a small number of predictions. The DSP is the better runtime choice only when a large number of predictions must be made with the FER model.
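The reasoning above amounts to a fixed-plus-variable cost model: the DSP path pays a one-time setup/RPC cost that is only amortized over many predictions. A minimal Python sketch of that break-even calculation; all the numbers below are illustrative assumptions, not SNPE measurements:

```python
# Illustrative fixed + variable cost model for runtime choice.
# All values are made-up examples for the break-even idea, not measurements.
GPU_MS_PER_INFERENCE = 10.0   # assumed per-inference time on GPU
DSP_MS_PER_INFERENCE = 6.0    # assumed per-inference time on DSP
DSP_FIXED_OVERHEAD_MS = 50.0  # assumed one-time RPC/init cost on the DSP path

def total_ms(n, per_inference, overhead=0.0):
    """Total wall time for n predictions under a fixed + variable cost model."""
    return overhead + n * per_inference

# Find the break-even prediction count where the DSP path starts to win.
n = 1
while total_ms(n, DSP_MS_PER_INFERENCE, DSP_FIXED_OVERhead_MS if False else DSP_FIXED_OVERHEAD_MS) >= total_ms(n, GPU_MS_PER_INFERENCE):
    n += 1
print(f"DSP becomes faster than GPU after {n} predictions")
# prints: DSP becomes faster than GPU after 13 predictions
```

With these assumed numbers the DSP only pulls ahead after 13 predictions, which matches the answer's point that the DSP runtime pays off only for a higher number of predictions.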

2. Why is 820 DSP faster than 835 DSP?
A. The post below answers your question, with a detailed explanation from the Qualcomm team on the problem you are facing:
https://developer.qualcomm.com/forum/qdn-forums/software/snapdragon-neural-processing-engine-sdk/34559

3. Does Qualcomm have any guide document for designing a neural network model for DSP?
A. No. The Snapdragon Neural Processing Engine does not require different model designs for different runtimes.

4. Can I get python source files for the Inception V3 used to generate the frozen graph in SDK?
A. No. The intermediate files generated while creating the DLC file are not distributed with the SDK.

