Forums - Benchmark Results 820 vs. 835

npailoor3
Join Date: 28 Jul 17
Posts: 13
Posted: Thu, 2017-08-03 11:30

Hi,

I noticed a big gap in inference performance between the 820 and 835. The following are my results for a SqueezeNet model inferring 113 raw files, using snpe_bench.py.

         835        820
------------------------------
GPU:     9.1ms      10.4ms
CPU:     115.3ms    55.ms
There was an increase in GPU performance (9.1ms is awesome!), but the CPU on the 820 was twice as fast. Is this to be expected, or should the 835's performance be the same or better?

I also benchmarked using the DSP, but only on the 835, and got 25ms. The main page says the DSP is expected to run 2x faster than the GPU. Do the models have to be trained differently to achieve those results?
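For reference, snpe_bench.py is driven by a JSON benchmark configuration roughly like the one below (field names follow the SNPE SDK benchmarking documentation; the name, paths, device id, and file names here are placeholders, and exact fields may differ across SDK versions):

```json
{
    "Name": "SqueezeNet",
    "HostRootPath": "squeezenet",
    "HostResultsDir": "squeezenet/results",
    "DevicePath": "/data/local/tmp/snpebm",
    "Devices": ["<adb-device-id>"],
    "Runs": 1,
    "Model": {
        "Name": "SqueezeNet",
        "Dlc": "squeezenet.dlc",
        "InputList": "target_raw_list.txt",
        "Data": ["data"]
    },
    "Runtimes": ["CPU", "GPU", "DSP"],
    "Measurements": ["timing"]
}
```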

moljaca moderator
Join Date: 25 Jul 17
Location: San Diego
Posts: 40
Posted: Mon, 2017-08-07 18:07

Hi,

Thank you for your interest in the Snapdragon NPE.

There is no need to train models differently for different SNPE runtimes.

The SNPE SDK is a fast-evolving product, and we are continuously improving its capabilities. The different SNPE runtimes (i.e. CPU, DSP, GPU) are not at full parity in terms of supported layer types and levels of optimization, and this also varies across devices.

Your observation is correct. Even though we do not see exactly the same numbers in absolute terms, we can confirm that in our tests the CPU runtime is slower on the 835 device compared to the 820 device. The difference is also model dependent, and with SqueezeNet it is more pronounced. We will look into addressing this in future releases.

The same applies to your DSP runtime performance observation; we are working on further optimizing the DSP runtime. Did you try loading a quantized DLC for the DSP runtime (e.g. one produced with the snpe-dlc-quantize tool)? That could yield better performance, so give it a try if you have not already.
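To illustrate why quantization can speed up the DSP but cost accuracy, here is a rough sketch of 8-bit asymmetric min/max quantization of the kind such a tool applies per tensor. This is an illustration only, not the actual SNPE implementation:

```python
import numpy as np

def quantize_dequantize(x):
    """Round-trip a float tensor through 8-bit asymmetric min/max
    quantization (illustration only, not the actual SNPE scheme)."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    # Encode to uint8, then decode back to float to expose rounding error.
    q = np.clip(np.round((x - lo) / scale), 0, 255).astype(np.uint8)
    return q.astype(np.float32) * scale + lo

# The round-trip error per element is bounded by half the quantization step,
# but those small per-layer errors can accumulate through a deep network.
weights = np.array([0.0, 0.1, 0.5, 1.0], dtype=np.float32)
restored = quantize_dequantize(weights)
```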

Could you share the OEM, model, and OS version of the devices you used in your tests? It would help us get a better understanding of the problem.

Best regards.


npailoor3
Join Date: 28 Jul 17
Posts: 13
Posted: Tue, 2017-08-08 10:47

Hi Moljaca,

I appreciate the reply.

The devices used are from the Open-Q 820 and Open-Q 835 dev kits.

I tried the quantized model on the DSP and noticed that it was slightly faster than the non-quantized model. However, both models showed significant accuracy degradation. When creating the quantized model, I provided a diverse sample of 100 images across all the categories (I tried sample sizes of 1, 5, 10, 20, and so on, up to 100).
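One quick way to quantify that degradation is to compare top-1 agreement between the float and quantized models' outputs on the same inputs. The helper below is hypothetical, not part of the SNPE tooling:

```python
import numpy as np

def top1_agreement(float_logits, quant_logits):
    """Fraction of inputs where the quantized model's top-1 class matches
    the float model's (hypothetical helper, not part of SNPE tooling)."""
    return float(np.mean(
        np.argmax(float_logits, axis=1) == np.argmax(quant_logits, axis=1)
    ))

# Toy logits for 3 inputs, 2 classes: the quantized model flips one prediction.
float_out = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
quant_out = np.array([[0.8, 0.2], [0.7, 0.3], [0.55, 0.45]])
agreement = top1_agreement(float_out, quant_out)  # 2 of 3 predictions match
```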

 

moljaca moderator
Join Date: 25 Jul 17
Location: San Diego
Posts: 40
Posted: Tue, 2017-08-08 16:32

Hi,

Thanks for trying the experiment and answering my questions. I can confirm that the significant accuracy degradation when running the SqueezeNet model in SNPE with the DSP runtime is a known issue; we are working on resolving it. A fix will be provided in a future SDK release.

Best regards.


