Hi,
I noticed a big gap in inference performance between the 820 and the 835. Below are my results for a SqueezeNet model inferring 113 raw files, measured with snpe_bench.py.
        835        820
--------------------------
GPU:    9.1ms      10.4ms
CPU:    115.3ms    55ms
There was an increase in GPU performance (9.1ms is awesome!), but the CPU on the 820 was twice as fast. Is this to be expected, or should the 835's performance be the same or better?
I also benchmarked using the DSP, but only on the 835, and got 25ms. The main page says the DSP is expected to run 2x faster than the GPU. Do the models have to be trained differently to achieve those results?
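For anyone reproducing these numbers: snpe_bench.py is driven by a JSON config. The fragment below is illustrative only (the device ID, paths, and file names are placeholders, and field names may differ between SDK releases; check the benchmarking section of your SDK's documentation):

```json
{
    "Name": "SqueezeNet",
    "HostRootPath": "squeezenet",
    "HostResultsDir": "squeezenet/results",
    "DevicePath": "/data/local/tmp/snpebm",
    "Devices": ["<device-id>"],
    "Runs": 2,
    "Model": {
        "Name": "SqueezeNet",
        "Dlc": "squeezenet.dlc",
        "InputList": "target_raw_list.txt",
        "Data": ["cropped"]
    },
    "Runtimes": ["CPU", "GPU", "DSP"],
    "Measurements": ["timing"]
}
```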
Hi,
Thank you for your interest in Snapdragon NPE.
There is no need to train models differently for different SNPE runtimes.
The SNPE SDK is a fast-evolving product and we are continuously improving its capabilities. The different SNPE runtimes (i.e. CPU, DSP, GPU) are not at full parity in terms of supported layer types and levels of optimization, and this also varies across devices.
Your observation is correct: even though we do not see exactly the same numbers in absolute terms, we can confirm that in our tests the CPU runtime is slower on the 835 device than on the 820 device. The difference is also model dependent, and with SqueezeNet it is more pronounced. We will look into addressing this in future releases.
The same applies to your observation about DSP runtime performance; we are working on further optimizing the DSP runtime. Did you try loading a quantized DLC for the DSP runtime (e.g. one produced with the snpe-dlc-quantize tool)? That could yield better performance, so you can give it a try if you have not already.
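For reference, quantization takes the DLC plus a text file listing raw calibration inputs. The sketch below only builds such an input list (file names and paths are made up for illustration); the actual snpe-dlc-quantize invocation is shown commented out, since it requires the SNPE SDK on your PATH and flag names may vary by release:

```shell
# Build a calibration input list: one raw input file path per line.
# The data layout here is illustrative; point it at your own raw files.
mkdir -p data/cropped
for i in 0 1 2; do
    : > "data/cropped/img_$i.raw"   # placeholder raw inputs
done
ls data/cropped/*.raw > input_list.txt

# With the SNPE SDK set up, quantization would then look roughly like
# (not run here; verify flags against your SDK release):
# snpe-dlc-quantize --input_dlc squeezenet.dlc \
#                   --input_list input_list.txt \
#                   --output_dlc squeezenet_quantized.dlc
```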
Could you share the OEM, model, and OS version of the devices you used in your tests? It would help us get a better understanding of the problem.
Best regards.
Hi Moljaca,
I appreciate the reply.
The devices used are from the Open-Q 820 and Open-Q 835 dev kits.
I tried the quantized model on the DSP and noticed it was slightly faster than the non-quantized model. However, both models resulted in significant accuracy degradation. When creating the quantized model I provided a diverse sample of 100 images from all the categories (I tried sample sizes of 1, 5, 10, 20, etc., up to 100).
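One way to put a number on this kind of degradation is to measure how often the quantized and float models pick the same top-1 class over the same inputs. A minimal sketch, using stand-in logits (in practice these would be parsed from the output files each runtime produces for the same input list):

```python
def top1(logits):
    """Index of the highest-scoring class."""
    return max(range(len(logits)), key=logits.__getitem__)

def agreement(float_outputs, quant_outputs):
    """Fraction of inputs where both models predict the same class."""
    matches = sum(
        top1(f) == top1(q) for f, q in zip(float_outputs, quant_outputs)
    )
    return matches / len(float_outputs)

# Illustrative outputs for three inputs over four classes.
float_outputs = [[0.1, 0.7, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1], [0.2, 0.2, 0.5, 0.1]]
quant_outputs = [[0.1, 0.6, 0.2, 0.1], [0.2, 0.5, 0.2, 0.1], [0.2, 0.2, 0.5, 0.1]]
print(agreement(float_outputs, quant_outputs))  # 2 of 3 inputs agree
```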
Hi,
Thanks for trying the experiment and answering my questions. I can confirm that significant accuracy degradation when running the SqueezeNet model in SNPE with the DSP runtime is a known issue; we are working on resolving it. A fix will be provided in a future SDK release.
Best regards.