Hello,
I ran all three runtimes, CPU (unquantized, float32), GPU (unquantized, float32), and DSP (quantized, 8-bit), with the Inception-v1 and Inception-v3 graphs on a Galaxy S8. For the Inception-v3 graph, the results meet expectations: the DSP delivers 1.5x higher performance and 2.72x higher inferences/s/W than the GPU.
However, for Inception-v1, the GPU has 1.7x higher performance and 1.46x higher inferences/s/W than the DSP. Why is that the case? Is it because the GPU has more internal memory and can better keep the data on-chip for the relatively smaller Inception-v1 graph?
I have provided the results below for the Galaxy S8 (performance mode). The measured power is the load power.
Inception-v1:
  inf/sec:   CPU --> 6.22, DSP --> 21.94, GPU --> 37.30
  inf/sec/W: CPU --> 9.80, DSP --> 38.48, GPU --> 56.28
Inception-v3:
  inf/sec:   CPU --> 1.33, DSP --> 13.51, GPU --> 9.04
  inf/sec/W: CPU --> 1.11, DSP --> 24.24, GPU --> 8.92
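For reference, the inf/sec/W figures are just throughput divided by the average measured load power. A minimal sketch (the 0.6627 W figure is back-derived from the GPU Inception-v1 numbers above, not a separately measured value):

```python
def inferences_per_sec_per_watt(inferences, elapsed_s, avg_power_w):
    """Throughput (inf/sec) normalized by average load power (W)."""
    throughput = inferences / elapsed_s
    return throughput / avg_power_w

# Example: the GPU Inception-v1 row above, assuming ~0.6627 W load power
# (3730 inferences over 100 s -> 37.30 inf/sec -> ~56.28 inf/sec/W).
print(round(inferences_per_sec_per_watt(3730, 100.0, 0.6627), 2))
```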
Hi,
Thanks for your interest in the Snapdragon NPE.
What you observed is accurate: some layer types/configurations present in the v1 network are not yet fully optimized for the DSP runtime. The runtime optimizations are a work in progress and will improve in future SNPE SDK releases.
Best regards.
Hi,
I think the following parameters are very useful when evaluating inference performance,
but I don't know how to measure inference performance in inferences/sec/W on a Snapdragon device (e.g. SDM821). Is there a special benchmark tool to obtain these parameters? Looking forward to your suggestions, and thanks so much.
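Not an official answer, but in practice the two halves can be measured separately: time the inference loop on-device for the throughput term, and sample power over the same window with an external power monitor (or battery telemetry), then divide. A minimal sketch, where `run_inference` is a placeholder for whatever executes one forward pass on your chosen runtime:

```python
import time

def measure_inf_per_sec(run_inference, n_warmup=10, n_iters=100):
    """Time n_iters calls to run_inference() and return inferences/sec.

    Warm-up iterations are discarded so one-time initialization cost
    (graph load, runtime setup) does not skew the steady-state average.
    """
    for _ in range(n_warmup):
        run_inference()
    start = time.perf_counter()
    for _ in range(n_iters):
        run_inference()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed

# Power (watts) must come from an external source sampled over the same
# window; then: inf_per_sec_per_watt = throughput / avg_load_power_w
```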
Yangfan
Hi,
The statement "The higher processing power of the DSP will help my model perform better compared to the GPU" is not always true.
We worked with a Facial Expression Recognition (FER) model built in Keras and converted to a DLC file.
Comparing total inference time on the GPU and DSP leads to the conclusion that DSP performance is about 60% of the GPU's. Before drawing that conclusion we also accounted for the time consumed by RPC Execute (which acts as a communicator between the CPU/GPU and the DSP), SNPE Accelerator, and Accelerator. Considering these parameters, the GPU performs better than the DSP for a single or small number of predictions; the DSP is the right choice of runtime only when we need to make a large number of predictions with the FER model.
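The observation above can be modeled as a fixed one-time overhead (RPC/setup) plus a per-inference cost: the DSP only wins once the overhead is amortized over enough predictions. A sketch with made-up illustrative numbers, not measurements from the FER model:

```python
import math

def total_time_ms(overhead_ms, per_inference_ms, n):
    """One-time overhead (e.g. RPC/init) plus per-inference cost."""
    return overhead_ms + per_inference_ms * n

def crossover(gpu_per_inf, dsp_per_inf, dsp_overhead, gpu_overhead=0.0):
    """Smallest n at which the DSP's total time is no worse than the GPU's.

    Returns None if the DSP never catches up (its per-inference cost
    is not lower than the GPU's).
    """
    if dsp_per_inf >= gpu_per_inf:
        return None
    gap = dsp_overhead - gpu_overhead
    if gap <= 0:
        return 1
    # Overhead gap divided by per-inference advantage, rounded up.
    return math.ceil(gap / (gpu_per_inf - dsp_per_inf))

# Illustrative: DSP is faster per inference (6 ms vs 10 ms) but pays a
# 40 ms one-time RPC/setup cost, so it breaks even at n = 10.
print(crossover(gpu_per_inf=10.0, dsp_per_inf=6.0, dsp_overhead=40.0))
```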
I have the same question. How do I measure inference performance in inferences/sec/W on a Snapdragon device (e.g. 855 or SA8155P)?
I know the Snapdragon Profiler can give some profiling data, but it does not include watts.
Looking forward to your reply.