Hi all,
First, I tested the inception_v3_quantized model, whose total MACs is 5713M, on the DSP runtime (inference time only):
a. it costs 60 ms on the Snapdragon 820
b. it costs 37 ms on the 845 and 38 ms on the 710
c. it costs 57 ms on the 835
Second, I then tested our own model, whose total MACs are 78M and 138M, on the DSP runtime:
a. it costs 200+ ms on the 820
b. it costs 180+ ms on the 845 and 180+ ms on the 710
c. it costs 200+ ms on the 835
So what is the reason for this result? Has anyone seen the same problem, or found the cause or a fix?
Thanks!!!
Hi zeekim,
Please try the SNPE benchmarking tool snpe-diagview. It generates layer-wise timing information, which might help you.
https://developer.qualcomm.com/docs/snpe/tools.html#tools_snpe-diagview
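A typical invocation might look like this (a sketch only; the DiagLog file name and output location vary by SNPE version and by how the benchmark was run, so adjust the paths to your setup):

```shell
# Run the model once with diagnostic logging enabled, then parse the DiagLog.
# model.dlc, inputs.txt, and the output path are illustrative placeholders.
snpe-net-run --container model.dlc --input_list inputs.txt --use_dsp
snpe-diagview --input_log output/SNPEDiag_0.log
```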
Thanks,
Jihoon
Dear Jihoon,
Thanks for your quick reply.
I have run the benchmark with Inception v3 and my model. My test model costs 132 ms on Snapdragon on the DSP runtime.
And I have some questions:
1. My test model's convolutions cost much more time on the DSP runtime than on the GPU runtime, while Inception_v3 shows the opposite.
For example (I took screenshots but cannot attach images here), I compared the conv_2 layer of Inception_v3 against the second layer of my model, which is also a conv layer with ReLU:
1-1. Inception v3 [snpe-dlc-info reports 398M MACs; input HWC is 147x147x32; output HWC is 147x147x64]:
layer_005 (Name: conv_2/Conv2D Type: convolutional) GPU: 6538 DSP: 0
layer_006 (Name: conv_2 Type: neuron) GPU: 0 DSP: 2943
1-2. My test model [snpe-dlc-info reports 199M MACs; input HWC is 360x640x12; output HWC is 360x640x8]:
layer_003 (Name: Mytest/conv2/Conv2D Type: convolutional) GPU: 3513 DSP: 0
layer_004 (Name: Mytest/conv2/Relu Type: neuron) GPU: 0 DSP: 17491
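To sanity-check the reported MACs and see how different the two layers really are per output element, here is my own back-of-the-envelope sketch (the channel depths come from the shapes above; the 3x3 kernel size is inferred from the MAC counts, not confirmed):

```python
# MACs of a standard convolution: H * W * C_out * K * K * C_in
def conv_macs(h, w, c_in, c_out, k=3):
    return h * w * c_out * k * k * c_in

# Inception v3 conv_2: 147x147x32 -> 147x147x64
incep = conv_macs(147, 147, 32, 64)        # 398,297,088 -- matches the ~398M from snpe-dlc-info
# My conv2: 360x640x12 -> 360x640x8
mine = conv_macs(360, 640, 12, 8)          # 199,065,600 -- matches the ~199M from snpe-dlc-info

# Work per output element written (MACs per activation)
incep_per_out = incep // (147 * 147 * 64)  # 288 MACs per output element
mine_per_out = mine // (360 * 640 * 8)     # 108 MACs per output element

print(incep, mine, incep_per_out, mine_per_out)
```

So although my layer has half the MACs, it writes more output elements (~1.8M vs ~1.4M) and does only 108 MACs per element, with a depth of just 8 channels; if the DSP vectorizes across channels, such a shallow layer would leave most of its vector lanes idle. This is only my guess at the cause.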
2. I also find that deconvolution on the DSP runtime is very slow.
Thanks!!!
Hi zeekim,
I think you aren't using the latest version of SNPE.
AFAIK, deconvolution and activation have recently received a big speed-up on the DSP.
Please try the latest release, 1.17.0.
Thanks,
Jihoon
Dear Jihoon,
I have tested the newest version (1.17.0) on the Snapdragon 820:
----> First, there was a very big improvement: it costs only half the time of the old version.
----> Second, convolution with a neuron layer is still too slow (it costs much more time on the DSP runtime than on the GPU runtime, while Inception_v3 shows the opposite). This is very important for us. Do you have any suggestions for optimization?
Thanks!!!
Hi zeekim,
I think the current speed is the maximum SNPE can reach for your network model. You would need to apply optimization techniques to your model, such as pruning, decomposition, and/or compression.
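For example, a depthwise-separable decomposition of a 3x3 convolution cuts the MAC count substantially. A rough estimate using the shapes you posted for your conv2 layer (this counts MACs only, assuming a 3x3 kernel; the actual speed-up on the DSP will differ):

```python
# Standard 3x3 conv vs depthwise-separable (depthwise 3x3 + pointwise 1x1)
def standard_macs(h, w, c_in, c_out, k=3):
    return h * w * c_out * k * k * c_in

def separable_macs(h, w, c_in, c_out, k=3):
    depthwise = h * w * c_in * k * k   # one KxK filter per input channel
    pointwise = h * w * c_out * c_in   # 1x1 conv mixing channels
    return depthwise + pointwise

std = standard_macs(360, 640, 12, 8)   # ~199M MACs
sep = separable_macs(360, 640, 12, 8)  # ~47M MACs
print(std, sep, round(std / sep, 1))   # roughly 4x fewer MACs
```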
Thanks,
Jihoon
Hi jihoonk,
I have tried to convert a deconvolution layer, but it fails. I tested both slim.conv2d_transpose and tf.layers.conv2d_transpose; neither converts.
Could you please show me which API I should use, or point me to sample code where a deconvolution converts successfully?
Any comments are appreciated!