Forums - running custom model on DSP Runtime too slow

running custom model on DSP Runtime too slow
zeekim
Join Date: 29 Mar 18
Posts: 15
Posted: Wed, 2018-07-25 05:33

Hi all,

        First, I tested the inception_v3_quantized model (total MACs: 5713M) on the DSP runtime (inference time only):

                  a. It costs 60 ms on Snapdragon 820.

                  b. It costs 37 ms on 845 and 38 ms on 710.

                  c. It costs 57 ms on 835.

         Second, I tested our own models (total MACs: 78M and 138M) on the DSP runtime:

                  a. They cost 200+ ms on 820.

                  b. They cost 180+ ms on 845 and 180+ ms on 710.

                  c. They cost 200+ ms on 835.
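A quick back-of-the-envelope check makes the gap concrete. Using the numbers above (MAC counts from snpe-dlc-info, the Snapdragon 845 timings, and pairing the 138M-MAC model with its 180 ms time, which is my assumption about which figure goes with which model):

```python
# Effective DSP throughput implied by the timings above (Snapdragon 845).
# MAC counts are from snpe-dlc-info; times are inference-only.

def gmacs_per_second(macs_millions, time_ms):
    """Effective throughput in GMAC/s: (M MAC / ms) equals (G MAC / s)."""
    return macs_millions / time_ms

inception_v3 = gmacs_per_second(5713, 37)   # ~154 GMAC/s
custom_model = gmacs_per_second(138, 180)   # ~0.77 GMAC/s

print(f"Inception v3: {inception_v3:.1f} GMAC/s")
print(f"Custom model: {custom_model:.2f} GMAC/s")
print(f"Utilization gap: ~{inception_v3 / custom_model:.0f}x")
```

So the big model is running at roughly 200x higher effective MAC throughput than the small one, which is why raw MAC count alone does not predict the latency.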

           What could explain this result? Has anyone seen the same problem, or found the cause or a fix?

Thanks!!!

 

jihoonk
Join Date: 28 Jan 13
Location: Seoul
Posts: 55
Posted: Wed, 2018-07-25 21:17

Hi zeekim,

Please try the SNPE benchmarking tool, snpe-diagview. It generates layer-wise timing information, which might help you.

https://developer.qualcomm.com/docs/snpe/tools.html#tools_snpe-diagview
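For reference, the usual flow is to run the network once with snpe-net-run (which writes a diagnostic log) and then feed that log to snpe-diagview. A sketch follows; the paths are placeholders and the flag names are as I recall them from the SNPE 1.x docs linked above, so check them against your version:

```shell
# Run the network on the DSP; snpe-net-run writes a diag log
# (SNPE_diag.log) into the output directory.
snpe-net-run --container model.dlc \
             --input_list input_list.txt \
             --use_dsp \
             --output_dir output

# Dump per-layer timing from the diag log.
snpe-diagview --input_log output/SNPE_diag.log
```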

Thanks,

Jihoon

zeekim
Join Date: 29 Mar 18
Posts: 15
Posted: Fri, 2018-07-27 19:19

Dear Jihoon,

         Thanks for your quick reply.

         I have run the benchmark with Inception v3 and my model. My test model costs 132 ms on the Snapdragon DSP runtime.

  And I still have some questions:

          1. My test model's convolution costs much more time on the DSP runtime than on the GPU runtime, while Inception_v3 is the opposite.

                For example, I took screenshots but cannot place images here, so I will type the numbers. I tested Inception_v3 and compared its conv_2 layer; my model's second layer is also a conv2 layer with ReLU.

1-1. Inception v3 [snpe-dlc-info shows MACs: 398M; input HWC: 147x147x32; output HWC: 147x147x64]:

layer_005 (Name: conv_2/Conv2D  Type: convolutional)    GPU: 6538    DSP: 0

layer_006 (Name: conv_2         Type: neuron)           GPU: 0       DSP: 2943
 

1-2. My test model [snpe-dlc-info shows MACs: 199M; input HWC: 360x640x12; output HWC: 360x640x8]:

layer_003 (Name: Mytest/conv2/Conv2D  Type: convolutional)    GPU: 3513    DSP: 0

layer_004 (Name: Mytest/conv2/Relu    Type: neuron)           GPU: 0       DSP: 17491

 

          2. I also find that deconvolution is very slow on the DSP runtime.
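Normalizing the per-layer timings in point 1 by MAC count shows the gap directly. A small sketch (my assumptions: the diagview times are in microseconds, and on DSP the conv cost is reported under the fused neuron layer, which is how I read the zero conv entries above):

```python
# Microseconds per million MACs for the two conv+ReLU pairs, using the
# DSP timings reported by snpe-diagview and MACs from snpe-dlc-info.

def us_per_mmac(dsp_time_us, macs_millions):
    """DSP cost normalized by work: lower means better utilization."""
    return dsp_time_us / macs_millions

inception_conv2 = us_per_mmac(2943, 398)    # ~7.4 us per MMAC
custom_conv2 = us_per_mmac(17491, 199)      # ~87.9 us per MMAC

print(f"Inception conv_2: {inception_conv2:.1f} us/MMAC")
print(f"My conv2:         {custom_conv2:.1f} us/MMAC")
print(f"Efficiency gap:   ~{custom_conv2 / inception_conv2:.0f}x")
```

So my conv layer is roughly 12x less efficient per MAC on the DSP, even though it has fewer MACs overall; the shallow channel counts (12 in / 8 out) at 360x640 may simply be a poor fit for the DSP.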

Thanks!!!

jihoonk
Join Date: 28 Jan 13
Location: Seoul
Posts: 55
Posted: Sun, 2018-07-29 07:05

Hi zeekim,

I think you aren't using the latest version of SNPE.

AFAIK, deconvolution and activation have seen big speed-ups on the DSP recently.

Please try the latest version, 1.17.0.

Thanks,

Jihoon

zeekim
Join Date: 29 Mar 18
Posts: 15
Posted: Mon, 2018-07-30 07:04

Dear Jihoon,

       I have tested the newest version (1.17.0) on Snapdragon 820.

       First, there was a very big improvement: it costs only half the time of the old version.

       Second, the performance of convolution with a neuron (activation) layer is still too slow; it costs much more time on the DSP runtime than on the GPU runtime, while Inception_v3 is the opposite. This is very important for us. Do you have any suggestions for optimization?

 

Thanks!!!

jihoonk
Join Date: 28 Jan 13
Location: Seoul
Posts: 55
Posted: Mon, 2018-07-30 21:30

Hi zeekim,

I think the current speed is the maximum achievable for your network model using SNPE. You need to apply optimization techniques to your model, such as pruning, decomposition, and/or compression.
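To illustrate the "decomposition" idea (a generic low-rank sketch, not SNPE-specific): a 1x1 convolution is just a matrix multiply over channels, so its weight matrix can be factored with a truncated SVD into two thinner layers, trading a little accuracy for fewer MACs.

```python
import numpy as np

# Low-rank decomposition sketch: factor a (C_in x C_out) 1x1-conv weight
# matrix W into W1 (C_in x r) @ W2 (r x C_out). This cuts MACs whenever
# r < C_in * C_out / (C_in + C_out).

rng = np.random.default_rng(0)
c_in, c_out, rank = 64, 64, 16

W = rng.standard_normal((c_in, c_out))
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W1 = U[:, :rank] * s[:rank]        # first thin layer (C_in x r)
W2 = Vt[:rank, :]                  # second thin layer (r x C_out)

macs_before = c_in * c_out
macs_after = c_in * rank + rank * c_out
print(f"MACs per pixel: {macs_before} -> {macs_after}")

# Relative reconstruction error of the truncated factorization.
err = np.linalg.norm(W - W1 @ W2) / np.linalg.norm(W)
print(f"relative error: {err:.3f}")
```

In practice the factored layers are fine-tuned afterwards to recover accuracy; the same idea extends to KxK convolutions by reshaping the kernel tensor.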

Thanks,

Jihoon

 

zf.africa
Join Date: 15 Jun 17
Posts: 51
Posted: Tue, 2018-07-31 19:55

Hi jihoonk,

I have tried to convert a deconvolution layer but it fails. I tested both slim.conv2d_transpose and tf.layers.conv2d_transpose, and both fail to convert.

Could you please show me which API I should use, or is there sample code where a deconvolution converts successfully?
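For reference, the conversion attempt looked roughly like this (a sketch: the model path, input name/shape, and output node name are placeholders for our setup, and the flag names are as I recall them from the SNPE 1.x docs, so verify against your version):

```shell
# Convert a frozen TensorFlow graph containing conv2d_transpose to DLC.
# This is the step that fails on the deconvolution layer for us.
snpe-tensorflow-to-dlc --graph frozen_model.pb \
                       --input_dim input "1,360,640,12" \
                       --out_node output_node \
                       --dlc mytest.dlc
```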

Any comments are appreciated!

