Forums - running custom model on DSP Runtime too slow

running custom model on DSP Runtime too slow
zeekim
Join Date: 29 Mar 18
Posts: 15
Posted: Wed, 2018-07-25 05:33

Hi all,

        First, I tested the inception_v3_quantized model (total MACs: 5713M) on the DSP runtime (inference time only):

                  a. It costs 60 ms on Snapdragon 820.

                  b. It costs 37 ms on 845 and 38 ms on 710.

                  c. It costs 57 ms on 835.

         Second, I tested our own models (total MACs: 78M and 138M) on the DSP runtime:

                  a. They cost 200+ ms on 820.

                  b. They cost 180+ ms on 845 and 180+ ms on 710.

                  c. They cost 200+ ms on 835.
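A quick back-of-the-envelope check makes the gap concrete. Using the numbers above (MAC counts from snpe-dlc-info, the Snapdragon 845 timings, and pairing the 138M-MAC model with its 180 ms time, which is my assumption about which figure goes with which model):

```python
# Effective DSP throughput implied by the timings above (Snapdragon 845).
# MAC counts are from snpe-dlc-info; times are inference-only.

def gmacs_per_second(macs_millions, time_ms):
    """Effective throughput in GMAC/s: (M MAC / ms) equals (G MAC / s)."""
    return macs_millions / time_ms

inception_v3 = gmacs_per_second(5713, 37)   # ~154 GMAC/s
custom_model = gmacs_per_second(138, 180)   # ~0.77 GMAC/s

print(f"Inception v3: {inception_v3:.1f} GMAC/s")
print(f"Custom model: {custom_model:.2f} GMAC/s")
print(f"Utilization gap: ~{inception_v3 / custom_model:.0f}x")
```

So the big model is running at roughly 200x higher effective MAC throughput than the small one, which is why raw MAC count alone does not predict the latency.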

           What could explain this result? Has anyone seen the same problem, or found the cause or a fix?

Thanks!!!

 

jihoonk
Join Date: 28 Jan 13
Location: Seoul
Posts: 55
Posted: Wed, 2018-07-25 21:17

Hi zeekim,

Please try the SNPE benchmarking tool, snpe-diagview. It generates layer-wise timing information, which might help you.

https://developer.qualcomm.com/docs/snpe/tools.html#tools_snpe-diagview
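For reference, the usual flow is to run the network once with snpe-net-run (which writes a diagnostic log) and then feed that log to snpe-diagview. A sketch follows; the paths are placeholders and the flag names are as I recall them from the SNPE 1.x docs linked above, so check them against your version:

```shell
# Run the network on the DSP; snpe-net-run writes a diag log
# (SNPE_diag.log) into the output directory.
snpe-net-run --container model.dlc \
             --input_list input_list.txt \
             --use_dsp \
             --output_dir output

# Dump per-layer timing from the diag log.
snpe-diagview --input_log output/SNPE_diag.log
```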

Thanks,

Jihoon

zeekim
Join Date: 29 Mar 18
Posts: 15
Posted: Fri, 2018-07-27 19:19

Dear Jihoon,

         Thanks for your quick reply.

         I have run the benchmark with Inception v3 and my model. My test model costs 132 ms on the Snapdragon DSP runtime.

  And I still have some questions:

          1. My test model's convolution costs much more time on the DSP runtime than on the GPU runtime, while Inception_v3 is the opposite.

                For example, I took screenshots but cannot place images here, so I will type the numbers. I tested Inception_v3 and compared its conv_2 layer; my model's second layer is also a conv2 layer with ReLU.

1-1. Inception v3 [snpe-dlc-info shows MACs: 398M; input HWC: 147x147x32; output HWC: 147x147x64]:

layer_005 (Name: conv_2/Conv2D  Type: convolutional)    GPU: 6538    DSP: 0

layer_006 (Name: conv_2         Type: neuron)           GPU: 0       DSP: 2943
 

1-2. My test model [snpe-dlc-info shows MACs: 199M; input HWC: 360x640x12; output HWC: 360x640x8]:

layer_003 (Name: Mytest/conv2/Conv2D  Type: convolutional)    GPU: 3513    DSP: 0

layer_004 (Name: Mytest/conv2/Relu    Type: neuron)           GPU: 0       DSP: 17491

 

          2. I also find that deconvolution is very slow on the DSP runtime.
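Normalizing the per-layer timings in point 1 by MAC count shows the gap directly. A small sketch (my assumptions: the diagview times are in microseconds, and on DSP the conv cost is reported under the fused neuron layer, which is how I read the zero conv entries above):

```python
# Microseconds per million MACs for the two conv+ReLU pairs, using the
# DSP timings reported by snpe-diagview and MACs from snpe-dlc-info.

def us_per_mmac(dsp_time_us, macs_millions):
    """DSP cost normalized by work: lower means better utilization."""
    return dsp_time_us / macs_millions

inception_conv2 = us_per_mmac(2943, 398)    # ~7.4 us per MMAC
custom_conv2 = us_per_mmac(17491, 199)      # ~87.9 us per MMAC

print(f"Inception conv_2: {inception_conv2:.1f} us/MMAC")
print(f"My conv2:         {custom_conv2:.1f} us/MMAC")
print(f"Efficiency gap:   ~{custom_conv2 / inception_conv2:.0f}x")
```

So my conv layer is roughly 12x less efficient per MAC on the DSP, even though it has fewer MACs overall; the shallow channel counts (12 in / 8 out) at 360x640 may simply be a poor fit for the DSP.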

Thanks!!!

jihoonk
Join Date: 28 Jan 13
Location: Seoul
Posts: 55
Posted: Sun, 2018-07-29 07:05

Hi zeekim,

I think you aren't using the latest version of SNPE.

AFAIK, deconvolution and activation have seen big speed-ups on the DSP recently.

Please try the latest version, 1.17.0.

Thanks,

Jihoon

zeekim
Join Date: 29 Mar 18
Posts: 15
Posted: Mon, 2018-07-30 07:04

Dear Jihoon,

       I have tested the newest version (1.17.0) on Snapdragon 820.

       First, there was a very big improvement: it costs only half the time of the old version.

       Second, the performance of convolution with a neuron (activation) layer is still too slow; it costs much more time on the DSP runtime than on the GPU runtime, while Inception_v3 is the opposite. This is very important for us. Do you have any suggestions for optimization?

 

Thanks!!!

jihoonk
Join Date: 28 Jan 13
Location: Seoul
Posts: 55
Posted: Mon, 2018-07-30 21:30

Hi zeekim,

I think the current speed is the maximum achievable for your network model using SNPE. You need to apply optimization techniques to your model, such as pruning, decomposition, and/or compression.
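To illustrate the "decomposition" idea (a generic low-rank sketch, not SNPE-specific): a 1x1 convolution is just a matrix multiply over channels, so its weight matrix can be factored with a truncated SVD into two thinner layers, trading a little accuracy for fewer MACs.

```python
import numpy as np

# Low-rank decomposition sketch: factor a (C_in x C_out) 1x1-conv weight
# matrix W into W1 (C_in x r) @ W2 (r x C_out). This cuts MACs whenever
# r < C_in * C_out / (C_in + C_out).

rng = np.random.default_rng(0)
c_in, c_out, rank = 64, 64, 16

W = rng.standard_normal((c_in, c_out))
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W1 = U[:, :rank] * s[:rank]        # first thin layer (C_in x r)
W2 = Vt[:rank, :]                  # second thin layer (r x C_out)

macs_before = c_in * c_out
macs_after = c_in * rank + rank * c_out
print(f"MACs per pixel: {macs_before} -> {macs_after}")

# Relative reconstruction error of the truncated factorization.
err = np.linalg.norm(W - W1 @ W2) / np.linalg.norm(W)
print(f"relative error: {err:.3f}")
```

In practice the factored layers are fine-tuned afterwards to recover accuracy; the same idea extends to KxK convolutions by reshaping the kernel tensor.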

Thanks,

Jihoon

 

zf.africa
Join Date: 15 Jun 17
Posts: 51
Posted: Tue, 2018-07-31 19:55

Hi jihoonk,

I have tried to convert a deconvolution layer but it fails. I tested both slim.conv2d_transpose and tf.layers.conv2d_transpose, and both fail to convert.

Could you please show me which API I should use, or is there sample code where a deconvolution converts successfully?
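For reference, the conversion attempt looked roughly like this (a sketch: the model path, input name/shape, and output node name are placeholders for our setup, and the flag names are as I recall them from the SNPE 1.x docs, so verify against your version):

```shell
# Convert a frozen TensorFlow graph containing conv2d_transpose to DLC.
# This is the step that fails on the deconvolution layer for us.
snpe-tensorflow-to-dlc --graph frozen_model.pb \
                       --input_dim input "1,360,640,12" \
                       --out_node output_node \
                       --dlc mytest.dlc
```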

Any comments are appreciated!

