Forums - SNPE v1.51: S888 Wrong layer-by-layer timing when benchmarking with snpe_bench.py

conan353
Join Date: 3 Feb 20
Posts: 12
Posted: Mon, 2021-06-14 20:51

Hi:

When I ran snpe_bench.py on S888 to get layer-by-layer timing (with '-l detailed' mode and HTP enabled), I got weird results.

For example, in the table below the total inference time is 6383 us, yet layer_001 is reported as 614692 us, which is MUCH longer than the total inference time. Is there a unit mismatch or something?

 

DSP_ub_tf8_timing (10 runs)

Stage / layer | avg (us) | max (us)
Load | 284 | 397
Deserialize | 38112 | 41428
Create | 39514 | 41233
Init | 92212 | 96738
De-Init | 6416 | 7346
Create Network(s) | 381 | 393
RPC Init Time | 8275 | 9039
Snpe Accelerator Init Time | 6888 | 7456
Accelerator Init Time | 6681 | 7243
Total Inference Time | 6383 | 6693
Forward Propagate | 6363 | 6676
RPC Execute | 5342 | 5515
Snpe Accelerator | 4842 | 4912
Accelerator | 3818 | 3878
Misc Accelerator | 0 | 0
layer_000 (Name:data Type:data) | 0 | 0
layer_001 (Name:bn_data Type:batchnorm) | 614692 | 619451
layer_002 (Name:conv0 Type:convolutional) | 0 | 0
layer_003 (Name:relu0 Type:neuron) | 173676 | 174932
layer_004 (Name:pooling0 Type:pooling) | 0 | 0
layer_005 (Name:stage1_unit1_conv1 Type:convolutional) | 0 | 0
layer_006 (Name:stage1_unit1_relu1 Type:neuron) | 0 | 0
layer_007 (Name:stage1_unit1_conv2 Type:convolutional) | 0 | 0
layer_008 (Name:stage1_unit1_relu2 Type:neuron) | 6906 | 7144
layer_009 (Name:stage1_unit1_conv3 Type:convolutional) | 18223 | 18264
layer_010 (Name:stage1_unit1_sc Type:convolutional) | 48521 | 49066
layer_011 (Name:stage1_unit1_plus Type:elementwise_op) | 0 | 0
layer_012 (Name:stage1_unit1_relu Type:neuron) | 75195 | 76933
layer_013 (Name:stage1_unit2_conv1 Type:convolutional) | 0 | 0
layer_014 (Name:stage1_unit2_relu1 Type:neuron) | 0 | 0
layer_015 (Name:stage1_unit2_conv2 Type:convolutional) | 0 | 0
layer_016 (Name:stage1_unit2_relu2 Type:neuron) | 9549 | 9699
layer_017 (Name:stage1_unit2_conv3 Type:convolutional) | 20394 | 20497
layer_018 (Name:stage1_unit2_plus Type:elementwise_op) | 0 | 0
layer_019 (Name:stage1_unit2_relu Type:neuron) | 76122 | 76686
layer_020 (Name:stage1_unit3_conv1 Type:convolutional) | 0 | 0
layer_021 (Name:stage1_unit3_relu1 Type:neuron) | 0 | 0
layer_022 (Name:stage1_unit3_conv2 Type:convolutional) | 0 | 0
layer_023 (Name:stage1_unit3_relu2 Type:neuron) | 8817 | 9112
layer_024 (Name:stage1_unit3_conv3 Type:convolutional) | 20364 | 20527
layer_025 (Name:stage1_unit3_plus Type:elementwise_op) | 0 | 0
layer_026 (Name:stage1_unit3_relu Type:neuron) | 79295 | 80135
layer_027 (Name:stage2_unit1_conv1 Type:convolutional) | 0 | 0
layer_028 (Name:stage2_unit1_relu1 Type:neuron) | 0 | 0
layer_029 (Name:stage2_unit1_conv2 Type:convolutional) | 0 | 0
layer_030 (Name:stage2_unit1_relu2 Type:neuron) | 17054 | 17507
layer_031 (Name:stage2_unit1_conv3 Type:convolutional) | 12367 | 12555
layer_032 (Name:stage2_unit1_sc Type:convolutional) | 26945 | 27403
layer_033 (Name:stage2_unit1_plus Type:elementwise_op) | 0 | 0
layer_034 (Name:stage2_unit1_relu Type:neuron) | 25908 | 26128
layer_035 (Name:stage2_unit2_conv1 Type:convolutional) | 0 | 0
layer_036 (Name:stage2_unit2_relu1 Type:neuron) | 0 | 0
layer_037 (Name:stage2_unit2_conv2 Type:convolutional) | 0 | 0
layer_038 (Name:stage2_unit2_relu2 Type:neuron) | 9539 | 9924
layer_039 (Name:stage2_unit2_conv3 Type:convolutional) | 11917 | 11996
layer_040 (Name:stage2_unit2_plus Type:elementwise_op) | 0 | 0
layer_041 (Name:stage2_unit2_relu Type:neuron) | 25905 | 26101
layer_042 (Name:stage2_unit3_conv1 Type:convolutional) | 0 | 0
layer_043 (Name:stage2_unit3_relu1 Type:neuron) | 0 | 0
layer_044 (Name:stage2_unit3_conv2 Type:convolutional) | 0 | 0
layer_045 (Name:stage2_unit3_relu2 Type:neuron) | 6966 | 7180
layer_046 (Name:stage2_unit3_conv3 Type:convolutional) | 11854 | 11900
layer_047 (Name:stage2_unit3_plus Type:elementwise_op) | 0 | 0
layer_048 (Name:stage2_unit3_relu Type:neuron) | 26617 | 26837
layer_049 (Name:stage2_unit4_conv1 Type:convolutional) | 0 | 0
layer_050 (Name:stage2_unit4_relu1 Type:neuron) | 0 | 0
layer_051 (Name:stage2_unit4_conv2 Type:convolutional) | 0 | 0
layer_052 (Name:stage2_unit4_relu2 Type:neuron) | 7465 | 7736
layer_053 (Name:stage2_unit4_conv3 Type:convolutional) | 11907 | 11963
layer_054 (Name:stage2_unit4_plus Type:elementwise_op) | 0 | 0
layer_055 (Name:stage2_unit4_relu Type:neuron) | 26945 | 27431
layer_056 (Name:stage3_unit1_conv1 Type:convolutional) | 0 | 0
layer_057 (Name:stage3_unit1_relu1 Type:neuron) | 0 | 0
layer_058 (Name:stage3_unit1_conv2 Type:convolutional) | 0 | 0
layer_059 (Name:stage3_unit1_relu2 Type:neuron) | 6636 | 6765
layer_060 (Name:stage3_unit1_conv3 Type:convolutional) | 5801 | 5834
layer_061 (Name:stage3_unit1_sc Type:convolutional) | 16244 | 16582
layer_062 (Name:stage3_unit1_plus Type:elementwise_op) | 0 | 0
layer_063 (Name:stage3_unit1_relu Type:neuron) | 7866 | 7979
layer_064 (Name:stage3_unit2_conv1 Type:convolutional) | 0 | 0
layer_065 (Name:stage3_unit2_relu1 Type:neuron) | 0 | 0
layer_066 (Name:stage3_unit2_conv2 Type:convolutional) | 0 | 0
layer_067 (Name:stage3_unit2_relu2 Type:neuron) | 2619 | 2701
layer_068 (Name:stage3_unit2_conv3 Type:convolutional) | 5645 | 5686
layer_069 (Name:stage3_unit2_plus Type:elementwise_op) | 0 | 0
layer_070 (Name:stage3_unit2_relu Type:neuron) | 7810 | 7963
layer_071 (Name:stage3_unit3_conv1 Type:convolutional) | 0 | 0
layer_072 (Name:stage3_unit3_relu1 Type:neuron) | 0 | 0
layer_073 (Name:stage3_unit3_conv2 Type:convolutional) | 0 | 0
layer_074 (Name:stage3_unit3_relu2 Type:neuron) | 2922 | 3171
layer_075 (Name:stage3_unit3_conv3 Type:convolutional) | 5645 | 5671
layer_076 (Name:stage3_unit3_plus Type:elementwise_op) | 0 | 0
layer_077 (Name:stage3_unit3_relu Type:neuron) | 7769 | 7839
layer_078 (Name:stage3_unit4_conv1 Type:convolutional) | 0 | 0
layer_079 (Name:stage3_unit4_relu1 Type:neuron) | 0 | 0
layer_080 (Name:stage3_unit4_conv2 Type:convolutional) | 0 | 0
layer_081 (Name:stage3_unit4_relu2 Type:neuron) | 2634 | 2695
layer_082 (Name:stage3_unit4_conv3 Type:convolutional) | 5945 | 6025
layer_083 (Name:stage3_unit4_plus Type:elementwise_op) | 0 | 0
layer_084 (Name:stage3_unit4_relu Type:neuron) | 7635 | 7841
layer_085 (Name:stage3_unit5_conv1 Type:convolutional) | 0 | 0
layer_086 (Name:stage3_unit5_relu1 Type:neuron) | 0 | 0
layer_087 (Name:stage3_unit5_conv2 Type:convolutional) | 0 | 0
layer_088 (Name:stage3_unit5_relu2 Type:neuron) | 2816 | 2910
layer_089 (Name:stage3_unit5_conv3 Type:convolutional) | 5645 | 5671
layer_090 (Name:stage3_unit5_plus Type:elementwise_op) | 0 | 0
lskraoc
Join Date: 15 Jun 21
Posts: 1
Posted: Tue, 2021-06-15 08:42

Hello conan353,

On targets with the v68 DSP architecture (S888), each row in the detailed profiling report gives the per-op profiling result in cycle counts, not in microseconds. There is no direct conversion from cycle counts to microseconds because the ops execute in parallel. It is therefore recommended to use the per-layer cycle counts only as a relative reference, to compare which layers take fewer or more cycles to finish.
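As a minimal sketch of that relative comparison, the snippet below ranks a handful of the per-layer averages from the report in the first post by their share of the summed cycle counts. The function name `relative_cost` and the choice of layers are illustrative assumptions, not part of SNPE. Because the ops run in parallel, the shares only indicate relative cost and do not translate to wall-clock time.

```python
# Average per-layer values copied from the report in the first post.
# They are cycle counts, not microseconds. Only a subset of layers is
# listed here for brevity.
layer_cycles = {
    "bn_data": 614692,
    "relu0": 173676,
    "stage1_unit1_relu2": 6906,
    "stage1_unit1_conv3": 18223,
    "stage1_unit1_sc": 48521,
    "stage1_unit1_relu": 75195,
}


def relative_cost(cycles):
    """Return (name, cycles, share) tuples sorted by descending cycle count.

    `share` is the layer's fraction of the summed cycle counts of the
    layers passed in, not a fraction of the inference time.
    """
    total = sum(cycles.values())
    ranked = sorted(cycles.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, count, count / total) for name, count in ranked]


for name, count, share in relative_cost(layer_cycles):
    print(f"{name:<22}{count:>10}{share:>8.1%}")
```

Layers reported as 0 are left out of the sketch since they contribute nothing to the ranking.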

Thanks

conan353
Join Date: 3 Feb 20
Posts: 12
Posted: Tue, 2021-06-15 23:37

Hi lskraoc,

Thank you very much for the clarification, that's really helpful.

However, may I ask a few more questions please?

Is there any hardware stall caused by resource conflicts (or something else) due to the parallelism? If not, what is the clock frequency of the v68 DSP? With that we could at least roughly estimate the execution time of each op.

Also, do you know of any way to plot a simple timeline of the ops? That would be very helpful for understanding how the ops overlap.

Thanks again,

Conan353

