Hi,
When I ran snpe_bench.py on a Snapdragon 888 to get layer-by-layer timings (with '-l detailed' and HTP enabled), the results looked odd.
For example, in the table below the total inference time is reported as 6383 us, yet the first layer alone is reported as 614692 us, which is MUCH longer than the total inference time. Is there a metric mismatch or something?
| DSP_ub_tf8_timing (10 runs) | avg (us) | max (us) |
|---|---|---|
| Load | 284 | 397 |
| Deserialize | 38112 | 41428 |
| Create | 39514 | 41233 |
| Init | 92212 | 96738 |
| De-Init | 6416 | 7346 |
| Create Network(s) | 381 | 393 |
| RPC Init Time | 8275 | 9039 |
| Snpe Accelerator Init Time | 6888 | 7456 |
| Accelerator Init Time | 6681 | 7243 |
| Total Inference Time | 6383 | 6693 |
| Forward Propagate | 6363 | 6676 |
| RPC Execute | 5342 | 5515 |
| Snpe Accelerator | 4842 | 4912 |
| Accelerator | 3818 | 3878 |
| Misc Accelerator | 0 | 0 |
| layer_000 (Name:data Type:data) | 0 | 0 |
| layer_001 (Name:bn_data Type:batchnorm) | 614692 | 619451 |
| layer_002 (Name:conv0 Type:convolutional) | 0 | 0 |
| layer_003 (Name:relu0 Type:neuron) | 173676 | 174932 |
| layer_004 (Name:pooling0 Type:pooling) | 0 | 0 |
| layer_005 (Name:stage1_unit1_conv1 Type:convolutional) | 0 | 0 |
| layer_006 (Name:stage1_unit1_relu1 Type:neuron) | 0 | 0 |
| layer_007 (Name:stage1_unit1_conv2 Type:convolutional) | 0 | 0 |
| layer_008 (Name:stage1_unit1_relu2 Type:neuron) | 6906 | 7144 |
| layer_009 (Name:stage1_unit1_conv3 Type:convolutional) | 18223 | 18264 |
| layer_010 (Name:stage1_unit1_sc Type:convolutional) | 48521 | 49066 |
| layer_011 (Name:stage1_unit1_plus Type:elementwise_op) | 0 | 0 |
| layer_012 (Name:stage1_unit1_relu Type:neuron) | 75195 | 76933 |
| layer_013 (Name:stage1_unit2_conv1 Type:convolutional) | 0 | 0 |
| layer_014 (Name:stage1_unit2_relu1 Type:neuron) | 0 | 0 |
| layer_015 (Name:stage1_unit2_conv2 Type:convolutional) | 0 | 0 |
| layer_016 (Name:stage1_unit2_relu2 Type:neuron) | 9549 | 9699 |
| layer_017 (Name:stage1_unit2_conv3 Type:convolutional) | 20394 | 20497 |
| layer_018 (Name:stage1_unit2_plus Type:elementwise_op) | 0 | 0 |
| layer_019 (Name:stage1_unit2_relu Type:neuron) | 76122 | 76686 |
| layer_020 (Name:stage1_unit3_conv1 Type:convolutional) | 0 | 0 |
| layer_021 (Name:stage1_unit3_relu1 Type:neuron) | 0 | 0 |
| layer_022 (Name:stage1_unit3_conv2 Type:convolutional) | 0 | 0 |
| layer_023 (Name:stage1_unit3_relu2 Type:neuron) | 8817 | 9112 |
| layer_024 (Name:stage1_unit3_conv3 Type:convolutional) | 20364 | 20527 |
| layer_025 (Name:stage1_unit3_plus Type:elementwise_op) | 0 | 0 |
| layer_026 (Name:stage1_unit3_relu Type:neuron) | 79295 | 80135 |
| layer_027 (Name:stage2_unit1_conv1 Type:convolutional) | 0 | 0 |
| layer_028 (Name:stage2_unit1_relu1 Type:neuron) | 0 | 0 |
| layer_029 (Name:stage2_unit1_conv2 Type:convolutional) | 0 | 0 |
| layer_030 (Name:stage2_unit1_relu2 Type:neuron) | 17054 | 17507 |
| layer_031 (Name:stage2_unit1_conv3 Type:convolutional) | 12367 | 12555 |
| layer_032 (Name:stage2_unit1_sc Type:convolutional) | 26945 | 27403 |
| layer_033 (Name:stage2_unit1_plus Type:elementwise_op) | 0 | 0 |
| layer_034 (Name:stage2_unit1_relu Type:neuron) | 25908 | 26128 |
| layer_035 (Name:stage2_unit2_conv1 Type:convolutional) | 0 | 0 |
| layer_036 (Name:stage2_unit2_relu1 Type:neuron) | 0 | 0 |
| layer_037 (Name:stage2_unit2_conv2 Type:convolutional) | 0 | 0 |
| layer_038 (Name:stage2_unit2_relu2 Type:neuron) | 9539 | 9924 |
| layer_039 (Name:stage2_unit2_conv3 Type:convolutional) | 11917 | 11996 |
| layer_040 (Name:stage2_unit2_plus Type:elementwise_op) | 0 | 0 |
| layer_041 (Name:stage2_unit2_relu Type:neuron) | 25905 | 26101 |
| layer_042 (Name:stage2_unit3_conv1 Type:convolutional) | 0 | 0 |
| layer_043 (Name:stage2_unit3_relu1 Type:neuron) | 0 | 0 |
| layer_044 (Name:stage2_unit3_conv2 Type:convolutional) | 0 | 0 |
| layer_045 (Name:stage2_unit3_relu2 Type:neuron) | 6966 | 7180 |
| layer_046 (Name:stage2_unit3_conv3 Type:convolutional) | 11854 | 11900 |
| layer_047 (Name:stage2_unit3_plus Type:elementwise_op) | 0 | 0 |
| layer_048 (Name:stage2_unit3_relu Type:neuron) | 26617 | 26837 |
| layer_049 (Name:stage2_unit4_conv1 Type:convolutional) | 0 | 0 |
| layer_050 (Name:stage2_unit4_relu1 Type:neuron) | 0 | 0 |
| layer_051 (Name:stage2_unit4_conv2 Type:convolutional) | 0 | 0 |
| layer_052 (Name:stage2_unit4_relu2 Type:neuron) | 7465 | 7736 |
| layer_053 (Name:stage2_unit4_conv3 Type:convolutional) | 11907 | 11963 |
| layer_054 (Name:stage2_unit4_plus Type:elementwise_op) | 0 | 0 |
| layer_055 (Name:stage2_unit4_relu Type:neuron) | 26945 | 27431 |
| layer_056 (Name:stage3_unit1_conv1 Type:convolutional) | 0 | 0 |
| layer_057 (Name:stage3_unit1_relu1 Type:neuron) | 0 | 0 |
| layer_058 (Name:stage3_unit1_conv2 Type:convolutional) | 0 | 0 |
| layer_059 (Name:stage3_unit1_relu2 Type:neuron) | 6636 | 6765 |
| layer_060 (Name:stage3_unit1_conv3 Type:convolutional) | 5801 | 5834 |
| layer_061 (Name:stage3_unit1_sc Type:convolutional) | 16244 | 16582 |
| layer_062 (Name:stage3_unit1_plus Type:elementwise_op) | 0 | 0 |
| layer_063 (Name:stage3_unit1_relu Type:neuron) | 7866 | 7979 |
| layer_064 (Name:stage3_unit2_conv1 Type:convolutional) | 0 | 0 |
| layer_065 (Name:stage3_unit2_relu1 Type:neuron) | 0 | 0 |
| layer_066 (Name:stage3_unit2_conv2 Type:convolutional) | 0 | 0 |
| layer_067 (Name:stage3_unit2_relu2 Type:neuron) | 2619 | 2701 |
| layer_068 (Name:stage3_unit2_conv3 Type:convolutional) | 5645 | 5686 |
| layer_069 (Name:stage3_unit2_plus Type:elementwise_op) | 0 | 0 |
| layer_070 (Name:stage3_unit2_relu Type:neuron) | 7810 | 7963 |
| layer_071 (Name:stage3_unit3_conv1 Type:convolutional) | 0 | 0 |
| layer_072 (Name:stage3_unit3_relu1 Type:neuron) | 0 | 0 |
| layer_073 (Name:stage3_unit3_conv2 Type:convolutional) | 0 | 0 |
| layer_074 (Name:stage3_unit3_relu2 Type:neuron) | 2922 | 3171 |
| layer_075 (Name:stage3_unit3_conv3 Type:convolutional) | 5645 | 5671 |
| layer_076 (Name:stage3_unit3_plus Type:elementwise_op) | 0 | 0 |
| layer_077 (Name:stage3_unit3_relu Type:neuron) | 7769 | 7839 |
| layer_078 (Name:stage3_unit4_conv1 Type:convolutional) | 0 | 0 |
| layer_079 (Name:stage3_unit4_relu1 Type:neuron) | 0 | 0 |
| layer_080 (Name:stage3_unit4_conv2 Type:convolutional) | 0 | 0 |
| layer_081 (Name:stage3_unit4_relu2 Type:neuron) | 2634 | 2695 |
| layer_082 (Name:stage3_unit4_conv3 Type:convolutional) | 5945 | 6025 |
| layer_083 (Name:stage3_unit4_plus Type:elementwise_op) | 0 | 0 |
| layer_084 (Name:stage3_unit4_relu Type:neuron) | 7635 | 7841 |
| layer_085 (Name:stage3_unit5_conv1 Type:convolutional) | 0 | 0 |
| layer_086 (Name:stage3_unit5_relu1 Type:neuron) | 0 | 0 |
| layer_087 (Name:stage3_unit5_conv2 Type:convolutional) | 0 | 0 |
| layer_088 (Name:stage3_unit5_relu2 Type:neuron) | 2816 | 2910 |
| layer_089 (Name:stage3_unit5_conv3 Type:convolutional) | 5645 | 5671 |
| layer_090 (Name:stage3_unit5_plus Type:elementwise_op) | 0 | 0 |
Hello conan353,
On targets with DSP architecture v68 (such as the S888), each row of the detailed profiling report gives a per-op result in hardware cycle counts, not in microseconds. Because ops execute in parallel, there is no direct conversion from cycle counts to microseconds. We therefore recommend treating the per-layer cycle counts as a relative measure: use them to compare layers and identify which ops take more or fewer cycles to finish executing.
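To illustrate the "relative comparison" reading of the report, here is a rough sketch (not an official SNPE tool) that ranks layers by their share of the total reported cycles; the values are copied from the avg column above, and the layer subset is just an example:

```python
# Rough sketch: read the detailed report as a relative cost breakdown
# rather than as wall-clock time, by ranking layers by cycle share.

# A few avg-column values copied from the report above (cycle counts).
layer_cycles = {
    "bn_data": 614692,
    "relu0": 173676,
    "stage1_unit1_sc": 48521,
    "stage1_unit1_conv3": 18223,
}

total = sum(layer_cycles.values())

# Sort from most to least expensive and print each layer's cycle share.
for name, cycles in sorted(layer_cycles.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {cycles:8d} cycles  {100 * cycles / total:5.1f}%")
```

This kind of ranking is stable even though the absolute numbers are not time: it tells you which ops dominate the cycle budget.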
Thanks
Hi lskraoc,
Thank you very much for the clarification; that's really helpful.
May I ask a few more questions?
Does the parallelism cause any hardware stalls (from resource conflicts or otherwise)? If not, what is the clock frequency of the v68 DSP? With that, we could at least roughly estimate the execution time of each op.
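To make concrete the kind of rough estimate I have in mind (the 1 GHz clock below is a placeholder I made up, not a known v68 spec, and parallel execution means per-layer cycles would not sum to wall-clock time anyway):

```python
# Back-of-envelope only: cycles / clock_hz -> seconds. The 1 GHz figure
# is a hypothetical placeholder, NOT the documented v68 DSP frequency.
ASSUMED_CLOCK_HZ = 1_000_000_000  # hypothetical placeholder

def cycles_to_us(cycles: int, clock_hz: int = ASSUMED_CLOCK_HZ) -> float:
    """Convert a raw cycle count to microseconds at the given clock."""
    return cycles / clock_hz * 1e6

# e.g. bn_data's 614692 reported cycles:
print(f"{cycles_to_us(614692):.3f} us at the assumed 1 GHz")
```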
Also, do you know of any way to plot a simple timeline of the ops? That would be very helpful for understanding how ops overlap.
Thanks again,
Conan353