Hey, I'm using a custom model with five parallel output heads to perform vision tasks (such as segmentation and detection) on the Snapdragon 865 DSP runtime.
When running on CPU and GPU, it works fine. But when running on DSP, some heads don't seem to work properly. The result log is shown below:
--------------------------------------------------
Load: 161 us
Deserialize: 12008 us
Create: 107342 us
Init: 120932 us
De-Init: 20620 us
Create Network(s): 13864 us
RPC Init Time: 83541 us
Snpe Accelerator Init Time: 81663 us
Accelerator Init Time: 78929 us
Average SNPE Statistics:
------------------------------
Total Inference Time: 88549 us
Forward Propagate Time: 88496 us
RPC Execute Time: 83420 us
Snpe Accelerator Time: 80768 us
Accelerator Time: 80442 us
Misc Accelerator Time: 1437 us
Layer Times:
---------------
0: 1321 us : DSP
1: 0 us : DSP
2: 20714 us : DSP
3: 2569 us : DSP
4: 0 us : DSP
5: 2099 us : DSP
6: 2099 us : DSP
7: 0 us : DSP
8: 589 us : DSP
9: 0 us : DSP
10: 2102 us : DSP
11: 2103 us : DSP
12: 0 us : DSP
13: 584 us : DSP
14: 0 us : DSP
15: 1235 us : DSP
16: 2034 us : DSP
17: 682 us : DSP
18: 0 us : DSP
19: 294 us : DSP
20: 0 us : DSP
21: 2025 us : DSP
22: 2023 us : DSP
23: 0 us : DSP
24: 294 us : DSP
25: 0 us : DSP
26: 1348 us : DSP
27: 2072 us : DSP
28: 407 us : DSP
29: 0 us : DSP
30: 158 us : DSP
31: 0 us : DSP
32: 2060 us : DSP
33: 2074 us : DSP
34: 0 us : DSP
35: 165 us : DSP
36: 0 us : DSP
37: 1378 us : DSP
38: 2162 us : DSP
39: 315 us : DSP
40: 0 us : DSP
41: 87 us : DSP
42: 0 us : DSP
43: 2147 us : DSP
44: 2131 us : DSP
45: 0 us : DSP
46: 88 us : DSP
47: 136 us : DSP
48: 33 us : DSP
49: 177 us : DSP
50: 99 us : DSP
51: 381 us : DSP
52: 386 us : DSP
53: 321 us : DSP
54: 1492 us : DSP
55: 863 us : DSP
56: 1123 us : DSP
57: 8144 us : DSP
58: 0 us : DSP
59: 0 us : DSP
60: 0 us : DSP
61: 0 us : DSP
62: 0 us : DSP
63: 0 us : DSP
64: 0 us : DSP
65: 0 us : DSP
66: 0 us : DSP
67: 0 us : DSP
68: 0 us : DSP
69: 0 us : DSP
70: 0 us : DSP
71: 0 us : DSP
72: 0 us : DSP
73: 0 us : DSP
74: 0 us : DSP
75: 0 us : DSP
76: 0 us : DSP
77: 0 us : DSP
78: 0 us : DSP
79: 0 us : DSP
80: 0 us : DSP
81: 0 us : DSP
82: 0 us : DSP
83: 0 us : DSP
84: 0 us : DSP
85: 0 us : DSP
86: 0 us : DSP
87: 0 us : DSP
88: 0 us : DSP
89: 0 us : DSP
90: 0 us : DSP
91: 0 us : DSP
92: 0 us : DSP
93: 0 us : DSP
94: 0 us : DSP
95: 0 us : DSP
96: 0 us : DSP
97: 0 us : DSP
98: 0 us : DSP
99: 1268 us : DSP
100: 885 us : DSP
101: 49 us : DSP
102: 8 us : DSP
103: 719 us : DSP
104: 1115 us : DSP
105: 0 us : DSP
106: 1288 us : DSP
107: 898 us : DSP
Layers 58 to 98 are the operations of the first four heads, and layers 99 to 107 belong to the final head.
Some layers are ReLU layers, so a running time of 0 is normal. But layers 58 to 98 don't seem to have been executed at all.
Has anybody met the same problem? Or does the DSP runtime not support multiple outputs?
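In case it helps others check their own logs, here is a rough Python sketch that scans the "Layer Times" section for long runs of layers reporting 0 us. It assumes each layer line has the form `N: T us : RUNTIME` as in the log above and that layer indices are consecutive; the function names and the `min_len` threshold are my own choices, not part of any SNPE tool.

```python
import re
import sys

# Matches lines such as "58: 0 us : DSP" (format taken from the log above).
LAYER_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*us\s*:\s*(\w+)\s*$")


def parse_layer_times(log_text):
    """Return a list of (layer_index, microseconds) for every layer line."""
    times = []
    for line in log_text.splitlines():
        m = LAYER_RE.match(line)
        if m:
            times.append((int(m.group(1)), int(m.group(2))))
    return times


def zero_runs(times, min_len=3):
    """Return (first, last) index pairs for runs of at least min_len
    consecutive layers that report 0 us. Short runs (e.g. a single ReLU)
    are ignored. Assumes layer indices are consecutive."""
    runs = []
    start = None
    prev = None
    for idx, us in times:
        if us == 0:
            if start is None:
                start = idx
            prev = idx
        else:
            if start is not None and prev - start + 1 >= min_len:
                runs.append((start, prev))
            start = None
    # Close a run that reaches the end of the log.
    if start is not None and prev - start + 1 >= min_len:
        runs.append((start, prev))
    return runs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python zero_runs.py benchmark_log.txt  (file name is hypothetical)
    with open(sys.argv[1]) as f:
        for first, last in zero_runs(parse_layer_times(f.read())):
            print(f"layers {first} to {last} all report 0 us")
```

A long run of zeros only suggests that those layers were skipped or not timed separately; it does not by itself prove they were not executed.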
Dear customer,
What execution commands did you use on the target device? And which SNPE SDK version are you using?
Regarding multiple output nodes, you need to list them in input.txt, separated by a space, as in the following reference:
#output_node <empty space> output_node2
input.raw
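If you generate the list file with a script, a minimal Python sketch could look like the following. It only assumes the layout shown above (one line starting with `#` followed by the space-separated output node names, then one input file per line). The helper name `write_input_list` and the node names in the usage comment are hypothetical; please substitute the actual output node names from your model.

```python
def write_input_list(path, output_nodes, input_files):
    """Write an input list file with an output-node header line.

    Layout (assumed from the reference above):
        #<node1> <node2> ...
        <input file 1>
        <input file 2>
    Returns the header line that was written.
    """
    header = "#" + " ".join(output_nodes)
    with open(path, "w") as f:
        f.write(header + "\n")
        for name in input_files:
            f.write(name + "\n")
    return header


# Example call (node names are placeholders, not taken from a real model):
# write_input_list("input.txt", ["output_node", "output_node2"], ["input.raw"])
```

For a model with five heads, all five output node names would go on the header line.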
BR.
Wei