Hi
I currently have both the 855 and 865 HDKs and have run some NN inference benchmarks.
According to the official figures, the S865 HMX delivers 8 TOPS (4x the 855 HMX), and the SoC as a whole delivers 15 TOPS (about 2x the 855's 7 TOPS).
However, in my benchmark results, 855 performance is broadly in line with my expectations, while the 865 HMX is far from a 4x speedup. Among the classification networks the best case, ResNet50 (224x224), gains only ~30% (103 to 137 FPS); SSD-ResNet34-300 does better at roughly 1.9x (37 to 69 FPS), which is still well short of 4x. The results are in the tables below.
The number in the last column of each table is the sum of the FPS of the individual modules.
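For reference, here is a minimal sketch of how the HMX speedup figures can be derived. The FPS values are copied from the HMX columns of the two tables in this post; the dictionary and function names are just illustrative, and networks with a missing HMX result on either chip are left out.

```python
# HMX FPS per network as (855, 865), copied from the tables in this post.
# Networks with "-" in either HMX column are omitted.
hmx_fps = {
    "ResNet50v1.5": (99, 127),
    "ResNet18": (207, 225),
    "ResNet34": (133, 165),
    "ResNet50": (103, 137),
    "Mobilenet-V1": (315, 271),
    "SSD-ResNet34-300": (37, 69),
    "Inception v3 (TF)": (86, 111),
}


def speedup(fps_855: float, fps_865: float) -> float:
    """Return the 865-over-855 throughput ratio (1.0 means no change)."""
    return fps_865 / fps_855


speedups = {name: speedup(*fps) for name, fps in hmx_fps.items()}

if __name__ == "__main__":
    for name, ratio in sorted(speedups.items(), key=lambda kv: kv[1]):
        print(f"{name:<20} {ratio:.2f}x")
```

A ratio below 1.0 (Mobilenet-V1 here) means the 865 HMX run was slower than the 855 HMX run.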
Thus my questions are:
1. How can I unlock the full performance of the S865? Is the gap due to SNPE not yet being optimized for it? (I used the latest version, 1.40.)
2. How can I use all of the modules (DSP, AIP, CPU, GPU) concurrently to reach the peak performance (15 TOPS)?
3. What is the exact compute capability of each module? The numbers I have assumed are shown in the table headers.
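To put question 3 in numbers, here is a rough sketch of how the measured FPS can be converted into effective TOPS and compared with the peak I assumed. It assumes 1 MAC = 2 operations, and the peak values (2T for the 855 HMX, 8T for the 865 HMX) are my assumptions from the table headers, not confirmed figures.

```python
OPS_PER_MAC = 2  # assumption: one multiply-accumulate counts as two operations


def effective_tops(gmacs_per_frame: float, fps: float) -> float:
    """Convert per-frame GMACs and measured FPS into effective TOPS."""
    return gmacs_per_frame * fps * OPS_PER_MAC / 1000.0


def utilization(gmacs_per_frame: float, fps: float, peak_tops: float) -> float:
    """Fraction of the assumed peak TOPS that the measured FPS corresponds to."""
    return effective_tops(gmacs_per_frame, fps) / peak_tops


# ResNet50 (224x224): 3.858 GMACs per frame, HMX FPS taken from the tables.
resnet50_gmacs = 3.858
util_855_hmx = utilization(resnet50_gmacs, fps=103, peak_tops=2.0)  # assumed 2T
util_865_hmx = utilization(resnet50_gmacs, fps=137, peak_tops=8.0)  # assumed 8T

if __name__ == "__main__":
    print(f"855 HMX: {util_855_hmx:.1%} of assumed peak")
    print(f"865 HMX: {util_865_hmx:.1%} of assumed peak")
```

Under these assumptions, ResNet50 reaches about 40% of the assumed peak on the 855 HMX but only about 13% on the 865 HMX.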
Thanks,
Conan353
Network            Input      Total GMACs  855 HMX   855 HVX   855 GPU   855 CPU   HMX+HVX+GPU
                                           FPS (2T)  FPS (3T)  FPS (1T)  FPS (1T)  FPS (Assuming) (7T)
ResNet50v1.5       224x224    4.089        99        101       20        3         224
ResNet18           224x224    1.827        207       216       42        8         473
ResNet34           224x224    3.676        133       134       21        3         291
ResNet50           224x224    3.858        103       104       21        3         231
Mobilenet-V1       224x224    0.568        315       270       122       7         714
ResNeXt-101        224x224    7.97         9         21        8         1         39
SSD-Mobilenet-V1   300x300    1.237        -         156       55        3         211
SSD-ResNet34-300   300x300    15.882       37        51        6         1         93
SSD-ResNet34-1200  1200x1200  216.426      -         3         0.44      0         4
Inception v3 (TF)  299x299    5.715        86        97        14        2         210
DeepLab (TF)       513x513    8.832        -         14        7         0.4       21

Network            Input      Total GMACs  865 HMX   865 HVX   GPU       CPU       HMX+HVX+GPU
                                           FPS (8T)  FPS (3T)  FPS (2T)  FPS (2T)  FPS (Assuming) (15T)
ResNet50v1.5       224x224    4.089        127       98        19        4         248
ResNet18           224x224    1.827        225       190       34        12        461
ResNet34           224x224    3.676        165       89        17        3         275
ResNet50           224x224    3.858        137       97        19        5         258
Mobilenet-V1       224x224    0.568        271       250       110       4         635
ResNeXt-101        224x224    7.97         -         23        7         2         32
SSD-Mobilenet-V1   300x300    1.237        133       146       55        2         336
SSD-ResNet34-300   300x300    15.882       69        49        7         1         125
SSD-ResNet34-1200  1200x1200  216.426      4         3         1         -         8
Inception v3 (TF)  299x299    5.715        111       94        13        3         222
DeepLab (TF)       513x513    8.832        13        13        7         0.7       34