SNPE execution on GPU takes longer than on CPU
I'm using a non-quantized DLC model for text recognition from JPEG images.

The DLC input is 1,1,32,320: a 1-channel JPEG, height 32, width 320, containing 6 characters.

When I execute on the CPU runtime, it returns in less than 20 ms.

But when I execute on the GPU runtime, it takes more than 1.2 s!

Has anyone encountered this problem? How can I fix it?


Dear customer,

Could you please share the commands you used, so that we can analyze the problem?



Hi yunxqin,

Thanks for your reply.

I checked "Limitations and Issues" in the SNPE documentation and found:

  • Convolution
    • For GPU runtime, when the number of groups is greater than 1, the number of output channels must be a multiple of 4 * the number of groups. For example, with 2 groups, the number of output channels must be a multiple of 8 (4*2=8).

And in my model, I do use nn.Conv2d with groups. Here is a segment of my model:

nn.Conv2d(32, 32, 3, 1, 1, groups=32, bias=False)
My number of output channels is the same as the number of groups, 32. Does this make GPU execution take longer than CPU?
BTW, on Snapdragon 8 Gen 1/2, the GPU model output is wrong. But on Snapdragon 965, the same GPU model gives the right output, only slowly.
My SDK version is the latest, SNPE 2.9.0.
Waiting for your reply, thank you.
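
For reference, here is a minimal sketch of the rule quoted above, applied to the layer in question. The function name meets_gpu_group_rule is hypothetical and not part of SNPE or PyTorch; it only does the arithmetic from the documentation (output channels must be a multiple of 4 * groups when groups > 1). It does not confirm that this rule is the cause of the slowdown or of the wrong output.

```python
def meets_gpu_group_rule(out_channels: int, groups: int) -> bool:
    """Check the quoted GPU rule for grouped convolutions.

    Hypothetical helper: it encodes only the documented constraint that,
    when groups > 1, out_channels must be a multiple of 4 * groups.
    """
    if groups <= 1:
        # The quoted rule only applies when the number of groups is greater than 1.
        return True
    return out_channels % (4 * groups) == 0


# Layer from the post: nn.Conv2d(32, 32, 3, 1, 1, groups=32, bias=False)
print(meets_gpu_group_rule(out_channels=32, groups=32))  # False: 32 is not a multiple of 128

# Example from the documentation: 2 groups need a multiple of 8 output channels
print(meets_gpu_group_rule(out_channels=8, groups=2))    # True
```

So by this check the layer above does not meet the documented GPU constraint, since 4 * 32 = 128 and the layer has only 32 output channels.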
