Hi,
I have a model with an attention mechanism, and I found that the MatMul op in the attention block runs 10x faster on the CPU than on the DSP.
Is there any way to run the MatMul op on the CPU and the other ops on the DSP?
I have tried HTA partitions and cpu_fallbacks.
How to run some layers that the DSP supports on the CPU
Posted: Mon, 2022-02-21 18:54
Hi,
I found that a UDO can satisfy my requirement, so I wrote a MatMul UDO to override the original MatMul.
The UDO works fine on CPU and GPU, but not on DSP.
I've checked that MatMul on the DSP runtime works fine, the same as on CPU.
But I still get the error messages shown below.
Environment: SNPE v1.59
Platform: arm64-v8a