Hi everyone!
We are looking to run a fairly large model on device in a quick "burst", meaning as fast as possible but only once in a while and not continuously.
Would it make sense to run it on the GPU and HTP in parallel? We have to run on patches anyhow due to memory constraints and could therefore execute half of them on GPU and the other half on HTP.
To the best of our understanding, both GPU and HTP are QNN backends and both support float models (at least on selected devices).
Thank you in advance, any help is appreciated!
Manuel
Dear developer,
The GPU and HTP are two different backends on our platform. You should create two SNPE instances in your application: one for the GPU backend and another for the HTP backend.
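Conceptually, the dispatch could look like the sketch below. This is only an illustration of the two-instance pattern, not SNPE code: `run_on_gpu` and `run_on_htp` are hypothetical stand-ins for the execute call of each backend's SNPE instance, and the patch split assumes both halves can be processed independently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: in a real app each function would wrap the
# execute() call of one SNPE instance (one built for the GPU backend,
# one for the HTP backend).
def run_on_gpu(patch):
    return patch * 2  # placeholder for the GPU instance's inference

def run_on_htp(patch):
    return patch * 2  # placeholder for the HTP instance's inference

def run_burst(patches):
    # Split the patches between the two backends and run both halves
    # concurrently, one worker thread per SNPE instance.
    mid = len(patches) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        gpu_half = pool.submit(lambda ps: [run_on_gpu(p) for p in ps],
                               patches[:mid])
        htp_half = pool.submit(lambda ps: [run_on_htp(p) for p in ps],
                               patches[mid:])
        return gpu_half.result() + htp_half.result()
```

Note that an even 50/50 split is only optimal if both backends run a patch at the same speed; in practice you may want to size the halves by each backend's measured throughput.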
BR.
Wei
Dear Wei,
Thank you for getting back to me!
So does that mean we can use the GPU and HTP simultaneously if we create one SNPE instance for each of them?
And could we then process a given amount of data up to 2x as fast, because we run one half of the patches on the GPU and the other half on the HTP?
Thank you,
Manuel