I tried running the GPU and AIP runtimes in two parallel threads and observed a GPU throughput of 27.5 inferences/second and an AIP throughput of 116.1 inferences/second. But when I ran the model on AIP alone in two parallel threads, I got throughputs of 97.6 (AIP thread 1) and 92.8 (AIP thread 2) inferences/second respectively. From this observation I understand that AIP is actually capable of delivering about 190 inferences/second, but when AIP runs in parallel with the GPU, AIP performance degrades. Why is this happening? Is there a limitation? Can you please explain?
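A quick sanity check of the aggregate numbers (a minimal sketch; the throughput figures are the measurements quoted above):

```python
# Measured per-thread AIP throughputs (inferences/second) from the two-thread AIP-only run
aip_thread1 = 97.6
aip_thread2 = 92.8

# Aggregate AIP throughput when both threads target AIP
aip_total = aip_thread1 + aip_thread2
print(f"Aggregate AIP throughput: {aip_total:.1f} inf/s")  # ~190 inf/s

# AIP throughput observed when sharing the SoC with a GPU thread
aip_with_gpu = 116.1
print(f"Drop vs. AIP-only aggregate: {aip_total - aip_with_gpu:.1f} inf/s")
```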
Thanks in advance
Dear developer,
The AIP runtime is a legacy platform and is no longer supported. You could run the DSP backend instead.
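If you are using SNPE, the backend is typically selected with a runtime flag when invoking the model; a sketch of switching to DSP (flag names assume a recent SNPE SDK and should be checked against your version; `model.dlc` and `inputs.txt` are placeholders):

```shell
# Run the model container on the DSP backend instead of AIP
snpe-net-run --container model.dlc --input_list inputs.txt --use_dsp
```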
BR.
Wei