Hi,
I am using a DragonBoard 810. I am able to run downscaleBy2 and see a roughly 3x performance difference between using ion memory (rpcmem) and HLOS memory (malloc). To look into this, I wrote a program that measures the latency of making a DSP RPC with different input and output ion buffer sizes. The DSP procedure does nothing and returns immediately. The results show that the latency grows along with the ion buffer size.
Here comes the question: the results seem strange, because I am using ion, so according to the fastrpc FAQ in the Hexagon SDK there should be no memory copy in the adsp driver. Besides, if the bottleneck were in cache operations, the latency should stop growing at some point, since the cache size is fixed. Is there anything I am missing?
The following are the results:
INPUT size 4 KB: latency: 162.41 us
INPUT size 8 KB: latency: 133.72 us
INPUT size 16 KB: latency: 137.07 us
INPUT size 32 KB: latency: 141.83 us
INPUT size 64 KB: latency: 151.00 us
INPUT size 128 KB: latency: 168.49 us
INPUT size 256 KB: latency: 202.91 us
INPUT size 512 KB: latency: 271.18 us
INPUT size 1024 KB: latency: 506.78 us
INPUT size 2048 KB: latency: 782.35 us
INPUT size 4096 KB: latency: 1336.25 us
INPUT size 8192 KB: latency: 2431.27 us
INPUT size 16384 KB: latency: 4601.62 us
INPUT size 32768 KB: latency: 9008.67 us
OUTPUT size 4 KB: latency: 120.12 us
OUTPUT size 8 KB: latency: 118.86 us
OUTPUT size 16 KB: latency: 125.13 us
OUTPUT size 32 KB: latency: 127.09 us
OUTPUT size 64 KB: latency: 130.92 us
OUTPUT size 128 KB: latency: 146.01 us
OUTPUT size 256 KB: latency: 182.88 us
OUTPUT size 512 KB: latency: 265.71 us
OUTPUT size 1024 KB: latency: 394.02 us
OUTPUT size 2048 KB: latency: 478.90 us
OUTPUT size 4096 KB: latency: 732.15 us
OUTPUT size 8192 KB: latency: 1359.33 us
OUTPUT size 16384 KB: latency: 2699.10 us
OUTPUT size 32768 KB: latency: 5368.13 us
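To make the trend easier to read, here is a small sketch (not part of my benchmark, just arithmetic on the numbers above) that computes the extra latency per additional KB between consecutive buffer sizes. The names `input_us`, `output_us`, and `marginal_cost` are made up for this illustration. If the cost were bounded by a fixed cache size, I would expect the per-KB figure to fall toward zero for large buffers.

```python
# Latencies copied from the results above: (buffer size in KB, latency in us).
input_us = [
    (4, 162.41), (8, 133.72), (16, 137.07), (32, 141.83), (64, 151.00),
    (128, 168.49), (256, 202.91), (512, 271.18), (1024, 506.78),
    (2048, 782.35), (4096, 1336.25), (8192, 2431.27), (16384, 4601.62),
    (32768, 9008.67),
]
output_us = [
    (4, 120.12), (8, 118.86), (16, 125.13), (32, 127.09), (64, 130.92),
    (128, 146.01), (256, 182.88), (512, 265.71), (1024, 394.02),
    (2048, 478.90), (4096, 732.15), (8192, 1359.33), (16384, 2699.10),
    (32768, 5368.13),
]


def marginal_cost(samples):
    """Return (larger size in KB, extra us per extra KB) for each consecutive pair."""
    return [
        (s1, (t1 - t0) / (s1 - s0))
        for (s0, t0), (s1, t1) in zip(samples, samples[1:])
    ]


for name, samples in (("INPUT", input_us), ("OUTPUT", output_us)):
    for size_kb, us_per_kb in marginal_cost(samples):
        print(f"{name} up to {size_kb:>5} KB: {us_per_kb:8.4f} us per extra KB")
```

For the last doubling (16384 KB to 32768 KB) this works out to roughly 0.27 us per KB for input buffers and roughly 0.16 us per KB for output buffers, so the per-KB cost does not appear to level off at the sizes I measured.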
Thanks,
Roger