Hi,
I am trying to run benchmarking on a target running Debian (DragonBoard 820c).
The SNPE version I am using is 1.15.0.0.
The benchmarking script "snpe_bench.py" running on the host machine uses ADB to communicate with the target device, so how can benchmarking run on non-Android platforms? The documentation says: "In addition to Android, there is limited support for LinuxEmbedded, where only the timing measurement is supported". Could someone please tell me how to run benchmarking on my Debian target?
As a temporary solution, I made a hack: I changed the adb shell execution into direct shell execution. I also had to split the code into two parts. The first part runs on the Aarch64 embedded Linux target (it uses the binaries and libraries in the "bin/aarch64-linux-gcc4.9" and "lib/aarch64-linux-gcc4.9" folders, respectively). The second part, which analyses the profiling information and creates the report, runs on the host PC.
With this modification I could run the benchmark on my Debian target for the CPU runtime and generate the report in CSV format.
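For anyone attempting the same hack, a minimal sketch of the idea is below. It assumes a hypothetical `run_cmd` helper (not part of snpe_bench.py, whose internals differ) that either wraps a command in `adb shell` or runs it directly on the local shell of the Debian target:

```python
import subprocess

def run_cmd(cmd, use_adb=False, device_id=None):
    """Run a shell command, either via 'adb shell' (Android host-side
    workflow) or directly on the local shell (embedded Linux target).

    Hypothetical helper for illustration only; the real snpe_bench.py
    structures its adb calls differently.
    """
    if use_adb:
        full_cmd = ["adb"]
        if device_id:
            full_cmd += ["-s", device_id]  # select a specific device
        full_cmd += ["shell", cmd]
    else:
        full_cmd = ["sh", "-c", cmd]       # direct local execution
    result = subprocess.run(full_cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr

# On the Aarch64 Debian target, the generated benchmark commands could
# then be run locally, e.g.:
#   rc, out, err = run_cmd("./snpe-bench_cmds.sh", use_adb=False)
```

The split described above then means only the `use_adb=False` path runs on the target, while the report-generation code stays on the host.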
My questions are...
1) Is this hack correct?
2) The report contains fields such as Load, Deserialize and Create, which are not explained in the latest SNPE SDK documentation.
What do these fields mean?
3) Why does the sum of the per-layer execution times not equal the "Forward Propagate" time? (I see a discrepancy of about 0.7%.)
Thanks in advance.
Hi,
It is great that the modified script is working for you and producing results.
Load: this could be the time taken to load the application into RAM and start execution.
Deserialize: this is the process of converting data stored in memory into the object/data-structure representation it needs to take.
Ex: an image may be stored as pixel values in CSV form; deserialization converts that CSV data back into image data.
Create: this could be the time taken to create the runtime objects from the available data.
The descriptions below should clarify your doubt about the 0.7% discrepancy you mentioned:
Total Inference Time measures the entire execution time of one inference pass. This includes any input and output processing, copying of data, etc. This is measured at the start and end of the execute call.
Forward Propagate measures the time spent executing one inference pass, excluding processing overheads, on one of the accelerator cores. For example, in the case of the GPU, this represents the execution time of all the GPU kernels running on the GPU hardware.
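In other words, the small discrepancy is expected: the extra time is runtime overhead (input/output handling, data copies) measured inside the execute call but outside the accelerator kernels. A tiny sketch with made-up numbers (not taken from your report) illustrates the relationship:

```python
# Hypothetical timings in microseconds, for illustration only.
total_inference_us = 10_070    # measured around the whole execute call
forward_propagate_us = 10_000  # accelerator-kernel time only

# The difference is the processing overhead outside the kernels.
overhead_us = total_inference_us - forward_propagate_us
overhead_pct = 100.0 * overhead_us / total_inference_us
print(f"overhead: {overhead_us} us ({overhead_pct:.2f}%)")
```

With these example numbers the overhead works out to roughly 0.7%, which is in the same ballpark as the discrepancy you observed.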
Hi Sharvin,
I was wondering if you could share your modified benchmark script.
Thank you