Hello everyone,
I ran the calculator_multi_domain example from Hexagon_Examples.
I added timing code at the start and end of "sum", on both the ARM (local) side and the DSP side.
Each version ran 1000 times on 1024 data elements, and the results surprised me:
DSP time cost: about 800 ms
ARM time cost: about 0.7 ms
I'm very confused about why my DSP program is so slow.
Could anyone explain why?
My Qualcomm chip is DM845, and my build command is: make tree VERBOSE=1 V=hexagon_Release_dynamic_toolv83_v65