Qualcomm DSP Hexagon v5
Cycles:
The programmers_ref_v5.pdf (section 1.3.3, Sequencer) states that one instruction or instruction packet executes in one cycle.
To measure cycles I use hexagon_sim_read_cycles, as in the \libcore\SigProc\iir\ examples.
However, each instruction seems to take more than one cycle. Even if I empty the iir_cas_bq.S file (for testing) and add any single instruction, the cycle counter increases by 3 (I expected +1).
So if the code has 5 instructions, I get about 15 cycles instead of 5. Why?
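The delta measurement described above (read the counter, run the code, read it again, and compare against an empty region) can be sketched in C as follows. This is a minimal sketch, assuming a counter that returns a monotonically increasing 64-bit core-cycle count. `cycle_reader_fn`, `measure_cycles`, `sim_read_cycles` and `five_packets` are hypothetical names. The host-side stand-in counter simply models the +3 per packet reported here. The exact signature of `hexagon_sim_read_cycles` should be checked against the SDK headers before wrapping it.

```c
#include <assert.h>
#include <stdint.h>

/* Counter-reading function. On the simulator this would wrap
 * hexagon_sim_read_cycles(); it is a function pointer here so the sketch
 * also builds on a host (hypothetical design, not an SDK API). */
typedef uint64_t (*cycle_reader_fn)(void);

/* Empty region used to measure the cost of the counter reads and the call. */
static void empty_body(void) {}

/* Cycles between two counter reads around one call of body(). */
static uint64_t run_once(cycle_reader_fn read_cycles, void (*body)(void))
{
    uint64_t start = read_cycles();
    body();
    return read_cycles() - start;
}

/* Cycles attributable to body() alone: total minus the empty-region baseline. */
static uint64_t measure_cycles(cycle_reader_fn read_cycles, void (*body)(void))
{
    uint64_t baseline = run_once(read_cycles, empty_body);
    uint64_t total = run_once(read_cycles, body);
    assert(total >= baseline); /* guard against unsigned wraparound */
    return total - baseline;
}

/* Host-side stand-in for the simulator counter (hypothetical): the clock
 * advances by 3 per packet, the single-thread behavior reported in this
 * thread. Each counter read is modeled as one packet. */
static uint64_t sim_clock = 0;

static uint64_t sim_read_cycles(void)
{
    uint64_t now = sim_clock;
    sim_clock += 3; /* the read itself costs one packet */
    return now;
}

/* Stand-in for a code region of 5 packets. */
static void five_packets(void) { sim_clock += 5 * 3; }
```

With the stand-in counter, `measure_cycles(sim_read_cycles, five_packets)` returns 15: five packets at 3 counted cycles each, the same ratio observed above. Subtracting the empty-region baseline removes the cost of the counter reads themselves, so the result reflects only the measured code.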
Optimization:
When I test the assembler function through a C test wrapper compiled without optimization options, the project works properly.
If the wrapper is compiled with optimization (any level), the project does not work. Why?
The test wrapper uses file read/write functions.
If I use the same wrapper with the C prototype (reference) implementation of the function, everything works.
About cycles:
Hexagon V5 has 3 threads.
The \libcore\SigProc\iir\ example uses only one thread; the other two threads are idle. The simulation runs in IMT (interleaved multi-threading) mode, in which the threads take turns issuing packets, so a single thread issues at most once every 3 core cycles. The cycle counter therefore counts one cycle of that thread as 3 cycles, which is the +3 per instruction observed above. DMT (dynamic multi-threading) mode makes the computation faster, but IMT is the worst case (when the other threads are busy).
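The IMT arithmetic above can be written out as a small C sketch. It assumes ideal execution: one packet per thread cycle, no stalls, and no measurement overhead. `HEXAGON_V5_HW_THREADS`, `imt_core_cycles` and `imt_thread_cycles` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of hardware threads on Hexagon V5. */
#define HEXAGON_V5_HW_THREADS 3u

/* In IMT mode a thread issues one packet every `threads` core cycles,
 * so a single-threaded run of `packets` packets spans
 * packets * threads counted cycles (ideal case, no stalls). */
static uint64_t imt_core_cycles(uint64_t packets, unsigned threads)
{
    return packets * threads;
}

/* Inverse: convert a counted cycle value back to per-thread cycles,
 * i.e. the number of packets the measured thread issued. */
static uint64_t imt_thread_cycles(uint64_t core_cycles, unsigned threads)
{
    assert(threads > 0);
    return core_cycles / threads;
}
```

For the 5-instruction case in the question, `imt_core_cycles(5, HEXAGON_V5_HW_THREADS)` gives 15, and dividing a measured 15 by 3 recovers the 5 thread cycles the programmer's reference describes.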