I tried out the QHBLAS functions provided in the Hexagon SDK 5.5.0.1 on a Snapdragon 8 Gen 3 device. I was able to use the 32-bit floating-point matrix multiplication functions without any issues.
But with the fixed-point matrix multiplication, I didn't get the results I expected. I wanted to do integer matrix multiplication, so I passed in integers of the required type, but the outputs were off. I observed the following two trends:
1. For the functions whose output type matches the input type - int8 output for int8 inputs, int16 output for int16 inputs, int32 output for int32 inputs - the results are missing the last 7 bits, i.e., they appear right-shifted by 7 bits, so we get very small numbers here.
2. For the functions with a wider output type than the input - int16 output for int8 inputs, int32 output for int16 inputs - the results are exactly double the expected output. In other words, each element in the result matrix is twice what I expected it to be.
This trend is also observed in the dot product functions listed in the qhblas fixed-point category.
Also, interestingly, I observed that in the test cases the reference output is right-shifted by 7 bits and saturated for the matching input/output types, and left-shifted by 1 (i.e., doubled) for the wider output types, so the test cases pass.
Could you please help me understand why the output is scaled (bit-shifted) and how to get the correct results? Is there something wrong with how I used or built these functions? Or is this expected, and are there examples of how to handle it?
Is there some documentation/article on how to work with fixed point/integer multiplication?
This is the documentation I referred to while using the functions:
Hexagon_SDK/5.5.0.1/docs/doxygen/qhl_hvx/group__qhmath__hvx__fixed__point__functions.html
Hi Vijay,
The calculations in the matrix multiplication are based on the Q7 and Q15 fixed-point formats; below is the explanation.
For qhblas_hvx_ah_matrix_vector_mpy_ab(), the inputs are in Q7 and the output is in Q15. It is expected that you will see double the expected results if you don't interpret the inputs as Q7 and the output as Q15.
Internal calculations:
If we multiply two Q7 inputs together:
Input1(Q7) = actual input1 x 2^7
Input2(Q7) = actual input2 x 2^7
Output(Q15) = Input1 x Input2 x 2 = (actual input1 x 2^7) x (actual input2 x 2^7) x 2 = actual output x 2^15 -> actual output in Q15
If you ignore the Q factors, you will see actual output x 2 as the result.
For qhblas_hvx_matrix_matrix_mpy_ab(), the inputs are again in Q7, and the output stays in Q7 (no extra 2^7 factor in the output).
Internal calculations:
Input1(Q7) = actual input1 x 2^7 (Q7)
Input2(Q7) = actual input2 x 2^7 (Q7)
Output(Q7) = Input1 x Input2 / 2^7 = (actual input1 x 2^7) x (actual input2 x 2^7) / 2^7 = actual output x 2^7 -> actual output in Q7
Here, if you don't treat the inputs and outputs as Q7 values, you will see zeros in the output for any result < 128, because of the right shift by 7.
Thanks,
Gayathri.