Dear All,
I am trying to translate C code to inline assembly to check whether I get a (timing) performance improvement.
However, I am having trouble using my C variables inside the inline assembly code.
For example, I have:
u8x8_out = vqmovn_u16(u16x8_tmp);
and the following assembly code:
And the error message I get is:
Error: Neon quad precision register expected -- `vqmovn.u16 d22,d22'
I have seen other examples, but they always seem to use "r" inputs/outputs. Do I always need to load the data via core registers, or through an intermediate quadword/doubleword register?
Thanks in advance for the help.
Sorry for the late response.
I tried the example you used, and vqmovn is generated without errors. What is the type of the input argument (u16x8_tmp)? Is it defined as uint16x8_t?
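#include "arm_neon.h"

uint8x8_t foo(uint16x8_t u16x8_tmp)
{
    uint8x8_t u8x8_out;
    asm(
        "VQMOVN.U16 %0, %1;\n"
        : "=w"(u8x8_out)   /* output: 64-bit NEON/VFP register ("w" constraint) */
        : "w"(u16x8_tmp)   /* input: 128-bit NEON register */
        :                  /* no clobbers */
    );
    return u8x8_out;
}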
clang -mfloat-abi=softfp -mfpu=neon -ccc-gcc-name -mcpu=krait2 -S asm.c
foo:
@ BB#0:
vmov d17, r2, r3
vmov d16, r0, r1
@APP
VQMOVN.U16 d16, q8;
.code 16
@NO_APP
vmov r0, r1, d16
bx lr
.Ltmp0:
.size foo, .Ltmp0-foo
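In the clang output above the 128-bit operand is printed as q8. The error in the original question ("vqmovn.u16 d22,d22") suggests that GCC instead printed the 128-bit operand as a d register when plain %1 was used. A minimal sketch of a GCC-friendly variant, assuming the ARM operand modifiers %P (print as a double-precision d register) and %q (print as a NEON quad q register), which GCC and, as far as I know, clang accept for 32-bit ARM targets. The function name narrow_u16_to_u8_sat is hypothetical, and the scalar fallback exists only so the sketch also compiles on non-NEON targets; it implements the same saturating narrow as vqmovn_u16.

```c
#include <stdint.h>

#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
#include <arm_neon.h>
#endif

/* Hypothetical helper: saturating narrow of eight uint16_t values to
 * uint8_t, i.e. the VQMOVN.U16 / vqmovn_u16 operation (values > 255
 * become 255). */
void narrow_u16_to_u8_sat(const uint16_t in[8], uint8_t out[8])
{
#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
    uint16x8_t src = vld1q_u16(in);
    uint8x8_t dst;
    /* %P0 prints the 64-bit output as dN, %q1 prints the 128-bit input
     * as qN. Without %q, GCC may print only a d register for the input,
     * which the assembler rejects for VQMOVN. */
    __asm__("vqmovn.u16 %P0, %q1" : "=w"(dst) : "w"(src));
    vst1_u8(out, dst);
#else
    /* Portable fallback with the same saturating semantics. */
    for (int i = 0; i < 8; ++i)
        out[i] = in[i] > UINT8_MAX ? UINT8_MAX : (uint8_t)in[i];
#endif
}
```

So NEON values do not need to go through core registers: the "w" constraint places them directly in NEON/VFP registers, and the modifiers only control how the register name is printed into the instruction.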
Hi Raja,
Thanks for your answer. I was compiling this code with gcc, and for some reason it complained about this snippet. This specific snippet came from a bug report I found on the web. I did manage to get it to work, and then switched to the snippet I actually intended to use (the UDIV instruction).
For the sake of completeness:
What is the type of the input arg (u16x8_tmp)? Is it defined as uint16x8_t? Yes, it is defined as that type.
I am also including the code snippet here, for whoever may need it:
I compiled with the suggested flags and it worked. ;)
Francisco
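The UDIV snippet mentioned above is not preserved in this thread. As a minimal sketch of the same idea (not Francisco's code), the helper below shows that core-register operands use the "r" constraint, while NEON/VFP operands use "w". The name udiv32 is hypothetical, and the sketch assumes the toolchain defines __ARM_FEATURE_IDIV or __ARM_ARCH_EXT_IDIV__ when the selected 32-bit ARM target has a hardware divider; otherwise it falls back to plain C division.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned 32-bit division via UDIV when the
 * target is assumed to support it, plain C division otherwise. */
static inline uint32_t udiv32(uint32_t n, uint32_t d)
{
    assert(d != 0); /* C division by zero is undefined behaviour */
#if defined(__arm__) && (defined(__ARM_FEATURE_IDIV) || defined(__ARM_ARCH_EXT_IDIV__))
    uint32_t q;
    /* "r" places each operand in a general-purpose core register. */
    __asm__("udiv %0, %1, %2" : "=r"(q) : "r"(n), "r"(d));
    return q;
#else
    return n / d;
#endif
}
```

For example, udiv32(100u, 7u) yields 14 on either path.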