Forums - Using C neon intrinsics in inline assembly code.

the.sound.of.ye...
Join Date: 19 Feb 14
Posts: 5
Posted: Mon, 2014-02-24 07:08

Dear All,

I am trying to translate C code to inline assembly to check whether I get a (timing) performance improvement.

However, I am having problems using variables from my C code in the inline assembly code.

 

For example, I have:

u8x8_out             = vqmovn_u16(u16x8_tmp);

and the following assembly code:

asm(
 "VQMOVN.U16 %[out], %[in]\n"
 :[out] "=w" (u8x8_out)
 :[in] "w" (u16x8_tmp)
 :
);

 

And the error message I get is: 

 Error: Neon quad precision register expected -- `vqmovn.u16 d22,d22'

I have seen other examples, but they always seem to show "r" inputs/outputs. Do I always need to load data via general-purpose registers, or through an intermediate quadword/doubleword?

Thanks in advance for the help.

 

Raja Moderator
Join Date: 17 Apr 13
Posts: 42
Posted: Wed, 2014-03-05 16:02

Sorry for the late response.

I tried the example you used and I get vqmovn generated without errors. What is the type of the input argument (u16x8_tmp)? Is it defined as uint16x8_t?

#include "arm_neon.h"
uint8x8_t foo(uint16x8_t u16x8_tmp)
{
  uint8x8_t u8x8_out;
  asm(
    "VQMOVN.U16 %0, %1;\n"
    : "=w"(u8x8_out)
    : "w"(u16x8_tmp)
    :
  );
  return u8x8_out;
}
 

clang -mfloat-abi=softfp -mfpu=neon -ccc-gcc-name -mcpu=krait2 -S asm.c

foo:
@ BB#0:
 vmov d17, r2, r3
 vmov d16, r0, r1
 @APP
 VQMOVN.U16 d16, q8;

 .code 16
 @NO_APP
 vmov r0, r1, d16
 bx lr
.Ltmp0:
 .size foo, .Ltmp0-foo

 

 

 

 

 

the.sound.of.ye...
Join Date: 19 Feb 14
Posts: 5
Posted: Fri, 2014-03-07 07:28

Hi Raja, 

Thanks for your answer. I was using gcc to compile this code, and for some reason it complained about this snippet. I took this specific one from a bug report I found on the web. I did manage to get it to work and switched to the real snippet I intended to use (the UDIV instruction).

For the sake of completeness:

"What is the type of the input arg (u16x8_tmp). Is it defined as uint16x8_t?" Yes, it is defined as that type.
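A note on the gcc error, for anyone who hits the same message: `vqmovn.u16 d22,d22` suggests that gcc printed the 128-bit input operand by its D-register name, whereas the clang output above shows `q8` for the same operand. A possible fix, sketched below, is the `q` operand modifier (`%q[in]`), which asks the compiler to print the Q-register name. The function names `narrow_u16_asm`, `sat_narrow_u16` and `narrow_u16x8` and the `HAVE_A32_NEON` macro are made up for this illustration, and the scalar fallback is only there so the sketch also builds on a non-ARM host. It has not been tested against the original poster's toolchain.

```c
#include <stdint.h>

/* HAVE_A32_NEON is a hypothetical macro for this sketch: 32-bit ARM with NEON. */
#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
#include <arm_neon.h>
#define HAVE_A32_NEON 1
#else
#define HAVE_A32_NEON 0
#endif

#if HAVE_A32_NEON
/* Same instruction as in the thread; the only change is %q[in], which asks
   the compiler to print the Q-register name for the 128-bit input operand. */
static inline uint8x8_t narrow_u16_asm(uint16x8_t in)
{
    uint8x8_t out;
    __asm__("vqmovn.u16 %[out], %q[in]"
            : [out] "=w"(out)
            : [in] "w"(in));
    return out;
}
#endif

/* Scalar reference for one lane of VQMOVN.U16: clamp to 255, then narrow. */
uint8_t sat_narrow_u16(uint16_t x)
{
    return x > UINT8_MAX ? (uint8_t)UINT8_MAX : (uint8_t)x;
}

/* Narrows eight u16 values to u8 with unsigned saturation. */
void narrow_u16x8(const uint16_t in[8], uint8_t out[8])
{
#if HAVE_A32_NEON
    vst1_u8(out, narrow_u16_asm(vld1q_u16(in)));
#else
    for (int i = 0; i < 8; ++i)
        out[i] = sat_narrow_u16(in[i]);
#endif
}
```

The intrinsic `vqmovn_u16` from the original question does the same job without any register-naming concerns, so the inline assembly form is mainly useful for the timing comparison the thread is about.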

 

I also put here the code snippet, for whoever may need it:

 

#include <stdint.h>

uint8_t udiv_function(uint32_t numerator, uint16_t denominator)
{
    uint8_t result;  /* only the low 8 bits of the quotient are kept */
    asm volatile (
        "UDIV %[out], %[num], %[den]"
        :[out] "=r" (result)
        :[num] "r" (numerator), [den] "r" (denominator)
        :
    );
    return result;
}

 

 

I compiled with the suggested flags and it worked. ;)
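For readers reusing the UDIV snippet, here is a slightly more defensive sketch. UDIV operates on full 32-bit registers, so this version passes 32-bit operands and narrows the quotient in C afterwards, instead of binding 8-bit and 16-bit variables directly to "r" operands. The names `udiv_u32` and `udiv_sat_u8` are hypothetical, and saturating the result to 255 is an assumption made for the illustration (the snippet above keeps only the low 8 bits). The feature-macro guard and the C fallback are there so the code also builds on targets without a hardware divider. Returning 0 for a zero denominator is meant to mirror what UDIV is understood to do on ARMv7-A; treat that as an assumption too.

```c
#include <stdint.h>

/* Hypothetical helper: 32-bit unsigned divide, using UDIV where the
   compiler reports a hardware divider on 32-bit ARM. */
uint32_t udiv_u32(uint32_t numerator, uint32_t denominator)
{
#if defined(__arm__) && \
    (defined(__ARM_FEATURE_IDIV) || defined(__ARM_ARCH_EXT_IDIV__))
    uint32_t result;
    __asm__("udiv %[out], %[num], %[den]"
            : [out] "=r"(result)
            : [num] "r"(numerator), [den] "r"(denominator));
    return result;
#else
    /* Fallback in plain C; 0 on divide-by-zero is an assumption chosen to
       match the expected UDIV behaviour on ARMv7-A. */
    return denominator ? numerator / denominator : 0u;
#endif
}

/* Hypothetical wrapper with the thread's argument types. The quotient is
   clamped to 255 rather than silently truncated to 8 bits. */
uint8_t udiv_sat_u8(uint32_t numerator, uint16_t denominator)
{
    uint32_t q = udiv_u32(numerator, (uint32_t)denominator);
    return q > UINT8_MAX ? (uint8_t)UINT8_MAX : (uint8_t)q;
}
```

A call such as `udiv_sat_u8(1000, 2)` yields 255 here, whereas truncating the quotient 500 to 8 bits would give 244, which is worth keeping in mind when comparing results against the plain C division.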

 

 

Francisco

