LinuxQuestions.org
Old 03-13-2024, 12:18 PM   #1
vmelkon
Member
 
Registered: Feb 2007
Location: Canada
Distribution: Kubuntu 22.04
Posts: 549

Rep: Reputation: 84
Inline assembly - How to deal with arrays in AT&T style?


Hello all,

I know that there are many choices in the world of programming ...
In my case, I have some code that I had written in Intel style for Microsoft VC++ 6. It uses 32-bit addresses, ordinary x86 instructions, and MMX and SSE in some places.

On Linux, I use Qt Creator as my IDE. I think that underneath it, it uses the g++ compiler.

Step 1: I updated the code to use 64 bit addresses.
Step 2: There are more registers in 64-bit mode, so I made minor changes to use more of them.
Step 3: Compile the Intel code under Linux? Some people mention using a compiler flag for gcc. I did not do this.
Step 4: So, I learned AT&T style. I learned asm extended assembly for gcc and went ahead and converted from Intel Style to AT&T style. (Maybe I’m nuts?)
Step 5: That %0, %1, %2 stuff. Oh boy! Too late did I learn that it is possible to use labels.

MY MAIN QUESTION:
In my Intel style code, I have
Code:
addps xmm0, xmmword ptr[Global_NfloatArray]
where Global_NfloatArray is some float array.

In AT&T style
Code:
addps %0, %%xmm0;
where %0 represents Global_NfloatArray.
It compiles but doesn’t work.
I think it is because it copies the address to xmm0 instead of referencing the RAM pointed to by pointer Global_NfloatArray

So, I guess I need to write
Code:
addps (%0), %%xmm0;
but that doesn’t compile.


Another case:
In my Intel style code, I have
Code:
fld dword ptr[t1]
fld dword ptr[t1+4]
In AT&T style
Code:
fld %1; ?????????
fld %1+4;
where %1 is float t1[100]

I tried to learn by example but can’t find what I am looking for.
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
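
A minimal sketch of the two usual ways to get a memory reference out of an operand, assuming x86-64 and GCC or Clang. With an "m" constraint the operand already expands to a memory reference, so no parentheses are written around it. With an "r" constraint the pointer value sits in a register, and (%[name]) dereferences it. The array below is a hypothetical stand-in for Global_NfloatArray, and the function names are made up for illustration.

```c
/* Hypothetical stand-in for the poster's global array. addps with a
   memory source operand requires 16-byte alignment, hence the attribute. */
float Global_NfloatArray[4] __attribute__((aligned(16))) =
    {1.0f, 2.0f, 3.0f, 4.0f};

/* acc[i] += Global_NfloatArray[i] for i = 0..3.
   "m" operands expand to a complete memory reference, so they are used
   bare. The cast to float (*)[4] tells the compiler that all 16 bytes
   are accessed, not only the first float. */
void add_with_m_constraint(float acc[4])
{
    __asm__ volatile (
        "movups %[acc], %%xmm0\n\t"   /* unaligned load of acc[0..3]   */
        "addps  %[src], %%xmm0\n\t"   /* add four floats from memory   */
        "movups %%xmm0, %[acc]"       /* store the result back         */
        : [acc] "+m" (*(float (*)[4]) acc)
        : [src] "m" (*(const float (*)[4]) Global_NfloatArray)
        : "xmm0");
}

/* Same operation, but the pointers are passed in registers ("r"), so the
   template dereferences them with parentheses. The "memory" clobber is
   needed because the compiler cannot see which memory is touched. */
void add_with_r_constraint(float acc[4], const float *src)
{
    __asm__ volatile (
        "movups (%[acc]), %%xmm0\n\t"
        "movups (%[src]), %%xmm1\n\t"
        "addps  %%xmm1, %%xmm0\n\t"
        "movups %%xmm0, (%[acc])"
        :
        : [acc] "r" (acc), [src] "r" (src)
        : "xmm0", "xmm1", "memory");
}
```

Writing (%0) when %0 is an "m" operand wraps a second pair of parentheses around something that is already a memory reference, which is the likely reason the assembler rejects it.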
 
Old 03-13-2024, 07:56 PM   #2
vmelkon
Member
 
Registered: Feb 2007
Location: Canada
Distribution: Kubuntu 22.04
Posts: 549

Original Poster
Rep: Reputation: 84
After a week of searching, I found the solution.
Apparently, if you have an array that is global, there is something special you need to do.
You need to write (*Global_t1)

For example:
Code:
sint64 Test4000()
{
	sint64 returnVal;
	Global_t1[0]=57.0;
	Global_t1[1]=11.0;

	//This is 64 bit code for Linux
	asm volatile
	(
		"movaps		%0, %%xmm0;"
		"mulps		%%xmm0, %%xmm0;"
		"movaps		%%xmm0, %0;"
		:
		: "m" (*Global_t1)
		:
	);

	return returnVal;
}
 
Old 03-13-2024, 09:58 PM   #3
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,781

Rep: Reputation: 2082
Quote:
Originally Posted by vmelkon View Post
Apparently, if you have an array that is global, there is something special you need to do.
You need to write (*Global_t1)
Hmm, I get the same code output with (Global_t1) as with (*Global_t1). I would suggest using register constraints and then letting the compiler figure out how to move the data instead of hard-coding movaps though:

Code:
    asm ("mulps %0, %0\n"
      : "+v" (Global_t1) /* + means read/write, v means "Any EVEX encodable SSE
                            register (%xmm0-%xmm31)." */);
This is generating the following for me at -O1:
Code:
        movq    xmm0, QWORD PTR Global_t1[rip]
        mulps   xmm0, xmm0

        movq    QWORD PTR Global_t1[rip], xmm0
References:
https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html
https://gcc.gnu.org/onlinedocs/gcc/M...nstraints.html (search for x86 family)
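
A minimal sketch of the register-constraint idea for a full 128-bit value, assuming GCC's vector extension (also supported by Clang). The type name v4sf and the function square4 are hypothetical. It uses the "x" constraint (any SSE register) rather than "v". Because the operand is a 16-byte vector type, the compiler moves all four floats in and out of the register itself.

```c
/* Four packed floats, 16 bytes wide (GCC/Clang vector extension). */
typedef float v4sf __attribute__((vector_size(16)));

/* Square each of the four lanes. "+x" means: keep v in some SSE register
   chosen by the compiler, and treat it as both input and output. No
   explicit movaps and no clobber list are needed. */
v4sf square4(v4sf v)
{
    __asm__ ("mulps %0, %0" : "+x" (v));
    return v;
}
```

A caller could use it as `v4sf r = square4((v4sf){1.0f, 2.0f, 3.0f, 4.0f});` and then read the lanes with `r[0]` to `r[3]`.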
 
Old 03-14-2024, 03:54 PM   #4
vmelkon
Member
 
Registered: Feb 2007
Location: Canada
Distribution: Kubuntu 22.04
Posts: 549

Original Poster
Rep: Reputation: 84
movq seems to copy only 64 bits, but I want 128 bits, since it would be copying 4 floats into xmm0. (According to https://www.felixcloutier.com/x86/movq)

I find the constraint codes confusing. There is "r" and "+r" and "g" and various others.
Some codes seem to request a specific register; for example, "a" is supposed to mean rax.
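
A small sketch of how a few of these codes behave, assuming x86-64 and GCC or Clang; all function names are hypothetical. "r" lets the compiler pick any general-purpose register, a leading "=" marks a write-only output, a leading "+" marks an operand that is both read and written, and "a" / "d" pin the operand to the rax and rdx families. ("g" would additionally allow memory or an immediate.)

```c
#include <stdint.h>

/* "=r": write-only output in any general register.
   "r" : read-only inputs in any general registers. */
uint64_t add_any_reg(uint64_t a, uint64_t b)
{
    uint64_t sum;
    __asm__ ("leaq (%[a],%[b]), %[sum]"
             : [sum] "=r" (sum)
             : [a] "r" (a), [b] "r" (b));
    return sum;
}

/* "+r": the same register is read and then written. */
uint64_t increment(uint64_t x)
{
    __asm__ ("incq %[x]" : [x] "+r" (x) : : "cc");
    return x;
}

/* "a" and "d" are not suggestions: mull always multiplies eax by its
   operand and leaves the 64-bit product in edx:eax, so the operands
   must be pinned to exactly those registers. */
uint64_t mul_32x32(uint32_t lhs, uint32_t rhs)
{
    uint32_t lo = lhs, hi;
    __asm__ ("mull %[rhs]"
             : "+a" (lo), "=d" (hi)
             : [rhs] "r" (rhs)
             : "cc");
    return ((uint64_t) hi << 32) | lo;
}
```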
 
Old 03-14-2024, 06:09 PM   #5
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,781

Rep: Reputation: 2082
Quote:
Originally Posted by vmelkon View Post
movq seems to copy 64 bit but I want 128 bit, since it would be for copying 4 floats to a xmm0. (According to https://www.felixcloutier.com/x86/movq)
Oh, I saw 2 floats in your example and assumed that was the whole array. Actually, now I'm seeing my suggestion only works with an array up to size 4 (and the size has to be visible at compile time). So it's probably not what you want anyway.

Your code in #2 looks good except that the constraints you put don't tell the compiler that you are using xmm0, and reading from Global_t1. I would update it like this:
Code:
    asm (
         "movaps %0, %%xmm0;\n"
         "mulps %%xmm0, %%xmm0;\n"
         "movaps %%xmm0, %0"
         : "+m" (*Global_t1) /* out (+ means also read as input) */
         :                   /* in */
         : "xmm0"            /* clobber (tell compiler we are overwriting xmm0) */
         );
 
Old 03-15-2024, 01:47 PM   #6
vmelkon
Member
 
Registered: Feb 2007
Location: Canada
Distribution: Kubuntu 22.04
Posts: 549

Original Poster
Rep: Reputation: 84
For the clobber list, it doesn't seem to recognize r8, r9 and all the way to r15.
This is weird since I used r8 and r9 in my assembly code. (I did not run the code yet).
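
For comparison, a minimal sketch in which r8 and r9 appear in the clobber list, assuming a 64-bit build (the function name is hypothetical). In the clobber list the names are written without the % prefix. These registers do not exist in a 32-bit build (for example with -m32), which may be one reason a compiler rejects them.

```c
#include <stdint.h>

/* Adds a and b using r8 and r9 as scratch registers. Listing them as
   clobbers keeps the compiler from placing any operand in r8 or r9 and
   from assuming their old contents survive the asm statement. */
uint64_t sum_via_r8_r9(uint64_t a, uint64_t b)
{
    uint64_t out;
    __asm__ (
        "movq %[a], %%r8\n\t"
        "movq %[b], %%r9\n\t"
        "addq %%r9, %%r8\n\t"
        "movq %%r8, %[out]"
        : [out] "=r" (out)
        : [a] "r" (a), [b] "r" (b)
        : "r8", "r9", "cc");
    return out;
}
```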

Also, I forgot to give the other example:
If you want to operate from RAM directly
Intel style code is something like
Code:
float t1[100];      //Globally declared variable
fld dword ptr[t1]
fld dword ptr[t1+4]
fadd dword ptr[t1+8], st(0)
AT&T style would be
Code:
fld %0;
fld 4%0;        //Notice the 4. This adds 4 bytes to the address of t1
fadd 8%0, %%st(0);      //Notice the 8. This adds 8 bytes to the address of t1
 
Old 03-15-2024, 10:57 PM   #7
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,781

Rep: Reputation: 2082
Quote:
Originally Posted by vmelkon View Post
For the clobber list, it doesn't seem to recognize r8, r9 and all the way to r15.
This is weird since I used r8 and r9 in my assembly code. (I did not run the code yet).
Hmm, the following compiles for me:
Code:
  /* Add the first 3 elements of t1, and put the result in x */
  float x;
  asm (
       "flds %1;\n"
       "flds %2;\n"
       "faddp;\n"
       "fadds %3;\n"
       : "=t" (x)     /* out. t is "Top of 80387 floating-point stack (%st(0))." */
       : "m" (t1), "m"(t1[1]), "m"(t1[2]) /* in */
       : "st(1)"                         /* clobber */
       );
Quote:
Also, I forgot to give the other example:
If you want to operate from RAM directly
Intel style code is something like
Code:
float t1[100];      //Globally declared variable
fld dword ptr[t1]
fld dword ptr[t1+4]
fadd dword ptr[t1+8], st(0)
I'm not sure exactly how to get this offset addressing thing working, the compiler seems to prefer %rip relative instead. For example:

Code:
  /* Add the first 3 elements of t1, and put the result in x */
  float x;
  asm (
       "flds %1;\n"
       "flds %2;\n"
       "faddp;\n"
       "fadds %3;\n"
    :  "=t" (x)                      /* out */
    :  "m" (t1), "m"(t1[1]), "m"(t1[2]) /* in */
    :  "st(1)"                         /* clobber */
       );
produces this disassembly:
Code:
   0x00000001400014e5 <+4>:     flds   0x6b15(%rip)        # 0x140008000 <t1>
   0x00000001400014eb <+10>:    flds   0x6b13(%rip)        # 0x140008004 <t1+4>
   0x00000001400014f1 <+16>:    faddp  %st,%st(1)
   0x00000001400014f3 <+18>:    fadds  0x6b0f(%rip)        # 0x140008008 <t1+8>
   0x00000001400014f9 <+24>:    fstps  0xc(%rsp)
(hopefully I got the suffixes right; it seems to give the right output, but I was basically just guessing until it stopped throwing warnings at me)
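
One way to get explicit offset addressing, sketched under the assumption that the array's address is passed in a register instead of as an "m" operand (the function name is hypothetical). With an "r" operand, the displacement(base) form such as 4(%[t]) is ordinary AT&T syntax and does not depend on how the compiler would have printed a memory operand.

```c
/* Returns t[0] + t[1] + t[2] computed on the x87 stack.
   [t] "r" puts the pointer in a general register, so 4(%[t]) and
   8(%[t]) address the second and third float. The extra "m" input is
   never referenced in the template; it only tells the compiler that
   the three floats are read by the asm. */
float sum3_with_offsets(const float *t)
{
    float x;
    __asm__ (
        "flds  (%[t])\n\t"    /* push t[0] onto the x87 stack */
        "fadds 4(%[t])\n\t"   /* st(0) += t[1]                */
        "fadds 8(%[t])"       /* st(0) += t[2]                */
        : "=t" (x)
        : [t] "r" (t), "m" (*(const float (*)[3]) t));
    return x;
}
```

Only one x87 slot is used here, and it is the "=t" output, so no st(1) clobber is listed.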
 
Old 03-16-2024, 04:39 PM   #8
vmelkon
Member
 
Registered: Feb 2007
Location: Canada
Distribution: Kubuntu 22.04
Posts: 549

Original Poster
Rep: Reputation: 84
I didn't even know that you could write "=t" (x).

I think you are having trouble because you wrote
: "m" (t1), "m"(t1[1]), "m"(t1[2]) /* in */
If your array size is huge, that is too much work.
I would do
Code:
float Global_t1[100];
void function()
{
float x;
  asm volatile (
       "flds %0;\n"
       "flds 4%0;\n" /////Address of Global_t1 + 4 bytes
       "fadds 8%0, st(0);\n"  /////Address of Global_t1 + 8 bytes
    :  "=t" (x)                      /* out */
    :  "m" (*Global_t1) /* in */
    :  "st(1)"                         /* clobber */
       );
}
and maybe st(0) needs to be in the clobber as well.

A nicer solution is to use labels to avoid the %0, %1, %2 stuff.

Code:
float Global_t1[100];
void function()
{
float x;
  asm volatile (
       "flds %[Global_t1];\n"
       "flds 4%[Global_t1];\n" /////Address of Global_t1 + 4 bytes
       "fadds 8%[Global_t1], st(0);\n"  /////Address of Global_t1 + 8 bytes
    :  "=t" (x)                      /* out */
    :  [Global_t1] "m" (*Global_t1) /* in */
    :  "st(1)"                         /* clobber */
       );
}

Last edited by vmelkon; 03-16-2024 at 04:41 PM.
 
Old 03-16-2024, 07:45 PM   #9
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,781

Rep: Reputation: 2082
Quote:
Originally Posted by vmelkon View Post
Code:
float x;
  asm volatile (
       "flds %0;\n"
       "flds 4%0;\n" /////Address of Global_t1 + 4 bytes
       "fadds 8%0, st(0);\n"  /////Address of Global_t1 + 8 bytes
Hmm, it compiles on godbolt.org, but on Mingw I get "Error: junk `((%rcx))' after expression" and on my Debian box I get "Error: invalid instruction suffix for `fld'".

Quote:
and maybe st(0) needs to be in the clobber as well.
The "t" is st(0), and it's already listed as an output, so it doesn't need to be in clobber.
 
Old 03-17-2024, 12:47 AM   #10
vmelkon
Member
 
Registered: Feb 2007
Location: Canada
Distribution: Kubuntu 22.04
Posts: 549

Original Poster
Rep: Reputation: 84
This one works: x receives 70.55.
If I instead put : [x] "=t" (x)
then x gets a NaN.

Code:
sint64 Test4000()
{
	sint64 returnVal;
	Global_t1[0]=57.0;
	Global_t1[1]=11.0;
	Global_t1[2]=2.55;
	float x;

	//This is 64 bit code for Linux
	asm volatile
	(
		"fld		%[Global_t1];"			//Load to register st(0)
		"fadd		4%[Global_t1];"			//Add value to what is already in st(0)
		"fadd		8%[Global_t1];"			//Add value to what is already in st(0)
		"fstp		%[x];"			//Store and pop FPU stack. Write to x
		//"fld		4%0;"
		//"faddp		%%st(1), %%st(0);"
		//"fstp		4%0;"
		//"movaps		%0, %%xmm0;"
		//"mulps		%%xmm0, %%xmm0;"
		//"movaps		%%xmm0, %0;"
		: [x] "=m" (x)
		: [Global_t1] "m" (*Global_t1)
		: "st(1)"
	);

	return returnVal;
}
I don't know exactly what flds is. It seems to be a gcc invention. There are others like fldt (https://docs.oracle.com/cd/E19455-01...0ah/index.html)
I don't think those are real x86 FPU instructions, so I replaced them with fld.
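
On the suffixes: in AT&T syntax, flds, fldl and fldt are the same x87 FLD instruction with the operand size spelled as a suffix, where Intel syntax would write dword ptr, qword ptr and tbyte ptr. A minimal sketch, assuming x86-64 and GCC or Clang (the function names are hypothetical):

```c
/* s = single precision, 32-bit float (Intel: fld dword ptr [...]) */
float load_single(const float *p)
{
    float r;
    __asm__ ("flds %1" : "=t" (r) : "m" (*p));
    return r;
}

/* l = double precision, 64-bit double (Intel: fld qword ptr [...]) */
double load_double(const double *p)
{
    double r;
    __asm__ ("fldl %1" : "=t" (r) : "m" (*p));
    return r;
}

/* t = extended precision, 80-bit long double (Intel: fld tbyte ptr [...]) */
long double load_extended(const long double *p)
{
    long double r;
    __asm__ ("fldt %1" : "=t" (r) : "m" (*p));
    return r;
}
```

A memory operand carries no size of its own in AT&T syntax, so a bare fld leaves the assembler to guess or complain, which would fit the different results reported across assemblers earlier in the thread.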
 
Old 03-17-2024, 09:36 PM   #11
vmelkon
Member
 
Registered: Feb 2007
Location: Canada
Distribution: Kubuntu 22.04
Posts: 549

Original Poster
Rep: Reputation: 84
OK, if you use
: [x] "=t" (x)

then
"fstp %[x];" //Store and pop FPU stack. Write to x
needs to be commented out and x receives 70.55.
 
Old 03-17-2024, 11:11 PM   #12
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,781

Rep: Reputation: 2082
Quote:
Originally Posted by vmelkon View Post
OK, if you use
: [x] "=t" (x)

then
"fstp %[x];" //Store and pop FPU stack. Write to x
needs to be commented out and x receives 70.55.
Yeah, because otherwise it will try to pop st(0) into st(0) which makes no sense.
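
The two variants side by side, as a minimal sketch assuming x86-64 and GCC or Clang (function names are hypothetical). With a memory output the asm must store and pop the result itself. With "=t" the asm must leave the result in st(0), and the compiler emits the store and pop afterwards. Listing "st" as a clobber in the first variant is an assumption about how to tell the compiler that one x87 slot is used temporarily.

```c
/* Variant 1: x is a memory output ("=m"), so the template ends with
   fstps, which writes st(0) to x and pops the x87 stack. */
float sum3_store_to_memory(const float t[3])
{
    float x;
    __asm__ volatile (
        "flds  %[a]\n\t"
        "fadds %[b]\n\t"
        "fadds %[c]\n\t"
        "fstps %[x]"
        : [x] "=m" (x)
        : [a] "m" (t[0]), [b] "m" (t[1]), [c] "m" (t[2])
        : "st");
    return x;
}

/* Variant 2: x is the top of the x87 stack ("=t"). The template leaves
   the sum in st(0) and contains no fstp; the compiler pops it. */
float sum3_leave_on_stack(const float t[3])
{
    float x;
    __asm__ (
        "flds  %[a]\n\t"
        "fadds %[b]\n\t"
        "fadds %[c]"
        : "=t" (x)
        : [a] "m" (t[0]), [b] "m" (t[1]), [c] "m" (t[2]));
    return x;
}
```

Mixing the two, that is "=t" together with an fstp in the template, pops the value before the compiler goes looking for it in st(0), which matches the NaN reported above.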
 
  

