[SOLVED] Inline assembly - How to deal with arrays in AT&T style?
Hello all,
I know that there are many choices in the world of programming ...
In my case, I have some code that I had written in Intel style for Microsoft VC++ 6. It uses 32 bit addresses. It uses some ordinary x86 instructions and also MMX at some places and SSE at some places.
On Linux, I use Qt Creator as my IDE. I think that underneath it, it uses the g++ compiler.
Step 1: I updated the code to use 64 bit addresses.
Step 2: There are more registers. So I did minor changes to use more registers.
Step 3: Compile the Intel code under Linux? Some people mention using a compiler flag for gcc. I did not do this.
Step 4: So, I learned AT&T style. I learned gcc's extended asm syntax and went ahead and converted from Intel style to AT&T style. (Maybe I’m nuts?)
Step 5: That %0, %1, %2 stuff. Oh boy! I learned too late that operands can be given symbolic names (labels) instead.
MY MAIN QUESTION:
In my Intel style code, I have
Code:
addps xmm0, xmmword ptr[Global_NfloatArray]
where Global_NfloatArray is some float array.
In AT&T style
Code:
addps %0, %%xmm0;
where %0 represents Global_NfloatArray.
It compiles but doesn’t work.
I think it is because it copies the address into xmm0 instead of reading the RAM that the pointer Global_NfloatArray points to.
After a week of searching, I found the solution.
Apparently, if you have an array that is global, there is something special you need to do.
You need to write (*Global_t1)
For example:
Code:
sint64 Test4000()
{
    sint64 returnVal = 0; //Initialized so the returned value is defined
    Global_t1[0]=57.0;
    Global_t1[1]=11.0;
    //This is 64 bit code for Linux
    asm volatile
    (
        "movaps %0, %%xmm0;"
        "mulps %%xmm0, %%xmm0;"
        "movaps %%xmm0, %0;"
        :
        : "m" (*Global_t1)
        :
    );
    return returnVal;
}
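For what it's worth, the dereference matters most when the operand is a pointer variable rather than a true array. A minimal sketch, assuming x86-64 and gcc or clang (the function names first_elem and pointer_bits are made up for illustration): "m" (*p) names the float that p points to, while "m" (p) names the 8 bytes that hold the pointer itself.

```c
/* "m" (*p): the memory operand is the float that p points to. */
static float first_elem(const float *p)
{
    float out;
    __asm__ ("movss %1, %0" : "=x" (out) : "m" (*p));
    return out;
}

/* "m" (p): the memory operand is the pointer variable itself,
   so the asm sees the address value, not the float data. */
static unsigned long pointer_bits(const float *p)
{
    unsigned long out;
    __asm__ ("movq %1, %0" : "=r" (out) : "m" (p));
    return out;
}
```

With a real global array the two spellings can end up naming the same address, which is probably why both variants can produce identical output.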
Quote:
Apparently, if you have an array that is global, there is something special you need to do.
You need to write (*Global_t1)
Hmm, I get the same code output with (Global_t1) as with (*Global_t1). I would suggest using register constraints and then letting the compiler figure out how to move the data instead of hard-coding movaps though:
Code:
asm ("mulps %0, %0\n"
: "+v" (Global_t1) /* + means read/write, v means "Any EVEX encodable SSE
register (%xmm0-%xmm31)." */);
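A rough sketch of the same register-constraint idea with an explicit vector type, assuming x86-64 with SSE and the __m128 type from <xmmintrin.h> (the helper name square_ps is hypothetical). It uses "x" (any SSE register) rather than "v", so it does not depend on EVEX-capable registers being available.

```c
#include <xmmintrin.h>

/* Square four packed floats. The compiler chooses the xmm register
   and emits whatever loads and stores are needed around the asm. */
static __m128 square_ps(__m128 v)
{
    __asm__ ("mulps %0, %0" : "+x" (v));
    return v;
}
```

Since no register is hard-coded, no clobber list is needed here.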
I find that variable constraint thing confusing. There is "r" and "+r" and "g" and various codes.
Some codes seem to be suggestions for loading into a specific register, like "a", which is supposed to mean rax.
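A small sketch of what those letters mean in practice, assuming x86-64 and gcc (all three function names are invented for this example). "r" is any general register, "=" marks a write-only output, "+" marks an operand that is both read and written, and "a" / "d" pin an operand to rax / rdx. "g" is the loosest one: any general register, memory, or immediate.

```c
/* "=r": write-only output in any general register;
   "r": input in any general register. */
static long add_regs(long a, long b)
{
    long out;
    __asm__ ("lea (%1,%2), %0" : "=r" (out) : "r" (a), "r" (b));
    return out;
}

/* "+r": a single operand that is read and then overwritten. */
static long double_it(long v)
{
    __asm__ ("add %0, %0" : "+r" (v) : : "cc");
    return v;
}

/* "a" and "d" pin operands to rax and rdx, which the
   one-operand mul instruction uses implicitly. */
static unsigned long mul_high(unsigned long x, unsigned long y)
{
    unsigned long lo, hi;
    __asm__ ("mulq %3" : "=a" (lo), "=d" (hi) : "a" (x), "r" (y) : "cc");
    (void) lo; /* low 64 bits of the product, unused here */
    return hi;
}
```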
Oh, I saw 2 floats in your example and assumed that was the whole array. Actually, now I'm seeing my suggestion only works with an array of up to 4 elements (and the size has to be visible at compile time). So it's probably not what you want anyway.
Your code in #2 looks good except that the constraints you put don't tell the compiler that you are overwriting xmm0 and writing back to Global_t1. I would update it like this:
Code:
asm (
"movaps %0, %%xmm0;\n"
"mulps %%xmm0, %%xmm0;\n"
"movaps %%xmm0, %0"
: "+m" (*Global_t1) /* out (+ means also read as input) */
: /* in */
: "xmm0" /* clobber (tell compiler we are overwriting xmm0) */
);
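One caveat: *Global_t1 is a single float, so strictly speaking the operand only describes 4 bytes, while movaps touches 16. The GCC manual describes casting to an array type so the operand covers the whole range. A minimal sketch of that, assuming x86-64 and a 16-byte aligned buffer of at least 4 floats (the function name square4_in_place is hypothetical):

```c
/* Square 4 packed floats in place. p must be 16-byte aligned
   because movaps faults on unaligned addresses. */
static void square4_in_place(float *p)
{
    __asm__ (
        "movaps %0, %%xmm0\n\t"
        "mulps  %%xmm0, %%xmm0\n\t"
        "movaps %%xmm0, %0"
        : "+m" (*(float (*)[4]) p)  /* all 16 bytes are read and written */
        :
        : "xmm0"                    /* scratch register we overwrite */
    );
}
```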
For the clobber list, it doesn't seem to recognize r8, r9 and all the way to r15.
This is weird since I used r8 and r9 in my assembly code. (I did not run the code yet).
Also, I forgot to give the other example:
If you want to operate from RAM directly
Intel style code is something like
Quote:
For the clobber list, it doesn't seem to recognize r8, r9 and all the way to r15.
This is weird since I used r8 and r9 in my assembly code. (I did not run the code yet).
Hmm, the following compiles for me:
Code:
/* Add the first 3 elements of t1, and put the result in x */
float x;
asm (
"flds %1;\n"
"flds %2;\n"
"faddp;\n"
"fadds %3;\n"
: "=t" (x) /* out. t is "Top of 80387 floating-point stack (%st(0))." */
: "m" (t1), "m"(t1[1]), "m"(t1[2]) /* in */
: "st(1)" /* clobber */
);
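On the r8 to r15 clobber question, here is a small sketch that names r8 and r9 in the clobber list, assuming a 64-bit x86 target (the function name sum_via_r8_r9 is made up). One possible cause of the error, and this is only a guess, is a 32-bit build, where those registers do not exist.

```c
/* Adds a and b using r8 and r9 as hard-coded scratch registers.
   Listing them as clobbers keeps the compiler from placing
   operands or live values in them across the asm. */
static long sum_via_r8_r9(long a, long b)
{
    long out;
    __asm__ (
        "mov %1, %%r8\n\t"
        "mov %2, %%r9\n\t"
        "add %%r9, %%r8\n\t"
        "mov %%r8, %0"
        : "=r" (out)
        : "r" (a), "r" (b)
        : "r8", "r9", "cc"
    );
    return out;
}
```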
Quote:
Also, I forgot to give the other example:
If you want to operate from RAM directly
Intel style code is something like
I'm not sure exactly how to get this offset addressing thing working, the compiler seems to prefer %rip relative instead. For example:
Code:
/* Add the first 3 elements of t1, and put the result in x */
float x;
asm (
"flds %1;\n"
"flds %2;\n"
"faddp;\n"
"fadds %3;\n"
: "=t" (x) /* out */
: "m" (t1), "m"(t1[1]), "m"(t1[2]) /* in */
: "st(1)" /* clobber */
);
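One way to get explicit offset addressing, sketched under the assumption of x86-64 and gcc, is to pass the base address in a register and write the displacements by hand. The function name sum3 is hypothetical. The extra "m" operand is never referenced in the template; it is only there to tell the compiler which memory the asm reads.

```c
/* Add the first 3 floats at t using the x87 FPU and
   base-plus-displacement addressing. */
static float sum3(const float *t)
{
    float x;
    __asm__ (
        "flds  (%1)\n\t"    /* st(0) = t[0] */
        "fadds 4(%1)\n\t"   /* st(0) += t[1] */
        "fadds 8(%1)\n\t"   /* st(0) += t[2] */
        "fstps %0"          /* store to x and pop the FPU stack */
        : "=m" (x)
        : "r" (t), "m" (*(const float (*)[3]) t)
        : "st"              /* st(0) is used as scratch */
    );
    return x;
}
```

This trades the one-operand-per-element style for a single base register, which scales better when the array is large.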
I think you are having trouble because you wrote
: "m" (t1), "m"(t1[1]), "m"(t1[2]) /* in */
If your array size is huge, that is too much work.
I would do
Hmm, it compiles on godbolt.org, but on Mingw I get "Error: junk `((%rcx))' after expression" and on my Debian box I get "Error: invalid instruction suffix for `fld'".
Quote:
and maybe st(0) needs to be in the clobber as well.
The "t" is st(0), and it's already listed as an output, so it doesn't need to be in clobber.
This one works.
x receives 70.55.
If I put : [x] "=t" (x) instead, x gets a NaN.
Code:
sint64 Test4000()
{
    sint64 returnVal = 0; //Initialized so the returned value is defined
    Global_t1[0]=57.0;
    Global_t1[1]=11.0;
    Global_t1[2]=2.55;
    float x;
    //This is 64 bit code for Linux
    asm volatile
    (
        "fld %[Global_t1];" //Load to register st(0)
        "fadd 4%[Global_t1];" //Add value to what is already in st(0)
        "fadd 8%[Global_t1];" //Add value to what is already in st(0)
        "fstp %[x];" //Store and pop FPU stack. Write to x
        //"fld 4%0;"
        //"faddp %%st(1), %%st(0);"
        //"fstp 4%0;"
        //"movaps %0, %%xmm0;"
        //"mulps %%xmm0, %%xmm0;"
        //"movaps %%xmm0, %0;"
        : [x] "=m" (x)
        : [Global_t1] "m" (*Global_t1)
        : "st(1)"
    );
    return returnVal;
}
I don't know exactly what flds is. It seems to be a gcc invention. There are others like fldt (https://docs.oracle.com/cd/E19455-01...0ah/index.html)
I think those aren't real x86 FPU instructions, so I replaced them with fld.
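As far as I can tell, flds, fldl and fldt are the plain fld instruction with an AT&T operand-size suffix: s for 32-bit single, l for 64-bit double, t for 80-bit extended. Intel syntax expresses the same thing with "dword ptr", "qword ptr" and "tbyte ptr". A short sketch, assuming x86-64 and gcc (the function name widen_via_x87 is invented):

```c
/* Load a 32-bit float (flds) and store it back as a
   64-bit double (fstpl). The suffix selects the memory size. */
static double widen_via_x87(float f)
{
    double d;
    __asm__ (
        "flds  %1\n\t"   /* s suffix: 32-bit memory operand */
        "fstpl %0"       /* l suffix: 64-bit memory operand, then pop */
        : "=m" (d)
        : "m" (f)
        : "st"
    );
    return d;
}
```

Spelling the suffix out removes the ambiguity the assembler faces with a bare fld on a memory operand.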