LinuxQuestions.org

Mr_Shameless 10-25-2008 04:49 AM

Observation of Assembly code produced by GCC 4.3.2 for various -march parameters
 
Hi,

I did this out of curiosity and found something that I could not explain.

I tested this on Mandriva 2009 with gcc 4.3.2.

Here is the source code of the C program:
Code:

#include <stdio.h>

int main(void)
{
        int sum = 0;
        int i;
        for(i = 1; i < 5; ++i)
        {
                sum += i;
        }
       
        return 0;
}

I first tried compiling the code with
Code:

~$ gcc -g -march=i386 main.c
, then used gdb to observe the program:
Code:

~$ gdb -q a.out
(gdb) disassemble main

Here is what I obtained for i386 (notice the line at <main+39>):
Code:

Dump of assembler code for function main:
0x08048344 <main+0>:    lea    ecx,[esp+0x4]
0x08048348 <main+4>:    and    esp,0xfffffff0
0x0804834b <main+7>:    push  DWORD PTR [ecx-0x4]
0x0804834e <main+10>:  push  ebp
0x0804834f <main+11>:  mov    ebp,esp
0x08048351 <main+13>:  push  ecx
0x08048352 <main+14>:  sub    esp,0x10
0x08048355 <main+17>:  mov    DWORD PTR [ebp-0xc],0x0
0x0804835c <main+24>:  mov    DWORD PTR [ebp-0x8],0x1
0x08048363 <main+31>:  jmp    0x804836e <main+42>
0x08048365 <main+33>:  mov    eax,DWORD PTR [ebp-0x8]
0x08048368 <main+36>:  add    DWORD PTR [ebp-0xc],eax
0x0804836b <main+39>:  inc    DWORD PTR [ebp-0x8]
0x0804836e <main+42>:  cmp    DWORD PTR [ebp-0x8],0x4
0x08048372 <main+46>:  jle    0x8048365 <main+33>
0x08048374 <main+48>:  mov    eax,0x0
0x08048379 <main+53>:  add    esp,0x10
0x0804837c <main+56>:  pop    ecx
0x0804837d <main+57>:  leave 
0x0804837e <main+58>:  lea    esp,[ecx-0x4]
0x08048381 <main+61>:  ret   
End of assembler dump.

Doing the same thing again with i686:
Code:

gcc -g -march=i686 main.c
, I obtained:
Code:

Dump of assembler code for function main:
0x08048344 <main+0>:    lea    ecx,[esp+0x4]
0x08048348 <main+4>:    and    esp,0xfffffff0
0x0804834b <main+7>:    push  DWORD PTR [ecx-0x4]
0x0804834e <main+10>:  push  ebp
0x0804834f <main+11>:  mov    ebp,esp
0x08048351 <main+13>:  push  ecx
0x08048352 <main+14>:  sub    esp,0x10
0x08048355 <main+17>:  mov    DWORD PTR [ebp-0xc],0x0
0x0804835c <main+24>:  mov    DWORD PTR [ebp-0x8],0x1
0x08048363 <main+31>:  jmp    0x804836f <main+43>
0x08048365 <main+33>:  mov    eax,DWORD PTR [ebp-0x8]
0x08048368 <main+36>:  add    DWORD PTR [ebp-0xc],eax
0x0804836b <main+39>:  add    DWORD PTR [ebp-0x8],0x1
0x0804836f <main+43>:  cmp    DWORD PTR [ebp-0x8],0x4
0x08048373 <main+47>:  jle    0x8048365 <main+33>
0x08048375 <main+49>:  mov    eax,0x0
0x0804837a <main+54>:  add    esp,0x10
0x0804837d <main+57>:  pop    ecx
0x0804837e <main+58>:  pop    ebp
0x0804837f <main+59>:  lea    esp,[ecx-0x4]
0x08048382 <main+62>:  ret   
End of assembler dump.

Notice that the line at <main+39> differs depending on the CPU I compile for. I tested with i386, i686, pentium3, pentium4, prescott, and core2. My results are summarized below:
Code:

        i386:      0x0804836b <main+39>:  inc    DWORD PTR [ebp-0x8]
        i686:      0x0804836b <main+39>:  add    DWORD PTR [ebp-0x8],0x1
        pentium3:  0x0804836b <main+39>:  inc    DWORD PTR [ebp-0x8]
        pentium4:  0x0804836b <main+39>:  add    DWORD PTR [ebp-0x8],0x1
        prescott:  0x0804836b <main+39>:  add    DWORD PTR [ebp-0x8],0x1
        core2:     0x0804836b <main+39>:  inc    DWORD PTR [ebp-0x8]

Here are my questions:
  1. What is the difference between the ADD and INC instructions? Is one faster than the other?
  2. Assuming one of the two is faster, why is INC used for i386, replaced with ADD for i686, used again for pentium3, replaced again for pentium4 and prescott, and then used again for core2?

Thank you very much :)

pinniped 10-25-2008 05:54 AM

Hmm... that's amusing. To see whether ADD (with an immediate of 1) is faster than INC, you will need to look at the datasheet for each CPU (don't assume that the design stayed the same between models). Another subtle difference may lie in how the two instructions affect the flags register, including the overflow and carry flags.
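
The flag difference mentioned above can be made concrete. On x86, ADD updates the carry flag (CF), while INC leaves CF untouched; both update the other arithmetic flags. What follows is only a minimal sketch in portable C that models that CF behaviour for a 32-bit operand. The names FlagResult, model_add1 and model_inc are made up for illustration, and the code does not execute the real instructions.

```c
#include <stdint.h>

/* Hypothetical model type: a 32-bit result plus the carry flag (CF). */
typedef struct {
    uint32_t value;
    int cf;
} FlagResult;

/* Models `add dword, 1`: CF is set when the addition wraps past 2^32 - 1. */
FlagResult model_add1(uint32_t x, int cf_in)
{
    FlagResult r;
    (void)cf_in;            /* ADD overwrites CF, so the old value is ignored */
    r.value = x + 1u;       /* unsigned arithmetic wraps modulo 2^32 */
    r.cf = (r.value < x);   /* a wrapped result is smaller than the input */
    return r;
}

/* Models `inc dword`: same arithmetic result, but CF keeps its old value. */
FlagResult model_inc(uint32_t x, int cf_in)
{
    FlagResult r;
    r.value = x + 1u;
    r.cf = cf_in;           /* INC does not modify CF */
    return r;
}
```

For example, with x = 0xFFFFFFFF and an incoming CF of 0, model_add1 gives value 0 with cf 1, whereas model_inc gives value 0 with cf 0. This says nothing about which instruction is faster on a given CPU; that still has to come from the vendor's documentation.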

