Programming
This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
The same way it knows the sequence, meaning, types, etc. of its parameters in a normal function (one that obeys the ABI):
The author of the function decides on the number, types, sequence, meaning, etc. of the parameters.
In a high level language, a lot of that info is specified by the function declaration. But there is usually still some that must be specified by comments or other documentation.
The author of a call to a function should read the comments and/or documentation to find out how to call the function.
In asm, less is specified by declarations, so more must be specified by comments and documentation.
If a function doesn't obey the ABI, a whole lot of details of its interface must be specified by comments or documentation, so it will be possible to write correct calls to that function.
I don't get it. If some functions use the stack and some use registers, it seems like they aren't following the same ABI, and won't work on other CPU architectures.
Quote:
Originally Posted by johnsfine
main() in C code is compiled as an ordinary function, obeying the ABI the same way other functions do.
main() is not the starting point of a program. The starting point is usually some asm code that gcc tells the linker to link in. That code does various initialization, then calls the function main(). If main() returns, that startup code preserves the return code from main and calls some version of exit() (which is also a function, but doesn't return).
OK, so GCC creates some asm code that gets the arguments, calls main with the arguments as parameters, and then calls exit() with main()'s return value, right?
And another thing I wanted to ask, why is the size of int different on different CPUs? That just seems like asking for trouble. Why aren't C programmers using types like int8_t, uint8_t, int16_t, etc.?
I don't get it. If some functions use the stack and some use registers, it seems like they aren't following the same ABI,
Right. Such functions would not be following the same (if any) ABI.
Quote:
and won't work on other CPU architectures.
Asm code only works on one architecture and an ABI only specifies the rules for one architecture.
In x86_64 any function that obeys the ABI will use the six registers rdi, rsi, rdx, rcx, r8 and r9 for receiving integer, pointer and some other kinds of parameters before using the stack for those parameter types.
It doesn't vary between functions (if they obey the ABI). It varies between architectures.
Quote:
why is the size of int different on different CPUs? That just seems like asking for trouble. Why aren't C programmers using types like int8_t, uint8_t, int16_t, etc.?
C was invented as a low level programming language, above asm but not too far above asm. C was designed to be more portable than asm, but not to force the programmer to sacrifice efficiency for portability.
Some of the first (maybe including the very first) machines C ran on had 18 bit words and no efficient means to work with any size other than 18 bits. Making 18 bits the portable standard size int for the C language would have been absurd. But making any size other than 18 bits the size of int for those machines would also have been absurd. So the decision to have no portable standard for the number of bits in an int was the only practical choice.
Even now, many machines have a single size that is significantly more efficient for int than any other size. As long as that size is large enough for the expected uses of int on that machine, it makes sense to use it rather than some portable size that risks being much less efficient.
Quote:
Originally Posted by johnsfine
Right. Such functions would not be following the same (if any) ABI.
Asm code only works on one architecture and an ABI only specifies the rules for one architecture.
In x86_64 any function that obeys the ABI will use the six registers rdi, rsi, rdx, rcx, r8 and r9 for receiving integer, pointer and some other kinds of parameters before using the stack for those parameter types.
It doesn't vary between functions (if they obey the ABI). It varies between architectures.
I thought that the C ABI specifies that parameters are on the stack.
Quote:
Originally Posted by johnsfine
C was invented as a low level programming language, above asm but not too far above asm. C was designed to be more portable than asm, but not to force the programmer to sacrifice efficiency for portability.
Some of the first (maybe including the very first) machines C ran on had 18 bit words and no efficient means to work with any size other than 18 bits. Making 18 bits the portable standard size int for the C language would have been absurd. But making any size other than 18 bits the size of int for those machines would also have been absurd. So the decision to have no portable standard for the number of bits in an int was the only practical choice.
Even now, many machines have a single size that is significantly more efficient for int than any other size. As long as that size is large enough for the expected uses of int on that machine, it makes sense to use it rather than some portable size that risks being much less efficient.
Still, why use an int when you want 8 bits? Why waste so much memory just because a machine with a smaller number of bits might need to run it?
Quote:
I thought that the C ABI specifies that parameters are on the stack.
There is no "the C ABI". There is a different C ABI for each architecture.
C does specify some support for functions with a variable number of arguments in a way that makes the most sense for parameters passed on the stack and pushed in a sequence that depends on whether the stack grows upwards or downwards. Most architectures have stacks growing downwards, so that implies pushing the first parameter last.
The x86_64 ABI is not optimized for that variable argument support. It is optimized for the more common case that all, or at least the first few, parameters have fixed count and meaning.
The x86_64 ABI does have a few details designed in support of the variable-argument features of C. The compiler can detect the case where the C code has used that variable-argument support in a way that can't be optimized back to register passing. In that case, the compiler adds code within the called function to store the register-passed parameters in the right places on the stack. That makes those rare cases just slightly less efficient than they would have been in a fully stack-based architecture, while all the common cases are more efficient.
Quote:
Still, why use an int when you want 8 bits? Why waste so much memory just because a machine with a smaller number of bits might need to run it?
Code:
for (unsigned int i=0; i<100; ++i) {...}
If you chose an 8 bit (or even 7 bit) type for i in the above code, would that save memory? 7 bits is plenty to hold any number from 0 to 100.
If you learn asm, you will more easily see that for most architectures and most reasonable uses of i in the {...} code, you save significant code size by making i the natural integer size of the architecture. Maybe there is a memory cost of sizeof(int)-1 bytes from storing i in sizeof(int) bytes instead of 1. If so, that would be trivial compared to the code size likely saved. But in practice there is usually no such cost: the optimizer probably keeps i in a register anyway.
Contrast that with
Code:
int small_numbers[1000000]; /*each number is 0..100 */
... do systematic things with all of those numbers ...
If we chose a smaller type instead of int for that array, we would probably cause the code that manipulates the array to be larger and slightly slower (in internal CPU processing). But it still would be a good idea to pick a smaller data type, because the memory savings from making 1000000 elements smaller dwarfs the memory cost of the code to process it. Also, the cache misses from accessing the larger array dwarf the internal CPU speed differences of the code, so a smaller data type would make it run faster.
But I told you how I understood the x86 C ABI, and it used the stack to pass parameters, and you said it was correct!?!
Please explain to me how it works.
This is what I posted before and you said it was correct, and now you are saying that it's not:
No. I'm saying x86 is not the same architecture as x86_64. I have said that several times and in several ways in this thread, despite getting some disagreement from others.
You were looking at a book that taught asm and ABI for 32 bit x86, so I answered questions based on that architecture.
I posted an example doing a similar function (integer to an integer power) in a different architecture (x86_64). So I answered questions about that architecture.
The architecture is different, so the answers are different.
You have a Linux system in which either of those two architectures can run with full native speed (not emulation). So you can use examples of either.
The instruction sets are very similar and look even more similar than they actually are. So you need to be careful when looking at any example that you know which of those two architectures it is.
x86 (32 bit) uses the stack for parameter passing.
x86_64 uses registers for passing a small number of parameters and uses both registers and the stack when passing a large number.
Those rules are simple when passing just integers (any size, 1 to 64 bits), pointers, and (C++) references. The first six go in registers, the rest on the stack.
But if some of the parameters are floats or doubles or structs passed by value etc., then I don't have the rules memorized. Some cases are complicated enough that it is a good idea to code a call to a function with the same signature in C or C++ and then compile it with gcc -S and see how the parameters are passed.
Notice Mol_Bolom's related suggestion to look at the asm code from a C compile using gdb. In most cases that is harder than using gcc -S, even if you know gdb well. For me it is a lot harder because I don't remember gdb commands and need to look them up every time I use them.
Here is a fun example:
Code:
printf("%f %d %f %d\n", 1.5, 2, 4, 3.5);
Compile that (inside a program) in x86_64 with options such that gcc will ignore the bug in the code and make an executable anyway. Then run it. The output is:
1.500000 2 3.500000 4
If you look at the generated .s file from gcc -S you will understand why the 4 and 3.5 got reversed.
Quote:
Originally Posted by johnsfine
Notice Mol_Bolom's related suggestion to look at the asm code from a C compile using gdb. In most cases that is harder than using gcc -S, even if you know gdb well. For me it is a lot harder because I don't remember gdb commands and need to look them up every time I use them.
For his arch, isn't gdb the only compiler that would allow him to do so?
There are a lot of directives in there giving extra info to the linker and/or debugger and/or I don't know what.
I don't know this assembler syntax well enough to know what all that stuff means. I just know which of it I can ignore when I want to understand the compiler output and which of it I can skip when I want to write a function in asm myself.
The parts you should understand are:
Code:
.section .rodata
That means the following lines should generate read-only data.
Code:
.LC0:
That names the current address (which is then used below).
Code:
.string "Hello"
That deposits the string "Hello" (plus a terminating NUL byte) in data memory.
Code:
.text
That means the following lines should generate program code (so we're no longer generating data).
Code:
.globl main
That means the label main will be passed to the linker so other modules can reference it.
Code:
main:
That names the current address main.
Code:
pushq %rbp
movq %rsp, %rbp
Conventional prolog code, used on entry to a function that uses rbp in the conventional way.
Code:
movl $.LC0, %edi
Move the address of "Hello" (see above) into register edi (writing edi zero-extends into the full rdi); rdi holds the first parameter of the function soon to be called.
Code:
call puts
Call that function.
Code:
leave
Conventional epilog code for a function that uses rbp in the conventional way.