LinuxQuestions.org
03-08-2010, 07:44 AM   #121
MTK358
LQ 5k Club

Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723

Quote:
Originally Posted by johnsfine
The same way it knows the sequence, meaning, types, etc. of its parameters in a normal function (one that obeys the ABI):

The author of the functions decides on the number of parameters and the types and sequence and meaning etc.

In a high level language, a lot of that info is specified by the function declaration. But there is usually still some that must be specified by comments or other documentation.

The author of a call to a function should read the comments and/or documentation to find out how to call the function.

In asm, less is specified by declarations, so more must be specified by comments and documentation.

If a function doesn't obey the ABI, a whole lot of details of its interface must be specified by comments or documentation, so it will be possible to write correct calls to that function.
I don't get it. If some functions use the stack and some use registers, it seems like they aren't following the same ABI, and won't work on other CPU architectures.

Quote:
Originally Posted by johnsfine View Post
main() in C code is compiled as an ordinary function, obeying the ABI the same way other functions do.

main() is not the starting point of a program. The starting point is usually some asm code that gcc tells the linker to link in. That code does various initialization, then calls the function main(). If main() returns, that startup code preserves the return code from main and calls some version of exit() (which is also a function, but doesn't return).
OK, so GCC creates some asm code that gets the arguments, calls main with the arguments as parameters, and then calls exit() with main()'s return value, right?

And another thing I wanted to ask, why is the size of int different on different CPUs? That just seems like asking for trouble. Why aren't C programmers using types like int8_t, uint8_t, int16_t, etc.?

Last edited by MTK358; 03-08-2010 at 07:45 AM.
 
03-08-2010, 08:00 AM   #122
johnsfine
LQ Guru

Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
Quote:
Originally Posted by MTK358
I don't get it. If some functions use the stack and some use registers, it seems like they aren't following the same ABI,
Right. Such functions would not be following the same (if any) ABI.

Quote:
and won't work on other CPU architectures.
Asm code only works on one architecture and an ABI only specifies the rules for one architecture.

In x86_64 (the System V ABI used on Linux), any function that obeys the ABI will use the six registers rdi, rsi, rdx, rcx, r8 and r9 for receiving integer, pointer and some other kinds of parameters before using the stack for those parameter types.

It doesn't vary between functions (if they obey the ABI). It varies between architectures.
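To make the register rule concrete, here is a minimal sketch with a hypothetical function sum8 that takes eight integer parameters. The comments state where the x86_64 System V ABI (as used on Linux) places each argument; compiling it with gcc -S and reading the .s file is how you would confirm that on your own system, and the exact instructions will vary with compiler version and optimization level.

```c
/* Hypothetical example: eight integer parameters.
 * Under the x86_64 System V ABI the first six integer/pointer
 * arguments arrive in registers and the rest on the stack:
 *   a -> rdi, b -> rsi, c -> rdx, d -> rcx, e -> r8, f -> r9
 *   g -> stack (8(%rsp) on entry, just above the return address)
 *   h -> stack (16(%rsp) on entry)
 * Under 32-bit x86 (cdecl), all eight would be passed on the stack. */
long sum8(long a, long b, long c, long d,
          long e, long f, long g, long h)
{
    return a + b + c + d + e + f + g + h;
}
```

Every caller that obeys the ABI sets up a call to sum8 the same way, which is why the rule is a property of the architecture's ABI rather than of the individual function.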

Quote:
why is the size of int different on different CPUs? That just seems like asking for trouble. Why aren't C programmers using types like int8_t, uint8_t, int16_t, etc.?
C was invented as a low level programming language, above asm but not too far above asm. C was designed to be more portable than asm, but not to force the programmer to sacrifice efficiency for portability.

Some of the early machines C ran on had unusual word sizes (the PDP-7 that ran B, C's predecessor, had 18 bit words, and some early C ports were to 36 bit machines) and no efficient means to work with any size other than the word size. Making such a size the portable standard size int for the C language would have been absurd. But making any size other than the word size the size of int for those machines would also have been absurd. So the decision to have no portable standard for the number of bits in an int was the only practical choice.

Even now, many machines have a single size that is significantly more efficient for int than any other size. As long as that size is large enough for the expected uses of int on that machine, it makes sense to use it rather than some portable size that risks being much less efficient.
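On the int8_t question: C99's <stdint.h> provides exactly those types, so C programmers can use them when a specific width matters. The exact-width types (int8_t, int32_t, ...) are optional in the standard and exist only where the hardware has such a type, while the int_leastN_t and int_fastN_t variants are always available. A minimal sketch (print_type_sizes is a hypothetical name) that shows what one compiler and target chose:

```c
#include <stdint.h>
#include <stdio.h>

/* Print the sizes chosen by this compiler for this target.
 * int and long are whatever suits the target; the exact-width
 * types from <stdint.h> have the same size wherever they exist. */
void print_type_sizes(void)
{
    printf("int:          %zu bytes\n", sizeof(int));
    printf("long:         %zu bytes\n", sizeof(long));
    printf("void *:       %zu bytes\n", sizeof(void *));
    printf("int32_t:      %zu bytes\n", sizeof(int32_t));
    /* int_fast8_t: the type the implementation considers fastest
     * among those with at least 8 bits; it may be wider than 8. */
    printf("int_fast8_t:  %zu bytes\n", sizeof(int_fast8_t));
}
```

With gcc on x86_64 Linux, int is typically 4 bytes while long and pointers are 8; building the same file with gcc -m32 makes long and pointers 4 bytes.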

Last edited by johnsfine; 03-08-2010 at 08:03 AM.
 
03-08-2010, 08:28 AM   #123
MTK358
LQ 5k Club

Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723
Quote:
Originally Posted by johnsfine
Right. Such functions would not be following the same (if any) ABI.



Asm code only works on one architecture and an ABI only specifies the rules for one architecture.

In x86_64 (the System V ABI used on Linux), any function that obeys the ABI will use the six registers rdi, rsi, rdx, rcx, r8 and r9 for receiving integer, pointer and some other kinds of parameters before using the stack for those parameter types.

It doesn't vary between functions (if they obey the ABI). It varies between architectures.
I thought that the C ABI specifies that parameters are on the stack.

Quote:
Originally Posted by johnsfine
C was invented as a low level programming language, above asm but not too far above asm. C was designed to be more portable than asm, but not to force the programmer to sacrifice efficiency for portability.

Some of the early machines C ran on had unusual word sizes (the PDP-7 that ran B, C's predecessor, had 18 bit words, and some early C ports were to 36 bit machines) and no efficient means to work with any size other than the word size. Making such a size the portable standard size int for the C language would have been absurd. But making any size other than the word size the size of int for those machines would also have been absurd. So the decision to have no portable standard for the number of bits in an int was the only practical choice.

Even now, many machines have a single size that is significantly more efficient for int than any other size. As long as that size is large enough for the expected uses of int on that machine, it makes sense to use it rather than some portable size that risks being much less efficient.
Still, why use an int when you want 8 bits? Why waste so much memory just because a machine with a smaller number of bits might need to run it?
 
03-08-2010, 08:48 AM   #124
johnsfine
LQ Guru

Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
Quote:
Originally Posted by MTK358
I thought that the C ABI specifies that parameters are on the stack.
There is no "the C ABI". There is a different C ABI for each architecture.

C does specify some support for functions with a variable number of arguments in a way that makes the most sense for parameters passed on the stack and pushed in a sequence that depends on whether the stack grows upwards or downwards. Most architectures have stacks growing downwards, so that implies pushing the first parameter last.

The x86_64 ABI is not optimized for that variable argument support. It is optimized for the more common case that all, or at least the first few, parameters have fixed count and meaning.

The x86_64 ABI does have a few details designed in support of the variable argument features of C. The compiler can detect the case where the C code uses that variable argument support in a way that the compiler can't optimize back to a register passing system. In that case, the compiler adds code within the called function to store the parameters in the right places on the stack. That makes those rare cases just slightly less efficient than they would have been in a fully stack-based architecture, while all the common cases are more efficient.
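For reference, this is what the C variable-argument support looks like from the called function's side, using the standard <stdarg.h> macros. sum_ints is a hypothetical example; the comment about register spilling describes the usual GCC behavior on x86_64, not something the C standard itself requires.

```c
#include <stdarg.h>

/* Sum 'count' int arguments passed through C's variable-argument
 * mechanism. va_arg fetches each argument in turn. On x86_64 the
 * first few arguments actually arrive in registers; because this
 * function uses va_start, GCC's generated prologue saves those
 * registers to a save area on the stack so va_arg can find them. */
long sum_ints(int count, ...)
{
    va_list ap;
    long total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

A call such as sum_ints(8, 1, 2, 3, 4, 5, 6, 7, 8) passes nine integer arguments, so on x86_64 some arrive in registers and the rest on the stack, yet the body only ever sees the uniform va_arg interface.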

Quote:
Still, why use an int when you want 8 bits? Why waste so much memory just because a machine with a smaller number of bits might need to run it?
Code:
for (unsigned int i=0; i<100; ++i) {...}
If you chose an 8 bit (or even 7 bit) type for i in the above code, would that save memory? 7 bits is plenty to hold any number from 0 to 100.

If you learn asm, you will more easily see that for most architectures and most reasonable uses of i in the {...} code, you save significant code size by having i be the natural integer size of the architecture. Maybe there is a memory cost of sizeof(int)-1 bytes from storing i in sizeof(int) bytes instead of 1. If so, that would be trivial compared to the code size likely saved. But in practice there is usually no such cost: the optimizer probably keeps i in a register anyway.
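One way to check this claim yourself is to compile two versions of the same loop that differ only in the counter type and compare the gcc -O2 -S output. This is a minimal sketch with hypothetical names; what the generated code looks like depends on the compiler and target, so the comments only say what may happen.

```c
/* Same loop, 8-bit counter. The counter normally lives in a
 * register, so the narrow type saves no memory, and the compiler
 * may need extra instructions to keep it within 8 bits. */
unsigned long sum_narrow(const unsigned int *a, unsigned char n)
{
    unsigned long total = 0;
    for (unsigned char i = 0; i < n; ++i)
        total += a[i];
    return total;
}

/* Same loop, counter of the machine's natural int size. */
unsigned long sum_natural(const unsigned int *a, unsigned int n)
{
    unsigned long total = 0;
    for (unsigned int i = 0; i < n; ++i)
        total += a[i];
    return total;
}
```

Note the narrow version also limits the loop to at most 255 elements, a correctness cost on top of any code-size cost.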

Contrast that with
Code:
int small_numbers[1000000];  /*each number is 0..100 */
... do systematic things with all of those numbers ...
If we chose a smaller type instead of int for that array, we would probably cause the code that manipulates the array to be larger and slightly slower (in internal CPU processing). But it still would be a good idea to pick a smaller data type, because the memory savings from making 1000000 elements smaller dwarfs the memory cost of the code to process it. Also, the cache misses from accessing the larger array dwarf the internal CPU speed differences of the code, so a smaller data type would make it run faster.
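A minimal sketch of the array case, assuming the values really stay in 0..100 as the comment above says. The same million values stored as int take 1000000 * sizeof(int) bytes (4 MB where int is 4 bytes), while uint8_t takes exactly 1 MB. The array and function names are hypothetical.

```c
#include <stdint.h>

#define COUNT 1000000  /* element count from the example above */

/* The same data stored two ways. Static storage keeps these
 * large arrays off the stack. */
static int     small_numbers_int[COUNT];
static uint8_t small_numbers_u8[COUNT];   /* 0..100 fits in 8 bits */

/* Fill both arrays with the same values (each in 0..100) and
 * return their sum, read back from the compact array. */
unsigned long fill_and_sum(void)
{
    unsigned long total = 0;
    for (long i = 0; i < COUNT; ++i) {
        small_numbers_int[i] = (int)(i % 101);
        small_numbers_u8[i]  = (uint8_t)(i % 101);
        total += small_numbers_u8[i];
    }
    return total;
}
```

Reading small_numbers_u8 may need a zero-extending load on some targets, but it touches a quarter of the cache lines that the int version does.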

Programming involves estimating those tradeoffs.

Last edited by johnsfine; 03-08-2010 at 08:56 AM.
 
03-08-2010, 09:17 AM   #125
MTK358
LQ 5k Club

Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723
But I told you how I understood the x86 C ABI, and it used the stack to pass parameters, and you said it was correct!?!

Please explain to me how it works.

This is what I posted before and you said it was correct, and now you are saying that it's not:
Attached file: c-func-call.txt (1.9 KB)
 
03-08-2010, 09:27 AM   #126
johnsfine
LQ Guru

Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
Quote:
Originally Posted by MTK358
But I told you how I understood the x86 C ABI, and it used the stack to pass parameters, and you said it was correct!?!

Please explain to me how it works.

This is what I posted before and you said it was correct, and now you are saying that it's not:
No. I'm saying x86 is not the same architecture as x86_64. I have said that several times and ways in this thread, despite getting some disagreement from others.

You were looking at a book that taught asm and ABI for 32 bit x86, so I answered questions based on that architecture.

I posted an example doing a similar function (integer to an integer power) in a different architecture (x86_64). So I answered questions about that architecture.

The architecture is different, so the answers are different.

You have a Linux system in which either of those two architectures can run with full native speed (not emulation). So you can use examples of either.

The instruction sets are very similar and look even more similar than they actually are. So you need to be careful when looking at any example that you know which of those two architectures it is.
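One practical way to keep track of which architecture an example belongs to is to have the code report it. This minimal sketch relies on __x86_64__ and __i386__, which are macros GCC predefines for those targets; target_arch is a hypothetical name. Building the same file with gcc -m64 and with gcc -m32 (the latter needs the 32-bit libraries installed) gives one binary of each kind.

```c
/* Report which architecture this file was compiled for, and the
 * default parameter-passing style of that architecture's ABI. */
const char *target_arch(void)
{
#if defined(__x86_64__)
    return "x86_64: first integer/pointer args passed in registers";
#elif defined(__i386__)
    return "x86 (32-bit): args passed on the stack";
#else
    return "some other architecture";
#endif
}
```

For example, printf("%s\n", target_arch()); in a small test program shows which of the two you actually built.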

Last edited by johnsfine; 03-08-2010 at 09:31 AM.
 
03-08-2010, 09:28 AM   #127
MTK358
LQ 5k Club

Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723
So x86 uses the stack and x86_64 uses registers?
 
03-08-2010, 09:32 AM   #128
Mol_Bolom
Member
 
Registered: Nov 2008
Location: S.W. Kansas
Distribution: Slackware64 14.0 / 14.2
Posts: 245
Blog Entries: 2

Rep: Reputation: 41
Quote:
Originally Posted by MTK358
But I told you how I understood the x86 C ABI, and it used the stack to pass parameters, and you said it was correct!?!

Please explain to me how it works.

This is what I posted before and you said it was correct, and now you are saying that it's not:
Write a few simple programs in C, then use a debugger to disassemble the program and see how the compiler handled the code.

When I began I had used asm with C, and it was debugging the code that showed me how to do it. So I think this would be a great help for you.
 
03-08-2010, 09:45 AM   #129
MTK358
LQ 5k Club

Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723
Quote:
Originally Posted by Mol_Bolom
Write a few simple programs in C, then use a debugger to disassemble the program and see how the compiler handled the code.

When I began I had used asm with C, and it was debugging the code that showed me how to do it. So I think this would be a great help for you.
How do I do that?
 
03-08-2010, 09:51 AM   #130
johnsfine
LQ Guru

Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
Quote:
Originally Posted by MTK358
So x86 uses the stack and x86_64 uses registers?
x86 (32 bit) uses the stack for parameter passing.

x86_64 uses registers for passing a small number of parameters and uses both registers and the stack when passing a large number.

Those rules are simple when passing just integers (any size from 1 to 64 bits), pointers and (C++) references: the first six go in registers, the rest on the stack.

But if some of the parameters are floats or doubles or structs passed by value etc., then I don't have the rules memorized. Some cases are complicated enough that it is a good idea to code a call to a function with the same signature in C or C++ and then compile it with gcc -S and see how the parameters are passed.

Notice Mol_Bolom's related suggestion to look at the asm code from a C compile using gdb. In most cases that is harder than using gcc -S, even if you know gdb well. For me it is a lot harder because I don't remember gdb commands and need to look them up every time I use them.
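A minimal sketch of the gcc -S approach, using a hypothetical file mixed.c with a hypothetical function that mixes integer and floating-point parameters (one of the cases where the rules are easier to look up in compiler output than to remember). The commands in the comment are ordinary gcc and binutils invocations; with gdb, the equivalent is to load the compiled program and type disassemble mixed.

```c
/* mixed.c: written only to be compiled and inspected, e.g.
 *   gcc -O1 -S mixed.c                    (writes mixed.s)
 *   gcc -O1 -c mixed.c && objdump -d mixed.o
 * Under the x86_64 System V ABI, integer and floating-point
 * arguments are assigned registers from separate sequences:
 * n and m arrive in edi and esi, while x and y arrive in
 * xmm0 and xmm1, regardless of the order they are declared in.
 * The double result is returned in xmm0. */
double mixed(int n, double x, int m, double y)
{
    return n * x + m * y;
}
```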

Here is a fun example:

Code:
   printf("%f %d %f %d\n", 1.5, 2, 4, 3.5);
Compile that (inside a program) for x86_64 with options such that gcc will ignore the bug in the code and make an executable anyway. Then run it. The output is:
1.500000 2 3.500000 4

If you look at the generated .s file from gcc -S you will understand why the 4 and 3.5 got reversed.
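The fix is simply to make each conversion match its argument's type. This sketch writes into a buffer with snprintf so the result can be checked; format_fixed is a hypothetical name. The comment summarizes the register assignment that, under the x86_64 System V ABI, explains the swapped output of the buggy call.

```c
#include <stdio.h>

/* Corrected version of the call: each conversion matches its
 * argument's type. In the buggy call, the ints 2 and 4 went to
 * integer registers (esi, edx) and the doubles 1.5 and 3.5 went
 * to xmm0 and xmm1, so printf's %d conversions read the ints in
 * order and its %f conversions read the doubles in order,
 * whatever order the arguments were written in. */
int format_fixed(char *buf, size_t size)
{
    return snprintf(buf, size, "%f %d %d %f\n", 1.5, 2, 4, 3.5);
}
```

Printing the buffer (for example with fputs(buf, stdout)) gives 1.500000 2 4 3.500000. On 32-bit x86, where every argument is on the stack, the original buggy call would instead print garbage for the mismatched conversions.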

Last edited by johnsfine; 03-08-2010 at 10:05 AM.
 
03-08-2010, 10:06 AM   #131
MTK358
LQ 5k Club

Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723
I tried this:

Code:
#include <stdio.h>

int main() {
	puts("Hello");
}
And it compiled into this:

Code:
	.file	"asm-c.c"
	.section	.rodata
.LC0:
	.string	"Hello"
	.text
.globl main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	movq	%rsp, %rbp
	.cfi_offset 6, -16
	.cfi_def_cfa_register 6
	movl	$.LC0, %edi
	call	puts
	leave
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (GNU) 4.4.3"
	.section	.note.GNU-stack,"",@progbits
And I hardly understand any of it.
 
03-08-2010, 10:10 AM   #132
Mol_Bolom
Member
 
Registered: Nov 2008
Location: S.W. Kansas
Distribution: Slackware64 14.0 / 14.2
Posts: 245
Blog Entries: 2

Rep: Reputation: 41
Quote:
Originally Posted by johnsfine
Notice Mol_Bolom's related suggestion to look at the asm code from a C compile using gdb. In most cases that is harder than using gcc -S, even if you know gdb well. For me it is a lot harder because I don't remember gdb commands and need to look them up every time I use them.
For his arch, isn't gdb the only tool that would allow him to do so?

 
03-08-2010, 10:13 AM   #133
Sergei Steshenko
Senior Member

Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by MTK358
I tried this:

Code:
#include <stdio.h>

int main() {
	puts("Hello");
}
And it compiled into this:

Code:
	.file	"asm-c.c"
	.section	.rodata
.LC0:
	.string	"Hello"
	.text
.globl main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	movq	%rsp, %rbp
	.cfi_offset 6, -16
	.cfi_def_cfa_register 6
	movl	$.LC0, %edi
	call	puts
	leave
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (GNU) 4.4.3"
	.section	.note.GNU-stack,"",@progbits
And I hardly understand any of it.
You don't even understand the items in red?
 
03-08-2010, 10:15 AM   #134
MTK358
LQ 5k Club

Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723
I have no idea what the items in red mean.

I do understand what "call puts" does, though.
 
03-08-2010, 10:19 AM   #135
johnsfine
LQ Guru

Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
There are a lot of directives in there giving extra info to the linker and/or debugger and/or I don't know what.

I don't know this assembler syntax well enough to know what all that stuff means. I just know which of it I can ignore when I want to understand the compiler output and which of it I can skip when I want to write a function in asm myself.

The parts you should understand are:

Code:
	.section	.rodata
That means the following lines should generate read-only data.
Code:
.LC0:
That names the current address (which is then used below).
Code:
	.string	"Hello"
That deposits the string "Hello" in data memory.
Code:
	.text
That means the following lines should generate program code (so we're no longer generating data).
Code:
.globl main
That means the label main will be passed to the linker so other modules can reference it.
Code:
main:
That names the current address main.
Code:
	pushq	%rbp
	movq	%rsp, %rbp
Conventional prolog code, used on entry to a function that uses rbp in the conventional way.
Code:
	movl	$.LC0, %edi
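Move the address of "Hello" (see above) into register rdi, which holds the first parameter of the function about to be called. (Writing the 32-bit half, edi, zero-extends into the full rdi; that works here because the address fits in 32 bits.)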
Code:
	call	puts
Call that function.
Code:
	leave
Conventional epilog code for a function that uses rbp in the conventional way (equivalent to movq %rbp, %rsp followed by popq %rbp).
Code:
	ret
Return to the caller of main.
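A good next step is to compile a slightly bigger function and read its output the same way. This minimal sketch uses a hypothetical two-parameter function; per the x86_64 System V ABI, a arrives in rdi, b in rsi, and the result is returned in rax. The comment describes what GCC typically produces, so check your own .s file rather than relying on it.

```c
/* add2.c: compile with 'gcc -O0 -S add2.c' and 'gcc -O2 -S add2.c'
 * and compare the two .s files. At -O0 you should see the same
 * pushq %rbp / movq %rsp, %rbp prolog as in main above; with
 * optimization GCC typically drops the prolog and epilog and
 * reduces the body to one instruction that combines rdi and rsi
 * into rax, followed by ret. */
long add2(long a, long b)
{
    return a + b;
}
```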

What could be simpler?

Last edited by johnsfine; 03-08-2010 at 10:22 AM.
 
  


All times are GMT -5.
