How to learn ASM

MTK358 · 03-04-2010, 07:12 PM

You posted so much that I am not going to go through the trouble of quoting it.
Can you just talk about stuff that I can still understand and reply to and not bombard me with so much of it?

I have a general idea of what registers are. It would be nice to have a reference of them, though.

What is a stack-based machine?

I don't understand what the different versions of assembler are. I just care that I can use the GNU Assembler.

I would like to start with the modern stuff and Linux system calls.

I don't have access to 8086 and similar machines. Maybe there is some way to emulate them?

I don't understand the differences between x86 and x86_64 asm, except that they are 32 and 64 bit. And I have no clue what an "ABI" is.

Sergei Steshenko · 03-04-2010, 07:18 PM

Just start doing web searches.

For starters, here is an article on stack machines:
http://en.wikipedia.org/wiki/Stack_machine

ABI == Application Binary Interface - a set of rules/conventions that ensures compatibility at the binary level, i.e. the possibility to link together a number of object files so that the linked code really works with respect to, say, parameter passing. For example, Pascal and "C" push parameters onto the stack in opposite orders.
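A minimal C sketch of what "opposite orders" means (a simulation, not real calling code; `lay_out_args` and `MAX_ARGS` are made-up names). It replays the caller's pushes and reports the resulting stack contents from the lowest address up. With the C convention (right to left) the first argument always ends up closest to the stack pointer, at a fixed position no matter how many arguments follow; with the Pascal convention (left to right) the last argument does.

```c
#include <assert.h>

#define MAX_ARGS 8

/* Simulates the pushes a caller makes and writes the resulting stack
   contents to out[], lowest address first, so out[0] is the word the
   stack pointer points at just before the call instruction.
   pascal_order == 0: C convention, push arguments right to left.
   pascal_order == 1: Pascal convention, push arguments left to right. */
static void lay_out_args(const int *args, int n, int pascal_order, int *out) {
    int stack[MAX_ARGS];
    int sp = MAX_ARGS;                 /* empty stack; pushes move sp down */
    assert(n >= 0 && n <= MAX_ARGS);
    for (int k = 0; k < n; k++) {
        int i = pascal_order ? k : n - 1 - k;  /* argument pushed next */
        stack[--sp] = args[i];
    }
    for (int j = 0; j < n; j++)        /* copy from lowest address upward */
        out[j] = stack[sp + j];
}
```

With arguments {10, 20, 30}, the C ordering yields 10, 20, 30 from the stack pointer upward, while the Pascal ordering yields 30, 20, 10.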

FWIW, there is GNU Pascal, so use it too to understand how a high level language translates into assembly.

Sergei Steshenko · 03-04-2010, 07:31 PM

By the way, MTK358 - a good start WRT the stack and function argument passing is to understand how/why the very well known *printf family of functions works, i.e. how these functions cope with a variable number of arguments. It's not a compiler trick - the functions really accept a variable number of arguments at runtime; a friendly compiler merely helps to flag calls whose number of arguments doesn't match the format string.
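A minimal sketch of that mechanism in C, using the standard `<stdarg.h>` macros. `sum_ints` is a made-up example: where printf counts the conversions in its format string, this function takes the count as an explicit first argument. On 32-bit x86 the extra arguments are simply consecutive stack slots; `va_arg` hides the ABI details (which differ on x86_64, where the first few arguments travel in registers).

```c
#include <assert.h>
#include <stdarg.h>

/* Sums `count` int arguments. The callee decides at runtime how many
   arguments to read; nothing checks that the caller really passed
   that many, which is the gap printf format checking fills. */
static int sum_ints(int count, ...) {
    va_list ap;
    int total = 0;
    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next int argument */
    va_end(ap);
    return total;
}
```

For example, `sum_ints(3, 1, 2, 3)` reads three ints and returns 6.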

johnsfine · 03-04-2010, 07:33 PM

Quote:

Originally Posted by MTK358

Can you just talk about stuff that I can still understand and reply to and not bombard me with so much of it?

Sorry. That is hard to achieve when multiple people who seriously disagree with each other are providing advice.

Quote:

I have a general idea of what registers are. It would be nice to have a reference of them, though.

You should understand that eax, ebx, ecx, edx, esi, edi, ebp and esp are each 32 bit parts of the CPU that can each hold a 32 bit integer or pointer.

You should understand that almost all operations on integers or pointers are done in those registers.

That is enough to get started with.

Quote:

What is a stack-based machine?

You should understand what a stack is. You should understand that the esp register always points to the newest allocation on the stack.

You should not worry about abstractions like "stack-based machine" nor what parts (if any) of x86 asm are fundamental aspects of asm vs. specific to x86. You are learning just x86, maybe 32 bit, maybe 64 bit (if you follow my advice), maybe both.
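To make the esp point above concrete, here is a toy C model of the x86 stack (`ToyStack` and the `stack_*` functions are made up for illustration; this is not real memory). Pushes move the pointer toward lower addresses, and the pointer always indexes the newest allocation. A real `pushl` moves %esp by 4 bytes; the model counts in words.

```c
#include <assert.h>
#include <stdint.h>

/* Array of 4-byte words where index 0 is the lowest address. sp plays
   the role of %esp: it starts just past the top of the array and moves
   DOWN, so mem[sp] is always the newest allocation. */
#define STACK_WORDS 8

typedef struct {
    uint32_t mem[STACK_WORDS];
    int sp;
} ToyStack;

static void stack_init(ToyStack *s) { s->sp = STACK_WORDS; }

/* Like pushl: move sp down one word, then store the value there. */
static void stack_push(ToyStack *s, uint32_t v) {
    assert(s->sp > 0 && "toy stack overflow");
    s->mem[--s->sp] = v;
}

/* Like popl: read the newest word, then move sp back up. */
static uint32_t stack_pop(ToyStack *s) {
    assert(s->sp < STACK_WORDS && "toy stack underflow");
    return s->mem[s->sp++];
}

/* Like subl $4*n, %esp: reserve n words (e.g. for locals) without
   storing anything; returns the index of the new block. */
static int stack_alloc(ToyStack *s, int n) {
    assert(n >= 0 && n <= s->sp && "toy stack overflow");
    s->sp -= n;
    return s->sp;
}
```

Allocating space this way is just arithmetic on the stack pointer, which is why changing %esp changes where the CPU thinks the top of the stack is.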

Quote:

I don't understand what the different versions of assembler are. I just care that I can use the GNU Assembler.

Good.

Quote:

I would like to start with the modern stuff

Good.

Quote:

and Linux system calls.

Bad, but you make your own choices.

Quote:

I don't have access to 8086 and similar machines. Maybe there is some way to emulate them?

It's buried in your CPU. It can be accessed, but don't worry about that. You don't want to learn 16 bit x86 now.

Quote:

I don't understand the differences between x86 and x86_64 asm, except that they are 32 and 64 bit.

That's why I started to explain some of the other differences, such as having more registers in x86_64, and having more generalized (fewer special rules and restrictions) addressing modes. And I explained the similarities, such as almost identical instruction sets.

Quote:

And I have no clue what an "ABI" is.

Application Binary Interface. That means a set of rules for one piece of software to call another. There is an ABI (the one I was talking about) for any program or function to call any function. There is another ABI (I don't know it) for a user space process to call the Linux kernel.

MTK358 · 03-04-2010, 07:46 PM

Quote:

You should understand that eax, ebx, ecx, edx, esi, edi, ebp and esp are each 32 bit parts of the CPU that can each hold a 32 bit integer or pointer.

OK, so %ebp is just another general-purpose register?

And %esp is the pointer to the top of the stack (which is actually upside-down), and changing %esp will change where the CPU thinks the top of the stack is?

And why are Linux system calls bad? Won't I need to be able to do file operations and memory allocation, etc.?

smeezekitty · 03-04-2010, 08:46 PM

Quote:

OK, so %ebp is just another general-purpose register?

No, ebp and bp are the Base Pointer.
It is used for memory access in C programs.

resetreset · 03-05-2010, 02:32 AM

One asm can't be simpler than the other - it's the same CHIP! Under Linux, you'll obviously get Linux's system calls to play with, and there will be a LOT to learn if you're using those.
Under Linux you're in protected mode, so, knowing Linux, I don't think you'll have to reboot the computer if your proggy crashes. So you have that layer of protection. In DOS the whole machine is yours, which doesn't give you that layer, but it is a much more *liberating* experience. Challenge yourself - when the pain of rebooting gets to you, you'll prolly work harder to write a program which DOESN'T crash.

MTK358 · 03-05-2010, 06:28 AM

@smeezekitty

Can it just be used to store any value, or only if you use a different calling convention?

I just don't understand whether it's just a general storage place that is traditionally used for this purpose, or does it have any special significance in the CPU's architecture?

@resetreset

I don't even have an old DOS box. And why do you want me to go back to M$ again?

If I wanted to go that route, doesn't it seem better to use an 8080 emulator or something?

johnsfine · 03-05-2010, 08:26 AM

Quote:

Originally Posted by MTK358

OK, so %ebp is just another general-purpose register?

Correct. There are a few special instructions in the instruction set and a few optimizations in the addressing modes that provide some hardware support for the conventional use of ebp as a frame pointer (a pointer to a specific place within the current function's stack use). But that use is still just a convention.

Even the C and C++ calling standards (ABIs) do not require you to use ebp in the conventional way. You can use ebp for any pointer or integer you like.

The debugger is a lot more likely to get confused if you don't use ebp in the conventional way. But if you know asm, you will know how to debug even when the debugger is confused. Also hopefully, you won't need to debug every function you write.

Quote:

And %esp is the pointer to the top of the stack (which is actually upside-down), and changing %esp will change where the CPU thinks the top of the stack is?

Correct. The instruction set would let you put any other pointer or integer into esp and use it as a general register rather than a stack pointer. But unlike the situation with ebp, using esp in an unconventional way would almost certainly mess up or crash your program. Far too many things assume esp is used always and only as the stack pointer.

Quote:

And why are Linux system calls bad? Won't I need to be able to do file operations and memory allocation, etc.?

That comment was part of my viewpoint that learning or using asm is better when mixing asm with C rather than writing whole programs in asm.

If the program is in C and the most important computational functions are in asm, you probably don't want to do I/O from the asm code anyway.

But if you do want to do I/O from asm code, hopefully you have learned how to call functions from asm when those functions obey the C ABI. That means you have the same complete library of functions (fopen, fprintf, write, etc.) available to your asm programs that you already know how to use in C.

Using all those functions for their intended purposes leaves you free to focus on the interesting/useful aspects of asm, rather than needing to learn a harder and less portable way of doing the I/O that you already know how to do easily and portably.
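The two routes can be compared in C (Linux-only sketch; `via_libc` and `via_raw_syscall` are hypothetical names, while `write()` and glibc's `syscall()` with `SYS_write` are real). Both ask the kernel for the same write operation. An asm function following the C ABI can call `write` exactly as C does; making the raw system call by hand from asm instead means loading the call number and arguments into registers per the kernel's own convention, which differs between 32-bit and 64-bit Linux.

```c
#define _GNU_SOURCE /* glibc declares syscall() only with this defined */
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Portable route: the C library wrapper, callable from asm via the C ABI. */
static ssize_t via_libc(int fd, const char *msg) {
    assert(msg != NULL);
    return write(fd, msg, strlen(msg));
}

/* Linux-specific route: the raw system call, here made through glibc's
   generic syscall() helper rather than hand-written asm. */
static long via_raw_syscall(int fd, const char *msg) {
    assert(msg != NULL);
    return syscall(SYS_write, fd, msg, strlen(msg));
}
```

Both functions return the number of bytes written, or -1 on error.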

Quote:

Originally Posted by resetreset

One asm can't be simpler than the other - it's the same CHIP!

Why does being in the same chip imply anything like that? (It doesn't).

An x86_64 chip supports three fundamentally different machine/asm languages (16 bit, 32 bit and 64 bit x86). Within 16 bit and 32 bit there are also some seriously different modes that would almost qualify as additional different machine languages.

Quote:

Challenge yourself - when the pain of rebooting gets to you, you'll prolly work harder to write a program which DOESN'T crash.

I will never understand that point of view beyond having complete confidence that it is wrong.

People do not learn things like programming better by being thrown into the deep end and sinking. Learning works better when it is focused and leveraged. Focused by learning a manageable chunk at once. Leveraged by using knowledge you already have to support the new learning (such as using printf etc. to get output from asm if you happen to already know C before asm).

MTK358 · 03-05-2010, 08:57 AM

Quote:

Originally Posted by Programming From the Ground Up

Before executing a function, a program pushes all of the parameters for the function onto the stack in the reverse order that they are documented. Then the program issues a call instruction indicating which function it wishes to start. The call instruction does two things. First it pushes the address of the next instruction, which is the return address, onto the stack. Then it modifies the instruction pointer (%eip) to point to the start of the function. So, at the time the function starts, the stack looks like this (the "top" of the stack is at the bottom on this example):

Code:

Parameter #N
...
Parameter 2
Parameter 1
Return Address <--- (%esp)

Each of the parameters of the function have been pushed onto the stack, and finally the return address is there. Now the function itself has some work to do.
The first thing it does is save the current base pointer register, %ebp, by doing pushl %ebp. The base pointer is a special register used for accessing function parameters and local variables. Next, it copies the stack pointer to %ebp by doing movl %esp, %ebp. This allows you to be able to access the function parameters as fixed indexes from the base pointer. You may think that you can use the stack pointer for this. However, during your program you may do other things with the stack such as pushing arguments to other functions.

The part that confuses me is in bold.

Why are they saving the value of %ebp to the stack when it wasn't yet even set?

crts · 03-05-2010, 09:10 AM

Quote:

Originally Posted by MTK358

The part that confuses me is in bold.

Why are they saving the value of %ebp to the stack when it wasn't yet even set?

Keep in mind that this is part of a function call. Now this function could have been called from inside another function. After your function finishes, you will have to restore the base pointer for the calling function.
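A toy C model of the quoted prologue and epilogue, run for one function calling another, shows this: the inner call's `pushl %ebp` saves the outer function's frame pointer, which is what lets the outer function keep finding its parameters after the inner call returns. `Machine`, `enter_function`, `param` and `leave_function` are hypothetical names; indices count 4-byte words, not bytes, and the caller-cleans-up step follows the C convention.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit machine: mem is the stack as an array of 4-byte words, and
   esp/ebp hold word indices. Pushes move esp toward lower indices. */
#define MEM_WORDS 64

typedef struct {
    uint32_t mem[MEM_WORDS];
    int esp, ebp;
} Machine;

static void push(Machine *m, uint32_t v) {
    assert(m->esp > 0 && "toy stack overflow");
    m->mem[--m->esp] = v;
}

static uint32_t pop(Machine *m) {
    assert(m->esp < MEM_WORDS && "toy stack underflow");
    return m->mem[m->esp++];
}

/* Caller pushes args right to left, then "call" pushes the return
   address. Callee prologue: pushl %ebp; movl %esp, %ebp. */
static void enter_function(Machine *m, const uint32_t *args, int n,
                           uint32_t return_address) {
    for (int i = n - 1; i >= 0; i--)
        push(m, args[i]);
    push(m, return_address);      /* done by the call instruction */
    push(m, (uint32_t)m->ebp);    /* save the CALLER's frame pointer */
    m->ebp = m->esp;              /* this call's frame pointer */
}

/* Parameter k (1-based) sits k+1 words above ebp: word 0 is the saved
   ebp, word 1 the return address. In bytes: 8 + 4*(k-1) above %ebp. */
static uint32_t param(const Machine *m, int k) {
    return m->mem[m->ebp + 1 + k];
}

/* Callee epilogue (movl %ebp, %esp; popl %ebp; ret), then the caller
   discards its n arguments (addl $4*n, %esp). Returns the return address. */
static uint32_t leave_function(Machine *m, int n) {
    m->esp = m->ebp;
    m->ebp = (int)pop(m);         /* caller's frame pointer is back */
    uint32_t ret = pop(m);
    m->esp += n;                  /* caller cleans up the arguments */
    return ret;
}
```

`param(m, 1)` reads word ebp+2, which corresponds to `8(%ebp)` in real code.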

Sergei Steshenko · 03-05-2010, 09:14 AM

Quote:

Originally Posted by MTK358

...
Why are they saving the value of %ebp to the stack when it wasn't yet even set?

I suggest first learning how HW interrupts and their handlers work - had you known that, you wouldn't have asked the question. You've already been given an answer, though.

MTK358 · 03-05-2010, 09:44 AM

Tell me if this is correct (see attachment):

EDIT: I like the fact that ".doc" isn't listed in valid attachment extensions!

johnsfine · 03-05-2010, 09:53 AM

Quote:

Originally Posted by MTK358

Tell me if this is correct (see attachment):

Yes it is correct.

I hope you also understand that parts of this are rules of the C ABI (such as the sequence of the parameters on the stack, and the fact that the caller not the callee removes those parameters from the stack). But parts are just convention, such as the use of ebp.

Restoring ebp to its previous value before return (if you used it) is part of the ABI. The specific method of save/restore and the fact that you use it at all are convention, not part of the ABI.

Quote:

Originally Posted by MTK358

Why are they saving the value of %ebp to the stack when it wasn't yet even set?

On entry to each function, the seven registers eax, ebx, ecx, edx, esi, edi and ebp all hold whatever values they held in the calling function at the moment of the call.

On return from each function, the ABI mandates that specific registers hold the same values they held on entry to the function. If the function uses those registers, it is responsible for saving them before use and restoring them after use.

In the 16 bit ABI, those registers are bp, si and di.

In the 64 bit ABI, those registers are rbp, rbx, and r12-r15.

I'm having a momentary mental block about the 32 bit ABI. Those register(s) are ebp and I can't remember which if any others.

Any register other than esp or the specific ones I just described may be used by any function and left modified on return from the function, so the caller cannot assume values in such registers will be valid after the call.
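For reference, under the System V i386 ABI used by Linux the 32-bit callee-saved registers are ebx, esi, edi and ebp. The lookup below is a small sketch encoding that list next to the 64-bit one given above; `is_callee_saved` and the table names are hypothetical, and %esp/%rsp are left out because the stack pointer is a special case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callee-saved registers per the System V ABIs used on Linux. */
static const char *const I386_CALLEE_SAVED[] = {"ebx", "esi", "edi", "ebp", NULL};
static const char *const X86_64_CALLEE_SAVED[] = {"rbx", "rbp", "r12", "r13", "r14", "r15", NULL};

/* Returns 1 if a function following the given ABI ("i386" or "x86_64")
   must restore `reg` before returning, 0 if it may leave it modified. */
static int is_callee_saved(const char *abi, const char *reg) {
    const char *const *list;
    assert(abi != NULL && reg != NULL);
    if (strcmp(abi, "i386") == 0)
        list = I386_CALLEE_SAVED;
    else if (strcmp(abi, "x86_64") == 0)
        list = X86_64_CALLEE_SAVED;
    else
        return 0; /* unknown ABI: not covered by this sketch */
    for (; *list != NULL; list++)
        if (strcmp(*list, reg) == 0)
            return 1;
    return 0;
}
```

So a 32-bit function that uses ebx must save and restore it, while eax, ecx and edx may be clobbered freely.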

MTK358 · 03-05-2010, 10:15 AM

So you read the attachment and it is correct? That's good.

And I do understand that all that is just a convention, so that code by other programmers will behave as you expect.

And so if you don't need any parameters, return values, or local vars, you can just use call and return?

Also, are all parameters and local vars accessed as offsets from %ebp in the main body of the function?