[SOLVED] I need help with understand a recursive assembly program.

joeBuffer · 08-12-2009, 08:04 PM

One last tiny question, and I'm sure I could get this by reading it maybe ... but just in case:
I'm still a little thrown off ... in _start, before the first call to factorial, why only one addl $4, %esp after all that?
The rest of them don't clear the arguments from the stack...? I understand how it's working now, much better ... I'm just not sure I understand this quite yet.
This is just the first book I've ever read on assembly, and I'm still getting used to it, and this example really threw me off, I was learning a lot and really understanding things well up until this.

Code:

movl %ebp, %esp
popl %ebp
ret

what about clearing the stack of the argument?
maybe I'm still confused.

neonsignal · 08-12-2009, 08:19 PM

Normally you would fix up the stack pointer at the end of each function by using the frame pointer in bp. The main factorial code does this at the end:

Code:

movl %ebp, %esp

In the initial function, they save a few instructions by not using a frame pointer, and just resetting the stack pointer directly (effectively a pop)

Code:

addl $4, %esp

joeBuffer · 08-12-2009, 08:34 PM

Now, the most obvious question I could possibly ask.
How do the values of ALL of the parameters to ALL of the recursive calls get combined into one return value?
Then, where do all of the parameters passed in recursively go, before you get back to _start and addl $4, %esp?

neonsignal · 08-12-2009, 08:41 PM

Sorry, perhaps that wasn't very clear. Each time you come back up a level, you are throwing away the parameter from that level

Code:

movl %ebp, %esp

They don't all get combined at once. Each level multiplies the parameter into the result, so the result is 'accumulated' in eax.

movl 8(%ebp), %ebx
imull %ebx, %eax

So the stack will go through the following stages:
5
5 4
5 4 3
5 4 3 2
5 4 3 2 1 (result = 1)
5 4 3 2 (result = 1*2)
5 4 3 (result = 1*2*3)
5 4 (result = 1*2*3*4)
5 (result = 1*2*3*4*5

joeBuffer · 08-12-2009, 08:44 PM

Isn't it happening in the opposite direction, then?
Wouldn't it call the factorial function recursively, until the parameter was 1, then 1 is in %eax, then %ebx is multiplied by that %eax in the factorial function with the parameter of 2, then %ebx is multiplied by that %ebx in the factorial function with the parameter of 3, etc.?
It isn't like an actual factorial, 4 * 3 * 2 * 1,
it ends up being 1 * 2 * 3 * 4?

neonsignal · 08-12-2009, 08:48 PM

Yes, it is happening in the opposite way to the way you would typically express a factorial. Fortunately multiply operations are commutative (meaning you can swap the order and get the same result)!

They are also associative, which means you can also do
5 * (4 * (3 * (2 * (1)))) and get the same result.

So there are several ways to calculate the answer.

Interestingly, this is not true of all other mathematical objects. The chess game example I gave is one where you would have to be more careful about doing things in the correct order when recursing.

joeBuffer · 08-12-2009, 08:49 PM

I'm sorry to drag this out like this, but I'm very thoroughly under this impression:

the way the stack is for the function, is this:
the argument
the return address
old %ebp <- ebp

When the kaleidoscope of functions here come to an end, this happens:

Code:

movl %ebp, %esp
popl %ebp
ret

for every one.
Every one of them has "the argument" at the highest address in the stack.
then it's this:

Code:

movl %ebp, %esp:
    the argument
    the return address
    old %ebp <- %esp
pop %ebp:
    the argument
    the return address
ret:
    the argument (since ret pops the return address)

This happens for EVERY call to the function, then at the end of ALL of these calls, you use:
addl $4, %esp

So after all those recursive calls to the factorial function, you clear the stack of one argument. The argument for each individual function is not cleared in the end_factorial code. Where did they go?

joeBuffer · 08-12-2009, 08:51 PM

Thanks for the additional information ... I did know that, though. I know about the commutative and associative and distributive properties, etc. I'm just not really advanced in algebra.
Thanks for the additional information, though.

neonsignal · 08-12-2009, 09:06 PM

Quote:

The argument for each individual function is not cleared in the end_factorial code. Where did they go?

It is cleared each time by the copying of 'ebp' into 'esp' (or, not so much cleared as skipping back over it). 'esp' is the actual stack pointer, so overwriting it causes arguments to be lost from the stack. Actually, all a 'pop' does too is to copy something from the stack and then adjust the stack pointer.

So adjusting (or setting) the stack pointer is the same as one or more pops, while ignoring what was on the stack. Effectively the end_factorial code does simply add 4 to the stack pointer (since the value in ebp is esp+4).

In older architectures, there was no frame pointer, and the stack had to be fixed up explicitly. There are big advantages to having a frame pointer: it makes it easier to access parameters from a known point; and it makes it trivial to restore the stack pointer at the end.

In reality, everything is still in the stack memory, but all that matters is where the stack pointer is pointing to. The next time you do a function call, the old information will get overwritten.

joeBuffer · 08-12-2009, 09:15 PM

Quote:

They don't all get combined at once. Each level multiplies the parameter into the result, so the result is 'accumulated' in eax.

movl 8(%ebp), %ebx
imull %ebx, %eax

This and the other stuff in this post, and the one I thanked you for before, is what I was really *missing*, where I thought I had no idea what I was looking at.
-----

In this book, it's this way through the whole book:
push the parameter
call the function
push %ebp
mov %esp, %ebp
then the local variables or whatever go here.

this would leave the parameter untouched, if you use:
mov %ebp, %esp
pop %ebp
ret

it changes the stack from:
parameter
return address
old %ebp <- %ebp
local variables, or whatever

to:
parameter

and I've had a good understanding of the stack pointer ... but in this book, it is always mov %ebp, %esp and pop %ebp goes back to where the return address is, then you use ret, then use the addl $4, %esp(or whatever the case may be) to clear the parameter from the calling function. In this case, it's a single addl $4, %esp, after the recursive function having the parameters pushed and not cleared repeatedly. It appears.

Code:

movl 8(%ebp), %eax #This moves the first argument to %eax
                   #4(%ebp) holds the return address, and
                   #8(%ebp) holds the first parameter

joeBuffer · 08-12-2009, 09:33 PM

I've just been thinking that since I have been learning a lot and had very good comprehension of the rest of the examples in this book, so far, maybe I should just read the rest of the book, and worry about recursive functions later, or when I absolutely have to. I don't think there are any other recursive functions in this book, and it has a lot of information in it.

neonsignal · 08-12-2009, 09:34 PM

Yeah, I can see that it could be confusing. They haven't cleaned up the stack directly after the call to factorial, because it is taken care of in the end code. It is a common optimization; it doesn't matter how much you push on the stack in the middle of the function, the end code will take care of it.

Interestingly, compilation of some other high level languages had the opposite convention; it was the callee that cleaned up the stack, rather than the caller. The nature of the C language (particular in allowing variable numbers of arguments) has tended to favor the caller cleaning up the stack. There are also some advantages in optimization, since the caller knows what the subsequent code is, so the cleanup can be delayed as much as possible).

But I agree - if you already understand the recursive process in C, you won't learn much more by being able to follow every line of the assembler. Understanding recursion is much more important than being able to follow a few assembler optimizations.

joeBuffer · 08-12-2009, 09:37 PM

Well, thank you for your time and help, but I still don't really see how the parameters are being cleaned up in the factorial function or afterward ... can you give a brief description? It can be a little technical, if I don't really understand I'll just look it up ... but I've been able to understand what's in this book up until this example very easily ... I know it doesn't seem like it, but I guess it's just this example.
The examples of C recursion are easy for me to understand.

neonsignal · 08-12-2009, 09:57 PM

Code:

1 pushl %ebp
2 movl %esp, %ebp
...
3 pushl %eax
4 call factorial
...
5 movl %ebp, %esp
6 popl %ebp
7 ret

Okay, this is the relevant bits of the code. Let us say that esp starts at the value 42.

0. the call to the function pushes the return address. esp is now 38.
1. ebp is saved on the stack. esp is now 34.
2. esp is copied to ebp. both are 34.
3. eax is pushed. esp is now 30.
4. factorial is called. Let us assume for the sake of the argument that it has fixed up the stack at the end. so esp is still 30.
5. ebp is moved to esp. this sets esp to 34 (see step 3)
6. ebp is restored to the calling frame pointer. esp is now 38.
7. the function returns by popping the return address to the program counter. esp is now 42 (back to the original point).

Notice that even if they added 4 to esp after the recursive call to factorial, it would have made no difference, because ebp was used to overwrite esp in step 5 anyway.

Even if the recursive call to factorial totally messed up the stack (it can't, or its return would have failed), the stack would still be fixed up by step 5.

joeBuffer · 08-12-2009, 10:10 PM

I see. It's just that the factorial function uses the stack and the function call in a different way than _start and pretty much all the other examples in this book, other than this one.
Thanks. This was very helpful.
I know a few others in this book, they use the stack and things differently, but for some reason this example really got to me. It was very annoying, and seemed simple at first.