[SOLVED] dynamic memory allocation in assembly

hda7 · 10-26-2009, 04:45 PM

I have been doing some small projects in assembly, and wondered if there is a straightforward way to dynamically allocate memory in assembly. Any help would be welcome.

lutusp · 10-26-2009, 05:49 PM

Quote:

Originally Posted by hda7

I have been doing some small projects in assembly, and wondered if there is a straightforward way to dynamically allocate memory in assembly. Any help would be welcome.

Memory allocation and management are features of higher-level languages like C (where they come from the standard library's malloc and free). An assembly program has access to a large amount of memory, but that memory is not managed the way it is in C or C++. It can be argued that it is not managed at all.

smeezekitty · 10-26-2009, 05:55 PM

link with the C library and call malloc().

johnsfine · 10-26-2009, 06:04 PM

Quote:

Originally Posted by hda7

I have been doing some small projects in assembly, and wondered if there is a straightforward way to dynamically allocate memory in assembly.

What platform do these small projects run on?

Why are they in assembly?

Do they need to be entirely in assembly?

If you are running on some embedded system or other very limited platform, the nature of that system will have a significant impact on your choices for memory allocation.

If you are talking about Linux or another full-featured platform, then dynamic memory allocation has two aspects:

1) Requesting large chunks of page-aligned memory from the OS.
2) Managing the division of those chunks into smaller chunks of used and free memory.
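A minimal C sketch of both aspects, assuming Linux (or another POSIX system) where mmap with MAP_ANONYMOUS is available. The struct arena type, the arena_* functions and the 16-byte alignment are hypothetical choices for illustration. It gets one page-aligned chunk from the OS, then divides it with the simplest possible scheme, a bump allocator that never reuses freed pieces:

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS in glibc's strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical arena: one page-aligned chunk obtained from the OS. */
struct arena {
    unsigned char *base;  /* start of the chunk returned by mmap */
    size_t size;          /* chunk size in bytes, a multiple of the page size */
    size_t used;          /* bytes handed out so far */
};

/* Aspect 1: ask the OS for a page-aligned chunk of at least `bytes` bytes. */
static int arena_init(struct arena *a, size_t bytes)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t size = (bytes + page - 1) / page * page;  /* round up to whole pages */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/* Aspect 2, in its simplest form: carve smaller pieces off the chunk.
   Assumption: 16-byte alignment is enough for any basic type. */
static void *arena_alloc(struct arena *a, size_t n)
{
    const size_t align = 16;
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->size || n > a->size - start)
        return NULL;  /* chunk exhausted */
    a->used = start + n;
    return a->base + start;
}

/* Give the whole chunk back to the OS. */
static void arena_destroy(struct arena *a)
{
    assert(a->base != NULL);
    munmap(a->base, a->size);
    a->base = NULL;
    a->size = a->used = 0;
}
```

A real malloc replaces the bump pointer with free lists so that freed pieces can be handed out again; that second aspect is where most of the complexity lives.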

It's not clear which of those, or whether both, you are asking about.

For either or both, unless you need to be pure about "projects in assembly", the simple answer is to call the malloc function in the C library.

If you want full-featured support of memory allocation and you want to be pure about assembly, then there isn't much choice other than writing a malloc function in assembly.

There usually isn't a good reason to write malloc in assembly, so the examples you can find that show how to write malloc will be in C.
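To give a flavor of what those C examples deal with, here is a sketch of something much simpler than malloc: a fixed-size block pool that threads a free list through its unused blocks. This is a swapped-in technique, not how malloc itself is written, and BLOCK_SIZE, BLOCKS and the pool_* names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool: BLOCKS equal-sized blocks carved from one static array. */
#define BLOCK_SIZE 32
#define BLOCKS 4

union block {
    union block *next;                  /* meaningful while the block is free */
    unsigned char payload[BLOCK_SIZE];  /* meaningful while the block is in use */
};

static union block pool[BLOCKS];
static union block *free_list;          /* head of the chain of free blocks */

/* Link every block into the free list: pool[0] -> pool[1] -> ... -> NULL. */
static void pool_init(void)
{
    for (size_t i = 0; i + 1 < BLOCKS; i++)
        pool[i].next = &pool[i + 1];
    pool[BLOCKS - 1].next = NULL;
    free_list = &pool[0];
}

/* Pop the head of the free list; returns NULL when the pool is exhausted. */
static void *pool_alloc(void)
{
    union block *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

/* Push a block back onto the free list so the next pool_alloc reuses it. */
static void pool_free(void *p)
{
    union block *b = p;
    assert(b >= pool && b < pool + BLOCKS);  /* must come from this pool */
    b->next = free_list;
    free_list = b;
}
```

A general-purpose malloc also has to handle variable sizes, splitting and coalescing blocks, and growing the heap, which is why its real implementations are much longer.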

My guess is you are doing assembly projects for learning, rather than because there is some reason those projects are better in assembly. If that is the case, my advice (based on a lot of relevant knowledge and experience) is stop being purist about it. Mix assembly with C or C++.

Doing in assembly the part of a project that makes sense to do in assembly is more interesting, and a more useful learning experience, than getting bogged down in the details of doing in assembly the parts of a project that most clearly should be done in C or C++.

Edit:

Quote:

Originally Posted by smeezekitty

link with the C library and call malloc().

I agree. While I was slowly composing the above complicated answer, you gave the concise answer. I wouldn't bet hda7 will accept that answer, but it is probably the best answer.

hda7 · 10-27-2009, 11:02 AM

I'll probably end up linking against libc and using malloc, but I am not sure how to call C code from assembly. Any hints?

johnsfine · 10-27-2009, 01:52 PM

Quote:

Originally Posted by hda7

I'll probably end up linking against libc and using malloc, but I am not sure how to call C code from assembly.

There are various places you can read about the ABI for x86 or x86_64. But that isn't the method I would advise for getting this info, because you might get the wrong version or misunderstand it.

The easy way is to write a trivial C function that calls the C function you want to call, then use gcc -S to compile that trivial function into a .s file containing the assembly code for the call you want.

For example, to see a meaningful call to malloc:

Code:

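#include <stdlib.h>

void foo(void)
{
    /* The result is not checked for NULL: this is only for reading the gcc -S output. */
    *(int*)malloc(sizeof(int)) = 7;
}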

The relevant part of the .s file that resulted (in x86_64) was

Code:

        movl    $4, %edi
        call    malloc
        movl    $7, (%rax)

I know that $4 represents my sizeof(int).
I know that ordinary integer and pointer results of functions are returned in rax, and that is confirmed here: right after the call, rax is used exactly the way my C code used the result of malloc.
I had forgotten (I have been debugging more Win64 asm code than Linux asm code lately) that the leftmost ordinary arg is passed in rdi (in Win64 it is in rcx), but that was easy to discover by seeing what was done with that $4. The movl writes edi, the low half of rdi, and writing a 32-bit register also zeroes the upper half.
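The same trick works for calls with more than one argument. A sketch, with a hypothetical function bar: on x86_64 Linux (the System V ABI), the first six integer or pointer args go in rdi, rsi, rdx, rcx, r8 and r9, in that order, so in the gcc -S output (at -O0; optimization may reorder or fold things) you should find n in rdi and 4 in esi before call calloc, and the pointer, 255 and the length in rdi, esi and rdx before call memset:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper in the style of foo(): allocate n ints and set every
   byte to 0xff (so each int reads as -1 on a two's-complement machine). */
int *bar(size_t n)
{
    int *p = calloc(n, sizeof(int));       /* 2 args: count, element size */
    if (p != NULL)
        memset(p, 0xff, n * sizeof(int));  /* 3 args: pointer, byte value, length */
    return p;
}
```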

For x86 (32 bit) I recompiled that with gcc -m32 -S

Code:

        subl    $8, %esp
        movl    $4, (%esp)
        call    malloc
        movl    $7, (%eax)

So you can see args are passed on the stack, not in registers (simpler, but rotten for performance).
Notice also that esp is kept divisible by 8. So even though just 4 bytes are passed on the stack, the code allocates space for 8.
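The 8 bytes reserved for a 4-byte argument come from rounding the argument space up to the alignment. A sketch of that arithmetic, assuming, as in this function, that esp is already divisible by 8 before the space is reserved; round_up is a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align; align must be a power of two. */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}
```

So round_up(4, 8) is 8 for malloc's single argument, and three 4-byte args (12 bytes) would round up to 16.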

A couple more details become clear when you see the whole function:

Code:

foo:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    $4, (%esp)
        call    malloc
        movl    $7, (%eax)
        leave
        ret

The call instruction pushes a 4-byte return address while esp is divisible by 8, so on entry to the function esp is always divisible by 4 but never by 8. After pushing ebp, esp is divisible by 8 again.
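The alignment bookkeeping can be traced step by step. A sketch that simulates esp through the start of foo, assuming the caller's esp is divisible by 8; trace_prologue is a hypothetical name, and the stack grows toward lower addresses:

```c
#include <assert.h>
#include <stdint.h>

/* Follow esp from the caller through foo's first instructions and
   return its value after `subl $8, %esp`. */
static uint32_t trace_prologue(uint32_t caller_esp)
{
    assert(caller_esp % 8 == 0);            /* assumption stated in the text */
    uint32_t esp = caller_esp - 4;          /* call foo: push 4-byte return address */
    assert(esp % 4 == 0 && esp % 8 != 0);   /* on entry: divisible by 4, not by 8 */
    esp -= 4;                               /* pushl %ebp */
    assert(esp % 8 == 0);                   /* divisible by 8 again */
    esp -= 8;                               /* subl $8, %esp: room for malloc's arg */
    assert(esp % 8 == 0);                   /* still aligned at the call to malloc */
    return esp;
}
```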

It is typical to delay cleanup of args on the stack. The called function does not clean up the stack, and in this case the calling code does not directly clean it up either. Instead, the leave instruction (equivalent to movl %ebp, %esp followed by popl %ebp) discards everything this function put on the stack, including the argument space.

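If your call to a C function is inside a loop, you need to be more careful about the amount of stack used, because args pushed on every iteration and never removed would keep growing the stack. But it is generally wasteful to clean the stack after each call; reserving the argument space once, as foo does with subl $8, %esp, and storing into it before each call avoids both problems.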
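Using the same gcc -S approach, a hypothetical C function fill that calls malloc in a loop shows how gcc handles this case. What you see depends on the gcc version and flags: the 32-bit output above reserves the argument slot once and stores into it with movl $4, (%esp), while other versions may push the argument and adjust esp around each call, so check your own .s file:

```c
#include <stdlib.h>

/* Hypothetical function in the style of foo(): one malloc call per iteration,
   so `gcc -m32 -S` shows how the argument space is managed across calls. */
void fill(int *slots[], int n)
{
    for (int i = 0; i < n; i++) {
        slots[i] = malloc(sizeof(int));
        if (slots[i] != NULL)
            *slots[i] = i;  /* use the result, as foo() does */
    }
}
```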

hda7 · 10-27-2009, 05:50 PM

I am using nasm, not gas, for my assembly projects. nasm uses Intel syntax instead of AT&T, and I am not familiar with the AT&T syntax that gcc produces. Can you explain a little of the syntax? By the way, I might take a look at the ABI specification; could you direct me to a copy?

johnsfine · 10-27-2009, 08:12 PM

I've forgotten too much NASM syntax in the many years since I was involved in that project.

It should be pretty obvious from the samples I just posted that AT&T syntax prefixes every register name with %, prefixes immediate constants with $, and puts operands in SRC, DST order.

NASM uses the opposite sequence for operands and IIRC doesn't have any prefix for registers. Other details I forget.

But the meaning of those samples in AT&T syntax ought to be obvious to you if you know x86 assembly at all. So if you know NASM syntax, you ought to have no trouble coding the same thing in NASM.

I'm not sure where to find the 32-bit x86 ABI (I'm assuming you want 32-bit, not 64-bit).

hda7 · 10-28-2009, 08:01 AM

OK, I get it now. What confused me at first was what $8 and $4 meant. Another thing is that nasm uses mov instead of movl.

Thanks for the help!

johnsfine · 10-28-2009, 08:25 AM

Quote:

Originally Posted by hda7

Another thing is that nasm uses mov instead of movl.

That is the thing I hated most about Intel syntax, especially the MS extensions to Intel syntax.

In a high-level language, operand size is clearly a characteristic of the operand, not the operator.

But in assembly, operand size should be a characteristic of the opcode, not the operand.

A memory reference in assembly can point to any data type. Typed pointers are for high-level languages.

Intel/MS assembly syntax is turned into a giant kludge by all the things you need to do to pretend that memory references in assembly have data types. It is so much easier to work with ordinary assembly languages where the operand sizes and types are normally implied or specified by the opcode.