LinuxQuestions.org


David2010 11-19-2011 08:34 PM

Echo recreation in assembly review.
 
I am trying to win a bet with a friend of mine on whose "echo" recreation is more efficient. Executable size also matters, but raw speed is what counts most.

This really isn't a "help me" kind of question but more of an "any ideas for improvement" kind of question.

He is making his echo recreation in C and I am making mine in 64-bit assembly using the nasm assembler. We both wanted to know if the GCC compiler makes programming in assembly pointless.

The code compiles into a "1.9 KB" executable and uses a total of roughly "2 KB" of RAM. That's pretty darn small to me. I did reference some C library functions; however, I made sure with him that doing so would be acceptable.

Code:


extern strcat
extern puts

segment .text
        global main

main:
        ;Set up stack
        push r12
        push rbp
        mov        rbp, rsi
        push rbx
        mov        ebx, edi
        sub        rsp, 48
       
        ;If argc == 1, no arguments
        cmp        edi, 1
        je .done

        ;Else continue
.start:
        lea        rdi, [rsp+16]
        mov        ecx, 8
        mov esi, 0
        mov        [rsp+8], esi
        mov r12d, 0
       
        jmp        .print
       
.loop:
        mov        rsi, [rbp+0+r12*8]
        lea        rdi, [rsp+8]
        call strcat
       
        lea        rdi, [rsp+8]
        mov        esi, space
        call strcat

.print:
        inc        r12
        cmp        ebx, r12d
        jg        .loop
       
        ;Print out result
        lea        rdi, [rsp+8]
        call puts
       
.done:
        ;End program
        mov        eax,1               
        mov        ebx,0               
        int        80h       
        ret
       
section .data
        space db " ", 0


SigTerm 11-19-2011 09:14 PM

Quote:

Originally Posted by David2010 (Post 4528632)
We both wanted to know if the GCC compiler makes programming in assembly pointless.

I can already tell you that GCC does not make assembly pointless. Assembly is and will remain useful in certain very specific circumstances that cannot be handled by a compiler. The point of assembly is that it provides maximum control over CPU resources and does not hide anything. The point of higher-level languages is that they give you less control, but allow you to port your code to different CPUs and perhaps develop it faster than you would in assembly.

firstfire 11-20-2011 03:33 AM

Hi.

It may be useful and fun for you to look at this and links therein.

resetreset 11-20-2011 07:08 AM

firstfire, that isn't a link.


My KINGDOM for an assembler that would give you the cycle counts for each instruction after it assembles the program!
David, could I please ask you to comment your code line by line, since I am unfamiliar with the Linux environment ?, i.e. there's a lot I don't understand up there. I have been an asm freak for a long time, but it's been a while since I've done this sort of thing.

May I ask you how you picked up 64-bit asm? Are there any tuts on the web? (NOT software manuals from Intel etc. , I can't learn from those. I'd LOVE to learn from you, if you're willing to hold my hand a little :)

Thanks! :)

firstfire 11-20-2011 07:22 AM

Quote:

Originally Posted by resetreset (Post 4528877)
firstfire, that isn't a link.

Oops.. sorry :) Fixed. Hope now it works.

resetreset 11-20-2011 07:23 AM

Oh and by the way, you can replace "mov esi,0" with "xor esi, esi" in .start - (I bloody well HOPE that's an instruction) - that'll make it a wee bit faster, I think!
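
As a rough sketch of the difference (assuming NASM in 64-bit mode; the two labels are made up for illustration), both instructions leave esi, and therefore all of rsi, at zero, but the xor form is shorter and also changes the flags:

```nasm
        bits 64
        section .text

zero_with_mov:
        mov     esi, 0          ; encodes as BE 00 00 00 00 (5 bytes), flags unchanged
zero_with_xor:
        xor     esi, esi        ; encodes as 31 F6 (2 bytes), flags are modified
```

Assembling with a listing file (nasm -felf64 -l zero.lst zero.asm) should show the encoded bytes next to each line. Whether the shorter form is measurably faster depends on the CPU.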

David2010 11-20-2011 04:19 PM

Quote:

Originally Posted by resetreset (Post 4528877)

David, could I please ask you to comment your code line by line, since I am unfamiliar with the Linux environment ?, i.e. there's a lot I don't understand up there. I have been an asm freak for a long time, but it's been a while since I've done this sort of thing.

May I ask you how you picked up 64-bit asm? Are there any tuts on the web? (NOT software manuals from Intel etc. , I can't learn from those. I'd LOVE to learn from you, if you're willing to hold my hand a little :)

Thanks! :)

A "different" friend of mine helped me with both 32 and 64-bit assembly. He knows a LOT more about assembly than I do and claims he even helped out in the creation of GCC. Cool huh?

Anyhow this is the updated code. I suck at documentation so hopefully you should get the gist of it:

Note that this code is considered "tainted" or "dangerous" because it doesn't follow the standard GCC guidelines for assembly. Or at least according to the friend I was talking about earlier.

I would be more than happy to help you out in an assembly related problem but I am no expert by any means so.... yeah. :-)

Code:


extern strcat        ;Used to add to the char buffer
extern puts                ;Used for screen printing

segment .text
        global main ;Needed for the linker. GCC looks for main

main:
        ;Set up stack
        mov        rbp, rsi
        mov        ebx, edi
       
        ;If argc == 1, no arguments
        cmp        edi, 1               
        je .done        ;Jump if equal

        ;Else continue
.start:
        lea        rdi, [rsp+16]        ;Lea is used here for address calculation
        xor r12d, r12d                ;Used to prevent nasm size errors for null values
        mov        [rsp+8], r12d        ;Set the buffer to null
       
        jmp        .print
       
.loop:
        ;strcat(rdi, rsi)
        ;Add the next argument to the buffer
        mov        rsi, [rbp+0+r12*8]
        lea        rdi, [rsp+8]        ;Lea is used here to get the address of buffer
        call strcat
       
        ;strcat(rdi, rsi)
        ;Add a space to the buffer to separate the words
        lea        rdi, [rsp+8]        ;Lea is used here to get the address of buffer
        mov        rsi, space       
        call strcat

.print:
        inc        r12                        ;Increase I by one
       
        ;If ebx > r12d goto .loop
        ;Change r12d to esi
        cmp        ebx, r12d       
        jg        .loop               
       
        ;Print out result
        ;puts(rdi)
        lea        rdi, [rsp+8]        ;Lea is used here to get the address of buffer
        call puts
       
.done:
        ;This prevents any cross executable interference.
        ;In "normal" cases Linux will clear the registers after a program exits
        ;however since GCC feels the need to clear them, I should too.
        xor ebx, ebx
        xor rdi, rdi
        xor r12d, r12d
        xor rsp, rsp
        xor r12, r12
        xor rsi, rsi
        xor rbp, rbp
        xor rax, rax
        xor rbx, rbx
        xor rcx, rcx
        xor rdx, rdx

        ;Uses the standard linux syscalls for exit
        mov        eax,1               
        mov        ebx,0               
        int        80h       
        ret
       
section .data
        space db " ", 0
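
For comparison, here is a minimal sketch of a main that follows the System V x86-64 calling convention and returns normally instead of calling int 80h. It is not the code from this post: it swaps strcat/puts for printf and putchar so that no fixed-size buffer is needed, and the label fmt is made up for the example. Returning from main lets the C library flush its output buffers, which a raw exit syscall skips. Like the code above, it leaves a trailing space before the newline.

```nasm
        extern  printf
        extern  putchar
        global  main

        section .rodata
fmt:    db      "%s ", 0                ; hypothetical format string: argument plus a space

        section .text
main:
        push    rbx                     ; rbx is callee-saved; the push also aligns rsp to 16
        lea     rbx, [rsi + 8]          ; rbx = &argv[1]
.next:
        mov     rsi, [rbx]              ; rsi = current argument, or NULL at the end of argv
        test    rsi, rsi
        jz      .done
        lea     rdi, [rel fmt]          ; first argument: format string
        xor     eax, eax                ; varargs call: no vector registers used
        call    printf
        add     rbx, 8                  ; advance to the next argv entry
        jmp     .next
.done:
        mov     edi, 10                 ; '\n'
        call    putchar
        xor     eax, eax                ; return 0 from main
        pop     rbx
        ret
```

Assuming the file is saved as echo64.asm, something like nasm -felf64 echo64.asm followed by gcc -no-pie -o echo64 echo64.o should build it; the -no-pie flag is an assumption about the toolchain defaulting to position-independent executables.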


Nominal Animal 11-20-2011 05:21 PM

How about ditching the C library altogether, and using only the write() syscall? To write to standard output using nothing but kernel-provided syscalls, set eax=4, ebx=1, ecx=pointer, edx=length, and use int 0x80. The call returns its result in eax; on Linux the other registers are preserved.

Here is a 32-bit echo that compiles to a 404-byte executable file. It is nowhere near optimal, as it can be trimmed further quite a bit. I left only minimal comments in it. It uses no stack or heap at all (on top of what the kernel sets up and uses itself), only two bytes of static data and 88 bytes of code.
Code:

        global        _start

        section .data
space:
        db 0x20                                ; Space

newline:
        db 0x0A                                ; Newline


        section        .text

; [esp] = number of arguments
; [esp+4] = argv[0]
; [esp+8] = argv[1]

; eax = 4, int 0x80: write(ebx, ecx, edx)

_start:
        mov        ebp, esp
        xor        edi, edi
        inc        edi

        ; No arguments?
        cmp        edi, [ebp]
        jae        last

arg:
        ; ecx = esi = arg
        inc        edi
        mov        ecx, [ebp + edi * 4]
        mov        esi, ecx

        ; Find EOS
        dec        esi
len:
        inc        esi
        cmp        [esi], ah
        jne        len

        ; edx = length
        mov        edx, esi
        sub        edx, ecx

        ; write(1, arg, length)
        xor        eax, eax
        xor        ebx, ebx
        mov        al, 4
        inc        bl
        int        0x80

        ; last argument?
        cmp        edi, [ebp]
        jae        last

        ; write(1, space, 1)
        xor        eax, eax
        xor        ebx, ebx
        mov        al, 4
        inc        bl
        mov        ecx, space
        mov        edx, ebx
        int        0x80
        jmp        arg

last:
        ; write one newline
        xor        eax, eax
        xor        ebx, ebx
        mov        al, 4
        inc        bl
        mov        ecx, newline
        mov        edx, ebx
        int        0x80

        ; exit(0)
        xor        eax, eax
        xor        ebx, ebx
        inc        eax
        int        0x80

If you save the above as echo.asm, you can compile and link it using
Code:

nasm -felf32 -o echo.o echo.asm
ld -s -o echo echo.o

The end result will contain 88 bytes of code and 2 bytes of data (see for yourself using objdump -x echo), the rest of the 404-byte file is ELF stuff. Note that you can use objdump -d echo to show you a disassembly of the code in AT&T syntax.
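
Since the program in the first post is 64-bit, here is a sketch of the same two calls through the native x86-64 syscall instruction rather than int 0x80. It assumes the Linux x86-64 syscall numbers (write is 1, exit is 60) and passes arguments in rdi, rsi, and rdx; the names msg and msglen are made up for the example.

```nasm
        global  _start

        section .data
msg:    db      "hello", 0x0A           ; hypothetical test string
msglen  equ     $ - msg                 ; length computed by the assembler

        section .text
_start:
        ; write(1, msg, msglen)
        mov     eax, 1                  ; syscall number for write
        mov     edi, 1                  ; file descriptor 1 = standard output
        lea     rsi, [rel msg]          ; pointer to the bytes
        mov     edx, msglen             ; number of bytes
        syscall                         ; result in rax; rcx and r11 are overwritten

        ; exit(0)
        mov     eax, 60                 ; syscall number for exit
        xor     edi, edi                ; status 0
        syscall
```

Saved as hello64.asm (an assumed name), it can be built the same way as above with nasm -felf64 -o hello64.o hello64.asm and ld -s -o hello64 hello64.o.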

syg00 11-20-2011 06:58 PM

Quote:

Originally Posted by firstfire (Post 4528788)
Hi.

It may be useful and fun for you to look at this and links therein.

lol ... that is one twisted puppy.

Made my day.

David2010 11-20-2011 07:19 PM

Quote:

Originally Posted by Nominal Animal (Post 4529240)
How about ditching the C library altogether, and using only the write() syscall? To write to standard output using nothing but kernel-provided syscalls, set eax=4, ebx=1, ecx=pointer, edx=length, and use int 0x80. The call returns its result in eax; on Linux the other registers are preserved.

Here is a 32-bit echo that compiles to a 404-byte executable file. It is nowhere near optimal, as it can be trimmed further quite a bit. I left only minimal comments in it. It uses no stack or heap at all (on top of what the kernel sets up and uses itself), only two bytes of static data and 88 bytes of code.

--SNIP--

The end result will contain 88 bytes of code and 2 bytes of data (see for yourself using objdump -x echo), the rest of the 404-byte file is ELF stuff. Note that you can use objdump -d echo to show you a disassembly of the code in AT&T syntax.

Ouch. No offense but that code was really really ugly. Sure it may be smaller but I don't want to sacrifice that much readability for... THAT. O.o

Nominal Animal 11-21-2011 12:23 AM

Quote:

Originally Posted by David2010 (Post 4529299)
Ouch. No offense but that code was really really ugly. Sure it may be smaller but I don't want to sacrifice that much readability for... THAT. O.o

Ugly? Hey, I take offense at that. I think. :p

As a punishment, here is more code:
Code:

        global _start
        section .text
        default rel

_start:
        mov        esi, [esp + 8]
        xor        eax, eax
        or        esi, esi
        jnz        params

        xor        ebx, ebx
        push        dword 0x000A
        inc        bl
        xor        eax, eax
        mov        ecx, esp
        mov        al, 4
        mov        edx, ebx
        int        0x80
        jmp        exit

params:
        mov        ebx, [esp]
        mov        ecx, esi
        mov        edx, [esp + 4*ebx]

        cld
        mov        edi, esi
        jmp        find

skip:
        lodsb
        or        al, al
        jz        skip
        mov        [edi], byte 0x20
        inc        edi
next:
        stosb
find:
        lodsb
        or        al, al
        jnz        next
        cmp        esi, edx
        jbe        skip
        mov        al, 0x0a
        stosb

        xor        ebx, ebx
        mov        edx, edi
        mov        al, 4
        sub        edx, ecx
        inc        bl
        int        0x80

exit:
        xor        eax, eax
        xor        ebx, ebx
        mov        al, 1
        int        0x80

Now this is getting nearer the theoretical minimum in terms of code size, memory use (four bytes of stack if no parameters, none otherwise), and number of CPU cycles used. It has no data segment, just 83 bytes of code; the x86 ELF32 executable comes to 316 bytes. It only does one syscall to output, and one to exit.

Now this one is as ugly as the posterior opening of a bird of prey, and as hacky as a rabid barbarian on mushrooms. But it should work, even for very large command line argument lists. (Up to kernel limits for me.)

What I am really interested in, is how are you going to measure the time taken? Wall clock? CPU cycles? Normal use, with just a few parameters, or with a huge parameter list?

I would personally first agree on a suitable test set, then run it a few dozen times to get rid of outliers (runs where other stuff running on the machine slowed the run down). I'd pick either the minimum time, or the most typical time.

Good luck,

NevemTeve 11-21-2011 12:50 AM

Quote:

Originally Posted by resetreset (Post 4528877)
My KINGDOM for an assembler that would give you the cycle counts for each instruction after it assembles the program!

That would require a clairvoyant -- how else could the assembler guess what processor the program will be running on? These cycle counts are different for every CPU (sub)version.

resetreset 11-21-2011 07:10 AM

What kind of a register is r12d? Is it any relation to r12?

Sorry to sound like a FOOL, but I thought it's better to know, and get my doubts cleared up than simply REMAIN a fool. (I've never even *installed* a 64-bit Linux, let alone programmed in 64-bit asm!).

I have many more questions about the logic of the program, but I'll save those for later...



David, can I send you an email through this site? I already sent you a friend request, but I'd like to meet up with you on video chat, if that's possible? :) Which country are you in, btw?


Nominal Animal, since you're in Finland - are you a member of the demoscene?

David2010 11-21-2011 05:35 PM

Quote:

Originally Posted by resetreset (Post 4529653)
What kind of a register is r12d? Is it any relation to r12?

I have many more questions about the logic of the program, but I'll save those for later...

David, can I send you an email through this site? I already sent you a friend request, but I'd like to meet up with you on video chat, if that's possible? :) Which country are you in, btw?

I live in the USA. Sorry I don't have a webcam so no video chatting for me.

I visit this site all the time so a PM would be just fine. I added your friend request.

Once again I suck at documentation so I am sorry for the confusing logic. I actually presumed it was quite readable but strings in nasm can be a little confusing.

I actually thought the Intel and AMD (not AS good) manuals were a godsend but then again I have a LOT of programming experience.

The only thing I regret about switching to linux is a lack of "practical" examples for GUI work through XORG comparable to http://win32assembly.online.fr/tutorials.html

I would love to create games in assembly using nasm. It's very rare to find decent assembly programmers. Even the assembly GCC outputs disturbs me but that is a completely different topic.

I always thought I was some sort of mental masochist to enjoy assembly programming and learning as many languages as I can. Maybe I'm right :-)

This should hopefully explain your confusion about r12d. There are several more registers that I don't use to remain as compatible with older processors as possible.

Code:


Where an instruction requires a register value, it is already implicit in the encoding of the rest of the instruction what type of register is intended: an 8-bit general-purpose register, a segment register, a debug register, an MMX register, or whatever. Therefore there is no problem with registers of different types sharing an encoding value.

Please note that for the register classes listed below, the register extension (REX) classes require the use of the REX prefix, which is only available in long mode on an x86-64 processor. This pretty much goes for any register that has a number higher than 7.

The encodings for the various classes of register are:

    8-bit general registers: AL is 0, CL is 1, DL is 2, BL is 3, AH is 4, CH is 5, DH is 6 and BH is 7. Please note that AH, BH, CH and DH are not addressable when using the REX prefix in long mode.
    8-bit general register extensions (REX): SPL is 4, BPL is 5, SIL is 6, DIL is 7, R8B is 8, R9B is 9, R10B is 10, R11B is 11, R12B is 12, R13B is 13, R14B is 14 and R15B is 15.
    16-bit general registers: AX is 0, CX is 1, DX is 2, BX is 3, SP is 4, BP is 5, SI is 6, and DI is 7.
    16-bit general register extensions (REX): R8W is 8, R9W is 9, R10W is 10, R11W is 11, R12W is 12, R13W is 13, R14W is 14 and R15W is 15.
    32-bit general registers: EAX is 0, ECX is 1, EDX is 2, EBX is 3, ESP is 4, EBP is 5, ESI is 6, and EDI is 7.
    32-bit general register extensions (REX): R8D is 8, R9D is 9, R10D is 10, R11D is 11, R12D is 12, R13D is 13, R14D is 14 and R15D is 15.
    64-bit general register extensions (REX): RAX is 0, RCX is 1, RDX is 2, RBX is 3, RSP is 4, RBP is 5, RSI is 6, RDI is 7, R8 is 8, R9 is 9, R10 is 10, R11 is 11, R12 is 12, R13 is 13, R14 is 14 and R15 is 15.
    Segment registers: ES is 0, CS is 1, SS is 2, DS is 3, FS is 4, and GS is 5.
    Floating-point registers: ST0 is 0, ST1 is 1, ST2 is 2, ST3 is 3, ST4 is 4, ST5 is 5, ST6 is 6, and ST7 is 7.
    64-bit MMX registers: MM0 is 0, MM1 is 1, MM2 is 2, MM3 is 3, MM4 is 4, MM5 is 5, MM6 is 6, and MM7 is 7.
    128-bit XMM (SSE) registers: XMM0 is 0, XMM1 is 1, XMM2 is 2, XMM3 is 3, XMM4 is 4, XMM5 is 5, XMM6 is 6 and XMM7 is 7.
    128-bit XMM (SSE) register extensions (REX): XMM8 is 8, XMM9 is 9, XMM10 is 10, XMM11 is 11, XMM12 is 12, XMM13 is 13, XMM14 is 14 and XMM15 is 15.
    Control registers: CR0 is 0, CR2 is 2, CR3 is 3, and CR4 is 4.
    Control register extensions: CR8 is 8.
    Debug registers: DR0 is 0, DR1 is 1, DR2 is 2, DR3 is 3, DR6 is 6, and DR7 is 7.
    Test registers: TR3 is 3, TR4 is 4, TR5 is 5, TR6 is 6, and TR7 is 7.

(Note that wherever a register name contains a number, that number is also the register value for that register.)


resetreset 11-23-2011 06:51 AM

Quote:

Originally Posted by David2010 (Post 4530157)
I live in the USA. Sorry I don't have a webcam so no video chatting for me.

I visit this site all the time so a PM would be just fine. I added your friend request.

How about Google Talk then? It'd require me to go into Windoze, but I think that's the only VOIP program I have. Do you have a Google email acct?


I'm sending you a message through this site, check your messages.


Quote:

Originally Posted by David2010 (Post 4530157)


I would love to create games in assembly using nasm. Its very rare to find decent assembly programmers.



You are not Mad at all! :) I would love to do the exact same thing, but a full 3D game in Asm is just TOO hard a proposition!
You should look into the demoscene - Google it and read the Wikipedia link, that'll tell you everything. Go to www.pouet.net for a site where demosceners hang out.



Quote:

Originally Posted by David2010 (Post 4530157)


This should hopefully explain your confusion about r12d.



Actually it didn't - I repeat my question, is it a 32-bit register? What does "extension" mean?
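
A minimal sketch of the relationship, assuming x86-64 in long mode: r12 is one of the eight general-purpose registers (r8 to r15) that the 64-bit extension added and that need the REX prefix, and r12d, r12w, and r12b name its low 32, 16, and 8 bits, in the same way that eax, ax, and al name parts of rax. The label demo is made up for the example.

```nasm
        bits 64
        section .text

demo:
        push    r12             ; r12 is callee-saved, so keep the caller's value
        mov     r12, -1         ; r12 = 0xFFFFFFFFFFFFFFFF
        mov     r12d, 1         ; r12 = 0x0000000000000001 (a 32-bit write zeroes the upper half)
        mov     r12, -1
        mov     r12w, 1         ; r12 = 0xFFFFFFFFFFFF0001 (a 16-bit write keeps the upper bits)
        mov     r12, -1
        mov     r12b, 1         ; r12 = 0xFFFFFFFFFFFFFF01 (an 8-bit write keeps the upper bits)
        pop     r12
        ret
```

The zeroing on 32-bit writes is why "xor r12d, r12d" is enough to clear all 64 bits of r12.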


All times are GMT -5.