I am trying to win a bet with a friend of mine on whose "echo" recreation is more efficient. Executable size also matters, but pure raw speed is what is important.
This really isn't a "help me" kind of question but more of an "any ideas for improvement" kind of question.
He is making his echo recreation in C and I am making mine in 64-bit assembly using the nasm assembler. We both wanted to know if the GCC compiler makes programming in assembly pointless.
The code compiles into a "1.9 KB" executable and uses a total of roughly "2 KB" of RAM. That's pretty darn small to me. I did reference some C library functions; however, I made sure with him that doing so would be acceptable.
Code:
extern strcat
extern puts
segment .text
global main
main:
;Set up stack
push r12
push rbp
mov rbp, rsi
push rbx
mov ebx, edi
sub rsp, 48
;If argc == 1, no arguments
cmp edi, 1
je .done
;Else continue
.start:
lea rdi, [rsp+16]
mov ecx, 8
mov esi, 0
mov [rsp+8], esi
mov r12d, 0
jmp .print
.loop:
mov rsi, [rbp+0+r12*8]
lea rdi, [rsp+8]
call strcat
lea rdi, [rsp+8]
mov esi, space
call strcat
.print:
inc r12
cmp ebx, r12d
jg .loop
;Print out result
lea rdi, [rsp+8]
call puts
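.done:
;Restore the stack and the saved registers, then return 0 from main.
;Returning (instead of invoking the exit syscall directly) lets the
;C runtime flush whatever puts left in the stdio buffer.
add rsp, 48
pop rbx
pop rbp
pop r12
xor eax, eax
ret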
section .data
space db " ", 0
I can already tell you that GCC does not make assembly pointless. Assembly is and will remain useful in certain very specific circumstances that cannot be handled by a compiler. The point of assembly is that it provides maximum control over CPU resources and does not hide anything. The point of higher-level languages is that they give you less control, but allow you to port your code to different CPUs and maybe develop code faster than you would with assembly.
My KINGDOM for an assembler that would give you the cycle counts for each instruction after it assembles the program!
David, could I please ask you to comment your code line by line, since I am unfamiliar with the Linux environment? There's a lot I don't understand up there. I have been an asm freak for a long time, but it's been a while since I've done this sort of thing.
May I ask you how you picked up 64-bit asm? Are there any tutorials on the web? (NOT software manuals from Intel etc.; I can't learn from those.) I'd LOVE to learn from you, if you're willing to hold my hand a little.
Oh and by the way, you can replace "mov esi,0" with "xor esi, esi" in .start - (I bloody well HOPE that's an instruction) - that'll make it a wee bit faster, I think!
Thanks!
A "different" friend of mine helped me with both 32 and 64-bit assembly. He knows a LOT more about assembly than I do and claims he even helped out in the creation of GCC. Cool huh?
Anyhow this is the updated code. I suck at documentation so hopefully you should get the gist of it:
Note that this code is considered "tainted" or "dangerous" because it doesn't follow the standard GCC guidelines for assembly. Or at least according to the friend I was talking about earlier.
I would be more than happy to help you out in an assembly related problem but I am no expert by any means so.... yeah. :-)
Code:
extern strcat ;Used to add to the char buffer
extern puts ;Used for screen printing
segment .text
global main ;Needed for the linker. GCC looks for main
main:
;Save argv (rsi) and argc (edi); note that no stack frame is reserved here
mov rbp, rsi
mov ebx, edi
;If argc == 1, no arguments
cmp edi, 1
je .done ;Jump if equal
;Else continue
.start:
lea rdi, [rsp+16] ;Lea is used here for address calculation
xor r12d, r12d ;Used to prevent nasm size errors for null values
mov [rsp+8], r12d ;Set the buffer to null
jmp .print
.loop:
;strcat(rdi, rsi)
;Add the next argument to the buffer
mov rsi, [rbp+0+r12*8]
lea rdi, [rsp+8] ;Lea is used here to get the address of buffer
call strcat
;strcat(rdi, rsi)
;Add a space to the buffer to separate the words
lea rdi, [rsp+8] ;Lea is used here to get the address of buffer
mov rsi, space
call strcat
.print:
inc r12 ;Increase I by one
;If ebx > r12d (argc > i) goto .loop
;Change r12d to esi
cmp ebx, r12d
jg .loop
;Print out result
;puts(rdi)
lea rdi, [rsp+8] ;Lea is used here to get the address of buffer
call puts
.done:
;This prevents any cross executable interference.
;In "normal" cases Linux will clear the registers after a program exits
;however since GCC feels the need to clear them, I should too.
xor ebx, ebx
xor rdi, rdi
xor r12d, r12d
xor rsp, rsp
xor r12, r12
xor rsi, rsi
xor rbp, rbp
xor rax, rax
xor rbx, rbx
xor rcx, rcx
xor rdx, rdx
;Uses the standard linux syscalls for exit
mov eax,1
mov ebx,0
int 80h
ret
section .data
space db " ", 0
How about ditching the C library altogether, and using only the write() syscall? To write to standard output using nothing but kernel-provided syscalls, set eax=4, ebx=1, ecx=pointer, edx=length, and use int 0x80. The call returns its result in eax; on Linux the other registers keep their values across int 0x80.
Here is a 32-bit echo that assembles to a 404-byte executable file. It is nowhere near optimal, as it can be trimmed further quite a bit. I left only minimal comments in it. It uses no stack or heap at all (on top of what the kernel sets up and uses itself), only two bytes of static data and 88 bytes of code.
Code:
global _start
section .data
space:
db 0x20 ; Space
newline:
db 0x0A ; Newline
section .text
; [esp] = number of arguments
; [esp+4] = argv[0]
; [esp+8] = argv[1]
; eax = 4, int 0x80: write(ebx, ecx, edx)
_start:
mov ebp, esp
xor edi, edi
inc edi
; No arguments?
cmp edi, [ebp]
jae last
arg:
; ecx = esi = arg
inc edi
mov ecx, [ebp + edi * 4]
mov esi, ecx
; Find EOS
dec esi
len:
inc esi
cmp [esi], ah
jne len
; edx = length
mov edx, esi
sub edx, ecx
; write(1, arg, length)
xor eax, eax
xor ebx, ebx
mov al, 4
inc bl
int 0x80
; last argument?
cmp edi, [ebp]
jae last
; write(1, space, 1)
xor eax, eax
xor ebx, ebx
mov al, 4
inc bl
mov ecx, space
mov edx, ebx
int 0x80
jmp arg
last:
; write one newline
xor eax, eax
xor ebx, ebx
mov al, 4
inc bl
mov ecx, newline
mov edx, ebx
int 0x80
; exit(0)
xor eax, eax
xor ebx, ebx
inc eax
int 0x80
If you save the above as echo.asm, you can compile and link it using
The end result will contain 88 bytes of code and 2 bytes of data (see for yourself using objdump -x echo), the rest of the 404-byte file is ELF stuff. Note that you can use objdump -d echo to show you a disassembly of the code in AT&T syntax.
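For comparison with a C entry, the same syscall-only idea can be sketched in C. This is only a rough sketch and not anyone's actual contest code: echo_args and write_all are hypothetical names, and the POSIX write() wrapper stands in for the raw int 0x80 call, so unlike the assembly above it still links against libc.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Echo argv[1..argc-1] to fd, separated by single spaces and
 * terminated by a newline, using one write per piece (no stdio). */
int echo_args(int fd, int argc, char *const argv[])
{
    assert(argv != NULL);
    for (int i = 1; i < argc; i++) {
        if (write_all(fd, argv[i], strlen(argv[i])) < 0)
            return -1;
        /* A space goes between arguments, not after the last one. */
        if (i + 1 < argc && write_all(fd, " ", 1) < 0)
            return -1;
    }
    return write_all(fd, "\n", 1);
}
```

Calling echo_args(1, argc, argv) from main would write to standard output. Like the assembly, it issues one write per argument plus one per separator, so the number of syscalls grows with the number of arguments.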
Last edited by Nominal Animal; 11-20-2011 at 05:24 PM.
Ouch. No offense but that code was really really ugly. Sure it may be smaller but I don't want to sacrifice that much readability for... THAT. O.o
Ugly? Hey, I take offense at that. I think.
As a punishment, here is more code:
Code:
global _start
section .text
default rel
_start:
mov esi, [esp + 8]
xor eax, eax
or esi, esi
jnz params
xor ebx, ebx
push dword 0x000A
inc bl
xor eax, eax
mov ecx, esp
mov al, 4
mov edx, ebx
int 0x80
jmp exit
params:
mov ebx, [esp]
mov ecx, esi
mov edx, [esp + 4*ebx]
cld
mov edi, esi
jmp find
skip:
lodsb
or al, al
jz skip
mov [edi], byte 0x20
inc edi
next:
stosb
find:
lodsb
or al, al
jnz next
cmp esi, edx
jbe skip
mov al, 0x0a
stosb
xor ebx, ebx
mov edx, edi
mov al, 4
sub edx, ecx
inc bl
int 0x80
exit:
xor eax, eax
xor ebx, ebx
mov al, 1
int 0x80
Now this is getting nearer the theoretical minimum in terms of code size, memory use (four bytes of stack if no parameters, none otherwise), and number of CPU cycles used. It has no data segment, just 83 bytes of code; the x86 ELF32 executable comes to 316 bytes. It only does one syscall to output, and one to exit.
Now this one is as ugly as the posterior opening of a bird of prey, and as hacky as a rabid barbarian on mushrooms. But it should work, even for very large command line argument lists. (Up to kernel limits for me.)
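The "one syscall to output" idea can also be sketched in C: build the whole line first, then hand it to write() once. This is a rough sketch under stated assumptions, not a translation of the assembly. The assembly compacts the arguments in place where the kernel put them, whereas this version copies into a caller-supplied buffer, and join_args is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Join argv[1..argc-1] with single spaces and a trailing newline into out.
 * Returns the number of bytes stored, or 0 if cap is too small.
 * The result is not NUL-terminated: it is meant for write(), not puts().
 */
size_t join_args(int argc, char *const argv[], char *out, size_t cap)
{
    assert(out != NULL);
    size_t used = 0;

    for (int i = 1; i < argc; i++) {
        size_t len = strlen(argv[i]);
        /* Need room for the argument plus one separator or newline. */
        if (len + 1 > cap - used)
            return 0;
        memcpy(out + used, argv[i], len);
        used += len;
        out[used++] = (i + 1 < argc) ? ' ' : '\n';
    }
    if (argc <= 1) {
        /* No arguments: the output is just a newline. */
        if (cap < 1)
            return 0;
        out[used++] = '\n';
    }
    return used;
}
```

A caller would then do a single write(1, buf, n) with the returned length n. The explicit capacity check is the part a fixed-size stack buffer with strcat does not give you.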
What I am really interested in is how you are going to measure the time taken. Wall clock? CPU cycles? Normal use, with just a few parameters, or with a huge parameter list?
I would personally first agree on a suitable test set, then run it a few dozen times to get rid of outliers (runs where other stuff running on the machine slowed the run down). I'd pick either the minimum time, or the most typical time.
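A rough sketch of that kind of measurement in C, assuming wall-clock time and "pick the minimum" as the agreed rule. The names time_command_ns and best_of_runs are hypothetical, and the program being timed is whatever argv names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Wall-clock nanoseconds to fork, exec and reap argv[0] once.
 * Returns -1 if the process could not be created or reaped. */
long long time_command_ns(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (t1.tv_nsec - t0.tv_nsec);
}

/* Run the command `runs` times and keep the fastest run, which is the
 * one least disturbed by other activity on the machine.
 * Returns -1 if runs <= 0 or any run fails. */
long long best_of_runs(char *const argv[], int runs)
{
    long long best = -1;
    for (int i = 0; i < runs; i++) {
        long long t = time_command_ns(argv);
        if (t < 0)
            return -1;
        if (best < 0 || t < best)
            best = t;
    }
    return best;
}
```

Keep in mind that for a program as small as echo, this mostly measures process creation and exec rather than the echo code itself, so the differences between the contestants may be hard to see without a very large argument list.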
Quote:
My KINGDOM for an assembler that would give you the cycle counts for each instruction after it assembles the program!
That would require a clairvoyant -- how else could the assembler guess what processor will be the program running on? These cycle-counts are different for every CPU-(sub)version.
What kind of a register is r12d? Is it any relation to r12?
Sorry to sound like a FOOL, but I thought it's better to know, and get my doubts cleared up than simply REMAIN a fool. (I've never even *installed* a 64-bit Linux, let alone programmed in 64-bit asm!).
I have many more questions about the logic of the program, but I'll save those for later...
David, can I send you an email through this site? I already sent you a friend request, but I'd like to meet up with you on video chat, if that's possible? Which country are you in, btw?
Nominal Animal, since you're in Finland - are you a member of the demoscene?
I live in the USA. Sorry I don't have a webcam so no video chatting for me.
I visit this site all the time so a PM would be just fine. I added your friend request.
Once again I suck at documentation so I am sorry for the confusing logic. I actually presumed it was quite readable but strings in nasm can be a little confusing.
I actually thought the Intel and AMD (not AS good) manuals were a godsend, but then again I have a LOT of programming experience.
I would love to create games in assembly using nasm. It's very rare to find decent assembly programmers. Even the assembly GCC outputs disturbs me but that is a completely different topic.
I always thought I was some sort of mental masochist to enjoy assembly programming and learning as many languages as I can. Maybe I'm right :-)
This should hopefully explain your confusion about r12d. There are several more registers that I don't use to remain as compatible with older processors as possible.
Code:
Where an instruction requires a register value, it is already implicit in the encoding of the rest of the instruction what type of register is intended: an 8-bit general-purpose register, a segment register, a debug register, an MMX register, or whatever. Therefore there is no problem with registers of different types sharing an encoding value.
Please note that for the register classes listed below, the register extension (REX) classes require the use of the REX prefix, which is only available in long mode on x86-64 processors. This pretty much goes for any register that has a number higher than 7.
The encodings for the various classes of register are:
8-bit general registers: AL is 0, CL is 1, DL is 2, BL is 3, AH is 4, CH is 5, DH is 6 and BH is 7. Please note that AH, BH, CH and DH are not addressable when using the REX prefix in long mode.
8-bit general register extensions (REX): SPL is 4, BPL is 5, SIL is 6, DIL is 7, R8B is 8, R9B is 9, R10B is 10, R11B is 11, R12B is 12, R13B is 13, R14B is 14 and R15B is 15.
16-bit general registers: AX is 0, CX is 1, DX is 2, BX is 3, SP is 4, BP is 5, SI is 6, and DI is 7.
16-bit general register extensions (REX): R8W is 8, R9W is 9, R10W is 10, R11W is 11, R12W is 12, R13W is 13, R14W is 14 and R15W is 15.
32-bit general registers: EAX is 0, ECX is 1, EDX is 2, EBX is 3, ESP is 4, EBP is 5, ESI is 6, and EDI is 7.
32-bit general register extensions (REX): R8D is 8, R9D is 9, R10D is 10, R11D is 11, R12D is 12, R13D is 13, R14D is 14 and R15D is 15.
64-bit general register extensions (REX): RAX is 0, RCX is 1, RDX is 2, RBX is 3, RSP is 4, RBP is 5, RSI is 6, RDI is 7, R8 is 8, R9 is 9, R10 is 10, R11 is 11, R12 is 12, R13 is 13, R14 is 14 and R15 is 15.
Segment registers: ES is 0, CS is 1, SS is 2, DS is 3, FS is 4, and GS is 5.
Floating-point registers: ST0 is 0, ST1 is 1, ST2 is 2, ST3 is 3, ST4 is 4, ST5 is 5, ST6 is 6, and ST7 is 7.
64-bit MMX registers: MM0 is 0, MM1 is 1, MM2 is 2, MM3 is 3, MM4 is 4, MM5 is 5, MM6 is 6, and MM7 is 7.
128-bit XMM (SSE) registers: XMM0 is 0, XMM1 is 1, XMM2 is 2, XMM3 is 3, XMM4 is 4, XMM5 is 5, XMM6 is 6 and XMM7 is 7.
128-bit XMM (SSE) register extensions (REX): XMM8 is 8, XMM9 is 9, XMM10 is 10, XMM11 is 11, XMM12 is 12, XMM13 is 13, XMM14 is 14 and XMM15 is 15.
Control registers: CR0 is 0, CR2 is 2, CR3 is 3, and CR4 is 4.
Control register extensions: CR8 is 8.
Debug registers: DR0 is 0, DR1 is 1, DR2 is 2, DR3 is 3, DR6 is 6, and DR7 is 7.
Test registers: TR3 is 3, TR4 is 4, TR5 is 5, TR6 is 6, and TR7 is 7.
(Note that wherever a register name contains a number, that number is also the register value for that register.)
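To make the relation between the names concrete: r12d is not a separate register, it is the low 32 bits of r12, in the same way that eax is the low 32 bits of rax. Likewise r12w is the low 16 bits and r12b the low 8 bits. The "extension" registers are simply the ones numbered 8 to 15 that x86-64 added. The sketch below models this in C with plain integers. The function names are made up for illustration and nothing here touches real registers.

```c
#include <assert.h>
#include <stdint.h>

/* Reading a narrower name returns the low bits of the full register. */
uint32_t read32(uint64_t reg) { return (uint32_t)reg; }   /* r12 -> r12d */
uint16_t read16(uint64_t reg) { return (uint16_t)reg; }   /* r12 -> r12w */
uint8_t  read8(uint64_t reg)  { return (uint8_t)reg; }    /* r12 -> r12b */

/* Writing the 32-bit name zero-extends: the upper 32 bits of the
 * full register become 0, which is why "xor r12d, r12d" clears r12. */
uint64_t write32(uint64_t reg, uint32_t value)
{
    (void)reg;                      /* old contents are discarded */
    return (uint64_t)value;
}

/* Writing the 16-bit name leaves the upper 48 bits untouched. */
uint64_t write16(uint64_t reg, uint16_t value)
{
    return (reg & ~UINT64_C(0xFFFF)) | value;
}

/* Worked values for the model above. */
void self_check(void)
{
    uint64_t r12 = UINT64_C(0x1122334455667788);
    assert(read32(r12) == UINT32_C(0x55667788));
    assert(read16(r12) == 0x7788);
    assert(read8(r12) == 0x88);
    assert(write32(r12, 0) == 0);
    assert(write16(r12, 0xABCD) == UINT64_C(0x112233445566ABCD));
}
```

So in the echo code, "cmp ebx, r12d" compares argc with the low half of the same r12 that "inc r12" increments.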
Quote:
Originally Posted by David2010
I live in the USA. Sorry I don't have a webcam so no video chatting for me.
I visit this site all the time so a PM would be just fine. I added your friend request.
How about Google Talk then? It'd require me to go into Windoze, but I think that's the only VOIP program I have. Do you have a Google email acct?
I'm sending you a message through this site, check your messages.
Quote:
Originally Posted by David2010
I would love to create games in assembly using nasm. It's very rare to find decent assembly programmers.
You are not Mad at all! I would love to do the exact same thing, but a full 3D game in Asm is just TOO hard a proposition!
You should look into the demoscene - Google it and read the Wikipedia link, that'll tell you everything. Go to www.pouet.net for a site where demosceners hang out.
Quote:
Originally Posted by David2010
This should hopefully explain your confusion about r12d.
Actually it didn't - I repeat my question, is it a 32-bit register? What does "extension" mean?