LinuxQuestions.org
Old 11-19-2011, 08:34 PM   #1
David2010
Member
 
Registered: May 2009
Posts: 255

Rep: Reputation: 23
Echo recreation in assembly review.


I am trying to win a bet with a friend of mine on whose "echo" recreation is more efficient. Executable size also matters, but raw speed is what counts most.

This really isn't a "help me" kind of question but more of an "any ideas for improvement" kind of question.

He is making his echo recreation in C and I am making mine in 64-bit assembly using the nasm assembler. We both wanted to know if the GCC compiler makes programming in assembly pointless.

The code compiles into a "1.9 KB" executable and uses a total of roughly "2 KB" of RAM. That's pretty darn small to me. I did reference some C library functions; however, I made sure with him that doing so would be acceptable.

Code:
extern strcat
extern puts

segment .text
	global main

main:
	;Set up stack
	push r12
	push rbp
	mov	rbp, rsi
	push rbx
	mov	ebx, edi
	sub	rsp, 48
	
	;If argc == 1, no arguments
	cmp	edi, 1
	je .done

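	;Else continue
.start:
	lea	rdi, [rsp+16]	;Leftover: rdi is reloaded before every call
	mov	ecx, 8		;Leftover: ecx is never read
	mov esi, 0
	mov	[rsp+8], esi	;Start with an empty string in the 40-byte buffer at [rsp+8]
	mov r12d, 0		;r12 = argument index
	
	jmp	.print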
	
.loop:
	mov	rsi, [rbp+0+r12*8]
	lea	rdi, [rsp+8]
	call strcat
	
	lea	rdi, [rsp+8]
	mov	esi, space
	call strcat

.print:
	inc	r12
	cmp	ebx, r12d
	jg	.loop
	
	;Print out result
	lea	rdi, [rsp+8]
	call puts
	
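.done:
	;Return 0 from main so libc flushes stdout before the process exits
	;(a raw exit syscall here can discard buffered puts output when piped)
	xor	eax, eax
	add	rsp, 48
	pop	rbx
	pop	rbp
	pop	r12
	ret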
	
section .data
	space db " ", 0
 
Old 11-19-2011, 09:14 PM   #2
SigTerm
Member
 
Registered: Dec 2009
Distribution: Slackware 12.2
Posts: 379

Rep: Reputation: 234
Quote:
Originally Posted by David2010
We both wanted to know if the GCC compiler makes programming in assembly pointless.
I can already tell you that GCC does not make assembly pointless. Assembly is and will remain useful in certain very specific circumstances that cannot be handled by a compiler. The point of assembly is that it provides maximum control over CPU resources and does not hide anything. The point of higher-level languages is that they give you less control, but let you port your code to different CPUs and maybe develop code faster than you would with assembly.
 
Old 11-20-2011, 03:33 AM   #3
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
Hi.

It may be useful and fun for you to look at this and links therein.

Last edited by firstfire; 11-20-2011 at 07:21 AM.
 
Old 11-20-2011, 07:08 AM   #4
resetreset
Senior Member
 
Registered: Mar 2008
Location: Cyberspace
Distribution: Dynebolic, Ubuntu 10.10
Posts: 1,340

Rep: Reputation: 62
firstfire, that isn't a link.


My KINGDOM for an assembler that would give you the cycle counts for each instruction after it assembles the program!
David, could I please ask you to comment your code line by line, since I am unfamiliar with the Linux environment? There's a lot I don't understand up there. I have been an asm freak for a long time, but it's been a while since I've done this sort of thing.

May I ask you how you picked up 64-bit asm? Are there any tuts on the web? (NOT software manuals from Intel etc.; I can't learn from those.) I'd LOVE to learn from you, if you're willing to hold my hand a little.

Thanks!
 
Old 11-20-2011, 07:22 AM   #5
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
Quote:
Originally Posted by resetreset
firstfire, that isn't a link.
Oops.. sorry Fixed. Hope now it works.
 
Old 11-20-2011, 07:23 AM   #6
resetreset
Senior Member
 
Registered: Mar 2008
Location: Cyberspace
Distribution: Dynebolic, Ubuntu 10.10
Posts: 1,340

Rep: Reputation: 62
Oh and by the way, you can replace "mov esi,0" with "xor esi, esi" in .start - (I bloody well HOPE that's an instruction) - that'll make it a wee bit faster, I think!
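
For what it's worth, "xor esi, esi" is a real instruction and is the usual way to zero a register. The sketch below is only an illustration of the size difference, not a benchmark: the file name zero.asm and the label names are made up for the example, and the build commands assume 64-bit Linux with nasm and ld. Note that, unlike mov, xor also changes the flags.

```nasm
; zero.asm - compare the encoded size of two ways to zero esi
; Assumed build: nasm -felf64 -o zero.o zero.asm && ld -o zero zero.o
	global	_start

	section	.text
_start:
mov_start:
	mov	esi, 0		; BE 00 00 00 00  (5 bytes)
mov_end:
xor_start:
	xor	esi, esi	; 31 F6           (2 bytes), also clears rsi bits 63..32
xor_end:
	; exit status = size difference between the two encodings
	mov	edi, (mov_end - mov_start) - (xor_end - xor_start)
	mov	eax, 60		; SYS_exit on x86-64
	syscall
```

Running ./zero and then "echo $?" should print 3, the number of bytes saved. Whether that translates into a measurable speed gain depends on the CPU.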
 
Old 11-20-2011, 04:19 PM   #7
David2010
Member
 
Registered: May 2009
Posts: 255

Original Poster
Rep: Reputation: 23
Quote:
Originally Posted by resetreset

David, could I please ask you to comment your code line by line, since I am unfamiliar with the Linux environment? There's a lot I don't understand up there. I have been an asm freak for a long time, but it's been a while since I've done this sort of thing.

May I ask you how you picked up 64-bit asm? Are there any tuts on the web? (NOT software manuals from Intel etc.; I can't learn from those.) I'd LOVE to learn from you, if you're willing to hold my hand a little.

Thanks!
A "different" friend of mine helped me with both 32 and 64-bit assembly. He knows a LOT more about assembly than I do and claims he even helped out in the creation of GCC. Cool huh?

Anyhow this is the updated code. I suck at documentation so hopefully you should get the gist of it:

Note that this code is considered "tainted" or "dangerous" because it doesn't follow the standard GCC guidelines for assembly. Or at least according to the friend I was talking about earlier.

I would be more than happy to help you out in an assembly related problem but I am no expert by any means so.... yeah. :-)

Code:
extern strcat	;Used to add to the char buffer
extern puts		;Used for screen printing

segment .text
	global main ;Needed for the linker. GCC looks for main

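main:
	;Save argv and argc.
	;Caution: rbp, rbx and r12 are callee-saved in the System V ABI and are
	;not preserved here, and no stack space is reserved for the buffer.
	mov	rbp, rsi
	mov	ebx, edi
	
	;If argc == 1, no arguments
	cmp	edi, 1		
	je .done	;Jump if equal

	;Else continue
.start:
	lea	rdi, [rsp+16]	;Leftover: rdi is reloaded before every call
	xor r12d, r12d		;r12 = 0 (argument index); also zeroes the upper half of r12
	mov	[rsp+8], r12d	;Write a terminating zero so the buffer starts as an empty string
	
	jmp	.print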
	
.loop:
	;strcat(rdi, rsi)
	;Add the next argument to the buffer
	mov	rsi, [rbp+0+r12*8]
	lea	rdi, [rsp+8]	;Lea is used here to get the address of buffer
	call strcat
	
	;strcat(rdi, rsi)
	;Add a space to the buffer to separate the words
	lea	rdi, [rsp+8]	;Lea is used here to get the address of buffer
	mov	rsi, space	
	call strcat

.print:
	inc	r12			;Increase the argument index by one
	
	;If argc (ebx) > index (r12d) goto .loop
	;Change r12d to esi
	cmp	ebx, r12d	
	jg	.loop		
	
	;Print out result
	;puts(rdi)
	lea	rdi, [rsp+8]	;Lea is used here to get the address of buffer
	call puts
	
.done:
	;Zeroing registers before exit is not needed: the kernel discards
	;all register state when the process ends. The instructions are kept
	;as posted; int 80h does not use the user stack, so even the zeroed
	;rsp does not stop the exit call from working.
	xor ebx, ebx
	xor rdi, rdi
	xor r12d, r12d
	xor rsp, rsp
	xor r12, r12
	xor rsi, rsi
	xor rbp, rbp
	xor rax, rax
	xor rbx, rbx
	xor rcx, rcx
	xor rdx, rdx

	;exit(0) through the 32-bit int 80h interface (eax = 1, ebx = status).
	;This skips libc's cleanup, so output buffered by puts can be lost
	;when stdout is not a terminal.
	mov	eax,1		
	mov	ebx,0		
	int	80h	
	ret
	
section .data
	space db " ", 0
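
For comparison, here is a hedged sketch of how a libc-based version might look if it sticks to the System V AMD64 calling convention: callee-saved registers are pushed and popped, the stack stays 16-byte aligned at every call, and main returns so that libc flushes stdout. It swaps strcat/puts for printf/putchar to avoid a fixed-size buffer; that swap, the file name echo_libc.asm, and the non-PIE build command are assumptions of this example, not part of the code above.

```nasm
; echo_libc.asm - sketch: libc-based echo that follows the System V AMD64 ABI
; Assumed build: nasm -felf64 -o echo_libc.o echo_libc.asm && gcc -no-pie -o echo_libc echo_libc.o
extern printf
extern putchar

section .text
	global main

main:
	push	rbx			; callee-saved registers must be preserved
	push	r12
	push	r13			; 3 pushes + return address = 32 bytes,
					; so rsp is 16-byte aligned at each call
	mov	ebx, edi		; ebx = argc
	mov	r13, rsi		; r13 = argv
	mov	r12d, 1			; r12 = index of the first argument
.next:
	cmp	r12d, ebx
	jge	.done			; no arguments left
	mov	edi, fmt		; printf("%s", argv[r12])
	mov	rsi, [r13 + r12*8]
	xor	eax, eax		; varargs call: no vector registers used
	call	printf
	inc	r12d
	cmp	r12d, ebx
	jge	.done			; that was the last argument
	mov	edi, ' '		; separator between arguments
	call	putchar
	jmp	.next
.done:
	mov	edi, 10			; trailing newline
	call	putchar
	xor	eax, eax		; return 0; libc flushes stdout on the way out
	pop	r13
	pop	r12
	pop	rbx
	ret

section .data
	fmt db "%s", 0
```

This trades a little speed (one library call per argument) for not overflowing a stack buffer on long argument lists, so it is a correctness baseline rather than a contender for the bet.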

Last edited by David2010; 11-20-2011 at 04:20 PM.
 
Old 11-20-2011, 05:21 PM   #8
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
How about ditching the C library altogether, and using only the write() syscall? To write to standard output using nothing but kernel-provided syscalls, set eax=4, ebx=1, ecx=pointer, edx=length, and use int 0x80. The call returns its result in eax; on 32-bit Linux the other registers come back unchanged, although the code below simply reloads them before every call.

Here is a 32-bit echo that compiles to a 404-byte executable file. It is nowhere near optimal, as it can be trimmed further quite a bit. I left only minimal comments in it. It uses no stack or heap at all (on top of what the kernel sets up and uses itself), only two bytes of static data and 88 bytes of code.
Code:
	global	_start

	section .data
space:
	db 0x20				; Space

newline:
	db 0x0A				; Newline


	section	.text

; [esp] = number of arguments
; [esp+4] = argv[0]
; [esp+8] = argv[1]

; eax = 4, int 0x80: write(ebx, ecx, edx)

_start:
	mov	ebp, esp
	xor	edi, edi
	inc	edi

	; No arguments?
	cmp	edi, [ebp]
	jae	last

arg:
	; ecx = esi = arg
	inc	edi
	mov	ecx, [ebp + edi * 4]
	mov	esi, ecx

	; Find EOS
	dec	esi
len:
	inc	esi
	cmp	[esi], ah
	jne	len

	; edx = length
	mov	edx, esi
	sub	edx, ecx

	; write(1, arg, length)
	xor	eax, eax
	xor	ebx, ebx
	mov	al, 4
	inc	bl
	int	0x80

	; last argument?
	cmp	edi, [ebp]
	jae	last

	; write(1, space, 1)
	xor	eax, eax
	xor	ebx, ebx
	mov	al, 4
	inc	bl
	mov	ecx, space
	mov	edx, ebx
	int	0x80
	jmp	arg

last:
	; write one newline
	xor	eax, eax
	xor	ebx, ebx
	mov	al, 4
	inc	bl
	mov	ecx, newline
	mov	edx, ebx
	int	0x80

	; exit(0)
	xor	eax, eax
	xor	ebx, ebx
	inc	eax
	int	0x80
If you save the above as echo.asm, you can compile and link it using
Code:
nasm -felf32 -o echo.o echo.asm
ld -s -o echo echo.o
The end result will contain 88 bytes of code and 2 bytes of data (see for yourself using objdump -x echo), the rest of the 404-byte file is ELF stuff. Note that you can use objdump -d echo to show you a disassembly of the code in AT&T syntax.
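
Since the original question targets 64-bit code, the same idea could be sketched for x86-64 roughly as follows. This is an assumption-laden illustration rather than a tuned entry: the file name echo64.asm is made up, it issues one write per argument and separator, and it ignores short writes and errors. On x86-64 the syscall instruction takes the call number in rax (1 = write, 60 = exit) and arguments in rdi, rsi, rdx, and it overwrites rcx and r11.

```nasm
; echo64.asm - sketch of a libc-free echo for x86-64 Linux
; Assumed build: nasm -felf64 -o echo64.o echo64.asm && ld -s -o echo64 echo64.o
	global	_start

	section	.rodata
space:	db	0x20
newline: db	0x0A

	section	.text
; On entry: [rsp] = argc, [rsp+8] = argv[0], [rsp+16] = argv[1], ...
_start:
	mov	r12, [rsp]		; r12 = argc
	lea	r13, [rsp + 16]		; r13 -> argv[1]
	dec	r12			; r12 = number of arguments left
	jle	.newline		; nothing to print
.arg:
	mov	rsi, [r13]		; rsi = current argument
	xor	edx, edx		; rdx = length counted so far
.len:
	cmp	byte [rsi + rdx], 0
	je	.write
	inc	rdx
	jmp	.len
.write:
	mov	eax, 1			; write(1, argument, length)
	mov	edi, 1
	syscall
	add	r13, 8			; advance to the next argv entry
	dec	r12
	jz	.newline		; that was the last argument
	mov	eax, 1			; write(1, space, 1)
	mov	edi, 1
	lea	rsi, [rel space]
	mov	edx, 1
	syscall
	jmp	.arg
.newline:
	mov	eax, 1			; write(1, newline, 1)
	mov	edi, 1
	lea	rsi, [rel newline]
	mov	edx, 1
	syscall
	mov	eax, 60			; exit(0)
	xor	edi, edi
	syscall
```

r12 and r13 are used for the loop state because the kernel leaves them alone across syscall, so nothing needs to be saved on the stack.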

Last edited by Nominal Animal; 11-20-2011 at 05:24 PM.
 
Old 11-20-2011, 06:58 PM   #9
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,103

Rep: Reputation: 4117
Quote:
Originally Posted by firstfire
Hi.

It may be useful and fun for you to look at this and links therein.
lol ... that is one twisted puppy.

Made my day.
 
Old 11-20-2011, 07:19 PM   #10
David2010
Member
 
Registered: May 2009
Posts: 255

Original Poster
Rep: Reputation: 23
Quote:
Originally Posted by Nominal Animal
How about ditching the C library altogether, and using only the write() syscall? To write to standard output using nothing but kernel-provided syscalls, set eax=4, ebx=1, ecx=pointer, edx=length, and use int 0x80. The call returns its result in eax; on 32-bit Linux the other registers come back unchanged, although the code below simply reloads them before every call.

Here is a 32-bit echo that compiles to a 404-byte executable file. It is nowhere near optimal, as it can be trimmed further quite a bit. I left only minimal comments in it. It uses no stack or heap at all (on top of what the kernel sets up and uses itself), only two bytes of static data and 88 bytes of code.

--SNIP--

The end result will contain 88 bytes of code and 2 bytes of data (see for yourself using objdump -x echo), the rest of the 404-byte file is ELF stuff. Note that you can use objdump -d echo to show you a disassembly of the code in AT&T syntax.
Ouch. No offense but that code was really really ugly. Sure it may be smaller but I don't want to sacrifice that much readability for... THAT. O.o
 
Old 11-21-2011, 12:23 AM   #11
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Quote:
Originally Posted by David2010
Ouch. No offense but that code was really really ugly. Sure it may be smaller but I don't want to sacrifice that much readability for... THAT. O.o
Ugly? Hey, I take offense at that. I think.

As a punishment, here is more code:
Code:
	global _start
	section .text
	default rel

_start:
	mov	esi, [esp + 8]
	xor	eax, eax
	or	esi, esi
	jnz	params

	xor	ebx, ebx
	push	dword 0x000A
	inc	bl
	xor	eax, eax
	mov	ecx, esp
	mov	al, 4
	mov	edx, ebx
	int	0x80
	jmp	exit

params:
	mov	ebx, [esp]
	mov	ecx, esi
	mov	edx, [esp + 4*ebx]

	cld
	mov	edi, esi
	jmp	find

skip:
	lodsb
	or	al, al
	jz	skip
	mov	[edi], byte 0x20
	inc	edi
next:
	stosb
find:
	lodsb
	or	al, al
	jnz	next
	cmp	esi, edx
	jbe	skip
	mov	al, 0x0a
	stosb

	xor	ebx, ebx
	mov	edx, edi
	mov	al, 4
	sub	edx, ecx
	inc	bl
	int	0x80

exit:
	xor	eax, eax
	xor	ebx, ebx
	mov	al, 1
	int	0x80
Now this is getting nearer the theoretical minimum in terms of code size, memory use (four bytes of stack if no parameters, none otherwise), and number of CPU cycles used. It has no data segment, just 83 bytes of code; the x86 ELF32 executable comes to 316 bytes. It makes only one syscall for output, and one to exit.

Now this one is as ugly as the posterior opening of a bird of prey, and as hacky as a rabid barbarian on mushrooms. But it should work, even for very large command line argument lists. (Up to kernel limits for me.)

What I am really interested in is how you are going to measure the time taken. Wall clock? CPU cycles? Normal use, with just a few parameters, or with a huge parameter list?

I would personally first agree on a suitable test set, then run it a few dozen times to get rid of outliers (runs where other stuff running on the machine slowed the run down). I'd pick either the minimum time, or the most typical time.

Good luck,
 
Old 11-21-2011, 12:50 AM   #12
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,851
Blog Entries: 1

Rep: Reputation: 1868
Quote:
Originally Posted by resetreset
My KINGDOM for an assembler that would give you the cycle counts for each instruction after it assembles the program!
That would require a clairvoyant -- how else could the assembler guess what processor will be the program running on? These cycle-counts are different for every CPU-(sub)version.
 
Old 11-21-2011, 07:10 AM   #13
resetreset
Senior Member
 
Registered: Mar 2008
Location: Cyberspace
Distribution: Dynebolic, Ubuntu 10.10
Posts: 1,340

Rep: Reputation: 62
What kind of a register is r12d? Is it any relation to r12?

Sorry to sound like a FOOL, but I thought it's better to know, and get my doubts cleared up than simply REMAIN a fool. (I've never even *installed* a 64-bit Linux, let alone programmed in 64-bit asm!).

I have many more questions about the logic of the program, but I'll save those for later...



David, can I send you an email through this site? I already sent you a friend request, but I'd like to meet up with you on video chat, if that's possible? Which country are you in, btw?


Nominal Animal, since you're in Finland - are you a member of the demoscene?
 
Old 11-21-2011, 05:35 PM   #14
David2010
Member
 
Registered: May 2009
Posts: 255

Original Poster
Rep: Reputation: 23
Quote:
Originally Posted by resetreset
What kind of a register is r12d? Is it any relation to r12?

I have many more questions about the logic of the program, but I'll save those for later...

David, can I send you an email through this site? I already sent you a friend request, but I'd like to meet up with you on video chat, if that's possible? Which country are you in, btw?
I live in the USA. Sorry I don't have a webcam so no video chatting for me.

I visit this site all the time so a PM would be just fine. I added your friend request.

Once again I suck at documentation so I am sorry for the confusing logic. I actually presumed it was quite readable but strings in nasm can be a little confusing.

I actually thought the Intel and AMD (not AS good) manuals were a godsend but then again I have a LOT of programming experience.

The only thing I regret about switching to linux is a lack of "practical" examples for GUI work through XORG comparable to http://win32assembly.online.fr/tutorials.html

I would love to create games in assembly using nasm. It's very rare to find decent assembly programmers. Even the assembly GCC outputs disturbs me but that is a completely different topic.

I always thought I was some sort of mental masochist to enjoy assembly programming and learning as many languages as I can. Maybe I'm right :-)

This should hopefully explain your confusion about r12d. There are several more registers that I don't use to remain as compatible with older processors as possible.

Code:
Where an instruction requires a register value, it is already implicit in the encoding of the rest of the instruction what type of register is intended: an 8-bit general-purpose register, a segment register, a debug register, an MMX register, or whatever. Therefore there is no problem with registers of different types sharing an encoding value.

Please note that for the register classes listed below, the register extensions (REX) classes require the use of the REX prefix, which is only available when in long mode on the x86-64 processor. This pretty much goes for any register that has a number higher than 7.

The encodings for the various classes of register are:

    8-bit general registers: AL is 0, CL is 1, DL is 2, BL is 3, AH is 4, CH is 5, DH is 6 and BH is 7. Please note that AH, BH, CH and DH are not addressable when using the REX prefix in long mode.
    8-bit general register extensions (REX): SPL is 4, BPL is 5, SIL is 6, DIL is 7, R8B is 8, R9B is 9, R10B is 10, R11B is 11, R12B is 12, R13B is 13, R14B is 14 and R15B is 15.
    16-bit general registers: AX is 0, CX is 1, DX is 2, BX is 3, SP is 4, BP is 5, SI is 6, and DI is 7.
    16-bit general register extensions (REX): R8W is 8, R9W is 9, R10W is 10, R11W is 11, R12W is 12, R13W is 13, R14W is 14 and R15W is 15.
    32-bit general registers: EAX is 0, ECX is 1, EDX is 2, EBX is 3, ESP is 4, EBP is 5, ESI is 6, and EDI is 7.
    32-bit general register extensions (REX): R8D is 8, R9D is 9, R10D is 10, R11D is 11, R12D is 12, R13D is 13, R14D is 14 and R15D is 15.
    64-bit general register extensions (REX): RAX is 0, RCX is 1, RDX is 2, RBX is 3, RSP is 4, RBP is 5, RSI is 6, RDI is 7, R8 is 8, R9 is 9, R10 is 10, R11 is 11, R12 is 12, R13 is 13, R14 is 14 and R15 is 15.
    Segment registers: ES is 0, CS is 1, SS is 2, DS is 3, FS is 4, and GS is 5.
    Floating-point registers: ST0 is 0, ST1 is 1, ST2 is 2, ST3 is 3, ST4 is 4, ST5 is 5, ST6 is 6, and ST7 is 7.
    64-bit MMX registers: MM0 is 0, MM1 is 1, MM2 is 2, MM3 is 3, MM4 is 4, MM5 is 5, MM6 is 6, and MM7 is 7.
    128-bit XMM (SSE) registers: XMM0 is 0, XMM1 is 1, XMM2 is 2, XMM3 is 3, XMM4 is 4, XMM5 is 5, XMM6 is 6 and XMM7 is 7.
    128-bit XMM (SSE) register extensions (REX): XMM8 is 8, XMM9 is 9, XMM10 is 10, XMM11 is 11, XMM12 is 12, XMM13 is 13, XMM14 is 14 and XMM15 is 15.
    Control registers: CR0 is 0, CR2 is 2, CR3 is 3, and CR4 is 4.
    Control register extensions: CR8 is 8.
    Debug registers: DR0 is 0, DR1 is 1, DR2 is 2, DR3 is 3, DR6 is 6, and DR7 is 7.
    Test registers: TR3 is 3, TR4 is 4, TR5 is 5, TR6 is 6, and TR7 is 7. 

(Note that wherever a register name contains a number, that number is also the register value for that register.)

Last edited by David2010; 11-21-2011 at 06:37 PM.
 
Old 11-23-2011, 06:51 AM   #15
resetreset
Senior Member
 
Registered: Mar 2008
Location: Cyberspace
Distribution: Dynebolic, Ubuntu 10.10
Posts: 1,340

Rep: Reputation: 62
Quote:
Originally Posted by David2010
I live in the USA. Sorry I don't have a webcam so no video chatting for me.

I visit this site all the time so a PM would be just fine. I added your friend request.
How about Google Talk then? It'd require me to go into Windoze, but I think that's the only VOIP program I have. Do you have a Google email acct?


I'm sending you a message through this site, check your messages.


Quote:
Originally Posted by David2010


I would love to create games in assembly using nasm. Its very rare to find decent assembly programmers.


You are not Mad at all! I would love to do the exact same thing, but a full 3D game in Asm is just TOO hard a proposition!
You should look into the demoscene - Google it and read the Wikipedia link, that'll tell you everything. Go to www.pouet.net for a site where demosceners hang out.



Quote:
Originally Posted by David2010


This should hopefully explain your confusion about r12d.


Actually it didn't - I repeat my question, is it a 32-bit register? What does "extension" mean?
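
For what it's worth: yes, r12d is a 32-bit register. It is the low 32 bits of the 64-bit register r12, in the same way that eax is the low half of rax; r12w and r12b are the low 16 and 8 bits. "Extension" in the quoted text refers to registers that exist only in 64-bit (long) mode and need the REX prefix byte to be encoded. The sketch below is just an illustration; the file name regs.asm and the constants are made up, and the build commands assume 64-bit Linux with nasm and ld.

```nasm
; regs.asm - r12, r12d, r12w and r12b are views of the same register
; Assumed build: nasm -felf64 -o regs.o regs.asm && ld -o regs regs.o
	global	_start

	section	.text
_start:
	mov	r12, 0x1122334455667788	; full 64-bit register
	mov	r12d, 0xAABBCCDD	; 32-bit write zeroes bits 63..32
					; r12 = 0x00000000AABBCCDD
	mov	r12w, 0x1234		; 16-bit write keeps the other bits
					; r12 = 0x00000000AABB1234
	mov	r12b, 0x07		; 8-bit write keeps the other bits
					; r12 = 0x00000000AABB1207
	movzx	edi, r12b		; exit status = low byte of r12
	mov	eax, 60			; SYS_exit on x86-64
	syscall
```

Running ./regs and then "echo $?" should print 7. The zeroing behaviour of 32-bit writes is why "xor r12d, r12d" is enough to clear all of r12.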
 
  

