[SOLVED] anyone willing to find bug in my nasm program?

rblampain · 11-02-2017, 04:20 AM

Bug should be in the first 78 lines of code since the remaining code (words-swapping) seems to work flawlessly. Only need to copy-and-paste to test.
The program makes 240 instances of swapping words (in byte order) and then unexpectedly finds r15 to be 0 and exit with contents only partly sorted.
I just cannot find what the bug is.

This is a "bublesort" of a sentence in foreign languages, in HTML submitted by a visitor in a Wiki, it is intended to find if the sentence is mostly correct text against the foreign list of words (byte-sorted dictionary). The sort helps to access the dictionary only once rather than once for each submitted word since both files are sorted in the same way.
In the previous "ununcoding" phase, each word is prepended by a byte indicating its length (including this "length byte" used to reduce the dictionary comparisons).

I must say this is assembly learned many decades ago and not much practical experience, I have poor experience with any debugger, I do not know any "C" programming and any other programming language is much too slow in comparing words against a "dictionary". A number of small portions of code are repeated to avoid unnecessary jumps or calls and accented letters (UTF-8) (absent in this test file) should make no difference.

Thank you for your help.

Code:

BITS 64
global main
extern printf
section .text
main:
r10 may be different on your machine, it is address of WordsListOrg +206 (last character of input string) it needs to be set manually here	
;PHASE2
sortWords:                  ; jump here from phase1
    xor r8,r8
    xor r9,r9
	xor r11,r11
	xor r12,r12
    xor r13,r13
    xor r14,r14
	xor rax,rax
	xor rcx,rcx
	xor rdx,rdx
    mov r15,1               ; must be positive to start   
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; words in WordsListOrg - compare word 1 to word 2 - to word n-1 to word n     ;;
;; sort is complete when there is no more swapping in a pass:  r15=0            ;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
sortStart:
    cmp r15,0               ; flag can only be 0 or 1
    je PHASE3
    xor r15,r15             ; reset flag
	mov rsi,WordsListOrg	; pointer to beginning of string each preceded by its length byte
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; find addresses of word0 and word1 ;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
outerLoop:
; load some registers
    mov r13b,[rsi]	; length word0 saved
    push rsi                ; EOF checked when word0 was word1
    push rsi                ;
    pop r11                 ; save rsi which is going to be clobbered
    pop rdi                 ; for address argB (word1)
; calculate and save address of argB
    mov al,[rdi]            ; length of word0 as rdi = rsi at this stage
    add rdi,rax             ; + length argA = address argB
; check EOF
    cmp rdi,r10
	jae sortStart 	        ; will be ja as r10 points to position of last character
    push rdi                ; save address word1
    pop r12                 ; save rdi which is going to be clobbered
    mov r14b,[rdi]	    ; length word1
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; determine how many characters 2 compare according to shortest of the 2 words ;;
;; if a word is 4 characters, length will be 5 due to added length byte         ;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;char2compare:
    push r13                ; assumed to be shorter
    pop rcx                 ; conditionally becomes count
    cmp r13b,r14b			; find which is shortest
    jbe compare             ; keep cl - word pointed by rsi is shorter or equal in length
    push r14                ; word pointed by rdi is shorter
    pop rcx			        ; r14b becomes count
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; compare two consecutive words to find which is alphabetically first ;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; innerLoop					;
compare:			        ; 
	dec cl                  ; 
    jrcxz .doWeSwap
	inc rsi					; at first iteration skip length byte argA and point to next character
	inc rdi					; at first iteration skip length byte argB and point to next character
	mov al,[rsi]			; compare letter word list 1 (argA)
	mov bl,[rdi]			; compare letter word list 2 (argB)
	cmp al,bl               ; ja & jb far more common than je
	ja .swapWords			; argB < argA - must swap
	je compare				; characters are = - need check next character pair - (much rarer)
; calculate address of next argA
    push r12                ; continue if jb - alphabetically in order
    pop rsi                 ; word1 becomes word0
    jmp outerLoop           ; continue with comparison next word
.doWeSwap:                  ; one word is a part of the other
    cmp r13b,r14b           ; if length word0 < length word1
    jbe outerLoop           ; else continue with swap words
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; swap words because word1 alphabetically smaller than word0 ;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
.swapWords:				; swap addresses in bubble sort
    mov r15,1               ; set flag swapping is taking place
    xor rcx,rcx
    mov rdx,words2swap      ; write buffer for words exchange including their length byte
    push r12                ; save word1 to buffer
    pop rdi                 ; saved address of word1
    mov cl,[rdi]            ; length word1 is count
    inc rcx                 ; because loop start with dec
.copyWord1:
    dec rcx                 ; count of characters
    jrcxz .word9
    mov al,[rdi]            ; 1st iteration move length 
    mov [rdx],al            ; moved
    inc rdx                 ; ready for next
    inc rdi
    jmp .copyWord1
.word9:
    push r11                ; saved address of word0
    pop rsi
    mov cl,[rsi]            ; length of word0 is count
    inc rcx                 ; because loop start with dec
; save word0 to buffer
.copyWord0:
    dec rcx                 ; count of characters
    jrcxz .word8
    mov al,[rsi]            ; 1st iteration move length or word
    mov [rdx],al            ; moved
    inc rdx                 ; ready for next
    inc rsi
    jmp .copyWord0
; length of what to overwrite in file is length of arg0 + length of arg1
.word8:
    push r11                ; saved address of word0 is start of target zone
    pop rdi                 ; restore that address in register
    mov rdx,words2swap      ; buffer to read for exchange
    mov rcx,r13             ; length word0 plus
    add rcx,r14             ; length word1 = copy count
    inc rcx                 ; because loop start with dec
; buffer of swapped words overwrite zone in file
.swapInFile:
    dec rcx                 ; count of characters
    jrcxz .word7
    mov al,[rdx]            ; reading byte
    mov [rdi],al            ; writing byte
    inc rdi
    inc rdx
    jmp .swapInFile              ; go for next
; correct address of next word0
.word7:                     ; recalculate changed parameters: len w0 & w1, address w1
;call dsplswap	this showed swapping has occured 240 times
    push r11                ; address word0 is unchanged
    pop rsi                 ; but its length may have changed
    mov r13b,[rsi]          ; saved length word0
    add rsi,r13             ; address word1 becomes next word0
    jmp outerLoop        ; & continue
;=========================================================================================
PHASE3:
exit:
	mov rax,60            		; sys_exit
	mov rdi,0             		; 0
	syscall
;=========================================================================================
section .data
words2swap:	        times 64 db 0           ; buffer for 2 words to be swapped maxlen 32 ea
;WordsListOrg:		times 384	db	0       ; list of words preceded by their length byte
WordsListOrg:   db 3,106,101,3,110,101,5,118,111,105,115,4,112,97,115,9,112,111,117,114,113,117,111,105,5,110,111,117,115,7,100,101,118
                db 111,110,115,11,115,105,109,112,108,101,109,101,110,116,9,114,101,103,97,114,100,101,114,3,117,110,5,112,97,121,115
                db 8,100,101,118,101,110,105,114,11,99,111,109,109,117,110,105,115,116,101,2,97,6,99,97,117,115,101,3,100,101,3,108,97
                db 10,115,116,117,112,105,100,105,116,101,3,100,101,4,115,111,110,7,112,101,117,112,108,101,2,99,4,101,115,116,9,98,101
                db 97,117,99,111,117,112,5,116,114,111,112,10,105,109,112,111,114,116,97,110,116,5,112,111,117,114,8,108,97,105,115,115
                db 101,114,4,108,101,115,8,99,104,105,108,101,97,110,8,118,111,116,101,117,114,115,8,100,101,99,105,100,101,114,5,112
                db 111,117,114,4,101,117,120,6,109,101,109,101,115,0,0,0,0,0,0,0,0,0
;WordsListOrg is 207 positions in buffer of 216 positions

ntubski · 11-03-2017, 04:28 AM

Quote:

Originally Posted by rblampain

Bug should be in the first 78 lines of code since the remaining code (words-swapping) seems to work flawlessly.

How about you start by posting a 78 line version of the code showing the bug then?

Quote:

any other programming language is much too slow in comparing words against a "dictionary".

I just can't bring myself to believe that. Maybe the real problem is that you are using bubblesort.

rblampain · 11-03-2017, 11:22 PM

Thank you for the answer.

I should have said: "any other programming language I KNOW (interpreted) is much too slow.." (this is voluntary work). My apologies for the lack of clarity.
The program does not produce any output as such, there is a 207 character string in the data section which is a quote by Henry Kissinger translated in French (missing accents though), the program sorts these words "alphabetically" (byte order) within the string, so if the initial string is:

Quote:

5this3is4the7string

the same string at the completion of the sort should be

Quote:

3is7string4the5this

The previous phase of the program converts all words to lower-case and an average sentence length will be around 10 to 20 words, not the 35 words in this test, hence the lack of necessity to investigate (learn) a more complicated sorting option, I know bubble sort is highly innefficient compared to those. Bubble sort has the advantage that 2 words swapped occupy the same space even with their prepended length expressed in 1 byte.
I have continued in vain to isolate the bug and am now getting a worse result, 200 iterations instead of 240 initially (same testing string), I am beginning to wonder if the bug could be in the way I present the different sections of code to the assembler. How I check the result is by displaying the text only of the string before and after the sort with a small BASIC program as I could not find how to do that with the debugger, One extra difficulty you may have if testing is the fact that the "length byte" contains a non-printable value (as a character) for "printf" (1 to 31 in theory but 1 to around 12 in most cases),
What I have found is that the program terminates having a lower-case character value in a register that should only contain a word-length value and the r15 flag is set to 0.

ntubski · 11-04-2017, 10:12 AM

Quote:

Originally Posted by rblampain

I have continued in vain to isolate the bug and am now getting a worse result, 200 iterations instead of 240 initially (same testing string), I am beginning to wonder if the bug could be in the way I present the different sections of code to the assembler. How I check the result is by displaying the text only of the string before and after the sort with a small BASIC program as I could not find how to do that with the debugger

Assuming gdb

Code:

p/c $rax ; show the contents of the rax register as a character
p/d $rax ; show the contents of the rax register as a decimal number
p/x $rax ; show the contents of the rax register as a hex number
x/bc $rax ; show byte at memory pointed to by rax register as a character
x/bd $rax ; show byte at memory pointed to by rax register as a decimal number
x/bx $rax ; show byte at memory pointed to by rax register as a hex number

I suggest testing with shorter word lists first. Start from something which works; a list of a single word (that does work, right?). Then two words. Check if using different permuations affects things.

I have a couple of questions about your code:

Quote:

Code:

    push rsi                ; EOF checked when word0 was word1
    push rsi                ;
    pop r11                 ; save rsi which is going to be clobbered
    pop rdi                 ; for address argB (word1)

Is there some reason not to use mov reg, reg?

Code:

mov r11, rsi
mov rdi, rsi

Quote:

Code:

WordsListOrg:   db 3,106,101,3,110,101,5,118,111,105,115,4,112,97,115,9,112,111,117,114,113,117,111,105,5,110,111,117,115,7,100,101,118

What assembler are you using, and does it not support an ascii notation:

Code:

WordsListOrg:   db 3,'j','e',3,'n','e',5,[etc...]

Quote:

Code:

main:
;; r10 may be different on your machine, it is address of WordsListOrg +206 (last character of input string) it needs to be set manually here

How about doing

Code:

main:
mov r10, WordListEnd
[...]
WordsListOrg:   db 3,'j','e',3,'n','e',5,
                [...]
WordListEnd:    db 0,0,0,0,0,0,0,0,0

dugan · 11-04-2017, 10:17 AM

Quote:

Originally Posted by rblampain

I do not know any "C" programming and any other programming language is much too slow in comparing words against a "dictionary".

Quote:

Originally Posted by rblampain

I should have said: "any other programming language I KNOW (interpreted) is much too slow.." (this is voluntary work)...

Really?

What platform is this going to run on, exactly?

rblampain · 11-05-2017, 12:42 AM

Thank you for the tips, very useful.

Quote:

Is there some reason not to use mov reg, reg?

I was under the impression a push and pop was a fraction faster.

Quote:

What assembler are you using, and does it not support an ascii notation:

nasm 2.10.01-1+deb7u1
Although the test is on pure ASCII, the program is intended to be used on UTF-8, lots of it in 2 bytes once outside of the Latin alphabets.

Quote:

What platform is this going to run on, exactly?

Not sure yet, going to be hosted, final decision depends on what is available when ready but most likely to be "entry level" with easy option to grow, definitely Linux, if possible same distro than home machine. There seems to be a wide variety of processors used by hosting companies. Program may have to be recompiled/tested on server then.

ntubski · 11-05-2017, 07:21 AM

Quote:

Originally Posted by rblampain

I was under the impression a push and pop was a fraction faster.

I'm under the opposite impression, but I doubt it makes any significant difference.

https://en.wikiquote.org/wiki/Donald_Knuth

Quote:

The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.

dugan · 11-05-2017, 08:00 AM

Quote:

Originally Posted by rblampain

There seems to be a wide variety of processors used by hosting companies.

Use an interpreted language. The speed difference will be literally zero.

Either that, or the interpreted language will be faster due to its more efficient dictionary implementation (likely).

The decades-old "interpreted languages are too slow!" dogma is still true if you're targeting an arduino.

rblampain · 11-06-2017, 11:36 PM

It seems my old conceptions were completely wrong, thank you for picking that up and demonstrating its not a personal view.