Hashing for ID generation (hash32_enum algorithm and ASM)
Posted 12-24-2012 at 10:45 AM by rainbowsally
You're not going to get fool-proof uniqueness from a 32 bit hash, but this one gets pretty close.
[cont'd from previous blog entry where the C/C++ version can be seen (and tested). http://www.linuxquestions.org/questi...-lists-35223/]
If you're curious about what the hash32_enum thing in the previous blog entry would look like in assembler, this is the output using the --save-temps gcc flag.
It's hard to optimize much better than this, generated with the -O2 flag in gcc.
And this is the C code optimized in the section below "no align". GCC eliminated a conditional branch, potentially saving several clock cycles and is the furthest thing from intuitive you'd ever hope to see! :-)
And the alignment code?
I stripped it out but one, two, three, up to 5 bytes can be traversed in a single opcode, usually involving reserved registers like esi, edi, or even ezi, whatever that is.
There used to be a web page called 'SafeArith' (couldn't find it last time I looked) that had lots of cool algorithms for eliminating conditional branches, like for clamping numbers to within certain ranges, which is great for audio clipping -- because you don't want your sign bit to flip and you need it to be fast!. But I don't do much assembler programming anymore because GCC is so freaking HOT! :-)
Note: We have played with another algorithm for 32 bit hash IDs that runs even faster than this one, which is probably better for string lookups where you'll have to strcmp() after prefiltering the possibilities anyway, but this one in tests got zero duplicate hash IDs for 6,000 random words even with three bits masked off while the faster one got several duplicate IDs for the same word set.
This is a good one.
The tests are in the latest libLQ-qt-mc2 d/load.
[cont'd from previous blog entry where the C/C++ version can be seen (and tested). http://www.linuxquestions.org/questi...-lists-35223/]
If you're curious about what the hash32_enum thing in the previous blog entry would look like in assembler, this is the output using the --save-temps gcc flag.
Code:
; extern "C" ; .text ; .align 4 ; .globl hash32_enum ; .type hash32_enum, @function hash32_enum: pushl %ebp ; save stack frame xorl %eax, %eax ; eax.rval = 0 movl %esp, %ebp ; new stack fram movl 8(%ebp), %ecx ; ecx.str = str testl %ecx, %ecx ; if ecx.str != 0 continue to jne .L9 ; L9 (init loop) ; else fall through ; .align 4 .L3: ; return popl %ebp ; restore old stack frame ret ; and return ; .align 4 .L9: ; init the loop movzbl (%ecx), %edx ; edx.int = *ecx.str movl $-1, %eax ; eax.rval = -1 testb %dl, %dl ; if edx.char = 0 je .L3 ; then goto return movl $-2128831035, %eax ; (else) rval = magic # ; ; .align 4 .L4: ; -> top of loop movsbl %dl, %edx ; edx.int = edx.char addl $1, %ecx ; ecx.str++; xorl %eax, %edx ; edx.int = edx.int ^ eax.rval imull $16777619, %edx, %eax ; eax.rval = eax.rval * edx.int movzbl (%ecx), %edx ; edx.char = *ecx.str testb %dl, %dl ; if edx.char != 0 jne .L4 ; <- then loop to .L4 ; else fall through ; no align movl %eax, %edx ; edx.int = eax.rval sarl $31, %edx ; if edx.int < 0 then edx.int = -1 else edx.int = 0 xorl %edx, %eax ; eax.rval = eax.rval ^ edx.int notl %eax ; eax.rval = ~ eax.rval popl %ebp ; restore old stack frame ret ; return
Code:
if(rval < 0) return rval else return ~ rval
And the alignment code?
I stripped it out but one, two, three, up to 5 bytes can be traversed in a single opcode, usually involving reserved registers like esi, edi, or even ezi, whatever that is.
There used to be a web page called 'SafeArith' (couldn't find it last time I looked) that had lots of cool algorithms for eliminating conditional branches, like for clamping numbers to within certain ranges, which is great for audio clipping -- because you don't want your sign bit to flip and you need it to be fast!. But I don't do much assembler programming anymore because GCC is so freaking HOT! :-)
Note: We have played with another algorithm for 32 bit hash IDs that runs even faster than this one, which is probably better for string lookups where you'll have to strcmp() after prefiltering the possibilities anyway, but this one in tests got zero duplicate hash IDs for 6,000 random words even with three bits masked off while the faster one got several duplicate IDs for the same word set.
This is a good one.
The tests are in the latest libLQ-qt-mc2 d/load.
Total Comments 0