Hashing for ID generation (hash32_enum algorithm and ASM)

Posted 12-24-2012 at 10:45 AM by rainbowsally

You're not going to get fool-proof uniqueness from a 32 bit hash, but this one gets pretty close.

[cont'd from previous blog entry where the C/C++ version can be seen (and tested). http://www.linuxquestions.org/questi...-lists-35223/]

If you're curious about what the hash32_enum thing in the previous blog entry would look like in assembler, this is the output using the --save-temps gcc flag.

Code:

; extern "C"
; .text
; .align 4
; .globl hash32_enum
; .type hash32_enum, @function
hash32_enum:
  pushl %ebp                    ; save stack frame
  xorl  %eax, %eax              ; eax.rval = 0
  movl  %esp, %ebp              ; new stack fram
  movl  8(%ebp), %ecx           ; ecx.str = str
  testl %ecx, %ecx              ; if ecx.str != 0 continue to
  jne .L9                       ; L9 (init loop)
                                ; else fall through
; .align 4
.L3:                            ; return
  popl  %ebp                    ; restore old stack frame
  ret                           ; and return

; .align 4
.L9:                            ; init the loop
  movzbl  (%ecx), %edx          ; edx.int = *ecx.str
  movl  $-1, %eax               ; eax.rval = -1
  testb %dl, %dl                ; if edx.char = 0
  je  .L3                       ; then goto return
  movl  $-2128831035, %eax      ; (else) rval = magic #
                                ; 
; .align 4
.L4:                            ; -> top of loop
  movsbl  %dl, %edx             ; edx.int = edx.char
  addl  $1, %ecx                ; ecx.str++;
  xorl  %eax, %edx              ; edx.int = edx.int ^ eax.rval
  imull $16777619, %edx, %eax   ; eax.rval = eax.rval * edx.int
  movzbl  (%ecx), %edx          ; edx.char = *ecx.str
  testb %dl, %dl                ; if edx.char != 0
  jne .L4                       ; <- then loop to .L4
                                ; else fall through
; no align
  movl  %eax, %edx              ; edx.int = eax.rval
  sarl  $31, %edx               ; if edx.int < 0 then edx.int = -1 else edx.int = 0
  xorl  %edx, %eax              ; eax.rval = eax.rval ^ edx.int
  notl  %eax                    ; eax.rval = ~ eax.rval
  popl  %ebp                    ; restore old stack frame
  ret                           ; return

It's hard to optimize much better than this, generated with the -O2 flag in gcc.

Code:

if(rval < 0)
  return rval
else 
  return ~ rval

And this is the C code optimized in the section below "no align". GCC eliminated a conditional branch, potentially saving several clock cycles and is the furthest thing from intuitive you'd ever hope to see! :-)

And the alignment code?

I stripped it out but one, two, three, up to 5 bytes can be traversed in a single opcode, usually involving reserved registers like esi, edi, or even ezi, whatever that is.

There used to be a web page called 'SafeArith' (couldn't find it last time I looked) that had lots of cool algorithms for eliminating conditional branches, like for clamping numbers to within certain ranges, which is great for audio clipping -- because you don't want your sign bit to flip and you need it to be fast!. But I don't do much assembler programming anymore because GCC is so freaking HOT! :-)

Note: We have played with another algorithm for 32 bit hash IDs that runs even faster than this one, which is probably better for string lookups where you'll have to strcmp() after prefiltering the possibilities anyway, but this one in tests got zero duplicate hash IDs for 6,000 random words even with three bits masked off while the faster one got several duplicate IDs for the same word set.

This is a good one.

The tests are in the latest libLQ-qt-mc2 d/load.

Posted in Uncategorized

Views 797 Comments 0

« Prev Main Next »

Total Comments 0