In the world of RISC or Intel, on any OS, is there a method (or methods) to compare two values of large byte length for equality, such that a match converges in the least amount of time?
As an example, say I create a large flat-file database that has two fields per row: string|md5-hash
This file contains the MD5 hashes for all unique strings ranging from 1 to 64 bytes long, where each byte can be any of the 256 possible byte values.
Now I create a search agent that takes an MD5 input, searches the database for a match (or matches), and returns the associated string(s).
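For reference, a minimal sketch of that baseline agent in Python (the file name `hashes.db`, the `search_by_hash` name, and the split-on-last-`|` detail are my assumptions, based on the string|md5-hash row format above):

```python
# Baseline: linear scan of the flat file, checking the hash field of
# every row against the query. File name and exact row layout are
# assumptions based on the string|md5-hash format described above.
def search_by_hash(db_path: str, md5_hex: str) -> list[str]:
    target = md5_hex.strip().upper()
    hits = []
    with open(db_path, "r") as db:
        for line in db:
            # split on the LAST '|' since the hash field contains none,
            # in case the string field itself ever contains a '|'
            string, _, row_hash = line.rstrip("\n").rpartition("|")
            if row_hash.upper() == target:
                hits.append(string)
    return hits

# e.g. search_by_hash("hashes.db", "7B8B965AD4BCA0E41AB51DE7B31363A1")
```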
So, would it be faster to search for subset indicators versus doing a grep (or similar) for the whole MD5 input?
Say my very large database contains a row whose hash field is 7B8B965AD4BCA0E41AB51DE7B31363A1. Is there a better way to build a "converging search" that would converge faster than just searching for the whole input string?
What about adding fields to the database while the hash is being created, like this:

string|md5-hash|4byte-value|4byte-value|4byte-value|4byte-value|4byte-value|4byte-value|4byte-value|4byte-value

where each 4byte-value is a weird "hex-to-decimal" conversion of each sequential group of 4 hex chars?
So, break up 7B8B965AD4BCA0E41AB51DE7B31363A1 into:
7B8B 965A D4BC A0E4 1AB5 1DE7 B313 63A1
then sum each group by converting each hex char to its decimal value and adding them up (see the sketch after these values):
7B8B = 7+11+8+11 = 37
965A = 9+6+5+10 = 30
D4BC = 13+4+11+12 = 40
A0E4 = 10+0+14+4 = 28
1AB5 = 1+10+11+5 = 27
1DE7 = 1+13+14+7 = 35
B313 = 11+3+1+3 = 18
63A1 = 6+3+10+1 = 20
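A minimal sketch of that calculation (the `segment_sums` helper name is mine; it reproduces the eight values above):

```python
# Split a 32-char MD5 hex string into eight 4-char groups and sum
# the decimal values of the hex digits within each group.
def segment_sums(md5_hex: str) -> list[int]:
    groups = [md5_hex[i:i + 4] for i in range(0, 32, 4)]
    return [sum(int(ch, 16) for ch in group) for group in groups]

print(segment_sums("7B8B965AD4BCA0E41AB51DE7B31363A1"))
# [37, 30, 40, 28, 27, 35, 18, 20]
```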
So now each line in my db looks something like this:

string|7B8B965AD4BCA0E41AB51DE7B31363A1|37|30|40|28|27|35|18|20
So now when I start my search, I break the input down into these eight segments and calc the segment values, then try to match the 1st input segment value to the 1st value field of a line in the db; if a match exists, I continue and try to match the 2nd segment, and so on. If a field doesn't match, I move on to the next line, starting over from segment one, and if all eight match, I have a hit that can be added to the "hit pool".
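A sketch of that matching loop, assuming the augmented row layout shown above and the `segment_sums` helper from the previous sketch; `all()` short-circuits on the first mismatching segment, which mirrors moving on to the next line:

```python
# Scan the augmented rows, comparing the eight cheap segment values
# before ever touching the 32-char hash. Fields are taken from the
# right so a '|' inside the string field cannot shift them.
def converging_search(db_path: str, md5_hex: str) -> list[tuple[str, str]]:
    want = segment_sums(md5_hex.upper())   # eight target segment values
    hit_pool = []
    with open(db_path, "r") as db:
        for line in db:
            fields = line.rstrip("\n").split("|")
            row_hash, segs = fields[-9], fields[-8:]
            # bail out of a row on the first segment that doesn't match
            if all(int(s) == w for s, w in zip(segs, want)):
                string = "|".join(fields[:-9])
                hit_pool.append((string, row_hash))
    return hit_pool
```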
As you can see, the byte sizes in question have been reduced (matching eight values of at most 2 bytes each versus 32 bytes of MD5 hash), but I am not sure whether such a method would speed things up or slow them down. I am only interested in speeding up the search; creating the db is not in question.
Noted: I have increased the collision pool with false positives using this method, since different hashes can produce the same eight segment sums. It might not make sense with this method to compare the input hash against the db hash after all segment matches come back true; if I did that, then using this method might be of no value. I guess if this method were way faster than a simple grep and the match were way down in the file, then possibly it's faster overall? E.g., if this method can reach the bottom 1/3 of the large db file 10x faster than grep alone, then I have time to compare the input hash to the actual db hash and still be faster than grep alone.
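For completeness, that final verification would be just one more full-hash comparison per pooled hit, building on the hypothetical `converging_search` sketch above:

```python
# After the eight cheap segment comparisons pass, verify the full
# 32-char hash once per pooled hit to discard segment-sum collisions.
def verified_hits(db_path: str, md5_hex: str) -> list[str]:
    target = md5_hex.strip().upper()
    return [string
            for string, row_hash in converging_search(db_path, md5_hex)
            if row_hash.upper() == target]
```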