Welcome to the most active Linux Forum on the web.
Go Back > Forums > Non-*NIX Forums > Programming
User Name
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.


  Search this Thread
Old 04-04-2004, 06:01 AM   #1
Registered: Nov 2003
Posts: 55

Rep: Reputation: 15
Hash-Function (string to int)

I have a dictionary of 114,648 words (ranging from 1 character to 69 characters per word).

I need a hash-function (in C) that takes a word as input and returns a 'long' (or an 'int') !!

[Could I find a hash-function that does not assign the same number to more than two words?]

Thank you for your time.
Old 04-04-2004, 02:02 PM   #2
Registered: Feb 2004
Location: Lincoln, UK
Distribution: red hat 9
Posts: 67

Rep: Reputation: 15
thats an interesting one...

Are the words just going to use normal letters [A-Z] and does it need to be case - sensitive??
Old 04-04-2004, 04:51 PM   #3
Registered: Feb 2004
Location: Lincoln, UK
Distribution: red hat 9
Posts: 67

Rep: Reputation: 15
try this as a long shot:-

int hashValue;
char* myWord;


no idea if it will work as I havn't access to a compiler at the moment and Im no big C expert but worth a try perhaps??
Old 04-04-2004, 08:04 PM   #4
LQ Guru
Registered: Feb 2003
Location: Colorado Springs, CO
Distribution: Gentoo
Posts: 2,018

Rep: Reputation: 48
I don't know if there's any way to guarantee that two words won't hash to the same value, but you can make it pretty unlikely by choosing a hash function that's pretty well spread out, and making sure you have a workaround in case you do get a duplicate hash (such as choosing the next value in sequence or whatever - not sure what you're planning to do with this).

Anyhow, a google search for 'hash functions' will give you plenty of material to look at.
Old 04-04-2004, 10:46 PM   #5
Registered: Mar 2004
Location: Massachusetts
Distribution: Debian
Posts: 557

Rep: Reputation: 42
Given that a word can be up to 69 characters long, the number of possible words is greater than 26^69, so there's no way to build a function which given a word will return a 8 byte (or less) value unique to that word.

For hashing function, there are many algorithms which try to produce very well distributed output, but your program would probably work fine with something lazy like adding the ascii values of all the characters in the word together. (I'm sure this is a horrible way to hash ascii text, but my point is that it probably won't be noticeable ) The typical way to use this is to have linked lists for each group of words which hash to the same value. The lists will be pretty short.
Old 04-05-2004, 02:53 AM   #6
Registered: Jul 2003
Location: bangalore . india
Distribution: openSUSE 10.3
Posts: 251

Rep: Reputation: 33
there is a pretty simple hash function in K&R that takes the string as input
Chapter 6, section 6.6.

unsigned hash(char *s)
unsigned hashval;

for (hashval = 0; *s != '\0'; s++)
hashval = *s + 31 * hashval;
return hashval % HASHSIZE;


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off

Similar Threads
Thread Thread Starter Forum Replies Last Post
Create MD5 hash from string trouby Linux - General 9 02-07-2012 08:08 PM
converting a int to string irfanhab Programming 6 07-30-2005 10:40 PM
C++ std::string to int Slaxx Programming 1 10-30-2004 11:03 PM
string to int BadTaste Programming 11 08-18-2004 11:23 AM
string to int in C h/w Programming 2 12-05-2003 04:47 PM

All times are GMT -5. The time now is 12:08 PM.

Main Menu
Write for LQ is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration