LinuxQuestions.org
Download your favorite Linux distribution at LQ ISO.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Reply
 
Search this Thread
Old 04-04-2004, 05:01 AM   #1
Hady
Member
 
Registered: Nov 2003
Posts: 55

Rep: Reputation: 15
Hash-Function (string to int)


I have a dictionary of 114,648 words (ranging from 1 character to 69 characters per word).

I need a hash-function (in C) that takes a word as input and returns a 'long' (or an 'int') !!

[Could I find a hash-function that does not assign the same number to more than two words?]

Thank you for your time.
 
Old 04-04-2004, 01:02 PM   #2
woodywellhung
Member
 
Registered: Feb 2004
Location: Lincoln, UK
Distribution: red hat 9
Posts: 67

Rep: Reputation: 15
thats an interesting one...

Are the words just going to use normal letters [A-Z] and does it need to be case - sensitive??
 
Old 04-04-2004, 03:51 PM   #3
woodywellhung
Member
 
Registered: Feb 2004
Location: Lincoln, UK
Distribution: red hat 9
Posts: 67

Rep: Reputation: 15
try this as a long shot:-

int hashValue;
char* myWord;

hashValue=strcmp(myWord,NULL);

no idea if it will work as I havn't access to a compiler at the moment and Im no big C expert but worth a try perhaps??
 
Old 04-04-2004, 07:04 PM   #4
wapcaplet
Guru
 
Registered: Feb 2003
Location: Colorado Springs, CO
Distribution: Gentoo
Posts: 2,018

Rep: Reputation: 48
I don't know if there's any way to guarantee that two words won't hash to the same value, but you can make it pretty unlikely by choosing a hash function that's pretty well spread out, and making sure you have a workaround in case you do get a duplicate hash (such as choosing the next value in sequence or whatever - not sure what you're planning to do with this).

Anyhow, a google search for 'hash functions' will give you plenty of material to look at.
 
Old 04-04-2004, 09:46 PM   #5
aluser
Member
 
Registered: Mar 2004
Location: Massachusetts
Distribution: Debian
Posts: 557

Rep: Reputation: 42
Given that a word can be up to 69 characters long, the number of possible words is greater than 26^69, so there's no way to build a function which given a word will return a 8 byte (or less) value unique to that word.

For hashing function, there are many algorithms which try to produce very well distributed output, but your program would probably work fine with something lazy like adding the ascii values of all the characters in the word together. (I'm sure this is a horrible way to hash ascii text, but my point is that it probably won't be noticeable ) The typical way to use this is to have linked lists for each group of words which hash to the same value. The lists will be pretty short.
 
Old 04-05-2004, 01:53 AM   #6
shishir
Member
 
Registered: Jul 2003
Location: bangalore . india
Distribution: openSUSE 10.3
Posts: 251

Rep: Reputation: 33
there is a pretty simple hash function in K&R that takes the string as input
Chapter 6, section 6.6.


unsigned hash(char *s)
{
unsigned hashval;

for (hashval = 0; *s != '\0'; s++)
hashval = *s + 31 * hashval;
return hashval % HASHSIZE;
}
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Create MD5 hash from string trouby Linux - General 9 02-07-2012 07:08 PM
converting a int to string irfanhab Programming 6 07-30-2005 09:40 PM
C++ std::string to int Slaxx Programming 1 10-30-2004 10:03 PM
string to int BadTaste Programming 11 08-18-2004 10:23 AM
string to int in C h/w Programming 2 12-05-2003 03:47 PM


All times are GMT -5. The time now is 03:24 PM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration