LinuxQuestions.org
Share your knowledge at the LQ Wiki.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 12-11-2005, 12:11 AM   #1
AquamaN
Member
 
Registered: Oct 2002
Location: Ohio, USA
Distribution: OS X 10.4.8, Ubuntu 6.10
Posts: 146

Rep: Reputation: 15
Hash Tables - Newb to These, simple questions


Hello, I am currently messing around with hash tables and I figured the best way to do so was with other peoples hash functions to see how to go about making my own. I seem to have hit a road bump though... I just want to write something that will take in words from a list (text doc) and then pass them to the other people's hashing functions but I am not sure... I'm new to the idea of hashing so this is a tad different. I made a pointer to the functions (pointer array, actually) and I am stuck after there as to what I should do. I know I need to generate a key value to be passed to these functions, but do I just do the modulus of the total amount of words? Also, looking at these other hasing functions it looks like they only take in the key value and not the actual word itself... I also want to track the number of collisions to see which hashing function would be best to model my own future hashing function after. Any insight would be helpful. Below is what I've managed to whip together in a few minutes.

Code:
#include <iostream>
#include <fstream>
#include <vector>

using namespace std;


// http://www.cs.yorku.ca/~oz/hash.html
int hash2( char *str )
{
    unsigned int hash = 0;
    int c;
    while ( c = *str++ )
        hash += c;
    return hash;
}

// http://www.softlab.od.ua/algo/other/alg/node29.html
int hash3 ( char *str )
{
    int total = 0;
    while ( *str )
    {
        total += *str++;
    }
    return ( total % 256 );
}

/* Peter Weinberger's */
//http://www.sparknotes.com/cs/searching/hashtables/section2.rhtml
int hash4(char *s)
{
	char *p;
	unsigned int h, g;
	h = 0;
	for(p=s; *p!='\0'; p++){
		h = (h<<4) + *p;
		if (g = h&0xF0000000) {
			h ^= g>>24;
			h ^= g;
		}
	}
	return h % 211;
}

int main()
{
	ifstream inputText;
	vector<char> inputString;
	//vector<char> hashTbl;
	char * temp;
	typedef int (*hashValType)(char*);
	hashValType hashArray[5];

	hashArray[1] = &hash2;
	hashArray[2] = &hash3;
	hashArray[3] = &hash4;
	
	inputText.open("wordlist.txt");

	//take in words from file
	while(!inputText.eof())
	{
		inputText >> temp;
		inputString.push_back(*temp);
		cout << temp;
	}
	
       //is that correct syntax for the hash table?
	vector<int> hashTbl[113];
	
	int x, y;
       //loop for amount of hash tables
	for(x = 0; x < 5; x++)
	{
             //loop for words in list to put in each hash table individually
		for(y = 0; y < 50; y++)
		{
			
			//I should do the key value and pass to the functions in here
                     //something like hashArray[x] (key);
		}
	}
	
}
Thanks for your time,
aqua
 
Old 12-11-2005, 02:42 PM   #2
Mara
Moderator
 
Registered: Feb 2002
Location: Grenoble
Distribution: Debian
Posts: 9,696

Rep: Reputation: 232Reputation: 232Reputation: 232
I'm not sure where you have a problem, in fact. The code looks rather correct.

I see one thing in it, however. You have three functions, each of them operated on a different range, so you can't compare them. If a hash function returns values from a wider range, you have (usually) less collisions etc.

Quote:
I know I need to generate a key value to be passed to these functions, but do I just do the modulus of the total amount of words?
Has function means you do *something* with the whole file and return one single value (int or something like that). Key is not required.
 
Old 12-11-2005, 10:24 PM   #3
AquamaN
Member
 
Registered: Oct 2002
Location: Ohio, USA
Distribution: OS X 10.4.8, Ubuntu 6.10
Posts: 146

Original Poster
Rep: Reputation: 15
Ok, I have made some changes to the program and now I run into a problem everytime one of the hash functions tries to access the key that was passed into it.

Code:
#include <iostream>
#include <fstream>
#include <vector>

using namespace std;

vector<int>::size_type hash1(char *key){
    vector<int>::size_type hash = 0;
    register unsigned long temp;
    while ( *key ){
        hash = ( hash << 4 ) + *key++;
        if ( (temp = ( hash & 0xf0000000 )) ){
            hash ^= ( temp >> 24 );
            hash ^= temp;
        }
    }
    return hash;
}

// http://www.softlab.od.ua/algo/other/alg/node29.html
vector< int>::size_type hash2 ( char *str )
{
    vector<int>::size_type total = 0;

    while ( *str )
    {
        total += *str++;
    }

    return ( total % 256 );
}

/* Peter Weinberger's */
//http://www.sparknotes.com/cs/searching/hashtables/section2.rhtml
vector< int>::size_type hash3(char *s)
{
	char *p;
	vector<int>::size_type h, g;
	h = 0;
	for(p=s; *p!='\0'; p++){
		h = (h<<4) + *p;
		if (g = h&0xF0000000) {
			h ^= g>>24;
			h ^= g;
		}
	}
	return h % 211;
}

int main()
{
	ifstream inputText;
	int tableSize = 113, collisions = 0, x;
	vector<char *> inputString;
	vector<int> hashTbl;
	vector<int> indexList;
	char * temp;
	vector<int>::size_type (*hashArray[5])(char *) = {hash1, hash2, hash3};

	inputText.open("mispro.txt");

	//take in words from file
	while(!inputText.eof())
	{
		temp = new char[64];
		inputText.getline(temp, 64);
		inputString.push_back(temp);
	}

	
	hashTbl.resize(tableSize);
	hashTbl.clear();
	for(x = 0; x < 112; x++)
		hashTbl[x] = 0;
	

	int y, hashIndex;
	for(x = 0; x < 3; x++)
	{
		for(y = 0; y < tableSize; y++)
		{
			hashIndex = hashArray[x](inputString[y]) % tableSize;
			//check to see if index is open, if not, go somewhere else
			//record collisions, decide to increase table size
			if(hashTbl[hashIndex] != 0)
			{
				collisions++;
				//deal with collisions
				
				hashIndex++;
				if(hashIndex == hashTbl.size())
				{
					hashIndex = 0;
				}
				
				while(hashTbl[hashIndex] != 0)
					hashIndex++;
					
				hashTbl[hashIndex] = 1;
					
			}
			else
				hashTbl[hashIndex] = 1;
				
		}
		cout << "Hash Function " << x + 1 << " had " << collisions << " many collisions." << endl;
		collisions = 0;
		hashTbl.clear();
		hashTbl.resize(tableSize);
	}
}
Right in the first while statement in the hash1 function, it segs on *key.... Any thoughts??
 
Old 12-12-2005, 04:53 PM   #4
Mara
Moderator
 
Registered: Feb 2002
Location: Grenoble
Distribution: Debian
Posts: 9,696

Rep: Reputation: 232Reputation: 232Reputation: 232
It looks to me that your string doesn't end as expected (no zero at the end).
 
Old 12-12-2005, 09:23 PM   #5
kike_coello
Member
 
Registered: Jul 2005
Location: maryland
Distribution: Ubuntu 9.04
Posts: 88

Rep: Reputation: 17
Cool

what a coincidence, i gotta turn in a project on hashing functions tomorrow in my abstract data types class

anyways i just had to hash 100 elements read in from a file (positive numbers) and i was asked to use mod7, mod51 and mod151

what i did was i used 2 unsorted linked list arrays of 7 and 51 elements. and depending on the remainder (mod) i sent each number to a different list. it works great.

i also used a 151 element array to store the 100 integers with hash function "key & table_size" and if the index was already occupied i added 3 on my rehash function and modded again "(key + 3) % table_size" this worked good too.

i guess the solution really depends on what you wanna use. i think the unsorted linked list is best cause you can insert and remove nodes easily with the methods

let me know if you want the code to some of the functions if you decide on using unsorted linked lists or arrays, i can hook you up with the class files for the list so you can use the methods and stuff

later, peace out
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
svgalib, matrix & Hash Tables: programming tutorial yakkmeister Programming 6 10-11-2005 12:35 PM
hash tables library in c trutnev Programming 1 05-23-2005 03:16 PM
Simple Questions...Hopfully Simple Answers. caps_phisto Linux - General 3 12-21-2004 12:40 PM
Perl XML::Simple - Hash Question smaida Programming 2 05-26-2004 04:20 AM
Simple Newb Question: How do I check what version of Redhat is Installed from the CLI praefex Linux - General 1 01-27-2004 07:30 PM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 10:41 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration