LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   Hash Tables - Newb to These, simple questions (https://www.linuxquestions.org/questions/programming-9/hash-tables-newb-to-these-simple-questions-391437/)

AquamaN 12-11-2005 12:11 AM

Hash Tables - Newb to These, simple questions
 
Hello, I am currently messing around with hash tables and I figured the best way to do so was with other peoples hash functions to see how to go about making my own. I seem to have hit a road bump though... I just want to write something that will take in words from a list (text doc) and then pass them to the other people's hashing functions but I am not sure... I'm new to the idea of hashing so this is a tad different. I made a pointer to the functions (pointer array, actually) and I am stuck after there as to what I should do. I know I need to generate a key value to be passed to these functions, but do I just do the modulus of the total amount of words? Also, looking at these other hasing functions it looks like they only take in the key value and not the actual word itself... I also want to track the number of collisions to see which hashing function would be best to model my own future hashing function after. Any insight would be helpful. Below is what I've managed to whip together in a few minutes.

Code:

#include <iostream>
#include <fstream>
#include <vector>

using namespace std;


// http://www.cs.yorku.ca/~oz/hash.html
int hash2( char *str )
{
    unsigned int hash = 0;
    int c;
    while ( c = *str++ )
        hash += c;
    return hash;
}

// http://www.softlab.od.ua/algo/other/alg/node29.html
int hash3 ( char *str )
{
    int total = 0;
    while ( *str )
    {
        total += *str++;
    }
    return ( total % 256 );
}

/* Peter Weinberger's */
//http://www.sparknotes.com/cs/searching/hashtables/section2.rhtml
int hash4(char *s)
{
        char *p;
        unsigned int h, g;
        h = 0;
        for(p=s; *p!='\0'; p++){
                h = (h<<4) + *p;
                if (g = h&0xF0000000) {
                        h ^= g>>24;
                        h ^= g;
                }
        }
        return h % 211;
}

int main()
{
        ifstream inputText;
        vector<char> inputString;
        //vector<char> hashTbl;
        char * temp;
        typedef int (*hashValType)(char*);
        hashValType hashArray[5];

        hashArray[1] = &hash2;
        hashArray[2] = &hash3;
        hashArray[3] = &hash4;
       
        inputText.open("wordlist.txt");

        //take in words from file
        while(!inputText.eof())
        {
                inputText >> temp;
                inputString.push_back(*temp);
                cout << temp;
        }
       
      //is that correct syntax for the hash table?
        vector<int> hashTbl[113];
       
        int x, y;
      //loop for amount of hash tables
        for(x = 0; x < 5; x++)
        {
            //loop for words in list to put in each hash table individually
                for(y = 0; y < 50; y++)
                {
                       
                        //I should do the key value and pass to the functions in here
                    //something like hashArray[x] (key);
                }
        }
       
}

Thanks for your time,
aqua

Mara 12-11-2005 02:42 PM

I'm not sure where you have a problem, in fact. The code looks rather correct.

I see one thing in it, however. You have three functions, each of them operated on a different range, so you can't compare them. If a hash function returns values from a wider range, you have (usually) less collisions etc.

Quote:

I know I need to generate a key value to be passed to these functions, but do I just do the modulus of the total amount of words?
Has function means you do *something* with the whole file and return one single value (int or something like that). Key is not required.

AquamaN 12-11-2005 10:24 PM

Ok, I have made some changes to the program and now I run into a problem everytime one of the hash functions tries to access the key that was passed into it.

Code:

#include <iostream>
#include <fstream>
#include <vector>

using namespace std;

vector<int>::size_type hash1(char *key){
    vector<int>::size_type hash = 0;
    register unsigned long temp;
    while ( *key ){
        hash = ( hash << 4 ) + *key++;
        if ( (temp = ( hash & 0xf0000000 )) ){
            hash ^= ( temp >> 24 );
            hash ^= temp;
        }
    }
    return hash;
}

// http://www.softlab.od.ua/algo/other/alg/node29.html
vector< int>::size_type hash2 ( char *str )
{
    vector<int>::size_type total = 0;

    while ( *str )
    {
        total += *str++;
    }

    return ( total % 256 );
}

/* Peter Weinberger's */
//http://www.sparknotes.com/cs/searching/hashtables/section2.rhtml
vector< int>::size_type hash3(char *s)
{
        char *p;
        vector<int>::size_type h, g;
        h = 0;
        for(p=s; *p!='\0'; p++){
                h = (h<<4) + *p;
                if (g = h&0xF0000000) {
                        h ^= g>>24;
                        h ^= g;
                }
        }
        return h % 211;
}

int main()
{
        ifstream inputText;
        int tableSize = 113, collisions = 0, x;
        vector<char *> inputString;
        vector<int> hashTbl;
        vector<int> indexList;
        char * temp;
        vector<int>::size_type (*hashArray[5])(char *) = {hash1, hash2, hash3};

        inputText.open("mispro.txt");

        //take in words from file
        while(!inputText.eof())
        {
                temp = new char[64];
                inputText.getline(temp, 64);
                inputString.push_back(temp);
        }

       
        hashTbl.resize(tableSize);
        hashTbl.clear();
        for(x = 0; x < 112; x++)
                hashTbl[x] = 0;
       

        int y, hashIndex;
        for(x = 0; x < 3; x++)
        {
                for(y = 0; y < tableSize; y++)
                {
                        hashIndex = hashArray[x](inputString[y]) % tableSize;
                        //check to see if index is open, if not, go somewhere else
                        //record collisions, decide to increase table size
                        if(hashTbl[hashIndex] != 0)
                        {
                                collisions++;
                                //deal with collisions
                               
                                hashIndex++;
                                if(hashIndex == hashTbl.size())
                                {
                                        hashIndex = 0;
                                }
                               
                                while(hashTbl[hashIndex] != 0)
                                        hashIndex++;
                                       
                                hashTbl[hashIndex] = 1;
                                       
                        }
                        else
                                hashTbl[hashIndex] = 1;
                               
                }
                cout << "Hash Function " << x + 1 << " had " << collisions << " many collisions." << endl;
                collisions = 0;
                hashTbl.clear();
                hashTbl.resize(tableSize);
        }
}

Right in the first while statement in the hash1 function, it segs on *key.... Any thoughts??

Mara 12-12-2005 04:53 PM

It looks to me that your string doesn't end as expected (no zero at the end).

kike_coello 12-12-2005 09:23 PM

what a coincidence, i gotta turn in a project on hashing functions tomorrow in my abstract data types class

anyways i just had to hash 100 elements read in from a file (positive numbers) and i was asked to use mod7, mod51 and mod151

what i did was i used 2 unsorted linked list arrays of 7 and 51 elements. and depending on the remainder (mod) i sent each number to a different list. it works great.

i also used a 151 element array to store the 100 integers with hash function "key & table_size" and if the index was already occupied i added 3 on my rehash function and modded again "(key + 3) % table_size" this worked good too.

i guess the solution really depends on what you wanna use. i think the unsorted linked list is best cause you can insert and remove nodes easily with the methods

let me know if you want the code to some of the functions if you decide on using unsorted linked lists or arrays, i can hook you up with the class files for the list so you can use the methods and stuff

later, peace out


All times are GMT -5. The time now is 05:23 PM.