LinuxQuestions.org
Share your knowledge at the LQ Wiki.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 05-12-2010, 07:12 AM   #1
openSauce
Member
 
Registered: Oct 2007
Distribution: Fedora, openSUSE
Posts: 252

Rep: Reputation: 39
Equivalence classes, based on field values and multi-key hashtable


Hello

I've got a set of objects (all of the same type). I'm trying to think of a good way to divide it into equivalence classes, with equivalence of two objects defined as meaning a specified set of attributes are equal for both objects.

More concretely, I've got:
- a Java class with around 50 fields
- a bunch of instances of the class

I want:
- to divide the instances into a few sets
- in each set, each instance has field 1 - field 5 equal to fields 1-5 of the other instances in the set.


The method I've come up with is to generate a hashcode for each instance based on the hashcodes of fields 1-5*, and map the hashcode to one of my sets. Ignoring problems with potential hashcode collisions (which I'm expecting to be too rare to worry about for now), does that sound reasonable? It seems simple enough, but I'm wondering if there's a simpler method I haven't thought of.

If anyone can think of a better method (quicker to implement/quicker to run/easier to maintain and extend), I'd like to know! Let me know if anything's unclear.

cheers,

OS

* I'll generate the hashcode using a method based on Eclipse's generic hashcode method, which looks like this:
Code:
  public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ( ( f1 == null ) ? 0 : f1.hashCode() );
    result = prime * result + ( ( f2 == null ) ? 0 : f2.hashCode() );
    result = prime * result + ( ( f3 == null ) ? 0 : f3.hashCode() );
    result = prime * result + ( ( f4 == null ) ? 0 : f4.hashCode() );
    result = prime * result + ( ( f5 == null ) ? 0 : f5.hashCode() );
    result = prime * result + ( ( f6 == null ) ? 0 : f6.hashCode() );
    return result;
  }
 
Old 05-13-2010, 12:10 AM   #2
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454Reputation: 454Reputation: 454Reputation: 454Reputation: 454
Quote:
Originally Posted by openSauce View Post
Hello

I've got a set of objects (all of the same type). I'm trying to think of a good way to divide it into equivalence classes, with equivalence of two objects defined as meaning a specified set of attributes are equal for both objects.

More concretely, I've got:
- a Java class with around 50 fields
- a bunch of instances of the class

I want:
- to divide the instances into a few sets
- in each set, each instance has field 1 - field 5 equal to fields 1-5 of the other instances in the set.


The method I've come up with is to generate a hashcode for each instance based on the hashcodes of fields 1-5*, and map the hashcode to one of my sets. Ignoring problems with potential hashcode collisions (which I'm expecting to be too rare to worry about for now), does that sound reasonable? It seems simple enough, but I'm wondering if there's a simpler method I haven't thought of.

If anyone can think of a better method (quicker to implement/quicker to run/easier to maintain and extend), I'd like to know! Let me know if anything's unclear.

cheers,

OS

* I'll generate the hashcode using a method based on Eclipse's generic hashcode method, which looks like this:
Code:
  public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ( ( f1 == null ) ? 0 : f1.hashCode() );
    result = prime * result + ( ( f2 == null ) ? 0 : f2.hashCode() );
    result = prime * result + ( ( f3 == null ) ? 0 : f3.hashCode() );
    result = prime * result + ( ( f4 == null ) ? 0 : f4.hashCode() );
    result = prime * result + ( ( f5 == null ) ? 0 : f5.hashCode() );
    result = prime * result + ( ( f6 == null ) ? 0 : f6.hashCode() );
    return result;
  }
Assuming the fields are strings why not simply concatenate them using a non-used in the character as the separator ? For example, an ASCII 1 (Control-A) character.

If the fields are not string and, say, they are numeric, they can first be trivially stringified.
 
Old 05-13-2010, 02:24 AM   #3
openSauce
Member
 
Registered: Oct 2007
Distribution: Fedora, openSUSE
Posts: 252

Original Poster
Rep: Reputation: 39
I did wonder about using strings, and that's a good idea about using a control char as the separator, I'd then be able to guarantee there were no collisions, which I couldn't if I make a hashcode. I think it would be a bit slower, but with only 10,000 - 100,000 objects I might not notice the difference.
 
Old 05-15-2010, 09:08 AM   #4
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Are you performing clustering, or is the equivalence operator well-defined? In other words, do you know if two objects are equivalent without knowing about any of the other objects?
Kevin Barry
 
Old 05-15-2010, 02:34 PM   #5
openSauce
Member
 
Registered: Oct 2007
Distribution: Fedora, openSUSE
Posts: 252

Original Poster
Rep: Reputation: 39
Yes, it's a well-defined equivalence operation: two objects are equivalent if they have equal values in fields 1-5.
 
Old 05-16-2010, 08:35 PM   #6
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Hashing seems like a good idea, but you should consider an algorithm based on the limitations of the fields. For example, with normal text you're only using about 1/4 to 1/2 of the possible ASCII character values, so you might be able to compress the string before hashing (hashing is just an irreversible form of compression, anyway.) You also might consider making everything lower-case and removing punctuation before hashing. After all of this, you might actually be able to reduce your strings into something that can be directly compared without having to hash, thereby reducing the number of possible collisions.
Kevin Barry
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
LXer: The key components of Multi-security based solutions for UNIX LXer Syndicated Linux News 0 03-16-2009 06:20 PM
how to grep for only the values of a specific field hchoonbeng Linux - Newbie 3 11-19-2008 08:20 AM
in-memory hashtable with values, which are db records pointers kpachopoulos Programming 0 11-03-2006 03:46 AM
java multi classes trscookie Programming 6 07-08-2006 06:24 PM
storing multiple values within one field in mysql antken Programming 8 12-15-2002 10:08 PM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 02:26 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration