LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 06-08-2006, 12:50 AM   #1
George2
Member
 
Registered: Oct 2003
Posts: 354

Rep: Reputation: 30
any ideas to improve search performance in a tree structure


Hello everyone,


I am developing an application which contains a tree structure. The tree structure reflects the management structure of a big company -- child node represents the employees managed by direct manager (parent node), one parent node may have multiple (various) number of child nodes (employees he/she managed in his/her department).

I need to find all the employees a manager managed in the tree structure, the direct child node, the child node's child node, ... and so on (for example, if he is a senior manager, there are multiple levels of child nodes).

Currently, I am using brute force tree traverse approach to print all (direct and indirect) child nodes -- layer by layer. The tree is very large. I am wondering whether there are any good ideas about how to improve the search performance?


thanks in advance,
George

Last edited by George2; 06-08-2006 at 12:54 AM.
 
Old 06-08-2006, 01:03 AM   #2
paulsm4
LQ Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
There's no reason you can't have multiple data structures referring to the same underlying data.

Let's say, for example, that all of your employee records were organized in an array (yes, I know they're not - but please bear with me). You simply put the first record into "a[0]", the second into "a[1]", and so on. In this example, you could still have your tree (to describe the organization's structure) ... but now the tree would contain array indexes (0, 1, ...) instead of the data records themselves.

Now generalize one step further. It doesn't really matter how the records are stored. Nor does it necessarily matter how you access them: traverse a tree, loop through an array, iterate a linked list ... whatever. You can build as many different data structures around your data as you have ways of looking up that data. The only thing that REALLY matters is that you make sure all of your "indexes" and "pointers" remain in synch with the underlying data.

Having said that:
1. Yes, having a tree is definitely the best way to represent your management structure.

2. Using a hash, however, would be a much quicker way to look up a specific name.

3. You might consider using both a tree and a hash, and writing a "batch mode" script
that periodically regenerates the hash (say, once every hour) to accomodate any new
records that might have been added, or old records that might have been deleted.

4. Generalizing further, you might create different subtrees for each department, and have
an array containing the root node for each subtree corresponding to each department.

5. The possibilities are really limitless.

Just a thought .. PSM

PS:
Both are fairly difficult reading, but I'd highly recommend either Sedgewick and/or Knuth for their classic books on algorithms and data structures!

Last edited by paulsm4; 06-08-2006 at 01:05 AM.
 
Old 06-08-2006, 02:30 AM   #3
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751
Sounds like a parent-child classic arrangement in SQL ...
Make the RDBMS do all the heavy lifting ie data retrieval.
Just define the tables and their foreign keys.
 
Old 06-08-2006, 03:14 AM   #4
George2
Member
 
Registered: Oct 2003
Posts: 354

Original Poster
Rep: Reputation: 30
Hi chrism01,


Quote:
Originally Posted by chrism01
Sounds like a parent-child classic arrangement in SQL ...
Make the RDBMS do all the heavy lifting ie data retrieval.
Just define the tables and their foreign keys.
The data are stored in a database. But searching database in slower than search in memory. So, I read all the data from database to memory and then search them in memory. Any ideas to improve it?


regards,
George
 
Old 06-08-2006, 03:26 AM   #5
George2
Member
 
Registered: Oct 2003
Posts: 354

Original Poster
Rep: Reputation: 30
Thank you paulsm4,


Seems you have a lot of good points. :-)

Quote:
Originally Posted by paulsm4

2. Using a hash, however, would be a much quicker way to look up a specific name.
If there are thousands of nodes, how to find a good hash function to reduce conflicts? Any comments?


Quote:
Originally Posted by paulsm4
3. You might consider using both a tree and a hash, and writing a "batch mode" script
that periodically regenerates the hash (say, once every hour) to accomodate any new
records that might have been added, or old records that might have been deleted.
I can not imagine how to use a tree and a hash together. But it seems that in this way, we can combine both the advantages of tree and hash table. So, could you talk a little more about this ideas?

Quote:
Originally Posted by paulsm4
4. Generalizing further, you might create different subtrees for each department, and have
an array containing the root node for each subtree corresponding to each department.
Do you mean setup different trees with different root node? If it is, I do not think it can significantly improve the performance. Could you point out why this approach can improve performance?


regards,
George
 
Old 06-08-2006, 07:44 AM   #6
graemef
Senior Member
 
Registered: Nov 2005
Location: Hanoi
Distribution: Fedora 13, Ubuntu 10.04
Posts: 2,379

Rep: Reputation: 148Reputation: 148
Looking at you specific question, "How can you improve the speed of finding all employees a manager is responsible for, including different levels?" Well if memory is not a problem then you can adjust you tree as follows:

I am assuming that your tree is as follows:

Code:
ManagerNode
{
   struct ManagerDetails
   vector of SubordinateNodes
}
The tree nodes consist of the personal details plus a list of the subordinate nodes, this is probably a vector so that it can grow efficiently. The SubordinateNode is of course just a ManagerNode but using different names is helpful here. To traverse the tree all the nodes in the SubordinateNode vector will need to be visited.

Now the problem is that to find all the subordinates, including nested levels takes more time. By developing a slight modification to the structure you can easily achieve this:

Code:
ManagerNode
{
   struct ManagerDetails
   vector of SubordinateNodes
   vector of AllSubordinateNodes
}
I'd add a second vector which essentially includes all the subordinates at all levels, infact there is no need to keep the immediate subordinates in that vector, because you have them in the first vector. This makes it really easy to find out the immediate subordinates plus all the subordinates.

The real problem is in setting this structure up. One way to do this would be when adding an entry to the SubordinateNodes also add it to every supervisors AllSubordinateNodes, to do this you need to be aware of who the Supervisor is so it is necessary to add a upward traveral in the tree, it will make you tree like a double linked list:
Code:
ManagerNode
{
   struct ManagerDetails
   vector of SubordinateNodes
   vector of AllSubordinateNodes
   pointer to SupervisorNode
}
ps. It probably goes without saying but: all references to *Node or *Nodes wil be to the ManagerNode structure, thus SupervisorNode is a ManagerNode, SubordianteNodes are ManagerNodes.

Last edited by graemef; 06-08-2006 at 07:47 AM.
 
Old 06-08-2006, 10:27 PM   #7
George2
Member
 
Registered: Oct 2003
Posts: 354

Original Poster
Rep: Reputation: 30
Thank you graemef!


Quote:
Originally Posted by graemef
The real problem is in setting this structure up. One way to do this would be when adding an entry to the SubordinateNodes also add it to every supervisors AllSubordinateNodes, to do this you need to be aware of who the Supervisor is so it is necessary to add a upward traveral in the tree, it will make you tree like a double linked list:
Code:
ManagerNode
{
   struct ManagerDetails
   vector of SubordinateNodes
   vector of AllSubordinateNodes
   pointer to SupervisorNode
}
ps. It probably goes without saying but: all references to *Node or *Nodes wil be to the ManagerNode structure, thus SupervisorNode is a ManagerNode, SubordianteNodes are ManagerNodes.
You have presented a good idea. But I am still confused about how to add the reference of current node to all its upper level manager nodes. For example, for a low level employee, we need to add the reference node to all his/her upper level managers, so I think we need to store a pointer to the parent node to the current node in order to traverse upward, right? What I am confused is about whether the pointer "SupervisorNode" means the pointer to parent node in your points?


regards,
George

Last edited by George2; 06-08-2006 at 10:28 PM.
 
Old 06-08-2006, 10:38 PM   #8
graemef
Senior Member
 
Registered: Nov 2005
Location: Hanoi
Distribution: Fedora 13, Ubuntu 10.04
Posts: 2,379

Rep: Reputation: 148Reputation: 148
Yes I was thinking that the supervisorNode would be the parent.

So you would have a loop that would visit the supervisorNode add the subordinate to the allSubordinate vector and then repeat until the supervisor Node was null (presumably being the god node herself.)
 
Old 06-09-2006, 01:14 AM   #9
George2
Member
 
Registered: Oct 2003
Posts: 354

Original Poster
Rep: Reputation: 30
Thank you graemef! I understand your idea now -- we are using different terminology before.


Quote:
Originally Posted by graemef
Yes I was thinking that the supervisorNode would be the parent.

So you would have a loop that would visit the supervisorNode add the subordinate to the allSubordinate vector and then repeat until the supervisor Node was null (presumably being the god node herself.)
Anyway, do you think using hash table together with the tree is better, as suggested by paulsm4 in the second post of this thread?


regards,
George
 
Old 06-09-2006, 04:49 AM   #10
primo
Member
 
Registered: Jun 2005
Posts: 542

Rep: Reputation: 34
There are natural trees implemented in filesystems and they most likely have efficient lookup mechanisms. You just name each directory after every individual and files inside would contain data about each one. There's no need at all to "move" nodes as you can easily have the same guy in more than one place. Also, you may use hard links on the files contained in these directories and benefit from the inode abstraction that substracts one everytime a file is unlinked that gets deleted when it reaches zero. Then making a backup of this filesystem database is as easy as doing a tar.bz2. It's mostly portable to any platform provided you use ASCII or some sort of portable format instead of floating-point or integers values.

P.S: This database may even be encrypted transparently with encrypted filesystems. Locking mechanisms may be easier too.

Last edited by primo; 06-09-2006 at 04:51 AM.
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
Have i tried everything to improve my disk performance? drben Linux - Hardware 15 02-07-2006 02:38 PM
How can I improve gnome performance? pfaendtner Linux - Software 16 04-14-2005 11:52 AM
How to Improve performance of PC Imran Aziz Linux - Software 3 06-03-2004 02:10 PM
Recommendation for webpage tree structure Alan Powell Linux - Newbie 1 04-06-2004 02:07 PM
ways to improve performance flipboi Linux - Newbie 6 10-25-2003 11:22 AM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 09:58 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration