LinuxQuestions.org
View the Most Wanted LQ Wiki articles.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices



Reply
 
Search this Thread
Old 07-29-2007, 01:28 PM   #1
mohtasham1983
Member
 
Registered: Apr 2005
Location: San Jose
Distribution: Fedora 3,4- Ubuntu 6.06 to 8.10, Gentoo and Arch
Posts: 408

Rep: Reputation: 30
social networking database architecture


I have been thinking about designing a social networking database application using mysql and php. I couldn't come up with any idea better than making a table, named after each individual and add friends' information to that table. I assume this architecture is practically impossible to be used, since the number of users can grow very high and having thousands of tables inside a database doesn't sound good.

I also was thinking about making a very big sized field in a table so that it makes it possible to store a list of friends as a string in it.
This idea doesn't seem to be efficient either, since a lot of memory is required to parse the string if it s very long.

I really appreciate if you share your database experience with me.

Thanks
 
Old 07-30-2007, 06:56 AM   #2
Centinul
Member
 
Registered: Jun 2005
Distribution: Gentoo
Posts: 552

Rep: Reputation: 30
If I were you I would do research on the 1st, 2nd and 3rd Normal Forms of a relational database. Also read about one-to-one, one-to-many and many-to-many relationships. This should give you the insight needed to begin to design a database. The most important thing thoug his to break it up into pieces to avoid repetition.

HTH,

Centinul
 
Old 07-30-2007, 07:11 PM   #3
graemef
Senior Member
 
Registered: Nov 2005
Location: Hanoi
Distribution: Fedora 13, Ubuntu 10.04
Posts: 2,379

Rep: Reputation: 148Reputation: 148
You suggestion is most definitely NOT the way to go.

A database is used to store related data in tables so you would have as an example a table of people. This table would hold the basic information about the person, this might be their name, their email address, their password (encrypted). The basic information should be single entities of data, by that I mean if they can provide multiple versions of the data then it would not be considered basic information. For example you may request a single email address, in which case it belongs in this table. However, if you permit them to provide a list of different email addresses then you need to do something slightly different. Because you probably want to send information to a single source (email address) for each person I guess that a single email address is acceptable, although you may want to hold details about a secondary email address in case there are problems with the first email address.

One last point before I leave this table and tackle the other point that you raised. Each table is built up with a large number of rows, in this case each row will be a person. What is essential is that each row can be uniquely identified. Now we tend to refer to people by their names but these are not unique and computers are very picky when it comes to uniqueness. A login name into the system would be unique but another extremely common approach is to have an integer and increase this number by one each time a new row is added to the table. This approach is so common databases have a special field called the auto incrementing integer. Another advantage about using integers as the unique reference to a row is that they use up less space then any other data type. This unique way to refer to a row is called a key. Each row must have a key (first normal form)

Looking at the point you raised concerning friends, most people have friends. Emphasis on the plural here and emphasis also on the fact that the actual number of friends is unknown and will vary from person to person. The way this is tackled in a relational database is to have a second table called friends and you hold two items on this table a key to the person and a key to their friend. As you can see this will link back to the person table The first would be the person the second would be the friend, for each friend that the person has this table will have a row, so both my registered friends would take up two rows on the table, whereas your 40 registered friends would take up forty rows, and if we are friends with each other that would still be two distinct rows, I'm friends with your, and you're friends with me. So let's say that you application has been running for a while and you have a million people registered and each person has on average 100 friends that will give you 100,000,000 rows. Now that might sound that this table will have a lot of rows and hence it will become very large. But, remember that if the key to the person is an 32-bit integer (meaning that the maximum number of people would be 4,294,967,296) then the size that you are talking about for this table would be 100,000,000 * 2 * 4 bytes or 800Mb. Not a lot if you consider all those pictures that 1 million people will be uploading! This is known as removing the repeating fields from the table (second normal form)

There are other things that you will need to consider, indices, look up tables, data backup etc, but first as already suggested read a good text on data normalisation.
 
Old 07-31-2007, 06:56 AM   #4
mohtasham1983
Member
 
Registered: Apr 2005
Location: San Jose
Distribution: Fedora 3,4- Ubuntu 6.06 to 8.10, Gentoo and Arch
Posts: 408

Original Poster
Rep: Reputation: 30
Thanks for advice. I had never heard anything about normalization before, but when I read some article about it yesterday, I found out my database architecture is almost according to normalization principles. However, there are some tables that require some minor changes.

Another idea came into my mind so that I can make a table that has 3 columns:

1. ID
2. User_ID
3. FriendList

As you already mentioned ID and user_ID can be integer and FriendList stores information about list of friends in XML format. However, I'm not sure if this method be efficient. I want to make sure how fast those XML fields can be parsed.

I guess using XML gives more power and is more expandable. But, I think I will take the approach of having a table with 3 columns: first ID, second user_id and third friend_id since as you mentioned this method seems to be the most efficient one and it meet my application all requirements.
 
Old 08-01-2007, 01:52 AM   #5
graemef
Senior Member
 
Registered: Nov 2005
Location: Hanoi
Distribution: Fedora 13, Ubuntu 10.04
Posts: 2,379

Rep: Reputation: 148Reputation: 148
That's not quite what I was suggesting. I would have a table with just two fields:

user_id
friend_id

Then for each friend the user has a row would be generated. Say my user_id is 5, then given the following data:

3 5
3 6
5 6
5 3
5 7
6 3
7 5

I can say that I have three friends, 6, 3 and 7, whilst two people have recognised me as a friend 3 and 7. The reason for doing it like this is that searching for information would be substantially quicker. Most relational databases are finely tuned for searching amongst small pieces of data, such as integers, and so asking the database for all my friends would be quick - select * from friend_table where user_id = 5, even getting details on my friends friends would be fairly easy, I think that it would be... select * from friend_table as a, friend_table as b where a.friend_id = b.user_id and a.user_id = 5, whilst I'd want to test that properly the idea with relational databases is to store the data at the lowest possible entity. Hence storing data as XML formatted data is against the principles of data normalisation.
 
  


Reply

Tags
social


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
LXer: Mugshot: Social networking open source style LXer Syndicated Linux News 0 05-04-2007 07:17 PM
LXer: Mozilla experimenting with social networking LXer Syndicated Linux News 0 04-06-2007 09:17 AM
Social networking -notification? vharishankar LQ Suggestions & Feedback 2 09-26-2006 09:30 AM
LXer: U.S. Legislators Move to Ban Social Networking Sites LXer Syndicated Linux News 0 08-02-2006 04:33 AM
LQ Social Networking Features jeremy Linux - News 4 06-10-2006 04:24 AM


All times are GMT -5. The time now is 02:31 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration