ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
Distribution: Fedora 3,4- Ubuntu 6.06 to 8.10, Gentoo and Arch
Posts: 407
Rep:
social networking database architecture
I have been thinking about designing a social networking database application using mysql and php. I couldn't come up with any idea better than making a table, named after each individual and add friends' information to that table. I assume this architecture is practically impossible to be used, since the number of users can grow very high and having thousands of tables inside a database doesn't sound good.
I also was thinking about making a very big sized field in a table so that it makes it possible to store a list of friends as a string in it.
This idea doesn't seem to be efficient either, since a lot of memory is required to parse the string if it s very long.
I really appreciate if you share your database experience with me.
If I were you I would do research on the 1st, 2nd and 3rd Normal Forms of a relational database. Also read about one-to-one, one-to-many and many-to-many relationships. This should give you the insight needed to begin to design a database. The most important thing thoug his to break it up into pieces to avoid repetition.
You suggestion is most definitely NOT the way to go.
A database is used to store related data in tables so you would have as an example a table of people. This table would hold the basic information about the person, this might be their name, their email address, their password (encrypted). The basic information should be single entities of data, by that I mean if they can provide multiple versions of the data then it would not be considered basic information. For example you may request a single email address, in which case it belongs in this table. However, if you permit them to provide a list of different email addresses then you need to do something slightly different. Because you probably want to send information to a single source (email address) for each person I guess that a single email address is acceptable, although you may want to hold details about a secondary email address in case there are problems with the first email address.
One last point before I leave this table and tackle the other point that you raised. Each table is built up with a large number of rows, in this case each row will be a person. What is essential is that each row can be uniquely identified. Now we tend to refer to people by their names but these are not unique and computers are very picky when it comes to uniqueness. A login name into the system would be unique but another extremely common approach is to have an integer and increase this number by one each time a new row is added to the table. This approach is so common databases have a special field called the auto incrementing integer. Another advantage about using integers as the unique reference to a row is that they use up less space then any other data type. This unique way to refer to a row is called a key. Each row must have a key (first normal form)
Looking at the point you raised concerning friends, most people have friends. Emphasis on the plural here and emphasis also on the fact that the actual number of friends is unknown and will vary from person to person. The way this is tackled in a relational database is to have a second table called friends and you hold two items on this table a key to the person and a key to their friend. As you can see this will link back to the person table The first would be the person the second would be the friend, for each friend that the person has this table will have a row, so both my registered friends would take up two rows on the table, whereas your 40 registered friends would take up forty rows, and if we are friends with each other that would still be two distinct rows, I'm friends with your, and you're friends with me. So let's say that you application has been running for a while and you have a million people registered and each person has on average 100 friends that will give you 100,000,000 rows. Now that might sound that this table will have a lot of rows and hence it will become very large. But, remember that if the key to the person is an 32-bit integer (meaning that the maximum number of people would be 4,294,967,296) then the size that you are talking about for this table would be 100,000,000 * 2 * 4 bytes or 800Mb. Not a lot if you consider all those pictures that 1 million people will be uploading! This is known as removing the repeating fields from the table (second normal form)
There are other things that you will need to consider, indices, look up tables, data backup etc, but first as already suggested read a good text on data normalisation.
Distribution: Fedora 3,4- Ubuntu 6.06 to 8.10, Gentoo and Arch
Posts: 407
Original Poster
Rep:
Thanks for advice. I had never heard anything about normalization before, but when I read some article about it yesterday, I found out my database architecture is almost according to normalization principles. However, there are some tables that require some minor changes.
Another idea came into my mind so that I can make a table that has 3 columns:
1. ID
2. User_ID
3. FriendList
As you already mentioned ID and user_ID can be integer and FriendList stores information about list of friends in XML format. However, I'm not sure if this method be efficient. I want to make sure how fast those XML fields can be parsed.
I guess using XML gives more power and is more expandable. But, I think I will take the approach of having a table with 3 columns: first ID, second user_id and third friend_id since as you mentioned this method seems to be the most efficient one and it meet my application all requirements.
That's not quite what I was suggesting. I would have a table with just two fields:
user_id
friend_id
Then for each friend the user has a row would be generated. Say my user_id is 5, then given the following data:
3 5
3 6
5 6
5 3
5 7
6 3
7 5
I can say that I have three friends, 6, 3 and 7, whilst two people have recognised me as a friend 3 and 7. The reason for doing it like this is that searching for information would be substantially quicker. Most relational databases are finely tuned for searching amongst small pieces of data, such as integers, and so asking the database for all my friends would be quick - select * from friend_table where user_id = 5, even getting details on my friends friends would be fairly easy, I think that it would be... select * from friend_table as a, friend_table as b where a.friend_id = b.user_id and a.user_id = 5, whilst I'd want to test that properly the idea with relational databases is to store the data at the lowest possible entity. Hence storing data as XML formatted data is against the principles of data normalisation.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.