Old 04-13-2011, 08:10 PM   #1
cin_
Member
 
Registered: Dec 2010
Posts: 264

Rep: Reputation: 23
Database :: Efficiency :: Organising Data For Speed


Is it quicker, more secure, or more stable to break information into as many tables as possible, or to limit the number to as few as possible?


I have a need of a database, and the information to go into it. I know how I want to organise my data for optimal cross referencing, but what I know is merely what I know and I do not know if what I know is what is known...

Madness to say... looking at my database I see this is the best way to organise the data, but what I do not know is how the computer will respond to my methodical madness.

Is it quicker to have one table with many fields or many tables with few fields?

We've got information for 10 users with 17 idiosyncrasies.

So, would one table with 17 columns and 10 rows...
or...
a group of 50 tables with one column and 3 rows and one table with 4 columns and 5 rows...
Be more efficient concerning:
QUERIES...?
Searching?
Comparison?
Insertion?
Deletion?
Modification?

The queries and modifications will be daily, and constant.
These numbers are really much larger but the math is easier for me to articulate on a smaller scale.
I use Perl, PHP, and C with SQL for management.

Last edited by cin_; 04-13-2011 at 09:34 PM. Reason: gramm'err
 
Old 04-13-2011, 08:56 PM   #2
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,964
Blog Entries: 11

Rep: Reputation: 865
I think you first need to sort out your terms. I believe that what
you're referring to as "a database" is really a "table".


P.S.: This thread is a better match for PROGRAMMING, so I moved it there.

Last edited by Tinkster; 04-13-2011 at 08:57 PM.
 
1 member found this post helpful.
Old 04-13-2011, 09:14 PM   #3
dugan
Senior Member
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 4,562

Rep: Reputation: 1394
You should learn about database normalization.
 
Old 04-13-2011, 09:32 PM   #4
cin_
Member
 
Registered: Dec 2010
Posts: 264

Original Poster
Rep: Reputation: 23
Tinkster... You are right. I did mean table instead of database. But now the question still remains, and even grows.
So is it more efficient to have a massive table or multiple smaller tables?
And still the original question holds in its ignorance: is it more efficient to have a massive database with many tables, or many databases with fewer tables?

dugan... normalisation from my hitherto superficial research seems to me to be something of a logical mapping of information. This part I want to use my own methods for. The actual organising is done unless I find that it is a slow method. My desired method? Many small tables cross referenced with insanity. We can call it 17NF...

The real question is not how I should do it, but rather how the computer is set up to interact. If the table is large, does the whole thing have to load to get a single field? That would seem to imply smaller tables would be quicker...
Or can fields from multiple tables be modified in parallel?
Could the information be better loaded from many databases of many tables, or one database of many tables, or one database of one table?

Last edited by cin_; 04-13-2011 at 09:48 PM. Reason: gramm'err
 
0 members found this post helpful.
Old 04-13-2011, 10:51 PM   #5
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,226

Rep: Reputation: 2022
If the data is all logically related, then one DB (actually the word schema is better here) with some number of tables. Multiple schemas are usually only used when you want ONE instance of a DB, e.g. 1 copy of Oracle running, but several schemas, usually non-related, inside the 'DB'.
Schemas are made up of tables, which are made up of cols & rows.
Apologies if you know all this terminology, but just to clarify ...

Ok, some_num users with other_num attributes would normally be one table; 1 col for userid and 1 col for each of the other attributes.
If(!) you are talking massive nums here (and we need to know how many cols & rows you really will have), then, at least in Oracle, you can logically and physically partition the table into multiple partitions.
Not sure if e.g. MySQL does that; I'm sure you can google it if you want.
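As a rough sketch of what that single-table layout might look like (all table and column names here are made up, and the commented-out partition clause is optional MySQL 5.1+ style syntax):
Code:
-- One row per user, one column per attribute.
CREATE TABLE users (
    userid     INT NOT NULL,
    attrib_01  VARCHAR(100),
    attrib_02  VARCHAR(100),
    -- ... one column per remaining attribute, up to attrib_17
    PRIMARY KEY (userid)
);

-- Only worth considering for genuinely massive row counts:
-- physically split the table into partitions by ranges of userid.
-- ALTER TABLE users PARTITION BY RANGE (userid) (
--     PARTITION p0 VALUES LESS THAN (1000000),
--     PARTITION p1 VALUES LESS THAN MAXVALUE
-- );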

Note that DB engines can only 'use' a limited number of col joins (i.e. col1 = col2 AND col3 = col4 etc.) in a WHERE clause.
In other words, you may specify e.g. 10 joins in a WHERE clause, but the engine will only use indexes on, say, 4 and full table scans on the rest.
You'd need to research this.

Hopefully that gives you some pointers ...

Last edited by chrism01; 04-13-2011 at 10:55 PM.
 
1 member found this post helpful.
Old 04-13-2011, 11:22 PM   #6
cin_
Member
 
Registered: Dec 2010
Posts: 264

Original Poster
Rep: Reputation: 23
chrism01... Yeah, this stuff is completely new to me. Terminology is not my strong suit.

I thank you all. You have opened up many avenues of research.
But...

So it seems apparent that multiple tables are the convention, but I am still cloudy on the better format based strictly on performance. And maybe some technical whys and wherefores... So, one or the other?
Many tables with few fields, or few tables with many fields?

Last edited by cin_; 04-13-2011 at 11:27 PM. Reason: gramm'err
 
0 members found this post helpful.
Old 04-13-2011, 11:42 PM   #7
dugan
Senior Member
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 4,562

Rep: Reputation: 1394
If performance is the only consideration, then the best solution is obviously to have everything in one table.

Joins are slow operations.

Quote:
Originally Posted by cin_
If the table is large does the whole thing have to load to get a single field?
Nope. If your search is based on an indexed field (e.g. a primary key), then finding a single row is done in O(log n) time, which is to say effectively instantly.
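For example, with a hypothetical users table whose primary key is userid, a lookup like this uses the index to find the one matching row rather than reading the whole table:
Code:
SELECT *
FROM users
WHERE userid = 42;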

Last edited by dugan; 04-13-2011 at 11:52 PM.
 
1 member found this post helpful.
Old 04-14-2011, 01:09 AM   #8
cin_
Member
 
Registered: Dec 2010
Posts: 264

Original Poster
Rep: Reputation: 23
Then what is the benefit of breaking down the table?
Speed up searching and sorting?


I am unsure how the whole process works technically.
So you have an instance of the database management system running and a service that automates interaction with the management system.
Someone logs into the service, and the service queries the database for information to feed to the user. The user reads information from the table NEWS and logs out. Then a new user comes by, logs in, reads the information from NEWS, modifies it and adds something new, so the existing information in NEWS is moved to OLDES and the newly added pieces are now in the NEWS table; then they log out. A third user logs in to read some information and is automatically shown NEWS, but this user realises they want to read the OLDES, so they request the OLDES information and the service queries OLDES; then they log out.

So if the table is always open, then NEWS and OLDES could be fields that shift, but if you have to load the massive table every time there is a QUERY, this is not a good solution.
So in this sad example the first user never needed the OLDES table, so can I assume that whatever strain, in terms of resources, that would have been placed on the server if it needed to load all of the information from OLDES was relieved because it was not requested? If it were one table, it would seem that the computer would have to load lots of unnecessary information all the time... Other intriguing uses and neglects can be seen from the example... user two needs both OLDES and NEWS, and user three is shown NEWS but promptly prompts to only see OLDES, relieving NEWS.

Does the computer load the table (or, in an example of many tables, every table) with each query, or is every table always loaded into the management system?

Last edited by cin_; 04-14-2011 at 01:13 AM.
 
0 members found this post helpful.
Old 04-14-2011, 01:26 AM   #9
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 541

Rep: Reputation: 74
When a database reads its data, it does so in blocks, typically along the lines of 8 KB blocks, but this number differs. Sometimes it even fetches a few rows after and/or before that. That's tunable in real databases; I don't know how MySQL handles that.

Anyway, what you ideally want is to have as much relevant data as possible in that single read. Hence: fixed-length data (char(), int, float, etc.) is fastest to have in one table. Slightly varying-length data (varchar()) should also go in that table, but typically more towards the end of the record. Where things can, and usually do, differ is when you use huge portions of data (a BLOB, bytea, or the SQL text type); what matters then is how static that data would be.

To search in one table and then look up the corresponding item in another table is "medium hard" work for a database; fetching everything from one table is "easy work".
Since the large fields have a tendency to be longer and bigger than the length of data read, you'll often find them in separate tables (and sometimes even separate tablespaces), to keep the efficiency of the "general select" as high as possible, since you most often don't need to select the blob along with it.
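A rough sketch of that split, with made-up table and column names: the small fixed-length columns that most selects need stay together, and the big text column gets its own table keyed by the same id:
Code:
-- Small, frequently selected columns together...
CREATE TABLE articles (
    article_id  INT PRIMARY KEY,
    author      VARCHAR(100),
    published   DATE
);

-- ...and the large, rarely needed body in its own table,
-- read only when it is actually asked for.
CREATE TABLE article_bodies (
    article_id  INT PRIMARY KEY REFERENCES articles (article_id),
    body        TEXT
);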

The reasoning behind putting the large objects in another tablespace is that you can configure that tablespace (and nowadays this is also possible on tables in Oracle, if I'm not mistaken) with a specific block size, so as to make the reads there more efficient for large objects (where that may not be the case for small objects).

I generally use PostgreSQL, where the number of tables you can join in a select (nested queries) is unlimited; this used to be limited to 1 for MySQL, and the most recent limit I know of is 5.

Read a bit about normalization; not all its rules (the 5th normal form is especially known for this) are best for performance, but then again: the world is not only about performance. If you apply the 1st through 4th normal forms, you generally have an efficient way to store your data in a good way. Here's the Wikipedia article:
http://en.wikipedia.org/wiki/Database_normalization
 
1 member found this post helpful.
Old 04-14-2011, 05:22 AM   #10
cin_
Member
 
Registered: Dec 2010
Posts: 264

Original Poster
Rep: Reputation: 23
Ramurd... Thanks for the explanation. This JOINs business seems to be a recurring topic of note in this thread; am I sacrificing my own ideas and ideals, as they pertain to my database organising, because I am choosing to use SQL with a previously created database management system?

Can I just use the management system to create the database and replace the SQL with Perl scripts and C programs? Do I have such raw access to the database's tables, or do I have to access them using SQL?
Can I just make the database tables myself with a spreadsheet program? Originally the information was going to be multidimensional arrays loaded with delimited data, until I discovered that this sort of input is what inspired the database'rs to create their databases...

What benefits do SQL and management systems like Postgres and MySQL offer?

Last edited by cin_; 04-14-2011 at 05:31 AM. Reason: gramm'err
 
0 members found this post helpful.
Old 04-14-2011, 07:59 AM   #11
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 541

Rep: Reputation: 74
There is no simple answer to what seems a simple question, even if I understand your question right; so I will make an attempt at an answer. For that I will begin where you began:

Quote:
I have a need of a database, and the information to go into it. I know how I want to organise my data for optimal cross referencing, but what I know is merely what I know and I do not know if what I know is what is known...

Madness to say... looking at my database I see this is the best way to organise the data, but what I do not know is how the computer will respond to my methodical madness.

Is it quicker to have one table with many fields or many tables with few fields?

We've got information for 10 users with 17 idiosyncrasies.

So, would one table with 17 columns and 10 rows...
or...
a group of 50 tables with one column and 3 rows and one table with 4 columns and 5 rows...
Be more efficient concerning:
We all went by the assumption that you needed a (relational) database and that you wanted optimal cross referencing; you asked whether it matters to a database whether the data is split over many tables or kept in a single table; the answers were given based on those questions.
1) For a database it does matter; even for normal file access, all reads are essentially done in a certain block size. The ideal is that all of the data that is read is also useful (since a read operation is quite costly).
2) I am pretty certain that, with the given constraints, a spreadsheet is not the best solution.
3) Coding stuff yourself is always possible, but whether or not this will be "better" I just wonder; databases come with some extras you'll most likely never use and do consume system resources, though you can minimize that.

Anyway: try to see it from a practical, real-world example: if I write an article that reads like this:
Code:
take book 7, page 34, line 6, word 4
take book 3, page 127, line 4, word 12
take book 46, page 337, line 346, word 9
take book 31, page 16, line 7, word 12
take book 778, page 4, line 19, word 6

The first result: Mary
The second: had
the third: a
the fourth: little
the fifth: lamb
you'll never read my "article"; that much cross referencing makes your "read operations" very inefficient: you must read where you can read what you need to read, keep that in mind, and read elsewhere to read what is to be read to read what is needed and added to the knowledge of what was read (follow me?). Space-wise this might actually save some space, but efficiency-wise there are too many read operations needed to get a single "piece" of data that is to be turned into information (context).

The same goes for your database; you want to read as much data as possible in a single read operation ("Mary had a little lamb" takes way less time to read than all that cross referencing).

Then again:
if you have a table "books" with the fields (isbn, author, num_pages, title, book_contents), where isbn is the primary key,
then just to find "who" wrote books that had "strange" in the title, you don't have to read the entire book contents; that is why such objects are usually stored in another table. In this example that would make 2 tables:
- books, with fields isbn, author, num_pages and title
- books_contents, with fields isbn and book_contents (where isbn is the primary key but also a foreign key, referencing the isbn on table books)

If you want to read the contents of the books you found that had "strange" in the title, you can join those two tables, as they have the isbn in common. Databases are good at organizing your data and retrieving it efficiently, even when multiple persons are querying the thing at once (locking and such).
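A sketch of that query, assuming the two tables above share isbn as their key:
Code:
-- Find the contents of every book with "strange" in the title;
-- books_contents is only read for the rows that match.
SELECT b.author, b.title, c.book_contents
FROM books b
JOIN books_contents c ON c.isbn = b.isbn
WHERE b.title LIKE '%strange%';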

SQL stands for Structured Query Language; there are standards for it, so finding data in one DBMS goes pretty much the same way as in another DBMS; that's mostly a practical thing. A DBMS stores your data, retrieves it and keeps it consistent, and offers transactions for large updates, so that other processes will not get in the way of your update and, if something goes wrong during your update, you can roll the transaction back; plus many more such mechanisms to keep your data in a state you can always account for.
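For instance, a minimal sketch of a transaction (table and column names are hypothetical; the statements below work in PostgreSQL and in MySQL with InnoDB):
Code:
BEGIN;
UPDATE users SET attrib_03 = 'new value' WHERE userid = 42;
-- If something went wrong, undo everything since BEGIN:
ROLLBACK;
-- Or, if all went well, make the change permanent instead:
-- COMMIT;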

Quote:
We've got information for 10 users with 17 idiosyncrasies.
I translated this to 10 users with 17 fields each, as I have not the faintest idea what idiosyncrasies are ;-)
 
Old 04-14-2011, 08:40 AM   #12
dugan
Senior Member
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 4,562

Rep: Reputation: 1394
Quote:
Originally Posted by cin_
Does the computer load the table (or, in an example of many tables, every table) with each query, or is every table always loaded into the management system?
The whole table is never loaded into memory. When you do a query, the database management system loads only the data that you ask for.
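If you want to see what a query will actually touch, both MySQL and PostgreSQL can show you the plan without running the query (table and column names here are hypothetical):
Code:
-- Shows whether the engine will use an index or scan the whole table.
EXPLAIN SELECT attrib_03
FROM users
WHERE userid = 42;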

For deciding which fields to put into which table, here's a book recommendation:

http://www.amazon.com/dp/1590597699

Last edited by dugan; 04-14-2011 at 08:41 AM.
 
Old 04-14-2011, 02:03 PM   #13
cin_
Member
 
Registered: Dec 2010
Posts: 264

Original Poster
Rep: Reputation: 23
If the whole table is not loaded, then I wonder how the database management system cross references its material.
I'm asking theoretically... I guess MySQL and Postgres are both open source and I could trawl the source for answers.

... but strictly from a banana standpoint: if I had 17 bananas laid out on a table in a block 4 columns by 4 rows with a single banana appended somewhere, and if I wanted to eat the banana at location [3][3], then I would have to move down the titles of 3 rows and then across the titles of three columns and take whatever is there. So that means I had to think about at least 6 titles before I found the banana I wanted. But databases do not work like this because the rows have no `titles`; instead you may use the first column as a pseudo title, so there seems to be no single field referencing. Instead...
Code:
USE tallyme;

SELECT third_column
FROM bananas
WHERE pseudo_title = 'third_row';
... so if I understand it right, SELECT jumps to a column I tell it to, but it must reference that somehow. I mean, how does the database know it is the third column, or the column with the title third_column? By comparing the titles before it? All other titles? Reference points?
Now the real killer looks like this WHERE statement. It seems to me that the database doesn't allow for field-specific inquiry but relies on the WHERE to compare the entire contents of a column to find the material you are looking for and then draw out the applicable fields, which, through enough specification, could be a single field?

A WHERE looks like it would require you to look at `number_of_columns_before_desired_column` plus `number_of_elements_in_column`, or for the bananas this would be 3 titles and 4 bananas. But outside of bananas this would be a ridiculous number with scads of digits. And to look at every vertical element to find one element I know I want? That shit is bananas.



... at this point it just needs to be done and played with. But for the sake of provocation and conversation...

Last edited by cin_; 04-14-2011 at 02:08 PM. Reason: gramm'err
 
0 members found this post helpful.
Old 04-14-2011, 02:10 PM   #14
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,964
Blog Entries: 11

Rep: Reputation: 865Reputation: 865Reputation: 865Reputation: 865Reputation: 865Reputation: 865Reputation: 865
An RDBMS has as much in common w/ your array approach to data as
a bicycle has w/ a Mack truck. I highly recommend reading up on
relational databases and their data models (as was repeatedly
suggested before).


Cheers,
Tink
 
Old 04-14-2011, 02:16 PM   #15
dugan
Senior Member
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 4,562

Rep: Reputation: 1394
Quote:
Originally Posted by cin_
If the whole table is not loaded, then I wonder how the database management system cross references its material.
The answer is that primary keys are indexed.

http://en.wikipedia.org/wiki/Database_index
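Primary keys are indexed automatically; for a non-key column you search on often, you can add an index yourself, e.g. (table and column names are hypothetical):
Code:
CREATE INDEX idx_users_attrib_03 ON users (attrib_03);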

Now, please think carefully about what Tinkster said. You've already been pointed at many resources for learning the database concepts that you need to understand. To not look at them is annoying and rude.

Last edited by dugan; 04-14-2011 at 02:36 PM.
 
  


Tags
database, efficiency, mysql, postgresql, speed


