LinuxQuestions.org
Review your favorite Linux distribution.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Reply
 
Search this Thread
Old 04-14-2011, 05:23 PM   #16
cin_
Member
 
Registered: Dec 2010
Posts: 266

Original Poster
Rep: Reputation: 23

As for the material that was suggested to me I did read the superficial research that was suggested to me, and that too did not answer my question. If you read it and did find my answer please quothe me the raven.
As you may well know I did not read the book.
As far as books are concerned a book titled `Pratical Database Management System Theory: QUERIES, JOINS, and the code behind them` or `Databases In The Raw: where to find, and how to read the lowest levels of database storage`, this one could be strictally Postegres or My specific, are more appropriate to my questions.

Read those links yourself as well. They say nothing about how the data is read. Even if someone had linked to a similar page for SQL, I find no information on how the WHERE actually works, only what it returns under what circumstances.
I thought it would be rude to go through the entire linked information describing how it does not fulfill any of the requirements of the questions. If anything they support the fact that I am asking the questions:
Quote:
SQL, initially pushed as the standard language for relational databases, deviates from the relational model in several places.
...from Relational Model linked by Database Normalisation after the normalisation page showed no answer to the information needed I moved onto delving in the links present in the article. Down the rabbit hole with no Wonderland.

Tinkster... I am interested in your analogy of the bicycle and the Mack truck.
Is that to say that the Mack truck can carry far more for longer distances with less strain on the person operating it?
Or was it that the Mack truck is a pestilent spewing machina that powers itself on the combustion of highly pressurised remnants of, theoretically: atmospherically due astronomically; killed krill?
Or were you trying to say something completely different?
Wait... Which was the bicycle? Which was the Mack truck?


I think just having access to the basest bit of the database would be best for my own research. Can I look up tables in some human readable fashion other than with some SQL`ing? Say a text editor? How about non human readable?


...http://en.wikipedia.org/wiki/Relatio...lational_model
...http://en.wikipedia.org/wiki/Database_normalization

Last edited by cin_; 04-19-2011 at 08:08 PM. Reason: gramm'err
 
0 members found this post helpful.
Old 04-14-2011, 05:40 PM   #17
cin_
Member
 
Registered: Dec 2010
Posts: 266

Original Poster
Rep: Reputation: 23
dugan... check it out. From the Database Index you linked:
Quote:
Indices are useful for many applications but come with some limitations. Consider the following SQL statement: SELECT first_name FROM people WHERE last_name = 'Smith';. To process this statement without an index the database software must look at the last_name column on every row in the table (this is known as a full table scan). With an index the database simply follows the B-tree data structure until the Smith entry has been found; this is much less computationally expensive than a full table scan.
Precisely my worry, my complaint, and my question. That WHERE statement is silly. Why continue to use something that seems so inefficient? Sure indexing can place limits on the full desires of the WHERE call, but that just sounds like a double READ, both index and table, which was suggested against early in these posts:
Quote:
Same goes for your database; You want to read as much data in a single read operation ("Mary had a little lamb" takes way less time to read than all that cross referencing);
-Ramurd
And you yourself said:
Quote:
Joins are slow operations.
-dugan
So why would you want to have an inefficient WHERE statement where the solution is to index taking up space and I'm guessing that to reference the index is through some form of JOIN, and futhermore the actual solution breaks down in cases of even more interest. Using LIKEs requires using none of the index`d material.

...http://en.wikipedia.org/wiki/Database_index

Last edited by cin_; 04-14-2011 at 05:55 PM. Reason: gramm'err
 
Old 04-14-2011, 07:21 PM   #18
graemef
Senior Member
 
Registered: Nov 2005
Location: Hanoi
Distribution: Fedora 13, Ubuntu 10.04
Posts: 2,379

Rep: Reputation: 148Reputation: 148
If I understand you correctly you want to know how a relational database engine works, not how to use SQL.

I would say that to understand the former you need to understand the latter, however as far as I an aware the SQL standard doesn't prescribe how the database engine will store the data, so one engine may store the data in compressed formate another may have it in plain text, but for it to be SQL compliant then it must return the data after the call consistently.

It is important to realise that relational databases follow a (mathematical) set notation. That is data is either in a set or not, but a set doesn't prescribe any sense of order so looking for a piece of data at a particular location holds no meaning.

Whilst completely superfluous to a schema and its tables the index is an important concept. You can issue an SQL command on a particular column looking for some data and the SQL engine will look for the rows that match your request. If you don't have an index file on the column in question then it will be necessary to look at every row in turn in the table, for a large database this can become very slow. If the column is indexed then all it needs to do is read the index file (which is ordered) and find what matches the query. This can substantially increase the speed of look up whilst using less memory, at the expense of using more disk storage.
 
1 members found this post helpful.
Old 04-14-2011, 08:48 PM   #19
cin_
Member
 
Registered: Dec 2010
Posts: 266

Original Poster
Rep: Reputation: 23
graemef... If you care to I'd like to hear an elaboration of this paragragh:
Quote:
It is important to realise that relational databases follow a (mathematical) set notation. That is data is either in a set or not, but a set doesn't prescribe any sense of order so looking for a piece of data at a particular location holds no meaning.
... this is the kind of thing I am looking for. What requirements would I need to make my own database?

I have been reading online and see that thinking like the thinking from the previous posts is not uncommon.
No one is an expert, but that's why its interesting to hear everyone's expert opinion.



...http://lwn.net/Articles/328487/
...http://us.pycon.org/2009/conference/schedule/event/64/
...http://labs.google.com/papers/bigtable.html

Last edited by cin_; 04-14-2011 at 08:52 PM. Reason: gramm'err
 
Old 04-15-2011, 12:48 AM   #20
graemef
Senior Member
 
Registered: Nov 2005
Location: Hanoi
Distribution: Fedora 13, Ubuntu 10.04
Posts: 2,379

Rep: Reputation: 148Reputation: 148
When SQL is used to query a table the resultant set has no order and so one implementation might return the rows in one order another implementation might return it in another order. SQL does support the keyword "ORDER BY" which then sorts the result set giving some guarantee to the order for distinct values.

A relational database and the SQL language is a very general (and good) tool. The alternatives that say that they are better than a relational database are either addressing a different problem or tackle a very specific problem. If you want to write your own database my first warning is that you are highly unlikely to end up with a tool that is as fast, flexible or secure as a modern relational database (RDB). However there are reasons for doing it, possible to be able to tackle a very specific situation, or maybe just for the challenge.

Here are some initial questions:
  • Will you be able to hold all of the data in memory, or will you have to resort to switching data in and out of file storage?
  • What is the perceived usage when it comes to query (read), insert (create), update and delete the data?
  • How many concurrent users might there be?
The answer to each of these would form the basis of any design. For example LDAP is designed for many reads but few updates so the insert, update and delete functions are supported but they are slower than the highly tunes read operations. A single user database is significantly easier to create because you don't need to worry and locks but maybe it will be mutli-user read single user update, the result can lead to a design that would be more efficient than a general RDB like MySQL.
 
Old 04-15-2011, 03:26 AM   #21
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 555

Rep: Reputation: 75
There is a difference between a full table scan with string compares and an index; The analogy about reading the book is of course not 100% analogue to how humans read and do things; What I was referring to is the scanning many times hence and forth.

What you must realize is that a string compare is a hard task for a computer, in this sense that computers can only calculate. So to go a bit into detail: a string is an array of characters, which are technically speaking just 8bit integers (or 1 byte); so in order to see if strings are equal, a computer has to compare those in a loop:

for each letter in the string:
check if letter equals the letter in the same location on the other string

(there are shortcuts, which are taken; not going into that detail, as it will dive too deep at this point)
For "string equals other string" the shortcuts are fairly easy and fast enough, but "string contains ..." this is much harder; there are far fewer shortcuts

Since computers are not particulary good at searching in text, but are very good at processing numbers, binary trees were brought forth;

Now, what happens -generally- when you perform a SQL query:
Code:
SELECT field1, field2 FROM table1 WHERE field3 = value
If there is an index, the index is searched: this will lead to the numbers of the records which are to be presented; The whole record is read, but only the fields of that record will be displayed. In this read operation, an amount of data equal to n times the blocksize is read (n being an arbitrary number); This makes that a table scan can be done fairly fast: each read operation is going to lead to a great amount of records, without having to read those records themselves. The index-read leads to a certain amount of records, and this can speed up things, as the database knows how big each record is and can thus jump to those records; which is way faster than a complete read operation.

Indexes are not always going to speed up your operations; on small tables, they actually make things slower.

Also, indexes make write operations slower at all times; they're good for retrieval operations.

In analogy to bookreading: In books you also have indexes; I for one, never read the index of a very small book; I can browse to the required data faster if I have some idea what the book is about; However, as a human I can read text fast, which a computer cannot (all in relativity); However, in a big book (take a cookbook with thousands of recipes, and I want to find the recipe for goulash I will take the index and see how far I can skip ahead to find that one; It speeds up my search operation.
However, if there's a small cookbook "5 ways to boil an egg", I'm not going to use the index to find the recipe for a hard-boiled egg. (usually, smart databases will neither; the query optimizer takes care of such things)

As you might notice here, right now: the entire table is never read into memory; it is scanned and parts of the table may be in memory for a while, but they do not "reside" there; what would be in memory would be a temporary copy of the actual data in the database. Because: what happens when there's a powerfailure during the query?

So, what modern RDBMs do, is store the current data to a "before image", perform the query, and when it completes successfully the before image can be destroyed. If there is a powerfailure during the query, the database can always be restored to the original state by performing a roll-back if the before image write has completed; if not: the data is still in the original state; If the powerfail happened during the query, the rollback can place the data back to the original state. The query is lost, but that's not as bad as a database that is in an inconsistent state.

To prevent data-loss, the "after-image" was invented; how the data would look after the data has been committed; Together with the before image, you can do a roll-back and a roll-forward to have the database in a consistent state at all times and not losing the query. Keeping the data in this consistent state has a cost, all those write operations are costly. But you also see that the database itself, and thus the tables are and must be on the disk.

Now for cross referencing:
If you want to get the idiosyncrasies (I hope I spelled that right) of all the persons, you would do "SELECT * FROM idio_table"; and this would probably be a single read operation. However, if you want to split this up in 50 tables the data is most likely spread out, and thus would cost far more (about 50) read operations; And even with that: how would your query look like
SELECT * FROM people_table INNER JOIN idio1_table ON people_table.id = idio1_table.id INNER JOIN idio2_table ON people_table.id = idio2_table.id ... etc

or

SELECT people_table.name FROM people_table,nose_pickers WHERE people_table.id IN nose_pickers.id; ... etc...

it will take you ages to write the right queries, and with that you also have to maintain as many indexes (to make sure you don't get duplicates) as well as alot of cross referencing; your updates will cost you several read/write operations, whereas if you kept it in 1 table it would be a single read/write operation.

I think this is not too hard to grasp; So: yes, you can write your own program that would manage the data in your way, but I think it'd be slower than the most bloated of databases, where that bloated database provides also all the tools to keep your data consistent at all times.
 
Old 04-15-2011, 09:54 PM   #22
cin_
Member
 
Registered: Dec 2010
Posts: 266

Original Poster
Rep: Reputation: 23
Quote:
...you can write your own program that would manage the data in your way, but I think it'd be slower than the most bloated of databases...
-Ramurd
What variables do you predict would cause it to be slower? I can possibly use any predictions to preemptively solve potentialities.

Last edited by cin_; 04-15-2011 at 09:56 PM. Reason: gramm'err
 
Old 04-20-2011, 04:37 AM   #23
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 555

Rep: Reputation: 75
just the fact that you put all the data in different columns makes the reference lookup that is required to look up your data to be slower.
 
  


Reply

Tags
database, efficiency, mysql, postgresql, speed


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Organising partitions chipotphe Linux - Software 21 12-21-2008 03:27 PM
Postfix: How to speed up postmap when using it on a huge database? robel Linux - Software 11 10-26-2008 05:50 AM
Processing data from a 'foreign' database with mysql, or tools to pre-process data. linker3000 Linux - Software 1 08-14-2007 08:36 PM
LQ bookmarks needs organising noranthon LQ Suggestions & Feedback 4 03-10-2007 02:12 AM
Best Photo Organising Software Greebo Linux - Software 1 02-06-2004 05:35 AM


All times are GMT -5. The time now is 11:39 PM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration