LinuxQuestions.org - Database design

- Programming (https://www.linuxquestions.org/questions/programming-9/)

- - Database design (https://www.linuxquestions.org/questions/programming-9/database-design-478391/)

I'm busy with a database. The main table contains a number of fields that each retrieve info from another table. Something like categories, symptoms etc. The user can pick one of the symptoms and one of the categories etc.

Code:

main

PK

symptom

category

area



symptom

PK

symptom_txt



category

PK

category_txt



area

PK

area_txt

I currently store the primary keys in the fields of the main table instead of the (selected) text values.
The alternative is to store the text values.

Disadvantage of the current approach is the fact that outdated records in i.e. symptoms need to be kept to be able to retrieve the text values at later stages and that queries need joins to retrieve the text values which might have a speed impact (but maybe retrieving records with lots of text is also slow). Also the user should not be allowed to change the text in i.e. a sympton record as this influences old records. Advantage obviously is the size.

What are the guidelines to define the better approach.

Does your db support clustered tables? Storing keys in a table has problems down the road. Use the actual text. If you need performance find a way to cluster the tables -

this means that the values in the parent table are actually physically part of the child table as well. If you have a bunch of children, then create indexes, and again use real values from columns, not a constructed key.

Are you setting constraints to prevent widows?

Hello Wim,

Is this a simplified diagram? I ask because, judging only on what is provided here, it does not seem as though symptom, category, and area should be separate tables. Each table in a database should be an entity in the system you are modelling, and each field in the table should be an attribute of the entity. If the only fields in the other tables are the PK (which seems to be a surrogate key, at that) and the one other field, then perhaps these others are really attributes of whatever entity main represents (a disease, syndrome, disorder, or whatever).

Database design is part science, part art; but it relies on a thorough understanding of the system being modelled. Without that understanding, I can only offer suggestions. If you want to provide more details, I would be more than happy to offer further ideas.

To answer your question directly--if the other three tables should be separate tables, then you probably have the correct design. Storing symptom_txt only in symptom and not also in main, for example, avoids data redundancy. This not only saves space, as you mention, but also prevents a number of data anomalies. While the performance impact of JOINs is something to keep in mind, most RDBMSs are quite efficient at JOINs on indexed columns; a database with properly normalized tables is going to need them.

I think there is more going here than meets the eye, however, and would be more than happy to provide additional input if you were to give more details.

I think your current approach (of storing the foreign keys instead of the description of the selected key) is correct. It is also classically relational. Consider an "symptom" that has to be rephrased. If you change the text of the symptom in one place, it will have an effect in all data records.

Retrival may b a little slow, but that should b OK. Most databases have a query execution plan that will show the number of selects etc. Proper indexing is the key.

You can handle outdated records (symptom masters) by marking these masters as deleted beyond a certain date (for adding new data into the main table and permitting updates).

For easy reporting, you can createa view that has the text of the selected key also. (Mind you, retrival will be run time and still a liitle slow).

End

@Jim mcnamara
I don't know if the database supports clustered tables. I'm running MySQL 4.0.23a at the moment although the final system will be a 4.1 flavour. I will read up on clustering.
I use MyISAM tables (without a specific reason), so no constraints.
What are widows? Records that have a foreign key set, but the record in another table with the related primary key is gone? In that case, no. (as I'm using MyISAM tables). However the application takes care of it when necessary. Although this is not 100% foolproof, maintenance mode allows the user to fix these.

@zaichik
Sorry that the description was not clear. I've described 4 tables in the opening post so I think that we are on the same track there.

Can you please tell me if I should index the symptom, area and category fields in the main table for performance? As they are primary keys in the 'help-tables', they are already indexed.

@AnanthaP
Although it's true that there might be a need to rephrase a symptom, I do not allow the user to do so. I desperately try to prevent that, when an user changes category_txt from 'power failure' to 'I had to go to the toilet', this change will affect the existing records that are entered with 'power failure' as it does not make sense in the records. So once e.g a symptom is defined, it can not be changed anymore. Maybe I will make it a bit more luxury and check if it's used before blocking modification. But that might cause problems in concurrent use.

A bit of info:
I'm designing an incident logging system to replace an existing one. One of the limitations in the existing one is that my employer has to call in the (external) software developer to modify the system when something trivial like a new category has to be added. This comes at a cost, can not be done immediately, etc etc.
I'm a strong believer in flexible solutions with a minimum of limitations for the user.

The system will log incidents in a broadcast network.

Code:

srvc1 -+---> tx1 ---> subscribers(a,b,c)

      |

      +---> tx2 -+-> subscribers(d,e)

                  |

                  +---> tx3 ---> subscribers(f,g,h)

                  |

                  +---> tx4 ---> subscribers(x,y,z)



srvc2 -+---> tx2 -----> subscribers(d,e)

      |

      +---> tx3 -----> subscribers(f,g,h)

The network topology can look like shown above (although the real one is more complex and includes satellite, terrestrial and mobile TV in a mix of old-fashioned anlog and digital).

When an incident is logged, the user selects a number of things like category, symptom, area of loss etc etc. He/she also selects one or more services (srvc) and transmitters (tx). This is the tricky part as described by the following situations:
a)
Simple:
If the user selects srvc1/tx3, this will be recorded in the database.
b)
More complex:
If the user however selects srvc1/tx2, srvc1/tx2 and srvc1/tx3 and srvc1/tx4 will be recorded for the same incident as tx3 and tx4 receive signals from tx2.
c)
It is also possible that due to a power failure, tx2 is down. In that situation srvc1/tx2 and srvc2/tx2 need to be recorded for the incident as well as srvc1/tx3 and srvc1/tx4 when the user selects tx2 only (without srvc).

So the database basically stores the complete network topology as well including all dependencies.

The next issue is that although srvc1/tx3 might be down as a result of srvc1/tx2 being down, it will not affect subscribers(f,g,h) as the service was not provided to them at that time (timetable). In that case srvc1/tx2 and srvc1/tx4 will be recorded for the incident but srvc1/tx3 will not be recorded.