# New Years Data To DB - Arithmetic, Graphs and Relations

Posted 03-14-2019 at 04:58 PM by astrogeek

Updated 03-15-2019 at 12:14 PM by astrogeek (Forgot aggregate column)

Updated 03-15-2019 at 12:14 PM by astrogeek (Forgot aggregate column)

**... Or, How To Get From Arithmetic to Relational Algebra and Enjoy The Trip!**

Quote:

What follows is a tour of the landscape as I understand it, pointing out the major landmarks and points of interest without stopping to look very closely at any of them - a whirlwind tour of sorts.

To get started, recall that the original problem posed by the newyears puzzle was to find as many integers between 0 and 100 which could be produced by the basic operations of arithmetic acting on the four digits of the year. The expressions we use for that purpose enter the domain of elementary algebra as soon as we begin to substitute symbols, or

*variables*, for the numbers. Most of our programming efforts are concerned with solving those symbolic expressions using suitably defined variables within the algebra of some programming language or other.

Obviously, our efforts to produce new expressions which produce new solutions using that model have been very productive, and continue!

But now we are beginning to ask questions about the

*set of solutions*we have found, and the

*sets of expressions*which produce them, and we are in many ways no longer in the domain of arithmetic or elementary algebra in asking them! We need to use another algebra - an algebra of the domain of the questions, to find the answers we seek.

We usually think of

*algebra*only as the

*elementary algebra*we are taught in school, or the

*modern algebra*(also called abstract algebra) we would encounter in advanced courses of study. But there are as many algebras as there are ways of thinking about any given subject!

Quote:

In its most general form, algebra is the study of mathematical symbols and the rules for manipulating these symbols.

*sets*of things, we need to use the tools of sets, the symbols and notation of sets, and one or more of the

*algebras of sets*, to find those answers. So our problem leads naturally into the domain of sets (naturally, as opposed to via some contrived or obscure path).

Once we begin to learn how our questions may be

*modeled*in the language of sets, we quickly discover that many of them are actually variations of a well known class of problems known as

*covering set problems*(the term I learned long ago), or commonly, cover set problems. And immediately we enter the domain of graph theory!

While I have worked with cover sets in the past, the term Hitting set was new to me, but this short description of the problem hits the target:

Quote:

Set covering is equivalent to the hitting set problem. That is seen by observing that an instance of set covering can be viewed as an arbitrary bipartite graph, with sets represented by vertices on the left, the universe represented by vertices on the right, and edges representing the inclusion of elements in sets. The task is then to find a minimum cardinality subset of left-vertices which covers all of the right-vertices. In the Hitting set problem, the objective is to cover the left-vertices using a minimum subset of the right vertices. Converting from one problem to the other is therefore achieved by interchanging the two sets of vertices.

*"minimal set"*problems, and a pool of well explored ideas and algorithms which we may draw upon to help understand and solve them! All we really need to learn is the notation or symbols used, and the rules for manipulating them - an

*algebra of the domain*, in other words.

As we learn the new algebra, which itself is unfamiliar to most of us, but not actually difficult to understand, we have in back of our minds another question, "How do I express this in a programming language? How do I do this on a computer?"

Which brings us to the

*relational database*.

A

*relation*is a specific concept from set theory, a specific type of set with specific properties, and

__the relational model for databases__is a way of organizing arbitrary information, or data into

*relations*. Set theory also provides us with

*relational operators*for manipulating relations - a

*relational algebra*...

The relational algebra is an algebra which operates on relations much the same as the arithmetic operators operate on real numbers. Not surprisingly then, a

*relational database management system*, or

**RDBMS**, is at its core a

*set processing machine*which understands the language of the

*relational algebra*, which makes it very useful indeed!

One important concept to grasp, and which shows immediately why this is useful, is that of

*closure*. The set of real numbers is closed under the basic operations of arithmetic, for example. What this means is that the result of applying one or more of those operations to a member of the set of real numbers, is another real number.

Similarly,

__the set of relations is closed under the operations of the relational algebra__- the result is always another relation.

*Relations*go in,

*relations*come out. The all important consequence of this is...

Quote:

...The fact that the output from any given relational operation is another relation is referred to as the relational

**closure**property. ... Closure means we can write**nested relational expressions**- i.e. relational expressions in which the operands are themselves represented by relational expressions in turn, of potentially arbitrary complexity. (There is an obvious analogy between the ability to nest relational expressions in the relational algebra and the ability to nest arithmetic expressions in ordinary arithmetic; indeed, the fact that relations are closed under the algebra is important for exactly the same kinds of reasons that the fact that numbers are closed under ordinary arithmetic is important.)*An Introduction To Database Systems, 7th Ed, C. J. Date, pp 152*But the current relational technology is really very good at what it does! When we realize that it actually is a

*set processing machine*which implements and understands the relational algebra, and use it as such, we obtain

*great*benefits for our trouble!

So that, in a nutshell, is how we get from a newyears puzzle to a relational database, with visits to set theory and graph theory along the way - and why all of that is fun to know, and may be useful to us!

As a quick example of how we may use all of this to find minimal sets of expressions (in my data model) which produce all solutions, here is a slightly simplified template algorithm in pseudo SQL which does that for us with repeated application (the while block is a wrapper, not part of SQL)...

Code:

while {The set of all solutions} is not empty { SELECT expression FROM {The set of all expressions} JOIN {The set of all solutions} USING {Product terms} WHERE {Minimizing criteria} GROUP BY expression ORDER BY {Optimal ordering criteria} LIMIT 1 DELETE SetB FROM {The set of all solutions} AS SetA JOIN {Our selected expressions set} USINGexpressionJOIN {The set of all solutions} AS SetB USINGresult}

*product*operator, producing one kind of product of sets.

SELECT...WHERE is the relational

*restrict*operator, restricting the result to some subset.

GROUP BY aggregate column or columns, expression identifier for this example.

ORDER BY imposes some order on the result set which is an unordered set by definition. Here we use it to place our single desired expression at top of the list which we then LIMIT to that one expression.

You can see, I hope, that this focuses our efforts on finding better Minimizing criteria and Optimal ordering criteria, each ultimately expressed as some property of the input or output sets. All variations of this algorithm are forms of the basic greedy algorithm which produces only approximate solutions, but does so in polynomial time - proportional to the size of the sets. All things considered, it works pretty well!

But if we want an exact solution, there is a "

*covering algebra*" specifically defined for use in finding exact minimal covering sets... if we have the time. But that is a topic for another day!

As a final note, I found this concise but understandable paper on covering sets online from MIT, which you may find useful: Cover set PDF

I hope this overview is at least of some use to you in understanding the

*why*, if not the

*how*of this very interesting subset of the mathematics related to our little puzzle!

Total Comments 0