 Welcome to the most active Linux Forum on the web.

Notices

Rate this Entry

# Building expressions lists - 1

Posted 05-29-2019 at 02:46 PM by dogpatch
Updated 06-19-2019 at 02:39 PM by dogpatch

Version 3 of my LQ Challenge program (LQpz_3.c) was a brute force approach, generating Reverse Polish Notation ('rpn') expressions of all conceivable combinations of 4 digits with arithmetic operations + - * / ^ and _ ( the last being for leading negative sign in front of any of the 4 digits). The RPN expression thus generated was evaluated using fractional-exponential integer math routines ('fx') and, if the value was within the target range of 0 to 100, the rpn was converted to an algebraic expression. Selections of such solutions were stored and displayed.

In early March I began coding version 4 (LQpz_4.c), which quickly became my method for creating a list of algebraic expressions using 4 variables 'a' 'b' 'c' and 'd'. The goal: generate a list of algebraic expressions representing a minimum set that would produce all 46980 solutions in concord with version 3 and with the results found by fellow LQ programmers Beryllos and astrogeek.

Still using the brute force approach of generating rpn expressions, my new program generated a total of 14,099,785 equations in an exhaustive run for all 715 4-digit combinations between '0000' and '9999'. As many of these equations were of identical 'abcd' form, I added logic to store each 'abcd' expression as it was generated into a sorted array, eliminating duplicates. This resulted in a 'generous' list of 5594 expressions, including leading negatives.

At this point, it became clear that a more intelligent sorting logic would help greatly, so I employed a fractional ranking system. I had already created the header file (fx.h) for fractional-exponential math which included this structure:
Code:
```struct frac
{
IntSize num;
IntSize den;
};```
I now created an expression-sorting header file (exp.h) which stored the expressions in a structure thus:
Code:
```struct altst
{
char exp[ExpSize];
struct frac jfrac;
short altcnt;
short ndx;
} expj[maxj], expm[maxj], *xpj;```
(Note that IntSize = long, and ExpSize = 24, but in examples below, the expression length was never more than 16 bytes, and the fractional components never greater than type short)

Alternate sort function for this array, called by both qsort() and bsearch():
Code:
```int alt_comp_excnt(const void *pex1, const void *pex2) {
const struct altst *ex;
char c0, c1, c2, val, ord[] = "abc+-*/^()";
ex = pex1;
ex = pex2;
if (ex->jfrac.den != ex->jfrac.den)	// shared expressions (1 = required)
return (ex->jfrac.den - ex->jfrac.den);
if (ex->jfrac.num != ex->jfrac.num)	// uses at above level
return (ex->jfrac.num - ex->jfrac.num);
if (ex->altcnt != ex->altcnt)		// overall uses
return(ex->altcnt - ex->altcnt);
if (strlen(ex->exp) != strlen(ex->exp))	// compact expression
return (strlen(ex->exp) - strlen(ex->exp));
for (c0 = 0; c0 < strlen(ex->exp); c0++) {	// order by above 'alphabet'
for (c1 = 0; c1 < 2; c1++) {
val[c1] = 0;
for (c2 = 0; c2 < strlen(ord); c2 ++ ) {
if (ex[c1]->exp[c0] == ord[c2]) {
val[c1] = c2;
c2 = strlen(ord);
}}}
if (val != val)
return(val - val);
}
return(strcmp(ex->exp,ex->exp));		// ASCII order (never used?)
}```
As the above code indicates, the array is sorted primarily on the denominator half of the fractional expression. This number represents the lowest number of expressions that share a particular solution. So, if 5 expressions are used to solve a given value, all 5 expressions will contain 5 as the denominator value. If only one expression is used to solve a given target value, it will contain 1 here, which means it is a required expression for that solution. The second sort field is the numerator, which indicates how many times the expression is used at the shared level. (In the exhaustive run, if an expression achieves a smaller shared [denominator] value, the numerator is reset to 1.) If the shared (denominator) is 1, the numerator is how many times the expression is required for a solution. The next sort value is the overall number of times the expression is used in an exhaustive run. These 3 sort values have the effect of raising to the top of the array all required expressions, the most required first, etc, and then all expressions sharing one or more solutions with just one other expression, etc. The next sort criteria is simply the string length of the expression, briefer expressions being favored. And the final sort criteria is my own 'alphabetical' sort, favoring concatenation, then simple operations, then more complex operations reading the expression from left to right. This has the effect of favoring what I consider the better looking expressions, and also imposes a more predictable order than otherwise.

The resulting array looked like this (shown without the 4th structure number, the index into the array)
Code:
```ab-c-d          0752/0001 4880
ab+c-d          0656/0001 4873
ab+c*d          0594/0001 3382
ab-cd           0536/0001 3891
a+b*(c+d)       0430/0001 4180
.
.
.
ab^(c+d)        0001/0003 0146
-ab^(c+d)       0001/0003 0056
-a^-b+cd        0533/0004 2400
ab+c^-d         0520/0004 2358
.
.
.```
So, for example, the expression "-a^-b+cd" is employed 2400 times in an exhaustive run. It shares a given solution with just 3 other expressions (denominator=4), and does so 533 times.

Where two or more expressions rank identically for the first 3 sort criteria, the briefer and 'better-looking' expression will rise above its peers. Generally, such a condition indicates that those two or more expressions share the exact same solution set. E.g:
Code:
```.
.
.
(a+b)*c-d       0372/0004 04021
a*(b+c)-d       0372/0004 04021
((a+b)*c)-d     0372/0004 04021
(a*(b+c))-d     0372/0004 04021
.
.
.```
It's easy to see how these 4 expressions share the same solution set; they are algebraically equivalent. (Remember, digit permutation of the starting digits means that the variables 'a' thru 'd' are interchangeable.) Any time one expression produces a valid solution, all 4 will produce the same valid solution. So they all share a common shared denominator value of 4, all share the same solution with the other 3 exactly 372 times, and all are employed a total of 4021 times in the exhaustive run. Therefore, in building a minmimal expression list, all but the first one can be eliminated; the other 3 will never be needed to produce the full 46980 solutions.

But other groups of expressions are not so clearly identified. E.g:
Code:
```.
.
.
(a/b-c)^-d      0001/0005 0702
(-a-b/-c)^-d    0001/0005 0702
(-a/-b-c)^-d    0001/0005 0702
(-a-(-b/c))^-d  0001/0005 0702
.
.
.```
These 4 expressions at first glance appear to share the same solution set; they share a solution with 4 other expressions exactly once, and are all used exactly 702 times. But it's difficult to prove that they are algebraic equivalents, much less prove so programmatically. Moreover, there are only 4 expressions here, yet the shared (denominator) is 5. There must be one other expression somewhere in the list that was one of the 5 shared expressions. Where is it? That missing expression is clearly not equivalent with the 4 shown. Which expression, if any, may be eliminated from the minimum list? It's not clear.

This is where I was as of March 15 2019.
Posted in Uncategorized

 My LQ Syndicate Latest Threads LQ News Twitter: @linuxquestions Facebook: linuxquestions Google+: linuxquestions