# Building expressions lists - 1

Version 3 of my LQ Challenge program (LQpz_3.c) was a brute force approach, generating Reverse Polish Notation ('rpn') expressions of all conceivable combinations of 4 digits with arithmetic operations + - * / ^ and _ ( the last being for leading negative sign in front of any of the 4 digits). The RPN expression thus generated was evaluated using fractional-exponential integer math routines ('fx') and, if the value was within the target range of 0 to 100, the rpn was converted to an algebraic expression. Selections of such solutions were stored and displayed.

In early March I began coding version 4 (LQpz_4.c), which quickly became my method for creating a list of algebraic expressions using 4 variables 'a' 'b' 'c' and 'd'. The goal: generate a list of algebraic expressions representing a minimum set that would produce all 46980 solutions in concord with version 3 and with the results found by fellow LQ programmers Beryllos and astrogeek.

Still using the brute force approach of generating rpn expressions, my new program generated a total of 14,099,785 equations in an exhaustive run for all 715 4-digit combinations between '0000' and '9999'. As many of these equations were of identical 'abcd' form, I added logic to store each 'abcd' expression as it was generated into a sorted array, eliminating duplicates. This resulted in a 'generous' list of 5594 expressions, including leading negatives.

At this point, it became clear that a more intelligent sorting logic would help greatly, so I employed a fractional ranking system. I had already created the header file (fx.h) for fractional-exponential math which included this structure:

I now created an expression-sorting header file (exp.h) which stored the expressions in a structure thus:

(Note that IntSize = long, and ExpSize = 24, but in examples below, the expression length was never more than 16 bytes, and the fractional components never greater than type short)

Alternate sort function for this array, called by both qsort() and bsearch():

As the above code indicates, the array is sorted primarily on the denominator half of the fractional expression. This number represents the lowest number of expressions that share a particular solution. So, if 5 expressions are used to solve a given value, all 5 expressions will contain 5 as the denominator value. If only one expression is used to solve a given target value, it will contain 1 here, which means it is a required expression for that solution. The second sort field is the numerator, which indicates how many times the expression is used at the shared level. (In the exhaustive run, if an expression achieves a smaller shared [denominator] value, the numerator is reset to 1.) If the shared (denominator) is 1, the numerator is how many times the expression is required for a solution. The next sort value is the overall number of times the expression is used in an exhaustive run. These 3 sort values have the effect of raising to the top of the array all required expressions, the most required first, etc, and then all expressions sharing one or more solutions with just one other expression, etc. The next sort criteria is simply the string length of the expression, briefer expressions being favored. And the final sort criteria is my own 'alphabetical' sort, favoring concatenation, then simple operations, then more complex operations reading the expression from left to right. This has the effect of favoring what I consider the better looking expressions, and also imposes a more predictable order than otherwise.

The resulting array looked like this (shown without the 4th structure number, the index into the array)

So, for example, the expression "-a^-b+cd" is employed 2400 times in an exhaustive run. It shares a given solution with just 3 other expressions (denominator=4), and does so 533 times.

Where two or more expressions rank identically for the first 3 sort criteria, the briefer and 'better-looking' expression will rise above its peers. Generally, such a condition indicates that those two or more expressions share the exact same solution set. E.g:

It's easy to see how these 4 expressions share the same solution set; they are algebraically equivalent. (Remember, digit permutation of the starting digits means that the variables 'a' thru 'd' are interchangeable.) Any time one expression produces a valid solution, all 4 will produce the same valid solution. So they all share a common shared denominator value of 4, all share the same solution with the other 3 exactly 372 times, and all are employed a total of 4021 times in the exhaustive run. Therefore, in building a minmimal expression list, all but the first one can be eliminated; the other 3 will never be needed to produce the full 46980 solutions.

But other groups of expressions are not so clearly identified. E.g:

These 4 expressions at first glance appear to share the same solution set; they share a solution with 4 other expressions exactly once, and are all used exactly 702 times. But it's difficult to prove that they are algebraic equivalents, much less prove so programmatically. Moreover, there are only 4 expressions here, yet the shared (denominator) is 5. There must be one other expression somewhere in the list that was one of the 5 shared expressions. Where is it? That missing expression is clearly not equivalent with the 4 shown. Which expression, if any, may be eliminated from the minimum list? It's not clear.

This is where I was as of March 15 2019.

In early March I began coding version 4 (LQpz_4.c), which quickly became my method for creating a list of algebraic expressions using 4 variables 'a' 'b' 'c' and 'd'. The goal: generate a list of algebraic expressions representing a minimum set that would produce all 46980 solutions in concord with version 3 and with the results found by fellow LQ programmers Beryllos and astrogeek.

Still using the brute force approach of generating rpn expressions, my new program generated a total of 14,099,785 equations in an exhaustive run for all 715 4-digit combinations between '0000' and '9999'. As many of these equations were of identical 'abcd' form, I added logic to store each 'abcd' expression as it was generated into a sorted array, eliminating duplicates. This resulted in a 'generous' list of 5594 expressions, including leading negatives.

At this point, it became clear that a more intelligent sorting logic would help greatly, so I employed a fractional ranking system. I had already created the header file (fx.h) for fractional-exponential math which included this structure:

Code:

struct frac { IntSize num; IntSize den; };

Code:

struct altst { char exp[ExpSize]; struct frac jfrac; short altcnt; short ndx; } expj[maxj], expm[maxj], *xpj;

Alternate sort function for this array, called by both qsort() and bsearch():

Code:

int alt_comp_excnt(const void *pex1, const void *pex2) { const struct altst *ex[2]; char c0, c1, c2, val[2], ord[] = "abc+-*/^()"; ex[0] = pex1; ex[1] = pex2; if (ex[0]->jfrac.den != ex[1]->jfrac.den) // shared expressions (1 = required) return (ex[0]->jfrac.den - ex[1]->jfrac.den); if (ex[0]->jfrac.num != ex[1]->jfrac.num) // uses at above level return (ex[1]->jfrac.num - ex[0]->jfrac.num); if (ex[1]->altcnt != ex[0]->altcnt) // overall uses return(ex[1]->altcnt - ex[0]->altcnt); if (strlen(ex[0]->exp) != strlen(ex[1]->exp)) // compact expression return (strlen(ex[0]->exp) - strlen(ex[1]->exp)); for (c0 = 0; c0 < strlen(ex[0]->exp); c0++) { // order by above 'alphabet' for (c1 = 0; c1 < 2; c1++) { val[c1] = 0; for (c2 = 0; c2 < strlen(ord); c2 ++ ) { if (ex[c1]->exp[c0] == ord[c2]) { val[c1] = c2; c2 = strlen(ord); }}} if (val[0] != val[1]) return(val[0] - val[1]); } return(strcmp(ex[0]->exp,ex[1]->exp)); // ASCII order (never used?) }

The resulting array looked like this (shown without the 4th structure number, the index into the array)

Code:

ab-c-d 0752/0001 4880 ab+c-d 0656/0001 4873 ab+c*d 0594/0001 3382 ab-cd 0536/0001 3891 a+b*(c+d) 0430/0001 4180 . . . ab^(c+d) 0001/0003 0146 -ab^(c+d) 0001/0003 0056 -a^-b+cd 0533/0004 2400 ab+c^-d 0520/0004 2358 . . .

Where two or more expressions rank identically for the first 3 sort criteria, the briefer and 'better-looking' expression will rise above its peers. Generally, such a condition indicates that those two or more expressions share the exact same solution set. E.g:

Code:

. . . (a+b)*c-d 0372/0004 04021 a*(b+c)-d 0372/0004 04021 ((a+b)*c)-d 0372/0004 04021 (a*(b+c))-d 0372/0004 04021 . . .

But other groups of expressions are not so clearly identified. E.g:

Code:

. . . (a/b-c)^-d 0001/0005 0702 (-a-b/-c)^-d 0001/0005 0702 (-a/-b-c)^-d 0001/0005 0702 (-a-(-b/c))^-d 0001/0005 0702 . . .

This is where I was as of March 15 2019.

Total Comments 0