Let's first implement tree-less, on-the-fly calculation. The point is to get a better feel for how much of the already processed part of the expression you really need for calculation and decision making.
I guess you can just return pre-calculated results instead of trees.
Quote:
Originally Posted by Sergei Steshenko
Secondly, the tree is a "book" one, but not necessarily the most convenient one. For many practical tasks a tree like this:
Code:
(+, 1, 2, 3, (*, 4, (^, 5, 2)))
is much more convenient.
I just thought that since they are binary operators, they would be better represented as having two operands each.
Quote:
I guess you can just return pre-calculated results instead of trees.
I just thought that since they are binary operators, they would be better represented as having two operands each.
There are two points.
1) In the first approximation you do not need a tree, and trust me, adding tree generation (and, if necessary, eliminating the on-the-fly calculation) is a walk in the park in a correctly implemented on-the-fly calculator.
2) If you think about constant folding, e.g. if you want to convert
Code:
x + y + 3 + 4 + 5 * (z + 7 + 8)
into
Code:
x + y + 7 + 5 * (z + 15)
then the second form is more convenient.
Philosophically/empirically, if you can do something without recursion, do it - your form is more recursive. But again, for general understanding at this stage the tree is premature.
The solution I prefer for this common parsing problem is to use an explicit data stack instead of much of the recursion, and to give each operator a left precedence distinct from its right precedence, allowing no ties (ties are easy to handle, but not allowing them is less confusing).
For example, consider making every left precedence odd and every right precedence even. For ordinary operators, always make the right precedence one more than the left precedence. For right associative operators (such as exponentiation in many languages) make the right precedence one less than the left precedence.
I prefer languages in which right and left precedence can get further apart, such as having an assignment operator with a very high left precedence despite its usual very low right precedence.
Ignoring startup and termination details, the basic work flow is:
You have a stack of pairs of subexpression and following operator. The stack always has at least one such pair.
The main loop does:
A) Read another primary expression and following operator and push onto the stack.
B) Repeat as long as the left precedence of the top operator is less than the right precedence of the second to top operator:
B1) Pop both pairs from the stack.
B2) Combine the two subexpressions with the operator that was second to top.
B3) Push back the pair of that resulting subexpression and the other operator.
The startup is obviously just reading the first primary expression and following operator.
The termination is simplest if you invent a dummy operator representing any token that is not a binary operator that is seen when you try to read a binary operator. That dummy operator has left precedence lower than any real right precedence.
Before step A in the main loop, you check whether the one operator you have on the stack is the dummy. If so, you exit.
Re the original question of this thread, it may be easiest to understand a recursive expression parser (such as the one in that tutorial) by seeing its similarity to the explicit stack version I just described.
The stack is just the local variables of all the recursive calls. So a net push is one level of call and a net pop is one level of return (but the double pop and push that forms a new subexpression is combined, so it is not two returns and a call). Otherwise, the operation sequence and all the decision points of the recursive version are the same as in the explicit stack version.
This is not the final algorithm, I'm just still thinking.
You know, please write the types of the variables involved, i.e. 'op', 'rhs', 'next', 'lhs'.
Also, please publish all the subroutines related to each other; I mean that expression parsing/calculation involves cross/mutual recursion, so it's easier to comprehend them all at once.
I.e. if A possibly calls B, and B possibly calls C, and C possibly calls A, then it's better to see all the A, B, C side by side.
Another problem I see (and C/C++ is not quite your friend here): your algorithm will most likely behave erratically/ungracefully in case of errors. I.e., what will the behavior be in the case of, say,
Quote:
You know, please write the types of the variables involved, i.e. 'op', 'rhs', 'next', 'lhs'.
Also, please publish all the subroutines related to each other; I mean that expression parsing/calculation involves cross/mutual recursion, so it's easier to comprehend them all at once.
I disagree. I think it is better to focus on the algorithm being designed at the moment and just make reasonable assumptions about the characteristics of the data types and called functions.
Part of being a skilled programmer is doing a good job of estimating what is or isn't a reasonable assumption for the parts you aren't focusing on at the moment.
Quote:
Originally Posted by Sergei Steshenko
Without going too much into detail - see the line in red, and the line in blue - should the line in blue be immediately after the line in red?
I mean, can you be sure the look-ahead token is indeed an operation and not a newline, apparently indicating end of expression?
I also have an issue with the line in blue, but not any issue with its position.
I'd be much more comfortable with
Code:
if( ! binary_operator(next) ) break;
Does every expression end with '\n', including invalid expressions like Sergei's "2 3 + 4" example? I think not.
Quote:
Originally Posted by Sergei Steshenko
So,
Code:
assoc['\n']
is defined and returns a value high enough to make the 'if' condition true?
The point should be that assoc[x] is defined for any value x that might be returned by peekToken(), and that for any token that isn't a binary operator it is some value not equal to either ASSOC_LEFT or ASSOC_RIGHT, so the if condition will not be true.
I'd also like to comment on an important difference between the typical non object oriented version of the above code and a typical object oriented version.
A typical non object oriented version would tend to have a predefined set of possible values returned by peekToken(), so that when the token is a number or identifier, the returned value can only indicate that it is a number or identifier. It can't hold which number or identifier.
Then something like assoc[next] can be the simple array lookup that it appears to be. But the important info about which number or identifier has been tokenized must be returned through some back door.
In a more object oriented design the return from peekToken() would be a pointer or object for the whole token. So assoc[next] is not a simple array lookup. It's really the execution of a function.
Quote:
I disagree. I think it is better to focus on the algorithm being designed at the moment and just make reasonable assumptions about the characteristics of the data types and called functions.
Part of being a skilled programmer is doing a good job of estimating what is or isn't a reasonable assumption for the parts you aren't focusing on at the moment.
...
I know that the hardest mistakes to correct are infrastructural ones.
If we are talking about both a parser and an on-the-fly evaluator, and if they are recursive (which they are), two values should be returned:
result of parsing (i.e. legal or illegal language construct);
result of calculation (i.e. the numeric value, defined only when the construct is legal).
I'm thinking that I should just drop this issue for now.
Taking my mind away from it for a while might help me understand it better, and also this topic still reminds me of yesterday's argument and makes me a little emotional and regretful.
Quote:
I'm thinking that I should just drop this issue for now.
Taking my mind away from it for a while might help me understand it better, and also this topic still reminds me of yesterday's argument and makes me a little emotional and regretful.
Here are some more examples to think about:
Code:
1 + (2 - 2) + 3
- a correct expression - indeed needs recursion; the inner expression legitimately returns 0 as its numeric value and TRUE from the grammar point of view because it is correct, so parsing/calculation should continue.
Code:
1 + (2 2) + 3
- a wrong expression - the inner expression should return FALSE from the grammar point of view because it is wrong, its value for calculation is undefined, and parsing/calculation should stop.