LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 04-01-2010, 10:28 AM   #16
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723

Quote:
Originally Posted by Sergei Steshenko View Post
Let's first implement tree-less calculation on the fly. The point is to better show/feel how much of already processed part of expression you really need for calculations/decision making.
I guess you can just return pre-calculated results instead of trees.

Quote:
Originally Posted by Sergei Steshenko View Post
Secondly, the tree is a "book" one, but not necessarily the most convenient one. For many practical tasks a tree like this:

Code:
(+, 1, 2, 3, (*, 4, (^, 5, 2)))
is much more convenient.
I just thought that since they are binary operators, that they would be better represented as having two operands each.
 
Old 04-01-2010, 10:42 AM   #17
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454Reputation: 454Reputation: 454Reputation: 454Reputation: 454
Quote:
Originally Posted by MTK358 View Post
I guess you can just return pre-calculated results instead of trees.



I just thought that since they are binary operators, that they would be better represented as having two operands each.
There are two points.

1) In the first approximation you do not need a tree, and trust me, to add tree generation (and, if necessary, to eliminate on the fly calculation) is walk in the park in the correctly implemented on the fly calculator.

2) if you think about constant folding, e.g. if you want to convert, for example,

Code:
x + y + 3 + 4 + 5 * (z + 7 + 8)
into

Code:
x + y + 7 + 5 * (z + 15)
, the second form is more convenient.

Philosophically/empirically if you can do something without recursion, do it - your form is more recursive. But, again, for general understanding/feeling at this stage the tree is premature.
 
Old 04-01-2010, 11:00 AM   #18
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197
The solution I prefer to this common parsing problem is to use an explicit data stack instead of much of the recursion and to have a left precedence distinct from the right precedence of each operator, allowing no ties (ties are easy to handle but not allowing them is still less confusing).

For example, consider making every left precedence odd and every right precedence even. For ordinary operators, always make the right precedence one more than the left precedence. For right associative operators (such as exponentiation in many languages) make the right precedence one less than the left precedence.

I prefer languages in which right and left precedence can get further apart, such as having an assignment operator with a very high left precedence despite its usual very low right precedence.

Ignoring startup and termination details, the basic work flow is:

You have a stack of pairs of subexpression and following operator. The stack always has at least one such pair.

The main loop does:

A) Read another primary expression and following operator and push onto the stack.

B) Repeat as long as the left precedence of the top operator is less than the right precedence of the second to top operator:
B1) Pop both pairs from the stack.
B2) Combine the two subexpressions with the operator that was second to top.
B3) Push back the pair of that resulting subexpression and the other operator.

The startup is obviously just reading the first primary expression and following operator.

The termination is simplest if you invent a dummy operator representing any token that is not a binary operator that is seen when you try to read a binary operator. That dummy operator has left precedence lower than any real right precedence.

Before step A in the main loop, you check whether the one operator you have on the stack is the dummy. If so, you exit.

Re the original question of this thread, it may be easiest to understand a recursive expression parser (such as the one in that tutorial) by seeing its similarity to the explicit stack version I just described.

The stack is just the local variables of all the recursive calls. So a net push is one level of call and a net pop is one level of return (but the pop twice and push of forming a new subexpression is combined so it is not two returns and a call). Otherwise, the operation sequence and all the decision points of the recursive version are the same as the explicit stack version.

Last edited by johnsfine; 04-01-2010 at 11:07 AM.
 
Old 04-01-2010, 11:36 AM   #19
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
Could something like this work?

Code:
parseBinOp(lhs) {
	while(1) {
		op = getToken();
		rhs = parsePrimary();
		next = peekToken();
		if((assoc[next] == ASSOC_LEFT && prec[next] > prec[op]) ||
		   (assoc[next] == ASSOC_RIGHT && prec[next] >= prec[op])) {
			rhs = parseBinOp(rhs);
		}
		lhs = calculate(op, lhs, rhs);
		if(next == '\n') break;
	}
	return lhs;
}
 
Old 04-01-2010, 12:11 PM   #20
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454Reputation: 454Reputation: 454Reputation: 454Reputation: 454
Quote:
Originally Posted by MTK358 View Post
Could something like this work?

Code:
parseBinOp(lhs) {
	while(1) {
		op = getToken();
		rhs = parsePrimary();
		next = peekToken();
		if((assoc[next] == ASSOC_LEFT && prec[next] > prec[op]) ||
		   (assoc[next] == ASSOC_RIGHT && prec[next] >= prec[op])) {
			rhs = parseBinOp(rhs);
		}
		lhs = calculate(op, lhs, rhs);
		if(next == '\n') break;
	}
	return lhs;
}

Without going too much into details - see the line in red, and see the line in blue - should the line in blue be immediately after line in red ?

I mean, can you be sure the look-ahead token is indeed operation and not newline apparently indicating end of expression ?
 
Old 04-01-2010, 12:15 PM   #21
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
Because even of this is the last binary expression, this needs to be run:

Code:
lhs = calculate(op, lhs, rhs);
 
Old 04-01-2010, 12:21 PM   #22
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454Reputation: 454Reputation: 454Reputation: 454Reputation: 454
Quote:
Originally Posted by MTK358 View Post
Because even of this is the last binary expression, this needs to be run:

Code:
lhs = calculate(op, lhs, rhs);
So,

Code:
assoc['\n']
is defined and returns a value high enough to make 'if' condition true ?
 
Old 04-01-2010, 12:22 PM   #23
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
This is not the final algorithm, I'm just still thinking.
 
Old 04-01-2010, 12:33 PM   #24
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454Reputation: 454Reputation: 454Reputation: 454Reputation: 454
Quote:
Originally Posted by MTK358 View Post
This is not the final algorithm, I'm just still thinking.
You know, please write types of the variables involved, i.e. 'op', 'rhs', 'next', 'lhs'.

Also, please publish all the related to each other subroutines, I mean that expression parsing/calculation involves cross/mutual recursion, so it's easier to comprehend all at once.

I.e. if A possibly calls B, and B possibly calls C, and C possibly calls A, then it's better to see all the A, B, C side by side.

Another problem I see (and C/C++ is not quite your friend here) - your algorithm will most likely behave erratically/ungracefully in case of errors. I.e. what will the behavior be in case of, say,

Code:
2 3 + 4
input (operation is missing between '2' and '3').
 
Old 04-01-2010, 01:36 PM   #25
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197
Quote:
Originally Posted by Sergei Steshenko View Post
You know, please write types of the variables involved, i.e. 'op', 'rhs', 'next', 'lhs'.

Also, please publish all the related to each other subroutines, I mean that expression parsing/calculation involves cross/mutual recursion, so it's easier to comprehend all at once.
I disagree. I think it is better to focus on the algorithm being designed at the moment and just make reasonable assumptions about the characteristics of the data types and called functions.

Part of being a skilled programmer is doing a good job of estimating what is or isn't a reasonable assumption for the parts you aren't focusing on at the moment.

Quote:
Originally Posted by Sergei Steshenko View Post
Without going too much into details - see the line in red, and see the line in blue - should the line in blue be immediately after line in red ?

I mean, can you be sure the look-ahead token is indeed operation and not newline apparently indicating end of expression ?
I also have an issue with the line in blue, but not any issue with its position.

I'd be much more comfortable with
Code:
if( ! binary_operator(next) ) break;
Does every expression end with '\n', including invalid expressions Sergei's "2 3 + 4" example? I think not.

Quote:
Originally Posted by Sergei Steshenko View Post
So,

Code:
assoc['\n']
is defined and returns a value high enough to make 'if' condition true ?
The point should be that assoc[x] is defined for any value x that might be returned by peekToken() and that for any token that isn't a binary operator it is some value not equal to either ASSOC_LEFT or ASSOC_RIGHT so the if condition will not be true.

I'd also like to comment on an important difference between the typical non object oriented version of the above code and a typical object oriented version.

A typical non object oriented version would tend to have a predefined set of possible values return by peekToken(), so that when the token is a number or identifier, the returned value can only indicate that it is a number or identifier. It can't hold which number or identifier.

Then something like assoc[next] can be the simple array lookup that it appears to be. But the important info about which number or identifier has been tokenized must be returned through some back door.

In a more object oriented design the return from peekToken() would be a pointer or object for the whole token. So assoc[next] is not a simple array lookup. It's really the execution of a function.

Last edited by johnsfine; 04-01-2010 at 01:51 PM.
 
Old 04-01-2010, 01:47 PM   #26
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454Reputation: 454Reputation: 454Reputation: 454Reputation: 454
Quote:
Originally Posted by johnsfine View Post
I disagree. I think it is better to focus on the algorithm being designed at the moment and just make reasonable assumptions about the characteristics of the data types and called functions.

Part of being a skilled programmer is doing a good job of estimating what is or isn't a reasonable assumption for the parts you aren't focusing on at the moment.
...
I know that the hardest mistakes to correct are infrastructural ones.

If we are talking about both parser and on the fly evaluator, and if they are recursive (which they are), two values should be returned:
  1. result of parsing (i.e. legal or illegal language construct);
  2. result of evaluation
.
 
Old 04-01-2010, 03:31 PM   #27
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
I'm thinking that I should just drop this issue for now.

Taking my mind away from it for a while might help me understand it better, and also this topic still reminds me of yesterday's argument and makes me a little emotional and regretful.
 
Old 04-01-2010, 09:44 PM   #28
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454Reputation: 454Reputation: 454Reputation: 454Reputation: 454
Quote:
Originally Posted by MTK358 View Post
I'm thinking that I should just drop this issue for now.

Taking my mind away from it for a while might help me understand it better, and also this topic still reminds me of yesterday's argument and makes me a little emotional and regretful.
Here are some more examples to think about:

Code:
1 + (2 - 2) + 3
- a correct expression - indeed needs recursion, inner expression legitimately returns 0 in terms of numeric value and TRUE from grammar point of view because it is correct, parsing/calculations should continue.


Code:
1 + (2   2) + 3
- a wrong expression - inner expression should return FALSE from grammar point of view because it is wrong, its value for calculations is undefined, parsing/calculations should stop.


"inner expression" in both cases means '(....)'.
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
Could somebody explain how this line of code works... trist007 Programming 2 05-12-2009 01:50 PM
help to explain the shell code ! nillgump Linux From Scratch 1 10-11-2008 11:09 AM
Can someone explain to me this code from K&R's 'The C Programming Language' book? frankie_DJ Programming 10 11-25-2006 01:35 PM
explain me some c code alaios Programming 4 11-28-2005 11:27 AM
explain me this c code plz alaios Programming 1 06-03-2005 05:32 AM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 09:52 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration