Could someone help explain this code?
http://llvm.org/docs/tutorial/LangImpl2.html
I tried implementing that before and went pretty far, but I got stuck at the op precedence part. Anyway, is that a recursive descent parser? And can someone help explain how the operator precedence works? |
Quote:
But the basic concepts are the same. So it is something similar to a "recursive descent parser". Quote:
How operator precedence "works" in the language made up for that tutorial? Or how the ParseBinOpRHS function in that tutorial implements the rules of operator precedence? That function is called whenever the parser has an expression that might be the left hand operand of some larger expression. Assuming there is a binary operator as the next token, making it possible for the current expression to be its left hand operand: The function first must decide whether that expression is instead the right hand operand of some larger expression being processed by the (recursive) caller. It does so by comparing the precedence passed in by that caller to the next token's precedence. If the expression is the left operand of the next token, the function must then find the right operand of that token. To do so, it advances past the token and parses a primary expression. That primary expression might be the desired right operand or it might left operand of a subexpression and that whole subexpression would be the desired right operand. So the function uses itself recursively to construct its right operand. |
Quote:
|
Quote:
|
I'm still having difficulty understanding it, but it slightly makes sense.
I was wondering if it could be easier just to do it this way? And no matter what method, how do you handle left/right accociativity? Code:
expr ::= expr2 "+" expr2 |
Quote:
... Here are some hopefully useful links regarding expression parsing: http://www.smccd.net/accounts/hasson...icParsing.html http://en.wikipedia.org/wiki/Operator-precedence_parser . I'm impatiently waiting for your reaction regarding the fact that in both articles the algorithm is explained in English, and code samples are given too. And no, I wasn't picking articles by hand. ... http://en.wikipedia.org/wiki/Scannerless_parsing ... At all, http://en.wikipedia.org/wiki/Categor...ing_algorithms . |
Quote:
If you insist that Perl will really be better, then maybe... , because I really am intrigued by the idea of storing AST nodes as hashes. Quote:
|
Quote:
I am not saying Perl is really better, but in Perl everything is "elastic" by default, and there is no type strictness which in this case is good - again, you do not have to plan ahead, say, fields in a data structure describing a tree node, you just add hash keys. Regarding "I don't think I understand it really well" - here is a very simple question for you: how (step by step) as a human do you calculate the following expression: Code:
1 + 2 + 3 + 4 * 5 ^ 2 '^' is power operation (i.e. 2 ^ 3 is 8) which has higher precedence than other arithmetic operations. I emphasize the point - step by step. |
Quote:
Quote:
I look at all the operators and find the '^', that has the highest precedence. I solve it and replace "5 ^ 2" with "25". Then I see the '*', which has higher precedence than the rest. I replace "4 * 25" with 100. Then, because the rest is addition and it doesn't matter which way you add them, I just add all the remaining numbers up. |
Quote:
So now let's change the task by "scaling". Again, let's take the same expression: Code:
1 + 2 + 3 + 4 * 5 ^ 2 Code:
1 + 2 + 3 + 4 Let's also assume that each operand is written, say, as 30cm/1foot high/wide digit (approximately), and it's written on a wall. Similarly, operation signs ('+', '*', '^' in this case) are also written kinda in 30cm x 30cm cells, and your eyes distance from the wall is also about 30cm/1foot, i.e. at a time you can see just one operand or one operation sign. You also have a notebook (I mean paper one), a pencil and an eraser, so you can take notes and read them as needed. You are put at the left side of the expression and because of the above conditions, as I said, you can only see what is now the leftmost '1'. How would you calculate the expression under such circumstances ? You are not allowed (and do no need) to run to the other end in order to see the whole expression. Remember, the expression occupies at least 1000 feet (300m) on the wall, so, again, there is no chance you see its tail. I reiterate, you can see either one operand or one operation sign at a time, and you can take notes. And let's assume you are constructively lazy, i.e. you want to perform minimum amount of moves along the wall. Again, step by step actions you'll perform as a human. |
I am not sure, I guess I might do it like this:
I would go past the first number, the plus, and the next number. Then I check the next operator and it's another plus. Since it's left-associative, I see if it's precedence is > than the previous plus. It's not, so I calculate the first +-expression and move on... Then I get to the multiplication. It's left-associative, so I check if it's precedence > than of the plus. It is, so I skip to the multiplication. But I first see what the next operator is. It's a ^, and it's right-accociative, so I see if it's precedence ≥ the * operator. It is, so I see the next one. But there is no next one, so I do the ^, then fall back to the *, and to the +. |
Quote:
Now regarding the item in bold - I have a "bad news" for you - no matter how you formulate, but in fact it really means: "if current operator (after which there is an operand) is followed by by an operator with higher precedence, then I do this and that". Likewise, similar statements with followed by can be formulated with equal and lower precedence. Back to our sheep - assume now that your notebook has sheets with pretty big cells in it. The cell is big in a sense you can write some name in it - like 'operand', 'operation', etc., but the cell is still small in a sense it can contain just one item. So, for example, in a cell named 'operation' you can have '+' or '*' or '^', but no more than that. And in a cell named 'operand' you can have '1', '2', etc - just any one of the operands present in the expression. In the beginning you may have as many cells as you like, but ultimately try to minimize number of needed cells. So, now fully document your steps for the Code:
1 + 2 + 3 + 4 * 5 ^ 2 Having all this before your and my eyes will really help to draw further algorithmic conclusions and get closer to real code. Of course, your final calculated result should be correct. |
Quote:
First, those grammar rules are wrong. They couldn't even handle A+B+C. Probably you meant: Code:
expr ::= expr "+" expr2 Notice these corrected rules define an infinite recursion if you take a simplistic recursive approach to parsing. Try to parse an expr and the first thing to check is whether it starts with an expr. That is infinite recursion before you even look at anything. Quote:
A+B could be an expr or B+C could be an expr. If A+B is an expr then that expr + C is another expr. But if B+C is an expr, then A plus that expr is not represented by this grammar. |
I see the "1".
Then the '+'. Then the '2'. I make this tree: (+, 1, 2) I check the next operator. It's left associative, and its precedence is not greater than that of the previous operator. I then get the next number and continue the tree like this: (+, (+, 1, 2), 3) Now the the next +. The next operator after it is left-associative and has a higher precedence. So I recursively start from there. The next operator (^) is right-associative and is greater than or equal to the previous one. I again go further recursively. There is no next operator. I return back to the multiplication with this tree: (^, 5, 2) The multiplication function finds there is no next operator now. It returns this tree: (*, 4, (^, 5, 2)) The first level now finds there are no more operators, and returns this: (+, (+, (+, 1, 2), 3), (*, 4, (^, 5, 2))) EDIT: I did not see the reply by johnsfine |
Quote:
Secondly, the tree is a "book" one, but not necessarily the most convenient one. For many practical tasks a tree like this: Code:
(+, 1, 2, 3, (*, 4, (^, 5, 2))) |
All times are GMT -5. The time now is 08:51 PM. |