Correct way to derive parse trees, for a given input, to show ambiguity.
Is it correct to state that the grammar below can handle the expression a - b * c via the two different parse trees shown below?
Code:
Grammar: The decision should have been based on the lookahead symbol only. If possible, show by a C code snippet how the implementation of the second derivation is correct, as the book Compilers and Compiler Generators: An Introduction with C++ by Patrick D. Terry shows the above example in Section 6.4, page 131. For me, the ambiguity of the grammar can only be shown by the second derivation below: Code:
Parse tree #2: But the bigger question is that I am not sure whether an implementation of the above grammar, say in C, would leave any chance for generating the second parse tree, either of the first type (Patrick Terry's book) or of the second type (inspired by Louden's book, whose screenshot is attached). Only a code snippet (say, in C) can clear this up. |
These rules can generate any sequence of {abc}[{-/}{abc}]* (here I replaced the operators lest they overlap with metacharacters).
Indeed, if you use these rules for parsing, you'll find they are ambiguous. |
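One way to make that ambiguity concrete in code is to count the distinct parse trees for a sentence by trying every operator occurrence as the root. This is only an illustrative sketch (not code from Terry's or Louden's book), assuming the grammar E -> E - E | E * E | a | b | c:

```c
#include <string.h>

/* Illustrative sketch: count the distinct parse trees of a sentence
 * under the ambiguous grammar  E -> E - E | E * E | a | b | c
 * by trying every operator occurrence as the root of the tree. */
static int count_parses(const char *s, int lo, int hi)
{
    if (lo == hi)                          /* single terminal a, b or c */
        return (s[lo] == 'a' || s[lo] == 'b' || s[lo] == 'c');
    int n = 0;
    for (int i = lo + 1; i < hi; i++)      /* choose the root operator */
        if (s[i] == '-' || s[i] == '*')
            n += count_parses(s, lo, i - 1) * count_parses(s, i + 1, hi);
    return n;
}

int parse_count(const char *s)
{
    return count_parses(s, 0, (int)strlen(s) - 1);
}
```

For a-b*c this finds two trees, one rooted at '-' and one rooted at '*', which is exactly the ambiguity the two diagrams show.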
Quote:
Code:
2 +3+4 * 5/6 -7 +8 Code:
Grammar: Better still if you could state some implementation details about the new grammar being used to build an integrated parser-lexer, with the lookahead of the lexer being passed on to the parser. But you have to say why one approach is correct, or, if both approaches are correct, then why. Edit: It is not a question of the grammar being ambiguous or not; the topic is mandatory in the syllabus, and hence the example must be drawn from an ambiguous grammar only. |
If you know it is ambiguous, then you have to accept that multiple parse trees are equally valid.
In real life, additional information should be added about associativity and precedence. Remember the simple example: 2-3-4 usually means (2-3)-4, not 2-(3-4) |
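The usual left-associative reading can be sketched with a tiny evaluator over single-digit operands (a hypothetical helper, just to make the point, not anyone's production code):

```c
/* Evaluate a string of single digits joined by '-', grouping
 * left-associatively: "2-3-4" is read as (2-3)-4, not 2-(3-4). */
int eval_left(const char *s)
{
    int v = *s++ - '0';                 /* first operand */
    while (*s == '-') {
        s++;                            /* skip the operator */
        v -= *s++ - '0';                /* fold in the next operand */
    }
    return v;
}
```

Here (2-3)-4 gives -5, whereas the right-associative grouping 2-(3-4) would give 3, so the choice of tree changes the meaning.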
Quote:
Code:
Expression = Expression "-" Expression Code:
1 E -> E - E Quote:
I think you may be confusing yourself by trying to see code where there is none. The question and example grammar you have posted have nothing obvious to do with scanning the input line or any lookahead symbols - nothing to do with any parser or code at all, nor making of decisions. It appears to be all about understanding that the grammar itself is ambiguous because there is more than one way to produce the sentence a-b*c from it (i.e., more than one parse tree). Deriving those parse trees (production graphs) from the grammar is what makes the ambiguity visible, or as the question says, "to show the ambiguity". Important things to remember are that the grammar produces or generates the sentence, and the production graph (i.e., parse tree) visually represents how it does so. This is why the grammars are called generative grammars, and the rules are called productions. So, to produce a-b*c from this grammar we start with the start symbol E, and replace it with each of its right sides. We have two possible choices which produce the sentence: Rules 1 and 2, E -> E - E and E -> E * E respectively. First, using production 1 (rule 1), both leftmost and rightmost derivations produce the same parse tree: Code:
Rule order, leftmost derivation: (1,3,4) (2,3,5) (3,6) Code:
Rule order, leftmost derivation: (2,1,3,4) (3,5) (3,6) This is the correct way to derive parse trees, for a given input, to show ambiguity. |
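Those two rule orders can be checked mechanically. A small sketch that rewrites the leftmost nonterminal according to a rule sequence; the numbering 1: E->E-E, 2: E->E*E, 3: E->F, 4: F->a, 5: F->b, 6: F->c is my assumption, inferred from the rule orders quoted above:

```c
#include <string.h>

/* Assumed rule numbering (inferred from the thread):
 * 1: E->E-E  2: E->E*E  3: E->F  4: F->a  5: F->b  6: F->c */
static const char *rhs[7] = { "", "E-E", "E*E", "F", "a", "b", "c" };
static const char  lhs[7] = { 0, 'E', 'E', 'E', 'F', 'F', 'F' };

/* Rewrite the leftmost nonterminal in `form` with rule r, in place.
 * Returns 0 if the rule does not apply at that position. */
static int apply(char *form, int r)
{
    char *p = strpbrk(form, "EF");       /* leftmost nonterminal */
    if (!p || *p != lhs[r])
        return 0;
    char tail[64];
    strcpy(tail, p + 1);                 /* save what follows it  */
    strcpy(p, rhs[r]);                   /* substitute the RHS    */
    strcat(p, tail);
    return 1;
}

/* Run a whole leftmost derivation from "E"; returns the final
 * sentential form, or NULL if some rule was inapplicable. */
const char *derive(const int *rules, int n, char *form)
{
    strcpy(form, "E");
    for (int i = 0; i < n; i++)
        if (!apply(form, rules[i]))
            return NULL;
    return form;
}
```

Feeding it the flattened sequences 1,3,4,2,3,5,3,6 and 2,1,3,4,3,5,3,6 both yield a-b*c, which is the ambiguity made visible.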
Quote:
[Very sorry, but I cannot resist asking: since C follows rightmost derivation (bottom-up parsing, BUP), i.e. one statement is processed in the stack at a time, C has a single statement in its context. Why, then, is error reporting so imprecise in C, with an error often ascribed to a different line number?] That means the whole line must be kept in primary storage, i.e. the stack (for BUP); hence the parser must be a separate phase from the lexer. As a BUP parser can apply rule #2 before rule #1, I will not focus on that; the focus should be on the top-down parser. Top-down parsing allows more economical storage, as there can be a closed-loop parser-scanner (let me call it a loop-type parser), where the lexer returns one token on demand to the parser. But for this loop-type parser, where the derivation is decided on each subsequent token supplied by the lexer, please tell: how can left-to-right parsing pick rule #2 first (as in your second diagram) rather than only being able to pick rule #1 first? I can only derive the first diagram under leftmost derivation, as shown below, and am unable to derive the second diagram's derivation using leftmost derivation. For the given input a - b * c, the loop-type parser is fed the token 'a' first and will apply rule #4, then rule #3. Next the lexer provides the '-' token, which gives the (incomplete) context for applying rule #1. At this point the next token is needed, which is 'b'. The leftmost derivation has successfully got E - b so far, which means rule #1 can be applied; however, lookahead is needed to save on an unsuccessful application. That yields the token '*'. Now I have: E - b *.
Next, rule #5 is applied to get E - F *, then rule #3 to get E - E *. More input is then needed to decide whether rule #2 can be applied, or whether the input is erroneous, i.e. no suitable token follows '*'. The next token is 'c'; hence I get E - E * c. Then rule #6 is applied to get E - E * F, next rule #3 to get E - E * E, next rule #2 to get E - E, and finally rule #1 to get E. But even here I cannot see how, for the first diagram, to get the first set, i.e. (1,3,4). |
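The closed-loop ("loop-type") arrangement, with the parser pulling one token on demand from the lexer, can be sketched for the grammar with left recursion removed. The transformed rules E -> T { '-' T }, T -> F { '*' F }, F -> a | b | c are my assumption of the standard transformation, not the book's code:

```c
/* Minimal on-demand lexer + recursive-descent parser sketch for
 * the LL(1) form of the grammar (left recursion removed):
 *   E -> T { '-' T }    T -> F { '*' F }    F -> a | b | c   */
static const char *src;                 /* input being scanned  */
static char tok;                        /* one-token lookahead  */

static void next_token(void)            /* the "lexer": one char */
{
    tok = *src ? *src++ : '\0';
}

static int F(void)
{
    if (tok == 'a' || tok == 'b' || tok == 'c') { next_token(); return 1; }
    return 0;
}

static int T(void)
{
    if (!F()) return 0;
    while (tok == '*') { next_token(); if (!F()) return 0; }
    return 1;
}

static int E(void)
{
    if (!T()) return 0;
    while (tok == '-') { next_token(); if (!T()) return 0; }
    return 1;
}

int accepts(const char *s)
{
    src = s;
    next_token();
    return E() && tok == '\0';
}
```

Note that this parser never faces a choice between rules #1 and #2 at all: the transformed grammar builds the '*' level below the '-' level, which is why only one of the two trees can ever come out of it.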
I am somewhat reluctant to add further comment for fear of adding confusion. But it is an important point to understand and I have begun, so one more attempt at clarity...
Only you can judge whether and how this information might fit into your current class plan. Consider carefully. Quote:
Code:
Correct way to derive parse trees, for a given input, to show ambiguity. Can the grammar produce this sentence? If so then the sentence is valid in the language described by the grammar. Can the grammar produce this sentence in more than one way? If so then the grammar is ambiguous. This grammar was chosen for its ambiguity and the question is how to make the ambiguity visible by deriving the sentence from the grammar in all possible ways (i.e., all parse trees). Derivation (pencil and paper or chalkboard, not parser) is performed by beginning with the start symbol nonterminal and rewriting it in terms of each of its right-hand sides. Then repeatedly rewriting nonterminals in the resultant forms until there are none left. The sequences of replacements which lead to the sentence provide the parse tree(s). Quote:
Here reproduced in full, that first derivation (i.e., parse tree), leaving similar derivation of the other tree as an exercise. And remember - there is no scanner, no parser and no parse method in this context. Code:
The grammar: |
Here, the exposure (as well as inclination) to practical implementation is very slight, if present at all, and that makes me worried enough to say the ambiguity is only a theoretical one.
So students here need to go the reverse way, i.e. first understand the way it can be done. Though I fear that, if coded, the top-down parser would fail due to left recursion, that still has to be shown before going on to the LL(1) grammar; otherwise the disconnect would be too great, and the students would be rote-learning without a real understanding of the issues involved. Yes, the theoretical leftmost and rightmost derivations need to be shown, but in the absence of an implementation that would confuse more than help. The current context is top-down parsing, without reaching LL parsing. The practical (implementation) part is needed to show how leftmost derivation would occur, both by parse-tree derivation and by code, and then how a left-recursive grammar makes it fail. Similarly, rightmost derivation should be shown by parse-tree derivation for a given input string. I think rightmost derivation needs slightly more effort, as it requires keeping a stack and immediately needs the concept of handles; so the implementation of rightmost derivation can be postponed, unless I am wrong. The focus for now (until reaching LR parsing) can be on the implementation of leftmost derivation alone. The lack is mine, but somewhere it has to be plugged. --------------------------------------------------------- Coming back: for the top-down parser I must show the different implementation methods: (1) if top-down parsing is used, then a loop-type parser is a possibility, but not for rightmost derivation; (2) how to implement leftmost derivation, and also show the infinite-loop error for the given left-recursive grammar. My last response was along that line, i.e. to show (1). But first I need to derive the leftmost derivation in a top-down manner, as the possible implementation for leftmost derivation; rightmost derivation need only be shown theoretically.
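The left-recursion failure is easy to demonstrate: a naive recursive-descent procedure for rule #1 (E -> E - E) must call E() before consuming any input token, so it recurses forever. Below is a sketch of my own, with a depth guard standing in for the eventual stack overflow; the real failure would be a crash, not a return value:

```c
/* Naive recursive-descent procedure for the left-recursive rule
 * E -> E - E : it calls E() before consuming any input, so the
 * recursion can never terminate.  A depth counter makes the
 * divergence visible without actually overflowing the stack. */
static int depth;

static int E_naive(void)
{
    if (++depth > 1000)
        return -1;          /* in real code: stack overflow here */
    /* we never get as far as matching '-' or the right-hand E */
    return E_naive();       /* recurse immediately: no token consumed */
}

int left_recursion_diverges(void)
{
    depth = 0;
    return E_naive() == -1; /* 1: the depth limit was hit */
}
```

This is the concrete classroom demonstration of why the grammar must be transformed before a top-down parser can use it.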
--------------------------------------- I am going for coding too, but there as well I am unable to progress much on my own. What I plan is to take Louden's code, as at: http://www.cs.sjsu.edu/faculty/louden/cmptext/ , adapt (simplify) it to suit a starter class, and then move to the actual code for the full course. ----------------------------------------- I want to add that your leftmost derivation of the parse tree does not seem, to me, like top-down parsing; hence I am in a fix about how to present it as such. The reason is that in class I would start only from the terminals, whereas you are starting from the nonterminal symbols. Even for the rightmost derivation, I cannot explain it without taking the terminals first and deriving the left part of a production rule from the right one. I might be wrong, but that is the truth. |
Then I wish you the best of luck!
Sorry I could not provide the answer you seek. |
My inability to see parsing without the terminal symbols being put first and the rules applied later is basic; otherwise the actual rule-application process cannot be seen.
Please vet my analysis of top-down parsing for the given grammar and input string. The first input token is 'a'; the parser starts with the start symbol E. The rules are then tried successively from rule #1. The first rule demands one symbol of lookahead, i.e. '-'. ---------------- First, assuming the implementation is without lookahead, rule #3 is the only one left, which leads to its application. But now the next input token is '-', and no rule can handle it except #1; hence the parser must backtrack and apply rule #1 first, then rule #3, then rule #4. But how this memory of earlier failed attempts is kept by the top-down parser is the issue. ---------------- Second, assume lookahead is available, i.e. lookahead apart from the current input. Then it is easy: rule #1 can be applied in anticipation of the symbol after the lookahead '-' being any terminal from a/b/c. Then rule #3 can be applied, then rule #4. Having read up to "a -", the parse tree has the rules applied as (1, 3, 4). The next input is 'b', but the lookahead token is '*'; for this I need rule #2, then #3, then #5, so the rules applied are (1, 3, 4) (2, 3, 5). Now the token '*' is read, and the lookahead is 'c'. I must choose among the six rules, and with the end-of-input marker as lookahead only rule #3 can be applied, after which the choice is limited to rule #6 alone. So the rules applied in sequence, for the branches in the leftmost derivation, are (1, 3, 4) (2, 3, 5) (3, 6). But I am confused (even with the lookahead-based implementation) about how to put the code in place, and also how to show the infinite-recursion error for a C implementation, and hence the need for the LL parser. |
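On the question of how the parser "remembers" failed attempts: in a backtracking recursive-descent parser, the memory is simply the saved input position, restored whenever an alternative fails. A sketch of mine, under the assumption that rules #1 and #2 are replaced by the right-recursive stand-ins F '-' E and F '*' E (the original left-recursive forms cannot be tried this way):

```c
static const char *pos;    /* current input position */

static int term(void)      /* F -> a | b | c */
{
    if (*pos == 'a' || *pos == 'b' || *pos == 'c') { pos++; return 1; }
    return 0;
}

/* Try each alternative in turn; on failure restore the saved
 * position.  The save/restore IS the "memory" of failed attempts. */
static int E2(void)
{
    const char *save = pos;
    /* alternative 1: F '-' E  (right-recursive stand-in for rule #1) */
    if (term() && *pos == '-' && (pos++, E2())) return 1;
    pos = save;
    /* alternative 2: F '*' E  (stand-in for rule #2) */
    if (term() && *pos == '*' && (pos++, E2())) return 1;
    pos = save;
    /* alternative 3: just F */
    return term();
}

int bt_accepts(const char *s)
{
    pos = s;
    return E2() && *pos == '\0';
}
```

No extra table is needed: the call stack plus the saved pointers hold everything the parser must undo on a failed alternative.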
Request vetting for the below, which is part of the preparation of easily accessible notes on the pumping lemma for my class:
I have read that finite languages are a subset of the regular languages, but their grammars can only have one or more terminals (and no nonterminal) on the RHS of any production rule. That in turn means the set of productions is effectively the set of all words in the language. Say, for the below grammar generating a finite language: Code:
G_f = (N={X}, Σ={a,b}, P={X -> aa | bb}, S=X) Code:
G_r = (N={A}, Σ={a,b}, P={A -> a | Aa | b | Ab}, S=A) Code:
A => Aa => aa |
How to prove, using the pumping lemma on a word, that the language parsed by the C parser is not regular.
I want to show, using the pumping lemma, that the language handled by the (LALR, bottom-up) parser for the C language is not regular. I request some inputs or literature sources. I need a word in the language to show that it fails the test, but it is not clear what criteria the word must possess. At best I can think of a programming-language construct, say the while statement. The construct is: Code:
while (i > minimum) An arithmetic-expression grammar (as below, which is a CFG) seems a suitable candidate, and ideally I should be able to pick a word in it to prove by contradiction. Code:
stmt: ID '=' expr ';' Edit: I could find some similar questions on other sites, but they are inconclusive. Say at: 1. https://cs.stackexchange.com/questio...-pumping-lemma 2. https://cs.stackexchange.com/questio...-pumping-lemma |
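For the pumping-lemma part, the standard witness is nesting itself, e.g. balanced parentheses (the skeleton of nested if or while blocks). For the word w = (^p )^p, any split w = xyz with |xy| <= p and |y| >= 1 makes y a run of '(' only, so xyyz is unbalanced and leaves the language. A sketch of mine that checks this exhaustively for a given p (the names are mine, not from any source):

```c
#include <string.h>

/* Is s a balanced string of '(' and ')'? */
static int balanced(const char *s)
{
    int d = 0;
    for (; *s; s++) {
        d += (*s == '(') ? 1 : -1;
        if (d < 0) return 0;            /* closed before opened */
    }
    return d == 0;
}

/* Pumping check for the witness w = (^p )^p : for every split
 * w = xyz with |xy| <= p and |y| >= 1, pumping y once (x y y z)
 * must leave the language.  Returns 1 iff every split fails,
 * i.e. w really witnesses non-regularity for this p.  */
int witness_breaks_pumping(int p)
{
    char w[128], pumped[192];
    int n = 2 * p;
    memset(w, '(', p);
    memset(w + p, ')', p);
    w[n] = '\0';
    for (int i = 0; i <= p; i++)            /* |x| = i             */
        for (int j = 1; i + j <= p; j++) {  /* |y| = j, |xy| <= p  */
            memcpy(pumped, w, i + j);           /* x y   */
            memcpy(pumped + i + j, w + i, j);   /* y     */
            strcpy(pumped + i + 2 * j, w + i + j); /* z  */
            if (balanced(pumped))
                return 0;   /* some split survives pumping */
        }
    return 1;
}
```

This is the shape of the contradiction: a DFA with p states cannot track nesting depth beyond p, which is exactly why a C-like language with unbounded nesting cannot be regular.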
The two code-parts in your post have no connection at all.
There is no ambiguity in parsing the stmt -> "while" '(' expression ')' stmt rule. If you want an ambiguity, consider 'else': Code:
stmt -> expr ';' Code:
if (color==yellow) if (taste==acerb) fruit= lemon; else fruit= mellon; Code:
#1 if (color==yellow) { if (taste==acerb) fruit= lemon; else fruit= mellon; } /*no else here */ |
I never thought that ambiguity in a programming construct (i.e., the if-else statement) alone would be the sole place to apply the pumping lemma.
In fact, I was more hopeful of an application of the pumping lemma to arithmetic expressions. I have understood that the basic if-else statement (i.e., rule #3: stmt -> "if" '(' expr ')' stmt) would provide the pumpable part, since rule #3 can be repeated any number of times. In the pumping lemma we divide the input word (string) into a concatenation of three parts. Hence, if before entering the if-else statement the recognizing automaton was in some state q_i, and after reading the part produced by rule #3 it reaches state q_j, then any number of repetitions of that part would leave the automaton in the same state q_j. From Wikipedia: for any regular language L there exists a constant p such that any word w in L with length at least p can be split into three substrings, w = xyz, where the middle portion y must not be empty, such that the words xz, xyz, xyyz, xyyyz, ... constructed by repeating y zero or more times are still in L. So the program is represented as xy^iz, with y being the part concerned with the if-else construct; the block of executable statements occurring sequentially before y is denoted by x, and the block after y by z. (Strictly, the pumping length p is the bound on the word length with |xy| <= p; it is not the length of the y block.) |
Quote:
else if ((color==yellow) && (taste!=acerb)) fruit= mellon; Quote:
Edit: Why the first example is given is unclear Code:
if (color==yellow) if (taste==acerb) fruit= lemon; else fruit= mellon; The addition of braces to that example doesn't make a difference to the non-applicability of the pumping lemma. Also, the second example forces me to think that both the if part and the else part, standalone, are useful candidates for the pumping lemma, apart from the if-else construct. |