Why can a regular grammar not be both left-recursive and right-recursive?
It is stated that a regular grammar can be either left-linear or right-linear, but not both.
I want to understand why. Let us take the grammar below, which has left recursion in every rule except the one deriving the non-terminal symbol factor: Code:
expr -> expr + term | expr - term | term
Please give some insights. |
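One concrete way to see the problem (a sketch of my own, not from the thread): allow one right-linear rule and one left-linear rule in the same grammar, and you can generate a language no regular grammar can describe.

```python
# A sketch of my own (not from the thread): mix one right-linear rule with
# one left-linear rule and the grammar stops being regular.
#
#   S -> a T        (right-linear: non-terminal at the right end)
#   T -> S b | b    (left-linear:  non-terminal at the left end)
#
# Each trip around the S -> a T -> a S b cycle wraps one 'a' on the left
# and one 'b' on the right, so the language is { a^n b^n : n >= 1 } --
# the textbook non-regular language.

def derive(n):
    """String produced after n expansions of S, ending with T -> b."""
    return "a" * n + "b" * n

samples = [derive(k) for k in range(1, 4)]   # ['ab', 'aabb', 'aaabbb']
```

Since a^n b^n needs unbounded counting, no finite automaton (hence no regular grammar) accepts it; that is why the definition forces all rules to lean the same way.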
Exactly what is impossible and why is it impossible?
My only concern is that lexical parsing should be a separate process: Code:
expr->expr + term | expr - term | term |
Quote:
And it may be only a theoretical point that such a regular grammar must be either a left-linear grammar or a right-linear one. But still the doubt lingers: why do books state that a regular grammar must be either left-linear or right-linear, and not both? |
I don't know what you mean by 'linear', but I do think you should actually start creating example programs with bison.
https://www.gnu.org/software/bison/manual/bison.html |
I too am confused by your terms - what exactly do you mean by linear? And are you asking specifically about a "regular" grammar, that is, a Type 3 grammar in the Chomsky hierarchy? All of this leads me to think that you may be confusing yourself as well, and using an example which is not applicable to the question.
Taking just the question itself: Quote:
Quote:
I agree that you should be exploring this by writing your own examples using GNU Bison - and be precise about your terms. Other authors may use other terms, but you should still make the effort to define their meaning precisely, not assuming that others have read the same material. |
Why is it valid for the parser to break a token into units?
Reconsidering the grammar above, for a different reason.
For the arithmetic expression: Code:
200 + 300 * 50 / 2 * 5 - 3 + 4
Code:
expr -> expr + term | expr - term | term |
I might have mentioned earlier that you should have a lexical parser that returns a sequence of consecutive digits as a single unit (and also removes the white space between tokens), so it should be like this:
Code:
expr -> expr + term | expr - term | term |
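A minimal sketch (my own illustration, not code from the thread) of how that grammar, extended with the usual term and factor rules, parses and evaluates the expression above - with the left recursion rewritten as iteration, as a hand-written parser would do:

```python
import re

# Grammar assumed here (the standard textbook form of the thread's example):
#   expr   -> expr + term | expr - term | term
#   term   -> term * factor | term / factor | factor
#   factor -> ( expr ) | NUMBER

def tokenize(src):
    # lexer: group consecutive digits into one NUMBER token, skip spaces
    return re.findall(r"\d+|[-+*/()]", src)

def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def factor():
        nonlocal pos
        tok = tokens[pos]; pos += 1
        if tok == "(":                      # factor -> ( expr )
            val = expr()
            pos += 1                        # consume ')'
            return val
        return int(tok)                     # factor -> NUMBER
    def term():
        nonlocal pos
        val = factor()
        while peek() in ("*", "/"):         # left recursion unrolled as a loop
            op = tokens[pos]; pos += 1
            rhs = factor()
            val = val * rhs if op == "*" else val // rhs  # integer division for this sketch
        return val
    def expr():
        nonlocal pos
        val = term()
        while peek() in ("+", "-"):
            op = tokens[pos]; pos += 1
            rhs = term()
            val = val + rhs if op == "+" else val - rhs
        return val
    return expr()

result = parse(tokenize("200 + 300 * 50 / 2 * 5 - 3 + 4"))
```

Because term hangs below expr, the * and / operators bind tighter than + and -, which is exactly why the grammar is layered this way.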
Quote:
Also, it is unclear why the last rule is needed; i.e. Code:
factor -> (expr)
Edit: I have found here a similar way to split an integer number into parts based on a recursive rule. Though the Rosetta Code version does not break it up. |
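For what it is worth, a recursive rule of the kind mentioned - something like num -> num digit | digit - is left-linear, and once the recursion is unrolled it is just the familiar accumulate-one-digit-at-a-time loop (a sketch of my own, not the Rosetta Code version):

```python
# Left-linear rule  num -> num digit | digit  unrolled into a loop:
# each application of 'num -> num digit' shifts the value left one
# decimal place and adds the new digit.

def num_value(digits):
    val = 0
    for d in digits:
        val = val * 10 + int(d)
    return val
```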
"Token" is a data unit returned by the lexical parser. Sample:
Input: Code:
str := "foo" + 'b';
Code:
IDENTIFIER
Regarding `factor->(expr)`: please try to parse this expression: `1*(2-3)` |
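To make that concrete, here is a hypothetical lexer sketch (the token names and regular expressions are my own choices, not necessarily what the poster's tool emits) that turns the raw characters of `str := "foo" + 'b';` into a token stream:

```python
import re

# Token names below (IDENTIFIER, ASSIGN, ...) are illustrative assumptions.
TOKEN_SPEC = [
    ("STRING",     r'"[^"]*"'),
    ("CHAR",       r"'[^']'"),
    ("ASSIGN",     r":="),
    ("PLUS",       r"\+"),
    ("SEMICOLON",  r";"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("SKIP",       r"\s+"),          # white space is matched, then dropped
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(src):
    # each token is a (kind, text) pair; white space is discarded,
    # as suggested earlier in the thread
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(src)
            if m.lastgroup != "SKIP"]

tokens = lex('str := "foo" + \'b\';')
# kinds: IDENTIFIER ASSIGN STRING PLUS CHAR SEMICOLON
```

Note that every pattern here is a plain regular expression - this is the sense in which a lexer lives entirely inside the regular-language world.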
Quote:
|
> But, that means that your approach is correct!
I should hope so. Do you still have any unanswered question? |
A few comments:
The scanner, or "lexer," is the input side. Its purpose is simply to take a stream of raw characters and to arrange them meaningfully into "tokens." For example, the seven consecutive characters 1234.56 would be recognized as "a floating-point number" and passed along accordingly. The lexer is also able to distinguish between '<', '<=', and '<>' and to return three individual 'tokens' accordingly, usually needing to "look ahead" only one character to do so. (The language is designed that way, from the bottom up. It is created to be "efficiently parsed" by such an engine, and then to allow for efficient code generation and the generation of efficient code. But, I digress ...)

The parser then accepts this stream of "tokens" – TK_FLOAT, TK_LT, TK_LE, TK_NE – and processes it against the "grammar." However, at some important point you must also consider how the underlying parser engine is actually implemented. The grammar that you write is transformed into a data structure that actually feeds the engine, and "the engine is built for simplicity and raw speed." When the pedal finally meets the metal, the engine isn't "recursive" at all. Usually, it amounts to (in part) an extremely clever FSM = finite-state machine. This implementation imposes certain by-design constraints upon how it actually works, and many of those decisions are careful compromises. The compiler for that grammar understands the innermost workings of that particular engine and builds its data structures (analogy: "object code") accordingly.

Of course, not all engines are the same: "bison" is 'more advanced' than "yacc" because it is specifically engineered to handle more difficult cases. But, at the end of the day, both of them are focused on being "blindingly fast." And this basically means that "they don't 'think.'" The decision paths are built into the data structure by the compiler ... and imposed upon you as the creator of the "valid" grammar that goes in.
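The FSM idea above can be made concrete with a toy scanner (a sketch of my own, reusing the illustrative token names from the paragraph): consecutive digits, with an optional '.', become one number token, and a single character of lookahead splits '<', '<=', '<>' into three distinct tokens.

```python
# Toy hand-written scanner: an explicit state machine, no recursion.

def scan(src):
    tokens, i, n = [], 0, len(src)
    while i < n:
        c = src[i]
        if c.isspace():                      # skip white space
            i += 1
        elif c.isdigit():                    # integer state
            j = i
            while j < n and src[j].isdigit():
                j += 1
            if j < n and src[j] == ".":      # saw '.', move to fraction state
                j += 1
                while j < n and src[j].isdigit():
                    j += 1
                tokens.append(("TK_FLOAT", src[i:j]))
            else:
                tokens.append(("TK_INT", src[i:j]))
            i = j
        elif c == "<":                       # one character of lookahead
            nxt = src[i + 1] if i + 1 < n else ""
            if nxt == "=":
                tokens.append(("TK_LE", "<=")); i += 2
            elif nxt == ">":
                tokens.append(("TK_NE", "<>")); i += 2
            else:
                tokens.append(("TK_LT", "<")); i += 1
        else:
            raise ValueError(f"unexpected character {c!r}")
    return tokens

kinds = [k for k, _ in scan("1234.56 <= 7 <> 8 < 9")]
```

Every decision here depends only on the current state and at most one character ahead, which is exactly the constraint that keeps such an engine "blindingly fast."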
It is because of the underlying nature of the engine that various grammars and their grammar-compilers also differ. I would encourage you to "dumpster-dive into the actual source code" of the "compiler creator" of your choice – since you have it – and to read some of the many articles (and books) that have been written to describe why(!) it works. What were the issues that prompted the creators of these tools to write them in this particular way? They were building tools to solve an extremely practical and commonly-encountered problem. But, from their technical point of view, what was 'that problem?' Exactly? |
Quote:
If correct, the reason must be simplicity; but I still request some example-based reasoning to show why the weaker language class (i.e., regular vs. CFG) has been chosen for the lexer. In fact, it is stated that the gcc C compiler has a hand-built lexer, rather than one generated by a tool such as Lex, for reasons of efficiency; but the gcc C lexer is still based on a regular grammar. I suspect the gcc C lexer is built on a regular grammar because it works in a closed loop with its parser, which demands one token at a time. But I have been unable to find any literature on this. Quote:
Quote:
It seems the book by Holub might point in this direction, as it has a code implementation; but I have to search. If you know of a source, kindly point it out. Quote:
Quote:
I have even got from Delnet a very old book titled Pascal - The Language and its Implementation, by D. W. Barron. I have most of the books in the library, but it seems none reaches any of the topics stated by you. The gcc C compiler related statements above are courtesy of some answers on Stack Exchange, hence the depth is missing. Quote:
|
I long ago came to the conclusion that authors who write books about compilers don't know how to write. :) Many of them are, quite frankly, terrible.
|
Quote:
So, I should start with the yacc/bison source code. Any direction on what to put on a checklist for such code, and in what order - particularly for checking how the parser is implemented, or how the lexer is implemented, and the reason why it is implemented that way? Also efficiency results based on a particular implementation of parser/lexer as separate phases, or as a closed loop. But that needs literature too, to point in the right direction, and papers must exist. Also, your earlier answers are relevant here, as they raise more questions than I have asked. In particular, the last line of your earlier message, with the portion restated below: Quote:
|