LinuxQuestions.org
Old 07-30-2023, 08:23 PM   #1
ajiten
Member
 
Registered: Jun 2023
Posts: 375

Rep: Reputation: 4
Why can't a regular grammar be both left-recursive and right-recursive?


It is stated that a regular grammar can be either left-linear or right-linear, but not both.
I want to understand why.
Take the grammar below, which uses left recursion in every rule except the one deriving the non-terminal factor:
Code:
expr->expr+term | expr - term | term
term-> term*factor | term/factor | factor
factor -> digit factor | digit 
digit -> 0|1|2|3|4|5|6|7|8|9
I want to understand why parsing arithmetic expressions with the above grammar is impossible; say, for the string 2*3*5-112+2+3.

Please give some insights.

Last edited by ajiten; 07-30-2023 at 10:48 PM.
 
Old 07-30-2023, 11:18 PM   #2
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,869
Blog Entries: 1

Rep: Reputation: 1870
Exactly what is impossible and why is it impossible?
My only concern is that lexical parsing should be a separate process:
Code:
expr->expr + term | expr - term | term
term-> term * factor | term / factor | factor
factor -> ( expr ) | NUMBER

# Lexical parser returns NUMBER + - * / ( )
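For illustration, here is a rough sketch (my own Python, names invented for this post) of such a separate lexical pass: it turns the raw characters into NUMBER and operator tokens, skipping white-space, so the grammar above never has to look at individual digits.

```python
# Minimal sketch of a separate lexer for the grammar above.
# Returns a list of (kind, value) pairs; NUMBER carries the full integer.
def tokenize(text):
    tokens, i = [], 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1                       # white-space separates tokens only
        elif c.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1                   # consume the whole digit run
            tokens.append(("NUMBER", int(text[i:j])))
            i = j
        elif c in "+-*/()":
            tokens.append((c, c))        # operators are their own token kind
            i += 1
        else:
            raise ValueError(f"unexpected character: {c!r}")
    return tokens

print(tokenize("200 + 300 * 50"))
# [('NUMBER', 200), ('+', '+'), ('NUMBER', 300), ('*', '*'), ('NUMBER', 50)]
```

The parser then only ever sees NUMBER as an atom, which is exactly the point of splitting lexing out as its own phase.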

Last edited by NevemTeve; 07-30-2023 at 11:19 PM.
 
1 member found this post helpful.
Old 07-31-2023, 04:14 AM   #3
ajiten
Member
 
Registered: Jun 2023
Posts: 375

Original Poster
Rep: Reputation: 4
Quote:
Originally Posted by NevemTeve View Post
Exactly what is impossible and why is it impossible?
My only concern is that lexical parsing should be a separate process:
Code:
expr->expr + term | expr - term | term
term-> term * factor | term / factor | factor
factor -> ( expr ) | NUMBER

# Lexical parser returns NUMBER + - * / ( )
Thanks, but you seem to be saying that if a regular grammar is used for the parser, then it can be both left-linear and right-linear; and that it is only a theoretical matter whether such a regular grammar is left-linear or right-linear.

But the doubt still lingers: why do books state that a regular grammar must be either left-linear or right-linear, and not both?
 
Old 07-31-2023, 04:39 AM   #4
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,869
Blog Entries: 1

Rep: Reputation: 1870
I don't know what you mean by 'linear', but I do think you should actually start creating example programs with bison:
https://www.gnu.org/software/bison/manual/bison.html
 
Old 07-31-2023, 10:45 AM   #5
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,269
Blog Entries: 24

Rep: Reputation: 4196
I, too, am confused by your terms: what exactly do you mean by linear? And are you asking specifically about a "regular" grammar, that is, a Type 3 grammar in the Chomsky hierarchy? All of which leads me to think that you may be confusing yourself as well, and using an example which is not applicable to the question.

Taking just the question itself:

Quote:
Why a regular grammar cannot be both left-recursive, and right-recursive?
By definition a regular grammar, that is a Type 3 grammar, cannot "remember" things that have gone before, which means that the RHS may contain at most one non-terminal, and if present it must come at the end. This means that there may be only two types of rules in a regular grammar (from Parsing Techniques: A Practical Guide, Dick Grune & Ceriel J. H. Jacobs, 1990, pg. 38):

Quote:
A non-terminal produces zero or more terminals
A non-terminal produces zero or more terminals followed by one non-terminal
This definition precludes left-recursion, so left-recursion is impossible in such a regular grammar.
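To make those two rule shapes concrete, take a toy right-linear grammar of my own for digit strings: DIGITS -> digit DIGITS | digit. Because the lone non-terminal sits at the end, a recognizer never needs to remember anything about what it has already consumed; it reduces to a single left-to-right scan, which is exactly what a finite automaton can do:

```python
# Toy illustration: recognizing the right-linear grammar
#   DIGITS -> digit DIGITS | digit
# needs no memory of earlier input, only a left-to-right scan.
def is_digits(s):
    if not s:                 # both rules require at least one digit
        return False
    for c in s:               # "digit DIGITS": consume one digit, continue
        if not c.isdigit():
            return False
    return True               # ended on the "digit" base case

print(is_digits("112"))   # True
print(is_digits("11a2"))  # False
```

A left-recursive rule such as expr -> expr + term has no such one-pass reading, which is why it falls outside the Type 3 rule shapes quoted above.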

I agree that you should be exploring this by writing your own examples using GNU Bison, and be precise about your terms. It may be that other authors use other terms, but you should still make an effort to define their meaning precisely, not assuming that others have read the same material.

Last edited by astrogeek; 07-31-2023 at 12:28 PM.
 
Old 08-06-2023, 01:44 PM   #6
ajiten
Member
 
Registered: Jun 2023
Posts: 375

Original Poster
Rep: Reputation: 4
Why is it valid for a parser to break a token into smaller units?

Reconsidering the grammar above, for a different reason.
For the arithmetic expression:
Code:
200 + 300 * 50 / 2 * 5 - 3 + 4
The rule for the non-terminal factor will break the tokens 200, 300, and 50 into smaller parts, which is somewhat confusing, since a token is considered an atom by the lexer.

Code:
expr->expr+term | expr - term | term
term-> term*factor | term/factor | factor
factor -> digit factor | digit 
digit -> 0|1|2|3|4|5|6|7|8|9
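To show what I mean, a small sketch (my own code) of how the rule factor -> digit factor | digit consumes a multi-digit token such as 200 one character at a time:

```python
# Mirrors "factor -> digit factor | digit": each recursive call
# consumes one digit, so the whole digit run is eaten character
# by character rather than as a single NUMBER token.
def match_factor(s, i=0):
    """Return the index just past the digit run starting at i."""
    if i < len(s) and s[i].isdigit():
        return match_factor(s, i + 1)   # the "digit factor" alternative
    return i                            # end of the run ("digit" base case)

print(match_factor("200"))  # 3 -- all three characters consumed by factor
```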
 
Old 08-06-2023, 02:08 PM   #7
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,869
Blog Entries: 1

Rep: Reputation: 1870
I might have mentioned earlier that you should have a lexical parser that returns a sequence of consecutive digits as a single unit (and also skips the white-space between tokens), so it should be like this:

Code:
expr -> expr + term | expr - term | term
term -> term * factor | term / factor | factor
factor -> NUMBER | ( expr )

Last edited by NevemTeve; 08-06-2023 at 02:10 PM.
 
1 member found this post helpful.
Old 08-06-2023, 03:58 PM   #8
ajiten
Member
 
Registered: Jun 2023
Posts: 375

Original Poster
Rep: Reputation: 4
Quote:
Originally Posted by NevemTeve View Post
I might have mentioned earlier that you should have a lexical parser that returns a sequence of consecutive digits as a single unit (and also skips the white-space between tokens), so it should be like this:

Code:
expr -> expr + term | expr - term | term
term -> term * factor | term / factor | factor
factor -> NUMBER | ( expr )
Thanks, but can you give any hint toward coding this lexical parser, in which NUMBER is both a token and a terminal, and hence is taken directly without any need to break it down further?

Also, it is unclear why the last alternative is needed; i.e.
Code:
factor -> (expr)
=============================================

Edit:
Have found here a similar way to split an integer number into parts based on a recursive rule.

Though the Rosetta Code example does not break the number apart.

Last edited by ajiten; 08-09-2023 at 11:39 PM.
 
Old 08-06-2023, 11:40 PM   #9
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,869
Blog Entries: 1

Rep: Reputation: 1870
"Token" is a data unit returned by the lexical parser. Sample:
Input:
Code:
str := "foo" + 'b';
Tokens:
Code:
IDENTIFIER
ASSIGN
STRINGLITERAL
+
CHARLITERAL
;
https://en.wikipedia.org/wiki/Lexical_analysis

Regarding `factor->(expr)`: please try to parse this expression: `1*(2-3)`
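To spell out the hint, here is a rough sketch (my own Python, not production code) of a recursive-descent evaluator for the grammar above, with the left recursion rewritten as loops so plain top-down parsing works. The `factor -> ( expr )` alternative is what lets `1*(2-3)` parse at all:

```python
# Recursive-descent evaluator for:
#   expr   -> expr + term | expr - term | term   (rewritten as a loop)
#   term   -> term * factor | term / factor | factor
#   factor -> NUMBER | ( expr )
def evaluate(text):
    # tiny inline lexer: whole numbers plus single-character operators
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
        elif text[i].isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(int(text[i:j])); i = j
        else:
            tokens.append(text[i]); i += 1

    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():                 # factor -> NUMBER | ( expr )
        nonlocal pos
        if peek() == "(":
            pos += 1              # consume "("
            v = expr()
            pos += 1              # consume ")"
            return v
        v = peek(); pos += 1      # NUMBER
        return v

    def term():                   # left recursion handled as iteration
        nonlocal pos
        v = factor()
        while peek() in ("*", "/"):
            op = peek(); pos += 1
            v = v * factor() if op == "*" else v / factor()
        return v

    def expr():
        nonlocal pos
        v = term()
        while peek() in ("+", "-"):
            op = peek(); pos += 1
            v = v + term() if op == "+" else v - term()
        return v

    return expr()

print(evaluate("1*(2-3)"))  # -1
```

Without the parenthesis alternative, factor would have nowhere to send the parser when it meets "(", and the expression could not be parsed.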

Last edited by NevemTeve; 08-06-2023 at 11:48 PM.
 
Old 08-06-2023, 11:43 PM   #10
ajiten
Member
 
Registered: Jun 2023
Posts: 375

Original Poster
Rep: Reputation: 4
Quote:
Originally Posted by NevemTeve View Post
"Token" is a data unit returned by the lexical parser. Sample:
Input:
Code:
str := "foo" + 'b';
Tokens:
Code:
IDENTIFIER
ASSIGN
STRINGLITERAL
+
CHARLITERAL
;
https://en.wikipedia.org/wiki/Lexical_analysis
But that means that your (& Rosetta Code) approach is correct!
 
Old 08-07-2023, 01:36 AM   #11
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,869
Blog Entries: 1

Rep: Reputation: 1870
> But, that means that your approach is correct!

I should hope so. Do you still have any unanswered question?
 
1 member found this post helpful.
Old 08-10-2023, 08:58 AM   #12
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,671
Blog Entries: 4

Rep: Reputation: 3945
A few comments:

The scanner, or "lexer," is the input side. Its purpose is simply to take a stream of raw characters and arrange them meaningfully into "tokens." For example, the seven consecutive characters 1234.56 would be recognized as "a floating-point number" and passed along accordingly. The lexer is also able to distinguish between '<', '<=', and '<>' and to return three individual 'tokens' accordingly, usually needing to "look ahead" only one character to do so.
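That one-character lookahead can be sketched in a few lines (my own Python, token names invented to match the style used below):

```python
# One-character lookahead: after seeing '<', peek at the next
# character to decide between three distinct tokens.
def scan_lt(text, i):
    """Assumes text[i] == '<'; returns (token_name, next_index)."""
    nxt = text[i + 1] if i + 1 < len(text) else ""
    if nxt == "=":
        return "TK_LE", i + 2     # '<='
    if nxt == ">":
        return "TK_NE", i + 2     # '<>'
    return "TK_LT", i + 1         # plain '<'

print(scan_lt("a <= b", 2))  # ('TK_LE', 4)
```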

(The language is designed that way, from the bottom up. It is created to be "efficiently parsed" by such an engine. And then, to allow for efficient code generation and the generation of efficient code. But, I digress ...)

The parser then accepts this stream of "tokens" – TK_FLOAT, TK_LT, TK_LE, TK_NE – and processes it against the "grammar."

However, at some important point you must also consider how the underlying parser engine is actually implemented. The grammar that you write is transformed into a data structure that actually feeds the engine, and "the engine is built for simplicity and raw speed." When the pedal finally meets the metal, the engine isn't "recursive" at all. Usually, it amounts to (in part) an extremely clever FSM = finite-state machine. This implementation imposes certain by-design constraints upon how it actually works, and many of those decisions are careful compromises.
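A toy illustration of that table-driven idea (mine, and vastly simpler than bison's actual machinery): the "grammar" is compiled down into a transition table, and the engine is a dumb loop that just follows it, with no recursion and no thinking at run time.

```python
# A finite-state machine recognizing integer tokens, driven purely
# by a precomputed table -- the engine itself never "decides" anything.
# States: 0 = start, 1 = inside a number; ACCEPT marks accepting states.
TABLE = {
    (0, "digit"): 1,
    (1, "digit"): 1,
}
ACCEPT = {1}

def accepts_number(s):
    state = 0
    for c in s:
        kind = "digit" if c.isdigit() else "other"
        state = TABLE.get((state, kind))
        if state is None:          # no table entry: reject immediately
            return False
    return state in ACCEPT

print(accepts_number("1234"))  # True
print(accepts_number("12a4"))  # False
```

Real parser generators build far larger tables (LALR action/goto tables in bison's case), but the run-time loop is just as mechanical as this one.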

The compiler for that grammar understands the innermost workings of that particular engine and builds its data structures (similarity: "object code") accordingly. Of course, not all engines are the same: "bison" is 'more advanced' than "yacc" because it is specifically engineered to handle more difficult cases. But, at the end of the day, both of them are focused on being "blindingly fast." And this basically means that "they don't 'think.'" The decision paths are built into the data structure by the compiler ... and imposed upon you as the creator of the "valid" grammar that goes in. It is because of the underlying nature of the engine that various grammars and their grammar-compilers also differ.

I would encourage you to "dumpster-dive into the actual source code" of the "compiler creator" of your choice – since you have it – and to read some of the many articles (and books) that have been written to describe why(!) it works. What were the issues that prompted the creators of these tools to write them in this particular way? They were building tools to solve an extremely practical and commonly-encountered problem. But, from their technical point of view, what was 'that problem?' Exactly?

Last edited by sundialsvcs; 08-10-2023 at 09:06 PM.
 
1 member found this post helpful.
Old 08-11-2023, 08:38 AM   #13
ajiten
Member
 
Registered: Jun 2023
Posts: 375

Original Poster
Rep: Reputation: 4
Quote:
Originally Posted by sundialsvcs View Post
... Its purpose is simply to take a stream of raw characters and to distinguish between '<' '<=' '<>' and to return three individual 'tokens' accordingly, usually needing to "look ahead" only one character to do so.
The lexer-generator for the C language is built on a regular grammar, though the same tool for some other languages is based on a CFG.
If correct, the reason must be simplicity; but I still request some example-based reasoning to show why the weaker formalism (i.e., regular vs. CFG) has been chosen for the lexer. In fact, it is stated that the gcc C compiler has a hand-built lexer, rather than one generated by a tool such as Lex, for reasons of efficiency; but the gcc C lexer is still based on a regular grammar.
I suspect the gcc C lexer is built on a regular grammar because it works in a closed loop with its parser, which demands one token at a time.
But I am unable to find any literature on this.

Quote:
(The language is designed that way, from the bottom up.
Kindly elaborate, or provide a link, for what is meant by 'bottom up'.

Quote:
It is created to be "efficiently parsed" by such an engine. And then, to allow for efficient code generation and the generation of efficient code. But, I digress ...)

The parser then accepts this stream of "tokens" – TK_FLOAT, TK_LT, TK_LE, TK_NE – and processes it against the "grammar."

However, at some important point you must also consider how the underlying parser engine is actually implemented. The grammar that you write is transformed into a data structure that actually feeds the engine, and "the engine is built for simplicity and raw speed." When the pedal finally meets the metal, the engine isn't "recursive" at all.

Usually, it amounts to (in part) an extremely clever FSM = finite-state machine. This implementation imposes certain by-design constraints upon how it actually works, and many of those decisions are careful compromises.
Please give an example or reference showing that the parser, instead of being recursive (hence CFG-based), is regular.

The book by Holub might point in this direction, as it includes a code implementation; but I have to search.
If you know of a source, kindly point it out.


Quote:
The compiler for that grammar understands the innermost workings of that particular engine and builds its data structures (similarity: "object code") accordingly. Of course, not all engines are the same: "bison" is 'more advanced' than "yacc" because it is specifically engineered to handle more difficult cases. But, at the end of the day, both of them are focused on being "blindingly fast." And this basically means that "they don't 'think.'" The decision paths are built into the data structure by the compiler ... and imposed upon you as the creator of the "valid" grammar that goes in. It is because of the underlying nature of the engine that various grammars and their grammar-compilers also differ.
So it seems that "bison"/"yacc" itself is separate from the data structures in which the decision paths are implemented.


Quote:
I would encourage you to "dumpster-dive into the actual source code" of the "compiler creator" of your choice – since you have it – and to read some of the many articles (and books) that have been written to describe why(!) it works.
Request book and article names; even 1-2 should suffice.
Have got from Delnet even a very old book titled Pascal - The Language and its Implementation, by D. W. Barron.
Have most of the books in the library, but it seems none reaches any of the topics stated by you.
The gcc C compiler related statements above are courtesy of some answers on Stack Exchange, hence depth is missing.

Quote:
What were the issues that prompted the creators of these tools to write them in this particular way? They were building tools to solve an extremely practical and commonly-encountered problem. But, from their technical point of view, what was 'that problem?' Exactly?
It depends on sources, and it seems I have none, in spite of googling.

Last edited by ajiten; 08-11-2023 at 08:44 AM.
 
Old 08-11-2023, 08:53 AM   #14
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,671
Blog Entries: 4

Rep: Reputation: 3945
I have long ago come to the conclusion that authors who write books about compilers don't know how to write. Many of them are very frankly terrible.
 
Old 08-11-2023, 06:32 PM   #15
ajiten
Member
 
Registered: Jun 2023
Posts: 375

Original Poster
Rep: Reputation: 4
Quote:
Originally Posted by sundialsvcs View Post
I have long ago come to the conclusion that authors who write books about compilers don't know how to write. Many of them are very frankly terrible.
Then some paper, or some open-source compiler code, should be the clue.
So I should start with the yacc/bison source code.
Any direction on what to look for in such code, and in what order; particularly for checking how the parser is implemented, or how the lexer is implemented.
Also, the reason why it is implemented that way.
Efficiency results based on a particular implementation of the parser/lexer as separate phases, or as a closed loop.
But that needs literature too, to point in the right direction, and papers must exist.
Also, your earlier answers are relevant here, as they raise more questions than I have asked.
In particular, the last lines of your earlier message, with the portion restated below:
Quote:
I would encourage you to "dumpster-dive into the actual source code" of the "compiler creator" of your choice – since you have it – and to read some of the many articles (and books) that have been written to describe why(!) it works. What were the issues that prompted the creators of these tools to write them in this particular way? They were building tools to solve an extremely practical and commonly-encountered problem. But, from their technical point of view, what was 'that problem?' Exactly?
The answer cannot be found just from understanding stand-alone snippets of the source code. Literature is needed, though that literature must itself be derived from coding.

Last edited by ajiten; 08-11-2023 at 07:56 PM.
 
  

