Why a regular grammar cannot be both left-recursive, and right-recursive?

LinuxQuestions.org

Help answer threads with 0 replies.

		LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
Why a regular grammar cannot be both left-recursive, and right-recursive?

Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Welcome to LinuxQuestions.org, a friendly and active Linux Community.

You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!

Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.

Are you new to LinuxQuestions.org? Visit the following links:
Site Howto | Site FAQ | Sitemap | Register Now

If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.

Having a problem logging in? Please visit this page to clear all LQ-related cookies.

Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.

Exclusive for LQ members, get up to 45% off per month. Click here for more info.

Reply

Page 1 of 2

1

Search this Thread

07-30-2023, 08:23 PM	#1
ajiten Member Registered: Jun 2023 Posts: 375 Rep:	Why a regular grammar cannot be both left-recursive, and right-recursive? It is stated that a regular grammar can be either left-linear, or right-linear; but not both. Want to understand why? Let us take the below grammar, which is having left recursion, in all except the rule for derivation of the non-terminal symbol: factor: Code: expr->expr+term \| expr - term \| term term-> termfactor \| term/factor \| factor factor -> digit factor \| digit digit -> 0\|1\|2\|3\|4\|5\|6\|7\|8\|9 Want to understand why the parse of arithmetic expressions, as per the above grammar is impossible; say for the string: 235-112+2+3. Please give some insights. Last edited by ajiten; 07-30-2023 at 10:48 PM.*

07-30-2023, 11:18 PM	#2
NevemTeve Senior Member Registered: Oct 2011 Location: Budapest Distribution: Debian/GNU/Linux, AIX Posts: 4,869 Blog Entries: 1 Rep:	Exactly what is impossible and why is it impossible? My only concern is that lexical parsing should be a separate process: Code: expr->expr + term \| expr - term \| term term-> term * factor \| term / factor \| factor factor -> ( expr ) \| NUMBER # Lexical parser returns NUMBER + - * / ( ) Last edited by NevemTeve; 07-30-2023 at 11:19 PM.
	1 members found this post helpful.

Old

07-31-2023, 04:14 AM

#3

ajiten

Member

Registered: Jun 2023

Posts: 375

Original Poster

Rep:

Reputation: 4

Quote:

Originally Posted by NevemTeve

View Post

Exactly what is impossible and why is it impossible?
My only concern is that lexical parsing should be a separate process:

Code:

expr->expr + term | expr - term | term
term-> term * factor | term / factor | factor
factor -> ( expr ) | NUMBER

# Lexical parser returns NUMBER + - * / ( )

Thanks, but seems you mean to say that for the parser, if regular grammar is used; then it can be both left-linear and right-linear.
And, it is just a theoretical issue, that such a regular grammar, be either left-linear grammar, or right-linear one.

But, still the doubt lingers on, as why books state that the regular grammar be either left-linear/right-linear, not both.

07-31-2023, 04:39 AM	#4
NevemTeve Senior Member Registered: Oct 2011 Location: Budapest Distribution: Debian/GNU/Linux, AIX Posts: 4,869 Blog Entries: 1 Rep:	Don't know what do you mean 'linear', but I do think you should actually start creating example-programs with bison. https://www.gnu.org/software/bison/manual/bison.html

Old

07-31-2023, 10:45 AM

#5

astrogeek

Moderator

Registered: Oct 2008

Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}

Posts: 6,269

Blog Entries: 24

Rep:

Reputation: 4196

Reputation: 4196

Reputation: 4196

Reputation: 4196

Reputation: 4196

Reputation: 4196

Reputation: 4196

Reputation: 4196

Reputation: 4196

Reputation: 4196

Reputation: 4196

I too, am confused by your terms - what exactly do you mean by linear? And are you asking specifically about a "regular" grammar, that is a Type 3 grammar in the Chomsky hierarchy? All of which leads me to think that you may be confusing yourself as well and using an example which is not applicable to the question.

Taking just the question itself:

Quote:

Why a regular grammar cannot be both left-recursive, and right-recursive?

By definition a regular grammar, that is a Type 3 grammar, cannot "remember" things that have gone before, which means that the RHS may only contain one non-terminal and if present it must come at the end. This means that there may be only two types of rules in a regular grammar (From Parsing Technques A Practical Guide, Dick Grune & Ceriel J. H. Jacobs, 1990, pg. 38):

Quote:

A non-terminal produces zero or more terminals
A non-terminal produces zero or more terminals followed by one non-terminal

This definition precludes left-recursion, so left-recursion is impossible in such a regular grammar.

I agree that you should be exploring examples by writing your own examples using GNU Bison - and be precise about your terms. It may be that other authors use other terms, but you should still make effort to define their meaning more precisely, not assuming that others have read the same material.

Last edited by astrogeek; 07-31-2023 at 12:28 PM.

08-06-2023, 01:44 PM	#6
ajiten Member Registered: Jun 2023 Posts: 375 Original Poster Rep:	Why it is valid for parser to break a token into units. Reconsidering the grammar above, for a different reason. For the arithmetic expression: Code: 200 + 300 * 50 / 2 * 5 - 3 + 4 The rule for NT factor, will break the tokens: 200, 300, 50 into further smaller parts, which is somewhat confusing; as a token is considered as an atom for lexer. Code: expr->expr+term \| expr - term \| term term-> term*factor \| term/factor \| factor factor -> digit factor \| digit digit -> 0\|1\|2\|3\|4\|5\|6\|7\|8\|9

08-06-2023, 02:08 PM	#7
NevemTeve Senior Member Registered: Oct 2011 Location: Budapest Distribution: Debian/GNU/Linux, AIX Posts: 4,869 Blog Entries: 1 Rep:	I might have mentioned earlier that you should have a lexical parser that returns a sequence of consecutive digits as a unit (and also removes the white-spaces between tokens) so it should be like this: Code: expr -> expr + term \| expr - term \| term term -> term * factor \| term / factor \| factor factor -> NUMBER \| ( expr ) Last edited by NevemTeve; 08-06-2023 at 02:10 PM.
	1 members found this post helpful.

Old

08-06-2023, 03:58 PM

#8

ajiten

Member

Registered: Jun 2023

Posts: 375

Original Poster

Rep:

Reputation: 4

Quote:

Originally Posted by NevemTeve

View Post

I might have mentioned earlier that you should have a lexical parser that returns a sequence of consecutive digits as a unit (and also removes the white-spaces between tokens) so it should be like this:

Code:

expr -> expr + term | expr - term | term
term -> term * factor | term / factor | factor
factor -> NUMBER | ( expr )

Thanks, but any hint towards coding this lexical parser, where the NUMBER is both a token, and a terminal; hence is taken directly without any need for breaking it further.

Also, why the last rule is needed, is unclear; i.e.

Code:

factor -> (expr)

=============================================

Edit:
Have found here a similar way to split an integer number into parts based on recursive rule, into parts.

Though the rosetta code does not break it.

Last edited by ajiten; 08-09-2023 at 11:39 PM.

08-06-2023, 11:40 PM	#9
NevemTeve Senior Member Registered: Oct 2011 Location: Budapest Distribution: Debian/GNU/Linux, AIX Posts: 4,869 Blog Entries: 1 Rep:	"Token" is a data unit returned by the lexical parser. Sample: Input: Code: str := "foo" + 'b'; Tokens: Code: IDENTIFIER ASSIGN STRINGLITERAL + CHARLITERAL ; https://en.wikipedia.org/wiki/Lexical_analysis Regarding `factor->(expr)`: please try to parse this expression: `1(2-3)` Last edited by NevemTeve; 08-06-2023 at 11:48 PM.*

Old

08-06-2023, 11:43 PM

#10

ajiten

Member

Registered: Jun 2023

Posts: 375

Original Poster

Rep:

Reputation: 4

Quote:

Originally Posted by NevemTeve

View Post

"Token" is a data unit returned by the lexical parser. Sample:
Input:

Code:

str := "foo" + 'b';

Tokens:

Code:

IDENTIFIER
ASSIGN
STRINGLITERAL
+
CHARLITERAL
;

https://en.wikipedia.org/wiki/Lexical_analysis

But, that means that your (& rosetta code) approach is correct!

08-07-2023, 01:36 AM	#11
NevemTeve Senior Member Registered: Oct 2011 Location: Budapest Distribution: Debian/GNU/Linux, AIX Posts: 4,869 Blog Entries: 1 Rep:	> But, that means that your approach is correct! I should hope so. Do you still have any unanswered question?
	1 members found this post helpful.

08-10-2023, 08:58 AM	#12
sundialsvcs LQ Guru Registered: Feb 2004 Location: SE Tennessee, USA Distribution: Gentoo, LFS Posts: 10,671 Blog Entries: 4 Rep:	A few comments: The scanner, or "lexer," is the input side. Its purpose is simply to take a stream of raw characters and to arrange them meaingfully into "tokens." For example, the seven consecutive characters 1234.56 would be recognized as "a floating-point number" and passed along accordingly. The lexer is also able to distinguish between '<' '<=' '<>' and to return three individual 'tokens' accordingly, usually needing to "look ahead" only one character to do so. (The language is designed that way, from the bottom up. It is created to be "efficiently parsed" by such an engine. And then, to allow for efficient code generation and the generation of efficient code. But, I digress ...) The parser then accepts this stream of "tokens" – TK_FLOAT, TK_LT, TK_LE, TK_NE – and processes it against the "grammar." However, at some important point you must also consider how the underlying parser engine is actually implemented. The grammar that you write is transformed into a data structure that actually feeds the engine, and "the engine is built for simplicity and raw speed." When the pedal finally meets the metal, the engine isn't "recursive" at all. Usually, it amounts to (in part) an extremely clever FSM = finite-state machine. This implementation imposes certain by-design constraints upon how it actually works, and many of those decisions are careful compromises. The compiler for that grammar understands the innermost workings of that particular engine and builds its data structures (similarity: "object code") accordingly. Of course, not all engines are the same: "bison" is 'more advanced' than "yacc" because it is specifically engineered to handle more difficult cases. But, at the end of the day, both of them are focused on being "blindingly fast." And this basically means that "they don't 'think.'" The decision paths are built into the data structure by the compiler ... and imposed upon you as the creator of the "valid" grammar that goes in. It is because of the underlying nature of the engine that various grammars and their grammar-compilers also differ. I would encourage you to "dumpster-dive into the actual source code" of the "compiler creator" of your choice – since you have it – and to read some of the many articles (and books) that have been written to describe why(!) it works. What were the issues that prompted the creators of these tools to write them in this particular way? They were building tools to solve an extremely practical and commonly-encountered problem. But, from their technical point of view, what was 'that problem?' Exactly? Last edited by sundialsvcs; 08-10-2023 at 09:06 PM.
	1 members found this post helpful.

Old

08-11-2023, 08:38 AM

#13

ajiten

Member

Registered: Jun 2023

Posts: 375

Original Poster

Rep:

Reputation: 4

Quote:

Originally Posted by sundialsvcs

View Post

... Its purpose is simply to take a stream of raw characters and to distinguish between '<' '<=' '<>' and to return three individual 'tokens' accordingly, usually needing to "look ahead" only one character to do so.

The lexer-generator for C-language is built on regular grammar, though the same for some other tools, is based on CFG.
If correct, the reason must be the simplicity; but still request some example based reasoning to show why weaker language (i.e., regular vs cfg) has been chosen for lexer. In fact, it is stated that gcc-C compiler has hand-built lexer, rather than by a tool as Lex, for reasons of efficiency; but still gcc-C lexer is based on regular grammar.
I suspect the gcc-C lexer to be built on regular grammar, as it is based on closed-loop with its parser, that demands one token at a time.
But, unable to get any literature on this.

Quote:

(The language is designed that way, from the bottom up.

Kindly elaborate /provide link, for what is meant by: 'bottom up'.

Quote:

It is created to be "efficiently parsed" by such an engine. And then, to allow for efficient code generation and the generation of efficient code. But, I digress ...)

The parser then accepts this stream of "tokens" – TK_FLOAT, TK_LT, TK_LE, TK_NE – and processes it against the "grammar."

However, at some important point you must also consider how the underlying parser engine is actually implemented. The grammar that you write is transformed into a data structure that actually feeds the engine, and "the engine is built for simplicity and raw speed." When the pedal finally meets the metal, the engine isn't "recursive" at all.

Usually, it amounts to (in part) an extremely clever FSM = finite-state machine. This implementation imposes certain by-design constraints upon how it actually works, and many of those decisions are careful compromises.

Please give an example/reference to show that parser instead of being recursive (hence CFG), is regular.

Seems the book by Holub might point in this direction, as has code implementation; but have to search.
If a source is known by you, kindly point out.

Quote:

The compiler for that grammar understands the innermost workings of that particular engine and builds its data structures (similarity: "object code") accordingly. Of course, not all engines are the same: "bison" is 'more advanced' than "yacc" because it is specifically engineered to handle more difficult cases. But, at the end of the day, both of them are focused on being "blindingly fast." And this basically means that "they don't 'think.'" The decision paths are built into the data structure by the compiler ... and imposed upon you as the creator of the "valid" grammar that goes in. It is because of the underlying nature of the engine that various grammars and their grammar-compilers also differ.

Seems "bison"/"yacc" is different from the data structures, in which the decision paths are implemented.

Quote:

I would encourage you to "dumpster-dive into the actual source code" of the "compiler creator" of your choice – since you have it – and to read some of the many articles (and books) that have been written to describe why(!) it works.

Request book and article names, even 1-2 should suffice.
Have got from Delnet, even a very old book titled: Pascal - The Language and it's Implementation, by: D. W. Burton.
Have most of the books in library, but seems none reaches any of the topics stated by you.
The gcc-C compiler related statements above, are courtesy of some answers on stackexchange, hence depth is missing.

Quote:

What were the issues that prompted the creators of these tools to write them in this particular way? They were building tools to solve an extremely practical and commonly-encountered problem. But, from their technical point of view, what was 'that problem?' Exactly?

It depends on sources, and seems have none, inspite of googling.

Last edited by ajiten; 08-11-2023 at 08:44 AM.

08-11-2023, 08:53 AM	#14
sundialsvcs LQ Guru Registered: Feb 2004 Location: SE Tennessee, USA Distribution: Gentoo, LFS Posts: 10,671 Blog Entries: 4 Rep:	I have long ago come to the conclusion that authors who write books about compilers don't know how to write. Many of them are very frankly terrible.

Old

08-11-2023, 06:32 PM

#15

ajiten

Member

Registered: Jun 2023

Posts: 375

Original Poster

Rep:

Reputation: 4

Quote:

Originally Posted by sundialsvcs

View Post

I have long ago come to the conclusion that authors who write books about compilers don't know how to write.

Many of them are very frankly terrible.

Then some paper, or some open-source compiler code, should be the clue.
So, should start with yacc/bison source code.
Any direction for what to make as a list to check for, in such code, in what order, particularly for checking how the parser is implemented, or how the lexer is implemented.
Also, the reason why it is implemented that way.
Efficiency results based on particular implementation of parser/lexer as seperate phases, or closed-loop.
But, that needs literature too, to point in the right direction, and papers must be there.
Also, your earlier answers are relevant here as demand more questions, than have raised here
In particular, the last line, of your earlier message, with portion restated below:

Quote:

I would encourage you to "dumpster-dive into the actual source code" of the "compiler creator" of your choice – since you have it – and to read some of the many articles (and books) that have been written to describe why(!) it works. What were the issues that prompted the creators of these tools to write them in this particular way? They were building tools to solve an extremely practical and commonly-encountered problem. But, from their technical point of view, what was 'that problem?' Exactly?

The answer cannot be found, just by source code's snippets' ( can run stand-alone) understanding. Need literature, though literature must be derived by coding.

Last edited by ajiten; 08-11-2023 at 07:56 PM.

Reply

Page 1 of 2

1

Posting Rules
You may not post new threads You may not post replies You may not post attachments You may not edit your posts BB code is On Smilies are On [IMG] code is Off HTML code is Off Forum Rules

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
Change of associativity for a given recursive grammar.	ajiten	Programming	21	08-05-2023 09:52 PM
evdev left and right mouse buttons are both left clicks	ajb_62	Linux - Newbie	2	03-08-2021 06:14 PM
Volume levels always go lower on either the right or left side (usually the left)	chunchunmaru	Linux Mint	1	12-11-2020 08:53 AM
Computer detecting right-click as left-click, left-click as left-click and middle with 2 fingers pressed as right-click	Festerdam	Linux - Newbie	5	06-19-2017 05:41 PM
scroll left goes right and right goes left	adamruss	Linux - Hardware	2	08-17-2007 11:31 AM

All times are GMT -5. The time now is 02:01 PM.

Main Menu

Advertisement

My LQ

Write for LQ

LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.

Main Menu

Syndicate

Latest Threads

LQ News

Twitter: @linuxquestions

Open Source Consulting | Domain Registration