LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 07-06-2023, 11:45 PM   #1
ajiten
Member
 
Registered: Jun 2023
Posts: 375

Rep: Reputation: 4
Unable to understand why the given Lex program fails to recognize the given input string.


The problem is stated below, based on the contents here: https://www.ibm.com/docs/en/zos/2.4....uity-lookahead

It states a Lex program may be ambiguous, if some particular string may match more than one translation expression.
If the input matches more than one expression, Lex uses the following rules to determine which action to take:
1. The rule that matches the longest possible input stream is preferred.
2. If more than one rule matches an input of the same length, the rule that appears first in the translations section is preferred.


It states that the below fragment of lex program, would not be able to recognize the input string: ‘abbb9’, due to the second translation expression never being reached.
This is stated to be due to the above rule (not clear which of the two, the author implies):

letter [[:lower:]]
%%
a({letter})* { return('A'); }
ab({letter})* { return('B'); }

It states that by the swapping the sequence of the two translation expressions, the given string would be recognized.

Last edited by ajiten; 07-06-2023 at 11:58 PM.
 
Old 07-07-2023, 12:13 AM   #2
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,866
Blog Entries: 1

Rep: Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869
You might have missed this part:
In this example, rule 1 is not sufficient, as, for example, the string abb9 applies to either action; therefore, by rule 2, the first matching action should apply.

Note: it is unlucky that 'rule' is used in two meaning, maybe it could be rephrased as:

1. The pattern that matches the longest possible input stream is preferred.
2. If more than one pattern matches an input of the same length, the one that appears first in the translations section is preferred.

(Also '9' in 'abbb9' is not part of the pattern, guess it is meant to terminate the 'abbb' sequence.)

Last edited by NevemTeve; 07-07-2023 at 12:20 AM.
 
1 members found this post helpful.
Old 07-07-2023, 12:29 AM   #3
ajiten
Member
 
Registered: Jun 2023
Posts: 375

Original Poster
Rep: Reputation: 4
Quote:
Originally Posted by NevemTeve View Post
You might have missed this part:
In this example, rule 1 is not sufficient, as, for example, the string abb9 applies to either action; therefore, by rule 2, the first matching action should apply.

Note: it is unlucky that 'rule' is used in two meaning, maybe it could be rephrased as:

1. The pattern that matches the longest possible input stream is preferred.
2. If more than one pattern matches an input of the same length, the one that appears first in the translations section is preferred.

(Also '9' in 'abbb9' is not part of the pattern, guess it is meant to terminate the 'abbb' sequence.)
Thanks, but do you mean to say that I have wrongly interpreted that the given Lex program, would not be able to read the input: 'abbb'?
If yes, then why the second pattern is preferred by the author? What difference is there, if the first pattern is able to read the input: 'abbb'?
Also, if the second pattern is never reached by the given program, then why the grammar is ambiguous?
How the swap of the two patterns, would make the program unambiguous?

Last edited by ajiten; 07-07-2023 at 12:50 AM.
 
Old 07-07-2023, 12:49 AM   #4
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,866
Blog Entries: 1

Rep: Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869
It is no question that the program will accept `abbb`, the question is which pattern will be used. Both pattern (ab(letter)* and a(letter)*) matches the sequence (the whole), so the first one wins.

Last edited by NevemTeve; 07-07-2023 at 12:50 AM.
 
1 members found this post helpful.
Old 07-07-2023, 12:54 AM   #5
ajiten
Member
 
Registered: Jun 2023
Posts: 375

Original Poster
Rep: Reputation: 4
Quote:
Originally Posted by NevemTeve View Post
It is no question that the program will accept `abbb`, the question is which pattern will be used. Both pattern (ab(letter)* and a(letter)*) matches the sequence (the whole), so the first one wins.
But, this winning of the first pattern having translation expression, for recognizing the given input string; would make the program ambiguous, as per the author!
But, how the swap of the two translation expressions, would make the program unambiguous is unclear?
Still both translation expressions can match, and just there is no need for the second one, i.e. in the program containing the swapped translation expressions.

Last edited by ajiten; 07-07-2023 at 12:55 AM.
 
Old 07-07-2023, 01:00 AM   #6
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,866
Blog Entries: 1

Rep: Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869
Please re-read the second rule:

1. The pattern that matches the longest possible input stream is preferred.
2. If more than one pattern matches an input of the same length, the one that appears first in the translations section is preferred.
 
Old 07-07-2023, 01:16 AM   #7
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,267
Blog Entries: 24

Rep: Reputation: 4195Reputation: 4195Reputation: 4195Reputation: 4195Reputation: 4195Reputation: 4195Reputation: 4195Reputation: 4195Reputation: 4195Reputation: 4195Reputation: 4195
Quote:
Originally Posted by ajiten View Post
Also, if the second pattern is never reached by the given program, then why the grammar is ambiguous?
How the swap of the two patterns, would make the program unambiguous?
I have not read the linked article, but based on what has been said in this thread so far...

If you have a pattern that can never be reached by any matching pattern, then what is the purpose of that pattern... that is ambiguity. The program can never see a 'B' token.

If you swap the order of those patterns then patterns which match it will be returned, thus removing the ambiguity. The program can see a 'B' token.
 
Old 07-07-2023, 01:24 AM   #8
ajiten
Member
 
Registered: Jun 2023
Posts: 375

Original Poster
Rep: Reputation: 4
Quote:
Originally Posted by astrogeek View Post
I have not read the linked article, but based on what has been said in this thread so far...

If you have a pattern that can never be reached by any matching pattern, then what is the purpose of that pattern... that is ambiguity. The program can never see a 'B' token.

If you swap the order of those patterns then patterns which match it will be returned, thus removing the ambiguity. The program can see a 'B' token.
Thanks, but by swapping the matching patterns, the program would not able to return a 'A' Token.
And, that means it cannot be the basis for stating that the modified program is unambiguous.

To quote the source, it states:
In this example, rule 1 is not sufficient, as, for example, the string abb9 applies to either action; therefore, by rule 2, the first matching action should apply.

Last edited by ajiten; 07-07-2023 at 01:50 AM.
 
Old 07-07-2023, 02:14 AM   #9
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,129

Rep: Reputation: 4121Reputation: 4121Reputation: 4121Reputation: 4121Reputation: 4121Reputation: 4121Reputation: 4121Reputation: 4121Reputation: 4121Reputation: 4121Reputation: 4121
Might one ask why you are reading that particular doco ?. That is for a Unix subsystem that was added to IBMs proprietary z/OS operating system running (only) on IBM proprietary mainframe zseries hardware. Note - not z/linux.

I would image the lex code is pretty common, but it's not guaranteed - remember this is the company that also has AIX.
 
Old 07-07-2023, 03:15 AM   #10
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,866
Blog Entries: 1

Rep: Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869
> Thanks, but by swapping the matching patterns, the program would not able to return a 'A' Token.

Of course it would, try sequences "a", "aa", "acab" etc.
 
Old 07-07-2023, 05:45 AM   #11
ajiten
Member
 
Registered: Jun 2023
Posts: 375

Original Poster
Rep: Reputation: 4
Quote:
Originally Posted by NevemTeve View Post
> Thanks, but by swapping the matching patterns, the program would not able to return a 'A' Token.

Of course it would, try sequences "a", "aa", "acab" etc.
So, the document is wrong in specifying the problem, as one of ambiguity; as the problem is of specifying non-general patterns, that distinguish between different cases of inputs; rather than specifying a relatively general pattern.
An alternative way to state the same, is: the pattern should match the state diagram precisely.
So, the state diagram for the input language given by the pattern in the second rule, as:
> start node (let, n1), one input: 'a', that leads to the second node (let, n2).
n2 , in-turn, has two possible inputs: 'a', 'b'; and that leads to two new states (nodes): n3, n4; and n3/n4 nodes can have any number of 'a', or 'b', as inputs; and act as final states.

It implies that a general pattern, should not be used, instead of a particular pattern.

But, is that synonymous with ambiguity?

Last edited by ajiten; 07-07-2023 at 05:51 AM.
 
Old 07-11-2023, 12:28 AM   #12
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,866
Blog Entries: 1

Rep: Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869Reputation: 1869
Having both (1) a[a-z]* and (2) ab[a-z]* in the language is ambiguous (it is not lex-related).
(Possible solution is changing the rule (1) to a|(a[ac-z][a-z]*)

Programmatic implementations (like lex) can solve this (without changing the pattern) by defining evaluation order: (2) first, then (1)

Last edited by NevemTeve; 07-11-2023 at 12:30 AM.
 
  


Reply

Tags
compiler



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
i tried using this code for deleting a user given character from a user given string mecrazyme1234 Linux - Newbie 2 06-04-2011 04:59 PM
i tried using this code for deleting a user given character from a user given string mecrazyme1234 Programming 7 06-04-2011 11:47 AM
Haskell: Why are these ambiguous? hydraMax Programming 3 04-28-2011 12:35 AM
merge files, given its odd and even given timepassman Linux - Software 1 05-08-2008 01:17 AM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 01:30 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration