Unable to understand why the given Lex program fails to recognize the given input string.
ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
It states a Lex program may be ambiguous, if some particular string may match more than one translation expression.
If the input matches more than one expression, Lex uses the following rules to determine which action to take:
1. The rule that matches the longest possible input stream is preferred.
2. If more than one rule matches an input of the same length, the rule that appears first in the translations section is preferred.
It states that the below fragment of lex program, would not be able to recognize the input string: ‘abbb9’, due to the second translation expression never being reached.
This is stated to be due to the above rule (not clear which of the two, the author implies):
You might have missed this part: In this example, rule 1 is not sufficient, as, for example, the string abb9 applies to either action; therefore, by rule 2, the first matching action should apply.
Note: it is unlucky that 'rule' is used in two meaning, maybe it could be rephrased as:
1. The pattern that matches the longest possible input stream is preferred.
2. If more than one pattern matches an input of the same length, the one that appears first in the translations section is preferred.
(Also '9' in 'abbb9' is not part of the pattern, guess it is meant to terminate the 'abbb' sequence.)
You might have missed this part: In this example, rule 1 is not sufficient, as, for example, the string abb9 applies to either action; therefore, by rule 2, the first matching action should apply.
Note: it is unlucky that 'rule' is used in two meaning, maybe it could be rephrased as:
1. The pattern that matches the longest possible input stream is preferred.
2. If more than one pattern matches an input of the same length, the one that appears first in the translations section is preferred.
(Also '9' in 'abbb9' is not part of the pattern, guess it is meant to terminate the 'abbb' sequence.)
Thanks, but do you mean to say that I have wrongly interpreted that the given Lex program, would not be able to read the input: 'abbb'?
If yes, then why the second pattern is preferred by the author? What difference is there, if the first pattern is able to read the input: 'abbb'?
Also, if the second pattern is never reached by the given program, then why the grammar is ambiguous?
How the swap of the two patterns, would make the program unambiguous?
It is no question that the program will accept `abbb`, the question is which pattern will be used. Both pattern (ab(letter)* and a(letter)*) matches the sequence (the whole), so the first one wins.
It is no question that the program will accept `abbb`, the question is which pattern will be used. Both pattern (ab(letter)* and a(letter)*) matches the sequence (the whole), so the first one wins.
But, this winning of the first pattern having translation expression, for recognizing the given input string; would make the program ambiguous, as per the author!
But, how the swap of the two translation expressions, would make the program unambiguous is unclear?
Still both translation expressions can match, and just there is no need for the second one, i.e. in the program containing the swapped translation expressions.
1. The pattern that matches the longest possible input stream is preferred.
2. If more than one pattern matches an input of the same length, the one that appears first in the translations section is preferred.
Also, if the second pattern is never reached by the given program, then why the grammar is ambiguous?
How the swap of the two patterns, would make the program unambiguous?
I have not read the linked article, but based on what has been said in this thread so far...
If you have a pattern that can never be reached by any matching pattern, then what is the purpose of that pattern... that is ambiguity. The program can never see a 'B' token.
If you swap the order of those patterns then patterns which match it will be returned, thus removing the ambiguity. The program can see a 'B' token.
I have not read the linked article, but based on what has been said in this thread so far...
If you have a pattern that can never be reached by any matching pattern, then what is the purpose of that pattern... that is ambiguity. The program can never see a 'B' token.
If you swap the order of those patterns then patterns which match it will be returned, thus removing the ambiguity. The program can see a 'B' token.
Thanks, but by swapping the matching patterns, the program would not able to return a 'A' Token.
And, that means it cannot be the basis for stating that the modified program is unambiguous.
To quote the source, it states:
In this example, rule 1 is not sufficient, as, for example, the string abb9 applies to either action; therefore, by rule 2, the first matching action should apply.
Might one ask why you are reading that particular doco ?. That is for a Unix subsystem that was added to IBMs proprietary z/OS operating system running (only) on IBM proprietary mainframe zseries hardware. Note - not z/linux.
I would image the lex code is pretty common, but it's not guaranteed - remember this is the company that also has AIX.
> Thanks, but by swapping the matching patterns, the program would not able to return a 'A' Token.
Of course it would, try sequences "a", "aa", "acab" etc.
So, the document is wrong in specifying the problem, as one of ambiguity; as the problem is of specifying non-general patterns, that distinguish between different cases of inputs; rather than specifying a relatively general pattern.
An alternative way to state the same, is: the pattern should match the state diagram precisely.
So, the state diagram for the input language given by the pattern in the second rule, as:
> start node (let, n1), one input: 'a', that leads to the second node (let, n2).
n2 , in-turn, has two possible inputs: 'a', 'b'; and that leads to two new states (nodes): n3, n4; and n3/n4 nodes can have any number of 'a', or 'b', as inputs; and act as final states.
It implies that a general pattern, should not be used, instead of a particular pattern.
Having both (1) a[a-z]* and (2) ab[a-z]* in the language is ambiguous (it is not lex-related).
(Possible solution is changing the rule (1) to a|(a[ac-z][a-z]*)
Programmatic implementations (like lex) can solve this (without changing the pattern) by defining evaluation order: (2) first, then (1)
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.