LinuxQuestions.org - Newbie Yacc/Bison shift reduce question

- Programming (https://www.linuxquestions.org/questions/programming-9/)

- - Newbie Yacc/Bison shift reduce question (https://www.linuxquestions.org/questions/programming-9/newbie-yacc-bison-shift-reduce-question-459259/)

Newbie Yacc/Bison shift reduce question

Hi.

Since yacc/bison only has 1 token of look-ahead, I expect a shift reduce conflict in the second "STRING IDEN ..." rule in the code segment pasted below, but I don't get one. Then in a similar situation, I get a shift reduce conflict! See fprintf(stdout, ..) statements below.

What's happening?

Thanks in advance!

Code:

/* Two questions about shift reduce conflicts found in fprintf(stdout, ..)

statements.  Thanks!!  */

 

/* ..... */

  |  DOUBLE IDEN becomes OBRACK number_list CBRACK {

      myasgn.name=$2;

      myasgn.vl=$5;

      myasgn.type=VECTOR;

      addAssign(myasgn);

      gotit=1;

    }

  |  STRING IDEN becomes OBRACK string_list CBRACK {

      myasgn.name=$2;

      myasgn.vl=$5;

      myasgn.type=VECTOR;

      addAssign(myasgn);

      gotit=1;

    }  

  |  STRING IDEN becomes EXTRACT CBRACK string_list {

      fprintf(stdout, "Why is there no shift reduce conflict here?!?!\n");

      /* assignLook($3)->assign; */

    }

  |  datatype IDEN becomes EXTRACT PLUS string_list {

      fprintf(stdout, "But there is a shift reduce conflict when I do this?!\n");

    }

  ;

 

 

datatype: LONG | DOUBLE | STRING | IVEC | FVEC | VECTOR | SCALAR

  ;

 

/* ...... */

Compiler complaints:

Code:

gparmod6.y: conflicts: 5 shift/reduce

gparmod6.y:171.11-14: warning: rule never reduced because of conflicts: datatype: LONG

gparmod6.y:171.18-23: warning: rule never reduced because of conflicts: datatype: DOUBLE

gparmod6.y:171.27-32: warning: rule never reduced because of conflicts: datatype: STRING

gparmod6.y:171.36-39: warning: rule never reduced because of conflicts: datatype: IVEC

gparmod6.y:171.43-46: warning: rule never reduced because of conflicts: datatype: FVEC

Note: the conflicts dissapear, as expected, if I change
datatype IDEN becomes ....

to
datatype EXTRACT IDEN becomes ...

so that yacc can use that token of look-ahead.

Aidez-moi s'il vous plait? :)

Any ideas?

Any thoughts (even if they are guesses) would be appreciated... I haven't been able to find anything in the literature about why that "string iden.." rule wouldn't cause a conflict.

Consider what decisions the parser must make and when it is obliged to make them...

In the first case, when we see "STRING IDEN becomes," then even though there are two possible rules that we could be following at this moment in time, there's only one thing that we must do: we must shift. It is not until the next symbol appears that we have narrowed things down to just one rule (OBRACK vs. EXTRACT), and even then we're still going to continue to shift.

However... if you introduce the fourth rule, a shift/reduce conflict immediately appears. When presented with the input stream "STRING IDEN," we could either shift that symbol onto the stack and proceed, as we did before, or we could reduce STRING to an instance of DATATYPE. The situation is ambiguous: a shift/reduce conflict.

When grokking parser problems, it's really important to understand how the parsing engine actually works: what information it is presented with (this information having been generated from the grammar), what its choices are (shift .. that is to say "push" .. or reduce), and how it will make that decision. The grammar is not the actual input that drives the engine: the parsing engine is a state-machine.

What a helpful response, sundialsvcs! Thanks. If only there were karma so that I could give you some :).

You're welcome. :) I guess it always annoyed me, somehow, that parsers were made out to be so gosh-darned mysterious (and subsequently inflicted upon generations of grad-students) when they really are not.

A parser is a state-machine. In other words, it bounces from one "state" to another, doing different things based on its "state" and its "current input." This lets us build a very complicated algorithm using a very simple piece of code that's driven by a very intricate, but entirely machine-generated, table of constant numbers. All of the source-code, except those tables, fits very neatly on a couple of printed pages.
What it uses as its input is twofold: (1) a stack of symbols that it has recently seen; and (2) the "current symbol" that it is looking at right now. ("Scanners" are similar, but their input consists of characters.)
At any moment in time, a parser can do [only] one of two things: "shift" or "reduce." (Then, of course, it switches to some other state...)
Shift actually means "push." The current symbol is pushed onto the top of the stack. You do this when the current input-symbol is acceptable but, by itself, does not mean anything. The next symbol will be read, and the process begins again.
Reduce means to remove so-many symbols from the top of the stack and to replace it with just one. You do this when you "recognize" that the symbols you just removed "together, mean something significant." In this case the parser doesn't read another symbol right away, as it might perform several "reduces" in a row.
Usually, the parser gets to call some bit of C-code .. some subroutine, usually .. when it has recognized something.
The parser cannot do both a shift and a reduce at the same time. But if you write the grammar incorrectly, you can produce an ambiguous situation that would require it to do just that ... where either choice would be equally valid, and it's not possible to un-ambiguously decide what to do based on a single input-symbol. It's a fork in the road where either road is equally "correct." This ambiguous situation is called a shift/reduce conflict, and it is not allowed. You must design your grammar, and quite possibly your language, to prevent this.
If you get your jollies out of cryptic terminology like "LALR(0)," please take a moment to enjoy the zen-moment .. :jawa: .. because the previous bullet-point is all that it really means. (The terms were, of course, invented by graduate students for use in their PhD theses ... and such documents were never meant to be, you know, read by mere mortals. :) )
The parser state-machine isn't actually "walking through the grammar" while it's doing its work. Rather, it is walking through a bunch of state-tables that have been generated from that grammar. Ordinarily you do not have to look closely at their contents.
And that's it! A comparatively simple process, and certainly a fast one, that has been couched in needlessly obscure phrases and force-fed to college students. :D To quickly understand it, start with the parser source-code and work backwards. "Use the Source, Luke!"