LinuxQuestions.org

dedec0 08-15-2016 08:13 AM

Book "Compiler Construction using Flex and Bison" - problems, discussions, steps, ...
 
Hello. I started reading the book "Compiler Construction using Flex and Bison", freely available at http://research.microsoft.com/en-us/...l/compiler.pdf. Same file mirrored here: http://balacobaco.insomnia247.nl/dedec0/compiler.pdf .

There are a few errors in the book, but I was able to fix them up through chapter 3. Now, in chapter 4, I have reached a point where I cannot make the modifications the text calls for without getting an error from bison.

Here is a working bison file (it compiles with no output). The lines that cause the problem when added or changed are described in the comment near the top:

Code:

%start program

/*

 We should create this union, although it does not have more than one declaration in it:

%union {  /* SEMANTIC RECORD * /
char *id;  /* For returning identifiers * /
}

  We should change IDENTIFIER token to get the name of declared variables:

%token IDENTIFIER

  becomes:

%token <id> IDENTIFIER /* Simple identifier * /

*/

%token <id> IDENTIFIER /* Simple identifier */
%token IDENTIFIER
%token LET INTEGER INT IN
%token SKIP IF THEN ELSE FI END WHILE DO READ WRITE
%token NUMBER
%token ASSGNOP
%left '-' '+'
%left '*' '/'
%left '<' '>' '=' '' /* were missing in the book */
%right '^ '

%{

#include <stdlib.h> /* For malloc in symbol table */
#include <string.h> /* For strcmp in symbol table */
#include <stdio.h> /* For error messages */
#include "st.h" /* The Symbol Table Module */
#define YYDEBUG 1 /* for debugging */

int install( char* sym_name)
{
    symrec *s;
    s = getsym(sym_name);
    if (s == 0)
        s = putsym (sym_name);
    else
    {
        errors++;
        printf("%s is already defined\n", sym_name);
        return 0;
    }
    return 1;
}

int context_check(char* sym_name)
{
    if ( getsym( sym_name ) == 0 )
    {
        printf("%s is an undeclared identifier\n", sym_name);
        return 0;
    }
    return 1;
}


%}

%%

 /* Grammar rules and actions */

program : LET declarations IN commands END ;

declarations : /* empty */
    | INTEGER id_seq IDENTIFIER '.' { install( $3 ); }
;

id_seq : /* empty */
    | id_seq IDENTIFIER ','            { install( $2 ); }
;
commands : /* empty */
    | commands command ';'
;
command : SKIP
    | READ IDENTIFIER                    { context_check( $2 ); }
    | WRITE exp
    | IDENTIFIER ASSGNOP exp            { context_check( $2 ); }
    | IF exp THEN commands ELSE commands FI
    | WHILE exp DO commands END
;
exp : NUMBER
                            /* book said $2 for this, wrong*/
  | IDENTIFIER                            { context_check( $1 ); }
  | exp '<' exp
  | exp '=' exp
  | exp '>' exp
  | exp '+' exp
  | exp '-' exp
  | exp '*' exp
  | exp '/' exp
  | exp '^ ' exp
  | '(' exp ')'
;

%%

 /* C subroutines */

 /* no output, implied parse tree */
int main( int argc, char *argv[] )
{
    extern FILE *yyin;
    ++argv; --argc;
    yyin = fopen( argv[0], "r" );
    yydebug = 1;
    errors = 0;
    yyparse ();
    return 0;
}
int yyerror (char *s) /* called by yyparse() on errors */
{
    printf ("%s\n", s);
    return 1;
}

The file as shown above works. After making the changes described in the comment, bison reports this error:

Code:

$ bison -vd ch4.y
ch4.y:82.54-55: $2 from `command' has no declared type
$

Chapter 4 starts in PDF page 19.

I need this .y file working before I proceed to the next section, where the corresponding scanner (a flex file) is changed. Could you help me understand what is wrong and fix it?

dedec0 08-15-2016 08:15 AM

Forgot to post it. The "st.h" file is:

Code:

typedef struct symrec
{
    char *name;                    /* symbol name*/
    struct symrec *next;    /* link field */
} symrec;

symrec *sym_table = (symrec *)0;
symrec* putsym(char *);
symrec* getsym(char *);

symrec* putsym( char *sym_name)
{
    symrec *ptr;
    ptr = (symrec *) malloc( sizeof(symrec) );
    ptr->name = (char *) malloc( strlen(sym_name) + 1 );
    strcpy( ptr->name, sym_name);
    ptr->next = (symrec*) sym_table;
    sym_table = ptr;
    return ptr;
}

symrec* getsym( char* sym_name)
{
    symrec *ptr;
    for (
        ptr = sym_table;
        ptr != (symrec *) 0;
        ptr = (symrec *)ptr->next
        )
        if( strcmp( ptr->name, sym_name) == 0 )
            return ptr;
    return 0;
}


grail 08-15-2016 10:23 AM

Are you sure you have copied over all the changes as required? I noticed a reference to "exp : INT" which I do not see in your file. There could be others. I would suggest going back over all entries.

dedec0 08-15-2016 11:19 AM

Is the "exp: INT" you mention the one at the end of PDF page 21? I have changed it to INTEGER.

I had a few questions and ran into a few errors in chapters 1-3. I had to change what is written in the book to get it to compile. I have a file for each chapter with its code, and up through chapter 3 they work.

Does "INT" make sense here? Or is it a shortened "typo" for INTEGER, as I thought? And there is also the NUMBER token.

Note that, in the file above, I have added declarations for all of them (otherwise bison reports an error for an undeclared token).

In the file below I have changed all INT and NUMBER tokens to INTEGER and removed the declarations for both; the error is the same. Without the union declaration, and with the plain "%token IDENTIFIER", bison compiles it silently (which I take to mean it is fine). With the union declaration and "<id>" on IDENTIFIER, as chapter 4 adds, the error appears.

In the code below it is easy to make the changes I mentioned. Just cut and paste a few lines out of, or into, the multiline comments above each part:

Code:

%start program

 /* SEMANTIC RECORD */
 /* char *id: For returning identifiers */
 /*
Place to easily cut/paste the union declaration
%union {
char *id;
}

 */

/* Simple identifier */
 /*
Place to exchange the IDENTIFIER token declarations
%token <id> IDENTIFIER
 */
%token IDENTIFIER

%token LET IN
/* are there both, INT and INTEGER?
%token INT
%token NUMBER
*/
%token INTEGER

 /* FI was missing */
%token SKIP IF THEN ELSE FI END WHILE DO READ WRITE
    /* ASSGNOP was missing */
%token ASSGNOP
%left '-' '+'
%left '*' '/'
%left '<' '>' '=' '' /* were not in the book, need to be added */
%right '^ '

%{

#include <stdlib.h> /* For malloc in symbol table */
#include <string.h> /* For strcmp in symbol table */
#include <stdio.h> /* For error messages */
#include "st.h" /* The Symbol Table Module */
#define YYDEBUG 1 /* for debugging */

int install( char* sym_name)
{
    symrec *s;
    s = getsym(sym_name);
    if (s == 0)
        s = putsym (sym_name);
    else
    {
        errors++;
        printf("%s is already defined\n", sym_name);
        return 0;
    }
    return 1;
}

int context_check(char* sym_name)
{
    if ( getsym( sym_name ) == 0 )
    {
        printf("%s is an undeclared identifier\n", sym_name);
        return 0;
    }
    return 1;
}


%}

%%

 /* Grammar rules and actions */

program : LET declarations IN commands END ;

declarations : /* empty */
    | INTEGER id_seq IDENTIFIER '.' { install( $3 ); }
;

id_seq : /* empty */
    | id_seq IDENTIFIER ','            { install( $2 ); }
;
commands : /* empty */
    | commands command ';'
;
command : SKIP
    | READ IDENTIFIER                    { context_check( $2 ); }
    | WRITE exp
    | IDENTIFIER ASSGNOP exp            { context_check( $2 ); }
    | IF exp THEN commands ELSE commands FI
    | WHILE exp DO commands END
;
exp : INTEGER
                                    /* the book has $2 here, wrong */
  | IDENTIFIER                            { context_check( $1 ); }
  | exp '<' exp
  | exp '=' exp
  | exp '>' exp
  | exp '+' exp
  | exp '-' exp
  | exp '*' exp
  | exp '/' exp
  | exp '^ ' exp
  | '(' exp ')'
;

%%

 /* C subroutines */

/* no output, the parse tree is implicit */
int main( int argc, char *argv[] )
{
    extern FILE *yyin;
    ++argv; --argc;
    yyin = fopen( argv[0], "r" );
    yydebug = 1;
    errors = 0;
    yyparse ();
    return 0;
}
int yyerror (char *s) /* called by yyparse() on errors */
{
    printf ("%s\n", s);
    return 0;
}

Code:

$ bison -vd ch4.y
ch4.y:91.54-55: $2 from `command' has no declared type


grail 08-15-2016 11:59 AM

INT appears in your tokens. Been a while since I have played with this stuff, but I am sure one of the others will be able to help you further :)

dedec0 08-15-2016 12:22 PM

Please note: in this last post INT appears only inside a multiline comment (the comments are there to make it easy to switch between the working file and the failing one). There are no other occurrences of it.

I hope I have given enough detail that anyone can easily and quickly reproduce the problem. Thank you for your goodwill, grail. Do you have a book to recommend? This is the second one I have tried that has problems this serious.

smallpond 08-15-2016 12:26 PM

Not seeing any definition for ASSGNOP.

Also I don't think this is right:

Code:

%token <id> IDENTIFIER

It looks like it should be:

Code:

%type <id> IDENTIFIER

dedec0 08-15-2016 01:13 PM

Quote:

Originally Posted by smallpond (Post 5591167)
Not seeing any definition for ASSGNOP.

It is missing from the book, as I noted, but I added it before starting this thread. It is there on line 30 (and it is not inside a comment): %token ASSGNOP .

Quote:

Originally Posted by smallpond (Post 5591167)
Also I don't think this is right:

Code:

%token <id> IDENTIFIER

It looks like it should be:

Code:

%type <id> IDENTIFIER

Should both type and token exist? It is not clear to me (I also tried reading https://www.gnu.org/software/bison/m...emantic-Tokens). My results:

1. Without union and IDENTIFIER is a simple token: it compiles (a previous result).

2. With union declared, "%token IDENTIFIER" removed, line "%type <id> IDENTIFIER" added:

Code:

$ bison -vd ch4.y
ch4.y:18.12-21: symbol IDENTIFIER used, but not defined as a token and has no rules
ch4.y:91.54-55: $2 from `command' has no declared type

3. Continuing from attempt 2, I simply added one more line with "%token IDENTIFIER". The result is the same error as in my last long post:

Code:

$ bison -vd ch4.y
ch4.y:91.54-55: $2 from `command' has no declared type

Did I try everything?

For now I am keeping the "%type <id> IDENTIFIER" line, but the error continues.

smallpond 08-15-2016 01:24 PM

Code:

ch4.y:91.54-55: $2 from `command' has no declared type

The error says you have not fully defined the 2nd word of line 91 which it has expanded from "command".

dedec0 08-15-2016 01:56 PM

Line 91 is
Code:

command : SKIP
    | READ IDENTIFIER                    { context_check( $2 ); }
    | WRITE exp
    | IDENTIFIER ASSGNOP exp            { context_check( $2 ); } /* line 91 */
    | IF exp THEN commands ELSE commands FI
    | WHILE exp DO commands END
;

Should it be $3 on line 91? I tried that and the error repeats. I do not know what else to do with this information. :( Please give me a clearer hint.

dedec0 08-15-2016 02:02 PM

NO! It should be $1, right?? So context_check will check if the variable is declared or not. Right?? :D :D :D

smallpond 08-15-2016 06:53 PM

How much clearer can I get? I already told you in comment 7 there is no definition for ASSGNOP and you didn't believe me.

astrogeek 08-15-2016 08:31 PM

I will admit first, that I have not fully followed your example code nor read the PDF (that URL is blocked by my local firewall rules).

But I do see that you are confused about the use of %token and %type, so perhaps I can offer some helpful comment on those.

Quote:

Originally Posted by dedec0 (Post 5591205)
Should both type and token exist? It is not clear to me (I also tried reading https://www.gnu.org/software/bison/m...emantic-Tokens).

%token and %type are two different things.

%token is used to declare terminals, and may include or require a <type> assignment when a %union has been declared, but not otherwise. So for terminal symbols (which IDENTIFIER seems to be), either of these would be correct depending on usage...

Code:

%token IDENTIFIER
  or
%token <id> IDENTIFIER

... but NOT...

Code:

%type <id> IDENTIFIER

%type is used to declare the types of non-terminals (commands, command, exp, etc... from your example code).

So something like this might be appropriate, again depending on usage...

Code:

%type <id> commands command exp ...

So in simplistic terms...

When %union is declared you will need to declare terminals (tokens) with a type as...

Code:

%token <id> IDENTIFIER

... and non-terminals with a type as...

Code:

%type <id> command commands exp ...

And your code must be consistent with regard to those types as values traverse the parse tree. That is, when a typed right-hand value is assigned to a left-hand (non-terminal) symbol, they must be of the same type or the compiler will complain (this is what the type declarations are used for).

Quote:

Originally Posted by dedec0 (Post 5591205)
For now I am keeping the "%type <id> IDENTIFIER" line, but the error continues.

That is not correct IF IDENTIFIER is a terminal symbol, as appears to be the case.

It may be that the '$2 from command' error you are seeing is complaining about exp, not IDENTIFIER (but, also not clear to me).

dedec0 08-15-2016 08:37 PM

Quote:

Originally Posted by smallpond (Post 5591360)
How much clearer can I get? I already told you in comment 7 there is no definition for ASSGNOP and you didn't believe me.

That is not what I did. Do not confuse understanding with believing. ASSGNOP was defined: there is a token declaration for it, as I said above. Did you expect me to understand this:

"since there is no definition for ASSGNOP, I should look at line 91 and see that I need to change the argument because it is incorrect, it is pointing to ASSGNOP instead of IDENTIFIER"

?

No way. Your words there were not clear at all, at least not to me, and I bet not to others either. The title of the thread contains the word "learning" because I know very little about Bison, if I can say I know anything about it at all.

Anyway, thank you. Our conversation helped me to solve another "easy" problem in this book.

dedec0 08-15-2016 08:57 PM

astrogeek, thank you for your explanations. I had not yet seen %type, so I had no clue what it was. I just used it as one more thing to try, almost blindly (as my trial-and-error report shows).

As I understand it now, IDENTIFIER is the token that corresponds to the variable name, not its value or symbol. In C we could have an integer assignment:

boxOfFruits = 26;

For this line, there would be 4 tokens: IDENTIFIER (with the string "boxOfFruits"); EQUAL_SIGN; INTEGER (with the value 26); END_OF_LINE. So, is IDENTIFIER a terminal symbol or not? (I think it is not.) The language of the book is not C, but it has something similar for that expression.


All times are GMT -5.