Book "Compiler Construction using Flex and Bison" - problems, discussions, steps, ...
The book has a few errors, but I was able to fix them up to chapter 3. Now, in chapter 4, I have reached a point where I cannot make the modifications the text calls for without getting a compilation error from bison.
A working bison file (it compiles with no output), including the lines that cause the problem when added or changed, is:
Code:
%start program
/*
We should create this union, although it does not have more than one declaration in it:
%union { /* SEMANTIC RECORD * /
char *id; /* For returning identifiers * /
}
We should change IDENTIFIER token to get the name of declared variables:
%token IDENTIFIER
becomes:
%token <id> IDENTIFIER /* Simple identifier * /
*/
%token <id> IDENTIFIER /* Simple identifier */
%token IDENTIFIER
%token LET INTEGER INT IN
%token SKIP IF THEN ELSE FI END WHILE DO READ WRITE
%token NUMBER
%token ASSGNOP
%left '-' '+'
%left '*' '/'
%left '<' '>' '=' '' /* were missing in the book */
%right '^ '
%{
#include <stdlib.h> /* For malloc in symbol table */
#include <string.h> /* For strcmp in symbol table */
#include <stdio.h> /* For error messages */
#include "st.h" /* The Symbol Table Module */
#define YYDEBUG 1 /* for debugging */
int install( char* sym_name)
{
symrec *s;
s = getsym(sym_name);
if (s == 0)
s = putsym (sym_name);
else
{
errors++;
printf("%s is already defined\n", sym_name);
return 0;
}
return 1;
}
int context_check(char* sym_name)
{
if ( getsym( sym_name ) == 0 )
{
printf("%s is an undeclared identifier\n", sym_name);
return 0;
}
return 1;
}
%}
%%
/* Grammar rules and actions */
program : LET declarations IN commands END ;
declarations : /* empty */
| INTEGER id_seq IDENTIFIER '.' { install( $3 ); }
;
id_seq : /* empty */
| id_seq IDENTIFIER ',' { install( $2 ); }
;
commands : /* empty */
| commands command ';'
;
command : SKIP
| READ IDENTIFIER { context_check( $2 ); }
| WRITE exp
| IDENTIFIER ASSGNOP exp { context_check( $2 ); }
| IF exp THEN commands ELSE commands FI
| WHILE exp DO commands END
;
exp : NUMBER
/* book said $2 for this, wrong*/
| IDENTIFIER { context_check( $1 ); }
| exp '<' exp
| exp '=' exp
| exp '>' exp
| exp '+' exp
| exp '-' exp
| exp '*' exp
| exp '/' exp
| exp '^ ' exp
| '(' exp ')'
;
%%
/* C subroutines */
/* no output, implied parse tree */
int main( int argc, char *argv[] )
{
extern FILE *yyin;
++argv; --argc;
yyin = fopen( argv[0], "r" );
yydebug = 1;
errors = 0;
yyparse ();
return 0;
}
int yyerror (char *s) /* called by yyparse() on errors */
{
printf ("%s\n", s);
return 1;
}
The file as above works. After making the changes described in the comment, bison reports this error:
Code:
$ bison -vd ch4.y
ch4.y:82.54-55: $2 from `command' has no declared type
$
Chapter 4 starts on PDF page 19.
I need this .y file working before I proceed to the next section, where the corresponding scanner (a flex file) is changed. Could you help me understand what is wrong and fix it?
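For anyone trying to reproduce this without the book at hand: the grammar calls getsym() and putsym() from st.h, which is not shown in this thread. A minimal stand-in might look like the sketch below. It assumes a singly linked list of names and a global errors counter; the struct layout and the const qualifiers are guesses for illustration, not the book's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for st.h: a singly linked list of names. */
typedef struct symrec {
    char *name;            /* identifier text */
    struct symrec *next;   /* next record in the list */
} symrec;

static symrec *sym_table = NULL;  /* head of the list */
int errors = 0;                   /* assumed: the counter install() increments */

/* Return the record for sym_name, or NULL if it was never stored. */
symrec *getsym(const char *sym_name)
{
    assert(sym_name != NULL);
    for (symrec *p = sym_table; p != NULL; p = p->next)
        if (strcmp(p->name, sym_name) == 0)
            return p;
    return NULL;
}

/* Store a copy of sym_name at the head of the list and return its record. */
symrec *putsym(const char *sym_name)
{
    assert(sym_name != NULL);
    symrec *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->name = malloc(strlen(sym_name) + 1);
    if (p->name == NULL) {
        free(p);
        return NULL;
    }
    strcpy(p->name, sym_name);
    p->next = sym_table;
    sym_table = p;
    return p;
}
```

With something like this in place, install() and context_check() from the grammar prologue can be compiled and tried on their own.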
Are you sure you have copied over all the changes as required? I noticed a reference to "exp : INT" which I do not see in your file. There could be others. I would suggest going back over all entries.
The "exp: INT" you mention is at the end of PDF page 21? I have changed it to INTEGER.
I had a few doubts and ran into a few errors in chapters 1-3. I had to change what is written in the book to get things to compile. I have a file for each chapter, with the code given there. Up to chapter 3 they work.
Does "INT" make sense as a token? Or is it a typo for INTEGER, as I assumed? There is also the NUMBER token.
Note that, in the file above, I have added declarations for all of them (otherwise bison gives an error for an undeclared token).
In the file below I have changed all INT and NUMBER tokens to INTEGER and removed the declarations for both; the error is the same. Without the union declaration, and with the plain IDENTIFIER declaration in place of the <id> one, bison compiles the file silently (which I assume means it is good). With the union declaration and <id> on IDENTIFIER, as added in chapter 4, the error appears.
In the code below it is easy to make the changes I mentioned: just cut and paste a few lines from or to the multiline comments above each part:
Code:
%start program
/* SEMANTIC RECORD */
/* char *id: For returning identifiers */
/*
Place to easily pasting/cutting the union declaration
%union {
char *id;
}
*/
/* Simple identifier */
/*
Place to exchange the IDENTIFIER token declarations
%token <id> IDENTIFIER
*/
%token IDENTIFIER
%token LET IN
/* are there both, INT and INTEGER?
%token INT
%token NUMBER
*/
%token INTEGER
/* FI was missing */
%token SKIP IF THEN ELSE FI END WHILE DO READ WRITE
/* ASSGNOP was missing */
%token ASSGNOP
%left '-' '+'
%left '*' '/'
%left '<' '>' '=' '' /* not in the book, need to be added */
%right '^ '
%{
#include <stdlib.h> /* For malloc in symbol table */
#include <string.h> /* For strcmp in symbol table */
#include <stdio.h> /* For error messages */
#include "st.h" /* The Symbol Table Module */
#define YYDEBUG 1 /* for debugging */
int install( char* sym_name)
{
symrec *s;
s = getsym(sym_name);
if (s == 0)
s = putsym (sym_name);
else
{
errors++;
printf("%s is already defined\n", sym_name);
return 0;
}
return 1;
}
int context_check(char* sym_name)
{
if ( getsym( sym_name ) == 0 )
{
printf("%s is an undeclared identifier\n", sym_name);
return 0;
}
return 1;
}
%}
%%
/* Grammar rules and actions */
program : LET declarations IN commands END ;
declarations : /* empty */
| INTEGER id_seq IDENTIFIER '.' { install( $3 ); }
;
id_seq : /* empty */
| id_seq IDENTIFIER ',' { install( $2 ); }
;
commands : /* empty */
| commands command ';'
;
command : SKIP
| READ IDENTIFIER { context_check( $2 ); }
| WRITE exp
| IDENTIFIER ASSGNOP exp { context_check( $2 ); }
| IF exp THEN commands ELSE commands FI
| WHILE exp DO commands END
;
exp : INTEGER
/* the book has $2 here, which is wrong */
| IDENTIFIER { context_check( $1 ); }
| exp '<' exp
| exp '=' exp
| exp '>' exp
| exp '+' exp
| exp '-' exp
| exp '*' exp
| exp '/' exp
| exp '^ ' exp
| '(' exp ')'
;
%%
/* C subroutines */
/* no output, the parse tree is implicit */
int main( int argc, char *argv[] )
{
extern FILE *yyin;
++argv; --argc;
yyin = fopen( argv[0], "r" );
yydebug = 1;
errors = 0;
yyparse ();
return 0;
}
int yyerror (char *s) /* called by yyparse() on errors */
{
printf ("%s\n", s);
return 0;
}
Code:
$ bison -vd ch4.y
ch4.y:91.54-55: $2 from `command' has no declared type
Do not miss it: in this last post INT appears only inside a multiline comment (the one we use to switch easily between a working file and an erroneous one). There are no other occurrences of it.
I hope I have given enough detail for anyone to reproduce the problem easily and quickly. Thank you for your goodwill, grail. Do you have a book to recommend? This is the second one I have picked up with problems that are not so small.
It is missing in the book; I noticed that. But I added it before starting this thread. It is there on line 30 (and not inside a comment): %token ASSGNOP .
1. Without union and IDENTIFIER is a simple token: it compiles (a previous result).
2. With union declared, "%token IDENTIFIER" removed, line "%type <id> IDENTIFIER" added:
Code:
$bison -vd ch4.y
ch4.y:18.12-21: symbol IDENTIFIER used, but not defined as a token and has no rules
ch4.y:91.54-55: $2 from `command' has no declared type
3. Continuing from try 2, simply adding one more line with "%token IDENTIFIER". The result is the same error as in the last big post:
Code:
$bison -vd ch4.y
ch4.y:91.54-55: $2 from `command' has no declared type
Did I try everything?
For now I am keeping the "%type <id> IDENTIFIER" line, but the error continues.
%token is used to declare terminals, and may include or require a <type> assignment when a %union has been declared, but not otherwise. So for terminal symbols (which IDENTIFIER seems to be), either of these would be correct depending on usage...
Code:
%token IDENTIFIER
or
%token <id> IDENTIFIER
... but NOT...
Code:
%type <id> IDENTIFIER
%type is used to declare the types of non-terminals (commands, command, exp, etc... from your example code).
So something like this might be appropriate, again depending on usage...
Code:
%type <id> commands command exp ...
So in simplistic terms...
When %union is declared you will need to declare terminals (tokens) with a type as...
Code:
%token <id> IDENTIFIER
... and non-terminals with a type as...
Code:
%type <id> command commands exp ...
And your code must be consistent with regard to those types as values traverse the parse tree. That is, when a typed right-hand value is assigned to a left-hand (non-terminal) symbol, they must be of the same type or the compiler will complain (this is what the type declarations are used for).
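One rough way to picture what those type declarations do: the %union becomes a C union, and a `$N` in an action becomes an access to one member of it, picked by the `<type>` declared for that symbol. A symbol with no declared type gives bison no member to pick, which is where a "has no declared type" error comes from. The sketch below is plain C written for illustration, not bison's generated code; the `num` member and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Plays the role of the %union: one member per declared <type>. */
typedef union {
    char *id;   /* <id>, as in the thread's %union */
    int num;    /* hypothetical second member, for illustration only */
} semantic_value;

/* What "$1" amounts to when symbol 1 is declared with <id>:
   read the id member of the first right-hand-side value. */
static const char *dollar1_as_id(const semantic_value *rhs)
{
    assert(rhs != NULL);
    return rhs[0].id;
}

/* Length of the identifier carried by the first right-hand-side symbol,
   e.g. the IDENTIFIER in "IDENTIFIER ASSGNOP exp". */
size_t first_identifier_length(const semantic_value *rhs)
{
    return strlen(dollar1_as_id(rhs));
}
```

The point of the model is only that the member name has to come from somewhere: with `%token <id> IDENTIFIER` the choice is `id`, while an untyped symbol leaves it open.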
Quote:
Originally Posted by dedec0
For now I am keeping the "%type <id> IDENTIFIER" line, but the error continues.
That is not correct IF IDENTIFIER is a terminal symbol, as appears to be the case.
It may be that the '$2 from command' error you are seeing is complaining about exp, not IDENTIFIER (though that is not clear to me either).
How much clearer can I get? I already told you in comment 7 there is no definition for ASSGNOP and you didn't believe me.
That is not what I did. Don't confuse understanding with believing. ASSGNOP was defined; there is a token declaration for it, as I said above. You expected me to understand that:
"since there is no definition for ASSGNOP, I should look at line 91 and see that I need to change the argument because it is incorrect, it is pointing to ASSGNOP instead of IDENTIFIER"
?
No way. Your words there are not clear at all, not to me, and I bet not to others either. The title of the thread contains the word "learning" because I know very little about Bison, if I can say I know anything about it at all.
Anyway, thank you. Our conversation helped me to solve another "easy" problem in this book.
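For later readers, the positional numbering behind that error can be mimicked in a few lines of C. In the rule "command : IDENTIFIER ASSGNOP exp", `$1` is IDENTIFIER, `$2` is ASSGNOP and `$3` is exp, so `context_check( $2 )` asks for the value of ASSGNOP, which has no `<type>`. The table and function names below are made up for this toy model; it is not how bison itself is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* One right-hand-side symbol: its name and its declared <type> (NULL if none). */
struct rhs_symbol {
    const char *name;
    const char *type;
};

/* command : IDENTIFIER ASSGNOP exp
   Only IDENTIFIER carries a type, as with "%token <id> IDENTIFIER". */
static const struct rhs_symbol assignment_rule[] = {
    { "IDENTIFIER", "id" },   /* $1 */
    { "ASSGNOP",    NULL },   /* $2 */
    { "exp",        NULL },   /* $3 */
};

/* Return the symbol that $n refers to, or NULL if n is past the end of the rule. */
const struct rhs_symbol *dollar(int n)
{
    size_t count = sizeof assignment_rule / sizeof assignment_rule[0];
    assert(n >= 1);   /* this model only covers $1, $2, ... */
    if ((size_t)n > count)
        return NULL;
    return &assignment_rule[n - 1];
}

/* Nonzero when $n may be used as a value, i.e. its symbol has a declared type. */
int dollar_has_type(int n)
{
    const struct rhs_symbol *sym = dollar(n);
    return sym != NULL && sym->type != NULL;
}
```

In this model dollar_has_type(2) is false and dollar_has_type(1) is true, which matches the fix of passing `$1` to context_check() in the assignment rule.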
astrogeek, thank you for your explanations. I had not seen %type before, so I had no clue what it was. I just used it as one more thing to try, almost blindly (as my trial and error report shows).
As I understand it now, IDENTIFIER is the token that corresponds to the variable name, not its value or symbol. In C we could have an integer assignment:
boxOfFruits = 26;
For this line, there would be 4 tokens: IDENTIFIER (with the string "boxOfFruits"); EQUAL_SIGN; INTEGER (with value 26); END_OF_LINE. So, is IDENTIFIER a terminal symbol or not? (I think it is not.) The language of the book is not C, but it has something similar for that expression.
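A hand-written scanner for that one line could look roughly like the sketch below. It uses the four token names from the post and treats ';' as END_OF_LINE, which is an assumption; TOKEN_END, TOKEN_UNKNOWN and the struct layout are additions for illustration. Everything the scanner hands over this way is a token, which is what astrogeek's reply above calls a terminal, and the attached string is what a `<id>` type would carry.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Token kinds, named as in the post; TOKEN_END and TOKEN_UNKNOWN are additions. */
enum token_kind { IDENTIFIER, EQUAL_SIGN, INTEGER, END_OF_LINE, TOKEN_END, TOKEN_UNKNOWN };

struct token {
    enum token_kind kind;
    char text[32];   /* identifier name, filled for IDENTIFIER */
    int value;       /* numeric value, filled for INTEGER */
};

/* Read one token starting at *input and advance *input past it. */
struct token next_token(const char **input)
{
    assert(input != NULL && *input != NULL);
    struct token tok = { TOKEN_UNKNOWN, "", 0 };
    const char *p = *input;

    while (isspace((unsigned char)*p))
        p++;

    if (*p == '\0') {
        tok.kind = TOKEN_END;
    } else if (isalpha((unsigned char)*p)) {
        size_t n = 0;
        while (isalnum((unsigned char)*p)) {
            if (n + 1 < sizeof tok.text)   /* silently truncate long names */
                tok.text[n++] = *p;
            p++;
        }
        tok.text[n] = '\0';
        tok.kind = IDENTIFIER;
    } else if (isdigit((unsigned char)*p)) {
        char *end;
        tok.value = (int)strtol(p, &end, 10);
        p = end;
        tok.kind = INTEGER;
    } else if (*p == '=') {
        tok.kind = EQUAL_SIGN;
        p++;
    } else if (*p == ';') {
        tok.kind = END_OF_LINE;
        p++;
    } else {
        p++;   /* skip a character this sketch does not know */
    }

    *input = p;
    return tok;
}
```

Calling next_token() repeatedly on "boxOfFruits = 26;" yields the four tokens from the post in order, followed by TOKEN_END.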