Quote:
So, I will try to offer what I think will be the most helpful comments for you, but maybe not as directly applicable to your code example as you would like. First, let me try to clarify a few terms and their use that did not seem to be presented clearly in my limited scan of the PDF. I think that getting these very clearly in mind will allow you to understand things not so clearly explained in examples that you find on the interwebs... Let's start with terminal and non-terminal. These terms come directly from BNF/EBNF and the foundations of grammer theory. They are not especially relevant to the code itself unless you already know what they mean! I did not see any explanation of them in your PDF, so you might want to search online for a complete discussion. But in simple terms, in your grammer (which is what the Bison rules section code actually is), non-terminals are the symbols that appear on the left-hand side of any expression, and terminals are what is received from the lexer or scanner. Non-terminals are called that because they can be further reduced to other terms. Terminals are end-points, they cannot be further reduced. Note that your PDF adopts the convention that terminals are shown in all caps, and non-terminals are always lower case. So, IDENTIFIER is definitely a terminal - and it comes from the lexer too, obviously. Now that we have a simple working idea of those, what are "tokens"? Tokens are what the lexer or scanner produces - it is a "tokenizer". And what it produces is received by the parser (syntactical analyzer) as terminals... tokens in the lexer are terminals in the grammer. Very easy. Tokens are represented as integer values as a result of Bison processing the %token declarations, these integer definitions are passed to the lexer by the tab.h files produced by Bison so that they have the same names on both ends, but they are just integers associated with names - enums. But the scanner may need to also return a value along with a token, and in the parser the $1, $2, ... references in the action code are mapped to these values, not to the token itself. These values are always passed in a variable named yylval. Look at your lexer code - it returns the token (integer identifier) by name, and if there is an associated value it is first placed into yylval, where the parser can find it. If you only ever pass integer values, then yylval is all you need - an integer by default. But, if you need to pass other types such as strings or structures you must declare a %union type. What the %union declaration does is creates a union definition in the resulting C code so that pointers to other values can be returned with the tokens. Of course, then you must tell the parser what type to expect! This is what the type declarations are all about! If a %union declaration exists, then you must declare the type of every terminal and non-terminal that is referenced by value ($1, $2, ...) in the action code. This way the parser code knows which union member to fetch and can check for type errors in the code! These declarations appear like this in the Bison source file: Code:
%token <type> TERMINAL-NAME Terminals and non-terminals that are never referenced as $1, $2, etc. in the action code need no type declaration. Additionally, any non-terminal (left hand symbol) that can take on the value of a typed terminal or non-terminal on the right hand side, must have the same declared type! If they do not, then you will receive the type of error in your example code: "$2 from `command' has no declared type", or a similar complaint about mis-matched types. So, if this makes sense, just review your example code and see which symbols are terminals that return a value, and be sure it has a type declaration. And check all the non-terminals that can be referenced by or assigned a typed value, and be sure they have a type declaration. Hopefully, when you see a "type" error generated by the parser processing you will know what it means and how to fix it! That is all the time I have tonight, sorry for the length, and good luck! |
Thank you very much for this post, astrogeek. It blends together several things that I studied a bit, some a little more. And it puts Bison and Flex together in the story with their own possible functions, clearly. It helped me to understand more about what I am reading. :) And I will read the post again, just to make sure... hahaha
I agree with you. This book feels unfinished, but (at least) it is free to distribute. I am finding it better to read than the other book about Flex and Bison that I tried before. A friend have suggested that a Wikibook is made from this book, with at least these corrections and improvements - and possibly with a few others, naturally. |
I am glad that the previous post was helpful!
During the day today I have tried to catch up with you by compiling the example code from the PDF... It is not compilable as written... I copied and pasted each step - including MUCH editing to adjust for dumb-quotes (usually called smart-quotes) in the PDF code, as well as dumb-substitutions such as a unicode -- for the simple -, etc... I expect that you must have done the same. I also added the missing token declarations ASSGNOP, FI, etc... And I finally got down to the Chapter 4 complete version. But that cannot be processed by Bison due to multiple missing type declarations (declarations, command, id_seq,... more). I suspect those you were seeing are just the beginning for you. I also suspect that the author intends to replace IDENTIFIER with IDENT, but that is not clear in the text. The lexer does not include a rule to return an IDENT token either, and the grammer rules using it are inconsistent, or at least make no sense as written. It is not clear to me that the author intends for the examples to actually be compiled, at least not incrementally per-chapter. It may be that the author intends to make further modifications in the next chapter, but I will leave that up to you to follow along at this point. Perhaps they will arrive at a "final" version, but I think that you will waste a lot of time trying to compile the modifications per-chapter, at least if Chapter 4 is representative. I am not discouraging you from completing the exercise, but it is clearly not a tested code example, so you might want to read along and not get bogged down debugging the untested example code - or look for another example to follow... |
As I read chapters 1-3, I have made a few files, trying to make them as correct as possible. The aim is to have a more practical reading.
Chapter 1 does not need a program file. I have only a txt with what it shows. As shell scripts, I consider lines starting with "#" a comment, and added one: Code:
# Context-free grammar for the language Simple Code:
%start program Code:
%{ Code:
# produces ch2.output, ch2.tab.c and ch2.tab.h Chapter 4 files is my next task, probably possible now with the help I got in this thread. |
I did not note your answer before my post with files' code. I only read it now.
Well, I will try to follow the whole book as I did for chapters 1-3. If it increases its numbers of problems, I will do as you suggest: just pass them by while trying to get the idea there. "Follow other examples": if you (or anyone else) can recommend better books or other materials to read, that is welcome! |
Sorry to be so slow responding.
I had a chance to look ahead in the PDF that you are working from later today, and I must say I am not optimistic that you will learn Flex and Bison from it, assuming that is your objective. After Chapter 4 it becomes somewhat confused with no continuity from what had gone before. Chapter 5, Optimization is mostly just random thoughts (IMO) that ends with "We do not illustrate an optimizer in the parser...". Chapter 6, Virtual Machines... little connection to the rest of the PDF... Chapter 7 Code Generation is back to the original code, with more modifications, but honestly I think there would be little point in trying to follow it as a learning exercise. I have sent you a PM with some other options and will post any useful links for examples that I can find back in this thread. Good luck. |
Chapter 4: first try (does not work yet)
Chapter 4 is fairly fast to read, but many things are implied or come from previous chapters. I have made a few files to achieve something that makes sense for the whole chapter (and, if possible, that can be compiled, run and tested, not yet achieved).
I had some difficulties with it, though. The files are shown separately below, after everything I write (so you do not have more to read after their start - that is not a comment inside the files, there are some). The files are being compiled with: Code:
# first bison because ch4.tab.h is included in flex file Code:
$ gcc lex.yy.c -lfl ch4.lex: Code:
%{ Code:
%start program Code:
/* symbol table module */ |
Quote:
No, IDENTIFIER is not a terminal symbol. It is a variable name in their declaration or use. The Simple grammar was showed in chapter 1 (pdf page 9). I have made a text file from it, given below. I am not sure what is/was the exp problem. There was an error due to grammar ambiguity that I solved it (somewhere I do not remember now). There was also a simple error too about a "$M" being used where a "$K" makes sense. ---------------------------------------------------------- ch1.txt: Code:
# Context-free grammar for the language Simple |
Hi dedec0!
Sorry to be so late getting in here, and I only have a few minutes so I'll have to be brief for now. Quote:
And yes, IDENTIFIER is a terminal symbol. Perhaps you are confused about what a terminal symbol is? In the simple CF grammer on pages 8-9 (bottom page numbers 2-3), IDENTIFIER appears only on the right hand side of any expressions and is used exactly as a terminal symbol. Also, immediately below that figure on page 9 (page number 3) it says... Quote:
Further along on page 16 (page number 10) in the scanner example code, IDENTIFIER is a token returned by the scanner to the parser - i.e., a terminal symbol. I don't know what to add to that, and I am not sure how you think that should be used if it is not a terminal symbol... perhaps if you could try to explain to me more precisely what you think IDENTIFIER is in the CF grammer, and how that relates to IDENTIFIER in the scanner, maybe we could reach a better understanding of this example. A little OT: Late last week I was able to spend a little time refreshing my Flex/Bison notes and I found a link to an example I do not recall seeing before - and it is a very good Flex Bison C++ Example too! The C++ Flex code may be a little confusing so just focus on the lexer rules and the grammer in the Bison code. But it compiles and works well and will give you an example parser that builds the AST explicitly. Hope it is helpful. |
Two parts:
1. A terminal symbol is one that does change anymore. It may be something written in the source code, like numbers values, variable names or language symbols ('+','-', ...). IDENTIFIER is the token given where variable names are declared or used in Simple. Nonterminal symbols are variables in the grammar. For example, in the grammar we have a line: program ::= LET [ declarations ] IN command_sequence END. In this line: program, declarations and command_sequence are nonterminal symbols (or variables); LET, IN and END are terminal symbols. In this book, the convention to use uppercase/lowercase for nonterminal/terminal symbols is different from what we always used in school. In a Simple program (source code), we may have: Code:
let integer trees, seed_package in Is there something you want to correct or add in my ideas? 2. I am waiting for an answer and comments in my post about chapter 4 |
Quote:
|
Indeed, that is better. Thank you also for remembering the parse tree concept and its characteristics. :)
|
Another way to think of it is that a "terminal symbol" is "something that you can see in the source-code you are compiling." They are the structural elements that define the language and that allow positions in the grammar to be non-ambiguously determined.
Noterminal symbols correspond to something else in the grammar. |
Quote:
Code:
# First produce C files |
Every parser-generator which produces a compilable source-file must provide some means for you to include headers at the top of the generated code, if it does not provide them already. Statements needed to #include the header-files needed by the compiler must be provided-for by appropriate means. These sections will generally be included in the output source-file verbatim.
|
All times are GMT -5. The time now is 08:51 AM. |