Quote:
Originally Posted by des_a
I basically taught myself some of these languages I'm working with. I never did them in school or anything.
Good for you! I am all self-taught too; it is the best way to learn, provided that you want to learn! Everything else is inferior!
OK, on to my too-long notes... I have written them incrementally, so apologies for any distracted errors.
As mentioned previously, you appear to be doing many things in your action code that should be handled by the lexer and the grammar long before your action code comes into play. I suspect that this is the root cause of your malformed parameters, which are the main subject of this thread. And I know that it will cause many other problems as you attempt to implement other features which are now only skeletons in your code. As you fix each small thing by modifying your action code, each successive problem will become increasingly intractable, until you let the parser do what it was designed to do.
So I will suggest what I think are the most important things that you can fix now, which will greatly simplify your current and future action code and help you avoid the many difficulties which Flex and Bison were designed to resolve.
In this post I will focus on the data path from text to grammar, and getting it under control in the most useful way.
To start at the beginning, let's rethink your YYSTYPE definition. This is the definition of the data type used to pass TOKEN values from the lexer to the parser. Currently you define it as a string, which probably seems to make other things simpler. In truth, it complicates almost everything else you do and severely limits many future options!
YYSTYPE defines the data type of yylval. yylval is passed between lexer and parser to return values with each call of yylex(...). By default it is an integer.
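To make that contract concrete, here is a minimal hand-written stand-in for a Flex scanner, assuming the default int YYSTYPE: each call to yylex() returns a token code and leaves the token's value in the global yylval. The NUMBER and END_OF_INPUT token codes, the input buffer and the set_input() helper are hypothetical, chosen only for illustration; a real Flex scanner reads from yyin instead.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Bison's default: YYSTYPE is int unless you declare something else.
typedef int YYSTYPE;
YYSTYPE yylval;                  // value of the most recently returned token

// Hypothetical token codes; Bison normally generates these for you.
enum { END_OF_INPUT = 0, NUMBER = 258 };

static std::string input_text;   // hypothetical input buffer
static std::size_t pos = 0;

void set_input(const std::string &text) { input_text = text; pos = 0; }

// Returns the next token code. For NUMBER the token's value is left in
// yylval; any other character is returned as its own code, yacc-style.
int yylex() {
    while (pos < input_text.size() &&
           std::isspace(static_cast<unsigned char>(input_text[pos])))
        ++pos;
    if (pos >= input_text.size())
        return END_OF_INPUT;
    std::size_t start = pos;
    while (pos < input_text.size() &&
           std::isdigit(static_cast<unsigned char>(input_text[pos])))
        ++pos;
    if (pos == start)
        return static_cast<unsigned char>(input_text[pos++]);
    yylval = std::atoi(input_text.substr(start, pos - start).c_str());
    return NUMBER;
}
```

A Bison-generated parser drives its scanner the same way, copying yylval onto its value stack after each token it reads.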
Bison allows you to define your own type, or it provides a powerful means of defining a more complex type with the %union declaration. When you make use of the %union declaration you get, as a by-product, the additional and very important benefit of being able to easily specify the data types of each TOKEN value and of the non-terminals in your grammar!
You will have to think through your own needs, but here is a general way that I often approach it. First, remove your current YYSTYPE definition; you won't need it. Then, after the first code block at the top of your Bison file, declare your %union, something like this...
Code:
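%code requires {
  #include <string>   /* copied into the parser and its header, before YYSTYPE */
  class astnode;      /* forward declaration of my own node class */
}
%union {
  int          ival;  /* integer token values */
  std::string *sval;  /* pointer to a heap-allocated string */
  astnode     *aval;  /* pointer to an AST node */
}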
Here I have declared a union of three types: int, pointer to std::string, and pointer to astnode (a class I define for my own uses). I would suggest int and string pointer for your current uses, at minimum. Note that because this results in a C-style union, it cannot hold class types with constructors and destructors, such as std::string itself, but it can hold pointers to them.
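For intuition, the %union above becomes an ordinary C++ union named YYSTYPE in the generated parser (the exact generated text varies between Bison versions, so treat this as an approximation). The parser copies semantic values around its value stack by plain assignment, so every member must be trivially copyable; this is why the sketch holds a pointer to std::string rather than a std::string. The empty astnode class and the read_back_string() function are hypothetical placeholders.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

class astnode {};                // hypothetical placeholder for your own class

// Roughly what Bison generates from the %union declaration.
union YYSTYPE {
    int          ival;
    std::string *sval;           // a std::string member would delete the union's copy operations
    astnode     *aval;
};

// Plain copies, as on the parser's value stack, are safe for this union.
static_assert(std::is_trivially_copyable<YYSTYPE>::value,
              "semantic values must be copyable by assignment");

// Copy a value the way the parser does when it pushes it onto its stack,
// then read the string back; only the pointer is copied, not the string.
std::string read_back_string(const std::string &text) {
    YYSTYPE lexer_value;
    lexer_value.sval = new std::string(text);   // what the lexer action does
    YYSTYPE stack_slot = lexer_value;           // plain union copy
    std::string result = *stack_slot.sval;
    delete stack_slot.sval;                     // the owner releases it exactly once
    return result;
}
```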
The member names of the union become the type tags, written in angle brackets, for TOKENS and non-terminals in your parser. So you might then redefine your tokens similar to this (example only):
Code:
%token <sval> TSTRING STRING
Currently I think those are the only token values you are using (which is a topic for another post). Tokens whose values you never reference in action code do not require a type.
As your grammar is refined and extended you will also want/need to define the types of your non-terminals. That uses the %type declaration, something like this...
Code:
%type <sval> newstring string string_cmd macro_cmd define_cmd...
%type <aval> nodetype ...
I have shown the fictional <aval> example for reasons I will explain further down.
The obviously string-valued ones I have made <sval>.
Others should probably be <ival> until you have a reason to make them something else.
Now, with a union type for yylval, we need to update your lexer code to make use of it. You must now set the values using ordinary C-type union syntax which specifies the member being set. And for string values you will need to actually create the referenced string, and destroy it after use.
Code:
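{STRING}  { yylval.sval = new std::string(yytext);  /* heap copy of the match */
            return STRING; }
Here we create the string with the new operator, initialize it with the contents of the now familiar yytext, and set yylval.sval to the returned pointer. (If your scanner uses %option bison-bridge, yylval is a pointer to the parser's YYSTYPE, so you write yylval->sval instead.)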
I am sure you can figure out what to do with integer values.
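For integer values, the action just stores a converted number in the ival member; nothing is allocated, so nothing needs deleting later. Since Flex is not needed to show the idea, the sketch below writes the two action bodies as plain functions. The INTEGER token, the token codes and the on_integer()/on_string() names are hypothetical; in a real scanner the bodies would sit inside rules such as {INTEGER} { ... } and {STRING} { ... }, reading Flex's own yytext.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

union YYSTYPE { int ival; std::string *sval; };
YYSTYPE yylval;

enum { STRING = 258, INTEGER = 259 };   // hypothetical token codes

// Body of an integer rule: the value is stored directly in the union,
// so there is nothing to allocate or free.
int on_integer(const char *yytext) {
    yylval.ival = std::atoi(yytext);
    return INTEGER;
}

// Body of the STRING rule: the text is copied to the heap and the
// parser now owns the pointer until some rule deletes it.
int on_string(const char *yytext) {
    yylval.sval = new std::string(yytext);
    return STRING;
}
```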
Also remember to remove any currently unused references to yylval in the Flex definitions.
Now, back to the parser: you need a single rule which will process this returned value and make it available, as a non-terminal, to every rule that currently accepts the STRING token. This will allow you to remove many instances of code which currently post-process all those string tokens! Something like this perhaps...
Code:
newstring: STRING { $$ = strip_quotes($1); }
I have shown your strip_quotes() function as an example, but with a proper Flex regular expression you should need no further processing of most received tokens in this manner.
Now, with this as the ONLY rule that receives STRING tokens, doing any necessary scrubbing and setting the typed value of newstring, every other reference to STRING can be replaced with newstring and the further string manipulations removed!
Finally, when a grammar rule uses the newstring instance and it is no longer needed, delete it to prevent memory leaks.
Code:
some_rule: SOME_TOKEN newstring {handler.process($2); delete $2;}
You should be able to implement something like this in your existing code easily and retain your current level of function. This will add some necessary constraints to your token value handling and provide C/C++ language type enforcement, avoiding the usual pitfalls of badly typed data.
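To see the ownership rules end to end, this sketch runs the three steps by hand: the lexer allocates the string, the single newstring rule scrubs it, and the consuming rule uses it and then deletes it. strip_quotes() here is a hypothetical stand-in for your own helper; it assumes the helper edits the string in place and returns the same pointer, so $$ = strip_quotes($1) leaks nothing. Handler and run_path() are also hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: removes one pair of surrounding double quotes in
// place and returns the same pointer it was given.
std::string *strip_quotes(std::string *s) {
    if (s->size() >= 2 && s->front() == '"' && s->back() == '"')
        *s = s->substr(1, s->size() - 2);
    return s;
}

// Hypothetical handler that keeps its own copies of what it is given.
struct Handler {
    std::vector<std::string> seen;
    void process(const std::string *s) { seen.push_back(*s); }
};

// Lexer action:      yylval.sval = new std::string(yytext);
// newstring action:  $$ = strip_quotes($1);
// some_rule action:  handler.process($2); delete $2;
void run_path(Handler &handler, const char *yytext) {
    std::string *token = new std::string(yytext);   // lexer
    std::string *newstring = strip_quotes(token);   // newstring rule
    handler.process(newstring);                     // some_rule uses it...
    delete newstring;                               // ...then releases it
}
```

If your real strip_quotes() returns a newly allocated string instead, the newstring action must also delete $1.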
Once this mechanism is in place, we should rethink your tokenizing and grammar rules to let the lexer/parser do the heavy lifting and greatly simplify the action code... mostly for free... but that is a topic for another post.
One final brief comment on the <aval> type which I included in the %union declaration. A very useful parsing approach called Attribute Grammars allows you to associate context with each node of the parse tree, or Abstract Syntax Tree (hence, "astnode"). When I make use of this I define a node class, or classes, with members for whatever attributes I wish to associate, and methods for manipulating them. I include a pointer to this type as a member of the %union declaration.
Then I give all applicable non-terminals the type <aval>, create the node instances in the first rule that must handle them, and set $$ to the returned pointer, something like this...
Code:
some_parameter: newstring other_param { $$ = new astnode(AST_TYPE,$1,$2); }
Subsequent rules which use or extend some_parameter all share a common API without further ado! Each node accumulates its context attributes as it moves up the parse tree. If the astnode class includes left and right pointers of type astnode*, you literally build the abstract syntax tree as a linked structure of nodes - very useful and powerful and easy to implement.
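Here is a minimal sketch of such a node class, under my own assumptions: the ast_type enum (standing in for AST_TYPE), the single string attribute, and the ownership scheme are hypothetical choices, not anything Bison provides. Each node owns its string and its children, so deleting the root frees the whole tree, and count_nodes() shows a value computed bottom-up from the children, in the spirit of a synthesized attribute.

```cpp
#include <cassert>
#include <string>

enum ast_type { AST_PARAM, AST_LIST };   // hypothetical node kinds

class astnode {
public:
    ast_type     type;
    std::string *text;    // attribute carried up from newstring (may be null)
    astnode     *left;
    astnode     *right;

    // Same shape as the action: $$ = new astnode(AST_TYPE, $1, $2);
    astnode(ast_type t, std::string *s, astnode *l = nullptr, astnode *r = nullptr)
        : type(t), text(s), left(l), right(r) {}

    // The node owns its string and children; deleting null is a no-op.
    ~astnode() { delete text; delete left; delete right; }

    // Owning raw pointers: forbid accidental shallow copies.
    astnode(const astnode &) = delete;
    astnode &operator=(const astnode &) = delete;
};

// A value computed bottom-up from the children of each node.
int count_nodes(const astnode *n) {
    return n ? 1 + count_nodes(n->left) + count_nodes(n->right) : 0;
}
```

The raw owning pointers match the new/delete style used above; std::unique_ptr members would be a safer alternative for the tree's internal links.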
I include this not to say that you should do all of it at this time, but to encourage you to make best use of the tools at hand. It is literally no more difficult than all the code you are currently writing, but it is much more useful, robust and simpler to maintain.
Sorry this went so long, but I wanted to give a solid overall view of these ideas in one place. Hope it helps!