[SOLVED] PNFPP: Help with Macros

des_a · 03-29-2017, 11:04 PM

I just changed yytext to $1. Then I removed all the output statements that were for debugging, so that if I'm going to use them, I can start fresh again and not forget them in my code. Now it gives no output to cout, as I'd expect. However, there is actually no other functional difference. Every result of all tests came out exactly as before. But I bet you are right. It's not meant to be used that way and could, at some point, crash. It is a time bomb perhaps. Therefore, since in my current code it was indeed trivial, it is now fixed.

I will try to keep better track of changes I make, by using comments, so that I can post only the parts I need to post, at least until we're done with it. This description was all I needed here. Thanks!

The extra spaces in the output might not matter, but they are not pretty. Please tell me whether I should leave it the way it is now. If so, it's perhaps good enough. If not, it needs to still be debugged. Thanks again!

astrogeek · 03-29-2017, 11:45 PM

Quote:

Originally Posted by des_a

I just changed yytext to $1... Every result of all tests came out exactly as before. But I bet you are right. It's not meant to be used that way and could, at some point, crash. It is a time bomb perhaps. Therefore, since in my current code it was indeed trivial, it is now fixed.

Good! If it was working it was just a happy accident, and definitely a time bomb! Referencing bad pointers and using discarded memory for data input are among the most obscure and difficult problems to debug later!
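To illustrate why this is a time bomb, here is a minimal C++ sketch. It does not use Flex at all: scan_buffer and fake_lex() are hypothetical stand-ins for the scanner's internal buffer and for yylex(), on the assumption that the scanner reuses the same memory for every token, which is the situation yytext puts you in.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for the scanner's internal buffer
// (the memory that yytext points into).
static char scan_buffer[64];

// Hypothetical stand-in for yylex(): overwrites the shared buffer with the
// next token's text and returns a pointer into it, much as yytext does.
const char *fake_lex(const char *token_text) {
    assert(std::strlen(token_text) < sizeof scan_buffer);
    std::strcpy(scan_buffer, token_text);
    return scan_buffer;
}

// Keeps only the raw pointer. After the next token is scanned, the saved
// pointer shows the new token, so this returns false.
bool raw_pointer_survives() {
    const char *saved = fake_lex("first");
    fake_lex("second");                      // the buffer is reused here
    return std::strcmp(saved, "first") == 0;
}

// Copies the text at scan time. The copy is unaffected by later tokens,
// so this returns true.
bool copy_survives() {
    std::string saved(fake_lex("first"));    // copy made immediately
    fake_lex("second");
    return saved == "first";
}
```

In a real scanner the failure is usually less tidy than this: the old text may happen to still be there, which is exactly the "happy accident" case.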

Quote:

Originally Posted by des_a

I will try to keep better track of changes I make, by using comments, so that I can post only the parts I need to post, at least until we're done with it.

Thanks, with the asynchronous nature of communication in a forum, anything that helps keep the focus over time is a great help to everyone.

Quote:

Originally Posted by des_a

The extra spaces in the output might not matter, but they are not pretty. Please tell me whether I should leave it the way it is now. If so, it's perhaps good enough. If not, it needs to still be debugged. Thanks again!

Only the applications that will consume the output of your parser can tell you whether extra whitespace is harmful. If you do not have any sort of specification for that, now would be a good time to write one!

Quote:

Originally Posted by des_a

It doesn't look as if it matters. Should I fix it anyway?

Quote:

Originally Posted by des_a

What would you do?

I'd fix it if I knew it was broken!

Always fix what is broken, never fix what is not broken!

That said, I know that you want to "fix" what you have, but there is no guarantee that it is fixable in the larger sense! After all, it has never actually worked and there is no specification to tell us when it is working right... "right" is not defined in this case.

So the best that I and others here can do is to offer our experience, look at the parts that do have well known behavior and tell you how to get the best results from them. In particular, Flex, Bison and the grammar itself.

Flex and Bison, the theory behind them and implementations based on them solve many, many inter-related problems in a robust way - the very problems that you are trying to solve! If it were possible to just bypass them with your own C code for now and figure out how to use them later, then they would not really be needed at all would they?

Your current "fixes" all relate to manipulation of possibly malformed strings. The real question is, why are they malformed in the first place? Fixing them in the code downstream of the lexer and parser is much like fixing your aircraft's engines after the crash! Let's get them working right before you take off.

I am currently working on a follow-up post to what has gone before, which will cover what I think are the most important things you can do next. Hopefully I can finish it later tonight.

des_a · 03-30-2017, 01:19 AM

Quote:

Only the applications that will consume the output of your parser can tell you whether extra whitespace is harmful. If you do not have any sort of specification for that, now would be a good time to write one!

I don't have anything specifically written about whether it's harmful or not at this point. But I checked my code, and it's not harmful. But it looks ugly after it's generated.

Quote:

I'd fix it if I knew it was broken!

Always fix what is broken, never fix what is not broken!

It is not actually broken, just doesn't look as pretty.

Quote:

That said, I know that you want to "fix" what you have, but there is no guarantee that it is fixable in the larger sense! After all, it has never actually worked and there is no specification to tell us when it is working right... "right" is not defined in this case.

I don't know if it's very fixable or not, because of the larger picture. I do have an idea in my head of when it is working "right". I just didn't write it down, as it's more difficult for me. I SHOULD have an idea of when it's working right by now. I have been working on it since I was 15, and am now 30. I just have never gotten nearly this far before.

Quote:

Your current "fixes" all relate to manipulation of possibly malformed strings. The real question is, why are they malformed in the first place? Fixing them in the code downstream of the lexer and parser is much like fixing your aircraft's engines after the crash! Let's get them working right before you take off.

I agree, but with my current skill level because of experience with larger "working" programs, vs textbook programs, it was the best I could do. I am capable of learning, but I need to practice with more real life stuff before I am able to do better than that. The fact that I can do this much shows significant improvement. I basically taught myself some of these languages I'm working with. I never did them in school or anything. I used books and online references, until with the basics of C++, I'm well educated. My debugging skills are less capable. That was why I'd ended up abandoning lots of other versions of this. I was never as capable as I am now.

I'll wait for your next post, but I think I shall call this done enough, unless you have some more to say that leads me to believe otherwise.

Even when we're done with this, please stay tuned. While debugging this, it was working well enough I could continue with my plan on how to use it in real life. Here is a summary of what I have done:

In my interpreter, PNF, I have now made it so that although it works as before, if the file name is a certain other extension, it will preprocess the file before it runs, and then it will run that preprocessed version. This is the start of allowing a concept similar to G++ libraries, and Windows Dynamic Link Libraries. This can also be used to download only what is needed, if it were a web application. I'm working on specifications of how that will work, but I don't know yet how it will work, and if it will require changes or not. I eventually want HTML, to support a new parameter for the APPLET tag, "language =". It could say, "Java" (or bytecode maybe), or it could say, "PNF". If it said "PNF", it would run my language, which is inherently more powerful than Java. The preprocessing feature will be the basis of downloading only what is needed, like Java already does.

There is no library for my language yet, but there might be a standard library I will write. This is the basics of making that work okay. That's why I needed a preprocessor. I could have built it into my other languages, but then I'd have to duplicate lots of code, which is a bad thing.

After making the change to PNF, I required a change to PNFASM. I required some directives to pass on to the lower level some preprocessing directives. Then I can work on the way to preprocess PNFASM itself. I'm having some trouble with it, but that's for another thread. Please stay tuned. I hope it is a simple problem though...

astrogeek · 03-30-2017, 02:35 AM

Quote:

Originally Posted by des_a

I basically taught myself some of these languages I'm working with. I never did them in school or anything.

Good for you! I am all self-taught too, it is the best way to learn, provided that you want to learn! Everything else is inferior!

OK, my too-long notes... I have worked on them incrementally, apologies for any distracted errors.

As mentioned previously, you appear to be doing many things in your action code that should be handled by the lexer and the grammar long before your action code comes into play. I suspect that this is the root cause of your malformed parameters which is the main subject of this thread. And I know that it will be the cause of many other problems as you attempt to implement other features which are now only skeletons in your code. As you fix every small thing by modifying your action code, each successive problem will become increasingly intractable, until you let the parser do what it was designed to do.

So I will suggest what I think are the most important things that you can fix now, which will greatly simplify your current and future action code, and help you avoid the many difficulties which Flex and Bison were designed to resolve.

In this post I will focus on the data path from text to grammar, and getting it under control in the most useful way.

To start at the beginning, let's rethink your YYSTYPE definition. This is the definition of the data type used to pass TOKEN values from the lexer to the parser. Currently you define it as a string, which probably seems to make other things simpler. In truth, it complicates almost everything else you do and severely limits many future options!

YYSTYPE defines the data type of yylval. yylval is passed between lexer and parser to return values with each call of yylex(...). By default it is an integer.

Bison allows you to define your own type, or it provides a powerful means of defining a more complex type by the %union declaration. When you make use of the %union declaration you get the additional and very important benefit of being able to easily specify the data types of each TOKEN value and of non-terminals in your grammar, as a by-product!

You will have to think through your own needs, but here is a general way that I often approach it. First, remove your current YYSTYPE definition; you won't need it. Then after the first code block at the top of your Bison code, declare your %union, something like this...

Code:

%union {
        int ival;
        string *sval;
        astnode *aval;
}

Here I have declared a union of three types: integer, pointer to string and pointer to astnode (which I define for my own uses). I would suggest integer and string pointer for your current uses, at minimum. Note that because this results in a C-type union, it cannot contain complex types such as a string object itself, but it can contain pointers to them.
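As a rough illustration of the pointer point, here is a small C++ sketch that mirrors the %union as an ordinary union. SemanticValue, make_int() and make_string() are hypothetical names for this example only; Bison generates its own YYSTYPE from the %union declaration.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the %union, written as a plain C++ union.
// Only trivially copyable members are used: an int and a raw pointer.
// A std::string member would need construction and destruction that a
// plain C-style union does not perform, hence the pointer.
union SemanticValue {
    int ival;
    std::string *sval;
};

// Builds a value holding an integer.
SemanticValue make_int(int n) {
    SemanticValue v;
    v.ival = n;
    return v;
}

// Builds a value holding a newly allocated string.
// The caller owns the string and must delete it when done.
SemanticValue make_string(const char *text) {
    assert(text != nullptr);
    SemanticValue v;
    v.sval = new std::string(text);
    return v;
}
```

Only one member is meaningful at a time, and the union does not record which one. In the parser, the %token and %type declarations are what tell Bison which member each symbol uses.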

The member names of the union become the types for TOKENS and non-terminals in your parser. So you might then re-define your tokens similar to this (example only):

Code:

%token <sval> TSTRING STRING

Currently I think those are the only token values you are using (which is a topic for another post). Tokens which you never reference in action code do not require a type.

As your grammar is refined and extended you will also want/need to define the types of your non-terminals. That uses the %type declaration, something like this...

Code:

%type <sval> newstring string string_cmd macro_cmd define_cmd...
%type <aval> nodetype ...

I have shown the fictional <aval> example for reasons I will show further down.

Obvious string types I have made <sval> types.

Others should probably be <ival> until you have a reason to make them something else.

Now, with a union type for yylval, we need to update your lexer code to make use of it. You must now set the values using ordinary C-type union syntax which specifies the member being set. And for string values you will need to actually create the referenced string, and destroy it after use.

Code:

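{STRING}        { yylval.sval = new string(yytext); return STRING; }

Here we create the string with the new operator, set its value to the contents of the now familiar yytext, and set yylval.sval to the returned string pointer. (In the default, non-reentrant setup yylval is the union itself, so its members are accessed with a dot. If you build a pure/reentrant parser, yylval is passed as a pointer and you would write yylval->sval instead.)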

I am sure you can figure out what to do with integer values.
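For what it is worth, one possible way to handle the integer case is sketched below in plain C++. token_to_int() is a hypothetical helper, not part of Flex or Bison, and it assumes your integer pattern matches only an optional sign followed by decimal digits.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: converts the matched token text to an int.
// Assumes the lexer pattern already guarantees a well-formed decimal
// number, so anything else is treated as a programming error.
int token_to_int(const char *text) {
    assert(text != nullptr);
    char *end = nullptr;
    long value = std::strtol(text, &end, 10);
    assert(end != text && *end == '\0');   // the whole token was consumed
    return static_cast<int>(value);
}
```

In a lexer action this would be used along the lines of setting the ival member of yylval to token_to_int(yytext) before returning your integer token. Unlike the string case there is nothing to allocate, so there is nothing to delete later.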

Also remember to remove any currently unused references to yylval in the Flex definitions.

Now, back to the parser: you need a single rule which will process this returned value and make it available, as a non-terminal, to every rule that currently accepts the STRING token directly. This will allow you to remove many instances of code which currently post-process all those string tokens! Something like this perhaps...

Code:

newstring: STRING { $$ = strip_quotes($1); }

I have shown your strip_quotes() function for example, but with a proper Flex regular expression you should need no further processing of most received tokens in this manner.
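Since the values are now string pointers, a strip_quotes() used this way has to accept and return a pointer. I have not seen how yours is declared, so the following is only a guess at a compatible shape: it edits the string in place and hands back the same pointer, so no extra allocation is involved.

```cpp
#include <cassert>
#include <string>

// Hypothetical strip_quotes() for pointer-typed values: removes one pair
// of enclosing double quotes in place, if present, and returns the same
// pointer so it can be assigned straight to $$.
std::string *strip_quotes(std::string *s) {
    assert(s != nullptr);
    if (s->size() >= 2 && s->front() == '"' && s->back() == '"') {
        s->erase(s->size() - 1, 1);   // drop the closing quote
        s->erase(0, 1);               // drop the opening quote
    }
    return s;
}
```

Because the same pointer comes back, ownership does not change: whichever rule finally consumes the value is still the one that deletes it.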

Now, with this as the ONLY rule that receives STRING tokens, having done any necessary scrubbing and set the typed value of newstring to use it, every other reference to STRING can be replaced with newstring and further string manipulations removed!

Finally, when a grammar rule uses the newstring instance and it is no longer needed, delete it to prevent memory leaks.

Code:

some_rule: SOME_TOKEN newstring {handler.process($2); delete $2;}
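The delete is only safe if the handler does not hang on to the pointer. Here is a small C++ sketch of that contract. Handler and run_action() are hypothetical stand-ins for your own handler object and for the body of the action above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical handler: copies what it needs and never keeps the pointer.
struct Handler {
    std::vector<std::string> seen;

    void process(const std::string *s) {
        assert(s != nullptr);
        seen.push_back(*s);   // copy, so the caller may delete s afterwards
    }
};

// Stand-in for the action body { handler.process($2); delete $2; }.
void run_action(Handler &handler, std::string *dollar2) {
    handler.process(dollar2);
    delete dollar2;           // the rule that consumes the value frees it
}
```

If process() stored the pointer instead of copying, the delete would leave it dangling, which is the same class of bug as holding on to yytext. Bison also has a %destructor declaration for freeing values that are discarded during error recovery, which is worth reading about once the basics are in place.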

You should be able to implement something like this in your existing code easily, and retain your current level of function. This will add some necessary constraints to your token value handling and provide C/C++ language type enforcement, avoiding the usual pitfalls of badly typed data.

Once this mechanism is in place, we should rethink your tokenizing and grammar rules to let the lexer/parser do the heavy lifting and greatly simplify the action code... mostly for free... but the topic for another post.

One final brief comment on my <aval> types which I included in the %union declaration. A very useful parsing approach called Attribute Grammars allows you to associate context with each node of the parse tree, or Abstract Syntax Tree (hence, "astnode"). When I make use of this I define a node class, or classes with members for whatever attributes I wish to associate, and methods for manipulating them. I include a pointer to this type as a member of the %union declaration.

Then I declare all applicable non-terminals as type <aval>, create the node instances in the first rule that must handle them and set $$ to the returned pointer, something like this...

Code:

some_parameter: newstring other_param { $$ = new astnode(AST_TYPE,$1,$2); }

Subsequent rules which use or extend some_parameter all share a common API without further ado! Each node accumulates its context attributes as it moves up the parse tree. If the astnode class includes left and right pointers of type astnode *, you literally build the abstract syntax tree as a linked structure of nodes - very useful and powerful and easy to implement.
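To make that a little more concrete, here is a stripped-down C++ sketch of such a node class. It is not my actual astnode: the NodeType values, the text attribute, the two constructors and count_nodes() are all invented for this example, and the constructor arguments differ from the rule shown above.

```cpp
#include <cassert>
#include <string>

// Hypothetical node kinds for this sketch only.
enum NodeType { AST_LEAF, AST_PAIR };

struct astnode {
    NodeType type;
    std::string text;   // attribute carried by leaf nodes
    astnode *left;
    astnode *right;

    // Leaf node holding a piece of text.
    astnode(NodeType t, const std::string &s)
        : type(t), text(s), left(nullptr), right(nullptr) {}

    // Interior node that takes ownership of its children.
    astnode(NodeType t, astnode *l, astnode *r)
        : type(t), left(l), right(r) {
        assert(l != nullptr);   // an interior node needs at least a left child
    }

    // Nodes own their subtrees, so copying is disabled to avoid double frees.
    astnode(const astnode &) = delete;
    astnode &operator=(const astnode &) = delete;

    // Deleting the root frees the whole tree.
    ~astnode() { delete left; delete right; }
};

// Counts the nodes in a subtree by walking the left and right links.
int count_nodes(const astnode *n) {
    if (n == nullptr) return 0;
    return 1 + count_nodes(n->left) + count_nodes(n->right);
}
```

With ownership handled in the destructor, the action code only has to delete the root once the tree has been processed.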

I include this not to say that you should do all of it at this time, but to encourage you to make best use of the tools at hand. It is literally no more difficult than all the code you are currently writing, but it is much more useful, robust and simpler to maintain.

Sorry this went so long, but I wanted to give a solid overall view of these ideas in one place. Hope it helps!

des_a · 03-30-2017, 12:05 PM

I'll keep this in mind, if I need to modify it again. But for now it's working okay the way it is, so because that was so hard, I will just leave it alone. But in the future, I'll try to see what I can do to write better Bison & Flex. Thanks for all your help! Please look for my next thread, in which I have what I hope is a small problem.

des_a · 03-30-2017, 12:21 PM

Here is the URL of the other issue: http://www.linuxquestions.org/questi...90#post5690390