How would I make a multi pass interpreter with flex and bison to pre process labels and forward branches as well as find all the variable declares ?
|
You need to refer to a text book on compiler construction as that topic is fairly large.
But a VERY simple overview is: 1. The scanner+parser validate language input and generate a symbol table + parse tree (or some intermediate language) All loops are left as symbolic in the intermediate code. 2. Optimizing passes can go through the intermediate code to prune code not reached (constant true/false tests always do the same thing,so why generate the alternative, loops that can never be executed are the same...) Register usage optimization, tail recursion elimination, constant expression elimination... Redundant operand elimination -- a*b + a*c is usually slow and replaced by a *(b+a), which eliminates a multiply. Other things that can be done is tail recursion elimination (it really is just iteration, which is faster and reduces stack overhead). 3. During code generation (the last pass usually, but not necessarily) the code generated is assigned addresses - when a label is assigned an address, the back links in the symbol table provide all locations that need the code adjusted with the real address. Note, this may involve additional changes as using things like short relative addresses may change the actual address of the label - requiring additional adjustments to be repeated. It is even possible to enter infinite loops here - where changing an address causes other locations to need long addresses - which again change the location. So some optimization of what and where long/short/relative addressing is to be used becomes important. |
What I was asking is how to do multiple passes with bison flex? Use BEGIN var to turn on and off certain areas of the grammar? Also I am getting 1 shift/reduce warning (error?). It is for :
Quote:
|
Quote:
I don't think the scanner is designed to be invoked recursively, at least not those generated by flex... Neither flex, nor bison are good at two passes... that isn't what they were designed for. Now flex CAN be made to do two file passes (it is a simple scanner) as it does not maintain internal state (I believe seeking to the beginning of the input file will do - but now you also have to track how many times you reach the end of file or you have an infinite loop). But bison DOES maintain internal state. The goal of bison is to either directly interpret the parse tree, or generate an output file... When the scanner reports an end of file to the parser, the parsing is supposed to be completed --- unless the internal state still has elements on the stack... in which case you have a parse error (unexpected end-of-file). Now that said, you can accomplish the equivalent using something like Code:
and EOF2 is the token returned at the second end of file. It also allows you to have to different action rules instead of one. This can be used to do an expected grammar check during pass1, generating errors and such that abort performing the "pass2" parsing, and have pass2 with different action rules. It is awkward, and you HAVE to make sure the grammar specification is identical. It doubles the size of the parser... And all it really does is force double the work, as the action rules of the first part can include those of the second. Quote:
Code:
IF expr THEN IF expr THEN somestatement ELSE somestatement As describe in the documentation for bison: http://www.gnu.org/software/bison/ma...02fReduce.html You fix it as shown in http://www.gnu.org/software/bison/ma...#Non-Operators which forces a particular choice. It can also be solved by having a "if...endif" construct to force the association as well, where the ELSE part is either null or an else clause... the grammar for that would look something like: Code:
ifstatement: |
Thanks jpollard for your expert help. I made it:
Code:
|
How do you handle a forward branch in say a
Code:
WHILE exp stmts WEND Code:
WHILE ( exp ) stmts WEND |
Quote:
|
Quote:
You really need to refer to a textbook for this level of detail. |
But if I am interpreting the code then I have the original in core in order to process it on a second pass. To write a two pass algorithm would it not be best to come up with two different flex/bison routines each with its own unique yyparse() (yyparse1,yyparse2) and call them from a main routine with global symbol table and variable recording routines using different (needed) grammar for each? yyparse1 for first pass for forward branch labels, while wend statements and variable declare statements with yypass2 processing the program with the now known values? A kludge I know but a way to accomplish it.
|
Only if you want to bloat the translator.
Having two parsers in memory takes twice the space... and makes it harder to fix errors. In a single pass, the rule Code:
...: WHILE expr statements WEND { (dynamic) structure with 3 elements - the entry is identified as a while, a pointer to the parse tree for the expression (represented by $2.value), and a pointer to the parse tree for the statements (represented by the $3.value). No labels (forward or backward) required AT THIS TIME. What is this entry? Something like: Code:
struct entry { elments of the language: STATEMENT, IF, WHILE, FUNCTION, ASSIGN, VARIABLE, ADD, SUB, MUL, DIV, EXP, ... These are the junction points of the grammar. Each entry only points to other entries. The end result is that a complete scan of the source code is translated into one or more elements in this tree. An expression like (a + b(c+d)) gets converted into a tree with elements like: Code:
{add, left, null, right} Code:
{while, left, null, right} Code:
{if, left, center, right} Code:
{STATEMENT, ptr to a single statement, null, ptr to another STATEMENT entry} The last STATEMENT entry has a null for "ptr to another STATEMENT entry". So now we have a parse tree that represents the program. First pass done. The SECOND PASS can be either a code generator, or an interpreter. If an interpreter, no labels are needed. The interpreter only does what each element of the tree specifies. The interpreter just walks the tree and invokes functions for each entry type identified for each node in the tree. If the second pass is a code generator, THEN AND ONLY THEN are labels necessary. But it works ALMOST the same as the interpreter. The only difference is that instead of evaluating the statements, it outputs an assembly/machine code. A translator for the IF entry would be something like: Code:
void trans_if(struct entry *left, struct entry *center, struct entry *right) Code:
void trans_while(struct entry *left, struct entry *right) the current node - that is why it is recursive - the code generated for an IF needs code generated for the expression... so the trans_if function just invokes the gencode function to generate code for a particular branch of the tree. The only caution in the "newlabel" use is to be sure to get the value of the generated label before it is needed. Now, using the above as an example creates a three pass compiler. The FIRST PASS does the syntax checking, error handling, and creates a parse tree. The SECOND PASS generats an assembly output... So a THIRD PASS would be the assember that translate the assembly code into machine code... Please note - none of this has been tested. |
Jpollard. Thank you for your most helpful explanation. I greatly appreciate it. Alvin...
|
I am not sure I will be able to trust recursion as my mcu has a return stack only 16 levels deep. Some other way of recursion would have to be found but I will ask the compiler maker what he thinks should be done.
|
Quote:
The sample is a translator to assembler - much smaller, and the resulting code generated doesn't require recursion. All interpreters (with subroutines) need recursion. Now one way to get around the hardware limitation is to define a pcode interpreter. These don't necessarily require recursion, but do need subroutines. Then the pcode interpreter can implement a stack however it wants. The programs that run on the pcode interpreter (also called a virtual machine) can then use recursion with a much less limited stack. The problem you have listed is the limited physical memory. It is possible to get a pcode interpreter there - but the language must aimed at the machine. That was why I tried pointing you to the language fourth - it was designed for minimal overhead, AND simple parsing (it is, a very simple stack machine, and uses Reverse Polish Notation to simplify the parsing - almost to the point of just being a scanner). Plus it has the advantage that source code is already available. It is only the conversion of the interpreter to the target machine that is necessary. Using lex/bison is not what one uses to create compact programs - they aren't. They are designed to make implementing compilers easier. The parsers generated are huge, and require a lot of stack space due to the supporting functions required. |
1)Are you saying that the parser should be run on the PC and the entry structure:
Code:
struct entry { 2)Excuse me for being dense but is the entry structure generated on the fly and how is that done? 3)Can bison be called line by line by an external program and still keep its internal data structure intact? |
Quote:
Quote:
Quote:
Again, that is why I pointed you at a minimalist language requiring only simple parsing/scanning such as Forth (http://en.wikipedia.org/wiki/Forth_%...ng_language%29) |
All times are GMT -5. The time now is 07:15 PM. |