Only if you want to bloat the translator.
Having two parsers in memory takes twice the space... and makes it harder to fix errors.
In a single pass, the rule
Code:
...: WHILE expr statements WEND {
$$.value = generate_while($2.value, $3.value);
};
This simply adds to the parse tree (represented by $$.value), by creating a
(dynamic) structure with 3 elements - the entry is identified as a while, a
pointer to the parse tree for the expression (represented by $2.value), and a
pointer to the parse tree for the statements (represented by the $3.value).
No labels (forward or backward) required AT THIS TIME.
What is this entry? Something like:
Code:
struct entry {
unsigned char entry_type;
struct entry *left;
struct entry *center;
struct entry *right;
}
What are the values for entry_type? A number representing the significant
elments of the language: STATEMENT, IF, WHILE, FUNCTION, ASSIGN, VARIABLE, ADD, SUB,
MUL, DIV, EXP, ...
These are the junction points of the grammar. Each entry only points to other
entries. The end result is that a complete scan of the source code is translated
into one or more elements in this tree.
An expression like (a + b(c+d)) gets converted into a tree with elements like:
Code:
{add, left, null, right}
| |
| {MUL, left, null, right}
| | |
| | {ADD, left, null, right}
| | | |
| | | {VARIABLE, ptr to d,null}
| | VARIABLE, ptr to c, null}
| {VARIABLE, ptr to b, null}
{VARIABLE, pointer to symboltable entry a, null}
A while statement gets something similar:
Code:
{while, left, null, right}
| |
| {points to a list of statements}
a tree for an expression
An if statement looks a little bit different:
Code:
{if, left, center, right}
| | |
| | {pointer to a list of statements}
| {pointer to a list of statements}
{pointer to an expression tree }
So what would a list of statements look like?:
Code:
{STATEMENT, ptr to a single statement, null, ptr to another STATEMENT entry}
A "ptr to a single statement" can be one of ASSIGN, IF, WHILE, or FUNCTION
The last STATEMENT entry has a null for "ptr to another STATEMENT entry".
So now we have a parse tree that represents the program.
First pass done.
The SECOND PASS can be either a code generator, or an interpreter. If an
interpreter, no labels are needed. The interpreter only does what each element
of the tree specifies. The interpreter just walks the tree and invokes
functions for each entry type identified for each node in the tree.
If the second pass is a code generator, THEN AND ONLY THEN are labels necessary.
But it works ALMOST the same as the interpreter. The only difference is that
instead of evaluating the statements, it outputs an assembly/machine code.
A translator for the IF entry would be something like:
Code:
void trans_if(struct entry *left, struct entry *center, struct entry *right)
{
char *label1, *label2;
gencode(left); /* the expression */
label1 = newlabel();
printf(" JZ %s\n",label1);
gencode(center); /* the true part */
if (NULL != right) { /* the else part (if it exists)
label2 = newlabel();
printf(" JMP %s\n",label2);
printf("%s:\n",label1);
gencode(right);
printf("%s:\n",label2);
} else {
printf("%s\n:",label1);
}
}
How about the while? Very similar:
Code:
void trans_while(struct entry *left, struct entry *right)
{
char *label1, *label2;
label1 = newlabel();
printf("%s:\n",label1); /* this is where the while repeats */
gencode(left); /* the expression again */
label2 = newlabel();
printf(" JZ %s\n",label2); /* how to get out of the loop */
gencode(right); /* the list of statements */
printf(" JMP %s\n",label1); /* and this repeats */
printf("%s:\n",label2);
}
All gencode does is walk the tree of statements...invoking the translator for
the current node - that is why it is recursive - the code generated for an IF
needs code generated for the expression... so the trans_if function just
invokes the gencode function to generate code for a particular branch of
the tree.
The only caution in the "newlabel" use is to be sure to get the value of the
generated label before it is needed.
Now, using the above as an example creates a three pass compiler. The FIRST
PASS does the syntax checking, error handling, and creates a parse tree.
The SECOND PASS generats an assembly output...
So a THIRD PASS would be the assember that translate the assembly code into machine code...
Please note - none of this has been tested.