LinuxQuestions.org
Latest LQ Deal: Latest LQ Deals
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices


Reply
  Search this Thread
Old 10-01-2013, 01:32 AM   #16
schmitta
Member
 
Registered: May 2011
Location: Blacksburg VA
Distribution: UBUNTU, LXLE
Posts: 352

Original Poster
Rep: Reputation: Disabled

How would I make a multi pass interpreter with flex and bison to pre process labels and forward branches as well as find all the variable declares ?
 
Old 10-01-2013, 02:27 PM   #17
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513
You need to refer to a text book on compiler construction as that topic is fairly large.

But a VERY simple overview is:

1. The scanner+parser validate language input and generate a symbol table + parse tree (or some intermediate language) All loops are left as symbolic in the intermediate code.

2. Optimizing passes can go through the intermediate code to prune code not reached (constant true/false tests always do the same thing,so why generate the alternative, loops that can never be executed are the same...) Register usage optimization, tail recursion elimination, constant expression elimination... Redundant operand elimination -- a*b + a*c is usually slow and replaced by a *(b+a), which eliminates a multiply. Other things that can be done is tail recursion elimination (it really is just iteration, which is faster and reduces stack overhead).

3. During code generation (the last pass usually, but not necessarily) the code generated is assigned addresses - when a label is assigned an address, the back links in the symbol table provide all locations that need the code adjusted with the real address. Note, this may involve additional changes as using things like short relative addresses may change the actual address of the label - requiring additional adjustments to be repeated. It is even possible to enter infinite loops here - where changing an address causes other locations to need long addresses - which again change the location. So some optimization of what and where long/short/relative addressing is to be used becomes important.

Last edited by jpollard; 10-01-2013 at 02:33 PM. Reason: got steps 2 and 3 backwards. So I have reordered them and elimined the original step 3.
 
1 members found this post helpful.
Old 10-05-2013, 12:47 AM   #18
schmitta
Member
 
Registered: May 2011
Location: Blacksburg VA
Distribution: UBUNTU, LXLE
Posts: 352

Original Poster
Rep: Reputation: Disabled
What I was asking is how to do multiple passes with bison flex? Use BEGIN var to turn on and off certain areas of the grammar? Also I am getting 1 shift/reduce warning (error?). It is for :
Quote:
statement : IF exp THEN statement
| IF exp THEN statement ELSE statement
;
I enabled %glr-parser but am not sure if this will take care of the problem.

Last edited by schmitta; 10-05-2013 at 12:52 AM.
 
Old 10-05-2013, 07:37 AM   #19
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513
Quote:
Originally Posted by schmitta View Post
What I was asking is how to do multiple passes with bison flex? Use BEGIN var to turn on and off certain areas of the grammar?
You don't "turn off" parts of the grammar - that is a capability for preprocessing the input, though it can be done by a scanner (when it recognizes a disabled section it skips input processing until it recognizes a "turn on" again...) but it makes the scanner much harder to write - this is under "conditional compilation", and it usually means that you can't use flex for the scanner, and you have to hand code one. It can be done - but makes the action part of the rule recognizing the "turn off" symbol have to scan for the "turn on" symbol using yet another scanner...
I don't think the scanner is designed to be invoked recursively, at least not those generated by flex...

Neither flex, nor bison are good at two passes... that isn't what they were designed for. Now flex CAN be made to do two file passes (it is a simple scanner) as it does not maintain internal state (I believe seeking to the beginning of the input file will do - but now you also have to track how many times you reach the end of file or you have an infinite loop). But bison DOES maintain internal state.

The goal of bison is to either directly interpret the parse tree, or generate an output file... When the scanner reports an end of file to the parser, the parsing is supposed to be completed --- unless the internal state still has elements on the stack... in which case you have a parse error (unexpected end-of-file).

Now that said, you can accomplish the equivalent using something like

Code:
program:
    pass1 pass2;

pass1: statements EOF1

pass2: statements EOF2
where EOF1 is the token returned at the end of file (the first time)
and EOF2 is the token returned at the second end of file.

It also allows you to have to different action rules instead of one. This can be used to do an expected grammar check during pass1, generating errors and such that abort performing the "pass2" parsing, and have pass2 with different action rules.

It is awkward, and you HAVE to make sure the grammar specification is identical. It doubles the size of the parser... And all it really does is force double the work, as the action rules of the first part can include those of the second.

Quote:
Also I am getting 1 shift/reduce warning (error?). It is for :

I enabled %glr-parser but am not sure if this will take care of the problem.
your sample grammar is ambiguous - it allows a construct like
Code:
IF expr THEN IF expr THEN somestatement ELSE somestatement
The shift reduce error is cause by the fact that bison is a LR(1) parser (with extensions). In the case of your warning it is because the stack has two ways to continue - associating the ELSE with the first if, or the second. It is listed as a warning because bison has been coded to take the shortest match, which would associate the ELSE with the second IF, which is the most frequent meaning.
As describe in the documentation for bison:
http://www.gnu.org/software/bison/ma...02fReduce.html

You fix it as shown in
http://www.gnu.org/software/bison/ma...#Non-Operators

which forces a particular choice.

It can also be solved by having a "if...endif" construct to force the association as well, where the ELSE part is either null or an else clause... the grammar for that would look something like:

Code:
ifstatement:
 IF expr ifblock ENDIF

ifblock: statement elseblock

elseblock:             /* note null else block */
   | ELSE statement
But don't take the above as gospel - it has been a long time since I've done this. The only reason this should work is that the IF is terminated by an ENDIF which marks how the shift/reduce is to be processed.

Last edited by jpollard; 10-05-2013 at 07:51 AM.
 
1 members found this post helpful.
Old 10-05-2013, 10:33 PM   #20
schmitta
Member
 
Registered: May 2011
Location: Blacksburg VA
Distribution: UBUNTU, LXLE
Posts: 352

Original Poster
Rep: Reputation: Disabled
Thanks jpollard for your expert help. I made it:
Code:
statement: IF exp THEN stmts ENDIF
         | IF exp THEN stmts ELSE stmts ENDIF
         ;
Would that work? Should I separate the ELSE statement into a separate production to make the generated C code easier to specify? Thanks. Alvin....

Last edited by schmitta; 10-05-2013 at 10:42 PM.
 
Old 10-05-2013, 10:49 PM   #21
schmitta
Member
 
Registered: May 2011
Location: Blacksburg VA
Distribution: UBUNTU, LXLE
Posts: 352

Original Poster
Rep: Reputation: Disabled
How do you handle a forward branch in say a
Code:
 WHILE exp stmts WEND
statement without using two pass compiler to find the WEND statement on the first pass, for the branch when exp is not true? Would it be better form to have
Code:
 WHILE ( exp ) stmts WEND
? Thanks. Alvin...

Last edited by schmitta; 10-05-2013 at 10:52 PM.
 
Old 10-06-2013, 10:47 PM   #22
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513
Quote:
Originally Posted by schmitta View Post
Thanks jpollard for your expert help. I made it:
Code:
statement: IF exp THEN stmts ENDIF
         | IF exp THEN stmts ELSE stmts ENDIF
         ;
Would that work? Should I separate the ELSE statement into a separate production to make the generated C code easier to specify? Thanks. Alvin....
It should work... but it puts double the work for you on the actions for the rules.
 
1 members found this post helpful.
Old 10-06-2013, 11:07 PM   #23
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513
Quote:
Originally Posted by schmitta View Post
How do you handle a forward branch in say a
Code:
 WHILE exp stmts WEND
statement without using two pass compiler to find the WEND statement on the first pass, for the branch when exp is not true? Would it be better form to have
Code:
 WHILE ( exp ) stmts WEND
? Thanks. Alvin...
I keep telling you that the second pass doesn't work on the original source. It works on intermediate code generated during the first pass by the actions associated with the rule.

You really need to refer to a textbook for this level of detail.
 
1 members found this post helpful.
Old 10-07-2013, 10:33 AM   #24
schmitta
Member
 
Registered: May 2011
Location: Blacksburg VA
Distribution: UBUNTU, LXLE
Posts: 352

Original Poster
Rep: Reputation: Disabled
But if I am interpreting the code then I have the original in core in order to process it on a second pass. To write a two pass algorithm would it not be best to come up with two different flex/bison routines each with its own unique yyparse() (yyparse1,yyparse2) and call them from a main routine with global symbol table and variable recording routines using different (needed) grammar for each? yyparse1 for first pass for forward branch labels, while wend statements and variable declare statements with yypass2 processing the program with the now known values? A kludge I know but a way to accomplish it.
 
Old 10-07-2013, 10:52 AM   #25
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513
Only if you want to bloat the translator.

Having two parsers in memory takes twice the space... and makes it harder to fix errors.

In a single pass, the rule
Code:
...: WHILE expr statements WEND  {
	    $$.value = generate_while($2.value, $3.value);
};
This simply adds to the parse tree (represented by $$.value), by creating a
(dynamic) structure with 3 elements - the entry is identified as a while, a
pointer to the parse tree for the expression (represented by $2.value), and a
pointer to the parse tree for the statements (represented by the $3.value).


No labels (forward or backward) required AT THIS TIME.

What is this entry? Something like:

Code:
struct entry {
    unsigned char	entry_type;
    struct entry	*left;
    struct entry	*center;
    struct entry	*right;
}
What are the values for entry_type? A number representing the significant
elments of the language: STATEMENT, IF, WHILE, FUNCTION, ASSIGN, VARIABLE, ADD, SUB,
MUL, DIV, EXP, ...


These are the junction points of the grammar. Each entry only points to other
entries. The end result is that a complete scan of the source code is translated
into one or more elements in this tree.

An expression like (a + b(c+d)) gets converted into a tree with elements like:
Code:
{add, left, null, right}
        |          |
        |          {MUL, left, null, right}
        |                  |          |
        |                  |          {ADD, left, null, right}
        |                  |                |             |
        |                  |                |             {VARIABLE, ptr to d,null}
        |                  |                VARIABLE, ptr to c, null}
        |                  {VARIABLE, ptr to b, null}
        {VARIABLE, pointer to symboltable entry a, null}
A while statement gets something similar:

Code:
{while, left, null, right}
         |            |
         |            {points to a list of statements}
         a tree for an expression
An if statement looks a little bit different:

Code:
{if, left, center, right}
       |      |       |
       |      |       {pointer to a list of statements}
       |      {pointer to a list of statements}
       {pointer to an expression tree }
So what would a list of statements look like?:

Code:
{STATEMENT, ptr to a single statement, null, ptr to another STATEMENT entry}
A "ptr to a single statement" can be one of ASSIGN, IF, WHILE, or FUNCTION
The last STATEMENT entry has a null for "ptr to another STATEMENT entry".

So now we have a parse tree that represents the program.

First pass done.

The SECOND PASS can be either a code generator, or an interpreter. If an
interpreter, no labels are needed. The interpreter only does what each element
of the tree specifies. The interpreter just walks the tree and invokes
functions for each entry type identified for each node in the tree.

If the second pass is a code generator, THEN AND ONLY THEN are labels necessary.
But it works ALMOST the same as the interpreter. The only difference is that
instead of evaluating the statements, it outputs an assembly/machine code.

A translator for the IF entry would be something like:

Code:
void trans_if(struct entry *left, struct entry *center, struct entry *right)
{
    char *label1, *label2;

    gencode(left);                /* the expression */
    label1 = newlabel();
    printf("    JZ   %s\n",label1);
    gencode(center);              /* the true part */
    if (NULL != right) {          /* the else part (if it exists)
        label2 = newlabel();
        printf("    JMP    %s\n",label2);
        printf("%s:\n",label1);
        gencode(right);
        printf("%s:\n",label2);
    } else {
        printf("%s\n:",label1);
    }
}
How about the while? Very similar:

Code:
void trans_while(struct entry *left, struct entry *right)
{
    char *label1, *label2;

    label1 = newlabel();
    printf("%s:\n",label1);	     /* this is where the while repeats */
    gencode(left);                   /* the expression again */
    label2 = newlabel();
    printf("    JZ  %s\n",label2);   /* how to get out of the loop */
    gencode(right);                  /* the list of statements */
    printf("    JMP   %s\n",label1); /* and this repeats */
    printf("%s:\n",label2);
}
All gencode does is walk the tree of statements...invoking the translator for
the current node - that is why it is recursive - the code generated for an IF
needs code generated for the expression... so the trans_if function just
invokes the gencode function to generate code for a particular branch of
the tree.

The only caution in the "newlabel" use is to be sure to get the value of the
generated label before it is needed.

Now, using the above as an example creates a three pass compiler. The FIRST
PASS does the syntax checking, error handling, and creates a parse tree.

The SECOND PASS generats an assembly output...

So a THIRD PASS would be the assember that translate the assembly code into machine code...

Please note - none of this has been tested.

Last edited by jpollard; 10-07-2013 at 01:11 PM. Reason: bad submit while creating.
 
1 members found this post helpful.
Old 10-08-2013, 12:03 AM   #26
schmitta
Member
 
Registered: May 2011
Location: Blacksburg VA
Distribution: UBUNTU, LXLE
Posts: 352

Original Poster
Rep: Reputation: Disabled
Jpollard. Thank you for your most helpful explanation. I greatly appreciate it. Alvin...
 
Old 10-08-2013, 12:28 AM   #27
schmitta
Member
 
Registered: May 2011
Location: Blacksburg VA
Distribution: UBUNTU, LXLE
Posts: 352

Original Poster
Rep: Reputation: Disabled
I am not sure I will be able to trust recursion as my mcu has a return stack only 16 levels deep. Some other way of recursion would have to be found but I will ask the compiler maker what he thinks should be done.
 
Old 10-08-2013, 07:17 AM   #28
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513
Quote:
Originally Posted by schmitta View Post
I am not sure I will be able to trust recursion as my mcu has a return stack only 16 levels deep. Some other way of recursion would have to be found but I will ask the compiler maker what he thinks should be done.
Which is why the code generator doesn't run on it.

The sample is a translator to assembler - much smaller, and the resulting code generated doesn't require recursion.

All interpreters (with subroutines) need recursion. Now one way to get around the hardware limitation is to define a pcode interpreter. These don't necessarily require recursion, but do need subroutines. Then the pcode interpreter can implement a stack however it wants. The programs that run on the pcode interpreter (also called a virtual machine) can then use recursion with a much less limited stack.

The problem you have listed is the limited physical memory. It is possible to get a pcode interpreter there - but the language must aimed at the machine. That was why I tried pointing you to the language fourth - it was designed for minimal overhead, AND simple parsing (it is, a very simple stack machine, and uses Reverse Polish Notation to simplify the parsing - almost to the point of just being a scanner). Plus it has the advantage that source code is already available. It is only the conversion of the interpreter to the target machine that is necessary.

Using lex/bison is not what one uses to create compact programs - they aren't. They are designed to make implementing compilers easier. The parsers generated are huge, and require a lot of stack space due to the supporting functions required.
 
1 members found this post helpful.
Old 10-10-2013, 02:10 AM   #29
schmitta
Member
 
Registered: May 2011
Location: Blacksburg VA
Distribution: UBUNTU, LXLE
Posts: 352

Original Poster
Rep: Reputation: Disabled
1)Are you saying that the parser should be run on the PC and the entry structure:
Code:
struct entry {
    unsigned char	entry_type;
    struct entry	*left;
    struct entry	*center;
    struct entry	*right;
}
downloaded to the mcu to be executed by a c program on the mcu?

2)Excuse me for being dense but is the entry structure generated on the fly and how is that done?

3)Can bison be called line by line by an external program and still keep its internal data structure intact?

Last edited by schmitta; 10-10-2013 at 03:20 AM.
 
Old 10-10-2013, 04:43 AM   #30
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513
Quote:
Originally Posted by schmitta View Post
1)Are you saying that the parser should be run on the PC and the entry structure:
Code:
struct entry {
    unsigned char	entry_type;
    struct entry	*left;
    struct entry	*center;
    struct entry	*right;
}
downloaded to the mcu to be executed by a c program on the mcu?
No. The dynamic structure is part of the parser.
Quote:
2)Excuse me for being dense but is the entry structure generated on the fly and how is that done?
The tree is generated dynamically and is composed of nodes allocated during parsing.
Quote:
3)Can bison be called line by line by an external program and still keep its internal data structure intact?
no. It is an LALR(1) parser generator. The code generated interprets an entire grammar. Once it is finished with the grammar the only "state" remaining is any dynamic structure created during parsing. It is NOT a line-by-line translator.

Again, that is why I pointed you at a minimalist language requiring only simple parsing/scanning such as Forth (http://en.wikipedia.org/wiki/Forth_%...ng_language%29)
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
Regular Expressions nova49 Linux - Newbie 4 07-13-2011 07:05 AM
Regular Expressions Wim Sturkenboom Programming 10 11-19-2009 01:21 AM
regular expressions. stomach Linux - Software 1 02-10-2006 06:41 AM
Regular Expressions overbored Linux - Software 3 06-24-2004 02:34 PM
help with REGULAR EXPRESSIONS ner Linux - General 23 10-31-2003 11:09 PM

LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie

All times are GMT -5. The time now is 12:26 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration