Lex/YACC Question

larryherman · 03-17-2010, 12:06 AM

Quote:

Originally Posted by MTK358

The problem is that I hardly understand anything in Calculator -> Description!

I'm not sure what your background is, but I first used lex and yacc after having taken a compilers course, so I was familiar with the background concepts, which may be difficult if you haven't seen them before. Depending upon your experience, you might start with a good compilers text and read about scanning and bottom-up parsing before really trying to use lex and yacc, which, after all, are tools to do those things.

My compilers text is rather old, so maybe someone else can recommend a more recent one if you're inclined to do some reading first (although actually I doubt the concepts have changed at all).

MTK358 · 03-17-2010, 03:57 PM

Quote:

Originally Posted by larryherman

I'm not sure what your background is

I basically know all I know about computers and programming from the internet!

Anyway, I was considering using a tree node structure that contains a name, value, and child nodes to represent the syntax tree.

A function that would print the tree recursively might be something like this, where each node structure stores it's name, value, and a NULL-terminated array of child nodes:

Code:

int level = 0;

void print_tree(struct node* root) {
	int i;
	for(i=0; i<level; i++) {
		printf("    ");
	}
	//TODO: print node's name and value
	level++;
	for(i=0; tree->children[i] != NULL; i++) {
		print_tree(tree->children[i]);
	}
	level--;
}

Sergei Steshenko · 03-17-2010, 04:02 PM

Quote:

Originally Posted by MTK358

I basically know all I know about computers and programming from the internet!

Anyway, I was considering using a tree node structure that contains a name, value, and child nodes to represent the syntax tree.

A function that would print the tree recursively might be something like this, where each node structure stores it's name, value, and a NULL-terminated array of child nodes:

Code:

int level = 0;

void print_tree(struct node* root) {
	int i;
	for(i=0; i<level; i++) {
		putchar(' ');
	}
	//TODO: print node's name and value
	level++;
	for(i=0; tree->children[i] != NULL; i++) {
		print_tree(tree->children[i]);
	}
	level--;
}

If it is not a homework/paid job strictly requiring lex/yacc better do it in Perl (or similar language). In Perl AST (Abstract Syntax Tree) is just a hierarchical data structure, and to dump it you cab easily use Data:

umper module.

You would also be able to dump AST into a file and consume it in another script. I've been successfully using this approach for years in/for serious EDA related stuff.

MTK358 · 03-18-2010, 11:58 AM

I am slowly getting to understand the Calculator example, and I finally got it to compile!

Code:

$ cat euclidian-algorithm.txt 
a = 12;
b = 8;
while(a != b) {
        if(a > b) {
                a = a - b;
        } else {
                b = b - a;
        }
}
$ ./calc3g < euclidian-algorithm.txt 


Graph 0:

     [=]
      |
   |-----|
   |     |
 id(A) c(12)


Graph 1:

    [=]
     |
   |----|
   |    |
 id(B) c(8)


Graph 2:

                            while
                              |
      |-----------------------------|
      |                             |
    [!=]                           if
      |                             |
   |-----|        |--------------|-----------------|
   |     |        |              |                 |
 id(A) id(B)     [>]            [=]               [=]
                  |              |                 |
               |-----|     |--------|        |--------|
               |     |     |        |        |        |
             id(A) id(B) id(A)     [-]     id(B)     [-]
                                    |                 |
                                 |-----|           |-----|
                                 |     |           |     |
                               id(A) id(B)       id(B) id(A)

The grapher really explains a lot about how this all works.

MTK358 · 03-18-2010, 04:03 PM

I'm slowly making some progress — I already wrote most of the Lex and YACC files for an interpreter that puts together a syntax tree structure and parses it.

I'm pretty sure I omitted a lot of important stuff, could you give me any suggestions?

theNbomr · 03-18-2010, 04:49 PM

Looks pretty good, considering that a short while ago you said:

Quote:

The problem is that I hardly understand anything in Calculator -> Description!

Right off the bat I see the definition for an integer is a little weak. Your regex matches any single digit. A more complete regex might look more like

Code:

[+-]?[0-9]+

Hard to see errors just by inspection. I assume you are using the more complex form of building up the tree in anticipation of adding conditionals/branching to your grammar.
Well done.
--- rod.

MTK358 · 03-18-2010, 07:24 PM

Quote:

Originally Posted by theNbomr

Right off the bat I see the definition for an integer is a little weak. Your regex matches any single digit. A more complete regex might look more like

Code:

[+-]?[0-9]+

But doesn't Lex match the token whose regex matches the most characters? If so, then this:

4-3

will be interpreted as:

4
-3

instead of

4
-
3

, which is invalid. I thought that it would work right if you would implement negation as part of the grammar. That would also let you negate variables, and expressions in parentheses, not just integer literals.

Quote:

Originally Posted by theNbomr

Hard to see errors just by inspection. I assume you are using the more complex form of building up the tree in anticipation of adding conditionals/branching to your grammar.
Well done.

Yes, the reason I am doing it this way is that I want to add conditionals and looping.

MTK358 · 03-18-2010, 07:44 PM

OK, I tried compiling my code and it has all these errors, I am totally puzzled because I hardly understand the code I wrote in the first place

At least lex and yacc passed without errors first time!

Code:

$ gcc *.c -o lang
lang.l:2:19: error: y.tab.h: No such file or directory
lang.l: In function ‘yylex’:
lang.l:8: error: ‘INTEGER’ undeclared (first use in this function)
lang.l:8: error: (Each undeclared identifier is reported only once
lang.l:8: error: for each function it appears in.)
lang.l:9: error: ‘ASSIGN’ undeclared (first use in this function)
lang.l:10: error: ‘ADD’ undeclared (first use in this function)
lang.l:11: error: ‘SUBTRACT’ undeclared (first use in this function)
lang.l:12: error: ‘MULTIPLY’ undeclared (first use in this function)
lang.l:13: error: ‘DIVIDE’ undeclared (first use in this function)
lang.l:14: error: ‘OPAREN’ undeclared (first use in this function)
lang.l:15: error: ‘CPAREN’ undeclared (first use in this function)
lang.l:16: error: ‘SEPARATOR’ undeclared (first use in this function)
lang.y:12: error: expected specifier-qualifier-list before ‘Node’
lang.y:24: error: redefinition of typedef ‘NodeLitIntValues’
lang.y:19: note: previous declaration of ‘NodeLitIntValues’ was here
lang.y:31: error: expected specifier-qualifier-list before ‘NodeVarValues’
lang.y:35: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘mkLitIntNode’
lang.y:36: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘mkVarNode’
lang.y:37: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘mkOpNode’
lang.y:38: error: expected ‘)’ before ‘n’
lang.y:39: error: expected ‘)’ before ‘n’
lang.y:76: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘mkLitIntNode’
lang.y:83: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘mkVarNode’
lang.y:90: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘mkOpNode’
lang.y:105: error: expected ‘)’ before ‘*’ token
lang.y:129: error: expected ‘)’ before ‘*’ token

theNbomr · 03-18-2010, 08:49 PM

Quote:

Originally Posted by MTK358

, which is invalid. I thought that it would work right if you would implement negation as part of the grammar. That would also let you negate variables, and expressions in parentheses, not just integer literals.

Yes, good point. Associativity and precedence in the grammar are the key to that.

Code:

lang.l:2:19: error: y.tab.h: No such file or directory

They y.tab.h header file declares &/or defines all of the symbols missing from your compilation. It is generated by yacc from the grammar.
---- rod.

larryherman · 03-18-2010, 10:29 PM

Section 5.4 of the Bison manual that I mentioned earlier illustrates how to handle the unary minus:
http://www.gnu.org/software/bison/ma...ual-Precedence. Give it a try.

MTK358 · 03-19-2010, 08:05 AM

@theNbomr

For some reason I run yacc, then lex, and it doesn't create a "y.tab.h" file!

@larryherman

I implemented that in my code now.

larryherman · 03-19-2010, 12:28 PM

Quote:

Originally Posted by MTK358

For some reason I run yacc, then lex, and it doesn't create a "y.tab.h" file!

bison's -d switch cases it to write an output file with definitions of the token types used in the grammar, etc., but by default it's not named y.tab.h (the name yacc uses). Instead it's named name.h, where the parser output file is name.c. If you want it named y.tab.h you can use bison's -y option, but otherwise just substitute the name of the file that bison actually generates where theNbomr had written y.tab.h.

(If you're actually using yacc rather than bison just ignore the above, but if you're using Linux then yacc may just be a one-line shell script exec'ing bison.)

Hope this helps.

MTK358 · 03-19-2010, 12:52 PM

Code:

$ cat /usr/bin/yacc
#! /bin/sh
exec '/usr/bin/bison' -y "$@"
$ ls
lang.l  lang.y
$ yacc lang.y
conflicts: 4 shift/reduce
$ ls
lang.l  lang.y  y.tab.c

larryherman · 03-19-2010, 03:29 PM

Can't say why the -y option to bison (in that script) doesn't seem to have the advertised effect, but the -d option to bison works for me, generating the file file.tab.h (from file.y).

MTK358 · 03-19-2010, 03:43 PM

bison -d worked.

I corrected some other minor errors, but what is this:

Code:

$ gcc *.c -o lang
lang.y:13: error: expected specifier-qualifier-list before ‘Node’
lang.y:25: error: redefinition of typedef ‘NodeLitIntValues’
lang.y:20: note: previous declaration of ‘NodeLitIntValues’ was here
lang.y:32: error: expected specifier-qualifier-list before ‘NodeVarValues’
lang.y:36: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘mkLitIntNode’
lang.y:37: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘mkVarNode’
lang.y:38: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘mkOpNode’
lang.y:39: error: expected ‘)’ before ‘n’
lang.y:40: error: expected ‘)’ before ‘n’
lang.y:79: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘mkLitIntNode’
lang.y:86: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘mkVarNode’
lang.y:93: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘mkOpNode’
lang.y:108: error: expected ‘)’ before ‘*’ token
lang.y:134: error: expected ‘)’ before ‘*’ token