Lex/YACC Question

larryherman · 03-19-2010, 03:57 PM

The compiler doesn't know what "Node" is at line 13, because it isn't defined until later, at line 27. Since you have two structures that use each other's types, you need a forward (incomplete) declaration. You also have some other mistakes, like duplicate definitions of the typedef name NodeLitIntValues at lines 20 and 25.

May I suggest defining your types this way:

Code:

struct node;

typedef struct node_op_values {
	int type;
	int numOps;
	struct node** ops;
} NodeOpValues;

typedef struct {
	int value;
} NodeLitIntValues;

typedef struct {
	char id;
} NodeVarValues;

typedef struct node {
	int type;
	union {
		NodeOpValues opValues;
		NodeLitIntValues litIntValues;
		NodeVarValues varValues;
	};
} Node;

(Of course you don't need a structure if you have only one field inside it, but it's not incorrect, only superfluous.)

Even with the corrected types, you have some other problems. For example, parseNode() and freeNode() have no return value, yet you're using calls to them in your "statements" rule as if they did. But you can work on those once the type definitions are right.
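
For what it's worth, here is a minimal sketch of how constructors and freeNode() could look with these types. The NODE_* tags, mkLitIntNode(), mkBinOpNode() and evalNode() are names made up for illustration, not taken from your code, and the anonymous union assumes C11 or GCC's extension:

```c
#include <assert.h>
#include <stdlib.h>

struct node;                       /* forward declaration, as above */

typedef struct node_op_values {
	int type;
	int numOps;
	struct node **ops;
} NodeOpValues;

typedef struct { int value; } NodeLitIntValues;
typedef struct { char id; } NodeVarValues;

/* Hypothetical tags for Node.type; the real values are up to you. */
enum { NODE_LIT_INT, NODE_VAR, NODE_OP };

typedef struct node {
	int type;
	union {                    /* anonymous union: C11 or GNU extension */
		NodeOpValues opValues;
		NodeLitIntValues litIntValues;
		NodeVarValues varValues;
	};
} Node;

/* Allocate a literal-integer leaf; returns NULL if malloc fails. */
Node *mkLitIntNode(int value)
{
	Node *n = malloc(sizeof *n);
	if (n == NULL)
		return NULL;
	n->type = NODE_LIT_INT;
	n->litIntValues.value = value;
	return n;
}

/* Allocate a two-operand operator node that owns its operands. */
Node *mkBinOpNode(int opType, Node *lhs, Node *rhs)
{
	Node *n = malloc(sizeof *n);
	if (n == NULL)
		return NULL;
	n->type = NODE_OP;
	n->opValues.type = opType;
	n->opValues.numOps = 2;
	n->opValues.ops = malloc(2 * sizeof *n->opValues.ops);
	if (n->opValues.ops == NULL) {
		free(n);
		return NULL;
	}
	n->opValues.ops[0] = lhs;
	n->opValues.ops[1] = rhs;
	return n;
}

/* Evaluate literals and binary operators only (variable nodes are not
   handled in this sketch); any operator other than '*' is treated as '+'. */
int evalNode(const Node *n)
{
	int a, b;
	assert(n != NULL);
	if (n->type == NODE_LIT_INT)
		return n->litIntValues.value;
	assert(n->type == NODE_OP && n->opValues.numOps == 2);
	a = evalNode(n->opValues.ops[0]);
	b = evalNode(n->opValues.ops[1]);
	return n->opValues.type == '*' ? a * b : a + b;
}

/* Recursively free a tree; accepts NULL. */
void freeNode(Node *n)
{
	int i;
	if (n == NULL)
		return;
	if (n->type == NODE_OP) {
		for (i = 0; i < n->opValues.numOps; i++)
			freeNode(n->opValues.ops[i]);
		free(n->opValues.ops);
	}
	free(n);
}
```

Note that every constructor returns a Node *, which is the kind of value a grammar action would assign to $$, while freeNode() returns nothing and so can only be called as a statement.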

MTK358 · 03-19-2010, 04:26 PM

Code:

$ gcc *.c -o lang
lang.y: In function ‘yyparse’:
lang.y:52: warning: passing argument 1 of ‘parseNode’ makes pointer from integer without a cast
lang.y:38: note: expected ‘struct Node *’ but argument is of type ‘YYSTYPE’
lang.y:52: warning: passing argument 1 of ‘freeNode’ makes pointer from integer without a cast
lang.y:39: note: expected ‘struct Node *’ but argument is of type ‘YYSTYPE’
lang.y:60: warning: assignment makes integer from pointer without a cast
lang.y:61: warning: assignment makes integer from pointer without a cast
lang.y:62: warning: assignment makes integer from pointer without a cast
lang.y:63: warning: assignment makes integer from pointer without a cast
lang.y:64: warning: assignment makes integer from pointer without a cast
lang.y:65: warning: assignment makes integer from pointer without a cast
lang.y:66: warning: assignment makes integer from pointer without a cast
lang.y:67: warning: assignment makes integer from pointer without a cast
lang.y: In function ‘mkVarNode’:
lang.y:88: error: ‘value’ undeclared (first use in this function)
lang.y:88: error: (Each undeclared identifier is reported only once
lang.y:88: error: for each function it appears in.)
lang.y: In function ‘mkOpNode’:
lang.y:100: error: redeclaration of ‘i’ with no linkage
lang.y:99: note: previous declaration of ‘i’ was here
lang.y:100: error: ‘for’ loop initial declarations are only allowed in C99 mode
lang.y:100: note: use option -std=c99 or -std=gnu99 to compile your code
lang.y: In function ‘parseNode’:
lang.y:112: error: ‘vars’ undeclared (first use in this function)
lang.y: In function ‘freeNode’:
lang.y:134: error: ‘opNodeType’ undeclared (first use in this function)
lang.y:135: error: ‘i’ undeclared (first use in this function)
lang.y:139: error: expected declaration or statement at end of input

krishnan · 03-20-2010, 12:15 PM

For constructing a calculator, use the following lex code:
%{
#include <stdlib.h>
#include <stdio.h>
#include "y.tab.h"
void yyerror(char*);
int varindex(char *var);
%}

%%

[ \t] ; /* skip whitespace */

([0-9]+) { // to identify integer values
yylval.dval = atoi(yytext);
return INT;}

(([0-9]+(\.[0-9]*)?)|([0-9]*\.[0-9]+)) { // to identify float values
yylval.dval = atof(yytext);
return FLOAT;}

[-+()=/%*\n] { return *yytext; } // to identify symbols

[a-z][a-z0-9]* { // to identify variables
yylval.ivar = varindex(yytext);
return VARIABLE;}

. {char msg[25]; // for errors
sprintf(msg,"%s <%s>","invalid character",yytext);
yyerror(msg);
}
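
The varindex() function is declared above but not shown. A minimal sketch of one possible definition, assuming a small fixed-size table of names (MAX_VARS, MAX_NAME, varNames and varCount are made-up names and limits):

```c
#include <string.h>

#define MAX_VARS 64   /* assumed limit on distinct variables */
#define MAX_NAME 32   /* assumed limit on name length, including the NUL */

static char varNames[MAX_VARS][MAX_NAME];
static int varCount = 0;

/* Return the table index for var, adding it if it is new.
   Returns -1 if the table is full or the name is too long. */
int varindex(char *var)
{
	int i;
	for (i = 0; i < varCount; i++)
		if (strcmp(varNames[i], var) == 0)
			return i;
	if (varCount == MAX_VARS || strlen(var) >= MAX_NAME)
		return -1;
	strcpy(varNames[varCount], var);
	return varCount++;
}
```

The parser can then use the returned index to look up or store the variable's value in a parallel array.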

MTK358 · 03-20-2010, 01:23 PM

How does that help? I already have a fully functioning lexer with even more features.

The problem is that the parser will not compile.

Sergei Steshenko · 03-20-2010, 01:27 PM

Quote:

Originally Posted by MTK358

...
The problem is that the parser will not compile.

Now rethink your use of lex/yacc. With other approaches this problem doesn't exist.

MTK358 · 03-20-2010, 02:47 PM

Will those approaches be as fast, or almost as fast, as Lex/YACC in C, and do they work in a similar way?

larryherman · 03-20-2010, 03:39 PM

Maybe it would help to know why you've been trying to use lex and yacc. I know you're trying to write a calculator program, but is the purpose just to learn about parsing, or to learn about lex and yacc, or do you have some specific application you need to work for some reason?

Unlike Sergei I don't necessarily think that using lex/yacc is a bad idea, even if it may be overkill for what you're trying to do; I often use languages or tools that are not necessary for what I'm trying to do, just for the purpose of trying to learn (or improve my knowledge of) something new. But that may not apply to what you're trying to do, so perhaps knowing more about what you're trying to accomplish by using lex and yacc can help others point you in the best direction.

Sergei Steshenko · 03-20-2010, 03:47 PM

Quote:

Originally Posted by MTK358

Will those approaches be as fast, or almost as fast, as Lex/YACC in C, and do they work in a similar way?

Fast in what sense? The time you spend, or the speed of the final product?

What does "similar" mean in "do they work in a similar way?"

Anyway, you will spend less time because it will be easier to debug your mistakes, and the final code will be practically as fast as generated by lex/yacc if you know what you are doing.

If you look from a wider and more practical point of view, the compiler speed is not that important nowadays - real life projects consist of many files which can be compiled in parallel, and HW is cheap.

And the compiler spends most of its time in optimizations, which are not related to the frontend (lex + yacc make up the frontend).

MTK358 · 03-20-2010, 03:58 PM

@larryherman

This is mostly to learn how parsing, interpreters, and compilers work.

@Sergei Steshenko

Fast is in the sense that the compiler/interpreter using your alternative will not be significantly slower than one using Lex/YACC.

I mean "similar" in the sense that it uses a lexer to create tokens, and a LALR parser to process the tokens.

I was considering that using dynamic arrays might be a better idea than the complicated blend of structs and unions used now. The first element of the array would contain the type of node, and the rest of the elements would be interpreted accordingly.

Also, how could I handle a language that has multiple types, such as integers, floats, strings, and maybe even pointers?

larryherman · 03-20-2010, 04:02 PM

Quote:

Originally Posted by MTK358

This is mostly to learn how parsing, interpreters, and compilers work.

My personal opinion is that lex/yacc are not the wrong tools to use in this case, although I would strongly advise reading a compilers text first. Maybe you already have, but from some of your earlier comments it seemed as if not.

Sergei Steshenko · 03-20-2010, 04:43 PM

Quote:

Originally Posted by MTK358

...
I mean "similar" in the sense that it uses a lexer to create tokens, and a LALR parser to process the tokens.

I was considering that using dynamic arrays might be a better idea than the complicated blend of structs and unions used now. The first element of the array would contain the type of node, and the rest of the elements would be interpreted accordingly.

Also, how could I handle a language that has multiple types, such as integers, floats, strings, and maybe even pointers?

As I said, one doesn't need tokens. Consider the input stream as a sequence of legal (and nested) language constructs. Consider parsing as trying, in each parser state, to find a language construct which fits the list of those allowed in that state.

Yes, a dynamic language like Perl is more convenient, though don't expect too much speed from Perl. OTOH, you do not yet need speed. By the way, Perl parses pretty fast thanks to its good RE engine, but multi-line parsing is somewhat of a nuisance in Perl.

Multiple types fit the various language constructs concept.
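
In C terms, one common way to carry several value types around is a tagged union. A rough sketch, where Value, ValueKind, mkInt(), mkFloat(), mkString() and valueAdd() are hypothetical names and strings are simply rejected by the addition:

```c
typedef enum { VAL_INT, VAL_FLOAT, VAL_STRING } ValueKind;

typedef struct {
	ValueKind kind;            /* says which union member is valid */
	union {
		long i;
		double f;
		const char *s;     /* not owned by the Value in this sketch */
	} as;
} Value;

Value mkInt(long i)          { Value v; v.kind = VAL_INT;    v.as.i = i; return v; }
Value mkFloat(double f)      { Value v; v.kind = VAL_FLOAT;  v.as.f = f; return v; }
Value mkString(const char *s){ Value v; v.kind = VAL_STRING; v.as.s = s; return v; }

/* Convert a numeric Value to double; the caller checks the kind first. */
static double asDouble(Value v)
{
	return v.kind == VAL_INT ? (double)v.as.i : v.as.f;
}

/* Add two numbers, promoting to float unless both are integers.
   Returns 0 on success, -1 if either operand is not a number. */
int valueAdd(Value a, Value b, Value *out)
{
	if (a.kind == VAL_STRING || b.kind == VAL_STRING)
		return -1;
	if (a.kind == VAL_INT && b.kind == VAL_INT)
		*out = mkInt(a.as.i + b.as.i);
	else
		*out = mkFloat(asDouble(a) + asDouble(b));
	return 0;
}
```

Each operator construct then decides, from the tags of its operands, which combinations it accepts and what type it produces.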

MTK358 · 03-20-2010, 05:05 PM

Quote:

Originally Posted by Sergei Steshenko

As I said, one doesn't need tokens. Consider the input stream as a sequence of legal (and nested) language constructs. Consider parsing as trying, in each parser state, to find a language construct which fits the list of those allowed in that state.

I still don't understand how it will work without a lexer, but...

Quote:

Originally Posted by Sergei Steshenko

Yes, a dynamic language like Perl is more convenient, though don't expect too much speed from Perl. OTOH, you do not yet need speed. By the way, Perl parses pretty fast thanks to its good RE engine, but multi-line parsing is somewhat of a nuisance in Perl.

The multi-line thing isn't of much concern to me, because my toy language will probably be line-oriented, something like this:

Code:

while a != b
    if a > b
        a -= b
    else
        b -= a
    fi
loop

Quote:

Originally Posted by Sergei Steshenko

Multiple types fit the various language constructs concept.

I don't get it.

Sergei Steshenko · 03-20-2010, 05:27 PM

Quote:

Originally Posted by MTK358

I still don't understand how it will work without a lexer, but...
...

Again, consider that a language construct can be as simple as a single character and as complex as the whole program. The whole program is a nested language construct consisting of, say, declarations and operators.

Declarations are nested language constructs consisting of, say, type keyword and variables list, e.g.

Code:

int a, b, c;

A variables list is a nested construct consisting of a single variable optionally followed by a list separator and a variables list; this recursion can easily be implemented through iteration.

A list separator is a (possibly) nested language construct consisting of a comma (a comma is a good example of a single-character language construct) optionally surrounded by whitespace and/or comments.

Nowhere in my explanation did I need the notion of a token. Nowhere in my explanation did I assume a context-free grammar.
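
To make this concrete, here is a rough sketch in C of the declaration example, working directly on characters with one function per language construct. All names (parseDeclaration(), variablesList(), listSeparator() and so on) are made up for illustration, only the keyword int is recognized, and comments are assumed to be C-style /* ... */:

```c
#include <ctype.h>
#include <string.h>

/* Each matcher looks at *p; on success it advances *p and returns nonzero. */

/* Skip whitespace and comments; an unterminated comment is left in place
   so that the caller rejects it. */
static void skipBlanks(const char **p)
{
	for (;;) {
		if (isspace((unsigned char)**p)) {
			(*p)++;
		} else if ((*p)[0] == '/' && (*p)[1] == '*') {
			const char *end = strstr(*p + 2, "*/");
			if (end == NULL)
				return;
			*p = end + 2;
		} else {
			return;
		}
	}
}

/* Type keyword "int", which must not run into a longer name ("integer"). */
static int typeKeyword(const char **p)
{
	if (strncmp(*p, "int", 3) != 0 ||
	    isalnum((unsigned char)(*p)[3]) || (*p)[3] == '_')
		return 0;
	*p += 3;
	return 1;
}

/* Variable: a letter followed by letters or digits. */
static int variable(const char **p)
{
	if (!isalpha((unsigned char)**p))
		return 0;
	while (isalnum((unsigned char)**p))
		(*p)++;
	return 1;
}

/* List separator: a comma optionally surrounded by blanks and comments. */
static int listSeparator(const char **p)
{
	const char *q = *p;
	skipBlanks(&q);
	if (*q != ',')
		return 0;
	q++;
	skipBlanks(&q);
	*p = q;
	return 1;
}

/* Variables list: the recursive definition written as a loop.
   Returns the number of variables, or 0 on failure. */
static int variablesList(const char **p)
{
	int count = 0;
	do {
		if (!variable(p))
			return 0;
		count++;
	} while (listSeparator(p));
	return count;
}

/* Declaration: type keyword, variables list, ';'.
   Returns the number of variables declared, or 0 if text does not match. */
int parseDeclaration(const char *text)
{
	const char *p = text;
	int count;
	skipBlanks(&p);
	if (!typeKeyword(&p))
		return 0;
	skipBlanks(&p);
	count = variablesList(&p);
	if (count == 0)
		return 0;
	skipBlanks(&p);
	return *p == ';' ? count : 0;
}
```

With these definitions, parseDeclaration("int a, b, c;") returns 3 and parseDeclaration("integer a;") returns 0, and no separate token stream is ever built.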

MTK358 · 03-20-2010, 07:07 PM

Show me how to parse this without using tokens:

Code:

while a != b
    if a > b
        a -= b
    else
        b -= a
    fi
loop

Sergei Steshenko · 03-20-2010, 07:20 PM

Quote:

Originally Posted by MTK358

Show me how to parse this without using tokens:

Code:

while a != b
    if a > b
        a -= b
    else
        b -= a
    fi
loop

First describe your language to me in plain English (i.e. using the notion of language constructs).