Parser Test 3 (A Return Stack & Compiler)
Tags computer mad science, debugger, mc2, parsing, simple c
Parser Test 3 (A Return Stack & Compiler)
A self extractor for the 'lib' files that we include for this (pretty cute) little calculator are here.
http://www.linuxquestions.org/questi...wsally-615861/
Re: the compiler part. It's not much of a compiler, but it serves as a proof of concept. (It's used internally, the program is not compiled.)
Before we dig in here, I should say...
Bug in Parser Test 2.
The function: 'store_result()' should be done after final unnesting due to the delayed execution of some of the tokens which get picked up at RPAREN.
That won't be fixed in test2 because it's just a toy. But test3 is a usable calculator though it needs a roll-back buffer and line editor (too big to post here).
Here's test3.
All we changed here is what the parser 'does'. It DOES compile calls into a dictionary area though how it chooses to execute those calls (as returns pushed on a stack) may not seem much like compiling. :-) But it is. It works.
Type 'help' when you get it running.
Turn 'status on' if you want to see the VM state after each computation. It shows the stacks which should always be empty when it shows, and it shows the function calls that were compiled.
file: src/test3.cpp
purpose: source file
This file provides the prototypes, jump vectors, priorities, and the names for all the function calls.
file: src/test3-tokens.dat
purpose: source file
This concludes the parser tests for now. The bottom line here is that each parser element, regardles of what it *does* must set the obj.state according to whether the pattern matched and if it's false, the input and output pointers must be restored to where they were before that element was entered.
Compile this thing and then...
Try this as well as some of the really narly stuff we did with test2.
I'll mark the lines I typed in.
Cute, no? :-)
-The Computer Mad Science Team
;-)
A self extractor for the 'lib' files that we include for this (pretty cute) little calculator are here.
http://www.linuxquestions.org/questi...wsally-615861/
Re: the compiler part. It's not much of a compiler, but it serves as a proof of concept. (It's used internally, the program is not compiled.)
Before we dig in here, I should say...
Bug in Parser Test 2.
The function: 'store_result()' should be done after final unnesting due to the delayed execution of some of the tokens which get picked up at RPAREN.
Code:
...
(destination_var() OPT) // optional VAR = ...
&& expr() // parse an expression
// && store_result() // store result in VAR *** moved down -rs
...
RPAREN();
store_result(); // *** moved from above -rs
return is_eoi(); // made it to the end
}
Here's test3.
All we changed here is what the parser 'does'. It DOES compile calls into a dictionary area though how it chooses to execute those calls (as returns pushed on a stack) may not seem much like compiling. :-) But it is. It works.
Type 'help' when you get it running.
Turn 'status on' if you want to see the VM state after each computation. It shows the stacks which should always be empty when it shows, and it shows the function calls that were compiled.
file: src/test3.cpp
purpose: source file
Code:
// main.cpp
/* The basic idea of a parser is to identify text and convert it to another form
* which may be text (a C program, for example) or byte codes or anything else.
*
* In this scheme the input is INVIOLATE! This is unlike flex which temporarily
* pokes holes in the input in order to terminate yytext. Instead we use the
* ability of the output buffer to 'undo' changes and push and pop temporary
* strings that automatically get recycled, which allows us great flexibility in
* manipulating the output text using C calls such as 'sprintf()'.
*
* This demo is best viewed in a debugger like kdbg or some other insight-like
* gdb front end. The 'watchpoint' you'll find most interesting is named 'obj'.
*
* In fond memory of the "readability of code" criterion,
*
* The Computer Mad Science Team
*
* :-)
*/
#include <stdio.h>
#include <malloc.h>
#include <string.h>
#include "inc/parser2.cpp"
#include "inc/parser2-numbers.cpp"
#include <math.h>
void dbg(){} // for a non-moving breakpoint
bool parse();
#define obj (*g_parser_obj)
// fwd refs
extern int dptr; // double stack ptr
double get_var(const char* name); // returns value
bool compiler_status;
const char* token_name(xt_type xt);
int main(int argc, char** argv)
{
printf(
"Input some math expressions and we'll try to evaluate them.\n"
"Type 'quit' to end, 'help' for a brief help listing\n"
);
dbg();
char buffer[256];
// We don't need the pointer here because we're only using one parser, but
// for form...
PARSER_OBJ* o = parser_new(buffer, 256, false);
o = o; // UNUSED(o);
while(1)
{
// we have set the buffer in the parser, but we need to
// load it again in order to set the end of input pointer
// correctly for the input string.
fgets((char*)buffer, 255, stdin);
parser_reload(buffer, strlen(buffer), false);
if(parse())
{
if(compiler_status)
{
printf("DStack: %d\t RStack: %d\t PStack: %d\n", dptr, obj.rp, obj.sp);
printf("Dictionary: %d\n", obj.dp);
printf("COMPILED: ");
for(int i = 0; i < obj.dp; i++)
{
printf("\t%s", token_name(obj.dict[i]));
}
printf("\n");
}
printf("ANS: %lg\n", get_var("ans"));
}
else
{
printf("Parse error:\n");
// assume short offset here
printf("%s", obj.inbuf);
for(int i = 1; i < ip2offset(); i++)
printf("-");
printf("^\n");
}
}
return 0;
}
/////////////////////////////////////////////////////////////////////
// CODE declarations for XT's (executable tokens)
// Priorities set by position in the table.
// declarations
#define TOKEN(a) void a();
#include "test3-tokens.dat"
#undef TOKEN
// This table contains tokens by priority.
xt_type tokens[] =
{
#define TOKEN(a) a,
#include "test3-tokens.dat"
#undef TOKEN
0
};
const char* token_names[] =
{
#define TOKEN(a) #a,
#include "test3-tokens.dat"
#undef TOKEN
0
};
const char* token_name(xt_type xt)
{
int index = -1;
for(int i = 0; tokens[i] != 0; i++)
{
if(tokens[i] == xt)
{
index = i;
break;
}
}
if(index < 0)
return "unknown";
// else
return token_names[index];
}
// -1 if not found
int tok_priority(xt_type tok)
{
for(int i = 0; tokens[i] != 0; i++)
{
if(tokens[i] == tok)
return i;
}
return -1; // not found
}
/////////////////////////////////////////////////////////////////////
// COMPILER stuff, stacks and CODE definitions.
typedef unsigned char byte;
#define DSTACK_SIZE 128
#define SSTACK_SIZE 128
// screwball C stacks grow up instead of down and have integer pointers
double dstack[DSTACK_SIZE]; // numeric values stack
int dptr; // stack pointer var
void dstack_init() { dptr = 0;}
void dpush(double d) { dstack[dptr++] = d;}
double dpop() { return dstack[--dptr];}
// pstack and rstack are internal
void pstack_init() { obj.sp = 0;}
void rstack_init() { obj.rp = 0; }
void rpush(xt_type xt) {obj.rstack[obj.rp++] = xt;}
xt_type rpop() { return obj.rstack[--obj.rp];}
/////////////////////////////////////////////////////////////////////
// COMPILER stuff, variable names and values
typedef struct
{
const char* name;
double value;
}var_type;
var_type vars[32];
int max_vars = 32;
#define STREQ(a, b) (strcasecmp(a,b) == 0)
var_type* var_new(const char* name)
{
// if name was found, return it, else first open cell.
for(int i = 0; i < max_vars; i++)
{
if(!vars[i].name)
{
var_type* p = vars + i;
p->name = strdup(name);
p->value = 0.0;
return p;
}
if(STREQ(name, vars[i].name))
return vars+i;
}
// error, no more vars
fprintf(stderr, "Out of variable memory!\n");
exit(1);
}
// returns null if not found
var_type* find_var(const char* name)
{
for(int i = 0; i < max_vars; i++)
{
if(!vars[i].name)
return 0; // assume packed consecutive
if(STREQ(name, vars[i].name))
return vars + i;
}
return 0;
}
var_type* create_var(const char* name, double value)
{
var_type* p = var_new(name);
p->value = value;
return p;
}
void set_var(const char* name, double value)
{
var_type *p = find_var(name);
if(p == 0)
create_var(name, value);
else
p->value = value;
}
double get_var(const char* name)
{
var_type *p = find_var(name);
if(!p)
return 0;
else
return p->value;
}
/////////////////////////////////////////////////////////////////////
// COMPILER stuff, executable tokens (jump vectors)
#define TOKEN void
TOKEN NOOP() {}
TOKEN RPAREN()
{
xt_type tok;
while(1)
{
tok = rpop();
if(!tok)
break;
tok(); // execute
}
}
TOKEN PLUS()
{
double b = dpop();
double a = dpop();
dpush(a+b);
}
TOKEN MINUS()
{
double b = dpop();
double a = dpop();
dpush(a-b);
}
TOKEN TIMES()
{
double b = dpop();
double a = dpop();
dpush(a*b);
}
TOKEN DIVIDE()
{
double b = dpop();
double a = dpop();
dpush(a/b);
}
TOKEN POWER()
{
double b = dpop();
double a = dpop();
dpush(pow(a, b));
}
TOKEN TIMESVAR() // See times_var() below
{
double a = dpop();
const char* name = pop_str();
var_type * p = find_var(name);
if(p == 0)
p = create_var(name, 0);
double b = p->value;
dpush(a*b);
}
TOKEN LPAREN()
{
rpush(0); // terminator
rpush(NOOP); // dummy to catch first call back
}
#undef TOKEN
/////////////////////////////////////////////////////////////////////
// protos for parser elements used in evaluating expressions
bool expr();
bool power();
bool times();
bool divide();
bool plus();
bool minus();
bool builtin();
bool lparen();
bool rparen();
////////////////////////////////////////////////////////////
// protos of custom mid-level parsers for this application:
// this is a custom function which skips whitespaces in this
// case, but in a more full-featured app should also skip
// comments and newlines.
bool skip_ignored();
// what to do for...
// an integer part of a number
bool integer();
// a fraction part of a number
bool frac();
// a decimal number consisting of integer and/or frac parts
bool number();
//the e or E if a number has an exponent
bool exp_prefix();
// the sign, possibly multiply occurring
bool sign();
// the whole exponent part of a number
bool exponent();
// a floating point number
bool float_num();
// a named value
bool var();
// an operator
bool oper();
// a legal variable name
bool varname();
// an assignment prefix
bool destination_var();
// store result
bool store_result();
static int expr_nestlvl;
//////////////////////////////////////////////////////////////
// Here's the main loop for the parser(s).
static char* op_watch; // needed because SAVE_PTRS() now uses relative addr
bool parse()
{
obj.dp = 0; // restart compiler
LPAREN();
expr_nestlvl = 0; // init expression parser recursion counter
while(1)
{
op_watch = obj.op; // for debugger
// skip to the next text or the end of the input buffer
skip_ignored();
if(is_eoi())
break; // true
// parse and/or evaluate the expression
obj.state =
(
// check reserved keywords first so they aren't seen as a var names
builtin()
||
(
(destination_var() OPT) // optional VAR = ...
&& expr() // parse an expression
// && store_result() // store result in VAR moved down -rs
)
);
if(!obj.state)
break; // failed
}
RPAREN();
store_result(); // moved from above -rs
return is_eoi(); // made it to the end
}
bool integer() { return copy_integer(); }
bool frac() { return copy_frac(); }
bool number()
{
// INTEGER [FRAC] | FRAC
obj.state =
(
integer() && (frac() OPT)
)
|| frac()
;
return obj.state;
}
bool exp_prefix()
{
// e|E
return obj.state = copy("e") || copy("E");
}
bool exponent()
{
// E [SIGN] NUMBER
obj.state =
exp_prefix() && (sign() OPT) && number();
return obj.state;
}
bool parse_float_num()
{
return obj.state =
( // [SIGN]
sign() OPT
)
&&
(
( // INTEGER [FRAC] [EXP]
copy_integer() // req'd
&&
(
copy_frac() OPT
)
&&
(
exponent() OPT
)
) // OR FRAC [EXP]
||
(
copy_frac()
&& (exponent() OPT)
)
// OR EXP
|| exponent()
);
}
bool float_num()
{
SAVE_PTRS();
skip_ignored();
int mark = op2offset();
parse_float_num();
if(!obj.state)
RESTORE_PTRS();
else
{
push_str(mark);
double res = 0;
sscanf(pop_str(), "%lg", &res);
dpush(res);
}
return obj.state;
}
#define MIN(a, b) (a < b ? a : b)
#define current_xt (obj.dict[obj.dp-1])
void do_current()
{
int prioa = tok_priority(obj.rstack[obj.rp-1]);
int priob = tok_priority(current_xt);
if(prioa >= priob)
{
// rpush rswap rpop execute
xt_type exec = rpop();
rpush(current_xt);
exec(); // execute
}
else
// delay execution
rpush(current_xt);
}
bool oper()
{
skip_ignored();
SAVE_PTRS();
obj.state =
(
(skip("+") && compile(PLUS)) ||
(skip("-") && compile(MINUS)) ||
(skip("*") && compile(TIMES)) ||
(skip("/") && compile(DIVIDE)) ||
(skip("^") && compile(POWER))
)
;
// AND do_current xt, either push and wait for results or
// push current and execute previous
if(obj.state)
do_current();
return obj.state;
}
bool lparen()
{
skip_ignored();
skip("(");
if(obj.state)
{
compile(LPAREN);
do_current();
}
return obj.state;
}
bool rparen()
{
skip_ignored();
skip(")");
if(obj.state)
RPAREN();
return obj.state;
}
// the var name always follows so this could really execute
// TIMESVAR functions right here.
bool times_var()
{
SAVE_PTRS();
int mark = op2offset();
copy_cname();
if(obj.state)
{
push_str(mark);
compile(TIMESVAR);
do_current();
}
else
RESTORE_PTRS();
return obj.state;
}
bool expr()
{
// this is just for watching recursion in a debugger
expr_nestlvl++;
// ( (FLOAT_NUM [*VAR] | VAR) [OPERATOR EXPR] ) <end>
skip_ignored(); // skip to meaningful text
SAVE_PTRS(); // so we can undo "*" insertion, etc.
obj.state =
( // VAR | (FLOAT_NUM [*VAR])
(
var()
|| ( float_num() && (times_var() OPT))
||
( // | {EXPR} [OPERATOR EXPR]
lparen()
&& expr()
&& rparen()
)
)
&&
(
( oper() && expr() ) OPT
)
);
// The expression must be complete one or it's an error.
// This prevents things like two var names in a row
// without an intermediate operator.
expr_nestlvl--;
if(expr_nestlvl == 0)
obj.state = skip_ignored() && is_eoi();
return obj.state;
}
bool parse_word(char* buf256)
{
bool ret;
skip_ignored();
int mark = op2offset();
ret = copy_cname();
push_str(mark);
strcpy(buf256, pop_str());
return ret;
}
bool builtin() // needs to push a value for 'store_result'
{
SAVE_PTRS();
char name[256];
char param[256];
int mark = op2offset();
obj.state = copy_cname();
if(!obj.state)
{
RESTORE_PTRS(); // redundant but harmless
return obj.state = false; // ''
}
get_opstr(name, 245, mark);
// find built-in and execute here...
if((obj.state = STREQ(name, "quit")))
{
printf("Later gator! :-)\n");
exit(0);
}
else if ((obj.state = STREQ(name, "status")))
{
// ON | OFF
parse_word(param);
if(STREQ(param, "on"))
compiler_status = true;
else if(STREQ(param, "off"))
compiler_status = false;
else
printf("STATUS IS [%s]\n", compiler_status ? "ON" : "OFF");
obj.state = true; // opt
}
else if((obj.state = STREQ(name, "help")))
{
printf("HELP | STATUS [ON|OFF] | QUIT\n");
}
if(obj.state)
dpush(0);
else
RESTORE_PTRS();
return obj.state;
}
bool sign() { return copy_sign(); }
// must start with alpha, no underscore
bool copy_varname()
{
if(!isalpha(*obj.ip))
return obj.state = false;
return copy_cname();
}
bool varname()
{
skip_ignored();
return copy_varname();
}
// for stuff like 'AVar = ...'
char varname_str[256];
char destvar_str[256];
bool destvar_loaded;
bool destination_var()
{
// VARNAME =
skip_ignored();
destvar_loaded = false; // init destvar stack as needed
int mark = op2offset(); // save start of name.
SAVE_PTRS();
// destination var can't use multiplier
if(!copy_varname())
{
RESTORE_PTRS(); // don't keep the name in the queue
return obj.state = true;
}
get_opstr(varname_str, sizeof(varname_str), mark);
// todo: find varname and push it's address onto the destinaton stack
skip_ignored();
if(!skip("="))
{
RESTORE_PTRS();
return obj.state = false;
}
// else OK!
destvar_loaded = true;
strcpy(destvar_str, varname_str);
return obj.state = true;
}
bool store_result()
{
set_var("ans", dpop());
if(!destvar_loaded)
return obj.state = false;
var_type* p = find_var(destvar_str);
if(!p)
p = create_var(destvar_str, 0);
p->value = get_var("ans");
destvar_loaded = false;
return obj.state = true;
}
bool skip_ignored()
{
while(skip_ws() || skip_newline()) // || skip_comments()
; //
return obj.state = true;
}
bool var()
{
int signval = 1;
skip_ignored();
SAVE_PTRS(); // may have a sign.
int mark = op2offset();
sign();
push_str(mark);
if(STREQ(pop_str(), "-"))
signval = -1;
// should be the start of the name
if(!varname())
{
RESTORE_PTRS();
return obj.state = false;
}
push_str(mark);
const char* tmp = pop_str();
var_type *p = find_var(tmp);
if(p == 0)
p = var_new(tmp);
dpush(p->value * signval);
return obj.state = true;
}
file: src/test3-tokens.dat
purpose: source file
Code:
// xt-tokens.dat // Reusable loader TOKEN(NOOP) TOKEN(PLUS) TOKEN(MINUS) TOKEN(TIMES) TOKEN(DIVIDE) TOKEN(POWER) TOKEN(TIMESVAR) TOKEN(LPAREN) TOKEN(RPAREN)
Compile this thing and then...
Try this as well as some of the really narly stuff we did with test2.
I'll mark the lines I typed in.
Code:
Type 'quit' to end, 'help' for a brief help listing > 123 ANS: 123 > A=ANS ANS: 123 > B=2A ANS: 246 > B ANS: 246
Code:
> Cute, no? Parse error: Cute, no? ---^
;-)
Total Comments 0




