LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Go Back   LinuxQuestions.org > Blogs > rainbowsally
User Name
Password

Notices

Rate this Entry

Parser Test 3 (A Return Stack & Compiler)

Posted 01-03-2013 at 09:53 AM by rainbowsally
Updated 01-10-2013 at 06:05 PM by rainbowsally

Parser Test 3 (A Return Stack & Compiler)

A self extractor for the 'lib' files that we include for this (pretty cute) little calculator are here.
http://www.linuxquestions.org/questi...wsally-615861/

Re: the compiler part. It's not much of a compiler, but it serves as a proof of concept. (It's used internally, the program is not compiled.)

Before we dig in here, I should say...

Bug in Parser Test 2.

The function: 'store_result()' should be done after final unnesting due to the delayed execution of some of the tokens which get picked up at RPAREN.
Code:
  ...
        (destination_var() OPT)   // optional VAR = ...
        && expr()                 // parse an expression
        // && store_result()      // store result in VAR *** moved down -rs
  ...
  
  RPAREN();
  store_result();   // *** moved from above -rs
  return is_eoi();  // made it to the end
}
That won't be fixed in test2 because it's just a toy. But test3 is a usable calculator though it needs a roll-back buffer and line editor (too big to post here).

Here's test3.

All we changed here is what the parser 'does'. It DOES compile calls into a dictionary area though how it chooses to execute those calls (as returns pushed on a stack) may not seem much like compiling. :-) But it is. It works.

Type 'help' when you get it running.

Turn 'status on' if you want to see the VM state after each computation. It shows the stacks which should always be empty when it shows, and it shows the function calls that were compiled.

file: src/test3.cpp
purpose: source file
Code:
// main.cpp 

/* The basic idea of a parser is to identify text and convert it to another form
 * which may be text (a C program, for example) or byte codes or anything else.
 * 
 * In this scheme the input is INVIOLATE! This is unlike flex which temporarily 
 * pokes holes in the input in order to terminate yytext.  Instead we use the 
 * ability of the output buffer to 'undo' changes and push and pop temporary 
 * strings that automatically get recycled, which allows us great flexibility in 
 * manipulating the output text using C calls such as 'sprintf()'.
 * 
 * This demo is best viewed in a debugger like kdbg or some other insight-like 
 * gdb front end.  The 'watchpoint' you'll find most interesting is named 'obj'.
 * 
 * In fond memory of the "readability of code" criterion,
 * 
 * The Computer Mad Science Team
 * 
 * :-)
 */

#include <stdio.h>
#include <malloc.h>
#include <string.h>
#include "inc/parser2.cpp"
#include "inc/parser2-numbers.cpp"
#include <math.h>

void dbg(){} // for a non-moving breakpoint

bool parse();

#define obj (*g_parser_obj)

// fwd refs
extern int dptr;                  // double stack ptr
double get_var(const char* name); // returns value
bool compiler_status;
const char* token_name(xt_type xt);

int main(int argc, char** argv)
{
  
  printf(
         "Input some math expressions and we'll try to evaluate them.\n"
         "Type 'quit' to end, 'help' for a brief help listing\n"
        );
  
  dbg();
  
  char buffer[256];
  
  // We don't need the pointer here because we're only using one parser, but 
  // for form...
  PARSER_OBJ* o = parser_new(buffer, 256, false);
  o = o; // UNUSED(o);
    
  while(1)
  {
    // we have set the buffer in the parser, but we need to 
    // load it again in order to set the end of input pointer
    // correctly for the input string.
    
    fgets((char*)buffer, 255, stdin);
    parser_reload(buffer, strlen(buffer), false);
    if(parse())
    {
      if(compiler_status)
      {
        printf("DStack: %d\t RStack: %d\t PStack: %d\n", dptr, obj.rp, obj.sp);
        printf("Dictionary: %d\n", obj.dp);
        printf("COMPILED: ");
        for(int i = 0; i < obj.dp; i++)
        {
          printf("\t%s", token_name(obj.dict[i]));
        }
        printf("\n");
      }
      printf("ANS: %lg\n", get_var("ans"));
    }
    else
    {
      printf("Parse error:\n");
      // assume short offset here
      printf("%s", obj.inbuf);
      for(int i = 1; i < ip2offset(); i++)
        printf("-");
      printf("^\n");
    }
  }
  return 0;
}


/////////////////////////////////////////////////////////////////////
// CODE declarations for XT's (executable tokens)
 
// Priorities set by position in the table.
// declarations

#define TOKEN(a) void a();
#include "test3-tokens.dat"
#undef TOKEN

// This table contains tokens by priority.
xt_type tokens[] =
{
#define TOKEN(a) a,
#include "test3-tokens.dat"
#undef TOKEN
0
};

const char* token_names[] =
{
#define TOKEN(a) #a,
#include "test3-tokens.dat"
#undef TOKEN
0  
};

const char* token_name(xt_type xt)
{
  int index = -1;
  for(int i = 0; tokens[i] != 0; i++)
  {
    if(tokens[i] == xt)
    {
      index = i;
      break;
    }
  }
  if(index < 0)
    return "unknown";
  // else
  return token_names[index];
}

// -1 if not found
int tok_priority(xt_type tok)
{
  for(int i = 0; tokens[i] != 0; i++)
  {
    if(tokens[i] == tok)
      return i;
  }
  return -1; // not found
}

/////////////////////////////////////////////////////////////////////
// COMPILER stuff, stacks and CODE definitions.

typedef unsigned char byte;

#define DSTACK_SIZE 128
#define SSTACK_SIZE 128

// screwball C stacks grow up instead of down and have integer pointers

double dstack[DSTACK_SIZE];   // numeric values stack
int dptr;           // stack pointer var

void dstack_init()    { dptr = 0;}
void dpush(double d)  { dstack[dptr++] = d;}
double dpop()         { return dstack[--dptr];}

// pstack and rstack are internal
void pstack_init()    { obj.sp = 0;}
void rstack_init()    { obj.rp = 0; }
void rpush(xt_type xt)  {obj.rstack[obj.rp++] = xt;}
xt_type rpop()  { return obj.rstack[--obj.rp];}

/////////////////////////////////////////////////////////////////////
// COMPILER stuff, variable names and values

typedef struct
{
  const char* name;
  double value;
}var_type;

var_type vars[32];
int max_vars = 32;

#define STREQ(a, b) (strcasecmp(a,b) == 0)

var_type* var_new(const char* name)
{
  // if name was found, return it, else first open cell.
  for(int i = 0; i < max_vars; i++)
  {
    if(!vars[i].name)
    {
      var_type* p = vars + i;
      p->name = strdup(name);
      p->value = 0.0;
      return p;
    }
    if(STREQ(name, vars[i].name))
       return vars+i;
  }
  // error, no more vars
  fprintf(stderr, "Out of variable memory!\n");
  exit(1);
}

// returns null if not found
var_type* find_var(const char* name)
{
  for(int i = 0; i < max_vars; i++)
  {
    if(!vars[i].name)
      return 0; // assume packed consecutive
    if(STREQ(name, vars[i].name))
       return vars + i;
  }
  return 0;
}

var_type* create_var(const char* name, double value)
{
  var_type* p = var_new(name);
  p->value = value;
  return p;
}

void set_var(const char* name, double value)
{
  var_type *p = find_var(name);
  if(p == 0)
    create_var(name, value);
  else
    p->value = value;
}

double get_var(const char* name)
{
  var_type *p = find_var(name);
  if(!p)
    return 0;
  else
    return p->value;
}

/////////////////////////////////////////////////////////////////////
// COMPILER stuff, executable tokens (jump vectors)

#define TOKEN void

TOKEN NOOP() {}

TOKEN RPAREN()
{
  xt_type tok;
  while(1)
  {
    tok = rpop();
    if(!tok)
      break;
    tok(); // execute
  }
}

TOKEN PLUS()
{
  double b = dpop();
  double a = dpop();
  dpush(a+b);
}
TOKEN MINUS()
{
  double b = dpop();
  double a = dpop();
  dpush(a-b);
}
TOKEN TIMES()
{
  double b = dpop();
  double a = dpop();
  dpush(a*b);
}
TOKEN DIVIDE()
{
  double b = dpop();
  double a = dpop();
  dpush(a/b);
}
TOKEN POWER()
{
  double b = dpop();
  double a = dpop();
  dpush(pow(a, b));
}
TOKEN TIMESVAR() // See times_var() below
{
  double a = dpop();
  const char* name = pop_str();
  var_type * p = find_var(name);
  if(p == 0)
    p = create_var(name, 0);  
  double b = p->value;
  dpush(a*b);
}
TOKEN LPAREN()
{
  rpush(0);     // terminator
  rpush(NOOP);  // dummy to catch first call back
}

#undef TOKEN

/////////////////////////////////////////////////////////////////////
// protos for parser elements used in evaluating expressions

bool expr();
bool power();
bool times();
bool divide();
bool plus();
bool minus();
bool builtin();
bool lparen();
bool rparen();

////////////////////////////////////////////////////////////
// protos of custom mid-level parsers for this application:

// this is a custom function which skips whitespaces in this 
// case, but in a more full-featured app should also skip 
// comments and newlines.
bool skip_ignored();

// what to do for...
// an integer part of a number
bool integer();
// a fraction part of a number
bool frac();
// a decimal number consisting of integer and/or frac parts
bool number();
//the e or E if a number has an exponent
bool exp_prefix();
// the sign, possibly multiply occurring
bool sign();
// the whole exponent part of a number
bool exponent();
// a floating point number
bool float_num();
// a named value
bool var();
// an operator
bool oper();
// a legal variable name
bool varname();
// an assignment prefix
bool destination_var();
// store result
bool store_result();

static int expr_nestlvl;

//////////////////////////////////////////////////////////////
// Here's the main loop for the parser(s).

static char* op_watch; // needed because SAVE_PTRS() now uses relative addr

bool parse()
{
  obj.dp = 0; // restart compiler
  LPAREN();
  
  expr_nestlvl = 0; // init expression parser recursion counter
  
  while(1)
  {
    op_watch = obj.op; // for debugger
    // skip to the next text or the end of the input buffer
    skip_ignored();
    
    if(is_eoi())
      break; // true
    
    // parse and/or evaluate the expression
    obj.state =
    (
      // check reserved keywords first so they aren't seen as a var names
      builtin()
      || 
      (
        (destination_var() OPT)   // optional VAR = ...
        && expr()                 // parse an expression
//        && store_result()       // store result in VAR moved down -rs
      )
    );
    
    if(!obj.state)
      break; // failed
  }
  RPAREN();
  store_result();   // moved from above -rs
  return is_eoi();  // made it to the end
}

bool integer() { return copy_integer(); }
bool frac() { return copy_frac(); }

bool number()
{
  // INTEGER [FRAC] | FRAC
  obj.state =
  (
    integer() && (frac() OPT)
  )
  || frac()
  ;
  return obj.state;
}
    
bool exp_prefix()
{
  // e|E
  return obj.state = copy("e") || copy("E");
}
    
bool exponent()
{
  // E [SIGN] NUMBER
  obj.state =
    exp_prefix() && (sign() OPT) && number();
  
  return obj.state;
}

bool parse_float_num()
{
  return obj.state =
  ( // [SIGN]
    sign() OPT 
  ) 
  &&
  (
    ( // INTEGER [FRAC] [EXP]
      copy_integer() // req'd
      && 
      (
        copy_frac() OPT
      )
      &&
      (
        exponent() OPT
      )
    ) // OR  FRAC [EXP]
    || 
    (
      copy_frac()
      && (exponent() OPT)
    )
    // OR EXP
    || exponent()
  );
}

bool float_num()
{
  SAVE_PTRS();
  skip_ignored();  
  int mark = op2offset();
  
  parse_float_num();
  
  if(!obj.state)
    RESTORE_PTRS();
  else
  {
    push_str(mark);
    double res = 0;
    sscanf(pop_str(), "%lg", &res);
    dpush(res);
  }
  return obj.state;
}

#define MIN(a, b) (a < b ? a : b)

#define current_xt (obj.dict[obj.dp-1])

void do_current()
{
  int prioa = tok_priority(obj.rstack[obj.rp-1]);
  int priob = tok_priority(current_xt);
  if(prioa >= priob)
  {
    // rpush rswap rpop execute
    xt_type exec = rpop();
    rpush(current_xt);
    exec(); // execute
  }
  else
    // delay execution
    rpush(current_xt);
}

bool oper()
{
  skip_ignored();
  SAVE_PTRS();
  obj.state =
  (
    (skip("+") && compile(PLUS)) ||
    (skip("-") && compile(MINUS)) ||
    (skip("*") && compile(TIMES)) ||
    (skip("/") && compile(DIVIDE)) ||
    (skip("^") && compile(POWER))    
  )
  ;

  // AND do_current xt, either push and wait for results or 
  // push current and execute previous
  if(obj.state)
    do_current();
  return obj.state;
}

bool lparen()
{
  skip_ignored();
  skip("(");
  if(obj.state)
  {
    compile(LPAREN);
    do_current();
  }
  return obj.state;
}

bool rparen()
{
  skip_ignored();
  
  skip(")");
  if(obj.state)
    RPAREN();
  return obj.state;
}

// the var name always follows so this could really execute 
// TIMESVAR functions right here.
bool times_var()
{
  SAVE_PTRS();
  int mark = op2offset();
  copy_cname();
  if(obj.state)
  {
    push_str(mark);
    compile(TIMESVAR);
    do_current();
  }
  else
    RESTORE_PTRS();
  return obj.state;
}

bool expr()
{
  // this is just for watching recursion in a debugger
  expr_nestlvl++;
    
  // ( (FLOAT_NUM [*VAR] | VAR) [OPERATOR EXPR] ) <end>
  skip_ignored(); // skip to meaningful text
  SAVE_PTRS(); // so we can undo "*" insertion, etc.
  obj.state =
  ( // VAR | (FLOAT_NUM [*VAR]) 
    (
      var() 
      || ( float_num() && (times_var() OPT))
      ||
      ( // | {EXPR} [OPERATOR EXPR]
        lparen() 
        && expr() 
        && rparen()
      )
    )
    && 
    ( 
      ( oper() && expr() ) OPT
    )
  );
  
  // The expression must be complete one or it's an error.
  // This prevents things like two var names in a row
  // without an intermediate operator.
  
  expr_nestlvl--;
  
  if(expr_nestlvl == 0)
    obj.state = skip_ignored() && is_eoi();

  return obj.state;
}

bool parse_word(char* buf256)
{
  bool ret;
  skip_ignored();
  int mark = op2offset();
  ret = copy_cname();
  push_str(mark);
  strcpy(buf256, pop_str());
  return ret;
}

bool builtin() // needs to push a value for 'store_result'
{ 
  SAVE_PTRS();
  char name[256];
  char param[256];
  
  int mark = op2offset();
  obj.state = copy_cname();
  
  if(!obj.state)
  {
    RESTORE_PTRS();             // redundant but harmless
    return obj.state = false;   // ''
  }
  
  get_opstr(name, 245, mark);  
  
  // find built-in and execute here...
  if((obj.state = STREQ(name, "quit")))
  {
    printf("Later gator! :-)\n");
      exit(0);
  }
  else if ((obj.state = STREQ(name, "status")))
  {
    // ON | OFF
    parse_word(param);
    if(STREQ(param, "on"))
       compiler_status = true;
    else if(STREQ(param, "off"))
      compiler_status = false;
    else
      printf("STATUS IS [%s]\n", compiler_status ? "ON" : "OFF");
    obj.state = true; // opt    
  }
  else if((obj.state = STREQ(name, "help")))
  {
    printf("HELP | STATUS [ON|OFF] | QUIT\n");
  }
  if(obj.state)
    dpush(0);
  else
    RESTORE_PTRS();
  return obj.state;
}

bool sign() { return copy_sign(); }

// must start with alpha, no underscore
bool copy_varname()
{
  if(!isalpha(*obj.ip))
    return obj.state = false;
  return copy_cname();
}

bool varname()
{
  skip_ignored();
  return copy_varname();  
}

// for stuff like 'AVar = ...'
char varname_str[256];
char destvar_str[256];
bool destvar_loaded;

bool destination_var()
{
  // VARNAME =
  skip_ignored();
  destvar_loaded = false; // init destvar stack as needed
  int mark = op2offset(); // save start of name.
  
  SAVE_PTRS();  

  // destination var can't use multiplier
  if(!copy_varname())
  {
    RESTORE_PTRS(); // don't keep the name in the queue
    return obj.state = true;
  }
  
  get_opstr(varname_str, sizeof(varname_str), mark);
  
  // todo: find varname and push it's address onto the destinaton stack
  skip_ignored();
    
  if(!skip("="))
  {
    RESTORE_PTRS();
    return obj.state = false;
  }
  // else OK!
  
  destvar_loaded = true; 
  strcpy(destvar_str, varname_str);
  return obj.state = true;
}

bool store_result()
{
  set_var("ans", dpop());
  
  if(!destvar_loaded)
    return obj.state = false;
  
  var_type* p = find_var(destvar_str);
  if(!p)
    p = create_var(destvar_str, 0);
  p->value = get_var("ans");
  destvar_loaded = false; 
  return obj.state = true;
}

bool skip_ignored()
{
  while(skip_ws() || skip_newline()) // || skip_comments()
    ; //
  return obj.state = true;
}

bool var()
{
  int signval = 1;
  
  skip_ignored();
  SAVE_PTRS(); // may have a sign.
  int mark = op2offset();
  sign();
  push_str(mark);
  if(STREQ(pop_str(), "-"))
    signval = -1;
  
  // should be the start of the name  
  if(!varname())
  {
    RESTORE_PTRS();
    return obj.state = false;
  }
  
  push_str(mark);
  const char* tmp = pop_str();
  var_type *p = find_var(tmp);
  if(p == 0)
    p = var_new(tmp);
  dpush(p->value * signval);
  return obj.state = true;
}
This file provides the prototypes, jump vectors, priorities, and the names for all the function calls.

file: src/test3-tokens.dat
purpose: source file
Code:
// xt-tokens.dat
// Reusable loader

TOKEN(NOOP)
TOKEN(PLUS)
TOKEN(MINUS)
TOKEN(TIMES)
TOKEN(DIVIDE)
TOKEN(POWER)
TOKEN(TIMESVAR)
TOKEN(LPAREN)
TOKEN(RPAREN)
This concludes the parser tests for now. The bottom line here is that each parser element, regardles of what it *does* must set the obj.state according to whether the pattern matched and if it's false, the input and output pointers must be restored to where they were before that element was entered.

Compile this thing and then...

Try this as well as some of the really narly stuff we did with test2.

I'll mark the lines I typed in.

Code:
Type 'quit' to end, 'help' for a brief help listing
> 123
  ANS: 123
> A=ANS
  ANS: 123
> B=2A
  ANS: 246
> B
  ANS: 246
Cute, no? :-)

Code:
> Cute, no?
  Parse error:
  Cute, no?
  ---^
-The Computer Mad Science Team

;-)
Posted in Uncategorized
Views 531 Comments 0
« Prev     Main     Next »
Total Comments 0

Comments

 

  



All times are GMT -5. The time now is 05:08 PM.

Main Menu
Advertisement

My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration