LinuxQuestions.org
View the Most Wanted LQ Wiki articles.
Go Back   LinuxQuestions.org > Blogs > rainbowsally
User Name
Password

Notices

Rate this Entry

Parser Test 2 (Something a bit more ambitious, maybe?)

Posted 01-03-2013 at 02:36 AM by rainbowsally

Parser Test 2 (Something a bit more ambitious, maybe?)

Here's test2.

Features:
  • Lots of custom parser funcs in order to create a parser for algebraic expressions. See 'expr' 'times', and all that good stuff in the source file 'src/test2.cpp' below.

This is the output from a session.
Code:
Input some math expressions and maybe we can identify them.
Type 'quit' to end
3b + 5 * 4e-5 / (2 * 5) + 4
<--------------------------->
STRT: (PreLoad Stack)
FUNC: [0] PushRP
FUNC: [NOOP] PushRP
DATA: [3]
FUNC: (priority 6)[*] PushRP
VAR : [b]
FUNC: (priority 1) [+] PushRP SwapRP PopRPExec
DATA: [5]
FUNC: (priority 2)[*] PushRP
DATA: [4e-5]
FUNC: (priority 3) [/] PushRP
(...
STRT: (PreLoad Stack)
FUNC: [0] PushRP
FUNC: [NOOP] PushRP
DATA: [2]
FUNC: (priority 2)[*] PushRP SwapRP PopRPExec
DATA: [5]
FINI: PopRPExec
END : (Check Stack)...)
FUNC: (priority 1) [+] PushRP SwapRP PopRPExec
DATA: [4]
FINI: PopRPExec
FINI: PopRPExec
FINI: PopRPExec
END : (Check Stack)
As you can see there are priorities associated with every function. This determines whether an xt (executable token) will be pushed to the top of RETURN STACK or it will be tucked under the current top of the stack after which the top of the stack will be popped like a return address (which it in fact is) and executed.

At the end of a string of semi-compiled code such as this, due to the prioritizing of the parameters which could very easily procede from beginning to end without ever executing the previous xt so at the end of the 'statement' (we'll call it), the final operation is to pop return addresses and execute them until it hits a null pointer.

Notice the first code pushed onto the return stack (RP) is a noop. That's because it's quite possible that the first opcode will execute the previous one, and we don't want anything to happen there.

This is narly stuff. And I don't yet know if the code version will behave but it looks like it should from the parser output.

I also think you'll enjoy the fact that we can use exponential notation and variable names with multiplier prefixes, such as "3b" in the expression input at the top there. And this, of course has the highest priority of all.

Here's the operative snippet.

Code:
DATA: [3]
FUNC: (priority 6)[*] PushRP
VAR : [b]
FUNC: (priority 1) [+] PushRP SwapRP PopRPExec
Data 3 (an integer: '3') is placed on the parameter stack.

The function '*' with a priority of 6 (higher than normal multiplication) is pushed onto the return stack. Then ANYTHING that comes along with a lower priority than this will SWAP and EXECUTE the multiplier.

The reason we call this a 'return stack' even though there are no calls to it is because of how this operates in a compiled program where, in fact, the RP is truly a return stack. This is what I did in Forth. It may not work in C due to the frame pointers getting in the way of the real call stack (and this will not be portable between even 32 and 64 bit ix86) but it still might work. Needs some more study.

The option to this complicated call-return arrangement is to divide every function into two parts. One for first pass and one for second pass. And who knows, that too might actually work somehow.

In the meantime, I think you'll enjoy the magic of parsing in this application. If you love puzzles, that is!

2B * -2B

That is the question.

And here's the answer:

Code:
2B*-2B
STRT: (PreLoad Stack)
FUNC: [0] PushRP
FUNC: [NOOP] PushRP
DATA: [2]
FUNC: (priority 5)[*] PushRP
VAR : [B]
FUNC: (priority 2)[*] PushRP SwapRP PopRPExec
DATA: [-2]
FUNC: (priority 5)[*] PushRP
VAR : [B]
FINI: PopRPExec
FINI: PopRPExec
END : (Check Stack)
Yeah... and you were expecting exactly what. A number? The end of a philosophical quandry? :-)

Well, here's the code.

file: src/test2.cpp
purpose: source file
Code:
// main.cpp 

/* The basic idea of a parser is to identify text and convert it to another form
 * which may be text (a C program, for example) or byte codes or anything else.
 * 
 * In this scheme the input is INVIOLATE! This is unlike flex which temporarily 
 * pokes holes in the input in order to terminate yytext.  Instead we use the 
 * ability of the output buffer to 'undo' changes and push and pop temporary 
 * strings that automatically get recycled, which allows us great flexibility in 
 * manipulating the output text using C calls such as 'sprintf()'.
 * 
 * This demo is best viewed in a debugger like kdbg or some other insight-like 
 * gdb front end.  The 'watchpoint' you'll find most interesting is named 'obj'.
 * 
 * In fond memory of the "readability of code" criterion,
 * 
 * The Computer Mad Science Team
 * 
 * :-)
 */

#include <stdio.h>
#include <malloc.h>
#include <string.h>
#include "inc/parser2.cpp"
#include "inc/parser2-numbers.cpp"

void dbg(){} // for a non-moving breakpoint

bool parse();

#define obj (*g_parser_obj)

int main(int argc, char** argv)
{
  
  printf(
         "Input some math expressions and maybe we can identify them.\n"
         "Type 'quit' to end\n"
        );
  
  dbg();
  
  char buffer[256];
  
  // We don't need the pointer here because we're only using one parser, but 
  // for form...
  PARSER_OBJ* o = parser_new(buffer, 256, false);
  o = o; // UNUSED(o);
    
  while(1)
  {
    // we have set the buffer in the parser, but we need to 
    // load it again in order to set the end of input pointer
    // correctly for the input string.
    
    fgets((char*)buffer, 255, stdin);
    parser_reload(buffer, strlen(buffer), false);
    if(parse())
      printf("%s\n", get_result());
    else
    {
      printf("Parse error:\n");
      // assume short offset here
      printf("%s", obj.inbuf);
      for(int i = 1; i < ip2offset(); i++)
        printf("-");
      printf("^\n");
    }
  }
  return 0;
}

/////////////////////////////////////////////////////////////////////
// protos for parser elements used in evaluating expressions

bool expr();
bool power();
bool times();
bool divide();
bool plus();
bool minus();
bool builtin();
bool lparen();
bool rparen();

////////////////////////////////////////////////////////////
// protos of custom mid-level parsers for this application:

// this is a custom function which skips whitespaces in this 
// case, but in a more full-featured app should also skip 
// comments and newlines.
bool skip_ignored();

// what to do for...
// an integer part of a number
bool integer();
// a fraction part of a number
bool frac();
// a decimal number consisting of integer and/or frac parts
bool number();
//the e or E if a number has an exponent
bool exp_prefix();
// the sign, possibly multiply occurring
bool sign();
// the whole exponent part of a number
bool exponent();
// a floating point number
bool float_num();
// a named value
bool var();
// an operator
bool oper();
// a legal variable name
bool varname();
// an assignment prefix
bool destination_var();
// store result
bool store_result();

static int expr_nestlvl;

// for priorities
int save_prio_sp();
void restore_prio_sp(int sp);

static int prio_stack[256];
static int prio_sp;
void push_prio(int n)
{
  prio_stack[prio_sp++] = n;
}
int pop_prio()
{
  return prio_stack[--prio_sp];
}

int prev_prio() {return prio_stack[prio_sp - 2];}
int cur_prio()  {return prio_stack[prio_sp - 1];}

enum
{
  PLUS_PRIO = 1,
  MINUS_PRIO = 1,
  TIMES_PRIO = 2,
  DIVIDE_PRIO = 3,
  POWER_PRIO = 4,
  PAREN_PRIO = 5,
  TIMESVAR_PRIO = 6
};


bool do_nest()
{
  return put("STRT: (PreLoad Stack)\nFUNC: [0] PushRP\nFUNC: [NOOP] PushRP\n");
}

bool do_unnest()
{
  while(pop_prio())
  {
    put("FINI: PopRPExec\n");
  }
  put("END : (Check Stack)");
  return obj.state = true;
}

//////////////////////////////////////////////////////////////
// Here's the main loop for the parser(s).

static char* op_watch; // needed because SAVE_PTRS() now uses relative addr

bool parse()
{
  int svsp = save_prio_sp();
  push_prio(0);
  push_prio(0);
  
  expr_nestlvl = 0; // init expression parser recursion counter
  
  do_nest();
  while(1)
  {
    op_watch = obj.op; // for debugger
    // skip to the next text or the end of the input buffer
    skip_ignored();
    
    if(is_eoi())
      break; // true
    
    // parse and/or evaluate the expression
    obj.state =
    (
      // check reserved keywords first so they aren't seen as a var names
      builtin() 
      || 
      (
        (destination_var() OPT) // optional [VAR =]
        && expr()               // parse an expression
        && (store_result() OPT) // optional store result in VAR
      )
    );
    
    if(!obj.state)
      break; // failed
  }
  do_unnest();
  restore_prio_sp(svsp);
  return is_eoi();  // made it to the end
}

bool integer() { return copy_integer(); }
bool frac() { return copy_frac(); }

bool number()
{
  // INTEGER [FRAC] | FRAC
  obj.state =
  (
    integer() && (frac() OPT)
  )
  || frac()
  ;
  return obj.state;
}
    
bool exp_prefix()
{
  // e|E
  return obj.state = copy("e") || copy("E");
}
    
bool exponent()
{
  // E [SIGN] NUMBER
  obj.state =
    exp_prefix() && (sign() OPT) && number();
  
  return obj.state;
}

// this is a...
// subroutine shared by varname with a multiplier and a regular
// float number.  These could be optimized by changing 
// the parser to accept FLOAT_NUM & [VARNAME] but it would 
// have to make sure there's no whitespaces between the multiplier 
// and the varname which currently queues up the text and would 
// not work right.

bool parse_float_num()
{
  return obj.state =
  ( // [SIGN]
    sign() OPT 
  ) 
  &&
  (
    ( // INTEGER [FRAC] [EXP]
      copy_integer() // req'd
      && 
      (
        copy_frac() OPT
      )
      &&
      (
        exponent() OPT
      )
    ) // OR  FRAC [EXP]
    || 
    (
      copy_frac()
      && (exponent() OPT)
    )
    // OR EXP
    || exponent()
  );

}

bool float_num()
{
  // we need to be able to undo changes if the parse fails after doing
  // something.
  SAVE_PTRS();

  // This is the start of an expression so we need to make sure the 
  // parser input is pointing to valid text.
  skip_ignored();  

  // There are three ways to start a float, all of them 
  // beginning with an optional 'sign'.
  // [SIGN] NUMBER [EXPONENT]
  // (
  //    integer [ .frac ] [exp] 
  //  | .frac [exp] 
  //  | exp
  // )
  
  int mark = op2offset();
  
  parse_float_num();
  
  if(!obj.state)
    RESTORE_PTRS();
  else
  {
    // here's one way we can swap two strings. the string
    // stack is volatile but we use it right away.
    push_str(mark);     // save $1
    put("DATA: [");     // write $2
    put(pop_str());     // write $1
    // ...
    put("]\n");
  }
  return obj.state;
}


bool set_priority(int n)
{
  //prev_priority = cur_priority;
  //cur_priority = n;
  push_prio(n);
  return obj.state = true;
}

int get_priority()
{
  return cur_prio();
}

int save_prio_sp()
{
  return prio_sp;
}

void restore_prio_sp(int sp)
{
  prio_sp = sp;
}

#define MIN(a, b) (a < b ? a : b)

// also pops priority if it's a swap type
const char* get_prioritystr()
{
  if(cur_prio() > prev_prio())
    return "PushRP";
  else
  {
    int a = pop_prio();
    int b = pop_prio();
    push_prio(MIN(a, b));
    return "PushRP SwapRP PopRPExec";
  }
}

bool oper()
{
  // cue up text
  skip_ignored();
  
  int mark = op2offset();
  SAVE_PTRS();
  // find_token()
  
  /*
  priorities:
  +:1 subtotal push and continue
  -:1 subtotal push and continue
  *:2 if prev priority >= 2 subtotal else push and continue
      /:3 if prev priority >= 3 subtotal else push and continue
  ^:4 if prev priority >= 4 subtotal else push and continue
  */      
  
  // prev_priority = priority;
  obj.state =
  (
    (copy("+") && set_priority(PLUS_PRIO)) ||
    (copy("-") && set_priority(MINUS_PRIO)) ||
    (copy("*") && set_priority(TIMES_PRIO)) ||
    (copy("/") && set_priority(DIVIDE_PRIO)) ||
    (copy("^") && set_priority(POWER_PRIO))    
  )
  ;
  // AND do_token()
  if(obj.state)
  {
    // swap 2 strings (See above.)
    push_str(mark);
    putfmt("FUNC: (priority %d) [", get_priority());
    put(pop_str());
    putfmt("] %s\n", get_prioritystr());
  }
  return obj.state;
}

bool lparen()
{
  skip_ignored();
  skip("(");
  if(obj.state)
  {
    set_priority(0); // terminator
    set_priority(5); // actual
    obj.op += sprintf(obj.op, "(...\n");
    do_nest();
  }
  return obj.state;
}

bool rparen()
{
  skip_ignored();
  
  skip(")");
  if(obj.state)
  {
//    set_priority(0);
    do_unnest();
    obj.op += sprintf(obj.op, "...)\n");
  }
  return obj.state;
}

bool times_var()
{
  SAVE_PTRS();
  int svsp = save_prio_sp();
  int mark = op2offset();
  put("*");
  set_priority(TIMESVAR_PRIO);
  push_str(mark);
  putfmt("FUNC: (priority %d) [", get_priority());
  put(pop_str());
  obj.op += sprintf(obj.op, "] %s\n", get_prioritystr());
  

  put("VAR : [");
  copy_cname();
  if(obj.state)
    put("]\n");
  else
  {
    restore_prio_sp(svsp);
    RESTORE_PTRS();
  }
  return obj.state;
}

bool expr()
{
  // this is just for watching recursion in a debugger
  expr_nestlvl++;
    
  // ( (FLOAT_NUM [*VAR] | VAR) [OPERATOR EXPR] ) <end>
  
  skip_ignored(); // skip to meaningful text
  
  SAVE_PTRS(); // so we can undo "*" insertion, etc.
  
  obj.state =
  ( // VAR | (FLOAT_NUM [*VAR]) 
    (
      var() 
      || ( float_num() && (times_var() OPT))
      ||
      ( // | {EXPR} [OPERATOR EXPR]
        lparen() 
        && expr() 
        && rparen()
      )
    )
    && 
    ( 
      ( oper() && expr() ) OPT
    )
  );
  
  // The expression must be complete one or it's an error.
  // This prevents things like two var names in a row
  // without an intermediate operator.
  
  expr_nestlvl--;
  
  if(expr_nestlvl == 0)
    obj.state = skip_ignored() && is_eoi();

  return obj.state;
}

bool builtin()
{ 
  SAVE_PTRS();
  char name[256];
  int mark = op2offset();
  obj.state = copy_cname();
  
  if(!obj.state)
  {
    RESTORE_PTRS();             // redundant but harmless
    return obj.state = false;   // ''
  }
  
  get_opstr(name, 245, mark);
  
  // find built-in and execute here...
  obj.state = strcasecmp(name, "quit") == 0;
  if(obj.state)
  {
    printf("Later gator! :-)\n");
      exit(0);
  }
  // else if() ... more built-ins
  
  // and default, undo all changes and return false
  
  RESTORE_PTRS();
  return obj.state = false; // not a match
}

bool sign() { return copy_sign(); }

// subroutine to parse a varname currently exactly at ip
bool copy_varname()
{
  return copy_cname();
}

bool varname()
{
  skip_ignored();
  // '_'|letter & alphanumeric or '_'
  return copy_varname();  
}

char varname_str[256];
char destvar_str[256];
bool destvar_loaded;

bool destination_var()
{
  // VARNAME =
  skip_ignored();
  
  destvar_loaded = false; // init destvar stack as needed
  
  int mark = op2offset(); // save start of name.
  
  SAVE_PTRS();  

  // destination var can't use multiplier
  if(!copy_varname())
  {
    RESTORE_PTRS(); // don't keep the name in the queue
    return obj.state = false;
  }
  
  get_opstr(varname_str, sizeof(varname_str), mark);
  
  // todo: find varname and push it's address onto the destinaton stack
  skip_ignored();
    
  if(!skip("="))
  {
    RESTORE_PTRS();
    return obj.state = false;
  }
  // else OK!
  
  // push dest var
  destvar_loaded = true; 
  strcpy(destvar_str, varname_str);
  return obj.state = true;
}

bool store_result()
{
  if(!destvar_loaded)
    return obj.state = false;
  
  // pop dest_var and store result in it
  obj.op += sprintf(obj.op, "STOR: [%s]\n", varname_str);
  destvar_loaded = false; 
  return obj.state = true;
}

bool skip_ignored()
{
  while(skip_ws() || skip_newline()) // || skip_comments()
    ; //
  return obj.state = true;
}

bool var()
{
  skip_ignored();
  
  SAVE_PTRS(); // may have a sign.
  sign();
  // should be the start of the name
  int mark = op2offset();
  
  // CAREFUL: varname copies the name to op.  This is not 
  // clear from the name (no action prefix) but this file 
  // is for demonstrating syntactic similiarities between C 
  // and BNF-like blocks.
  
  if(!varname())
  {
    RESTORE_PTRS();
    return obj.state = false;
  }
  
  push_str(mark);
  obj.op += sprintf(obj.op, "VAR : [%s]\n", pop_str());
  return obj.state = true;
}

// This is about pushing the size limit so we're going to let it fly as-is 
// without the mechanism to acually 'run' the evaluation which is coming up 
// in test3 if all goes well.
Hey. I'm wondering if the 'sign' should be a function. Oh well. We shall see.

Easy enough to change. (Try inputting '-+-+5')

Code:
Type 'quit' to end
-+-+5
STRT: (PreLoad Stack)
FUNC: [0] PushRP
FUNC: [NOOP] PushRP
DATA: [+5]
END : (Check Stack)
How does it *DO* that??? ;-)

And so the parser appears to pass the test as a parser. Now we are wondering if it can also pass muster as a self-contained compiler.

- The Computer Mad Science Team

:-)
Posted in Uncategorized
Views 364 Comments 0
« Prev     Main     Next »
Total Comments 0

Comments

 

  



All times are GMT -5. The time now is 04:36 AM.

Main Menu
Advertisement

Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration