The C/C++ Parser Generator (v. 2.x)-- Intro & Example 1 (ToCamelCase)

Posted 04-20-2015 at 07:42 PM by rainbowsally
Updated 05-04-2015 at 01:27 PM by rainbowsally (not enough sleep that day)

Based loosely on the long defunct but exceedingly wonderful non-GNU BNF project at Savannah...

Today's Amazing Features
  • Character-wise parser with basic string (abbrev 'str') functions built-in.
  • Uses normal intuitive C/C++ boolean grammar, if(), while(), &&, ||, etc.
  • Powerful string translation and inline EXTERN( ... ) capability.
  • Funcs:
      • is_*()
      • not_*()
      • copy_*()
      • skip_*()
      • write_*()
  • Special functions for 'undo' operations:
      • SAVE_PTRS(obj) saves ip and op (nestable)
      • RESTORE_PTRS(obj) restores both ip and op at current nest level
  • Also exports and parses strings! Amazing or wut?
  • This version focuses on making parser logic (AND, OR, etc) identical to C logic.

Get the brains of the parser here [Bug fixes will be posted here too.]:
http://www.linuxquestions.org/questi...-brains-36510/

Say goodbye to regex -- except for when you actually want it. No more "^*\($]".

And, no I'm not cussing. That's my attempt at a regex to delete all of the NSA's files on American citizens whose 4th Amendment of the "Supreme Law of the Land" our right to privacy has been secretly, illegally, and wantonly violated. [Who are 'really' the criminals here? The just powers of government are derived from the consent of the people but you continue to put this republic at risk never having sought our consent at all. No deal!]

But back to the topic at hand...

GOTCHAS

There are probably only a few 'gotchas' that may throw you here.

The main one is that we can't use the name 'obj' because it's a macro used within the parsing system to address the global vars within the main parser object.

The other two likely to cause some problems come from the way gcc now complains when a 'const char*' (called a constant string in the compiler warnings) is converted to a plain 'char*', which is deprecated. So to avoid a lot of typing we renamed these two types:

cstr = constant string aka "const char*" type [c = constant]
vstr = variable string aka "char*" type [v = variable]

And there's one more, but it's not likely to mess anyone up. The 'strcmp()' function in C returns 0, which reads as a boolean FALSE state, when the strings are equal. So rather than using the NOT operator on every one of these and running the risk of not seeing it when our code fails, we have created:

streq(A, B)

which returns a true state only when the strings are equal.

Oh, and if you nest the logic too deeply, gcc may complain and suggest putting the ANDs within ORs in parens, and even with the parens it still may not like it. So when that happens you can always split the expression into separate steps with

if(ok)...

A C/C++ Parser Generator (v. 2.0) -- Example 1


Try this to-camel example. In a compressed form, this is the C code to do the entire conversion.

Code:
    ok = ( ( (n == 1) ||  skip_str("_") ) && copy_toupper(1) ) || copy_n(1);
The above converts "this_string_to_convert" to "ThisStringToConvert".

You can try changing from underscore style to camel case style with sed or regex if you like, but personally I find the C/C++ just makes sense, and with a good debugger like kdbg you can see why and how it works.

Check out the 'bool to_camel()' routine in the following file to see the nesting of the logic more clearly than the "compressed" version above.

file: src/to-camel.cpp
purpose: source file
Code:
// main.cpp -- skeleton created by new.main

#include <stdio.h>
#include <stdlib.h>   // malloc(), free()
#include <string.h>
#include "parser.h"

#include "common/parser.cpp" // don't need a lib

void dbg(){} // for a non-moving breakpoint

#define streq(a, b) (strcmp(a, b) == 0)
#define VERSION_MAJOR 1
#define VERSION_MINOR 0

int usage(int errcode);
int print_version();

// parser functions all do 'return obj.state = <T or F>;'
// so that both the internal state and the returned value
// are the same.  Here's a typical parser_t function
// prototype.  In C/C++ there's no restriction on having
// parameters, except for parsers and certain cases such as
// do_until(test_func, action_func), and the like.

bool to_camel();

int main(int argc, char** argv)
{
  dbg();
  if(argc < 2) return usage(1);
  if(streq(argv[1], "--help")) return usage(0);

  // we need a constant array of chars, null terminated for the
  // input buffer.
  
  cstr inbuf = argv[1];
  
  // This could just be the size of the input string or file

#define INLEN (strlen(inbuf))
  
  // The output buffer is not resizable due to the added
  // complexity and risks involved in tracking a moving
  // buffer if/when it resizes.  So we anticipate a worst
  // case, which we'll "catch" if an error occurs.

#define OUTLEN ((INLEN * 2) + 256)

  // We need a variable array of chars for the output buffer.
  vstr outbuf = (vstr)malloc(OUTLEN);
  
  // We need one global obj, but we can't use the name 'obj' anywhere
  // except in any new parser_t functions we may create.
  
  PARSER_OBJ o;

  // Set up the globals using (in this case) an OBJ created on our
  // stack.  (It's safe here at the top level.)

  parser_init(&o, inbuf, INLEN, outbuf, OUTLEN);
  
  bool ok = true;

  // Example of C-like syntax for parser funcs.
  
  ok =
    parse()         // when true, we're done
    ||              // OR (else)
    parse_error()  // prints out from last saved UPDATE_ERRMSG() pointer
    ;

  if(ok)
    printf("%s\n", obj.outbuf);
  
  free(outbuf);
  dbg();
  // add routines here
  return 0;
}

int usage(int errcode)
{
#include "common/to_camel_str.dat"
  fprintf( errcode == 0 ? stdout : stderr, "%s", usage_str);
  return errcode;
}

// The global 'obj' macro now can access the fields of the
// PARSER_OBJ we created above, so no parameters are req'd
// for most of the parser functions.

bool parse()
{

  // set a watchpoint at 'obj' and 'ok' to see what it is doing
  // internally as well as externally.
  
  bool ok;
  
  static cstr errmsg = "Unknown error in 'parse()'";
  
  // For error messages we may want to use the built-in ERRMSG() functions.
  // The message strings must be 'static' so they will not crash when they
  // are displayed at the top level.  Locals are all saved on the stack, and
  // the stack at this level will not exist once this function has returned.
  
  SAVE_STATIC_ERRMSG(errmsg);

  // This is redundant in this case but if this was a loop this would update
  // the pointers to the input buffer so we could get an accurate line count
  // and see the position where the parse job failed.
  
  UPDATE_STATIC_ERRMSG();

  // So with no more ado ...
  ok = to_camel();
  
  if(ok)
    RESTORE_STATIC_ERRMSG();
  
  return obj.state = ok;
}


// converts "this_kind_of_string" to "ThisKindOfString"
bool to_camel()
{
  // typical header:
  bool ok = true;

  // Body:
  //////////////////////
  int n = 0;
  
  while(not_eoi() && ok)
  {
    
    UPDATE_STATIC_ERRMSG(); // update error position each pass
    n++;
    
    ok =
      (
        (
          (n == 1)
          ||
          skip_str("_")
        )
        &&
        copy_toupper(1)
      )
      || copy_n(1)
      ;
    
    // The above reads: if n == 1 or if we can skip an underscore, uppercase
    // the next letter; otherwise, just copy a character.  Repeat until
    // end of input or until ok = false.

    // to see what happens if we have an error try this.
    
#if 0 // set to 1 to enable the error display test
    if(n == 3)
      ok = obj.state = false;
#endif
    
  } // end of while() loop
  
  //////////////////////

  // typical footer:
  return obj.state = ok; // or not ;-)
}

file: src/common/to_camel_str.dat
purpose: usage strings
Code:
/* to_camel_str.txt converted with txt2cstr */
const char* usage_str =
    "Usage: input a string like \"this_one\" to convert it to one\n"
    "like \"ThisOne\".\n"
    ;

file: Makefile
purpose: build using mc2
Code:
## Makefile created with makefile creator 'mc2' v 3.2.0

################################################################
## Note: The following section can be put into a 'mc2.def' and 
## modified for automatic override of defaults as needed.
################################################################

################################################################
## Variable Definitions Section

## User Defined Vars

PREFIX = $(HOME)/usr
BUILDDIR := $(PWD)
## The output file name not including path
OUTNAME = MULTI

## The directories for sources, (temp) objects, and binary output(s)
BINDIR = .
SRCDIR = src
OBJDIR = o

## What COMPILE should do.
COMPILE = g++ -c -o
CFLAGS = -Wall -Wno-comment -g3
INCLUDE = -I $(SRCDIR) -I. -I /usr/include

## What LINK should do.
LINK = g++ -o
LDFLAGS = 
LIB = -L/usr/lib -L$(PREFIX)/lib

## MODIFY BELOW THIS LINE WITH GREAT CARE
################################################################
## File Lists Section

## The full path to the output file
MAIN = $(MAIN_FILES)

SRC = \
  src/to-camel.cpp \
  #############

HDR = \
  src/parser.h \
  #############

OBJ = \
  o/to-camel.o \
  #############

MAIN_FILES = \
  ./to-camel \
  #############

################################################################
## Rules Section

all: $(EXT_ALL) $(MAIN) $(HDR) $(SRC)

$(BINDIR)/to-camel: $(OBJDIR)/to-camel.o
	@echo
	@echo "Linking to-camel"
	$(LINK) $(BINDIR)/to-camel $(OBJDIR)/to-camel.o $(LDFLAGS) $(LIB)
	$(POST)

$(OBJDIR)/to-camel.o: $(SRCDIR)/to-camel.cpp $(HDR)
	@echo
	@echo "Compiling to-camel"
	$(COMPILE) $(OBJDIR)/to-camel.o $(SRCDIR)/to-camel.cpp $(CFLAGS) $(INCLUDE)

################################################################
## Additional Targets

update: $(EXT_UPDATE)
	@mc2 -update

# example targets
#mc2-semiclean: $(EXT_SEMICLEAN)
#	@rm -f *~ */*~ */*/*~

#mc2-clean: $(EXT_CLEAN)
#	@rm -f $(MAIN)
#	@rm -f $(OBJ)
#	@rm -f *~ */*~ */*/*~ */*/*/*~
################################################################
## User Defined Targets


semiclean: $(EXT_SEMICLEAN)
	@rm -f $(OBJ)
	@rm -f *~ */*~ */*/*~ */*/*/*~
	@rm -f *.kdevelop.filelist  *.kdevelop.pcs  *.kdevses Doxyfile

strip:
	@strip $(MAIN_FILES)
	@make semiclean

clean: $(EXT_CLEAN)
	@rm -f $(MAIN_FILES)
	@rm -f $(OBJ)
	@rm -f *.kdevelop.pcs *.kdevses
	@rm -f *~ */*~ */*/*~ */*/*/*~ tmp.mak

force: # used to force execution

 
################################################################

file: mc2.def
purpose: source file for the makefile.
Code:
# mc2.def template created with Makefile Creator 'mc2'

# sandbox path and other new variables
PREFIX = $(HOME)/usr
BUILDDIR := $(PWD)

OUTNAME = MULTI

SRCDIR = src
OBJDIR = o
BINDIR = .

# what COMPILE should do
COMPILE = g++ -c -o # COMPILE <output_file> ...
 CFLAGS = -Wall -Wno-comment -g3 # debug
# CFLAGS = -Wall -O2 # optimized
INCLUDE = -I $(SRCDIR) -I. -I /usr/include 

# what LINK should do
LINK = g++ -o # LINK <output_file> ...
LDFLAGS = 
LIB = -L/usr/lib -L$(PREFIX)/lib


semiclean: $(EXT_SEMICLEAN)
  @rm -f $(OBJ)
  @rm -f *~ */*~ */*/*~ */*/*/*~
  @rm -f *.kdevelop.filelist  *.kdevelop.pcs  *.kdevses Doxyfile

strip:
  @strip $(MAIN_FILES)
  @make semiclean

clean: $(EXT_CLEAN)
  @rm -f $(MAIN_FILES)
  @rm -f $(OBJ)
  @rm -f *.kdevelop.pcs *.kdevses
  @rm -f *~ */*~ */*/*~ */*/*/*~ tmp.mak
  
force: # used to force execution