[SOLVED] PNFPP: Syntax Errors while parsing strings

des_a · 06-12-2016, 09:29 PM

I am trying to parse a string in a preprocessor I am creating. It is based on the C++ preprocessor, but it will be a little different. Right now I am having trouble parsing strings.

I am trying to parse them without breaking anything I have already done.

I have gotten #include <"STRING">, #include "STRING", #import <"STRING"> and #import "STRING" to work. I was working on #define when I ran into these errors.

The behavior I want is for #define to ignore everything within a string. The syntax is: #define "STRING" "STRING".

It is just a simple string replacement, nothing fancy; that's all my #define will do. It records the name and value in a table, and from that point on it replaces the name with the value. There IS such a thing as an empty string too.
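As an illustration only (not the poster's code), the "record in a table, then replace" behaviour described above could look like this in standard C++, with `std::map` standing in for desLib's `Array`. `DefineTable`, `define()` and `expand()` are hypothetical names, and the sketch assumes plain, non-recursive text replacement.

```cpp
#include <map>
#include <string>

// Hypothetical table for #define "NAME" "VALUE" (quotes already stripped).
class DefineTable
{
 public:
  // Record a definition; redefining a name replaces its old value.
  void define(const std::string & name, const std::string & value)
  {
   table[name] = value;
  }

  // Replace every occurrence of each defined name in 'text'.
  std::string expand(const std::string & text) const
  {
   std::string result = text;
   for (const auto & def : table)         // names in std::map key order
   {
    const std::string & name = def.first;
    const std::string & value = def.second;
    if (name.empty())
     continue;                            // an empty name would match at every position
    std::string::size_type pos = 0;
    while ((pos = result.find(name, pos)) != std::string::npos)
    {
     result.replace(pos, name.size(), value);
     pos += value.size();                 // resume after the inserted value,
    }                                     // so "A" -> "AA" cannot loop forever
   }
   return result;
  }

 private:
  std::map<std::string, std::string> table;
};
```

An empty value is allowed and simply deletes the name. Because names are applied one after another in key order, a later name can still match inside an earlier replacement; a real preprocessor has to decide whether it wants that.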

Escape sequences are allowed within a string. That way you can put quotes inside a string without ending it, in case you want to replace a string with another string that my compilers will see.

A string may also appear outside of a #define, because the preprocessor could be used to process plain text. In that case the strings are not interpreted and go verbatim into the output, except that escape sequences are still processed.
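A minimal sketch of the escape processing described above, written as a plain C++ function rather than as flex rules. It handles the same escapes the lexer below recognizes (`\n`, `\t`, `\\`, `\"` and a decimal character code such as `\65`); the name `unescape` and the 0-255 limit on decimal codes are assumptions for illustration.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper: 'body' is the text between the quotes of a literal.
std::string unescape(const std::string & body)
{
 std::string out;
 for (std::string::size_type i = 0; i < body.size(); ++i)
 {
  char c = body[i];
  if (c != '\\')
  {
   out += c;                     // ordinary character, copied verbatim
   continue;
  }
  if (++i == body.size())
   throw std::runtime_error("backslash at end of string");
  char e = body[i];
  if (e == 'n')       out += '\n';
  else if (e == 't')  out += '\t';
  else if (e == '\\') out += '\\';
  else if (e == '"')  out += '"';  // a quote that does not end the string
  else if (std::isdigit(static_cast<unsigned char>(e)))
  {
   // Decimal character code, e.g. \65 -> 'A'.
   int code = 0;
   while (i < body.size() && std::isdigit(static_cast<unsigned char>(body[i])))
   {
    code = code * 10 + (body[i++] - '0');
    if (code > 255)
     throw std::runtime_error("character code out of range");
   }
   --i;                          // the for loop's ++i steps past the last digit
   out += static_cast<char>(code);
  }
  else
   throw std::runtime_error("bogus escape in string");
 }
 return out;
}
```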

des_a · 06-12-2016, 09:30 PM

Here is my code so far, since it is small right now:

pnfpp.ypp:

Code:

%{
/* Prologue */
#include <desLib/desLib.hpp>

#include <process.h>


#define YYSTYPE String


class PNFPP_Def
{
 protected:
  String itsname;
  String itsvalue;


 public:
  PNFPP_Def(String name = "", String value = "");
  PNFPP_Def(int value);
  PNFPP_Def(const PNFPP_Def & def);
  ~PNFPP_Def();

  void name(String name);
  String name();

  void value(String value);
  String value();


  PNFPP_Def & operator =(const PNFPP_Def & def);
};

PNFPP_Def::PNFPP_Def(String name, String value)
{
 itsname = name;
 itsvalue = value;
}

PNFPP_Def::PNFPP_Def(int value)
{
 itsname = "";
 itsvalue = "";
}

PNFPP_Def::~PNFPP_Def()
{

}

PNFPP_Def::PNFPP_Def(const PNFPP_Def & def)
{
 itsname = def.itsname;
 itsvalue = def.itsvalue;
}

void PNFPP_Def::name(String name)
{
 itsname = name;
}

String PNFPP_Def::name()
{
 return itsname;
}

void PNFPP_Def::value(String value)
{
 itsvalue = value;
}

String PNFPP_Def::value()
{
 return itsvalue;
}

PNFPP_Def & PNFPP_Def::operator =(const PNFPP_Def & def)
{
 if (this == &def)
  return *this;

 itsname = def.itsname;
 itsvalue = def.itsvalue;

 return *this;
}


void yyerror(char const * c);
int yylex();

String strip_quotes(String str);
String remove_extension(String file);
String get_extension(String file);

int conprint(const char * format, ...);
void generate_code();
void generate_defines();


FILE * output;


Array<String> files;
Array<PNFPP_Def> definitions;
%}
/* Bison Declarations */
%expect 1


%locations


%token OPLT
%token OPGT

%token STRING

%token INCLUDE
%token IMPORT
%token DEFINE



%%
/* Grammar Rules */

input:	// Empty
	| line input
	;

line:	'\n'
	| command
	| strings
	| error
	{
	 yyerrok;
	}
	;

command:	include_command
		| import_command
		| define_command
		;

include_command:	INCLUDE STRING
			{
			 Array<String> args;
   
    			 args[0] = (char *)"PNFPP";
		         args.insert();
			 args[1] = strip_quotes($2);
			 args.insert();
			 args[2] = remove_extension(strip_quotes($2)) + ".pnfpp";
    			 args.insert();
    			 args[args.length() - 1] = (char *)"";
 
   			 char ** args2 = new char * [args.length()];
    			 for (unsigned long i = 0; i < args.length(); ++i)
     			  args2[i] = (char *)args[i].getString().c_str();
    
    			 args2[args.length() - 1] = NULL;
   
   
   			 int ret = _spawnvp(_P_WAIT, "PNFPP", args2);
    			 delete [] args2;   // allocated with new [], so free with delete []
   
   			 if (ret == -1 && errno != 0)
    			 {
   	 		  ret = 1;
   	 		  error(ERRORMSG, (char*)"Error running PNFPP program.");
    			 }
			 if (ret == -1)
			 {
			  yyerror("File cannot include itself.");
			  exit(-1);
			 }

			 fin.open((remove_extension(strip_quotes($2)) + ".pnfpp").getString().c_str());
			 if (!fin)
			  yyerror("Can't open file.");

			 String str;
			 unsigned long i = 0;
			 do
			 {
			  if (i != 0)
			   conprint("\n");

			  fin >> str;
			  conprint("%s", str.getString().c_str());
			  ++i;
			 } while (!fin.eof());
			 fin.close();
			}
			| INCLUDE OPLT STRING OPGT
			{
			 String extension = get_extension(strip_quotes($3));
			 $3 = "..\\include\\" + remove_extension(strip_quotes($3));
			 Array<String> args;
   
    			 args[0] = (char *)"PNFPP";
		         args.insert();
			 args[1] = $3 + extension;
			 args.insert();
			 args[2] = $3 + ".pnfpp";
    			 args.insert();
    			 args[args.length() - 1] = (char *)"";
 
   			 char ** args2 = new char * [args.length()];
    			 for (unsigned long i = 0; i < args.length(); ++i)
     			  args2[i] = (char *)args[i].getString().c_str();
    
    			 args2[args.length() - 1] = NULL;
   
   
   			 int ret = _spawnvp(_P_WAIT, "PNFPP", args2);
    			 delete [] args2;   // allocated with new [], so free with delete []
   
   			 if (ret == -1 && errno != 0)
    			 {
   	 		  ret = 1;
   	 		  error(ERRORMSG, (char*)"Error running PNFPP program.");
    			 }
			 if (ret == -1)
			 {
			  yyerror("File cannot include itself.");
			  exit(-1);
			 }

			 fin.open(($3 + ".pnfpp").getString().c_str());
			 if (!fin)
			  yyerror("Can't open file for fin.");

			 String str;
			 unsigned long i = 0;
			 do
			 {
			  if (i != 0)
			   conprint("\n");

			  fin >> str;
			  conprint("%s", str.getString().c_str());
			  ++i;
			 } while (!fin.eof());
			 fin.close();
			}
			;

import_command:		IMPORT STRING
			{
			 bool found = false;
			 for (unsigned long i = 0; i < files.length(); ++i)
			 {
			  if (files[i] == $2)
			   found = true;
			 }

			 if (found == true)
			  ;
			 else
			 {
			  Array<String> args;
   
     			  args[0] = (char *)"PNFPP";
		          args.insert();
			  args[1] = strip_quotes($2);
			  args.insert();
			  args[2] = remove_extension(strip_quotes($2)) + ".pnfpp";
    			  args.insert();
    			  args[args.length() - 1] = (char *)"";
 
   			  char ** args2 = new char * [args.length()];
    			  for (unsigned long i = 0; i < args.length(); ++i)
     			   args2[i] = (char *)args[i].getString().c_str();
    
    			  args2[args.length() - 1] = NULL;
   
   
   			  int ret = _spawnvp(_P_WAIT, "PNFPP", args2);
    			  delete [] args2;   // allocated with new [], so free with delete []
   
   			  if (ret == -1 && errno != 0)
    			  {
   	 		   ret = 1;
   	 		   error(ERRORMSG, (char*)"Error running PNFPP program.");
    			  }
			  if (ret == -1)
			  {
			   yyerror("File cannot include itself.");
			   exit(-1);
			  }

 			  fin.open((remove_extension(strip_quotes($2)) + ".pnfpp").getString().c_str());
			  if (!fin)
			   yyerror("Can't open file.");

 			  String str;
			  unsigned long i = 0;
			  do
			  {
			   if (i != 0)
			    conprint("\n");

			   fin >> str;
			   conprint("%s", str.getString().c_str());
			   ++i;
			  } while (!fin.eof());


			  if (files.length() > 1)
			   files.insert();

                          files[files.length() - 1] = $2;
			 }
			}
			| IMPORT OPLT STRING OPGT
			{
			 String extension = get_extension(strip_quotes($3));
			 $3 = "..\\include\\" + remove_extension(strip_quotes($3));


			 bool found = false;
			 for (unsigned long i = 0; i < files.length(); ++i)
			 {
			  if (files[i] == $3)
			   found = true;
			 }

			 if (found == true)
			  ;
			 else
			 {
			  Array<String> args;
   
    			  args[0] = (char *)"PNFPP";
		          args.insert();
			  args[1] = $3 + extension;
			  args.insert();
			  args[2] = $3 + ".pnfpp";
    			  args.insert();
    			  args[args.length() - 1] = (char *)"";
 
   			  char ** args2 = new char * [args.length()];
    			  for (unsigned long i = 0; i < args.length(); ++i)
     			   args2[i] = (char *)args[i].getString().c_str();
    
    			  args2[args.length() - 1] = NULL;
   
   
   			  int ret = _spawnvp(_P_WAIT, "PNFPP", args2);
    			  delete [] args2;   // allocated with new [], so free with delete []
   
   			  if (ret == -1 && errno != 0)
    			  {
   	 		   ret = 1;
   	 		   error(ERRORMSG, (char*)"Error running PNFPP program.");
    			  }
			  if (ret == -1)
			  {
			   yyerror("File cannot include itself.");
			   exit(-1);
			  }

			  fin.open(($3 + ".pnfpp").getString().c_str());
			  if (!fin)
			   yyerror("Can't open file.");

			  String str;
			  unsigned long i = 0;
			  do
			  {
			   if (i != 0)
			    conprint("\n");

			   fin >> str;
			   conprint("%s", str.getString().c_str());
			   ++i;
			  } while (!fin.eof());


 			  if (files.length() > 1)
			   files.insert();

                          files[files.length() - 1] = $3;
			 }
		 	}
			;

define_command:		DEFINE STRING STRING
			{
			 bool found = false;
			 for (unsigned long i = 0; i < definitions.length(); ++i)
			 {
			  if (definitions[i].name() == strip_quotes($2))
			  {
			   if (definitions[i].value() != strip_quotes($3))
			    definitions[i].value(strip_quotes($3));

			   found = true;
			  }
			 }

			 if (found == false)
			 {
			  definitions[definitions.length() - 1].name(strip_quotes($2));
			  definitions[definitions.length() - 1].value(strip_quotes($3));
			  definitions.insert();
			 }
			
			 conprint("#define %s %s", $2.getString().c_str(), $3.getString().c_str());
			}
			;


strings:		STRING
			{
			 conprint("%s", $1.getString().c_str());
			}
			| strings STRING
			{
			 conprint("%s", $2.getString().c_str());
			}
			;


%%
/* Additional C/C++ Code */
String strip_quotes(String str)
{
 String str2 = "";

 for (unsigned long i = 0; i < str.length(); ++i)
 {
  if (i == 0)
   continue;
  else if (i == str.length() - 1)
   continue;
  else
   str2 += str[i];
 }

 return str2;
}

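String remove_extension(String file)
{
 // The extension starts at the last '.', but only if that '.' comes after
 // the last path separator (so "..\include\name" has no extension).
 // size_t, not unsigned long: on 64-bit Windows unsigned long is 32 bits
 // and cannot hold string::npos.
 size_t pos = file.getString().rfind('.');
 size_t sep = file.getString().find_last_of("\\/");
 if (pos == string::npos || (sep != string::npos && pos < sep))
  return file;   // no extension: return the name unchanged

 return file.getString().substr(0, pos);
}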

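String get_extension(String file)
{
 String extension = "";

 // Same rule as remove_extension(): the last '.' after the last path
 // separator. With no extension, return an empty string rather than
 // the whole file name.
 size_t index = file.getString().rfind('.');
 size_t sep = file.getString().find_last_of("\\/");
 if (index == string::npos || (sep != string::npos && index < sep))
  return extension;

 extension = file.getString().substr(index);

 return extension;
}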

pnfpp.lpp:

Code:

%{
/* Prologue */
#include <stdarg.h>


#include "pnfpp.tab.cpp"


Array<String> out;
unsigned long outcount;
char outbuffer[256];
char linebuf[256];
String str;
%}
/* Flex Definitions */
%x STRING


OPLT		"<"

OPGT		">"

TCHARACTER	[^"\n\\]

TSTRING		[^ \n\"]+

INCLUDE 	"#include "

IMPORT		"#import "

DEFINE		"#define"


/* Flex Patterns Below %% */
%%

{OPLT}		yylval = "0"; return OPLT;

{OPGT}		yylval = "1"; return OPGT;

{INCLUDE}	yylval = "2"; return INCLUDE;

{IMPORT}	yylval = "3"; return IMPORT;

{DEFINE}	yylval = "4"; return DEFINE;

\"		str = '\"'; BEGIN(STRING); 

<STRING>\\n	str += '\n';

<STRING>\\t	str += '\t';

<STRING>\\\\	str += '\\';

<STRING>\\\"	str += '\"';

<STRING>\\[0-9]+	str += (char)strtol(yytext+1, 0, 10);

<STRING>{TCHARACTER}*	str += yytext;

<STRING>\\.		yyerror("Bogus escape in string.");

<STRING>\n		yyerror("Newline in string.");

<STRING><<EOF>>		yyerror("Unterminated string."); yyterminate();

<STRING>\"	str += '\"'; yytext = (char *)str.getString().c_str();  BEGIN(INITIAL); yylval = yytext; return STRING;

[ \n]		conprint("%s", yytext);

{TSTRING}	strncpy(linebuf, yytext, sizeof(linebuf) - 1); conprint("%s", linebuf);


%%
/* Additional Code */
int main(int argc, char ** argv)
{
 if (argc == 3)
 {
  FILE * input = fopen(argv[1], "r");
  if (!input)
  {
   yyerror("can't open file");
   return -1;
  }
  yyin = input;
  output = fopen(argv[2], "w+");
  if (!output)
  {
   yyerror("can't open file for write");
   return -1;
  }
    
  int ret = yyparse();

  generate_defines();
  generate_code();


  return ret;
 }
 else
 {
  yyerror("can't find input file or output file.");
  return -1;
 }
}

void yyerror(char const * c)
{
 printf("%s", "* ERROR: ");
 printf("%d: ", yylloc.first_line);
 printf("@ '%s' yylval = '%s': ", yytext, yylval.getString().c_str());
 printf("yychar = '%d': ", yychar);
 printf("%s", c);
 printf("%s", "\n");
}

int conprint(const char * format, ...)
{
 va_list arg;
 int done = 0;

 va_start(arg, format);

 out.insert();
 done = vsnprintf(outbuffer, sizeof(outbuffer), format, arg);
 String str = "";
 str += outbuffer;   // outbuffer is an array, so it can never be NULL
 out[outcount] = str;
 ++outcount;

 va_end(arg);


 return done;
}


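void generate_code()
{
 // Drop empty entries. Assuming Array::remove() shifts the later elements
 // down, only advance i when nothing was removed; otherwise the element
 // that slides into slot i would be skipped.
 unsigned long i = 0;
 while (i < out.length())
 {
  if (out[i].getString() == "")
   out.remove(i);
  else
   ++i;
 }

 for (unsigned long j = 0; j < out.length(); ++j)
 {
  fprintf(output, "%s", out[j].getString().c_str());
 }
}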

/*void generate_defines()
{
 for (unsigned long i = 0; i < out.length(); ++i)
 {
  unsigned index = out[i].getString().find("#define");

  if (index == string::npos)
   continue;

  index += 7;
  String str = out[i].getString().substr(index);
  str = strip_quotes(str);
  unsigned long index2 = str.getString().find('\"');
  str = str.getString().substr(index2 + 1);
  unsigned long index3 = str.getString().find('\"');
  unsigned long len = index3 - index2;
  str = str.getString().substr(index2, len);

  bool found = false;
  unsigned long index4 = 0;
  for (unsigned long j = 0; j < definitions.length(); ++j)
  {
   if (definitions[j].name() == str)
   {
    found = true;
    index4 = j;
   }
  }

  if (found == true)
  {
   String name = definitions[index4].name();
   String value = definitions[index4].value();
   cout << name << endl;
   cout << value << endl;
   for (unsigned long j = i; j < out.length(); ++j)
   {
    cout << out[j];
    unsigned long index5 = out[j].getString().find(name.getString());
    if (index5 == string::npos)
     continue;

    out[j].getString().insert(index5, value.getString());
    index5 = out[j].getString().find(name.getString());

    if (index5 == string::npos)
     continue;

    out[j].getString().erase(index5, name.length());
   }
  }
  //out.remove(i);
 }
}*/

/*void generate_defines()
{
 for (unsigned long i = 0; i < out.length(); ++i)
 {
  unsigned long index = out[i].getString().find("#define");

  if (index == string::npos)
  {
   continue;
  }
  else
  {
   index += 7;
   String str = out[i].getString().substr(index);
   str = strip_quotes(str);
   unsigned long index2 = str.getString().find('\"');
   str = str.getString().substr(index2 + 1);
   unsigned long index3 = str.getString().find('\"');
   unsigned long len = index3 - index2;
   str = str.getString().substr(index2, len);

   bool found = false;
   unsigned long index4 = 0;
   for (unsigned long j = 0; j < definitions.length(); ++j)
   {
    if (definitions[j].name() == str)
    {
     found = true;
     index4 = j;
    }
   }

   if (found == true)
   {
    String name = definitions[index4].name();
    String value = definitions[index4].value();

    for (unsigned long j = i; j < out.length(); ++j)
    {
     if (name == out[j])
      out[j] = value;
    }   
   }
   out.remove(i);
   out.remove(i - 1);
   //out.remove(i + 1);
  }
 }
}*/

void generate_defines()
{
 for (unsigned long i = 0; i < out.length(); ++i)
  cout << out[i] << endl;
}

astrogeek · 06-13-2016, 02:17 AM

To be fair to you, I have followed some of your previous threads and have never been able to make sense of your PNF project. I have wondered whether you were as serious as you seem to be at times, but you are persistent, so I'll interpret that as the measure of seriousness!

And I see that you are now making effort to base the language on the right foundation and are using Flex and Bison to build your parser. Good form and +1 for progress!

Now, I have looked over your code (above) a little more than superficially, but even confining my focus to the #define handling alone, I cannot say that I followed what you have in mind. Therefore I cannot offer specific code advice.

However, I will try to offer some constructive comments on your overall implementation methods in hopes of helping you avoid painting yourself into a box that will be difficult to escape from.

To say again, if you are writing a language processor/compiler/assembler, Flex and Bison are the right tools for your job!

But I would suggest to you that your tokenizer and parser are both already at a level of complexity that will likely quickly overwhelm you as you try to add more to it. That complexity is really unnecessary from what I see, and reminds me of some things that I wrote when learning parsing!

First, the tokenizer... it is just doing far too much for what you are trying to get out of it, in my opinion. The token handling looks overly complex and the boundary between what the tokenizer should do and what the parser should do is already blurred, which will only get worse as you extend it.

For example, OPLT and OPGT: why set yylval when you are not using it? And why tokenize OPLT and OPGT at all (at least for the present uses)? Why not use '<' and '>' as literal character tokens and handle them as terminals in the grammar?

Should #define and #include occur anywhere, or only at the beginning of a line? What is the meaning of yylval for those?

And the quoted string handling is too complex, and really could do without the use of a start state at all. And why the strtol(...) inside a quoted string at all? It makes no sense to me. In fact, the token return line for STRING looks very strange to me. So if you are getting syntax errors when a STRING is involved, this would be a good place to look!
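To illustrate the "no start state" point: a whole quoted string, escapes included, can be described by one regular expression. In flex that would be a single rule along the lines of `\"([^"\\\n]|\\.)*\"` (an assumption for illustration, not taken from the posted code). The same pattern is shown here with C++ `std::regex` so it can be tried directly; `is_string_literal` is a hypothetical name.

```cpp
#include <regex>
#include <string>

// A quote, then any number of either (a character that is not a quote,
// backslash or newline) or (a backslash followed by any character except
// newline), then a closing quote. One pattern, no start state.
const std::regex quoted_string(R"re("([^"\\\n]|\\.)*")re");

// True if 'text' is exactly one well-formed string literal.
bool is_string_literal(const std::string & text)
{
 return std::regex_match(text, quoted_string);
}
```

The lexer then returns the whole lexeme as one STRING token, and any escape processing can be done once, afterwards, on the matched text.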

And yylval... why not define a union of int and pointer to string at least, and avoid all that string concatenation and passing between the tokenizer and parser?
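In Bison this is what `%union` declares: for example `%union { int num; std::string *str; }`, then `%token <str> STRING` and `%type <str> ...` attach a member to each token and non-terminal. The sketch below hand-writes the equivalent value type in C++ to show the idea; the token codes, `NUMBER`, `lex_string` and `lex_number` are hypothetical.

```cpp
#include <string>

// Hypothetical token codes; Bison would generate these in its header.
enum Token { STRING = 258, NUMBER };

// Roughly what "%union { int num; std::string *str; }" gives you:
// each token carries either an int or a pointer to a heap-allocated string.
union TokenValue
{
 int num;
 std::string * str;
};

// A lexer action for a string literal allocates once and hands the pointer
// to the parser; the parser action that consumes it must delete it.
int lex_string(const std::string & lexeme, TokenValue & lval)
{
 lval.str = new std::string(lexeme);
 return STRING;
}

// A lexer action for a number stores the converted value directly.
int lex_number(const std::string & lexeme, TokenValue & lval)
{
 lval.num = std::stoi(lexeme);
 return NUMBER;
}
```

Passing a pointer means the string is copied once, instead of being concatenated and reassigned as a `String` yylval at every step.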

And pass <EOL> as a token for "\n" instead of adding it to the strings; your grammar rules will benefit greatly from that!

I would suggest that a little more effort putting some tight definitions to the meaning and handling of your tokens will save many hours and days of headache, immediately and down the road! Simplify it, and keep it to minimum complexity as you add new handling. Remember what a tokenizer does: it recognizes patterns in the input stream and returns an integer token, and possibly a value for each, to the parser. Make it that simple with good pattern definitions and typed values.
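A hand-written sketch of a tokenizer that does only that: it returns an integer code plus the matched text, passes '<' and '>' as their own character values, emits EOL for each newline, and recognizes directives only as the first thing on a line. It is not flex output, and the token names `TEXT`, `END` and the `Tokenizer` class are hypothetical.

```cpp
#include <cctype>
#include <string>

// Hypothetical token codes. As in Bison, a single-character token such as
// '<' or '>' is returned as its own character value, and named tokens are
// numbered above the character range.
enum Token { END = 0, EOL = 258, DEFINE, INCLUDE, IMPORT, STRING, TEXT };

class Tokenizer
{
 public:
  explicit Tokenizer(const std::string & input) : src(input) {}

  // Returns the next token code; 'value' receives the matched text, if any.
  int next(std::string & value)
  {
   value.clear();
   // Skip blanks, but never the newline: it is a token of its own.
   while (pos < src.size() && src[pos] != '\n'
          && std::isspace(static_cast<unsigned char>(src[pos])))
    ++pos;
   if (pos >= src.size())
    return END;

   char c = src[pos];
   if (c == '\n')
   {
    ++pos;
    at_line_start = true;            // a directive may follow
    return EOL;
   }
   bool line_start = at_line_start;
   at_line_start = false;

   if (c == '<' || c == '>')
   {
    ++pos;
    return c;                        // literal token, no value needed
   }
   // Directives are only recognized as the first thing on a line.
   if (c == '#' && line_start)
   {
    if (match("#define"))  return DEFINE;
    if (match("#include")) return INCLUDE;
    if (match("#import"))  return IMPORT;
   }
   if (c == '"')
   {
    // Whole literal in one go; an escaped character never ends it.
    // (An unterminated literal is returned as-is; a real lexer should report it.)
    std::string::size_type start = pos++;
    while (pos < src.size() && src[pos] != '"' && src[pos] != '\n')
     pos += (src[pos] == '\\' && pos + 1 < src.size() && src[pos + 1] != '\n') ? 2 : 1;
    if (pos < src.size() && src[pos] == '"')
     ++pos;                          // keep the closing quote
    value = src.substr(start, pos - start);
    return STRING;
   }
   // Any other run of characters up to a blank, quote or angle bracket.
   std::string::size_type start = pos;
   while (pos < src.size() && !std::isspace(static_cast<unsigned char>(src[pos]))
          && src[pos] != '"' && src[pos] != '<' && src[pos] != '>')
    ++pos;
   value = src.substr(start, pos - start);
   return TEXT;
  }

 private:
  // Consume 'word' if it is at the current position and is a whole word.
  bool match(const std::string & word)
  {
   if (src.compare(pos, word.size(), word) != 0)
    return false;
   std::string::size_type end = pos + word.size();
   if (end < src.size() && std::isalnum(static_cast<unsigned char>(src[end])))
    return false;
   pos = end;
   return true;
  }

  std::string src;
  std::string::size_type pos = 0;
  bool at_line_start = true;
};
```

Every decision about what a token *is* lives here; the parser only ever sees the integer codes and, for STRING and TEXT, the matched text.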

Next, the parser...

Many of the same comments apply... it is just way too complex for what it is doing in my opinion.

You have not declared any types, probably because you have not declared a union for yylval. So that does not allow you to declare non-terminal types, which I think is the cause of some overworked grammar rules.

And you have no end symbol (<EOL>) to disambiguate the start rule, input. So you put it into the line rule, which makes it a bit awkward. What will happen if you decide to allow multiple ';'-separated statements on one line in your language, similar to C? How would you be able to (easily) extend your grammar for that case?

And the right hand side code is just doing way, way too much!

When the tokenizer returns you should have a known token and optionally a typed value.

When a grammar rule matches, you then have one or more tokens and values in a well-defined phrase, and you already know the syntax is valid; otherwise you wouldn't be there! So you don't need more tests and parsing on the right side! If you do, then redefine the tokens and grammar rules to get rid of it!

So with well-defined tokens and types, well-structured grammar rules with a unique end symbol (<EOL>) and some relatively simple right hand code, you are done!

I see that you are already telling the parser to expect a shift/reduce conflict (%expect 1), and you do not have a grammar that should be capable of producing one! So the first thing to do is refine the grammar, and the tokens that feed it. Again, simplify.

It would be very helpful to yourself and those who would help if you can try to express your grammar (or the relevant parts) in BNF or EBNF as well. Until you can express it in that notation, you cannot really write your parser, because that is what the grammar rules are, so it is a good exercise and focus for you! And with a well-expressed grammar, it is easier for others to see what you are doing, rather than reading through code.
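As a small illustration of going from EBNF to code, here is a hypothetical EBNF for a few of the directives (in the comments), with one C++ function per rule. This is a hand-written recursive-descent parser, not what Bison generates (Bison builds an LALR table-driven parser), so it only shows the shape of the grammar: an EOL-terminated line, optional ';'-separated commands, and one-line actions. The token codes, `Tok` and `Parser` are assumptions.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token codes, using Bison's convention that single-character
// tokens such as '<', '>' and ';' are their own character value.
enum Token { END = 0, EOL = 258, DEFINE, INCLUDE, STRING };

struct Tok
{
 int code;
 std::string text;
};

// EBNF for a small, hypothetical subset of the directives:
//   input   = { line } ;
//   line    = [ command { ";" command } ] EOL ;
//   command = DEFINE STRING STRING
//           | INCLUDE ( STRING | "<" STRING ">" ) ;
// Each rule is one function; each action is one line, because the syntax
// has already been checked by the time it runs.
class Parser
{
 public:
  explicit Parser(std::vector<Tok> toks) : t(std::move(toks)) {}

  std::map<std::string, std::string> defines;   // filled by #define
  std::vector<std::string> includes;            // filled by #include

  void input()
  {
   while (peek() != END)
    line();
  }

 private:
  void line()
  {
   if (peek() != EOL)
   {
    command();
    while (peek() == ';')
    {
     expect(';');
     command();
    }
   }
   expect(EOL);
  }

  void command()
  {
   if (peek() == DEFINE)
   {
    expect(DEFINE);
    std::string name = expect(STRING);
    std::string value = expect(STRING);
    defines[name] = value;
   }
   else if (peek() == INCLUDE)
   {
    expect(INCLUDE);
    if (peek() == '<')
    {
     expect('<');
     includes.push_back(expect(STRING));
     expect('>');
    }
    else
     includes.push_back(expect(STRING));
   }
   else
    throw std::runtime_error("syntax error: expected a command");
  }

  int peek() const { return i < t.size() ? t[i].code : END; }

  // Consume one token of the given kind and return its text, or fail.
  std::string expect(int code)
  {
   if (peek() != code)
    throw std::runtime_error("syntax error");
   return t[i++].text;
  }

  std::vector<Tok> t;
  std::size_t i = 0;
};
```

Allowing several ';'-separated commands on one line only touched the `line` rule, which is the kind of easy extension the EOL end symbol makes possible.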

So please take this as helpful, if rambling commentary, I hope it is useful to you! Good luck!

des_a · 06-13-2016, 04:58 AM

So what you're really telling me is I need to go ahead and rewrite it. So okay, I'll rewrite it. I cannot find examples of bison and flex syntax C++ preprocessors, but I can find examples of them in BNF (or EBNF was it?). So I'll translate a good one of them to bison and flex, and then start modifying the grammar from there to do what I really want it to do.

If I DID find an existing grammar for a C++ preprocessor in bison and flex, I wouldn't want to use it verbatim, nor steal its code, just learn from it. The same would be true for BNF or EBNF. I don't want a duplicate of a C++ preprocessor, but it's the closest thing out there to what I do want.

It's so close that I could use cpp, the C++ preprocessor program, and it would work well enough. But I don't want my language to have to rely on somebody else's already written program, so I'm creating my own. Plus, if I don't use an exact copy of the C++ preprocessor, I will be better off, because mine will have better features and conform to MY standards instead of C++'s. But it is BASED ON the C++ preprocessor (syntactically speaking).

As for using bison and flex, I am. For this project, I have simply used the best language for the job so far for each component. My PNF language is written in C++, because that was the best language for it. I tried bison and flex for it once in the past, and they seemed to complicate a simple task. Perhaps future versions that have more features will need bison and flex, because they will "become" the best tools for the job.

PNFASM is written in bison and flex. PNFHA is written in bison and flex. This language (PNFPP) is written in bison and flex. PNF2CPP is written in C++, and is the simplest component of them all. language is written in C++ as well, and ties it all together.

I am simply using the simplest language possible to express things in for the task, as the current definitions require. The only change I might have made is using my other language, biflex (which technically needed just a scanner). Wherever bison and flex are used and work together to create one program, biflex can be used instead. It adds an extra compilation step, but that's okay. I am using a local OS script to compile most of them these days anyway.

The only reason I am not using biflex right now is that I don't want to rely on it yet, since it is itself just a language on my machine. Or do you think that is a dumb reason for not using biflex? I want it out eventually, just as I want this language to be out in the wild; it's just that I need to protect myself and not get it out too soon. I may have lots of code about it floating around, but the final product keeps turning out really different... In any case, biflex would probably be released in the wild first.

A little about me and why I have not released things yet:

I am on SSI, and the only reason I ever really needed it was because I never properly learned to "take care of myself". I have always worked hard on that, since I knew it was a problem. I have made progress from being someone who is smart, yes, but not about how to actually get a job or anything, to someone who knows the basic idea of how to get jobs, but needs to learn more about home stuff before doing so.

I am going through school, trying to earn an IT Support Specialist Degree, which will help aid in my job search. More is planned education wise too. I didn't originally know how to start college on my own, but now I do. I wanted to go to college right out of high school, but the high school didn't show me what I needed to know to do that.

I am working on goals at home towards "learning to take care of myself", which is mostly but not quite all, getting into my routine, which is hard for me. It is in my spare time right now, that I am creating these things, but not all of my spare time is used this way.

I don't want to release anything, finished or not, until I have a plan to make money from it and can safely get off of SSI because I no longer need the help it provides. That day is coming, but is not here yet.

Also, it is a bit of a hassle on $733 to pay $35 for each component to get it protected for me, but I don't see how I can make money with it not being protected either. I have to remain somewhat secretive about things in order to protect them right now. Like I said, I may be getting help here and there on both my network and my programming, but the final result comes from hard work, and my brain too, so it's original.

I am ultimately wanting to "change the world" with all my ideas, going somewhat in the same direction as Mr. Bill Gates sees, but also in some of my own direction too. I'm trying to become a part of things like that. I'm not just trying to do so in the field of computers, but some other areas of life too. Each different area, requires a different skill set.

One of the other areas I'm trying to change things in, is to make sure that people who are smart, like me, but have a disability like me, are NEVER treated the same way again, but I have to succeed against all odds to make that happen.

des_a · 06-13-2016, 05:07 AM

Sorry for being real wordy back. Let's see if I've covered all points. Since I know my course of action, this is indeed solved, even though I do not have working code yet: I know what I need to do to fix it now, which is rewrite it. Just making sure you don't have any more advice. Sometimes things DO just need to be rewritten to fix them. I have rewritten the PNF component a number of times since I was 15, some of them in bison and flex. But I've never gotten this far before, and although new features will eventually mandate another rewrite, I will be fine to release versions of it long before then. In fact, I had some release candidates before, but then the need to add new features made this one the new release candidate. It's so much better than the old one that the old one will never make it out there. Until I get all my components together, that sort of thing might happen.

Like g++ and gcc, this takes multiple complex programs to solve the problem. I am trying not to add new features if I don't have to, though, so that I've got my release candidate. Then I can continue developing development versions and have them ready in a jiffy anytime I want them to be ready. Staying one version ahead does it for this program; it's complex enough that that's better.

I could just have #include and #import working, and then have one release candidate with only those features in it. That way I could continue developing on top of that.

des_a · 06-13-2016, 05:12 AM

P.S. I sort of know how to read BNF and EBNF, but not write it. I only know how to describe things in terms of flex and bison, and biflex. There, I think I've gotten all your main points and digested them so I know what to do, and you know how to help me accomplish the goal.

astrogeek · 06-14-2016, 01:22 AM

Thanks for the complete replies.

Quote:

Originally Posted by des_a

So what you're really telling me is I need to go ahead and rewrite it. So okay, I'll rewrite it.

Not just to rewrite it, but to rethink it at a more fundamental level, then rewrite it.

For myself, even after some very dedicated study and playing with examples, I did not really, intuitively understand how to approach such problems until after I had implemented several of them poorly and had to understand why!

In the end, it really isn't about the code at all - it is about really understanding what has to happen, independent of the implementation code (flex and bison, your application) that is used to make it happen.

When you see this, "A generative grammar is a 4-tuple (Vn,Vt,R,S)", and actually know what it is and why it is important, you will be on your way! (It isn't really difficult, only unfamiliar! But it is a necessary concept.)
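To make the 4-tuple concrete: Vn is the set of non-terminal symbols, Vt the set of terminal symbols, R the set of production rules and S the start symbol. The sketch below stores exactly those four things and generates sentences breadth-first by repeatedly replacing the leftmost non-terminal, which is the idea behind "generating sentences from a grammar". Single-character symbols, the names `Grammar` and `generate`, and the length bound are simplifications for illustration; it assumes no empty right-hand sides and no cycles of single-non-terminal rules.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <set>
#include <string>

// A generative grammar as the 4-tuple (Vn, Vt, R, S), one character per symbol.
struct Grammar
{
 std::set<char> Vn;                    // non-terminal symbols
 std::set<char> Vt;                    // terminal symbols
 std::multimap<char, std::string> R;   // rules: lhs -> rhs (several per lhs)
 char S;                               // start symbol
};

// Start from S; repeatedly replace the leftmost non-terminal by every
// right-hand side for it; a form with no non-terminals left is a sentence.
// 'max_len' bounds the sentential forms, so an infinite language still
// yields a finite list.
std::set<std::string> generate(const Grammar & g, std::size_t max_len)
{
 std::set<std::string> sentences;
 std::deque<std::string> queue{ std::string(1, g.S) };
 while (!queue.empty())
 {
  std::string form = queue.front();
  queue.pop_front();
  std::size_t pos = 0;
  while (pos < form.size() && !g.Vn.count(form[pos]))
   ++pos;                              // find the leftmost non-terminal
  if (pos == form.size())
  {
   sentences.insert(form);             // only terminals left: a sentence
   continue;
  }
  auto range = g.R.equal_range(form[pos]);
  for (auto r = range.first; r != range.second; ++r)
  {
   std::string next = form.substr(0, pos) + r->second + form.substr(pos + 1);
   if (next.size() <= max_len)
    queue.push_back(next);
  }
 }
 return sentences;
}
```

For example, with Vn = {S}, Vt = {a, b}, R = {S -> aSb, S -> ab} and a bound of 6, the generated sentences are ab, aabb and aaabbb.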

As I do not know what resources and knowledge you have, I will offer some specific direction based on my own journey down this path. Please use what of it is most applicable to your own case.

First thing - learn, and really understand the underlying concepts and vocabulary of parsing. There is no better source in my opinion than Dick Grune's Parsing Techniques - A Practical Guide. If you can find a printed edition, new or used, it will be money well spent. The link here will allow you to download his first edition in PDF format - do it now if you do not already have it (look down that page to Availability section)!

Read and try to absorb the Preface and Chapter 1, then pause and consider what you have read.

Next, read Chapter 2, carefully and in stages, working the problems at the end of each subsection.
-- Section 2.1 - Languages as infinite sets, only 8 pages, read it several times and think deeply about it.
-- Section 2.2 - Formal Grammars, only 4 pages, but this is the key to all that follows - get it well!
-- Section 2.3 - The Chomsky hierarchy of grammars and languages, 10 pages, get context-free grammars!
-- Section 2.4 - VW Grammars, only 5 pages, you will find BNF notation introduced here, but not completely.
-- Section 2.5 - Actually generating sentences from a grammar, 3 pages, one sentence at a time...
-- Section 2.6 - 2.10 - Putting the concepts all together, parse trees, production graphs, the big ideas

At this point you think you know it! But you don't! But you do have the core concepts laid out so return to them until it all feels natural to you - that day will come!

Then Chapter 3, which is really more about the internal workings of the parser, not the language. You should work through it, but it is more important to really absorb everything that has gone before first. Once you really know the concepts from Chapters 1 and 2, you will see Chapters 3 and 7 as how Bison in particular works internally to accomplish its ends.

If you can really understand the first 3 chapters of this book, about 79 pages total, all else will follow easily! You should follow up with at least Chapter 7, as it is the basis of Bison, but get the first 3 as solidly as possible! I can't stress that enough!

And then supplement that with some additional BNF study online. The ideas are introduced in the book, but it is not further developed. But, with the ideas from the book you will finally be able to see why BNF exists, and how to use it - the variations in its exact syntax will become much less important!

Now, after all of that, which really won't take so long on the first pass, we get you to the meat of the matter...

You are writing parsers, compilers and assemblers for your own language. Parsing, and compiler development is said to be one of the best understood branches of computer science - so let's get you the absolute best resource available for that, what is called the "red tree" book, also by Dick Grune, Modern Compiler Design.

I have both of these in print but am not aware of a PDF version for Modern Compiler Design. I have linked to the Better World Books page and suggest that you check it every few days. The prices vary with the seller and I have seen this one jump between $20 and $80 recently. When you see a price you like, buy it! You will not be sorry!

Now, armed with some very well thought out concepts and methods developed by some very smart people in their own right, you will be ready to work your way through your own project with more efficiency and confidence!

I am out of time and have only responded to your first statement, perhaps more will follow later. Good luck!