LinuxQuestions.org
Latest LQ Deal: Latest LQ Deals
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Blogs > astrogeek
User Name
Password

Notices


Rate this Entry

Alphanumeric Calculator: GNU Bison Specification

Posted 10-01-2021 at 07:34 PM by astrogeek
Updated 10-07-2021 at 01:34 PM by astrogeek (Rev. 3 source / typo)

This post is Part 4 of the Alphanumeric Calculator program group.

Note: This is Rev. 3 Bison version and requires Rev. 3 Flex version.

Copy the code below into a file named alphacalc.y in your local build directory for this project, the same directory where Makefile and alphacalc.l are located.

I wrote this version using GNU Bison 3.7.6 and gcc 10.3 on a Slackware -current platform, but it should work using any not-too-old Bison with one small change for Bison < 3.5.90 related to use of the special token YYerror noted in the comments. I have successfully built with Bison 3.0.4 and gcc 5.5.0 and the noted change.

A brief description of the file contents follows, but if you are not familiar with GNU Bison, context-free grammars, syntactic or semantic analysis in general, or any of the many related subjects, I encourage questions and further discussion, here or in the LQ Programming forum! It is a topic rich in opportunities for discovery!

The file structure is very similar to that of the Flex specification, consisting of three sections separated by lines which contain the characters '%%' and nothing else.

The first section is the definitions section which may contain literal code blocks, C declarations, %union, %token, %type and other definitions. See info bison for all the details!

The second section is the rules section which consists of rules of a context-free grammar in a form of BNF notation from which Bison generates C source for a parser, an engine which can recognize valid, and invalid sentences of the language described by the grammar in a stream of TOKENs obtained from the lexical analyzer or lexer, a process called syntactic analysis. Each rule may also be followed by C code within curly braces, {...}, called action code, which is executed when the rule forms part of a valid sentence from the language, the basis of semantic analysis.

The third section is the user subroutines section and consists of C code which is copied verbatim into the resulting alphacalc.tab.c source file. This project is small enough that I have included the complete application code within this section of the Bison specification file.

Code:
/* Bison specification and source for alphanumeric calculator
   Rev. 3 - 10/6/21 Added SGN, numbex, yyerrok in recovery rule, headings, multi-line expressions
   (Works with Rev. 3 Flex specification)

   2021 Robert Allen, astrogeek:linuxquestions.org
   License: Love others as yourself, do unto others as you would have them do to you,
   WITHOUT REGARD TO ANY OTHER standard, law, order, license, command or authority.
   Period.

   Build with supplied Makefile
      make DEBUG=1 - Builds with runtime debug trace options
      make         - Builds without debug options

   run with -h option for usage information.

   'info bison' is the source manual.
*/
%code {

#include <stdlib.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <limits.h>

/* Always define YYDEBUG, Makefile sets -DYYDEBUG=1 when invoked with DEBUG=1 option */
#ifndef YYDEBUG
#define YYDEBUG 0
#endif

/* Range and limits of application
   Limiting factors are:
      * LLONG_MAX
      * Lexer pattern rules and multiplier TOKENs
      * Output translation names and scale (determined by EXP_VALUE_MAX and mnames[] below)
      * Grammar
   EXP_VALUE_MAX _must be_ 10^3 times max named multiplier in mnames[] and multiple of 10^3
   mnames[] is array of string pointers to supported multiplier names, greatest to least
   You may extend range to a limit of LLONG_MAX:
      * Multiply EXP_VALUE_MAX in increments of 10^3
      * Add multiplier names to mnames[]
      * Add M* TOKENs below, add pattern for names to Flex specification (where * is power of ten, M3, M6, M9, etc.)
      * Add/adjust grammar rules below with appropriate lm* and rm* non-terminals, etc.
      * Hint: Scale existing range down by 10^3 first, see how it all works - then scale up with new rules...
   */
#define EXP_VALUE_MAX 1000000000000LL
const char *mnames[]={"billion","million","thousand","oops!"};

/* From info bison
The Bison parser expects to report the error by calling an error
reporting function named ‘yyerror’, which you must supply.  It is called
by ‘yyparse’ whenever a syntax error is found, and it receives one
argument.  For a syntax error, the string is normally ‘"syntax error"’.

The forward declarations for ‘yylex’ and ‘yyerror’ are needed because
the C language requires that functions be declared before they are used.
These functions will be defined in the epilogue, but the parser calls
them so they must be declared in the prologue.
*/
extern int yylex();
extern int yy_flex_debug;
void yyerror(char *s, ...);
int yydebug;


/* Declarations needed by grammar/semantic actions */
int num2str (long long num, int nl);
void dohundreds (int num, int trail, int tail);
void hlpsyntax();
void usage();

int outformat=0;
int andall=0;
int trailall=0;
int interact=0;
int head=0;
char *filename;
char *prgnam;
}
   /* info bison: 5.8.1 LR Table Construction */
%define lr.type ielr
%locations

%union {
   int i;
   long long l;
}

%code requires {
/* Following adapted from info bison:3.5.3 Default Action for Locations, and
   Flex & Bison, O'Reilly 2009, John Levine, pg 202
   Adds filename support to locations struct
*/

typedef struct YYLTYPE {
  int first_line;
  int first_column;
  int last_line;
  int last_column;
  char *filename;
} YYLTYPE;
# define YYLTYPE_IS_DECLARED 1

# define YYLLOC_DEFAULT(Current, Rhs, N)                                \
    do                                                                  \
      if (N)                                                            \
        {                                                               \
          (Current).first_line   = YYRHSLOC (Rhs, 1).first_line;        \
          (Current).first_column = YYRHSLOC (Rhs, 1).first_column;      \
          (Current).last_line    = YYRHSLOC (Rhs, N).last_line;         \
          (Current).last_column  = YYRHSLOC (Rhs, N).last_column;       \
          (Current).filename     = YYRHSLOC (Rhs, 1).filename;          \
        }                                                               \
      else                                                              \
        { /* empty RHS */                                               \
          (Current).first_line   = (Current).last_line   =              \
            YYRHSLOC (Rhs, 0).last_line;                                \
          (Current).first_column = (Current).last_column =              \
            YYRHSLOC (Rhs, 0).last_column;                              \
          (Current).filename  = NULL;                                   \
        }                                                               \
    while (0)

/* Semantic action error target with addressable locations, tags, must be in header for lexer */
void lyyerror(YYLTYPE t, const char *w, char *s, ...);
#define LYY_ERROR_TAG "error"
#define LYY_WARN_TAG "warning"
#define LYY_FAIL_TAG "failure"
#define LYY_RECOVER_TAG "recovery"

}

/* TOKEN, type and precedence definitions */
%token <l> M2 M3 M6 M9 NUMB ZERO ONES TEN TEENS TENS
%token <i> FMT HLP SGN DBG
%token AND COM
%token QUIT END NUMERR

/* We use the special token YYerror to allow the lexer to trigger parser error recovery for undefined
input. This token was introduced to bison April 26 2020 about version 3.5.90 (uncertain version number)
If compiling with older version you may see an error like this:

   alphacalc.l:134:178: error 'YYerror' undeclared (first use in this function)

If so simply uncomment the following token and it should work as expected except that you will see
a second 'syntax error' message displayed after the lexer generated 'Undefined input' error.
Parser will enter error recovery mode and resync normally.

%token YYerror
*/

%left '+' '-'
%left '*' '/'

%type <l> d1 d2 l2 lm2 lm3 lm6 lm9 rm2 rm3 rm6 numbex number exp term factor

/* Debug trace type formatting */
%printer { fprintf (yyo, "%lld", $$); } <l>
%printer { fprintf (yyo, "%d", $$); } <>

%start stmt

%%

stmt:
   | stmt result { if( interact ){ putchar('>'); } }
   ;

result : exp END { if(outformat==0){ num2str($1,1); }
                  else if(outformat==1){ printf("%d\n", $1); }
                  else if(num2str($1,0)){ printf(" (%lld)\n", $1); }
                  trailall=0;
                  }
 | DBG END { if(YYDEBUG){ if($1){ yydebug = !yydebug; }else{ yy_flex_debug = !yy_flex_debug; }}
            else{lyyerror(@1, LYY_WARN_TAG, "Not built with debug options! Build with DEBUG=1 to enable!");} }
 | QUIT END { YYACCEPT; }
 | HLP END { if($1){ usage(); }else{ hlpsyntax(); }}
 | FMT END { outformat = $1; }
 | AND END { andall=!andall; }
 | COM END { head=!head; }
 | END
 | error END { yyerrok; }
 ;
   /* All numeric values enter as valid long long: +/-9,223,372,036,854,775,807 (~9.22e18)
         with the lexer generating an error for any overflow values entered
      All english descriptions of numeric values become long long's but limited by grammar
      All expression resultant values ultimately limited by the range of the output translation
         function which has a maximum limit of EXP_VALUE_MAX - 1, "billion" being the
         maximum named power of ten in this implementation.
      We first limit all long long values to this max value, +/- EXP_VALUE_MAX - 1 at the point
         they pass from the input parse rules into the expression calculator rules: factor non-terminal,
         and enforce limit for all operation results.
      Only multiplication can overflow, addition and subtraction can only produce out of range results
      (You may extend this to allow out of range intermediate results by adding overflow checks to addition/subtraction)
   */
exp : term
 | exp '+' term { $$ = $1+$3; if(llabs($$)>=EXP_VALUE_MAX){ lyyerror(@2, LYY_ERROR_TAG, "Sum out of range!"); YYERROR; } }
 | exp '-' term { $$ = $1-$3; if(llabs($$)>=EXP_VALUE_MAX){ lyyerror(@2, LYY_ERROR_TAG, "Difference out of range!"); YYERROR; } }
 ;

term : factor
 | term '*' factor { if(($1!=0) && (LLONG_MAX / llabs($1))<llabs($3)){lyyerror(@2, LYY_ERROR_TAG, "Multiplication overflow!"); YYERROR; }
                     else{$$ = $1*$3; if(llabs($$)>=EXP_VALUE_MAX){lyyerror(@2, LYY_ERROR_TAG, "Product out of range!"); YYERROR; }} }
 | term '/' factor { if($3==0){ lyyerror(@3, LYY_ERROR_TAG, "Division by zero!"); YYERROR; }
                     else{$$ = $1/$3;} }
 ;

factor: number { if(llabs($1)>=EXP_VALUE_MAX){lyyerror(@1, LYY_ERROR_TAG, "Value out of range!"); YYERROR;  }else{$$ = $1;} }
 | '(' exp ')' { $$ = $2; }
 ;

   /* Restricts numex to single sign same behavior as strtoll() for symbolic numbers
      and prohibits mixing of word/symbol signs and values
   */

number: NUMB
      | NUMERR { lyyerror(@1, LYY_ERROR_TAG, "Invalid numeric input!"); YYERROR;  }
      | numbex
      | SGN numbex { $$ = $1<0 ? -1*$2 : $2; }
      ;

   /* numbex defines possible left-right orderings by expression magnitude */
numbex: ZERO
 | d2
 | lm2
 | lm3
 | lm6
 | lm9
 ;

   /* Left-most billions */
lm9: d2 M9 { $$ = $1*$2; }
   | d2 M9 rm6 { $$ = ($1*$2)+$3; }
   | lm2 M9 { $$ = $1*$2; }
   | lm2 M9 rm6 { $$ = ($1*$2)+$3; }
   ;

   /* Left-most millions */
lm6: d2 M6 { $$ = $1*$2; }
   | d2 M6 rm3 { $$ = ($1*$2)+$3; }
   | lm2 M6 { $$ = $1*$2; }
   | lm2 M6 rm3 { $$ = ($1*$2)+$3; }
   ;
   /* Right millions */
rm6: rm2 M6 { $$ = $1*$2; }
   | rm2 M6 rm3 { $$ = ($1*$2)+$3; }
   | rm3
   ;

   /* Left-most thousands */
lm3: d2 M3 { $$ = $1*$2; }
   | d2 M3 rm2 { $$ = ($1*$2)+$3; }
   | lm2 M3 { $$ = $1*$2; }
   | lm2 M3 rm2 { $$ = ($1*$2)+$3; }
   ;
   /* Right thousands */
rm3: rm2 M3 { $$ = $1*$2; }
   | rm2 M3 rm2 { $$ = ($1*$2)+$3; }
   | rm2
   ;

   /* Left-most hundreds */
lm2: l2 M2 { $$ = $1*$2; }
   | l2 M2 d2 { $$ = ($1*$2)+$3; }
   | l2 M2 optand d2 { $$ = ($1*$2)+$4; }
   ;
   /* Right hundreds */
rm2: d1 M2 { $$ = $1*$2; }
   | d1 M2 d2 { $$ = ($1*$2)+$3; }
   | d1 M2 optand d2 { $$ = ($1*$2)+$4; }
   | optand d2 { $$=$2; }
   | d2
   ;

optand: AND { trailall=1; }

   /* Any */
d2 : l2
 | TEN
 | TENS
 ;

 /* Restricted left-most hundred multipliers  */
l2 : d1
 | TEENS
 | TENS d1 { $$ = $1+$2; }
 ;

d1: ONES

%%

/* yyerror() is req'd and is called for all parser errors,
   may also be called by semantic actions, uses default location,
   this version allows variable argument list for semantic action calls
   */
void yyerror(char *s, ...)
{
   va_list ap;
   va_start(ap, s);

   if(yylloc.first_line)
      fprintf(stderr, "%s:%d.%d-%d.%d: error: ", yylloc.filename, yylloc.first_line, yylloc.first_column,
            yylloc.last_line, yylloc.last_column);
   vfprintf(stderr, s, ap);
   va_end(ap);
   fprintf(stderr, "\n");
}

/* lyyerror() provides for messages generated from semantic actions
   with location addressable via @N symbol references.
   Adapted from Flex & Bison, O'Reilly 2009, John Levine, pg 200 */
void lyyerror(YYLTYPE t, const char *w, char *s, ...)
{
   va_list ap;
   va_start(ap, s);

   if(t.first_line)
      fprintf(stderr, "%s:%d.%d-%d.%d: %s: ", t.filename, t.first_line, t.first_column,
            t.last_line, t.last_column, w);
   vfprintf(stderr, s, ap);
   va_end(ap);
   fprintf(stderr, "\n");
}

void usage()
{
#if YYDEBUG == 1
   printf("Usage: %s [-a -c -d -h -t] [filename]\n"
   "\tIf no filename given reads input from stdin\n", prgnam);
   printf("\t-a\tEnable trailing AND output syntax (also enabled by use in input stream)\n"
   "\t-c\tEnable ^#...$ comments as headings in output stream\n"
   "\t-d\tEnable parser debug trace\n"
   "\t-t\tEnable lexer debug trace\n"
   "\t-h\tPrint this help message and exit\n"
   "\tRUNTIME OPTIONS (must be only text on line)\n"
   "\tdebug\tToggle parser debug trace state\n"
   "\ttrace\tToggle lexer debug trace state\n"
   "\talpha\tProduce alpha output\n"
   "\tand\tToggle global trailing AND state\n"
   "\tboth\tProduce alpha and numeric output\n"
   "\tnumeric\tProduce numeric output\n"
   "\thead | heading\tToggle headings ( ^#...$ comments to output stream)\n"
   "\thelp | usage\tShow additional runtime and syntax help messages\n"
   "\tquit\tExit program\n");
#else
   printf("Usage: %s [-a -c -h] [filename]\n\tIf no filename given reads input from stdin\n", prgnam);
   printf("\t-a\tEnable trailing AND output syntax (also enabled by use in input stream)\n"
   "\t-c\tEnable ^#...$ comments as headings in output stream\n"
   "\t-h\tPrint this help message and exit\n"
   "\tRUNTIME OPTIONS (must be only text on line)\n"
   "\talpha\tProduce alpha output\n"
   "\tand\tToggle global trailing AND state\n"
   "\tboth\tProduce alpha and numeric output\n"
   "\tnumeric\tProduce numeric output\n"
   "\thead | heading\tToggle headings ( ^#...$ comments to output stream)\n"
   "\thelp | usage\tShow additional runtime and syntax help messages\n"
   "\tquit\tExit program\n"
   "\t(To enable debug support build with DEBUG=1)\n");
#endif
}

void hlpsyntax()
{
   printf("Input Expression Syntax\n"
   "\tInteger expressions in the range +/-%lld may be entered as words or numbers\n", EXP_VALUE_MAX -1);
   printf(
   "\tNumber names: positive, negative, zero...nineteen, twenty...ninety, hundred, thousand, million, billion\n"
   "\t\t\tmay be combined into any valid numeric expression as commonly spoken\n"
   "\t\tInteger values including sign must be atomic: expressed as all english words or as number,\n"
   "\t\t\tmay be otherwise mixed in expressions\n");
   printf("\t\tEx: one hundred [and] eleven thousand three hundred [and] twenty two (111322)\n"
   "\t\tEx: three million [and] three over three (1000001)\n"
   "\t\tEx: one hundred seven plus 3, or 107 + three, but not one hundred 7 plus 3\n");
   printf("\tArithmetic Operations and Sign\n\t\tplus | + addition ('+' with no trailing space is sign)\n"
   "\t\tminus | - subtraction ('-' with no trailing space is sign)\n"
   "\t\ttimes | * multiplication\n"
   "\t\tdivided by | over | / division\n"
   "\t\t( expression ) nested parentheses follow normal precedence rules\n");
   printf("\tRuntime Options\n"
   "\t\talpha | numeric | both : Set output mode\n"
   "\t\tand : Toggles default trailing 'and' in alpha output (always follows input)\n");
   printf("\t\thead | heading\tToggle heading state (makes ^#...$ comments visible)\n");
#if YYDEBUG == 1
   printf("\t\tdebug : Toggle parser debug trace mode\n\t\ttrace : Toggle lexer debug trace mode\n");
#endif
   printf("\t\thelp : Show this help message\n"
   "\t\tusage : Show invocation help message (same as -h)\n"
   "\t\tquit : Exit program\n");
}

int main(int argc, char *argv[])
{
   /* Accepts single filename as arg or reads from stdin
      If args are given, first non-'-' arg is filename, others ignored
      -h : Print brief help message and exit
      If built with DEBUG=1 enable parser debug output with -d,
      enable lexer trace output with -t, both with -d -t
      */
   prgnam=argv[0];
   yy_flex_debug = 0;
   extern FILE *yyin;
   for(int i=1; i<argc; i++)
   {
      if(argv[i][0] != '-')
      {
         if(filename)
         {
            fprintf(stderr, "warning: ignoring trailing characters from: %s\n", argv[i]);
            break;
         }
         else
            filename = strdup(argv[i]);
      }
      else{
         if(!strcmp(argv[i], "-h"))
         {
            usage();
            return EXIT_SUCCESS;
         }
         else if(!strcmp(argv[i], "-a"))
            andall=1;
         else if(!strcmp(argv[i], "-c"))
            head=1;
#if YYDEBUG == 1
         else if(!strcmp(argv[i], "-d"))
            yydebug=1;
         else if(!strcmp(argv[i], "-t"))
            yy_flex_debug=1;
#endif
         else
         {
            fprintf(stderr, "error: unknown option: %s\n", argv[i]);
            usage();
            return EXIT_FAILURE;
         }
      }
   }

   if(filename)
   {
      if((yyin = fopen(filename, "r")) == NULL)
      {
         perror(filename);
         return EXIT_FAILURE;
      }
      if(yyparse())
         return EXIT_FAILURE;
      fclose(yyin);
   }
   else
   {
      interact=1;
      filename = "(stdin)";
      printf(">Type help for runtime reference info\n>");
      if(yyparse())
         return EXIT_FAILURE;
      printf("Bye!\n");
   }
   return EXIT_SUCCESS;
}

int num2str(long long num, int nl){
   long long maxnum=EXP_VALUE_MAX/1000;
   int indx=0;
   int trail=0;
   /* We have to limit what we accept! EXP_VALUE_MAX-1 will allow +-999*maxnum
      while blocking next third power of ten */
   if(llabs(num)>=EXP_VALUE_MAX){ fprintf(stderr, "Result out of range\n"); return 0;}
   if(num==0){ printf("zero%s", nl ? "\n" : "" ); return 1; }
   if(num<0){
      num*=-1;
      printf("negative ");
   }
   while(maxnum>=1e3){
      if(num>=maxnum){
         //dohundreds((num-(num%maxnum))/maxnum, trail++, 1); //Original behavior, less restrictive
         dohundreds((num-(num%maxnum))/maxnum, trail++, 0); //Rev. 1 behavior more restrictive
         printf("%s%s", trail ? " " : "" , mnames[indx] );
         num=(num%maxnum);
      }
      maxnum/=1e3;
      indx++;
   }
   if(num){
      dohundreds((int)num, trail, 1);
   }
   if(nl)
      putchar('\n');
   return 1;
}

void dohundreds(int num, int trail, int tail){
   static const char *ones[] = {"","one","two","three","four","five","six","seven","eight","nine","ten",
      "eleven","twelve","thirteen","fourteen","fifteen","sixteen","seventeen","eighteen","nineteen"};
   static const char *tens[] = {"","","twenty","thirty","forty","fifty","sixty","seventy","eighty","ninety"};
   int d2 = (num - (num%100))/100;
   int d1 = ((num%100) - (num%10))/10;
   int d0 = num%10;
   if(d2>0){
      printf("%s%s hundred", trail++ ? " " : "" , ones[d2]);
      tail++;
   }
   /* non-empty tail && something to the left && (tail or hund) && enabled */
   if((d1|d0) && trail && tail && (andall|trailall)){
      printf(" and");
   }
   if(d1>=2){
      printf("%s%s", trail++ ? " " : "" , tens[d1]);
   }
   else if(d1==1){
      printf("%s%s", trail++ ? " " : "" , ones[10+d0]);
   }
   if(d1!=1 && d0>0){
      printf("%s%s", trail ? " " : "" , ones[d0]);
   }
}
« Prev     Main     Next »
Total Comments 0

Comments

 

  



All times are GMT -5. The time now is 02:19 AM.

Main Menu
Advertisement
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration