Programming: This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.
extern int yyerror(const char *msg){
fprintf(stderr,"%d: %s at '%s'\n",yylineno,msg,yytext);
}
I run the program and when I type:
Quote:
.lm 7
or
this is a test
.lm 7 is a command in my language to set the left margin to 7 spaces.
"this is a test" is a line that should print each WORD of <this is a test> on a separate line.
I am now using flex and bison and I get nothing out.
1) I put yytext in yacc and it is not found until I add extern.
2) Is yytext an int, not a string?
3) $3 is an int when I think it should be a string.
4) How do I access the lex string (which I think is yytext) in yacc?
I will not have a chance to look more closely until this evening, but this does not look right...
Code:
" "+.+\n? {return *yytext;}
Two things:
1. You should return only an integer TOKEN value, not the matched text...
2. This is not a valid statement to return the matched text anyway...
yytext is a char *, so returning its dereferenced value makes no sense. Furthermore, the text pointed to by yytext (the matched text) is not necessarily valid after the return, so if you need the text later you should return a pointer to a copy of it. And YACC/Bison only looks for a semantic value in a YYSTYPE object named yylval (int by default; see the %union directive):
Code:
yylval=strdup(yytext); return TOKEN;
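For completeness, here is a sketch of how the two sides fit together once a %union is in place. The token names and union member names are hypothetical, not taken from the posted grammar:

```
/* Bison file (sketch): give yylval a string member */
%union {
    int   intValue;
    char *stringValue;
}
%token <stringValue> WORD
%token <intValue>    NUM

/* Flex file (sketch): copy the text before returning, since
   yytext is reused on the next match */
[A-Za-z]+   { yylval.stringValue = strdup(yytext); return WORD; }
[0-9]+      { yylval.intValue = atoi(yytext); return NUM; }
```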
That is just what I see on a quick look so there may be other difficulties as well.
It would be helpful if you would also provide the commands you are actually using to build with so we can reproduce what you are seeing. I'll have a closer look later today.
**UPDATE**
Sorry I have not had opportunity to return to this but had another quick look and offer the following...
Code:
command : PA
...
| LM BLANKS NUM '\n' {lm=$2;fprintf(stdout,"here:%d",lm);}
| RM BLANKS NUM '\n' {rm=$2;}
Here and in other places you reference semantic values ($2) which you have not passed from the lexer - this needs to be fixed.
Code:
words : WORD {printf ("%s\n",yytext);}
| words BLANKS WORD {printf ("%s\n",yytext);}
Per my previous notes, yytext is not valid after the return from yylex() and should not be used.
The token WORD is not returned by any lexer rule so these grammar rules will never be used.
The definition of yyerror() does not look correct; for one thing, it returns no value despite being declared to return int.
A suggestion: write a stand-alone lexer with a main(){...} function that will show you what it is doing, so you can first get those parts right independently of the parser, then work on the grammar. Perhaps something like:
Code:
int main(){
    int tok;
    while((tok = yylex())){
        switch(tok){
            /* ... suitable messages here ... */
        }
    }
    return 0;
}
You will also need to define an enum for the tokens as they are normally in the header created by bison and will not be available to the standalone lexer (hint: just copy it from the existing bison generated source, very easy). Then generate the lexer with the -d option to build with debug trace - very useful!
I updated the lex and yacc files in my previous note with the proper and current text. I also added comments to that note to explain what I am doing and what I am confused about. Do I need %union? In C, a union to me means various variables occupying the same storage space.
I found out that the type of $n differs depending on where it is in the parsing stack: sometimes an int, sometimes a string, depending (I guess) on what yylval was assigned and where we are on the stack.
---I do not know if the above is true. I do not understand %union, which seems to name types (intValue, stringValue) rather than assign common storage as in C.
schmitta@schmitta-ThinkPad-T500:~/Dropbox/PRODUCTS/APS_PRODUCTS/OUTLINE/OUTLINELY$ ./makely.sh test01
lex.yy.c:662:12: warning: prototype for ‘yywrap’ follows non-prototype definition
662 | extern int yywrap ( void );
| ^~~~~~
y.tab.c: In function ‘yyparse’:
y.tab.c:1306:16: warning: implicit declaration of function ‘yylex’ [-Wimplicit-function-declaration]
1306 | yychar = yylex ();
| ^~~~~
/usr/bin/ld: y.tab.o: in function `main':
y.tab.c:(.text+0x9dd): multiple definition of `main'; /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libl.a(libmain.o):(.text.startup+0x0): first defined here
/usr/bin/ld: lex.yy.o: in function `yywrap':
lex.yy.c:(.text+0x0): multiple definition of `yywrap'; y.tab.o:y.tab.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status
chmod: cannot access 'test01': No such file or directory
This is what I have now. It compiles but does not recognize .lm 8 (set left margin to 8); it gets trapped at .l 33, which has a dot, and it does not get trapped at .l line 20 where it is supposed to get trapped.
Good to see you using %union to set up your semantic value types - always a good idea.
You may also want to declare the types for those tokens and non-terminals which have values and avoid all those bracketed type references. For example, define the types for NUM and WORD in the %token declaration like this...
Code:
%token <intValue> NUM
%token <stringValue> WORD
...then when you reference them in action code...
Code:
words : WORD {printf ("%s\n",$<stringValue>1);}
...may be simply...
words : WORD {printf ("%s\n",$1);}
| RM BLANKS NUM '\n' {rm=$<intValue>3;}
...may be simply...
| RM BLANKS NUM '\n' {rm=$3;}
For any non-terminals which have a type use the %type declaration to set them up.
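For instance, assuming a non-terminal that carries a string value (the names here are hypothetical):

```
%union {
    int   intValue;
    char *stringValue;
}
%token <stringValue> WORD
%type  <stringValue> words   /* declares the value type of the non-terminal */
```

With that in place, $$ and $n references to words need no bracketed type annotations either.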
Quote:
Originally Posted by schmitta
...it compiles but does not recognize .lm 8 (set left margin to 8) but gets trapped at .l 33 which has a . and it does not get trapped at .l line 20 where it is supposed to get trapped.
Your regular expressions are responsible for that. When more than one pattern matches, Flex selects the longest match, or the one which occurs first in the specification if they are the same length. So this rule at line 33...
... is going to match just about anything that you send as input. That rule says "match literally anything from the start of a non-empty line up to and including a newline", which is going to override any of your dot-rules whether or not they are followed by a number because those characters plus the newline will be a longer match.
In fact, there are probably other problems with those regular expressions so you need to look carefully at them and be sure you know what they are actually going to match. You are building with Flex debug trace enabled so just look at what that is telling you for each case of test input. Additionally, as mentioned in a previous post, you may want to set up a stand-alone lexer so that you can test those rules independent of the parser and have certainty about what they are producing.
One final comment - I see that you changed the declaration of yyerror() in the Bison file, but it does not match what is in the Flex file. It will probably be more convenient if you move the definition into the Bison file as well (unless you have some reason for including it in the Flex spec).
I need a way of accepting any input for a word, so I used
Code:
^.+\n?
at the end of all the other rules, but this does not capture numbers, as they are captured by a rule several lines earlier.
It will still override the number match rule too because it matches any number of characters followed by the newline. That rule is going to be very problematic for you and you should probably rethink it.
**UPDATED**
To be very clear, that rule (quoted above) will match all of these lines as they appear in input, instead of their intended rules...
Code:
.l Some text
.t Some text
.lm 8
.rm 20
.pnon
.pnoff
1234
Anything else you put here...
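One way out, while keeping a single lexer state, is to anchor each command pattern and make the catch-all explicitly exclude lines that start with a dot. This is only a sketch under assumed token names (LM, RM, PNON, TEXTLINE and the stringValue member are placeholders, not taken from the posted grammar):

```
^\.lm[ ]+[0-9]+   { /* left margin command  */ return LM;  }
^\.rm[ ]+[0-9]+   { /* right margin command */ return RM;  }
^\.pnon           { return PNON;  }
^\.pnoff          { return PNOFF; }
^[^.\n].*         { /* any line NOT starting with a dot */
                    yylval.stringValue = strdup(yytext); return TEXTLINE; }
\n                { return '\n'; }
```

Because the text rule can no longer match a line beginning with a dot, the command rules win even though Flex prefers the longest match.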
Thank you ASTROGEEK for your gracious help. I am basically coming up from knowing nothing, using the internet to educate myself in this.
You are very welcome!
I know why I am here too, best expressed by someone named schmitta:
Quote:
Originally Posted by schmitta
I do this stuff, as all of us do this stuff, because it is fun.
Agreed! And I have found the whole progression of ideas behind parsing and compilers (and there are a lot of them!) more interesting, and more fun than many other problems encountered in computing! It is a treat to encounter others exploring them too!
To astrogeek - 400 years before the birth of Christ, Isaiah tells of his coming in chapter 53. He tells of his virgin birth; that he would be wounded for our transgressions and that by his stripes we would be healed. He tells of Christ's death and resurrection and that belief in him is the only way to heaven. Only stupid people go to hell, because hell is a lake of liquid fire that an angel of God throws the sinner into. When he hits he screams bloody murder and the pain is unreal and forever. I don't want to see anyone go there. Heaven is where you get what you were always looking for, even if you did not know that on earth. Please confess with your mouth to someone that Jesus is God and believe in your heart that God the Father raised Jesus the Son from the dead, and you shall be saved from an eternity of misery. Thank you - I just felt the need to share that.
Thank you for sharing that which is most important to you!
LQ, like the Free software movement and the culture we share, was founded on the idea of sharing, helping others, doing to others as we would want for ourselves. However we may express it, that is really why we are here, isn't it!
Your comments are received and appreciated in the spirit in which they were given, thank you for sharing them!
But this is a technical forum, so let's continue our explorations of the topic at hand in that same spirit of sharing, for the benefit of us all.
"My" lexer is still just your lexer code. All I did was copy in the union and token-type enum, and add a simple main(){} function to call yylex just as the Bison code would do. The main point of doing that is to separate the lexer from the parser so that you can interact with the lexer independently of the parser (and grammar) and gain a better degree of certainty about how your lexer rules actually work. I have found it to be a useful exercise for most of my own projects, and it is easy to do.
The only "specification" I have for your project is what you have described in your posts and what I imagine you intend from looking at your rules - which is incomplete at best.
I would suggest that you write a simple description in plain English of how each part of the input is supposed to be handled. For example, try to write a concise one line description of each of the dot-rules, specifying how it must appear in the input stream (i.e., at start of the document or embedded, must be at start of line or may be inline with other text, followed by number or not, one per line, etc.), and what effect it has on the output stream.
Then do the same for the text you are trying to process. Should whitespace be preserved? Does it recognize paragraph breaks? Page breaks? Is all text just words and whitespace, or do you need to recognize any special keywords or symbols? Etc...
Try to then put together a simplest test case, or a few of them, which you can then feed into your standalone lexer or combined parser application to work out the necessary rules... it is really only at this point that you are in position to work those descriptions and examples into a proper grammar.
Because you are trying to process the input text as blocks of text, as opposed to a small set of keywords or other symbols, you will probably find it helpful to make use of different start states in the lexer. That will allow you to separate out the control commands from random text without getting things crossed up as your current rules are doing. If you are not familiar with Flex start states I'll be happy to suggest an example based on any test case you care to post.
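For what it's worth, here is a minimal sketch of the start-state idea. The exclusive state name, token names, and union members are all hypothetical, untested against the posted grammar:

```
%x CMD
%%
^\.[a-z]+     { BEGIN(CMD); yylval.stringValue = strdup(yytext); return CMDNAME; }
<CMD>[ ]+     { return BLANKS; }
<CMD>[0-9]+   { yylval.intValue = atoi(yytext); return NUM; }
<CMD>\n       { BEGIN(INITIAL); return '\n'; }
^[^.\n].*     { yylval.stringValue = strdup(yytext); return TEXTLINE; }
\n            { return '\n'; }
```

While in the CMD state only the <CMD> rules apply, so the plain-text rules can stay simple and the two kinds of input never compete for the same match.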
Also, as you are new to this, I suggest you try to find a copy of Parsing Techniques: A Practical Guide by Dick Grune and Ceriel J. H. Jacobs. For several years Grune offered free download of an earlier edition online, although that was gone last time I looked. You can probably find a used copy of the original edition online for $5-$10 and it will repay you many times the cost! If you can clearly understand the ideas presented in just the first three chapters your world will be changed - at least with regard to the basic ideas of parsing!