LinuxQuestions.org
Old 06-28-2012, 05:28 AM   #1
heth
LQ Newbie
 
Registered: Jun 2012
Posts: 3

Rep: Reputation: Disabled
How to handle extremely long string in flex and bison


Hi,

I am writing a parser using flex and bison. Some input files contain extremely long tokens, which cause a segmentation fault while parsing.

I have set YYLMAX to a very large number and allocated the string buffer with that same size. I suspect the value now exceeds the lexer's buffer limit, so it does not really solve the problem.

Any ideas on how to handle extremely long tokens in flex and bison?
 
Old 06-28-2012, 05:44 AM   #2
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
Hi.

Try including the `%pointer' directive in the first (definitions) section of your flex input. In that case yytext grows automatically.

From `info flex':
Quote:

Note that `yytext' can be defined in two different ways: either as a
character _pointer_ or as a character _array_. You can control which
definition `flex' uses by including one of the special directives
`%pointer' or `%array' in the first (definitions) section of your flex
input. The default is `%pointer', unless you use the `-l' lex
compatibility option, in which case `yytext' will be an array. The
advantage of using `%pointer' is substantially faster scanning and no
buffer overflow when matching very large tokens (unless you run out of
dynamic memory). The disadvantage is that you are restricted in how
your actions can modify `yytext' (*note Actions::), and calls to the
`unput()' function destroys the present contents of `yytext', which can
be a considerable porting headache when moving between different `lex'
versions.

The advantage of `%array' is that you can then modify `yytext' to
your heart's content, and calls to `unput()' do not destroy `yytext'
(*note Actions::). Furthermore, existing `lex' programs sometimes
access `yytext' externally using declarations of the form:

extern char yytext[];

This definition is erroneous when used with `%pointer', but correct
for `%array'.
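
To illustrate why the two declarations are not interchangeable, here is a small plain C sketch. It is not flex output, and the names `text_array` and `text_pointer` are made up for the example. An array object is the character storage itself, while a pointer object only holds the address of storage that lives somewhere else, so code compiled against the wrong `extern` declaration would look for the token in the wrong place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two ways yytext can be defined. */
static char text_array[16] = "token";     /* like %array: the storage itself */
static char *text_pointer = text_array;   /* like %pointer: only an address  */

/* An array object starts at the same address as its first character. */
int array_object_is_the_text(void)
{
    return (void *)&text_array == (void *)&text_array[0];
}

/* A pointer object lives somewhere other than the characters it points to. */
int pointer_object_is_the_text(void)
{
    return (void *)&text_pointer == (void *)text_pointer;
}

/* sizeof sees the whole buffer for an array, but only the address for a pointer. */
size_t array_size(void)   { return sizeof text_array; }
size_t pointer_size(void) { return sizeof text_pointer; }

/* Sanity check that ties the helpers above together. */
void check_declarations(void)
{
    assert(array_object_is_the_text());
    assert(!pointer_object_is_the_text());
    assert(array_size() == 16);
}
```

Calling `check_declarations()` from a test program passes silently. The point is only that `extern char yytext[];` and `extern char *yytext;` describe two different kinds of object, so the declaration has to match the mode the scanner was generated in.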
 
Old 06-28-2012, 11:04 AM   #3
heth
LQ Newbie
 
Registered: Jun 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
Hi firstfire,

Thanks for the suggestion. Do you have an example of how %pointer is used?
 
Old 06-28-2012, 01:13 PM   #4
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
Hi.

I just realized that %pointer is the default behavior in flex. So either you are running out of memory, you are using lex instead of flex, or there is a bug in your code. Can you provide more info? The example is trivial:
Code:
%pointer
%%
[[:alnum:]]+	printf("[%s] : %zu bytes\n", yytext, strlen(yytext));
%%
Even with %array, a token longer than YYLMAX produces the following error message rather than a segmentation fault:
Quote:
token too large, exceeds YYLMAX
So the crash more likely comes from a memory error elsewhere in the code, such as a buffer overrun. Try `valgrind ./a.out' to check. GDB is another useful tool in this situation.
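
Whether you get an error message or a crash comes down to whether the copy is checked. Here is a rough sketch in plain C, assuming a made-up helper `copy_token` and a made-up limit `TOKEN_MAX` (neither is part of flex). The checked copy refuses an oversized token, whereas a bare `strcpy` into the same buffer would write past its end.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOKEN_MAX 16   /* hypothetical limit, playing the role of YYLMAX */

/* Copy `len` bytes of `text` into `dst` (capacity TOKEN_MAX) only if the
   token plus its terminating NUL fits. Returns 0 on success, -1 otherwise. */
int copy_token(char dst[TOKEN_MAX], const char *text, size_t len)
{
    assert(dst != NULL && text != NULL);
    if (len >= TOKEN_MAX) {
        fprintf(stderr, "token too large, exceeds TOKEN_MAX\n");
        return -1;               /* dst is left untouched */
    }
    memcpy(dst, text, len);
    dst[len] = '\0';
    return 0;
}
```

In a flex action the call might look like `copy_token(yylval.string, yytext, yyleng)`, with the limit set to the real size of the destination array. That is an assumption about how the parser stores its values, not something flex does for you.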
 
Old 07-02-2012, 12:02 PM   #5
heth
LQ Newbie
 
Registered: Jun 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
Hi firstfire,

This is the relevant part of my flex file:

%{
#undef YYLMAX
#define YYLMAX 40000
%}



id { strcpy(yylval.string, (char*)yytext); return (ID); }
call { strcpy(yylval.string, (char*)yytext); return (CALL); }

digit [0-9]
id [a-zA-Z0-9_\/\-=><.\"]*
%%

call { strcpy(yylval.string, (char*)yytext); return (CALL); }
{digit}+ {
//yylval.integer = atoi((char*)yytext);

strcpy(yylval.string, (char*)yytext);
return(NUMBERS);
}
{id} { strcpy(yylval.string, (char*)yytext); return(ID); }
%%



In my bison file:

%{
extern "C" {
extern char yytext[];
}
%}
%union {
char string[40000];
}
%token <string> ID CALL NUMBERS

%%
file: commands {};
commands: command {}
| commands command {};
command: id {}
| call {};
id: ID
{
sprintf($$, $1);
}
| NUMBERS
{
sprintf($$, $1);
};
call: CALL NUMBERS ',' ID /* <- segmentation fault when the file contains long values for call */
{

};
 
Old 07-02-2012, 04:06 PM   #6
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
Hi.

Since flex defines yytext as a character pointer by default, try replacing
Code:
extern char yytext[];
with
Code:
extern char *yytext;
Here is your code, slightly modified so that it compiles:
lexer.l:
Code:
%{
#undef YYLMAX
#define YYLMAX 40000

#include "parser.h"
%}

digit [0-9]
id [a-zA-Z0-9_\/\-=><.\"]+
%%
call	 { strcpy(yylval.string, (char*)yytext);  return (CALL); }
{digit}+ { strcpy(yylval.string, (char*)yytext); return(NUMBER); }
{id}	{ strcpy(yylval.string, (char*)yytext); return(ID); }
[ \t]+	/* eat up whitespaces */
\n	return '\n';
.	return *yytext;
parser.y:
Code:
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
extern char *yytext;
int yylex(void);	/* provided by the flex-generated scanner */
void yyerror(const char * e)
{
	fprintf(stderr, "Error: %s\n", e);
}
%}
%union {
char string[40000];
}
%token ID CALL NUMBER
%type<string> id command call ID NUMBER

%%
input: /* empty */
     | input line	
;
line: '\n'	{ puts("here"); } 
    | command '\n'	{ printf("command: %s\n", $1); }
;
command: id
       | call {}
;

id: ID		{ sprintf($$, "%s(id)", $1); }
  | NUMBER	{ sprintf($$, "%s(num)", $1); }
;

call: CALL NUMBER ',' ID { sprintf($$, "call(%s, %s)", $2, $4); }
;
%%
int main(void)
{
	return yyparse();
}
Makefile:
Code:
a.out: parser.o lex.yy.o
	$(CC) $^ -o $@ -lfl

lex.yy.c: lexer.l
	flex $<
parser.c: parser.y
	bison --defines=parser.h $< -o $@

%.o: %.c
	$(CC) -c $< -o $@

clean:
	rm -f lex.yy.c parser.c *.o a.out
Sample session:
Code:
$ make
bison --defines=parser.h parser.y -o parser.c
gcc -c parser.c -o parser.o
flex lexer.l
gcc -c lex.yy.c -o lex.yy.o
cc parser.o lex.yy.o -o a.out -lfl
$ ./a.out 
123
command: 123(num)
qwe
command: qwe(id)
call 123, me
command: call(123, me)
call me,123
Error: syntax error
P.S. Please use [CODE]...[/CODE] tags around your code and data to preserve formatting.
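
One more thing that may be worth checking, offered as an assumption rather than a diagnosis. With `char string[40000]` in the `%union`, every semantic value is 40000 bytes. A parser generated from bison's default C skeleton typically keeps an initial value stack of `YYINITDEPTH` entries (200 by default) as a local array in `yyparse()`. That comes to about 8 MB, which is close to a common default stack limit on Linux. A usual way around both this and the fixed token limit is to store a `char *` in the union and copy each token to the heap. The sketch below shows the idea in plain C; `fixed_value`, `token_value`, `save_token`, `value_stack_bytes` and `INITIAL_DEPTH` are illustrative names, not flex or bison API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INITIAL_DEPTH 200   /* assumed to mirror bison's default YYINITDEPTH */

/* The union as posted: each value carries a 40000-byte buffer. */
union fixed_value { char string[40000]; };

/* Alternative: each value is a pointer to a heap copy of the token. */
union token_value { char *string; };

/* Bytes needed for an initial value stack of INITIAL_DEPTH entries. */
size_t value_stack_bytes(size_t value_size)
{
    return value_size * INITIAL_DEPTH;
}

/* Copy `len` bytes of `text` to the heap; returns NULL if allocation fails.
   The caller owns the result and must free() it. */
char *save_token(const char *text, size_t len)
{
    char *copy;

    assert(text != NULL);
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len);
    copy[len] = '\0';
    return copy;
}
```

In the lexer this would become something like `yylval.string = save_token(yytext, yyleng); return ID;`, with the `%union` declared as `{ char *string; }`. The parser actions would then build their results with heap strings as well and `free()` each value once it is no longer needed.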

Last edited by firstfire; 07-02-2012 at 04:07 PM.
 
  

