LinuxQuestions.org
Old 05-07-2011, 06:16 AM   #16
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481


The deep problem is that for nested strings a plain '"' is not enough: one needs distinct opening and closing quote characters. This has been implemented in Perl, for example: http://perldoc.perl.org/perlop.html#...like-Operators .
 
Old 05-07-2011, 08:26 AM   #17
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

@Sergei Steshenko

Either you're being very cryptic again or you have no idea what I'm trying to do.

"${foo${bar}}" is a syntax error, it is NOT translated into "foo(bar)". Substitution does not occur inside the substitution operator, which means that "${foo${bar}}" will try to evaluate the expression "foo${bar}", which is invalid.

I think that the way this would work is that if the lexer comes across a "$" followed by a "{" inside a double-quoted string, it cuts out everything from the "{" to the matching "}", and creates more instances of the scanner/lexer/parser that will parse it as if it were a separate program in the interpreted language. Since the parser does not know that the program it's parsing is embedded in a string, it doesn't do any ${} substitution. It can, however, contain double-quoted strings, and those strings can contain ${} substitutions, and this can recursively go on and on as long as there's room on the stack.
 
Old 05-07-2011, 08:52 AM   #18
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Quote:
Originally Posted by MTK358 View Post
@Sergei Steshenko

Either you're being very cryptic again or you have no idea what I'm trying to do.

"${foo${bar}}" is a syntax error, it is NOT translated into "foo(bar)". Substitution does not occur inside the substitution operator, which means that "${foo${bar}}" will try to evaluate the expression "foo${bar}", which is invalid.

I think that the way this would work is that if the lexer comes across a "$" followed by a "{" inside a double-quoted string, it cuts out everything from the "{" to the matching "}", and creates more instances of the scanner/lexer/parser that will parse it as if it were a separate program in the interpreted language. Since the parser does not know that the program it's parsing is embedded in a string, it doesn't do any ${} substitution. It can, however, contain double-quoted strings, and those strings can contain ${} substitutions, and this can recursively go on and on as long as there's room on the stack.
You said (IIRC) that in ${something} the "something" is an expression. Applying the "inner items are dealt with first" principle I've created my 'foo(1)' example.
 
Old 05-07-2011, 09:04 AM   #19
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

I thought that outer items are dealt with first?

Another problem is how to find the matching "}": closing braces should be ignored inside nested strings.

For now I did simple variable substitution; I might do expressions later if something is figured out.
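
One possible way to locate the matching "}" is a small scanner that counts brace depth and skips over nested double-quoted strings, recursing when such a string itself contains "${". The following is only a minimal C++ sketch under those assumptions, not code from the interpreter discussed in this thread: `findMatchingBrace` and `skipString` are hypothetical names, and single-quoted strings and comments are left out for brevity.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: `pos` is the index just past a "${".
// Returns the index of the matching '}' or std::string::npos if there is none.
std::size_t findMatchingBrace(const std::string& src, std::size_t pos);

// Skip a double-quoted string whose opening quote is at src[pos].
// Returns the index just past the closing quote, or npos if unterminated.
std::size_t skipString(const std::string& src, std::size_t pos) {
    ++pos;  // step over the opening quote
    while (pos < src.size()) {
        char ch = src[pos];
        if (ch == '\\') {
            pos += 2;  // skip the backslash and the escaped character
        } else if (ch == '"') {
            return pos + 1;
        } else if (ch == '$' && pos + 1 < src.size() && src[pos + 1] == '{') {
            // A substitution inside the nested string: find its own closing brace.
            std::size_t close = findMatchingBrace(src, pos + 2);
            if (close == std::string::npos) return std::string::npos;
            pos = close + 1;
        } else {
            ++pos;  // ordinary text, including any stray '{' or '}'
        }
    }
    return std::string::npos;
}

std::size_t findMatchingBrace(const std::string& src, std::size_t pos) {
    int depth = 1;  // we start inside one "${"
    while (pos < src.size()) {
        char ch = src[pos];
        if (ch == '"') {
            pos = skipString(src, pos);
            if (pos == std::string::npos) return std::string::npos;
        } else if (ch == '{') {
            ++depth;
            ++pos;
        } else if (ch == '}') {
            if (--depth == 0) return pos;
            ++pos;
        } else {
            ++pos;
        }
    }
    return std::string::npos;
}
```

For the text `foo + bar} rest`, scanned from index 0, the function returns 9, the index of the closing brace. A brace inside a nested string, as in ` "a}b" }x`, is skipped.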

Last edited by MTK358; 05-07-2011 at 09:05 AM.
 
Old 05-07-2011, 09:28 AM   #20
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Quote:
Originally Posted by MTK358 View Post
I thought that outer items are dealt with first? ...
You parse from outside to inside, but you evaluate from inside to outside. For example, when in 'kcalc' I enter

Code:
3*(4+5)
, at the moment I enter ')', 'kcalc' shows '9', which is '4 + 5', and when I press <ENTER>, it shows '27'. I.e. evaluation started from inside.
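
The same idea can be sketched as a tiny recursive-descent evaluator: the parser enters the outer expression first, but the value of the parenthesized part is computed before the surrounding multiplication can finish. This is a minimal illustration only; `Calc` and `evaluate` are made-up names, it handles just non-negative integers, '+', '*' and parentheses, and it does no error reporting.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Toy evaluator: expr := term ('+' term)*, term := factor ('*' factor)*,
// factor := number | '(' expr ')'.
struct Calc {
    std::string src;
    std::size_t pos = 0;

    // Skip spaces, then report whether the next character is `ch`.
    bool at(char ch) {
        while (pos < src.size() && src[pos] == ' ') ++pos;
        return pos < src.size() && src[pos] == ch;
    }

    long factor() {
        if (at('(')) {
            ++pos;
            long inner = expr();  // the inner expression is fully evaluated here
            if (at(')')) ++pos;
            return inner;
        }
        long value = 0;
        while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos]))) {
            value = value * 10 + (src[pos] - '0');
            ++pos;
        }
        return value;
    }

    long term() {
        long value = factor();
        while (at('*')) {
            ++pos;
            value *= factor();
        }
        return value;
    }

    long expr() {
        long value = term();
        while (at('+')) {
            ++pos;
            value += term();
        }
        return value;
    }
};

long evaluate(const std::string& text) {
    Calc calc;
    calc.src = text;
    return calc.expr();
}
```

Calling `evaluate("3*(4+5)")` returns 27; the call to `expr()` inside `factor()` produces the intermediate 9 before the multiplication is applied.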
 
Old 05-07-2011, 09:30 AM   #21
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Quote:
Originally Posted by MTK358 View Post
... I might do expressions later if something is figured out.
You know, all those guys who invented various languages introduced

Code:
eval <string>
for a reason. And I think the reason is not making them and us confused.
 
Old 05-07-2011, 10:01 AM   #22
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Quote:
Originally Posted by Sergei Steshenko View Post
You know, all those guys who invented various languages introduced

Code:
eval <string>
for a reason. And I think the reason is not making them and us confused.
What's confusing me isn't the concept of eval, but how to figure out what string to pass to it.
 
Old 05-07-2011, 11:42 AM   #23
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Quote:
Originally Posted by MTK358 View Post
What's confusing me isn't the concept of eval, but how to figure out what string to pass to it.
I think the "founding fathers" were confused too and decided not to complicate their (and our) lives: if one wants more than plain variable substitution, he/she needs to explicitly call 'eval'.
 
Old 05-07-2011, 11:57 AM   #24
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Quote:
Originally Posted by Sergei Steshenko View Post
I think the "founding fathers" were confused too and decided not to complicate their (and our) lives: if one wants more than plain variable substitution, he/she needs to explicitly call 'eval'.
Maybe, but there is a language that does expression substitution exactly the way I described it: Ruby.

Code:
foo = 3
bar = 8
puts("#{foo} + #{bar} = #{foo + bar}")

puts("#{ "#{foo + bar}" + ' here are some curly braces: { }{}}}}{{' }")

# this causes a syntax error (the program only runs with it commented out)
# puts("#{foo#{bar}}")

Last edited by MTK358; 05-07-2011 at 12:02 PM.
 
Old 05-09-2011, 10:22 AM   #25
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

I did it!

Code:
#!/home/michael/Projects/lang/build/src/lang

foo = 'te'
bar = 'st'

"\"$foo\" + \"$bar\" = \"${foo + bar}\"\n":print()

"${ "nested" + "${" embedded expressions"}" }":println()

# this is a syntax error
# "${foo${bar}}":println()
Output:

Code:
$ ./test_program 
"te" + "st" = "test"
nested embedded expressions
With the "syntax error" line un-commented:

Code:
$ ./test_program 
./test_program:11:7: syntax error: Invalid token

Last edited by MTK358; 05-09-2011 at 10:23 AM.
 
Old 05-09-2011, 01:06 PM   #26
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Quote:
Originally Posted by MTK358 View Post
I did it!
Good to hear. Did you use another instance of the lexer/parser to convert such string constants to AST, or how did you do it?
 
Old 05-09-2011, 01:48 PM   #27
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Quote:
Originally Posted by Nominal Animal View Post
Good to hear. Did you use another instance of the lexer/parser to convert such string constants to AST, or how did you do it?
The way I did it is that when the lexer comes across a "${" inside a double-quoted string, it returns a special token. When the parser gets that token, it creates a new lexer and parser but tells them to use the original scanner (since it remembers the place in the text file).

I also had to slightly modify the parser to be able to recognize any specified token (not just EOF) as the end of the program, in this case the closing curly bracket.

From the lexer:

Code:
	if (isInDoubleQuotes) {
		if (s->current() == '"') {
			s->next();
			isInDoubleQuotes = false;
			curTok = DoubleQuoteTok;
		} else if (s->current() == '$') {
			s->next();
			if (s->current() == '{') {
				s->next();
				curTok = DoubleQuotedExpressionTok;
			} else if (!isCharFirstNameCharacter(s->current())) {
				curTok = InvalidInput;
			} else {
				do {
					str.push_back(s->current());
				} while (isCharNameCharacter(s->next()));
				curText = str.c_str();
				curTok = DoubleQuotedVariableTok;
			}
		} else if (s->current() == Scanner::ReadError) {
			isInDoubleQuotes = false;
			curTok = ReadError;
		} else if (s->current() == Scanner::EndOfFile) {
			isInDoubleQuotes = false;
			curTok = InvalidInput;
		} else {
			do {
				if (s->current() != '\\') {
					str.push_back(s->current());
				} else {
					s->next();
					switch (s->current()) {
						case '\\':
							str.push_back('\\');
							break;
						case 'n':
							str.push_back('\n');
							break;
						case 'r':
							str.push_back('\r');
							break;
						case '0':
							str.push_back('\0');
							break;
						case 'a':
							str.push_back('\a');
							break;
						case 'b':
							str.push_back('\b');
							break;
						case 't':
							str.push_back('\t');
							break;
						case 'v':
							str.push_back('\v');
							break;
						case 'f':
							str.push_back('\f');
							break;
						case 'e':
							str.push_back('\x1b');  // ESC; '\e' is a non-standard compiler extension
							break;
						case '"':
							str.push_back('"');
							break;
						default:
							str.push_back(s->current());
					}
				}
				s->next();
			} while (s->current() != '"' && s->current() != '$' && s->current() >= 0);
			curText = str.c_str();
			curTok = DoubleQuotedTextTok;
		}
		return curTok;
	}
From the parser:

Code:
	else if (accept(Lexer::DoubleQuoteTok))
	{
		int l = lex->prevLine(), c = lex->prevCol();
		node = new SubstitutionStringNode();
		while ( lex->current() == Lexer::DoubleQuotedTextTok       ||
		        lex->current() == Lexer::DoubleQuotedVariableTok   ||
		        lex->current() == Lexer::DoubleQuotedExpressionTok ) {
			if (lex->current() == Lexer::DoubleQuotedTextTok) {
				((SubstitutionStringNode*) node)->addText(String::fromAscii(lex->text()));
			} else if (lex->current() == Lexer::DoubleQuotedExpressionTok) {
				Lexer l2;
				l2.setScanner(lex->getScanner());
				Parser p2;
				Node* node2 = p2.parse(&l2, Lexer::CCurlyTok);
				((SubstitutionStringNode*) node)->addExpr(node2);
			} else if (lex->current() == Lexer::DoubleQuotedVariableTok) {
				((SubstitutionStringNode*) node)->addVar(lex->text());
			}
			lex->next();
		} 
		if (!accept(Lexer::DoubleQuoteTok)) throw SyntaxError("No closing double-quote", l, c);
	}
 
Old 05-09-2011, 02:22 PM   #28
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Quote:
Originally Posted by MTK358 View Post
The way I did it is that when the lexer comes across a "${" inside a double-quoted string, it returns a special token. When the parser gets that token, it creates a new lexer and parser but tells them to use the original scanner (since it remembers the place in the text file).
Quite neat.

Quote:
Originally Posted by MTK358 View Post
I also had to slightly modify the parser to be able to recognize any specified token (not just EOF) as the end of the program, in this case the closing curly bracket.
Does it still return an error on a stray closing brace (}), or does it treat it as the end of the program?
 
Old 05-09-2011, 03:22 PM   #29
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Quote:
Originally Posted by Nominal Animal View Post
Does it still return an error on a stray closing brace (}), or does it treat it as the end of the program?
It treats it as the end of the program.

The parser is a recursive descent parser. The "program" rule matches an expr-list followed by the ending token (EOF or "}", depending on how the parser was initialized). The expr-list rule matches 0 or more newlines, and then it checks if the next token could be the first token of an expression (for example, "if" or "(" tokens could be the start of an expression, while ")" or "end" could not). If so, it matches an expression and starts over. If not, it quits, returning a node that evaluates all the expressions in the list, and returns the value of the last one. If the top-level expr-list returns and the next token is not the ending token, it's treated as a syntax error.
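
The "program" rule described above can be sketched roughly as follows. This is a simplified illustration rather than the actual parser: tokens are plain strings, `canStartExpression` and `parseProgram` are hypothetical names, an empty string stands for EOF, and a single token stands in for a whole expression.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed token model: "\n" is a newline token, "" marks end of input.
bool canStartExpression(const std::string& tok) {
    return !tok.empty() && tok != "\n" && tok != ")" && tok != "}" && tok != "end";
}

// Parses an expr-list followed by `endTok` and returns how many expressions
// were seen. Throws if the list stops on anything other than `endTok`.
std::size_t parseProgram(const std::vector<std::string>& toks,
                         const std::string& endTok) {
    std::size_t i = 0;
    std::size_t count = 0;
    auto cur = [&]() -> std::string {
        return i < toks.size() ? toks[i] : std::string();
    };
    for (;;) {
        while (cur() == "\n") ++i;              // skip zero or more newlines
        if (!canStartExpression(cur())) break;  // the expr-list ends here
        ++i;                                    // stand-in for parsing one expression
        ++count;
    }
    if (cur() != endTok) {
        throw std::runtime_error("syntax error: unexpected token");
    }
    return count;
}
```

With "}" as the ending token, the list `a } b` parses one expression and stops at the brace. With EOF as the ending token, the same stray "}" makes the final check fail, so it is reported as a syntax error.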

Last edited by MTK358; 05-09-2011 at 03:24 PM.
 
Old 07-14-2011, 11:51 AM   #30
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

I came across a big issue with this, so I have to mark the thread as unsolved:

I recently modified the parser to have two-token lookahead, since that was necessary for some syntax I wanted to add. The problem is that this completely broke expression substitution in strings, and I'm not sure how to solve it.

Basically, the way it worked before is that if you evaluate an expression, the lexer is at the token after the expression's last token. This was OK before, but now the lexer is actually internally two tokens after the expression's last token, because that's how it implements its new peek() feature. The reason that this poses a problem for expression substitution is that the inner lexer (when inside the ${...}) actually goes past the closing curly brace to peek at the next token. If the contents of the string right after the closing brace happen not to be a valid token, the inner lexer throws a syntax error. Or if it is a valid token, when it goes back to the main parser/lexer, it starts reading from where the inner lexer finished, which means that it skips the part of the string after the closing brace.
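
One approach that might avoid this, offered only as a sketch and not as a tested fix for the interpreter above, is to make the lookahead lazy: the lexer scans a token only when `current()` or `peek()` actually asks for it, so the shared read position never moves past the closing brace as long as the inner parser does not call `peek()` once the current token is the ending token. `Cursor`, `LazyLexer` and `consumeEmbedded` are hypothetical names, and the token rules are deliberately simplistic.

```cpp
#include <cctype>
#include <cstddef>
#include <deque>
#include <string>

// Shared read position, standing in for the scanner object from the thread.
struct Cursor {
    std::string text;
    std::size_t pos = 0;
};

// Toy lexer: a token is a run of alphanumerics or one punctuation character.
// An empty token marks end of input. Nothing is scanned until it is requested.
class LazyLexer {
public:
    explicit LazyLexer(Cursor& c) : cur(c) {}

    const std::string& current() { fill(1); return buf[0]; }
    const std::string& peek()    { fill(2); return buf[1]; }
    void next() { fill(1); buf.pop_front(); }

private:
    void fill(std::size_t n) {
        while (buf.size() < n) buf.push_back(scan());
    }

    std::string scan() {
        while (cur.pos < cur.text.size() && cur.text[cur.pos] == ' ') ++cur.pos;
        if (cur.pos >= cur.text.size()) return "";
        std::size_t start = cur.pos;
        if (std::isalnum(static_cast<unsigned char>(cur.text[cur.pos]))) {
            while (cur.pos < cur.text.size() &&
                   std::isalnum(static_cast<unsigned char>(cur.text[cur.pos]))) {
                ++cur.pos;
            }
        } else {
            ++cur.pos;
        }
        return cur.text.substr(start, cur.pos - start);
    }

    Cursor& cur;
    std::deque<std::string> buf;
};

// Stand-in for the inner parser: walks tokens up to the closing "}", peeking
// one token ahead on every step, and returns where the shared cursor ends up.
std::size_t consumeEmbedded(Cursor& c) {
    LazyLexer lex(c);
    while (!lex.current().empty() && lex.current() != "}") {
        (void)lex.peek();  // two-token lookahead while still inside the expression
        lex.next();
    }
    return c.pos;  // no peek() was issued once current() became "}"
}
```

For the text `foo + bar} tail %%`, `consumeEmbedded` leaves the cursor at index 10, immediately after the closing brace, so the outer lexer can resume reading the rest of the string even though ` tail %%` would not form valid tokens.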
 
  

