How to implement variable substitution in strings?
The deep problem is that for nested strings a plain '"' is not enough: one needs distinct opening and closing quote characters. This has been implemented in Perl, for example: http://perldoc.perl.org/perlop.html#...like-Operators .
Either you're being very cryptic again or you have no idea what I'm trying to do.
"${foo${bar}}" is a syntax error, it is NOT translated into "foo(bar)". Substitution does not occur inside the substitution operator, which means that "${foo${bar}}" will try to evaluate the expression "foo${bar}", which is invalid.
I think that the way this would work is that if the lexer comes across a "$" followed by a "{" inside a double-quoted string, it cuts out everything from the "{" to the matching "}", and creates new instances of the scanner/lexer/parser that parse it as if it were a separate program in the interpreted language. Since that parser does not know that the program it's parsing is embedded in a string, it doesn't do any ${} substitution. The embedded program can, however, contain double-quoted strings, and those strings can contain ${} substitutions, and this can go on recursively as long as there's room on the stack.
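For what it's worth, the "cut out everything from the { to the matching }" step could be sketched roughly like this in C++. This is only an illustration, not code from the interpreter discussed in this thread: Part, skipString, findMatchingBrace and splitParts are made-up names, and it assumes that double-quoted strings are the only construct inside an expression that can hide a brace (no comments or other kinds of literal). Escape sequences are copied through undecoded.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One piece of a double-quoted string body: literal text, or the source
// text of an embedded ${...} expression (hypothetical type).
struct Part {
    bool isExpr;
    std::string text;
};

// Given the index just after "${", return the index of the matching '}'.
std::size_t findMatchingBrace(const std::string& src, std::size_t pos);

// src[pos] is an opening '"'. Return the index just past the closing '"'.
std::size_t skipString(const std::string& src, std::size_t pos) {
    ++pos;  // step over the opening quote
    while (pos < src.size() && src[pos] != '"') {
        if (src[pos] == '\\') {
            pos += 2;  // skip the backslash and the escaped character
        } else if (src[pos] == '$' && pos + 1 < src.size() && src[pos + 1] == '{') {
            pos = findMatchingBrace(src, pos + 2) + 1;  // nested substitution
        } else {
            ++pos;
        }
    }
    if (pos >= src.size()) throw std::runtime_error("unterminated string");
    return pos + 1;
}

std::size_t findMatchingBrace(const std::string& src, std::size_t pos) {
    int depth = 1;
    while (pos < src.size()) {
        char ch = src[pos];
        if (ch == '"') {
            pos = skipString(src, pos);  // braces inside strings don't count
            continue;
        }
        if (ch == '{') {
            ++depth;
        } else if (ch == '}' && --depth == 0) {
            return pos;
        }
        ++pos;
    }
    throw std::runtime_error("unterminated ${");
}

// Split the body of a double-quoted string (without the outer quotes)
// into literal text parts and expression parts.
std::vector<Part> splitParts(const std::string& body) {
    std::vector<Part> parts;
    std::string text;
    std::size_t i = 0;
    while (i < body.size()) {
        if (body[i] == '\\' && i + 1 < body.size()) {
            text.push_back(body[i]);      // keep the escape as written,
            text.push_back(body[i + 1]);  // so "\$" never starts a substitution
            i += 2;
        } else if (body[i] == '$' && i + 1 < body.size() && body[i + 1] == '{') {
            if (!text.empty()) {
                parts.push_back({false, text});
                text.clear();
            }
            std::size_t close = findMatchingBrace(body, i + 2);
            parts.push_back({true, body.substr(i + 2, close - (i + 2))});
            i = close + 1;
        } else {
            text.push_back(body[i++]);
        }
    }
    if (!text.empty()) parts.push_back({false, text});
    return parts;
}
```

Each expression part would then be handed to a fresh lexer/parser, which is where the recursion comes from: a string inside the expression goes through the same splitting again.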
You said (IIRC) that in ${something} the "something" is an expression. Applying the "inner items are dealt with first" principle I've created my 'foo(1)' example.
What's confusing me isn't the concept of eval, but how to figure out what string to pass to it.
I think the "founding fathers" were confused too and decided not to complicate their (and our) lives: if one wants more than plain variable substitution, he/she needs to explicitly call 'eval'.
Maybe, but there is a language that does expression substitution exactly the way I described it: Ruby.
Code:
foo = 3
bar = 8
puts("#{foo} + #{bar} = #{foo + bar}")
puts("#{ "#{foo + bar}" + ' here are some curly braces: { }{}}}}{{' }")
# this causes a syntax error (the program only runs with it commented out)
# puts("#{foo#{bar}}")
Good to hear. Did you use another instance of the lexer/parser to convert such string constants to AST, or how did you do it?
The way I did it is that when the lexer comes across a "${" inside a double-quoted string, it returns a special token. When the parser gets that token, it creates a new lexer and parser but tells them to use the original scanner (since it remembers the place in the text file).
I also had to slightly modify the parser to be able to recognize any specified token (not just EOF) as the end of the program, in this case the closing curly bracket.
From the lexer:
Code:
if (isInDoubleQuotes) {
    if (s->current() == '"') {
        // Closing quote: leave string mode.
        s->next();
        isInDoubleQuotes = false;
        curTok = DoubleQuoteTok;
    } else if (s->current() == '$') {
        s->next();
        if (s->current() == '{') {
            // "${": the parser takes over and parses an embedded expression.
            s->next();
            curTok = DoubleQuotedExpressionTok;
        } else if (!isCharFirstNameCharacter(s->current())) {
            curTok = InvalidInput;
        } else {
            // "$name": collect the variable name.
            do {
                str.push_back(s->current());
            } while (isCharNameCharacter(s->next()));
            curText = str.c_str();
            curTok = DoubleQuotedVariableTok;
        }
    } else if (s->current() == Scanner::ReadError) {
        isInDoubleQuotes = false;
        curTok = ReadError;
    } else if (s->current() == Scanner::EndOfFile) {
        // End of file inside a string.
        isInDoubleQuotes = false;
        curTok = InvalidInput;
    } else {
        // Literal text: read up to the next '"', '$', or end of input,
        // decoding backslash escapes along the way.
        do {
            if (s->current() != '\\') {
                str.push_back(s->current());
            } else {
                s->next();
                switch (s->current()) {
                case '\\':
                    str.push_back('\\');
                    break;
                case 'n':
                    str.push_back('\n');
                    break;
                case 'r':
                    str.push_back('\r');
                    break;
                case '0':
                    str.push_back('\0');
                    break;
                case 'a':
                    str.push_back('\a');
                    break;
                case 'b':
                    str.push_back('\b');
                    break;
                case 't':
                    str.push_back('\t');
                    break;
                case 'v':
                    str.push_back('\v');
                    break;
                case 'f':
                    str.push_back('\f');
                    break;
                case 'e':
                    // ESC; '\e' is a compiler extension, '\033' is standard C++.
                    str.push_back('\033');
                    break;
                case '"':
                    str.push_back('"');
                    break;
                default:
                    // Unknown escape: keep the character itself (so "\$" gives '$').
                    str.push_back(s->current());
                }
            }
            s->next();
        } while (s->current() != '"' && s->current() != '$' && s->current() >= 0);
        curText = str.c_str();
        curTok = DoubleQuotedTextTok;
    }
    return curTok;
}
From the parser:
Code:
else if (accept(Lexer::DoubleQuoteTok))
{
    int l = lex->prevLine(), c = lex->prevCol();
    SubstitutionStringNode* strNode = new SubstitutionStringNode();
    node = strNode;
    while (lex->current() == Lexer::DoubleQuotedTextTok ||
           lex->current() == Lexer::DoubleQuotedVariableTok ||
           lex->current() == Lexer::DoubleQuotedExpressionTok) {
        if (lex->current() == Lexer::DoubleQuotedTextTok) {
            strNode->addText(String::fromAscii(lex->text()));
        } else if (lex->current() == Lexer::DoubleQuotedExpressionTok) {
            // New lexer and parser on the same scanner; parse up to the closing '}'.
            Lexer l2;
            l2.setScanner(lex->getScanner());
            Parser p2;
            Node* node2 = p2.parse(&l2, Lexer::CCurlyTok);
            strNode->addExpr(node2);
        } else if (lex->current() == Lexer::DoubleQuotedVariableTok) {
            strNode->addVar(lex->text());
        }
        lex->next();
    }
    if (!accept(Lexer::DoubleQuoteTok)) throw SyntaxError("No closing double-quote", l, c);
}
Quite neat.
Quote:
Originally Posted by MTK358
I also had to slightly modify the parser to be able to recognize any specified token (not just EOF) as the end of the program, in this case the closing curly bracket.
Does it still return an error on a stray closing brace (}), or does it treat it as the end of the program?
It treats it as the end of the program.
The parser is a recursive descent parser. The "program" rule matches an expr-list followed by the ending token (EOF or "}", depending on how the parser was initialized). The expr-list rule matches 0 or more newlines, and then it checks if the next token could be the first token of an expression (for example, "if" or "(" tokens could be the start of an expression, while ")" or "end" could not). If so, it matches an expression and starts over. If not, it quits, returning a node that evaluates all the expressions in the list, and returns the value of the last one. If the top-level expr-list returns and the next token is not the ending token, it's treated as a syntax error.
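To make the "program" and expr-list rules a bit more concrete, here is a heavily reduced sketch. It is not the real parser: Tok, Token and MiniParser are invented for the example, an "expression" is just a single number token, and the value is computed directly instead of building nodes. The only point is the ending token being passed in when the parser is created.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

enum class Tok { Number, Newline, CloseBrace, End };  // End stands for EOF

struct Token {
    Tok kind;
    int value;  // only meaningful for Tok::Number
};

class MiniParser {
public:
    MiniParser(const std::vector<Token>& toks, Tok endTok)
        : toks_(toks), pos_(0), endTok_(endTok) {}

    // program := expr-list endTok
    // Returns the value of the last expression. The ending token is
    // checked but not consumed.
    int parseProgram() {
        int result = parseExprList();
        if (current() != endTok_)
            throw std::runtime_error("syntax error: unexpected token");
        return result;
    }

    std::size_t position() const { return pos_; }

private:
    Tok current() const {
        return pos_ < toks_.size() ? toks_[pos_].kind : Tok::End;
    }

    // Could this token be the first token of an expression?
    static bool canStartExpr(Tok t) { return t == Tok::Number; }

    // expr-list := { newline* expr }
    int parseExprList() {
        int last = 0;
        for (;;) {
            while (current() == Tok::Newline) ++pos_;       // 0 or more newlines
            if (!canStartExpr(current())) return last;      // not an expression: quit
            last = toks_[pos_++].value;                     // "parse" one expression
        }
    }

    std::vector<Token> toks_;
    std::size_t pos_;
    Tok endTok_;
};
```

With the tokens `1 newline 2 } 9`, a MiniParser created with Tok::CloseBrace as the ending token stops at the "}" and returns 2, while one created with Tok::End throws, because the expr-list returns at the "}" and that is not the ending token.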
I came across a big issue with this, so I have to mark the thread as unsolved:
I recently modified the parser to have two-token lookahead, since that was necessary for some syntax I wanted to add. The problem is that this completely broke expression substitution in strings, and I'm not sure how to solve it.
Basically, the way it worked before is that if you evaluate an expression, the lexer is at the token after the expression's last token. This was OK before, but now the lexer is actually internally two tokens after the expression's last token, because that's how it implements its new peek() feature. The reason that this poses a problem for expression substitution is that the inner lexer (when inside the ${...}) actually goes past the closing curly brace to peek at the next token. If the contents of the string right after the closing brace happen not to be a valid token, the inner lexer throws a syntax error. Or if it is a valid token, when it goes back to the main parser/lexer, it starts reading from where the inner lexer finished, which means that it skips the part of the string after the closing brace.
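The over-read can be reproduced with a toy model. Everything here is made up for the demonstration (CharScanner, ToyLexer and scannerPosAtCloseBrace are not the interpreter's classes), and the lexer only buffers one extra token, which I'm assuming is close enough to how peek() is implemented. A token is a run of letters/digits or any other single non-space character.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical shared scanner: the text plus a read position.
struct CharScanner {
    std::string text;
    std::size_t pos;
};

class ToyLexer {
public:
    ToyLexer(CharScanner& s, bool lookahead) : s_(s), lookahead_(lookahead) {
        cur_ = read();
        if (lookahead_) peeked_ = read();  // always stay one token ahead
    }

    const std::string& current() const { return cur_; }

    void next() {
        if (lookahead_) {
            cur_ = peeked_;
            peeked_ = read();
        } else {
            cur_ = read();
        }
    }

private:
    // Read one token from the shared scanner; an empty string means EOF.
    std::string read() {
        while (s_.pos < s_.text.size() && s_.text[s_.pos] == ' ') ++s_.pos;
        std::string tok;
        if (s_.pos >= s_.text.size()) return tok;
        if (std::isalnum(static_cast<unsigned char>(s_.text[s_.pos]))) {
            while (s_.pos < s_.text.size() &&
                   std::isalnum(static_cast<unsigned char>(s_.text[s_.pos])))
                tok.push_back(s_.text[s_.pos++]);
        } else {
            tok.push_back(s_.text[s_.pos++]);
        }
        return tok;
    }

    CharScanner& s_;
    bool lookahead_;
    std::string cur_, peeked_;
};

// Run an inner lexer from `start` until its current token is "}" and
// report where the shared scanner ended up.
std::size_t scannerPosAtCloseBrace(const std::string& text, std::size_t start,
                                   bool lookahead) {
    CharScanner s{text, start};
    ToyLexer lex(s, lookahead);
    while (lex.current() != "}" && !lex.current().empty()) lex.next();
    return s.pos;
}
```

For the text `a} tail"` (the rest of a string after "${"), the scanner stops right after the "}" at position 2 without lookahead, but at position 7 with lookahead: the inner lexer has already swallowed " tail" as a token, so the outer lexer would resume at the closing quote and lose that text.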