LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 05-14-2013, 07:52 AM   #1
w1k0
Senior Member
 
Registered: May 2008
Location: Poland
Distribution: Slackware (personalized Window Maker), Mint (customized MATE)
Posts: 1,309

Square bracket hoax


I encountered a strange problem while writing a bash script similar to the following one:

Code:
#!/bin/bash

for character in [ h o a x ]
do
    string=$string$character
    echo $string
done
The expected output of that script is:

Code:
[
[h
[ho
[hoa
[hoax
[hoax]
But when a file or directory named h, o, a, or x exists in the current directory, the problem appears.

For example, after the command:

touch a

the above script displays the following output:

Code:
[
[h
[ho
[hoa
[hoax
a
So it builds the string, but when it reaches ], instead of appending that character it replaces everything from [ to ] with the name of the one-letter file created in the meantime.

Does anyone know what causes this problem and how I could avoid it (other than running the script only in directories that don’t contain one-letter files or subdirectories)?
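The result above depends on whatever happens to be in the current directory. A minimal reproduction sketch, assuming `mktemp -d` is available (GNU coreutils and the BSDs provide it), runs the same loop inside a throwaway directory so the outcome is deterministic; `build_hoax` is a hypothetical name used only for this illustration:

```bash
#!/bin/bash
# Rebuild the thread's string in a throwaway directory, optionally after
# creating one file there, then print the final *unquoted* expansion.
# build_hoax is a hypothetical helper; mktemp -d is assumed to exist.
build_hoax() {
    local dir
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        if [ -n "$1" ]; then touch -- "$1"; fi
        string=
        for character in [ h o a x ]
        do
            string=$string$character
        done
        printf '%s\n' $string    # unquoted on purpose: this is where globbing happens
    )
    rm -rf -- "$dir"
}

build_hoax ''     # prints [hoax]  (nothing for the pattern to match)
build_hoax a      # prints a       ([hoax] matches the file "a")
```

The lone `[` and `]` in the `for` list stay literal because neither forms a complete bracket expression on its own; only the reassembled `[hoax]` does.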
 
Old 05-14-2013, 08:37 AM   #2
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

It's simple, add double quotes around variables.

Code:
for character in [ h o a x ]
do
    string="$string$character"
    echo "$string"
done
There are rare exceptions where you don't want quotes (like in this case for [hoax]), but in general you should have them there.
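A sketch for checking the quoting fix: it assumes `mktemp -d` is available and uses a hypothetical function name, `quoted_hoax`. It creates all four one-letter files that the class `[hoax]` could match, then prints the quoted and unquoted forms one after the other:

```bash
#!/bin/bash
# Same loop, but $string is double-quoted wherever it is expanded, so the
# accumulated text is never word-split or globbed. quoted_hoax is hypothetical.
quoted_hoax() {
    local dir
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        touch h o a x                 # every name the class [hoax] could match
        string=
        for character in [ h o a x ]
        do
            string="$string$character"
        done
        printf '%s\n' "$string"       # quoted: prints [hoax]
        printf '%s\n' $string         # unquoted, for contrast: prints a, h, o, x on separate lines
    )
    rm -rf -- "$dir"
}

quoted_hoax
```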

Last edited by H_TeXMeX_H; 05-14-2013 at 08:38 AM.
 
Old 05-14-2013, 07:24 PM   #3
w1k0
Senior Member
 
Registered: May 2008
Location: Poland
Distribution: Slackware (personalized Window Maker), Mint (customized MATE)
Posts: 1,309

Original Poster
H_TeXMeX_H,

Thank you, I understand now. Without double quotes the [hoax] string works as a regular expression.

Unfortunately, substrings such as [hoax] also appear in my script as parts of some patterns used by loops:

Code:
#!/bin/bash

pattern="abcdef [hoax] uvwxyz"

for substring in $pattern
do
    echo "$substring"
done
I can’t put double quotes around $pattern in the loop, because I want each substring of the pattern to be treated as a separate item. So it seems the only solution is to avoid square brackets altogether.

A week ago I wrote a simple bash script to process a simple string. Then I kept developing the script and making the string more complex (adding more alternative characters, including square brackets). My final script processes huge strings built from random characters. I see now that writing that script using bash wasn’t good idea but when I started that work bash seemed suitable tool to process some simple string.
 
Old 05-15-2013, 02:43 AM   #4
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,871
Blog Entries: 1

Code:
#!/bin/bash

set -o noglob # or set -f 

pattern="abcdef [hoax] uvwxyz"

for substring in $pattern
do
    echo "$substring"
done
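`set -o noglob` stays in effect for the rest of the script. If other parts of a script still rely on pathname expansion, one option is to confine it. A minimal sketch, assuming a function body written as a subshell `( ... )` is acceptable; `split_words` is a hypothetical name:

```bash
#!/bin/bash
# Split a whitespace-separated string into words without globbing them.
# The function body is a subshell, so "set -f" cannot leak into the caller.
split_words() (
    set -f                      # no pathname expansion in this subshell only
    for word in $1              # unquoted on purpose: we want IFS word splitting
    do
        printf '%s\n' "$word"
    done
)

demo_dir=$(mktemp -d)
# A file named "a" would normally make [hoax] expand to "a".
( cd "$demo_dir" && touch a && split_words "abcdef [hoax] uvwxyz" )   # prints abcdef, [hoax], uvwxyz
rm -rf -- "$demo_dir"
```

Running it in a subshell costs a fork per call, which is the price of not having to remember to run `set +f` afterwards.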
 
Old 05-15-2013, 07:49 AM   #5
AnanthaP
Member
 
Registered: Jul 2004
Location: Chennai, India
Posts: 952

I also tried various combinations I still don't get why ] seems to flush the buffer ($string) and starts listing files with the same names as the variables - but only if the file name exists.

OK
 
Old 05-15-2013, 08:19 AM   #6
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

I don't think using bash was the best idea. The brackets are special characters, and I think noglob, as NevemTeve suggests, is the only thing that can help here. Maybe you could use awk and sed to accomplish what you need better and faster? Or C or another programming language, if you know one.
 
Old 05-15-2013, 08:53 AM   #7
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,784

You can use an array for this:
Code:
#!/bin/bash

patterns=('abcdef' '[hoax]' 'uvwxyz')

for string in "${patterns[@]}"
do
    echo "$string"
done
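If the list already lives in one space-separated string, it can be converted to an array once, up front, which may limit how much of an existing script has to change. This sketch relies on bash's `read -a`, which splits on IFS but never performs pathname expansion, so no `set -f` is needed. Assumptions: the string is a single line and the default IFS is in effect (runs of spaces collapse, as with unquoted splitting):

```bash
#!/bin/bash
# Convert an existing string into an array in one step. "read -a" splits
# on IFS but does not glob, so "[hoax]" survives intact.
pattern="abcdef [hoax] uvwxyz"
read -r -a patterns <<< "$pattern"

printf 'count: %d\n' "${#patterns[@]}"    # count: 3
for string in "${patterns[@]}"
do
    printf '%s\n' "$string"
done
```

Note that `read` stops at the first newline, so a multi-line string would need `mapfile` or a different approach.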
 
Old 05-15-2013, 02:53 PM   #8
w1k0
Senior Member
 
Registered: May 2008
Location: Poland
Distribution: Slackware (personalized Window Maker), Mint (customized MATE)
Posts: 1,309

Original Poster
NevemTeve,

That solved the problem completely. Thank you very much.

H_TeXMeX_H,

I said the same in post #3: “writing that script using bash wasn’t good idea but when I started that work bash seemed suitable tool to process some simple string”. Disabling globbing did indeed help. As for the project as a whole, I did it as occupational therapy. The project is finished now and the therapy succeeded, so I’m not inclined to rewrite the script in a tool other than bash.

ntubski,

Your solution is interesting, but implementing it would mean modifying the entire script, which uses a lot of such constructions. I’m afraid that switching to arrays could cause other problems with other character classes.

During my work I identified a few classes of problematic characters...

The following characters are forbidden because they can’t be used on the command line inside a complex string enclosed in double quotes:

Code:
"`!<>
The simplest command failing with these characters is:

echo "`!<>"
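For what it's worth, in a script these five characters can be stored once they are quoted appropriately; the failing `echo` above runs into an unterminated backquote, and `!` only triggers history expansion in interactive shells (where `set +H` or single quotes avoid it). A minimal sketch, assuming a non-interactive bash script:

```bash
#!/bin/bash
# Two ways to store the characters " ` ! < > in a variable.
s1='"`!<>'        # single quotes: every character is literal
s2="\"\`!<>"      # double quotes: " and ` need a backslash; < and > are literal
printf '%s\n' "$s1" "$s2"
```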

The following characters should be avoided because they cause problems with some commands:

Code:
#$%*?
The following characters should be escaped with \\ in order to work properly:

Code:
.()~^&_=+[]{}/
I’m too lazy to write the scripts illustrating the problems with the above two classes of characters.

High-ASCII and UTF-8 characters can cause other problems. For example, Perl’s tr function, when used to replace hard spaces, corrupts at least strings containing the French “à”, the Russian “Р”, or the French quotes “«»”, changing these four characters into something else:

Code:
#!/bin/bash

# hard space = no-break space, U+00A0 (byte 160 in Latin-1, bytes C2 A0 in UTF-8)
hard_space=$'\xc2\xa0'

string="A${hard_space}French 'à', a${hard_space}Russian 'Р', and French quotes '«»'"    # the string including two hard spaces

echo "$string"

good_string=${string//"$hard_space"/ }    # bash substitution: only the hard spaces change

echo "$good_string"

bad_string=$(echo "$string" | perl -pe "tr/$hard_space/ /")    # perl tr: à, Р, « and » get mangled too

echo "$bad_string"
(LinuxQuestions.org replaces hard spaces with *, so the script builds them with $'\xc2\xa0' instead of typing them literally; this assumes a UTF-8 locale.)

Of course, other high-ASCII bytes that form parts of multi-byte UTF-8 characters cause similar problems when translated with Perl’s tr function.

The bottom line is that converting a complex string isn’t easy, because different character classes cause different problems during conversions made with different tools or functions.
 
Old 05-15-2013, 03:03 PM   #9
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Quote:
Originally Posted by AnanthaP
I also tried various combinations I still don't get why ] seems to flush the buffer ($string) and starts listing files with the same names as the variables - but only if the file name exists.
It doesn't change the string; the string is expanded (split and globbed) when it's used unquoted as an argument to echo.

I'd actually consider this a bug in bash, since it doesn't interpret things like *, `, !, >, and {1..10} when variable substitution happens. This is also a security risk, since you could have the string p='[a-z]*' which has length 6, and then do echo $p from /usr/bin and get a string much larger than that (on my system, ~27k characters.) You should have to use eval for bash to parse the expression from a string, even when globbing is enabled.

Kevin Barry
 
Old 05-15-2013, 03:49 PM   #10
w1k0
Senior Member
 
Registered: May 2008
Location: Poland
Distribution: Slackware (personalized Window Maker), Mint (customized MATE)
Posts: 1,309

Original Poster
AnanthaP,

The echo command treats the "[hoax]" string as a regular expression (it uses globbing). If there is a file or directory named "h", "o", "a", or "x", the echo command displays its name instead of the "[hoax]" string. It seems that ta0kira is right and that it may be a bug.

Last edited by w1k0; 05-15-2013 at 03:54 PM.
 
Old 05-15-2013, 05:09 PM   #11
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Quote:
Originally Posted by ta0kira
I'd actually consider this a bug in bash, since it doesn't interpret things like *, `, !, >, and {1..10} when variable substitution happens. This is also a security risk, since you could have the string p='[a-z]*' which has length 6, and then do echo $p from /usr/bin and get a string much larger than that (on my system, ~27k characters.) You should have to use eval for bash to parse the expression from a string, even when globbing is enabled.
There's no bug here. Like many other things in the shell, it all makes sense when you fully understand the parsing order.

Filename expansion happens after variable and command expansion; in fact it's the very last step undertaken before the command is executed. The only other step that happens after variable/command expansion is IFS word-splitting, which takes place just before glob expansion.

All of the other interpretations happen before variable substitution. Brace expansion in particular is one of the first steps, which is why you can't use variables in them. Therefore only IFS and globbing need to be considered, and all other shell-reserved characters will always be considered literal once stored inside a parameter.

It's also the main reason why proper quoting is so important. With the correct use of quotes and/or "set -f" and other shell options, security issues like the one you describe are not a problem.

('*', BTW, as another globbing pattern, will indeed expand if unquoted.)
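The parsing-order point can be checked directly. A sketch, assuming `mktemp -d` is available and using a hypothetical function name, `expansion_demo`: characters that are special when typed stay inert once they come out of a parameter expansion, while globbing still applies to unquoted expansions:

```bash
#!/bin/bash
# Run inside the directory given as $1 (a subshell, so cd does not leak).
expansion_demo() (
    cd "$1" || exit 1
    v='{1..3};>out|`id`'     # brace expression, ;, >, | and a backquote
    g='*'
    printf '<%s>\n' $v       # unquoted, yet printed literally; no file "out" is created
    printf '<%s>\n' $g       # globbing DOES apply: one line per file
    printf '<%s>\n' "$g"     # quoted: prints <*>
)

demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/b"
expansion_demo "$demo_dir"   # <{1..3};>out|`id`>, <a>, <b>, <*>
rm -rf -- "$demo_dir"
```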


I don't understand your last point at all, BTW. You should almost never touch eval for anything, and especially avoid using it on variables that can contain arbitrary text outside of the script-writer's control. Unless you intended to write "shouldn't"?


Anyway, to be absolutely clear as to what's happening here, '[ h o a x ]' is not a valid globbing pattern, since it contains unescaped spaces, and it's treated as a list of individual tokens for processing. But when it's re-assembled at the end into '[hoax]', now it does form a valid globbing pattern, and will expand into the appropriate matching file, if the variable is unquoted.


The only real, proper solution here is what was mentioned before, use arrays. That's exactly what they are designed for, and that's really how the script should've been set up from the start. One should never use a single, scalar variable for a list of items, and to rely on shell word-splitting to break up the contents of a substitution is something that should be done very sparingly, if at all*. But with arrays you can always keep the values safely quoted, while having the flexibility of accessing them both individually and as a whole.

As much work as it would be, I still highly recommend taking the effort to do it right.


*Assuming of course that your shell supports arrays. The situation changes when you need to use a POSIX-based shell without that feature.
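In a POSIX shell without arrays, the positional parameters are the usual stand-in for a list. A minimal sketch that uses only POSIX sh features (so it should also run under dash); note that it overwrites `$1`, `$2`, ... and that `set +f` turns globbing back on unconditionally, even if it was off before:

```bash
#!/bin/sh
# Split once with globbing disabled, then always use "$@" quoted.
pattern='abcdef [hoax] uvwxyz'

set -f            # no pathname expansion while splitting
set -- $pattern   # unquoted on purpose: split on IFS into $1, $2, $3
set +f            # restore globbing for the rest of the script

for s in "$@"
do
    printf '%s\n' "$s"
done
```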


PS: globbing is not a kind of regular expression. While it has many similarities, it doesn't fulfill the definition of a proper regex. The extended globbing patterns do, however.
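To illustrate the extended-globbing point, a sketch of an extglob pattern and a regex intended to accept the same strings (lowercase letters, optionally followed by a hyphen and digits). `is_word_glob` and `is_word_regex` are hypothetical names; `shopt -s extglob` is placed on its own line so it is in effect before the functions are parsed:

```bash
#!/bin/bash
shopt -s extglob     # must be on before the patterns below are parsed

is_word_glob() {     # extended glob: +(...) one or more, ?(...) zero or one
    [[ $1 == +([a-z])?(-+([0-9])) ]]
}
is_word_regex() {    # the equivalent POSIX ERE
    [[ $1 =~ ^[a-z]+(-[0-9]+)?$ ]]
}

for s in abc abc-12 abc- 12
do
    is_word_glob "$s";  g=$?
    is_word_regex "$s"; r=$?
    printf '%-7s glob=%d regex=%d\n' "$s" "$g" "$r"   # both columns agree
done
```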

Last edited by David the H.; 05-15-2013 at 05:11 PM.
 
Old 05-16-2013, 10:11 PM   #12
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Quote:
Originally Posted by David the H.
There's no bug here. Like many other things in the shell, it all makes sense when you fully understand the parsing order.

Filename expansion happens after variable and command expansion; in fact it's the very last step undertaken before the command is executed. The only other step that happens after variable/command expansion is IFS word-splitting, which takes place just before glob expansion.
Has this always been the case? I wonder how I've gone this long without noticing that filename patterns are expanded from variables even in shell scripts. On the other hand, if you use the {} syntax, e.g. {/usr,}/bin/*, that doesn't get expanded from within a variable.

When I said it should be considered a bug, I was aware that it was an intentional part of the design. What I really meant to point out is that it can cause widespread latent bugs, because in most cases a variable won't be expanded into filenames, and other syntactic elements (such as ;, >, and &) are ignored.

Kevin Barry
 
Old 05-17-2013, 01:24 PM   #13
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Yes, it's always been that way. You probably just never noticed it because getting a pattern that actually matches any files, in an unquoted variable expansion, is a relatively rare occurrence.

I suppose you could criticize the way it was implemented, but that doesn't make it a bug, just an area of potential gotchas to watch out for. But for historical reasons that's what we have to deal with.

Perhaps it's different for more advanced languages, but in the shell I personally think that it's a good thing that most syntax elements are inert inside parameters. The potential for serious security issues would be much greater if any old arbitrary code could be run from within one. It would be like having a permanently enabled eval in your shell. It also violates the concept of keeping code and data separate. The shell provides functions for that kind of thing.


As for "{/usr,}/bin/*", it's a two-step combination of brace expansion and globbing. When used directly the braces expand first into two separate strings, "/usr/bin/*" and "/bin/*". These are then substituted with lists of files later on in the glob expansion stage.

But if the string is first stored in a variable, the brace expansion stage is bypassed and you only get a single globbing pattern "{/usr,}/bin/*", with all characters but the last being literal, and that's very unlikely to match anything at all, and especially not what you want it to.
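A scaled-down version of the `{/usr,}/bin/*` case, run in a scratch tree so the output is predictable. It assumes `mktemp -d` is available; `brace_demo` is a hypothetical name. One way to keep the brace-plus-glob result around without eval is to expand it at assignment time into an array:

```bash
#!/bin/bash
# Run inside the directory given as $1; it must contain x/f1 and y/f2.
brace_demo() (
    cd "$1" || exit 1
    p='{x,y}/f*'
    printf '[%s]\n' $p            # no brace expansion from a variable; the glob matches nothing, so it prints as-is
    printf '[%s]\n' {x,y}/f*      # typed directly: braces expand first, then each word is globbed
    pats=({x,y}/f*)               # brace + glob at assignment time, kept safely in an array
    printf '%d\n' "${#pats[@]}"
)

d=$(mktemp -d)
mkdir "$d/x" "$d/y"
touch "$d/x/f1" "$d/y/f2"
brace_demo "$d"                   # [{x,y}/f*], [x/f1], [y/f2], 2
rm -rf -- "$d"
```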

Actually, the brace expansion step is one area I would agree was poorly decided. It would certainly be much more convenient if it happened later in the parsing order as well, so that variables could be used directly with them. At the very least I'd love to see the inclusion of a shell option that would allow you to configure such behavior voluntarily.
 
Old 05-17-2013, 09:53 PM   #14
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,783

Quote:
Originally Posted by David the H.
Actually, the brace expansion step is one area I would agree was poorly decided. It would certainly be much more convenient if it happened later in the parsing order as well, so that variables could be used directly with them.
Then you would somehow have to deal with arbitrary parameters that happened to contain something that looked like a brace expression. Sure, you could quote them, but that would block any desired word splitting or pathname expansion. Thanks, but I'd rather not have to worry about yet another "gotcha" that might show up in a parameter value.
 
Old 05-19-2013, 10:11 AM   #15
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Quote:
Originally Posted by rknichols
Then you would somehow have to deal with arbitrary parameters that happened to contain something that looked like a brace expression.
Very true. I had thought about that, and I imagine that's the main reason it was done the way it was. Still, it would've been nice to somehow get it to work. The brace expansion could've been designed to use a different syntax that would be easier for the parser to distinguish, perhaps.

That's also why I was thinking of a shell option extension, if it could be feasibly implemented. It would be off by default, but you could enable it manually to allow the expansion of substitutions inside braces when really needed. But after thinking about it some more I'd probably not touch the parsing order, and allow variables to actually contain and expand valid brace expansion patterns. That gets back into the whole code vs. data thing again, and also conflates the concept of scalar variables vs arrays.

Last edited by David the H.; 05-19-2013 at 10:14 AM.
 
  


All times are GMT -5.
