LinuxQuestions.org

MrMeeSeeks 09-26-2017 06:48 AM

RegEx - character class containing brackets, how to escape correctly
 
Hey there,
somewhat dim question I guess, but:

I'm trying to figure out how to grep for a character class like [a-d\[\]\*]. As I see it, that should match any of abcd[]*, but in grep it doesn't.
I'm also trying to match a class containing both single and double quotes.
No particular reason, just trying to get a better handle on grep and REs.

Now, I came across a statement suggesting that there is no actual escaping in grep's BREs, only in the shell. However, this post suggests there is both at once, so I tried double escaping like '[a\\'b]', which doesn't work either.
So, I am very confused as to how the escaping works and why it doesn't in these particular instances.


Also, something that deeply weirds me out: when I forget to quote a character class grep always matches capital C's and nothing else. Why on earth?

//okay, now I just noticed something that really freaks me out: when I try to match '[a\-b]' grep matches a, b and every digit. Why?

Best wishes

ntubski 09-26-2017 07:21 AM

Quote:

Originally Posted by MrMeeSeeks (Post 5763062)
Now, I came across a statement suggesting that there is no actual escaping in grep's BREs, only in the shell. However, this post suggests there is both at once, so I tried double escaping like '[a\\'b]', which doesn't work either.
So, I am very confused as to how the escaping works and why it doesn't in these particular instances.

There is escaping both in BREs and in the shell; but BRE escaping does not use the same rules as shell escaping. In particular, BRE escaping inside of character classes is a bit... funny.
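As a rough sketch of how the two layers interact, here is one way to match a class containing both kinds of quote, which was also asked about. The sample input is made up for illustration. The shell strips its own quoting first, and grep then sees a plain bracket expression with nothing left to escape:

```shell
# Hypothetical sample input: one candidate character per line.
sample=$(printf '%s\n' "a" "'" '"' "z")

# Shell layer: inside double quotes, \" becomes a literal " and ' needs no escape.
# BRE layer: grep receives the four-character pattern ["'] and treats both
# quote characters as ordinary list items.
quotes=$(printf '%s\n' "$sample" | grep "[\"']")
printf '%s\n' "$quotes"
```

Only the two quote lines should be printed. The attempt '[a\\'b]' fails at the shell layer: a backslash is not special inside single quotes, so the second ' simply ends the quoted string.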

GNU Grep manual: Character Classes and Bracket Expressions:
Quote:

Most meta-characters lose their special meaning inside bracket expressions.

‘]’
ends the bracket expression if it’s not the first list item. So, if you want to make the ‘]’ character a list item, you must put it first.
[...]

‘-’
represents the range if it’s not first or last in a list or the ending point of a range.

Therefore, to match any of abcd[]*, you want [][abcd*]. To protect this from shell expansion, you should then wrap it in single quotes:
Code:

grep '[][abcd*]'
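
A quick way to check this, using made-up sample input, is to feed grep one character per line. The sketch below also runs the original backslashed pattern. Assuming backslash is literal inside the brackets, that pattern parses as the class [a-d\[\] followed by a literal * and a literal ], so it only matches text such as a*]:

```shell
# Hypothetical sample input: the wanted characters plus two that should not match.
matches=$(printf '%s\n' 'a' 'd' '[' ']' '*' 'x' '\' | grep '[][abcd*]')
printf '%s\n' "$matches"

# The original attempt: the first ] closes the class early, leaving \*] behind it.
printf '%s\n' 'a*]' | grep '[a-d\[\]\*]'
```

The first command should print a, d, [, ] and * but not x or the backslash.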
Quote:

Originally Posted by MrMeeSeeks (Post 5763062)
Also, something that deeply weirds me out: when I forget to quote a character class grep always matches capital C's and nothing else. Why on earth?

Hard to say without an example.

Quote:

Originally Posted by MrMeeSeeks (Post 5763062)
//okay, now I just noticed something that really freaks me out: when I try to match '[a\-b]' grep matches a, b and every digit. Why?

Backslash doesn't escape within a character class, so it's matching a, and everything between \ and b. The exact characters this range contains depend on your locale sort order. In your default locale, digits are sorted between \ and b.
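
To see the locale dependence, one option is to force the C locale, where ranges follow byte order. There the range from \ to b covers only \ ] ^ _ ` a b, so digits should drop out. The sample input is hypothetical, and this is a sketch, not a statement about how any particular locale sorts:

```shell
# Hypothetical sample input: a, b, a digit, a backslash, and a hyphen.
in_c=$(printf '%s\n' 'a' 'b' '5' '\' '-' | LC_ALL=C grep '[a\-b]')
printf '%s\n' "$in_c"

# Putting the hyphen last makes it a literal list item instead of a range operator.
literal=$(printf '%s\n' 'a' 'b' '5' '\' '-' | LC_ALL=C grep '[ab-]')
printf '%s\n' "$literal"
```

In the C locale the first pattern matches a, b and the backslash, while the second matches a, b and the hyphen.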

MrMeeSeeks 09-26-2017 08:01 AM

Thanks a lot!
Sorry I wasn't thorough enough. I was even on that page, but at a glance I assumed it would not help me.


All times are GMT -5.