LinuxQuestions.org

Linux - Newbie: Regular expressions (https://www.linuxquestions.org/questions/linux-newbie-8/regular-expressions-803336/)

Khaj.pandey 04-21-2010 01:23 PM

Regular expressions
 
Hi,
I am following this tutorial to learn regular expressions:

http://www.grymoire.com/Unix/Regular.html (written for Solaris)

I am on RHL.

I am having trouble searching for the characters [ and ] in my file.

The file contents are:

FROM
DFROM
a
n
[
[[
]


When I run the following, it fails:
$ grep '[]' tp
grep: Unmatched [ or [^
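A minimal sketch of why that error shows up, assuming GNU grep and a POSIX shell: under the POSIX bracket-expression rules, a ] that comes right after the opening [ is taken as a literal member of the set, so '[]' never gets a closing bracket. One more ] closes it. The helper name sample and the use of printf in place of the file tp are just for illustration, and the exact error text differs between grep versions.

```shell
# Hypothetical helper that prints the same lines as the file "tp".
sample() { printf '%s\n' FROM DFROM a n '[' '[[' ']'; }

# "]" right after "[" is a literal member, so this set is never closed:
# grep prints an error and exits with status 2.
sample | grep '[]' || echo "grep exit status: $?"

# One more "]" closes the set {]}, so only the line "]" is printed.
sample | grep '[]]'
```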


This does not work either:
$ grep '[\[\]]' tp
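A sketch of how that pattern is actually read, assuming GNU grep's POSIX basic regular expressions: inside a bracket expression the backslash is an ordinary character, not an escape. So [\[\] is the set {\, [}, closed by the first ], and the last ] is a separate literal. The pattern therefore needs a \ or a [ immediately followed by ], and no line in tp has that. The extra input lines are invented to show what does match.

```shell
# Invented test lines, one per candidate string.
printf '%s\n' '[' ']' '[[' '[]' '\]' '\' |
  grep '[\[\]]'
# Only "[]" and "\]" are printed: ("[" or "\") followed by "]".
```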

Can you guys point out what I am doing wrong?

Thanks!

Tinkster 04-21-2010 01:29 PM

Not sure what your problem is:
Code:

$ grep '\[\]' *
numeral.cxx:  char *ones[] = {"","I","II","III","IV","V","VI","VII","VIII","IX"};
numeral.cxx:  char *tens[] = {"","X","XX","XXX","XL","L","LX","LXX","LXXX","XC"};
numeral.cxx:  char *hundreds[] = {"","C","CC","CCC","CD","D","DC","DCC","DCCC","CM"};
numeral.cxx~:  char *ones[] = {"","I","II","III","IV","V","VI","VII","VIII","IX"};
numeral.cxx~:  char *tens[] = {"","X","XX","XXX","XL","L","LX","LXX","LXXX","XC"};
numeral.cxx~:  char *hundreds[] = {"","C","CC","CCC","CD","D","DC","DCC","DCCC","CM"};
order.awk:  print gensub(/([^\[]+)\[([^\],]+),([^\]]+)\]/, "\1 FROM \2 FOR \3", "1" )}


pixellany 04-21-2010 02:04 PM

The example above finds only the pair "[]"; I think the OP wants to find either [ or ].

Interesting problem:

Once inside the outer [] pair, it seems that I can use either [ or ] without escaping; e.g., these constructs work as expected:

grep "[[]" (matches literal [)
grep "[]]" (matches literal ])
grep "[][]" (matches either literal ] or literal [)

But this:
grep "[[]]" (matches only the [] pair)

This ALSO matches only the [] pair:
grep "[\[\]]"

And this matches nothing:
grep "[\]\[]"

I have NO CLUE what is going on here....
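Working through the two puzzling cases with the POSIX rules (backslash literal inside [...], ] literal only when it is the first member, and a ] outside any set is just an ordinary character) gives the sketch below. It assumes GNU grep; sample is a hypothetical helper and the input lines are made up so that each pattern has something to hit.

```shell
sample() { printf '%s\n' '[' ']' '[]' '\[]' '[['; }

# "[[]]" = set {[} followed by a literal "]": prints "[]" and "\[]".
sample | grep '[[]]'

# "[\]\[]" = set {\}, then an escaped literal "[", then a literal "]":
# it needs the three characters "\[]", so only that line is printed.
sample | grep '[\]\[]'

# "[][]" = set {], [}: every line containing either bracket (all five here).
sample | grep '[][]'
```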

Tinkster 04-21-2010 02:09 PM

I still don't understand what exactly he's searching for.
If he wants to find lines that have EITHER, just do:
Code:

grep -E '\[|\]'
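A quick comparison, assuming GNU grep: the -E alternation and the bracket expression '[][]' (] first so it is literal, [ needing no escape inside the set) should pick out the same lines from the original file. Outside a bracket expression ] is not special, so the backslash before it in the -E version is not required and is dropped here; sample is a hypothetical helper.

```shell
# Hypothetical helper reproducing the original contents of "tp".
sample() { printf '%s\n' FROM DFROM a n '[' '[[' ']'; }

# Extended regex: a literal "[" or a literal "]".
sample | grep -E '\[|]'

# Basic regex bracket expression: the set {], [}.
sample | grep '[][]'
# Both commands print "[", "[[" and "]".
```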

Khaj.pandey 04-21-2010 02:23 PM

pixellany seems to have caught it.

Thanks Guys! Here is more light on the problem.

I am confused about the use of [] brackets. I know we can give ranges inside the brackets, but does a bracket expression match only one character or many?
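On the one-or-many question: a bracket expression, ranges included, always matches exactly one character, and repetition only comes from an operator such as * written after it. A quick check with GNU grep's -o option, which prints every match on its own line (the input string is made up):

```shell
# One bracket expression = one character: each digit is a separate match.
echo 'a12b' | grep -o '[0-9]'

# Repetition must be added explicitly: "[0-9][0-9]*" is "one or more digits".
echo 'a12b' | grep -o '[0-9][0-9]*'
```

The first command prints 1 and 2 on separate lines; the second prints 12.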


Say I change my file to:
Quote:

FROM
DFROM
a
n
[]
[[
]
a
9
-

When I do this:
$ grep '[\[\]]*' tp
[]
[[

I can understand that:
[] was returned since it matched [ and *
[[ was returned since it matched ] and *

Why was ] not returned?
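A sketch of how GNU grep reads that pattern under POSIX rules: [\[\] is the set {\, [}, and the trailing ]* means "zero or more ] characters", so the * applies to the literal ], not to the set. Every match therefore needs at least one \ or [, and the line ] has neither. The input mirrors the modified file from the post; sample is a hypothetical helper.

```shell
sample() { printf '%s\n' FROM DFROM a n '[]' '[[' ']' a 9 '-'; }

# Set {\, [} followed by zero or more "]": prints "[]" and "[[" only.
sample | grep '[\[\]]*'

# -o shows what was matched: "[]", then "[" twice (one match per "[" in "[[").
sample | grep -o '[\[\]]*'
```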
The tutorial says:
[0-9\-a\]] Matches Any number, or a "-", a "a", or a "]"

But the following does not return the expected results:
$ grep '^[0-9\-a\]]*' tp
a
a
9
Why was "-" not returned? Heck, even the ] was not returned. :|
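The tutorial's [0-9\-a\]] relies on backslash escapes inside the brackets, which is how Perl-style engines behave. Under POSIX rules, as in grep, the backslash is literal there: \-a becomes a range from \ to a (so the - is used up as a range operator and never listed as a member), and the first ] after it closes the set. What that range covers depends on the locale. A hedged POSIX spelling of "a digit, a -, an a, or a ]" puts ] first and - last. The sketch assumes GNU grep, sets LC_ALL=C so 0-9 is a plain ASCII range, and uses -x so the whole line must be one such character; sample is a hypothetical helper.

```shell
sample() { printf '%s\n' FROM DFROM a n '[]' '[[' ']' a 9 '-'; }

# "]" first (literal), then the range 0-9, then "a", then "-" last (literal).
# -x requires the whole line to be exactly one character from that set.
sample | LC_ALL=C grep -x '[]0-9a-]'
# Prints "a", "]", "a", "9" and "-", in file order.
```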


Cheers!

pixellany 04-21-2010 02:27 PM

Quote:

Originally Posted by Tinkster (Post 3943023)
I still don't understand what exactly he's searching for.
If he wants to find lines that have EITHER, just do:
Code:

grep -E '\[|\]'

Indeed, but he and I were both trying to use character classes (at least I was)...

Tinkster 04-21-2010 02:36 PM

Strangely enough
Code:

grep '[][]' test.txt
[
[[
]

seems to work on his original snippet.

pixellany 04-21-2010 03:58 PM

This one has already cost me a whole bottle of Excedrin......

Any chance that single vs. double quotes are significant?
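On the quoting question: in POSIX shells such as bash, a backslash inside double quotes stays special only before $, `, ", \ or a newline, so "[\[\]]" reaches grep exactly as '[\[\]]' does, and patterns without backslashes such as "[][]" are unaffected either way. A small check (not from the thread) compares what the shell actually passes:

```shell
# Both quoting styles produce the same 6-character pattern string.
single='[\[\]]'
double="[\[\]]"
if [ "$single" = "$double" ]; then
  printf 'same pattern: %s\n' "$single"
else
  printf 'different: %s vs %s\n' "$single" "$double"
fi
```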

Tinkster 04-21-2010 05:09 PM

I'm almost inclined to say that's a bug in grep... If ANYTHING,
I would have expected "[[]]" to give the desired result, rather than
"two consecutive empty character classes"... I'm about to check how that
works with other tools (e.g. perl, awk, sed, emacs...).

chrism01 04-21-2010 06:25 PM

In re other tools: I read Mastering Regular Expressions (http://regex.info/) some time ago, and it pointed out that each language/tool with a regex engine tends to have differences, ranging from minor to major, unless you use the PCRE option.
Highly recommended book btw.

Tinkster 04-21-2010 06:37 PM

Quote:

Originally Posted by chrism01 (Post 3943294)
In re other tools: I read Mastering Regular Expressions (http://regex.info/) some time ago, and it pointed out that each language/tool with a regex engine tends to have differences, ranging from minor to major, unless you use the PCRE option.
Highly recommended book btw.

Highly recommended indeed - but my copy is at home and
not at my desk ;}


Cheers,
Tink

pixellany 04-21-2010 08:45 PM

I think I've got it!!

Testing this has now gotten me into the 3rd bottle of Excedrin.

First, I think this is a regex thing and not just a grep thing.

Here is what I think is happening:

[[]] means [ in the character class, followed by a literal ]; i.e., it matches only []

[[\] means a character class of [ and a literal \; i.e., it matches [ or \ (\ is not an escape!)

[[\]] means the above plus a literal ]; i.e., it matches [ or \, followed by ]

So, inside a character class ([....]):
1: [ or ] are literal
2: \ is literal if there is nothing else around but [ or ]
3: Extra [ or ] after a char class are literal (how about before?)
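A hedged refinement of rule 2, based on the POSIX definition of bracket expressions rather than on the tests in this thread: the backslash is literal anywhere inside [...], not only next to [ or ]. The sketch assumes GNU grep and invented input lines; a set written [a\b] matches a lone backslash.

```shell
# Invented input: "a", "b", a lone backslash, and "c".
printf '%s\n' a b '\' c |
  grep -x '[a\b]'
# Inside brackets "\" is just another member: the set is {a, \, b},
# so "a", "b" and "\" are printed and "c" is not.
```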

I wonder if this is documented anywhere???

Tinkster 04-21-2010 08:52 PM

Quote:

Originally Posted by pixellany (Post 3943395)
I think I've got it!!

Testing this has now gotten me into the 3rd bottle of Excedrin.

First, I think this is a regex thing and not just a grep thing.

Yes, I'm afraid you're right; I still don't understand WHY,
though.

Quote:

Originally Posted by pixellany (Post 3943395)
Here is what I think is happening:

[[]] means [ in the character class, followed by another literal ]---ie it matches only []

But that's not how character classes are supposed to work;
if the outer [] were taken as a class, [][] and [[]]
should be equivalent (which they're not). E.g., "[ab]" is
meant to be "either a or b will do".

My *guess* is that, since [][] works and we get matches for
any combination of individual or paired square brackets, for some
reason the regex implementation "expects" a named
character class when it finds two opening [[ brackets, and
then gives up if there's no :<something_posix>: in there.
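For comparison, a hedged sketch of where POSIX does give a nested [ a special meaning: only when it is immediately followed by :, . or =, as in the named class [[:digit:]]. Otherwise [ is an ordinary member of the set, so [[]] parses as the set {[} plus a literal ], which is why it matches only the pair []. This assumes GNU grep; sample is a hypothetical helper and the input lines are made up.

```shell
sample() { printf '%s\n' '[' ']' '[]' '7'; }

# "[[:digit:]]" is a named character class: prints "7".
sample | grep '[[:digit:]]'

# "[[]" is the set {[}: prints "[" and "[]".
sample | grep '[[]'

# Adding "]" after that set requires a literal "]" next: prints "[]" only.
sample | grep '[[]]'
```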

Just a guess.



Cheers,
Tink

pixellany 04-21-2010 09:15 PM

sed 's/]/new/' treats "]" as a literal

sed 's/[/new/' produces an error

sed 's/[[a]/new/' matches [ OR a


Postulate:
Whenever "[" is encountered, the following characters are taken as literal until the first "]". After that, "]" is literal, but "[" is not.
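The sed observations can be reproduced with the sketch below, assuming GNU sed (which uses the same POSIX basic regex rules as grep) and made-up input strings. It also shows one exception to the postulate seen earlier in the thread: a ] placed first inside the brackets is a member rather than the end of the set.

```shell
# "]" outside a bracket expression is an ordinary character: prints "anewb".
echo 'a]b' | sed 's/]/new/'

# A lone "[" opens a set that is never closed, so sed reports an error.
echo 'a[b' | sed 's/[/new/' || echo "sed exit status: $?"

# "[[a]" is the set {[, a}; with the g flag both are replaced: prints "xnewnew".
echo 'x[a' | sed 's/[[a]/new/g'

# Exception: a "]" right after "[" is a member, not the end: prints "xnew".
echo 'x]' | sed 's/[]]/new/'
```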

The only thing I have not tested is **other** special characters inside the char class.


All times are GMT -5.