Regular expressions
Hi,
I am following this tutorial to learn regular expressions: http://www.grymoire.com/Unix/Regular.html (written for Solaris). I am on RHL. I am having trouble searching for the characters [ and ] in my file.
File contents are: FROM DFROM a n [ [[ ]
When I run the following, it fails:
$ grep '[]' tp
grep: Unmatched [ or [^
This does not work either:
$ grep '[\[\]]' tp
Can you guys point out what I am doing wrong? Thanks! |
Not sure what your problem is:
Code:
$ grep '\[\]' * |
The example above finds only the pair "[]". I think the OP wants to find either [ or ].
Interesting problem: once inside the outer [] pair, it seems that I can use either [ or ] without escaping. For example, these constructs work as expected:
grep "[[]" (matches literal [)
grep "[]]" (matches literal ])
grep "[][]" (matches either literal ] or literal [)
But this matches only the [] pair:
grep "[[]]"
This ALSO matches only the [] pair:
grep "[\[\]]"
And this matches nothing:
grep "[\]\[]"
I have NO CLUE what is going on here.... |
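For anyone who wants to reproduce these observations, here is a minimal sketch. It assumes GNU grep under LC_ALL=C; the helper name matched_lines and the sample lines are made up for illustration and are not from the thread. The comments read each pattern the way POSIX defines bracket expressions: a ] placed first in the list is literal, and a backslash inside the brackets is an ordinary character.

```shell
# Hypothetical helper: print the sample lines that match the pattern,
# joined by single spaces.
matched_lines() {
    printf '%s\n' 'a' '[' ']' '[]' '\[]' |
        LC_ALL=C grep -e "$1" | paste -s -d ' ' -
}

matched_lines '[[]'      # set containing [
matched_lines '[]]'      # set containing ] (literal because it comes first)
matched_lines '[][]'     # set containing ] and [
matched_lines '[[]]'     # set containing [, then a literal ]
matched_lines '[\[\]]'   # set containing \ and [, then a literal ]
matched_lines '[\]\[]'   # set containing \, then an escaped [, then a literal ]
```

Read this way, the last pattern only matches the three-character sequence \[], which would explain why it found nothing in a file without backslashes.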
I still don't understand what exactly he's searching for.
If he wants to find lines that have EITHER, just do: Code:
grep -E '\[|\]' |
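As a rough check that the alternation and a single bracket expression select the same lines, here is a small sketch. The sample lines and variable names are assumptions for illustration only.

```shell
# Made-up sample lines, one per line.
sample() { printf '%s\n' 'FROM' '[' '[[' ']' 'a'; }

# Extended regex with alternation: an escaped [ or an escaped ].
with_alternation=$(sample | grep -E '\[|\]')

# Basic regex with one bracket expression: ] first, then [.
with_bracket=$(sample | grep '[][]')

printf '%s\n' "$with_alternation"
```

Both forms should print the three lines that contain a bracket; which one to use is mostly a matter of taste.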
pixellany seems to have caught it.
Thanks, guys! Here is more light on the problem. I am confused regarding the use of [] brackets. I know we can give ranges in the brackets, but does it match only one character or many? Say I change my file to Quote:
$ grep '[\[\]]*' tp
[] [[
I can understand [] was returned since it matched [ and *
[[ was returned since it matched ] and *
Why was ] not returned?
The tutorial mentions:
[0-9\-a\]] Matches any number, or a "-", a "a", or a "]"
But the following does not return the expected results:
$ grep '^[0-9\-a\]]*' tp
a a 9
Why was "-" not returned? Heck, even the ] was not returned.. :|
Cheers! |
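One possible reading of these results, going by the POSIX rules for bracket expressions: a bracket expression matches exactly one character, and a following * repeats it zero or more times. Backslashes inside the brackets are ordinary characters, so '[\[\]]*' is the set [\[\] (backslash or [) followed by zero or more ], and a line holding only ] gives the set nothing to match. In '[0-9\-a\]]', the \-a part is parsed as a range from backslash to a, so - is not in the set. A minimal sketch, assuming GNU grep with LC_ALL=C and made-up sample lines, puts ] first and - last so that both are literal:

```shell
# Made-up sample lines, one per line.
sample() { printf '%s\n' 'a' '9' '-' ']' 'n'; }

# ] first and - last are both literal inside a bracket expression,
# so no backslashes are needed (and they would not act as escapes anyway).
good=$(sample | LC_ALL=C grep '^[]0-9a-]$' | paste -s -d ' ' -)
echo "$good"

# The tutorial's spelling, tried against a line holding only "-":
# \-a is read as a range and the set closes at the first ], so the
# pattern also requires a literal ] afterwards. grep -c prints the
# number of matching lines.
tutorial=$(printf '%s\n' '-' | LC_ALL=C grep -c '[0-9\-a\]]' || true)
echo "$tutorial"
```

How the range from backslash to a is interpreted can depend on the locale, which is why the sketch pins LC_ALL=C.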
Strangely enough
Code:
grep '[][]' test.txt |
This one has already cost me a whole bottle of Excedrin......
any chance that single vs double quotes is significant? |
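On the quoting question: for these particular patterns it should make no difference in a POSIX-style shell. Inside double quotes a backslash is only special before $, a backquote, ", \ or a newline, and [ and ] are not special in either kind of quotes. A small sketch to confirm that the shell hands grep the same string either way (the variable names are arbitrary):

```shell
# The same pattern written with single and with double quotes.
single='[\[\]]'
double="[\[\]]"

if [ "$single" = "$double" ]; then
    same=yes
else
    same=no
fi
echo "$same"
```

Quoting would start to matter for patterns containing $ or backquotes, which none of the patterns in this thread do.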
I'm almost inclined to say that's a bug in grep ... if ANYTHING
I would have expected "[[]]" to give the desired result, rather than "two consecutive empty character classes" ... about to check how that works with other tools (e.g. perl, awk, sed, emacs ... ) |
In re other tools, I read Mastering Regular Expressions (http://regex.info/) some time ago, and it pointed out that each language or tool that has a regex engine tends to have differences that vary from minor to major, unless it uses the PCRE option.
Highly recommended book btw. |
Quote:
not at my desk ;} Cheers, Tink |
I think I've got it!!
Testing this has now gotten me into the 3rd bottle of Excedrin. First, I think this is a regex thing and not just a grep thing. Here is what I think is happening:
[[]] means [ in the character class, followed by a literal ], i.e. it matches only []
[[\] means a character class of [ and literal \, i.e. it matches [ or \ (\ is not an escape!)
[[\]] means the above + a literal ], i.e. it matches ([ OR \) followed by ]
So, inside a character class ([....]):
1: [ or ] are literal
2: \ is literal if there is nothing else around but [ or ]
3: Extra [ or ] after a char class are literal (how about before?)
I wonder if this is documented anywhere??? |
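This does appear to be documented: the POSIX description of bracket expressions says that ] is literal when it comes first in the list, and that [ and backslash lose their special meaning inside the brackets. A minimal sketch of the three readings above, assuming GNU grep (the -o option is an extension, and the helper name first_match and the input strings are made up):

```shell
# Hypothetical helper: print only the first piece of text that the
# pattern matched in the given string.
first_match() {
    printf '%s\n' "$2" | LC_ALL=C grep -o -e "$1" | head -n 1
}

first_match '[[]]'  'x[]y'    # set containing [, then a literal ]
first_match '[[\]'  'x\y'     # set containing [ and \ (backslash is a member)
first_match '[[\]]' 'x\]y'    # set containing [ and \, then a literal ]
first_match '[][]'  'x]y'     # ] first in the list is literal
```

Seeing the matched text rather than the whole line makes it easier to tell which characters ended up inside the set and which were left outside it.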
Quote:
though. Quote:
and if the outer [] were taken as a class, [][] and [[]] should be equivalent (which they're not). E.g., "[ab]" is meant to be "either a or b will do". My *guess* is that, since [][] works and we get matches for any combo of individual or paired square brackets, for some reason the regex implementation "expects" to get a named character class when it finds two opening [[ brackets, and then gives up if there's no :<something_posix>: in there. Just a guess. Cheers, Tink |
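On the named-class guess: as far as POSIX goes, a [ inside a bracket expression only starts something special when it is immediately followed by :, . or =, as in [[:digit:]]. Otherwise it is an ordinary member of the set. A minimal sketch, assuming GNU grep under LC_ALL=C, with a made-up helper name and made-up input lines:

```shell
# Hypothetical helper: count the sample lines that match the pattern.
count() {
    printf '%s\n' 'a1' '[' '[]' ':' | LC_ALL=C grep -c -e "$1" || true
}

count '[[:digit:]]'   # named class inside a bracket expression
count '[[]'           # plain set containing [
count '[[]]'          # set containing [, then a literal ]
```

On this reading, [[]] is not a failed named class but simply a one-member set followed by a literal ].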
sed 's/]/new/' ("]" is treated as a literal)
sed 's/[/new/' produces an error
sed 's/[[a]/new/' matches [ OR a
Postulate: Whenever "[" is encountered, the following characters are taken as literal until the first "]". After that, "]" is literal, but "[" is not.
The only thing I have not tested is some **other** special characters inside the char class. |
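The sed experiments above can be replayed with a sketch like the following. The input strings and variable names are made up, and the exact error message for the unclosed [ varies between sed implementations, so only the exit status is checked.

```shell
# ] outside a bracket expression is an ordinary character.
r1=$(printf '%s\n' 'a]b' | sed 's/]/new/')

# [ must be closed; [[] is a set containing only [.
r2=$(printf '%s\n' 'a[b' | sed 's/[[]/new/')

# [[a] is a set containing [ and a; the first match on the line is replaced.
r3=$(printf '%s\n' 'xa[' | sed 's/[[a]/new/')

# An unclosed [ swallows the rest of the command, so sed rejects it.
if printf '%s\n' 'a[b' | sed 's/[/new/' >/dev/null 2>&1; then
    r4=accepted
else
    r4=rejected
fi

printf '%s\n' "$r1" "$r2" "$r3" "$r4"
```

This lines up with the postulate: [ opens a set that runs to the next ], while a ] with no open set is just a character.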