LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 04-21-2010, 01:23 PM   #1
Khaj.pandey
LQ Newbie
 
Registered: Apr 2010
Posts: 12

Rep: Reputation: 0
Regular expressions


Hi,
I am working through the following tutorial to learn regular expressions.

http://www.grymoire.com/Unix/Regular.html (written for solaris)

I am on RHL.

I am having trouble searching for the characters [ and ] in my file.

File contents are :

FROM
DFROM
a
n
[
[[
]


When I run the following, it fails:
$ grep '[]' tp
grep: Unmatched [ or [^


This does not work either:
$ grep '[\[\]]' tp

Can you guys point out what I am doing wrong?

Thanks!
 
Old 04-21-2010, 01:29 PM   #2
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Not sure what your problem is:
Code:
$ grep '\[\]' *
numeral.cxx:  char *ones[] = {"","I","II","III","IV","V","VI","VII","VIII","IX"};
numeral.cxx:  char *tens[] = {"","X","XX","XXX","XL","L","LX","LXX","LXXX","XC"};
numeral.cxx:  char *hundreds[] = {"","C","CC","CCC","CD","D","DC","DCC","DCCC","CM"};
numeral.cxx~:  char *ones[] = {"","I","II","III","IV","V","VI","VII","VIII","IX"};
numeral.cxx~:  char *tens[] = {"","X","XX","XXX","XL","L","LX","LXX","LXXX","XC"};
numeral.cxx~:  char *hundreds[] = {"","C","CC","CCC","CD","D","DC","DCC","DCCC","CM"};
order.awk:  print gensub(/([^\[]+)\[([^\],]+),([^\]]+)\]/, "\1 FROM \2 FOR \3", "1" )}
 
Old 04-21-2010, 02:04 PM   #3
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
The example above finds only the pair "[]". I think the OP wants to find either [ or ].

Interesting problem:

Once inside the outer [] pair, it seems that I can use either [ or ] without escaping, eg these constructs work as expected:

grep "[[]" (matches literal [)
grep "[]]" (matches literal ])
grep "[][]" (matches either literal ] or literal [)

But this:
grep "[[]]" (matches only the [] pair)

This ALSO matches only the [] pair:
grep "[\[\]]"

And this matches nothing:
grep "[\]\[]"

I have NO CLUE what is going on here....
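
For anyone following along, here is a minimal sketch that reproduces the cases above. It assumes POSIX bracket-expression rules as grep implements them: a "]" placed first inside the class is a literal member, and a backslash inside the class is an ordinary character rather than an escape. The temporary file and the variable names are made up for the illustration, and a "[]" line is added to the original sample so the "pair" case has something to match.

```shell
# Sample file modelled on the original post, plus a "[]" line for the pair case.
tp=$(mktemp)
printf '%s\n' 'FROM' 'DFROM' 'a' 'n' '[' '[[' ']' '[]' > "$tp"

# "]" placed first is a literal member, so this class is { ], [ }.
either=$(grep '[][]' "$tp")

# Here the class is just { [ }. The trailing "]" sits outside the class as an
# ordinary character, so only the two-character sequence "[]" matches.
pair=$(grep '[[]]' "$tp")

# Backslash is literal inside a bracket expression: the class is { \, [ },
# again followed by an ordinary "]".
escaped=$(grep '[\[\]]' "$tp")

printf 'either:\n%s\npair:\n%s\nescaped:\n%s\n' "$either" "$pair" "$escaped"
rm -f "$tp"
```

Read this way, "[[]]" and "[\[\]]" are each a one-character class followed by a literal "]", which would explain why both match only the pair.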
 
Old 04-21-2010, 02:09 PM   #4
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
I still don't understand what exactly he's searching for.
If he wants to find lines that have EITHER, just do:
Code:
grep -E '\[|\]'
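
A small sketch comparing this alternation with the character-class form, run against a copy of the original sample file. The temporary file and variable names are assumptions made for the example; the two patterns are the ones quoted in this thread.

```shell
tp=$(mktemp)
printf '%s\n' 'FROM' 'DFROM' 'a' 'n' '[' '[[' ']' > "$tp"

# Extended-regex alternation: a literal "[" or a literal "]".
by_alternation=$(grep -E '\[|\]' "$tp")

# Bracket expression with "]" listed first so that it is taken literally.
by_class=$(grep '[][]' "$tp")

if [ "$by_alternation" = "$by_class" ]; then
    echo "same lines selected"
fi
rm -f "$tp"
```

Both forms should select the three lines that contain a bracket, so the choice between them is mostly a matter of taste.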
 
Old 04-21-2010, 02:23 PM   #5
Khaj.pandey
LQ Newbie
 
Registered: Apr 2010
Posts: 12

Original Poster
Rep: Reputation: 0
pixellany seems to have caught it.

Thanks Guys! Here is more light on the problem.

I am confused about the use of [] brackets. I know we can give ranges inside the brackets, but does a bracket expression match only one character or many?


Say I change my file to
Quote:
FROM
DFROM
a
n
[]
[[
]
a
9
-
When I do this:
$ grep '[\[\]]*' tp
[]
[[

I can understand
[] was returned since it matched [ and *
[[ was returned since it matched ] and *

Why was ] not returned?
The tutorial mentions :
[0-9\-a\]] Matches Any number, or a "-", a "a", or a "]"

But the following does not return expected results.
$ grep '^[0-9\-a\]]*' tp
a
a
9
Why was "-" not returned? Heck, even the ] was not returned. :|


Cheers!

Last edited by Khaj.pandey; 04-21-2010 at 02:24 PM.
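
A hedged reading of the tutorial example, assuming POSIX bracket-expression rules: a backslash inside [...] is an ordinary character, so `[0-9\-a\]]` is parsed as the class `[0-9\-a\]` followed by a literal "]", and `\-a` becomes a range whose contents depend on the locale. The usual way to get a literal "]" and "-" into a class is by position instead: "]" first and "-" last. The sketch below uses a copy of the second sample file; the temporary file and the variable name are made up for the illustration.

```shell
tp=$(mktemp)
printf '%s\n' 'FROM' 'DFROM' 'a' 'n' '[]' '[[' ']' 'a' '9' '-' > "$tp"

# "]" goes first and "-" goes last so that both are literal members.
# The class is: "]", the digits 0-9, "a", and "-". No backslashes are needed.
wanted=$(grep '^[]0-9a-]' "$tp")

printf '%s\n' "$wanted"
rm -f "$tp"
```

On the earlier question: a bracket expression on its own matches exactly one character; it is the `*` that follows which allows repetition, and in the commands above that `*` ends up attached to the literal "]" after the class.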
 
Old 04-21-2010, 02:27 PM   #6
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Quote:
Originally Posted by Tinkster
I still don't understand what exactly he's searching for.
If he wants to find lines that have EITHER, just do:
Code:
grep -E '\[|\]'
Indeed---but he and I were both trying to use character classes---at least I was...
 
Old 04-21-2010, 02:36 PM   #7
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Strangely enough
Code:
grep '[][]' test.txt
[
[[
]
seems to work on his original snippet.
 
Old 04-21-2010, 03:58 PM   #8
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
This one has already cost me a whole bottle of Excedrin......

Any chance that single vs double quotes is significant?
 
Old 04-21-2010, 05:09 PM   #9
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
I'm almost inclined to say that's a bug in grep ... if ANYTHING
I would have expected "[[]]" to give the desired result, rather than
"two consecutive empty character classes" ... about to check how that
works with other tools (e.g. perl, awk, sed, emacs ... )
 
Old 04-21-2010, 06:25 PM   #11
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
In re other tools, I read Mastering Regular Expressions http://regex.info/ some time ago and it pointed out that each language/tool that has a regex engine tends to have differences that range from minor to major, unless it uses the PCRE option.
Highly recommended book btw.
 
Old 04-21-2010, 06:37 PM   #12
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Quote:
Originally Posted by chrism01
In re other tools, I read Mastering Regular Expressions http://regex.info/ some time ago and it pointed out that each language/tool that has a regex engine tends to have differences that range from minor to major, unless it uses the PCRE option.
Highly recommended book btw.
Highly recommended indeed - but my copy is at home and
not at my desk ;}


Cheers,
Tink
 
Old 04-21-2010, 08:45 PM   #13
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
I think I've got it!!

Testing this has now gotten me into the 3rd bottle of Excedrin.

First, I think this is a regex thing and not just a grep thing.

Here is what I think is happening:

[[]] means [ in the character class, followed by another literal ]---ie it matches only []

[[\] means a character class of [ and literal \----ie it matches [ or \ (\ is not an escape!)

[[\]] means the above + a literal ], i.e. it matches ([ OR \) followed by ]

So---inside a character class ([....]):
1: [ or ] are literal
2: \ is literal if there is nothing else around but [ or ]
3: Extra [ or ] after a char class are literal (how about before?)

I wonder if this is documented anywhere???
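
A quick way to check rule 2, sketched under the assumption that grep follows the POSIX rule that a backslash inside a bracket expression loses its special meaning. The three-line test file and the variable names are invented for the illustration.

```shell
tp=$(mktemp)
# Three one-character lines: a backslash, "[" and "]".
printf '%s\n' '\' '[' ']' > "$tp"

# The class is { [, \ }: the backslash does not escape the "]" that closes it.
open_or_backslash=$(grep '[[\]' "$tp")

# Count the lines matched by a class that holds only a backslash.
backslash_only=$(grep -c '[\]' "$tp")

printf '%s\n' "$open_or_backslash"
echo "$backslash_only"
rm -f "$tp"
```

If the rule holds, the first command prints the backslash line and the "[" line, and the count is 1. As for documentation, the bracket-expression rules in the POSIX regular expression specification and in the grep manual appear to be the place to look.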
 
Old 04-21-2010, 08:52 PM   #14
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Quote:
Originally Posted by pixellany
I think I've got it!!

Testing this has now gotten me into the 3rd bottle of Excedrin.

First, I think this is a regex thing and not just a grep thing.
Yes, I'm afraid you're right; I still don't understand WHY,
though.

Quote:
Originally Posted by pixellany
Here is what I think is happening:

[[]] means [ in the character class, followed by another literal ]---ie it matches only []
But that's not how character classes are supposed to work;
and if the outer [] were taken as a class, [][] and [[]]
should be equivalent (which they're not). E.g., "[ab]" is
meant to be "either a or b will do".

My *guess* is that, since [][] works and we get matches for
any combo of individual or paired square brackets, for some
reason the regex implementation "expects" to get a named
character class when it finds two opening [[ brackets, and
then gives up if there's no :<something_posix>: in there.

Just a guess.



Cheers,
Tink

Last edited by Tinkster; 04-21-2010 at 08:53 PM.
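
One way to probe the guess above, sketched with a made-up four-line file and invented variable names. As far as the POSIX rules go, it is the sequence "[:" inside a bracket expression that starts a named class, so "[[" on its own should not trigger anything special.

```shell
tp=$(mktemp)
printf '%s\n' 'a' '9' '[' ':' > "$tp"

# "[:digit:]" is only recognised inside an enclosing bracket expression.
digits=$(grep '[[:digit:]]' "$tp")

# Without the ":name:" part, "[[]" is an ordinary class holding just "[".
open_only=$(grep '[[]' "$tp")

printf 'digits: %s\nopen_only: %s\n' "$digits" "$open_only"
rm -f "$tp"
```

If that reading is right, "[[]]" is not a failed named class at all: it is the class "[[]" followed by a literal "]".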
 
Old 04-21-2010, 09:15 PM   #15
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
sed 's/]/new/' ("]" is treated as a literal)

sed 's/[/new/' (produces an error)

sed 's/[[a]/new/' (matches [ OR a)


Postulate:
Whenever "[" is encountered, the following characters are taken as literal until the first "]". After that, "]" is literal, but "[" is not.

The only thing I have not tested is **other** special characters inside the char class.

Last edited by pixellany; 04-21-2010 at 09:16 PM.
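
The three sed cases above can be run as a small script. This is only a sketch: the input strings "x]y" and "x[y" and the variable names are made up, and the last command (escaping "[" outside a class) is an addition that is not in the post.

```shell
# "]" outside a bracket expression is an ordinary character.
close_replaced=$(printf '%s\n' 'x]y' | sed 's/]/new/')

# "[[a]" is a class holding "[" and "a"; the first match on the line is replaced.
class_replaced=$(printf '%s\n' 'x[y' | sed 's/[[a]/new/')

# A lone "[" opens a class that never closes, so sed rejects the expression.
if printf '%s\n' 'x[y' | sed 's/[/new/' >/dev/null 2>&1; then
    unterminated=accepted
else
    unterminated=rejected
fi

# Outside a class, a backslash makes "[" literal again.
open_replaced=$(printf '%s\n' 'x[y' | sed 's/\[/new/')

printf '%s\n' "$close_replaced" "$class_replaced" "$unterminated" "$open_replaced"
```

This fits the postulate with one refinement: a "]" listed first inside the class is taken as a member rather than as the closing bracket.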
 
  

