LinuxQuestions.org
Old 08-22-2014, 04:08 AM   #1
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 703

Rep: Reputation: 111
shell scripting - Grep a character class containing '-'


Ok, I admit it; I don't really see the solution yet...

I have a script that, among other things, greps files for 'words'.
The definition of a word (as I thought of it as a grep expression) is: "@[a-zA-Z0-9\.\-_]+@"

But it does not find a word like: '@foo.bar-test@'; which I thought it should detect.
The problem is that [[:alnum:][:punct:]] matches too much (eg: @foo=bar@)

I tried to use grep in various ways: grep -o -e, grep -o -E, grep -o -- etc.
I also tried double-escaping the dash (it seems the only character causing trouble is the dash, so I'm focusing on that one now).

So far I've had the following commands to test it out:

Code:
echo "@foo.bar-match@" | grep "@[a-zA-Z0-9\.\-_]+@"
echo "@foo.bar-match@" | grep '@[a-zA-Z0-9\.\-_]+@' # different quotes
echo "@foo.bar-match@" | grep -e '@[a-zA-Z0-9\.\-_]+@'
echo "@foo.bar-match@" | grep -- '@[a-zA-Z0-9\.\-_]+@' # maybe the shell tries to interpret
echo "@foo.bar-match@" | grep '@[a-zA-Z0-9\.\\\-_]+@'
echo "@foo.bar-match@" | grep -e '\@[a-zA-Z0-9\.\-\_]+\@'
echo "@foo.bar-match@" | grep '@[a-zA-Z0-9\.[-]_]+@'

And a bunch more, actually, for variety (e.g. combining -e and --, switching quotes, combining various ways of escaping). I'm a bit lost... why does grep not "see" what should match?

Maybe one of the gurus will see it immediately (regexes always give me headaches).
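
Editor's aside on the commands above: none of them passes -E, so grep reads the pattern as a basic regular expression, where an unescaped + is a literal plus sign rather than "one or more". That is a separate issue from the dash. A minimal sketch, assuming GNU grep (the variable names are made up for this illustration):

```shell
# Count matching lines in basic (default) and extended (-E) mode.
# "|| true" only swallows grep's non-zero exit status when nothing matches.
bre_count=$(printf '%s\n' '@foo@' | grep -c  '@[a-z]+@' || true)
ere_count=$(printf '%s\n' '@foo@' | grep -cE '@[a-z]+@' || true)

# In basic mode the same pattern wants one letter followed by a literal "+".
bre_literal=$(printf '%s\n' '@f+@' | grep -c '@[a-z]+@' || true)

echo "BRE: $bre_count  ERE: $ere_count  BRE on '@f+@': $bre_literal"
```

Only the -E variant treats + as a quantifier, which is why the later attempts in this thread all use grep -E.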

Last edited by Ramurd; 08-22-2014 at 04:10 AM. Reason: fixed code-tags
 
Old 08-22-2014, 04:14 AM   #2
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 703

Original Poster
Rep: Reputation: 111
Maybe I should phrase the definition of a word here:
- A string starting and ending with '@'
-- Containing, between these two '@':
--- at least one of:
---- lowercase letter (a-z)
---- uppercase letter (A-Z)
---- number (0-9)
---- period (.)
---- dash (-)
---- underscore (_)
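
Editor's aside: the definition above can be sketched as an extended regular expression with the dash placed last in the bracket expression, as the later replies recommend. This assumes GNU grep (for -o) and the C locale so that [:alnum:] means only a-z, A-Z and 0-9; the variable names word_re and words are made up for this illustration.

```shell
# One or more of: letter, digit, period, underscore, dash, between two '@'.
# The dash is last in the bracket expression so it is taken literally.
word_re='@[[:alnum:]._-]+@'

# -o prints each match on its own line; -E enables the + quantifier.
words=$(printf '%s\n' 'path=@foo.bar-test@ and @foo=bar@ plus @A_1@' |
        LC_ALL=C grep -oE "$word_re")
printf '%s\n' "$words"
```

On the sample line this should print @foo.bar-test@ and @A_1@, and skip @foo=bar@ because = is not in the set.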
 
Old 08-22-2014, 05:08 AM   #3
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 703

Original Poster
Rep: Reputation: 111
ok, this is weird...

Code:
echo "@foo.bar-match@" | grep -E '@[[:alnum:]_.\-]+@'
does recognize this string
Code:
echo "@foo.bar-match@" | grep -E '@[[:alnum:]_\.\-]+@'
as well
and
Code:
echo "@foo.bar=match@" | grep -E '@[[:alnum:]_.\-]+@'
does not match (which is intended)

Funny how the non-escaped dot does not have the special meaning 'any character'; I kinda forgot that one :-)
It seems that the location of the dash in the "character set" makes the difference... Never would've thought it would.
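
Editor's aside on why the position matters: inside [...] a backslash is an ordinary character, so \-_ in the original pattern is read as the range from \ to _ (in ASCII: \ ] ^ _), and the dash itself never becomes a member of the set. A small sketch, assuming GNU grep and the C locale (the helper name check is made up for this illustration):

```shell
# check PATTERN STRING: report whether STRING matches PATTERN under grep -E.
check() {
    if printf '%s\n' "$2" | LC_ALL=C grep -qE "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

check '@[a-zA-Z0-9\.\-_]+@' '@foo.bar-match@'   # dash mid-list: forms a range
check '@[a-zA-Z0-9\.\-_]+@' '@foo^bar@'         # ^ lies inside that accidental range
check '@[a-zA-Z0-9._-]+@'   '@foo.bar-match@'   # dash last: literal dash
```

The first call should report no match, while the second and third report a match, which shows that the original class accepted ^ but not -.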
 
Old 08-22-2014, 05:43 AM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
I am not sure I understand the issue, suffice it to say, anything inside [] does not get interpreted.
Code:
echo "@foo.bar-match@" | grep -E '@[[:alnum:]_.-]+@'
The only other point I would make is that keeping the hyphen at the end also tells [] that you are not providing a range, like [a-z].
 
Old 08-22-2014, 11:07 AM   #5
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,779

Rep: Reputation: 2212
As stated in the grep manpage under "Character Classes and Bracket Expressions":
"Most meta-characters lose their special meaning inside bracket expressions. To include a literal ] place it first in the list. Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal - place it last."
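
Editor's aside: the three placement rules quoted above can be combined in a single bracket expression. A minimal sketch, assuming GNU grep and the C locale (the variable name matched is made up for this illustration):

```shell
# []^-] holds three literals: ] (placed first), ^ (not first), - (placed last).
matched=$(printf '%s\n' 'a]b' 'a^b' 'a-b' 'a.b' | LC_ALL=C grep -E 'a[]^-]b')
printf '%s\n' "$matched"
```

This should print a]b, a^b and a-b, but not a.b, since the period is not in the set.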
 
Old 08-25-2014, 01:53 AM   #6
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 703

Original Poster
Rep: Reputation: 111
I did grab the man page and tried to search the entire interwebs. While specifically looking for this issue I still overlooked that part in the manpage; However, I did read this line:

"Most meta-characters lose their special meaning inside bracket expressions"; but I could not find the specification of "most"; I guess I need new glasses or learn to read... or...

Quote:
suffice it to say, anything inside [] does not get interpreted.
And that's the thing, actually: I read "most", not "anything", in the manpage, which put me on the wrong foot (if that's the proper saying).

Still, thanks for the replies! It's fixed now :-) Marking thread as solved.
 
Old 08-25-2014, 09:19 AM   #7
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,779

Rep: Reputation: 2212
Quote:
Originally Posted by Ramurd
I did grab the man page and tried to search the entire interwebs. While specifically looking for this issue I still overlooked that part in the manpage; However, I did read this line:

"Most meta-characters lose their special meaning inside bracket expressions"; but I could not find the specification of "most"; I guess I need new glasses or learn to read... or...
The rest of that paragraph details the only meta-characters that still have special meaning:
"To include a literal ] place it first in the list. Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal - place it last."
Those are the only three.
 
  

