LinuxQuestions.org
Old 08-20-2016, 11:32 AM   #1
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,879

Character class exclusion


I have Work1, a file of English words.

As a learning exercise I coded this ...
Code:
echo "Find 5-character words which have the same letter in positions 1 and 3."
echo "  Examples: fifth, mamma, sassy, total."
egrep '^(.).\1..$' $Work1 >$OutFile
... and it works.

To make the exercise more interesting I coded this ...
Code:
echo "Find 5-character words which have the same letter in positions 1 and 3"
echo "   --and-- the character in positions 1 and 3 is not used elsewhere."
echo "  Examples: fifth, total."
egrep '^(.)[^\1]\1[^\1][^\1]$' $Work1 >$OutFile
... and it produces an OutFile identical to the first exercise. Evidently using [^\1] to exclude a specific character from a character class is not doing the job.

Please advise.

Daniel B. Martin
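
A minimal sketch of why the second command behaves like the first, assuming GNU grep: inside a bracket expression the backslash is an ordinary character, so [^\1] means "any character other than a backslash or the digit 1", not "anything except what group 1 captured". The four two-character test strings below are made up for illustration.

```shell
# Hypothetical test lines: "aa", "ab", "a1" and "a\".
# If [^\1] really excluded the captured letter, "aa" would be rejected.
# Instead only the lines ending in "1" or "\" are rejected.
matches=$(printf '%s\n' 'aa' 'ab' 'a1' 'a\' | grep -E '^(a)[^\1]$')
printf '%s\n' "$matches"
```

With GNU grep this should print aa and ab, which shows that the captured letter is not being excluded at all.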
 
Old 08-20-2016, 02:01 PM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

I found a solution but it is a little out there. Also my reference was posted over 5 years ago so there may be an alternative now.
Anyhoo, here is what worked:
Code:
grep -P '^(.)(?:(?!\1).)\1(?:(?!\1).)(?:(?!\1).)$' word_file
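It seems you cannot negate a back reference inside a bracket expression (there, [^\1] simply means any character other than a backslash or the digit 1), but you can use a negative look-ahead. You will notice I have also switched from -E (what you are using) to -P for Perl-compatible regular expressions, which support look-aheads.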

I will be interested to see if there is an alternative or even a way to shorten the current solution
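
To see the difference between the two exercises side by side, here is a small sketch that runs both patterns over the four example words from the first post. It assumes a grep built with PCRE support (needed for -P); the variable names are only for illustration.

```shell
# Sample list: the four example words quoted in the first post.
words=$(printf '%s\n' fifth mamma sassy total)

# First exercise: same letter in positions 1 and 3.
same_1_3=$(printf '%s\n' "$words" | grep -E '^(.).\1..$')

# Second exercise: each (?!\1) also rejects that letter in positions 2, 4 and 5.
unique_1_3=$(printf '%s\n' "$words" | grep -P '^(.)(?:(?!\1).)\1(?:(?!\1).)(?:(?!\1).)$')

printf 'first exercise:\n%s\n' "$same_1_3"
printf 'second exercise:\n%s\n' "$unique_1_3"
```

The first pattern keeps all four words, while the second should keep only fifth and total, because mamma and sassy repeat their first letter in position 4.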
 
Old 08-20-2016, 08:00 PM   #3
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,774

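Well, this is not shorter, but (IMO) more readable (basically a straightforward translation of each condition into awk). Note that an empty -F splitting the line into one field per character is an extension supported by gawk; POSIX leaves the behaviour of an empty field separator unspecified: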
Code:
awk -F '' 'NF == 5 && $1 == $3 && !index($2 substr($0, 4), $1)' input
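
If the awk at hand does not split on an empty field separator, the same three conditions can be written with substr() instead. This is only a sketch of that idea; the function name first_third_unique and the sample words are made up for the example.

```shell
# Portable variant: pick characters with substr() instead of relying on -F ''.
first_third_unique() {
    awk '{
        first = substr($0, 1, 1)
        rest  = substr($0, 2, 1) substr($0, 4)   # positions 2, 4 and 5
    }
    length($0) == 5 && first == substr($0, 3, 1) && !index(rest, first)'
}

# "tot" is included to show that the length check filters short words.
result=$(printf '%s\n' fifth mamma sassy total tot | first_third_unique)
printf '%s\n' "$result"
```

The conditions map one to one onto the original: five characters long, positions 1 and 3 equal, and the first letter absent from positions 2, 4 and 5.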
 
Old 08-21-2016, 05:43 AM   #4
keefaz
LQ Guru
 
Registered: Mar 2004
Distribution: Slackware
Posts: 6,552

You can make it shorter by removing the (?: ... ) groups
Code:
grep -P '^(.)(?!\1).\1(?!\1).(?!\1).$' words
 
Old 08-21-2016, 11:13 AM   #5
danielbmartin
Original Poster
Quote:
Originally Posted by keefaz View Post
Code:
grep -P '^(.)(?!\1).\1(?!\1).(?!\1).$' words
Concise, correct, and over my head! Please elaborate and explain.

Daniel B. Martin
 
Old 08-21-2016, 11:58 AM   #6
keefaz
Quote:
Originally Posted by perlre
Lookaround assertions are zero-width patterns which match a specific pattern without including it in $& . Positive assertions match when their subpattern matches, negative assertions match when their subpattern fails. Lookbehind matches text up to the current match position, lookahead matches text following the current match position.
Code:
(?!pattern)
A zero-width negative lookahead assertion. For example /foo(?!bar)/ matches any occurrence of "foo" that isn't followed by "bar".

If you are looking for a "bar" that isn't preceded by a "foo", /(?!foo)bar/ will not do what you want. That's because the (?!foo) is just saying that the next thing cannot be "foo"--and it's not, it's a "bar", so "foobar" will match. Use lookbehind instead (see below).
http://perldoc.perl.org/perlre.html#Extended-Patterns > Lookaround Assertions
Code:
grep -P '^(.)(?!\1).\1(?!\1).(?!\1).$' words
Decomposed:
Code:
grep -P   enables Perl-compatible regular expressions (PCRE)
^(.)      matches any character at the start of the line and captures it in \1
(?!\1)    zero-width assertion: succeeds only if the next character is not \1; consumes nothing, no capture
.         matches any character (which cannot be \1 because of the preceding assertion)
\1        matches the character captured in \1
(?!\1)    zero-width assertion: succeeds only if the next character is not \1; consumes nothing, no capture
.         matches any character (which cannot be \1 because of the preceding assertion)
(?!\1)    zero-width assertion: succeeds only if the next character is not \1; consumes nothing, no capture
.         matches any character (which cannot be \1 because of the preceding assertion)
$         end of line

Last edited by keefaz; 08-21-2016 at 12:02 PM.
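
The "zero-width" part of the perlre quote can be checked directly with grep -o, which prints only the text that was actually matched. This is a rough sketch using the foo/bar examples from the quote and assuming grep has PCRE support.

```shell
# -o prints only the matched text. The lookahead consumes nothing,
# so the match is just "foo", and it comes from "foobaz", not "foobar".
matched=$(printf '%s\n' 'foobar foobaz' | grep -oP 'foo(?!bar)')
printf '%s\n' "$matched"

# The pitfall from the quote: (?!foo)bar still matches inside "foobar",
# because at the position where "bar" starts, the next text is not "foo".
pitfall_count=$(printf '%s\n' 'foobar' | grep -cP '(?!foo)bar')
printf '%s\n' "$pitfall_count"
```

The first command should print a single foo, and the second should count one matching line.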
 
Old 08-21-2016, 12:58 PM   #7
grail
And a little more tidying up:
Code:
grep -P '^(.)(?!\1).\1((?!\1).){2}$' word_file
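
The parentheses around (?!\1). matter here: the quantifier has to repeat the assertion together with the dot. A sketch of what happens if they are dropped, using the made-up word "totat" (t reused in position 5) and assuming grep has PCRE support:

```shell
# Grouped: the lookahead is re-checked before position 4 and before position 5.
grouped=$(printf '%s\n' total totat | grep -P '^(.)(?!\1).\1((?!\1).){2}$')

# Hypothetical over-shortened pattern: the lookahead runs once, before
# position 4 only, and .{2} then accepts anything in positions 4 and 5.
ungrouped=$(printf '%s\n' total totat | grep -P '^(.)(?!\1).\1(?!\1).{2}$')

printf 'grouped:\n%s\n' "$grouped"
printf 'ungrouped:\n%s\n' "$ungrouped"
```

The grouped pattern should keep only total, while the ungrouped one also lets totat through.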
 
Old 08-21-2016, 02:04 PM   #8
danielbmartin
Original Poster
Thank you to keefaz and grail for thoughtful contributions. This thread is marked SOLVED!

Daniel B. Martin
 
  

