LinuxQuestions.org
02-20-2015, 09:36 AM   #1
danielbmartin
Senior Member
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Pattern matching... Is there a clever RegEx?


This problem is purely for amusement and self-education.

Read an English-language dictionary and identify words
which fit this pattern:
- the length is 6 characters
- the 3rd and 4th letters are the same but do not appear elsewhere in the word
- the 5th and 6th letters are the same but do not appear elsewhere in the word

I wrote an awk program which performs this task.
Code:
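# An empty field separator makes each character its own field
# (a common awk extension, supported by gawk; not specified by POSIX).
awk -F "" 'length($0)==6    &&   # exactly six characters
           $3==$4           &&   # 3rd and 4th letters match
           $5==$6           &&   # 5th and 6th letters match
           index($0,$3)==3  &&   # that letter first appears at position 3
           index($0,$5)==5' "$InFile" >"$OutFile"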
There are several words which fit the pattern.
Familiar words such as coffee and toffee;
less-familiar words such as puttee and suttee.

Is it possible to do the same thing with grep using a clever Regular Expression?

Daniel B. Martin
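
For readers who want to check candidate words outside awk, here is a minimal Python sketch of the same test. The function name `fits_pattern` and the sample word list are illustrative assumptions, not part of the original post.

```python
def fits_pattern(word: str) -> bool:
    """Mirror the awk test: six characters, doubled 3rd/4th and 5th/6th
    letters, and each doubled letter first appears at its own position."""
    return (
        len(word) == 6
        and word[2] == word[3]
        and word[4] == word[5]
        and word.find(word[2]) == 2   # awk: index($0,$3)==3 (Python is zero-based)
        and word.find(word[4]) == 4   # awk: index($0,$5)==5 (Python is zero-based)
    )

# Sample words taken from the thread, plus "banana" as a non-match.
for w in ["coffee", "toffee", "puttee", "suttee", "settee", "tattoo", "banana"]:
    print(w, fits_pattern(w))
```

With these inputs the four words named in the post (coffee, toffee, puttee, suttee) pass, while settee and tattoo fail because one of their doubled letters already appears in the first two positions.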
 
02-20-2015, 10:24 AM   #2
smallpond
Senior Member
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,140

Partial:

Code:
echo coffee |grep -E '^..([a-z])\1([a-z])\2$'
coffee
echo banana |grep -E '^..([a-z])\1([a-z])\2$'
Edit: this doesn't check the part about not appearing elsewhere. Just the two doubled letters.

Last edited by smallpond; 02-20-2015 at 10:28 AM.
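
The partial pattern can be tried in Python as well, since the `re` module supports the same backreferences. This is only a sketch: `DOUBLED` and `has_doubled_pairs` are hypothetical names, and `fullmatch` stands in for the `^...$` anchors.

```python
import re

# Same pattern as the grep -E command, without the ^ and $ anchors
# because fullmatch() already requires the whole string to match.
DOUBLED = re.compile(r"..([a-z])\1([a-z])\2")

def has_doubled_pairs(word: str) -> bool:
    """True when letters 3-4 are a doubled letter and letters 5-6 are too."""
    return DOUBLED.fullmatch(word) is not None

for w in ["coffee", "banana", "settee", "tattoo"]:
    print(w, has_doubled_pairs(w))
```

As the edit note says, this checks only the two doubled letters, so words such as settee and tattoo are accepted along with coffee.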
 
02-20-2015, 10:52 AM   #3
danielbmartin
Senior Member
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881
Original Poster

Quote:
Originally Posted by smallpond
Partial:

Code:
echo coffee |grep -E '^..([a-z])\1([a-z])\2$'
coffee
echo banana |grep -E '^..([a-z])\1([a-z])\2$'
Edit: this doesn't check the part about not appearing elsewhere. Just the two doubled letters.
Thank you, smallpond, for this contribution. It works as you said, finding coffee and toffee but also settee and tattoo.

Maybe this can't be done with one grep. Could it be done with two?

Daniel B. Martin
 
02-20-2015, 11:01 AM   #4
pan64
LQ Addict
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,842

grep -E '^..([a-z])\1([a-z])\2$' | grep -E -v '^.?(.).*\1'
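
A rough Python equivalent of this two-stage pipeline may help show what each stage does. The first pattern keeps words with the two doubled letters; the second, applied like `grep -v`, drops any word whose 1st or 2nd character appears again later. The names `KEEP`, `DROP` and `two_stage` are assumptions made for this sketch.

```python
import re

KEEP = re.compile(r"^..([a-z])\1([a-z])\2$")  # stage 1: the two doubled letters
DROP = re.compile(r"^.?(.).*\1")              # stage 2: 1st or 2nd char recurs later

def two_stage(words):
    """Keep words that pass stage 1 and are not rejected by stage 2."""
    return [w for w in words if KEEP.search(w) and not DROP.search(w)]

print(two_stage(["coffee", "toffee", "settee", "tattoo", "banana"]))
```

For this sample list only coffee and toffee survive: settee is dropped because its 2nd letter recurs, and tattoo because its 1st letter recurs.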
 
02-20-2015, 11:29 AM   #5
danielbmartin
Senior Member
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881
Original Poster

Quote:
Originally Posted by pan64
grep -E '^..([a-z])\1([a-z])\2$' | grep -E -v '^.?(.).*\1'
Excellent! Thank you!

Daniel B. Martin
 
02-20-2015, 11:35 AM   #6
millgates
Member
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 852

This would be easier if grep supported lookaheads as Perl does:

Code:
perl -ne '/\b(.)(.)(?!.{0,3}(\1|\2))(.)\4(?!.{0,1}\4)(.)\5\b/ && print'
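
The lookahead pattern is dense, so here is the same expression spelled out with comments, using Python's `re.VERBOSE` mode (Python's `re` accepts this syntax unchanged). The names `LOOKAHEAD` and `matches` are hypothetical, and the comments are one reading of the pattern rather than the poster's own explanation.

```python
import re

LOOKAHEAD = re.compile(r"""
    \b (.) (.)            # groups 1 and 2: the first two characters
    (?! .{0,3} (\1|\2) )  # neither of them may recur in characters 3-6
    (.) \4                # group 4: the doubled 3rd/4th character
    (?! .{0,1} \4 )       # which must not recur in characters 5-6
    (.) \5                # group 5: the doubled 5th/6th character
    \b
""", re.VERBOSE)

def matches(text: str) -> bool:
    """True when the text contains a six-character word fitting the pattern."""
    return LOOKAHEAD.search(text) is not None

for w in ["coffee", "settee", "tattoo", "hot coffee here"]:
    print(w, matches(w))
```

Because the pattern uses `\b` rather than `^` and `$`, it also finds a qualifying word inside a longer line, which is why the last sample matches.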
 
02-20-2015, 11:37 AM   #7
pan64
LQ Addict
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,842

if you were so kind:
grep -P '\b(.)(.)(?!.{0,3}(\1|\2))(.)\4(?!.{0,1}\4)(.)\5\b'
https://regex101.com/r/nV0qC0/1

Last edited by pan64; 02-20-2015 at 11:44 AM.
 
02-20-2015, 11:43 AM   #8
smallpond
Senior Member
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,140

The output still differs from the awk expression for these words, if it matters:

Code:
abcccc
aabbcc
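
The post does not say which command is being compared with awk. Assuming it refers to the two-grep pipeline from post #4, the following Python sketch runs both test strings through stand-ins for all three approaches in the thread. The function names are hypothetical, and each function is a translation into Python, not the original command.

```python
import re

def awk_like(w: str) -> bool:
    """Translation of the awk test from post #1."""
    return (len(w) == 6 and w[2] == w[3] and w[4] == w[5]
            and w.find(w[2]) == 2 and w.find(w[4]) == 4)

def two_grep(w: str) -> bool:
    """Translation of the grep -E | grep -E -v pipeline from post #4."""
    keep = re.search(r"^..([a-z])\1([a-z])\2$", w)
    drop = re.search(r"^.?(.).*\1", w)
    return keep is not None and drop is None

def lookahead(w: str) -> bool:
    """Translation of the lookahead pattern from posts #6 and #7."""
    pattern = r"\b(.)(.)(?!.{0,3}(\1|\2))(.)\4(?!.{0,1}\4)(.)\5\b"
    return re.search(pattern, w) is not None

for w in ("abcccc", "aabbcc"):
    print(w, "awk:", awk_like(w), "two-grep:", two_grep(w), "lookahead:", lookahead(w))
```

Under these translations the two-grep version disagrees with awk on both strings: it accepts abcccc (stage 2 never compares the 3rd and 5th letters) and rejects aabbcc (stage 2 also forbids the first two letters from matching each other). The lookahead version agrees with awk on both.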
 
02-20-2015, 11:44 AM   #9
millgates
Member
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 852

Quote:
Originally Posted by pan64
if you were so kind:
grep -P '\b(.)(.)(?!.{0,3}(\1|\2))(.)\4(?!.{0,1}\4)(.)\5\b'
I read through the grep man page looking for this two times! How come I missed it?
 
02-20-2015, 11:57 AM   #10
danielbmartin
Senior Member
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881
Original Poster

Quote:
Originally Posted by pan64
grep -P '\b(.)(.)(?!.{0,3}(\1|\2))(.)\4(?!.{0,1}\4)(.)\5\b'
Superb! Let's mark this thread SOLVED!

Daniel B. Martin
 
  


Tags
grep, regexp


