LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
08-21-2016, 12:10 PM   #1
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Reputation: 660
Seeking an improved RegEx


I have a file of English words, one word on each line.

As a learning exercise I coded this awk ...
Code:
echo; echo "Find four-letter words containing V, using AWK."
awk -F "" '(NF==4 && index($0,"v"))' $CleanWords >$OutFile
... and it works.

I coded this grep couplet ...
Code:
echo; echo "Find four-letter words containing V, using GREP."
grep "^....$" $CleanWords |grep "v" >$OutFile
... and it works but maybe it can be done in one line.

I coded this grep ...
Code:
echo; echo "Find four-letter words containing V, using GREP."
egrep '(^v...$)|(^.v..$)|(^..v.$)|(^...v$)' $CleanWords >$OutFile
... and it works but is an unappealing brute-force solution.

Is there a clever RegEx to handle this task?

Daniel B. Martin
 
08-21-2016, 12:53 PM   #2
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,263
Blog Entries: 24

Reputation: 4194
This sed comes quickly to mind.

Code:
sed -n '/^....$/s/v/v/p' $CleanWords
You might change the address to /^.{4}$/ but it is the same number of characters and harder to type than ...., and you will need to escape the curly braces I think.

Last edited by astrogeek; 08-21-2016 at 01:30 PM.
 
1 member found this post helpful.
08-21-2016, 01:14 PM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Reputation: 3191
The awk can be a little shorter:
Code:
awk -F '' 'NF == 4 && /v/' file
Your last grep can be shorter just by factoring the repeated ^ and $ anchors out of the alternation:
Code:
grep -E '^(v...|.v..|..v.|...v)$' file
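If the alternation approach is kept, the branches need not be typed by hand. The sketch below builds grail's factored pattern for any word length and letter. The function name build_alternation is hypothetical, and it assumes the letter is a plain character with no special regex meaning. It is only illustrative: the pattern grows with the square of the length, so the two-step filters elsewhere in this thread scale better.

```shell
# Hypothetical helper: build a factored alternation such as
# ^(v...|.v..|..v.|...v)$ for a given length and letter.
# Assumes the letter is a plain character (no regex metacharacters).
build_alternation() {
    len=$1
    letter=$2
    pattern=''
    i=0
    while [ "$i" -lt "$len" ]; do
        # One branch: the letter at position i, a dot everywhere else.
        branch=''
        j=0
        while [ "$j" -lt "$len" ]; do
            if [ "$j" -eq "$i" ]; then
                branch="$branch$letter"
            else
                branch="$branch."
            fi
            j=$((j + 1))
        done
        # Join the branches with '|'.
        if [ -z "$pattern" ]; then
            pattern=$branch
        else
            pattern="$pattern|$branch"
        fi
        i=$((i + 1))
    done
    printf '^(%s)$\n' "$pattern"
}

build_alternation 4 v    # prints ^(v...|.v..|..v.|...v)$

# Use the generated pattern on some sample words.
printf 'love\nvivid\nfive\ncats\n' | grep -E "$(build_alternation 4 v)"
```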
 
1 member found this post helpful.
08-21-2016, 01:53 PM   #4
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Reputation: 660
Quote:
Originally Posted by astrogeek
You might change the address to /^.{4}$/ but it is the same number of characters and harder to type than ...., and you will need to escape the curly braces I think.
Your suggestion works nicely ...
Code:
echo "Find four-letter words containing V, using SED with curly braces."
echo "Method of LQ Senior Member astrogeek."
sed -n '/^.\{4\}$/s/v/v/p' $CleanWords >$OutFile
... and lends itself to being a more general solution such as 9-character words containing "w."
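A minimal sketch of that generalization, wrapping astrogeek's command in a shell function. The function name words_of_len_with and its argument order are hypothetical; it assumes the letter is a plain character, because it is pasted into the sed script unescaped.

```shell
# Hypothetical helper: print lines of exactly LEN characters containing LETTER.
# Usage: words_of_len_with LEN LETTER [FILE...]   (reads stdin if no FILE)
words_of_len_with() {
    len=$1
    letter=$2
    shift 2
    # Double quotes let the shell fill in $len and $letter; \{ \} reach sed
    # unchanged as a BRE interval, and \$ becomes sed's end-of-line anchor.
    sed -n "/^.\{$len\}\$/s/$letter/$letter/p" "$@"
}

# Example: 9-character words containing "w" (sample words on stdin).
printf 'knowledge\nwonderful\nwater\nsomething\n' | words_of_len_with 9 w
```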

Nitpick: the sed solution makes a v-for-v substitution which is not needed for the awk or grep solutions.

Counter-argument to the Nitpick: depending on the overall purpose of the code it might be desirable to make that "v" stand out by coding it this way ...
Code:
echo "Find four-letter words containing V, using SED with curly braces."
echo "Method of LQ Senior Member astrogeek."
sed -n '/^.\{4\}$/s/v/V/p' $CleanWords >$OutFile
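One detail worth knowing about the capitalizing variant: without the g flag, sed replaces only the first v on each line. A short illustration with made-up sample words:

```shell
# Without g only the first v per line is capitalized; with g, every v is.
printf 'viva\nlove\n' | sed -n '/^.\{4\}$/s/v/V/p'     # prints Viva and loVe
printf 'viva\nlove\n' | sed -n '/^.\{4\}$/s/v/V/gp'    # prints ViVa and loVe
```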
Daniel B. Martin
 
08-21-2016, 02:17 PM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Reputation: 3191
Without the substitution the sed does not work ... remember, it is how the tool functions that is the key and not how it compares to another command's process.

You can add -r to the sed switches and the braces should not need to be escaped. (-r is GNU sed's option; GNU sed also accepts -E, which is the spelling BSD sed uses.)
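For example, the capitalizing command written with extended regular expressions, so the interval needs no backslashes. -E is used here on the assumption that it is wanted to work with both GNU and BSD sed; with GNU sed, -r behaves the same. The sample words are illustrative:

```shell
# ERE syntax: {4} needs no escaping once -E (or GNU's -r) is given.
printf 'love\nvivid\nfive\n' | sed -E -n '/^.{4}$/s/v/V/p'   # prints loVe and fiVe
```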
 
2 members found this post helpful.
08-21-2016, 02:40 PM   #6
keefaz
LQ Guru
 
Registered: Mar 2004
Distribution: Slackware
Posts: 6,552

Reputation: 872
With grep -P
Code:
grep -P '(?=^....$).*v' $CleanWords
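How this works: (?=^....$) is a zero-width lookahead, so it checks that the whole line is exactly four characters without consuming any of them, and .*v then requires a v somewhere on the line. A quick check with sample words, assuming a GNU grep built with PCRE support (-P is not available in every grep). The second form, with the anchor moved outside the lookahead, is an equivalent spelling:

```shell
# Lookahead checks the length; .*v then requires a v on the same line.
printf 'love\nvivid\ncats\nfive\n' | grep -P '(?=^....$).*v'   # prints love and five

# Equivalent: anchor first, then assert exactly four characters to end of line.
printf 'love\nvivid\ncats\nfive\n' | grep -P '^(?=.{4}$).*v'   # prints love and five
```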

Last edited by keefaz; 08-21-2016 at 02:50 PM.
 
2 members found this post helpful.
08-21-2016, 03:29 PM   #7
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,263
Blog Entries: 24

Reputation: 4194
Quote:
Originally Posted by danielbmartin
... and lends itself to being a more general solution such as 9-character words containing "w."

Nitpick: the sed solution makes a v-for-v substitution which is not needed for the awk or grep solutions.

Counter-argument to the Nitpick: depending on the overall purpose of the code it might be desirable to make that "v" stand out...
I had played with it for words of various lengths, but the question was for four letters, so I let the moment pass...

With s/v/&/p there is one less edit when substituting alternate characters as well, because & reproduces whatever the pattern matched.
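To illustrate: in a sed replacement, & stands for the text the pattern matched, so s/v/&/p prints the line unchanged and the search letter is written only once. With the letter held in a shell variable (the name letter is hypothetical), switching to another letter means editing a single value:

```shell
# & in the replacement reproduces the match, so the letter appears once.
letter=w
printf 'wave\nlove\nwhat\n' | sed -n "/^....\$/s/$letter/&/p"   # prints wave and what
```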

But I had not thought of the caps replacement, nice touch!

I also seem to never remember -r for sed, it just doesn't stick in my remaining brain cell - thanks grail!

Last edited by astrogeek; 08-21-2016 at 04:05 PM. Reason: typo
 
08-21-2016, 03:46 PM   #8
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Reputation: 2081
Quote:
Originally Posted by grail
Without the substitution the sed does not work ... remember, it is how the tool functions that is the key and not how it compares to another command's process.
Actually, sed can use the same approach as awk here:

Code:
sed -n '/^....$/{/v/p}' input
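In this form the outer address /^....$/ selects four-character lines, and the commands inside { } run only for those lines, where /v/p prints the ones that contain a v. The sketch below adds a ; before the closing brace. GNU sed does not need it, but some other seds (BSD sed, for one) are reported to reject a } directly after a command, so treat the semicolon as a portability precaution:

```shell
# Outer address picks 4-character lines; the inner /v/p prints those with a v.
# The ; before } is a precaution for non-GNU seds.
printf 'love\nvivid\ncats\nfive\n' | sed -n '/^....$/{/v/p;}'   # prints love and five
```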
 
2 members found this post helpful.
08-21-2016, 08:37 PM   #9
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Reputation: 660
I am impressed by the ingenuity of the various solutions and improvements-on-solutions offered by all the contributors to this thread. Thanks to astrogeek, grail, keefaz, ntubski. Kudos and reps all around. This thread is marked SOLVED!

Daniel B. Martin
 
08-22-2016, 04:50 AM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Reputation: 3191
Quote:
Originally Posted by ntubski
Actually, sed can use the same approach as awk here:

Code:
sed -n '/^....$/{/v/p}' input
I did think of that later. Good pick-up.
 
08-22-2016, 07:12 AM   #11
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,804

Reputation: 7306
Hm, really nice. But the "natural" way to do it in awk would be:
Code:
awk ' length == 4 && /v/ ' input
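Here the bare length means length($0), the length of the whole line, so no -F '' character splitting is needed. The second command is a case-insensitive variant added for illustration (it is not from the thread) that also matches a capital V; the sample words are made up:

```shell
# Bare `length` is length($0); no field splitting involved.
printf 'love\nVera\nvivid\n' | awk 'length == 4 && /v/'                  # prints love
# Case-insensitive variant: lowercase the line before matching.
printf 'love\nVera\nvivid\n' | awk 'length == 4 && tolower($0) ~ /v/'    # prints love and Vera
```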
And obviously perl can be used too:
Code:
perl -ne '/(?=^....$).*v/ && print' input
 
1 member found this post helpful.
All times are GMT -5.