LinuxQuestions.org
Old 08-08-2021, 09:12 PM   #1
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,827

Rep: Reputation: 642
Seeking 7-letter words containing 2 Us


This topic is only a learning exercise.

Wanted: a list of English words which...
- are 7 letters long
- contain exactly 2 Us in any sequence

Examples: fulcrum and surplus are good
but cumulus and unusual are not.

My solution...
Code:
WordList='/usr/share/dict/words'
 sed -rn '/^.{7}$/p' $WordList  \
|sed -rn '/u.*u/p'              \
|sed -r  '/u.*u.*u/d'
This works but it seems clumsy.

Brainteaser: is there an elegant RegEx which could do this with only one sed?

Daniel B. Martin

.
 
Old 08-08-2021, 10:41 PM   #2
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_12{.0|.1}
Posts: 5,717
Blog Entries: 11

Rep: Reputation: 3748
This comes quickly to mind...

Code:
sed -nr '/^.{7}$/{/^[^u]*u[^u]*u[^u]*$/p}' /usr/share/dict/words
...but not very pretty - probably not a candidate for "elegant"!
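
For readers following along, here is a small sketch that runs the same idea against a handful of sample words instead of the system dictionary. The function name two_u_words and the sample words are made up for illustration, and -E is used in place of -r on the assumption that your sed accepts it (GNU sed and the BSDs do).

```shell
# Hypothetical helper: print 7-character lines containing exactly two "u"s.
# Reads from stdin so it can be tried without /usr/share/dict/words.
#   /^.{7}$/                keeps only lines exactly 7 characters long
#   /^[^u]*u[^u]*u[^u]*$/   non-u run, u, non-u run, u, non-u run
two_u_words() {
    sed -nE '/^.{7}$/{/^[^u]*u[^u]*u[^u]*$/p;}'
}

# Prints fulcrum and surplus; the others have three "u"s, none, or the wrong length.
printf '%s\n' fulcrum surplus cumulus unusual nothing ubudef | two_u_words
```

The semicolon before the closing brace is not needed by GNU sed; it is there only so the same line also works with stricter sed implementations.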

Last edited by astrogeek; 08-08-2021 at 11:37 PM. Reason: Remove redundant d
 
1 members found this post helpful.
Old 08-09-2021, 09:50 AM   #3
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,827

Original Poster
Rep: Reputation: 642
Quote:
Originally Posted by astrogeek View Post
Code:
sed -nr '/^.{7}$/{/^[^u]*u[^u]*u[^u]*$/p}' /usr/share/dict/words
...but not very pretty - probably not a candidate for "elegant"!
Ding ding ding ding ding! We have a winner!

Pretty? No.
Elegant? Not really.
Excellent? Yes!

A good definition of Technical Excellence:
Completeness of function coupled with economy of means.

Thank you for the education!

Daniel B. Martin

.
 
Old 08-09-2021, 01:35 PM   #4
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_12{.0|.1}
Posts: 5,717
Blog Entries: 11

Rep: Reputation: 3748
Thank U2!
 
1 members found this post helpful.
Old 08-10-2021, 09:34 AM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,832

Rep: Reputation: 3089
How about:
Code:
awk 'length == 7 && split($0,_,"u") > 2' /usr/share/dict/words
#OR
awk -Fu 'length == 7 && NF > 2' /usr/share/dict/words
 
3 members found this post helpful.
Old 08-10-2021, 10:10 AM   #6
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,827

Original Poster
Rep: Reputation: 642
Quote:
Originally Posted by grail View Post
Code:
awk 'length == 7 && split($0,_,"u") > 2' /usr/share/dict/words
I was looking for a sed solution but an awk solution is certainly welcome.

With this WordList ...
Code:
nothing
fulcrum
surplus
cumulus
unusual
abcdefg
ubudefg
abcuefu
ubudufg
abuuefu
abuuufg
uuudefg
uuuuefg
ubudufu
abcduuu
ubudufu
ubudef
ubudefgh
... this awk ...
Code:
awk 'length == 7 && split($0,_,"u") > 2' $WordList
... produced this result ...
Code:
fulcrum
surplus
cumulus
unusual
ubudefg
abcuefu
ubudufg
abuuefu
abuuufg
uuudefg
uuuuefg
ubudufu
abcduuu
ubudufu
It fails the criterion of finding words with exactly two Us.

The second awk has the same shortcoming.


Daniel B. Martin

.
 
Old 08-10-2021, 10:16 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,832

Rep: Reputation: 3089
Quote:
It fails the criterion of finding words with exactly two Us.
That is my bad for the misread, obviously just change the sign from > to ==
 
1 members found this post helpful.
Old 08-10-2021, 01:56 PM   #8
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,827

Original Poster
Rep: Reputation: 642
Quote:
Originally Posted by grail View Post
That is my bad for the misread, obviously just change the sign from > to ==
This didn't produce the desired result ...
Code:
awk 'length == 7 && split($0,_,"u") == 2' $WordList
... but this did.
Code:
awk 'length == 7 && split($0,_,"u") == 3' $WordList
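
The reason 3 is the right number is easier to see on single words: split() returns the number of pieces, and N separators cut a string into N+1 pieces, so exactly two "u"s give a return value of 3. A rough illustration (the function name split_counts and the sample words are arbitrary):

```shell
# Hypothetical helper: print each input line followed by the value split()
# returns when "u" is the separator. Empty pieces are counted too.
split_counts() {
    awk '{ print $0, split($0, pieces, "u") }'
}

# Prints:
#   nothing 1
#   fulcrum 3
#   cumulus 4
printf '%s\n' nothing fulcrum cumulus | split_counts
```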
Daniel B. Martin

.
 
1 members found this post helpful.
Old 08-11-2021, 05:38 AM   #9
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,827

Original Poster
Rep: Reputation: 642
grail's awk solution has an advantage: scalability. Suppose some industrial application dealt with electronic signals instead of English words. The problem might be to identify lines of length 112 with exactly 27 "U"s. The sed solution would be unwieldy, but the awk solution would need only two numbers changed.
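
To make that concrete, here is a minimal sketch of a parameterised version. The wrapper name exact_count and its argument order are invented for this example, and it assumes the character being counted is an ordinary single character (not a space or a backslash), so that awk treats it as a literal separator.

```shell
# Hypothetical wrapper: print lines of length $1 that contain exactly $2
# copies of the single character $3. Input is read from stdin.
exact_count() {
    awk -v len="$1" -v n="$2" -v ch="$3" '
        # n separators cut a line into n + 1 pieces
        length($0) == len && split($0, pieces, ch) == n + 1
    '
}

# The word problem from this thread: prints fulcrum and surplus.
printf '%s\n' fulcrum surplus cumulus | exact_count 7 2 u

# A different length, count and character: prints UUxxx and xUxUx.
printf '%s\n' UxUxU UUxxx xUxUx | exact_count 5 2 U
```

The signal example would then be a call along the lines of exact_count 112 27 U, with no change to the awk program itself.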

Daniel B. Martin

.
 
Old 08-11-2021, 05:49 AM   #10
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 2,552

Rep: Reputation: Disabled
There are grep-like tools that allow either matching against several patterns at once (ugrep, faster than awk, but slower than sed)
Code:
ug -x% '.{7} [^u]*u[^u]*u[^u]*' $WordList
or constraining the match to selected regions via an additional regex (greple, slow)
Code:
greple --inside '^.{7}$' --strict '^[^u]*u[^u]*u[^u]*$' $WordList
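
If neither tool is installed, the same "both patterns must match" idea can be approximated with plain grep by chaining two whole-line matches. This is only a sketch using the standard -x option, not anything taken from ugrep or greple, and the function name two_u_grep is made up for the example.

```shell
# Hypothetical helper: two chained whole-line (-x) matches, read from stdin.
# First grep:  the line is exactly 7 characters long.
# Second grep: the line is non-u run, u, non-u run, u, non-u run.
two_u_grep() {
    grep -x '.\{7\}' | grep -x '[^u]*u[^u]*u[^u]*'
}

# Prints fulcrum and surplus.
printf '%s\n' fulcrum surplus cumulus unusual ubudefgh | two_u_grep
```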
 
Old 08-11-2021, 05:59 AM   #11
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 16,862

Rep: Reputation: 5694
I would suggest a different approach (although it is already solved).
You can always [try to] invert the requirement[s]; sometimes that is easier to implement (and you do not need to pipe sed[s] into each other).
I mean something like this:
Code:
sed -rn '/^.{7}$/!d;
         /u.*u.*u/d;
         /u.*u/p'
 
Old 08-11-2021, 06:15 AM   #12
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 2,552

Rep: Reputation: Disabled
^ -n is unnecessary in the latter case
Code:
sed '/^.\{7\}$/!d;/u.*u.*u/d;/u.*u/!d'
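
As a quick check of the elimination approach, the sketch below feeds a few sample words through the same three rules. The function name two_u_by_elimination and the sample words are invented for illustration; the sed expressions are the ones above, just split across -e options.

```shell
# Hypothetical helper: delete every line that breaks a rule; sed prints
# whatever survives by default, so no -n and no p are needed.
#   rule 1: delete lines that are not exactly 7 characters long
#   rule 2: delete lines with three or more "u"s
#   rule 3: delete lines with fewer than two "u"s
two_u_by_elimination() {
    sed -e '/^.\{7\}$/!d' \
        -e '/u.*u.*u/d' \
        -e '/u.*u/!d'
}

# Prints fulcrum and surplus.
printf '%s\n' fulcrum surplus cumulus abcdefg ubudef | two_u_by_elimination
```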
 
Old 08-11-2021, 06:32 AM   #13
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 19,923

Rep: Reputation: 3628
Readability - and the aforementioned scalability - goes a long way to nailing awk as the solution. Its syntax is becoming accepted elsewhere as well - bpftrace for example. Worse ways to spend your time than becoming proficient in it. Thanks grail for getting me interested years ago.
 
1 members found this post helpful.
Old 08-31-2021, 12:17 PM   #14
rnturn
Senior Member
 
Registered: Jan 2003
Location: Illinois (SW Chicago 'burbs)
Distribution: openSUSE, Raspbian, Slackware. Older: Coherent, MacOS, Red Hat, Big Iron IXs: AIX, Solaris, Tru64
Posts: 2,514

Rep: Reputation: 508
Quote:
Originally Posted by danielbmartin View Post
This topic is only a learning exercise.

Wanted: a list of English words which...
- are 7 letters long
- contain exactly 2 Us in any sequence

Examples: fulcrum and surplus are good
but cumulus and unusual are not.
With a little work, this could be a helpful tool to help solve those word jumble puzzles in the Sunday paper's Books section. :^)
 
Old 08-31-2021, 01:49 PM   #15
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,827

Original Poster
Rep: Reputation: 642
Quote:
Originally Posted by rnturn View Post
With a little work, this could be a helpful tool to help solve those word jumble puzzles in the Sunday paper's Books section. :^)
Quite right! The puzzles in the Sunday NYTimes magazine section sparked my interest in this subject.

Daniel B. Martin

.
 
  



Tags
regex, sed, text processing


All times are GMT -5. The time now is 03:23 PM.
