LinuxQuestions.org
Old 04-17-2020, 09:46 AM   #1
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Spelling Bee (text processing)


This is a learning exercise done just for fun.
It is inspired by a NYTimes word puzzle called Spelling Bee
written by Patrick Berry.

Have: a file of English words called WordList.

Have: a string of 7 characters called Hive.

Want:
Step 1...
Find words of length >4 letters which use ONLY the letters in
the string "hive" and MUST use the first letter in "hive".
Step 2...
Find words which meet the criteria in Step 1,
and use ALL of the letters in "hive".

This is my "brute force" solution.

Code:
#!/bin/bash
# Daniel B. Martin   Apr20

# Step 1...
# Find words of length >4 letters which use ONLY the letters in
#   the string "hive" and MUST use the first letter in "hive".
# Step 2...
# Find words which meet the criteria in Step 1,
#   and use ALL of the letters in "hive".

# File identification
    Path=${0%%.*}
    Only=$Path"only.txt"
     All=$Path"all.txt"
WordList='/usr/share/dict/words'

hive='luenopt'

echo 'Words which use only the letters in "'$hive'"'
echo '  and contain the letter "'${hive:0:1}'".'
sed -n '/^.\{5\}/p' $WordList  \
|tr -c $hive"\n" "~"           \
|grep -v "~"                   \
|grep ${hive:0:1}              \
>$Only
cat $Only

echo; echo 'Words which use all of the letters in "'$hive'".'
 grep "${hive:0:1}" <$Only \
|grep "${hive:1:1}" \
|grep "${hive:2:1}" \
|grep "${hive:3:1}" \
|grep "${hive:4:1}" \
|grep "${hive:5:1}" \
|grep "${hive:6:1}" \
>$All
cat $All

echo; echo "Normal end of job."; echo; exit
It produces this result:
Code:
Words which use only the letters in "luenopt"
  and contain the letter "l".
elope
letup
lotto
nettle
opulent
outlet
pellet
people
pollen
pollute
pullet
pullout
topple
tulle
tunnel

Words which use all of the letters in "luenopt".
opulent

Normal end of job.
I suspect there is a cleaner, better, faster way.
Ideas? Suggestions?

Daniel B. Martin

.
 
Old 04-17-2020, 12:00 PM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,989

Rep: Reputation: 7337
ok, construct the following regexp:
^first letter[all letters]{3,}
in your case it will be: grep -w 'l[luenopt]{3,}' $WordList

The second one is a bit more difficult, but pretty easy for example in python.

Last edited by pan64; 04-17-2020 at 12:13 PM.
 
Old 04-17-2020, 01:18 PM   #3
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Quote:
Originally Posted by pan64
ok, construct the following regexp:
^first letter[all letters]{3,}
in your case it will be: grep -w 'l[luenopt]{3,}' $WordList
On my machine (Linux Mint 17.2) this grep ...
Code:
grep -w 'l[luenopt]{3,}' $WordList >$Only
... produced no result, and this egrep ...
Code:
egrep -w 'l[luenopt]{3,}' $WordList >$Only
... produced this result ...
Code:
lent
lept
letup
letup's
loll
lone
loon
loon's
loop
loop's
loot
loot's
lope
lope's
lotto
lotto's
lout
lout's
lull
lull's
lute
lute's
Note that the problem statement calls for words of length >4 letters which contain the first letter in "hive" but your solution produced words which begin with that letter.
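
A note on why the plain grep returned nothing: without -E, GNU grep reads the pattern as a basic regular expression, where {3,} is matched as literal characters and the interval has to be written \{3,\}. The sketch below uses a small made-up sample instead of /usr/share/dict/words to compare the three spellings. It also suggests why entries such as loon's appear in the egrep output: -w only asks that the match be bounded by non-word characters, and the apostrophe counts as one.

```shell
# Hypothetical four-word sample; stands in for /usr/share/dict/words.
sample=$(printf '%s\n' lent "loon's" letup violent)

# BRE without escapes: {3,} is taken literally, so nothing matches.
bre_plain=$(printf '%s\n' "$sample" | grep -w 'l[luenopt]{3,}')

# BRE with an escaped interval, and ERE via -E: these two are equivalent.
bre_escaped=$(printf '%s\n' "$sample" | grep -w 'l[luenopt]\{3,\}')
ere=$(printf '%s\n' "$sample" | grep -E -w 'l[luenopt]{3,}')

printf 'plain BRE:   [%s]\n' "$bre_plain"
printf 'escaped BRE:\n%s\n' "$bre_escaped"
printf 'ERE:\n%s\n' "$ere"
```

On this sample the last two variants should agree, and violent is skipped because its "lent" is preceded by a word character.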

A one-liner would be an impressive solution. Perhaps you can rework yours.

Daniel B. Martin

.
 
Old 04-18-2020, 03:29 AM   #4
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,989

Rep: Reputation: 7337
you could do that easily:
Code:
grep l $WordList | grep -E '[luenopt]{4,}'
It is your job to make that hive configurable.
Code:
#!/usr/bin/python3
import sys

hive = sys.argv[1]
wordlist = sys.argv[2]

def sort_it(s: str):
    return ''.join(sorted(set(s)))

def equal(s: str):
    return hive_s == sort_it(s)

hive_s = sort_it(hive)

with open(wordlist, "r") as w:
    for line in w:
        if (equal(line.strip())):
            print(line.strip())
This does not check the word length, but that can easily be added.
 
Old 04-18-2020, 10:39 AM   #5
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Quote:
Originally Posted by pan64
you could do that easily:
Code:
grep l $WordList | grep -E '[luenopt]{4,}'
Thank you for contributing to this brain-teaser thread.

Perhaps I have not communicated well. To restate the first step in this problem:
Code:
Find words of length >4 letters which use ONLY the letters in
the string "hive" and MUST use the first letter in "hive".
Your code produced a file of words all of which contain the letter "l" but many contain letters which are not in the hive.

Daniel B. Martin

.
 
Old 04-18-2020, 11:28 AM   #6
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,989

Rep: Reputation: 7337
so add $ at the end of the regexp
Code:
grep l $WordList | grep -E '^[luenopt]{4,}$'
 
Old 04-18-2020, 12:13 PM   #7
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Quote:
Originally Posted by pan64
so add $ at the end of the regexp
Code:
grep l $WordList | grep -E '^[luenopt]{4,}$'
Same as before. The output file contains lots of words containing letters which are not in the hive. This is a small part of the result to illustrate the problem...
Code:
velveteen
violent
violet
violoncello
virulent
wallet
wallop
walnut
watermelon
I'm using ...
Code:
daniel@Daniel ~ $ grep --version
grep (GNU grep) 2.16
Copyright (C) 2014 Free Software Foundation, Inc.

Daniel B. Martin

.
 
Old 04-18-2020, 12:43 PM   #8
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,677

Rep: Reputation: Disabled
You probably forgot ^ at the start of the second grep expression:
grep -E '^[luenopt]{4,}$'
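
To illustrate why both anchors matter: an unanchored [luenopt]{4,} succeeds as soon as four consecutive hive letters appear anywhere in the line, and with only the trailing $ it is enough that the word ends in four hive letters, which would explain watermelon (elon) and wallet (llet) in the output above. A minimal sketch on a made-up five-word sample:

```shell
# Hypothetical sample; a real run would read /usr/share/dict/words.
sample=$(printf '%s\n' watermelon violent pollen tunnel wallet)

# No anchors: any run of 4+ hive letters inside the word is enough.
loose=$(printf '%s\n' "$sample" | grep l | grep -E '[luenopt]{4,}')

# Trailing $ only: the word merely has to END in 4+ hive letters.
tail_only=$(printf '%s\n' "$sample" | grep l | grep -E '[luenopt]{4,}$')

# Both anchors: the whole word must consist of hive letters.
strict=$(printf '%s\n' "$sample" | grep l | grep -E '^[luenopt]{4,}$')

printf 'unanchored:\n%s\n\n' "$loose"
printf 'trailing $ only:\n%s\n\n' "$tail_only"
printf 'anchored:\n%s\n' "$strict"
```

On this sample the first two variants let every word through, while the fully anchored one keeps only pollen and tunnel.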
 
Old 04-18-2020, 02:19 PM   #9
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Quote:
Originally Posted by shruggy
You probably forgot ^ at the start of the second grep expression:
grep -E '^[luenopt]{4,}$'
Ding ding ding ding ding! We have a winner!

Thank you, shruggy, for this breakthrough.

One minor change was needed. {4,} was changed to {5,}.
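
Putting the pieces of Step 1 together with the hive made configurable, as pan64 suggested, might look like the sketch below. The function name step1 and its argument order are made up for illustration, and the sketch assumes the hive consists of plain lowercase letters, so that it is safe inside a bracket expression.

```shell
# step1 HIVE WORDLIST   (hypothetical helper, not from the thread)
# Print words longer than 4 letters that use only letters of HIVE
# and contain its first letter. Assumes HIVE is plain lowercase a-z.
step1() {
    local hive first
    hive=$1
    first=$(printf '%s' "$hive" | cut -c1)    # the mandatory letter
    # First grep: only hive letters, at least 5 of them.
    # Second grep: the mandatory letter must occur somewhere.
    grep -E "^[$hive]{5,}\$" "$2" | grep "$first"
}
```

It would be called as: step1 luenopt /usr/share/dict/words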

Now, bright minds, can you offer a streamlined way to perform Step #2?

Daniel B. Martin
 
Old 04-18-2020, 02:27 PM   #10
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,677

Rep: Reputation: Disabled
Well, what's wrong with the Python script suggested by pan64 above? Sure, you could do it as a one-liner, but it would look just as ugly as five greps chained one after another:
Code:
egrep '^[luenopt]{5,}$' /usr/share/dict/words |
awk -vh=luenopt '{m=1;for(i=1;i<=length(h);i++)if(!match($0,substr(h,i,1)))m=0;if(m)print}'
The same, but formatted to be more readable:
Code:
#!/usr/bin/awk -f

BEGIN {
        h="luenopt"
}
{
        m=1
        for (i=1; i<=length(h); i++)
                if ( ! match($0, substr(h, i, 1)) )
                        m=0
        if (m)
                print
}
or like this
Code:
#!/usr/bin/awk -f

BEGIN {
        split("luenopt", hive, "")
}
{
        m=1
        for (i in hive)
                if ( ! match($0, hive[i]) )
                        m=0
        if (m)
                print
}

Last edited by shruggy; 04-18-2020 at 02:46 PM.
 
Old 04-18-2020, 02:34 PM   #11
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,989

Rep: Reputation: 7337
Did you check the solution written in Python? There is a tricky function named sort_it inside.

I will help you to rewrite this script in [pure] bash, if you wish. It is quite simple; the only exception is that function. I don't know of any ready-made tool that does the same, so it needs to be implemented (either this or something else that does the work).
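
One possible shell counterpart of sort_it, built from standard tools rather than pure bash, is sketched below. It is an illustration and not pan64's code; the name pangrams is made up, and the sketch assumes one word per line in the input file.

```shell
# sort_it STRING
# Print the distinct characters of STRING in sorted order, mirroring
# ''.join(sorted(set(s))) from the Python script above.
sort_it() {
    printf '%s\n' "$1" | grep -o . | LC_ALL=C sort -u | tr -d '\n'
}

# pangrams HIVE WORDLIST   (hypothetical name)
# Print the words whose set of letters equals that of HIVE.
pangrams() {
    local hive_s word
    hive_s=$(sort_it "$1")
    while IFS= read -r word; do
        if [ "$(sort_it "$word")" = "$hive_s" ]; then
            printf '%s\n' "$word"
        fi
    done < "$2"
}
```

This starts three processes per word, so it will be slow on a full dictionary; feeding it the much shorter Step 1 output would be the sensible use.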
 
Old 04-18-2020, 04:14 PM   #12
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Thank you, all, for references to Python. I don't know that language and am still working toward mastery of Linux commands such as grep.

I wrote a solution to this "hive" problem in awk. I'll post that for review and comment once we have arrived at an optimal version of the approach shown in post #1 of this thread.

Daniel B. Martin
 
Old 04-19-2020, 04:02 AM   #13
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,989

Rep: Reputation: 7337
Here is a pure bash solution for the first question:
Code:
hive="luenopt"
wordlist=/tmp/wordlist

while read -r word
do
    [[ ${#word} -gt 4 ]] || continue          # length >4, as Step 1 asks
    [[ $word =~ ${hive:0:1} ]] || continue    # must contain the first hive letter
    [[ $word =~ [^$hive] ]] && continue       # skip words with non-hive letters
    echo "$word"
done < "$wordlist"
[Obviously] grep is faster.
For the second question you need to add a check that all the letters are used; the first two conditions then become superfluous:
Code:
#!/bin/bash
hive="luenopt"
wordlist=/tmp/words.txt

while read -r word
do
    [[ $word =~ [^$hive] ]] && continue
    wrong=0
    for ((i = 0; i < ${#hive}; i++))          # one pass per hive letter
    do
        [[ $word =~ ${hive:$i:1} ]] || wrong=1
    done
    [[ $wrong == 1 ]] && continue
    echo "$word"
done < "$wordlist"
 
Old 04-19-2020, 05:06 AM   #14
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,677

Rep: Reputation: Disabled
To the second, you also could do something like this:
Code:
#!/bin/bash

wordlist=/usr/share/dict/words
hivestring=luenopt
declare -a hive=( $(sed 's/./& /g' <<<$hivestring) )

grep -E "^[$hivestring]{5,}$" "$wordlist" |
while read word
do
  for letter in ${hive[@]}
  do
    [[ $word =~ $letter ]] && continue 1 || continue 2
  done
  echo $word
done

Last edited by shruggy; 04-19-2020 at 10:55 AM.
 
Old 04-19-2020, 03:30 PM   #15
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
We are getting closer to an ideal solution!
This code ...

Code:
WordList='/usr/share/dict/words'

hive='luenopt'

echo 'Words which use only the letters in "'$hive'"'
echo '  and contain the letter "'${hive:0:1}'".'
 grep l $WordList  \
|grep -E "^[$hive]{5,}$" >$Only
cat $Only

echo; echo 'Words which use all of the letters in "'$hive'".'
 grep -v -P '(.).*\1' <$Only \
|sed -n '/^.\{7\}/p' \
>$All
cat $All
... produced this result ...
Code:
Words which use only the letters in "luenopt"
  and contain the letter "l".
elope
letup
lotto
nettle
opulent
outlet
pellet
people
pollen
pollute
pullet
pullout
topple
tulle
tunnel

Words which use all of the letters in "luenopt".
opulent
To polish this apple even more,
- can the two grep commands in step 1 be combined?
- can the grep regex in step 2 be changed to produce
only words of >6 characters, and then eliminate the sed?
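
Both questions can probably be answered with PCRE lookaheads, assuming GNU grep with -P support, which the code above already relies on. The sketch below is one possible shape and not a definitive answer; the function names are made up, and the hive is assumed to be plain lowercase letters. One caveat: rejecting repeated letters only finds words that use each hive letter exactly once, whereas Step 2 as stated in post #1 would also accept a word that repeats a letter, so a third variant with one lookahead per hive letter is included.

```shell
# Sketch only. Assumes GNU grep with PCRE (-P) and a lowercase a-z hive.

# step1_pcre HIVE WORDLIST: one grep instead of two. The lookahead
# demands the first hive letter; the class and anchors do the rest.
step1_pcre() {
    local hive first
    hive=$1
    first=$(printf '%s' "$hive" | cut -c1)
    grep -P "^(?=.*$first)[$hive]{5,}\$" "$2"
}

# step2_norepeat: reads Step 1 output on stdin. The negative lookahead
# rejects any word with a repeated character; .{7,} replaces the sed
# (the 7 assumes a seven-letter hive).
step2_norepeat() {
    grep -P '^(?!.*(.).*\1).{7,}$'
}

# step2_all HIVE: reads Step 1 output on stdin and keeps words that
# contain every hive letter, repeats allowed, via one lookahead per letter.
step2_all() {
    local lookaheads
    lookaheads=$(printf '%s' "$1" | sed 's/./(?=.*&)/g')
    grep -P "^$lookaheads"
}
```

A full run might then be: step1_pcre luenopt /usr/share/dict/words | step2_all luenopt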

Daniel B. Martin

.
 
  

