Spelling Bee (text processing)
This is a learning exercise done just for fun.
It is inspired by a NYTimes word puzzle called Spelling Bee written by Patrick Berry.
Have: a file of English words called WordList.
Have: a string of 7 characters called Hive.
Want:
Step 1... Find words of length >4 letters which use ONLY the letters in the string "hive" and MUST use the first letter in "hive".
Step 2... Find words which meet the criteria in Step 1, and use ALL of the letters in "hive".
This is my "brute force" solution.
Code:
#!/bin/bash Daniel B. Martin Apr20
Code:
Words which use only the letters in "luenopt"
Ideas? Suggestions?
Daniel B. Martin
ok, construct the following regexp:
^first letter[all letters]{3,}
in your case it will be:
grep -w 'l[luenopt]{3,}' $WordList
The second one is a bit more difficult, but pretty easy for example in python.
Quote:
Code:
grep -w 'l[luenopt]{3,}' $WordList >$Only
Code:
egrep -w 'l[luenopt]{3,}' $WordList >$Only
Code:
lent
A one-liner would be an impressive solution. Perhaps you can rework yours.
Daniel B. Martin
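The two commands above differ only in grep versus egrep. In a basic regular expression (plain grep) the braces in {3,} are ordinary characters, while egrep (grep -E) reads them as "three or more". A minimal sketch of that difference follows; the two-word sample (lent, lot) is made up for illustration and is not taken from WordList.

```shell
#!/bin/bash
# Compare how grep reads the interval {3,} in basic (BRE) and extended (ERE) mode.
# The sample words are an illustrative assumption, not part of the thread's WordList.
sample() { printf '%s\n' lent lot; }

# grep -c prints the number of matching lines; "|| true" keeps a zero-match
# result from being treated as a failure.
bre_count=$(sample | grep -c 'l[luenopt]{3,}' || true)              # braces are literal
ere_count=$(sample | grep -c -E 'l[luenopt]{3,}' || true)           # braces are an interval
bre_interval_count=$(sample | grep -c 'l[luenopt]\{3,\}' || true)   # BRE interval needs backslashes

printf 'BRE, plain braces:   %s\n' "$bre_count"
printf 'ERE, plain braces:   %s\n' "$ere_count"
printf 'BRE, escaped braces: %s\n' "$bre_interval_count"
```

Only "lent" has three hive letters after the "l", so the interval forms report one match and the literal-brace form reports none.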
you could do that easily:
Code:
grep l $WordList | grep -E '[luenopt]{4,}'
Code:
#!/usr/bin/python3
Quote:
Perhaps I have not communicated well. To restate the first step in this problem:
Code:
Find words of length >4 letters which use ONLY the letters in
Daniel B. Martin
so add $ at the end of the regexp
Code:
grep l $WordList | grep -E '^[luenopt]{4,}$'
Quote:
Code:
velveteen
Code:
daniel@Daniel ~ $ grep --version
Daniel B. Martin
You probably forgot ^ at the start of the second grep expression:
grep -E '^[luenopt]{4,}$'
Quote:
Thank you, shruggy, for this breakthrough. One minor change was needed. {4,} was changed to {5,}.
Now, bright minds, can you offer a streamlined way to perform Step #2?
Daniel B. Martin
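Putting the corrections together (the ^ and $ anchors, and {5,} for "more than 4 letters"), Step 1 can be sketched as below. The function names step1 and sample_words and the sample list itself are made up for illustration; a real run would feed $WordList through the same two greps.

```shell
#!/bin/bash
# Sketch of Step 1 on a small built-in sample instead of a real word list.
hive='luenopt'        # the first letter is the one every word must contain
first=${hive:0:1}

step1() {
  # First grep: keep lines containing the required letter.
  # Second grep: keep lines of 5+ characters made only of hive letters.
  grep -- "$first" | grep -E "^[$hive]{5,}\$"
}

# Hypothetical sample standing in for $WordList.
sample_words() {
  printf '%s\n' lent tunnel velveteen pollute opulent noun
}

sample_words | step1
# prints:
# tunnel
# pollute
# opulent
```

Here "lent" is too short, "velveteen" contains letters outside the hive, and "noun" lacks the required "l".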
Well, what's wrong with the Python script suggested by pan64 above? Sure, you could do it as a one-liner, but it would look just as ugly as five greps chained one after another:
Code:
egrep '^[luenopt]{5,}$' /usr/share/dict/words |
Code:
#!/usr/bin/awk -f
Code:
#!/usr/bin/awk -f
Did you check the solution written in Python? There is a tricky function named sort_it inside.
I will help you rewrite this script in [pure] bash, if you wish. It is quite simple; the only exception is that function. I don't know of any ready-made tool that does the same thing, so it would need to be implemented (either this or something else to do the work).
Thank you, all, for references to Python. I don't know that language and am still working toward mastery of Linux commands such as grep.
I wrote a solution to this "hive" problem in awk. I'll post that for review and comment after we arrive at an optimal alternative to the solution shown in post #1 of this thread.
Daniel B. Martin
Here is a pure bash solution for the first question:
Code:
hive="luenopt"
For the second you need to add a check that all the letters are in use, but then the first two conditions become superfluous.
Code:
#!/bin/bash
For the second, you could also do something like this:
Code:
#!/bin/bash
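A minimal sketch of the "all letters in use" check in pure bash. The function name uses_all_letters and the three candidate words are assumptions for illustration, not taken from the scripts above; the candidates stand in for words that already passed Step 1.

```shell
#!/bin/bash
# Sketch of Step 2: does a word use every letter of the hive at least once?
hive='luenopt'

uses_all_letters() {
  local word=$1 i
  for (( i = 0; i < ${#hive}; i++ )); do
    # Give up as soon as one hive letter is missing from the word.
    [[ $word == *"${hive:i:1}"* ]] || return 1
  done
  return 0
}

# Hypothetical Step 1 survivors; only one of them uses all seven letters.
for word in tunnel pollute opulent; do
  if uses_all_letters "$word"; then
    echo "$word"
  fi
done
# prints: opulent
```

A word that passes this check necessarily contains the first hive letter and has at least seven characters, which is why the separate length and first-letter tests are no longer needed for Step 2.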
We are getting closer to an ideal solution!
This code ... Code:
WordList='/usr/share/dict/words'
Code:
Words which use only the letters in "luenopt"
- can the two grep commands in step 1 be combined?
- can the grep RegEx in step 2 be changed to produce only words of >6 characters, and then eliminate the sed?
Daniel B. Martin
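On the first question, one way to fold both Step 1 tests into a single command is to let awk apply them together; note that this swaps the two greps for awk rather than merging them into one grep pattern. On the second, any word that uses all seven letters has at least 7 characters, so asking for {7,} instead of {5,} would be safe for Step 2, though length alone does not prove that every letter is present. The function name step1_awk and the sample words are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: Step 1 as a single awk command instead of two chained greps.
hive='luenopt'

step1_awk() {
  awk -v hive="$hive" '
    BEGIN {
      first = substr(hive, 1, 1)     # the required letter
      only  = "^[" hive "]+$"        # whole line made of hive letters
    }
    # Print lines longer than 4 characters that contain the required
    # letter and consist only of hive letters.
    length($0) > 4 && index($0, first) > 0 && $0 ~ only
  '
}

# Hypothetical sample standing in for /usr/share/dict/words.
printf '%s\n' lent tunnel velveteen pollute opulent noun | step1_awk
# prints:
# tunnel
# pollute
# opulent
```

To run it against a real list, the sample line would be replaced by something like step1_awk <"$WordList".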