Text processing - "must have" letters
Have: a file with many 6-character words, one per line.
Have: a key word with 6 characters. Example: deront
Want: a file containing all of the words from the original file which contain all of the letters in the key. Order is not important.
The key word may be thought of as a string of "Must Have" (MH) letters.
denote fails because it has no R.
redone fails because it has no T.
rodent passes the test because it contains all of the MH letters.
If the key contains 2 Rs then qualifying words must likewise have 2 Rs.
I've tried grep without success; tried tr without success. As a matter of personal coding style I strive to avoid explicit loops. Ideas?
Daniel B. Martin
Hi.
The simple idea that comes to mind is that the words in question differ only in the order of their letters. So, if we sort the letters in each word, a simple comparison will tell you whether the word passes or not.
Code:
$ echo deront | sed 's/./&\n/g' | sort | tr -d '\n'
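For readers who prefer to see the sorted-letters idea outside a shell pipeline, here is a minimal Python sketch of it. The function name `signature` is made up for illustration and is not from any post in this thread; the three test words are the ones from the original question.

```python
def signature(word: str) -> str:
    """Return the letters of `word` in sorted order (hypothetical helper)."""
    return "".join(sorted(word))


key = "deront"
for candidate in ["denote", "redone", "rodent"]:
    # Two equal-length words hold the same letters exactly when their
    # sorted forms are identical.
    verdict = "passes" if signature(candidate) == signature(key) else "fails"
    print(candidate, verdict)
```

Because both the key and the candidate words have 6 characters, comparing sorted forms also takes care of repeated letters such as the "2 Rs" case.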
Hi.
Here is a reasonably fast Perl solution based on my previous post:
Code:
#!/usr/bin/env perl
Code:
$ time /tmp/must-have.pl binary /usr/share/dict/words
And here is one version in Python, basically just ripping off firstfire's splendid approach.
With this ("dbm.txt") as infile:
Code:
$ cat dbm.txt
Code:
#!/usr/bin/env python3
Code:
$ ./must_have.py deront
Best regards, HMW
Hi.
Execution will be a bit faster if you move the sorting of the key out of the loop:
Code:
key = list(sys.argv[1])
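A rough sketch of what hoisting the key sort out of the loop can look like is below. This is not HMW's script (which is not reproduced in full here); the function name `matching_words` and the sample list are assumptions made for illustration.

```python
from typing import Iterable, Iterator


def matching_words(key: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield every word whose letters are a rearrangement of `key`."""
    sorted_key = sorted(key)  # computed once, before the loop starts
    for line in lines:
        word = line.strip()  # drop the trailing newline and other whitespace
        if sorted(word) == sorted_key:
            yield word


# Stand-in for the lines of a word file (hypothetical sample data).
sample = ["denote\n", "redone\n", "rodent\n", "trend\n"]
print(list(matching_words("deront", sample)))
```

An open file object can be passed as `lines`, since iterating over a text file yields one line at a time.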
Quote:
The new, improved code then becomes:
Code:
#!/usr/bin/env python3
HMW
It's possible to just count the letters without sorting (skeleton borrowed from HMW, untested):
Code:
#!/usr/bin/env python3
@ntubski - I must be doing something wrong, as your code didn't seem to return anything for me :(. Also, if you are only counting the letters, won't every word be returned? (Sorry, I'm not very familiar with Python.)
I did a Ruby version, but it is boring as it looks almost like the Perl one (Perl was faster, though I believe Ruby is still working on its IO stuff):
Code:
#!/usr/bin/env ruby
OK, let's try sed:
Code:
key=deront
Quote:
Code:
#!/usr/bin/env python3
Quote:
It seems when I print the key value it looks like:
Code:
Counter({'t': 1, 'n': 1, 'e': 1, 'd': 1, 'r': 1, 'o': 1})
Tada :) ... turns out you need to strip line endings too:
Code:
if key == Counter(line.rstrip("\n\r")):
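To make the line-ending pitfall concrete, here is a small self-contained sketch using `collections.Counter`. The variable `raw_line` is a made-up stand-in for one line read from the word file. Note that comparing two Counters checks the count of each individual letter, not just the total number of letters, so only true rearrangements of the key match.

```python
from collections import Counter

key = Counter("deront")
raw_line = "rodent\n"  # hypothetical line as read from a file, newline included

# The newline is counted as a character of its own, so the comparison fails.
print(key == Counter(raw_line))

# With the line ending stripped, the per-letter counts agree.
print(key == Counter(raw_line.rstrip("\n\r")))

# A word with the same length but different letters still fails.
print(key == Counter("denote"))
```

This would explain why the unstripped version returned nothing: every line carried an extra "\n" that the key does not have.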
And the winner for slowest version ... lol
Code:
#!/usr/bin/env bash
Code:
Perl:
Quote:
It seems the forum ate the %-sign in your program :)
BTW, here is another way to convert a string into a column (e.g. for sorting):
Code:
$ grep -o . <<< 'hello'
Quote:
It is too dense for me to understand. Please give us a step-by-step. Thanks!
Daniel B. Martin