ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
Have: a file with many 6-character words, one per line.
Have: a key word with 6 characters. Example: deront
Want: a file containing all of the words from the original file which contain all of the letters in the key. Order is not important.
The key word may be thought of as a string of "Must Have" letters. denote fails because it has no R. redone fails because it has no T. rodent passes the test because it contains all of the MH letters.
If the key contains 2 Rs then qualifying words must likewise have 2 Rs.
I've tried grep without success; tried tr without success.
As a matter of personal coding style I strive to avoid explicit loops.
Ideas?
Daniel B. Martin
Last edited by danielbmartin; 09-21-2015 at 08:54 PM.
The simple idea that comes to mind is that words in question differ only by order of letters. So, if we sort letters in each word then simple comparison will tell you whether the word passes or not.
Code:
$ echo deront | sed 's/./&\n/g' | sort | tr -d '\n'
denort
$ echo rodent | sed 's/./&\n/g' | sort | tr -d '\n'
denort # this word matches
$ echo denote | sed 's/./&\n/g' | sort | tr -d '\n'
deenot # does not match
(I added newlines at end of output manually for clarity)
The simple idea that comes to mind is that words in question differ only by order of letters. So, if we sort letters in each word then simple comparison will tell you whether the word passes or not.
And here is one version in Python, basically just ripping off firstfire's splendid approach.
With this ("dbm.txt") as infile:
Code:
$ cat dbm.txt
denote
redone
rodent
And this Python script:
Code:
#!/usr/bin/env python3
import sys
key = list(sys.argv[1])
with open("dbm.txt") as infile:
for line in infile:
sortedLine = "".join(sorted(list(line))).strip('\n')
if "".join(sorted(key)) == sortedLine:
print(line)
Upon execution, I get this output:
Code:
$ ./must_have.py deront
rodent
Thanks for a cool little exercise. All credit goes to firstfire for the approach.
Best regards,
HMW
Last edited by HMW; 09-22-2015 at 04:46 AM.
Reason: Minor spelling error
Yes, you are absolutely right. In hindsight, that was clumsy of me. I normally try to keep those kind of operations outside of loops.
The new, improved, code then becomes:
Code:
#!/usr/bin/env python3
import sys
key = list(sys.argv[1])
sortedKey = "".join(sorted(key))
with open("dbm.txt") as infile:
for line in infile:
sortedLine = "".join(sorted(list(line))).strip('\n')
if sortedKey == sortedLine:
print(line)
It's possible to just count the letters without sorting (skeleton borrowed from HMW, untested):
Code:
#!/usr/bin/env python3
import sys
from collections import Counter
key = Counter(sys.argv[1])
with open("dbm.txt") as infile:
for line in infile:
if key == Counter(line.rstrip("\n\r")):
print(line)
Last edited by ntubski; 09-22-2015 at 12:47 PM.
Reason: add fix from grail
@ntubski - not only must I be doing it wrong as your code didn't seem to return anything for me , but if you are only counting the letters, won't they all return (sorry, not real familiar with python)
I did the ruby version, but it is boring as looks almost like the perl one (perl was faster though but I believe ruby is still working on its io stuff):
Code:
#!/usr/bin/env ruby
def sortletters(w)
w.downcase.split(//).sort.join
end
key = sortletters(ARGV[0])
IO.foreach(ARGV[1]) do |word|
puts word if key == sortletters(word.chomp)
end
#!/usr/bin/env bash
sortletters()
{
local array
for (( i = 0; i < ${#1}; i++ ))
do
array+=(${1:i:1})
done
printf "c\n" ${array[@]} | sort | tr '\n' '\0'
}
key=$(sortletters "$1")
while read line
do
[[ "$key" == "$(sortletters "$line")" ]] && echo "$line"
done<"$2"
On my machine:
Code:
Perl:
real 0m0.136s
user 0m0.133s
sys 0m0.000s
Ruby:
real 0m0.485s
user 0m0.467s
sys 0m0.013s
Python:
real 0m0.557s
user 0m0.543s
sys 0m0.013s
Bash:
real 1m54.069s
user 0m5.203s
sys 0m8.727s
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.