shell script: find subwords with 'compilation' word
ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
shell script: find subwords with 'compilation' word
Hello,
Can someone please tell me how to write a bash or shell script for the following questions:
Given a word such as compilation, the subword problem can be described as follows: list all of the words which can be created, using only letters in the given word. For example subwords of compilation word include clap, and mop. It would also contain the subword lotion, because there were two o's available to use in the word. In general, if a word has more the two i's or more than two o's or more than one repetition of any other letter of compilation it cannot be a subword.
I have a dictionery with 1000 of words in it, and I need to find all the subword in the compilation.
Only thing that I know is the I need to use grep command with regular expression in it. But I am not quite familiar with regular expression.
on linux (not POSIX) there is a printf function which does random
strings like you require, can't remember what, but strangely I ran across it yesyerday
while looking up vnsprintf.
it's part of the printf family, it's in a man page
If I could think of an easy way
I wouldn't have done it in C
It's not really easy to do character level
stuff in any scripting languages, or grep/sed -like tools.
They are more designed for line-oriented operations and strings.
Also, I think, anything you may do would be horribly messy and contrived and
could be incredibly slow.
typeset words=0
while read str
do
typeset i=0
while [ $i -lt 9 ]
do
count[$i]=0
let i=i+1
done
typeset flag=1
i=0
let j=i+1
while [ $i -lt 64 ] && [ -n ${str:$i:$j} ] && [ ${str:$i:$j}!='\n']
do
if [ ${str:$i:$j}!=c ] && [ ${str:$i:$j}!=o ] && [ ${str:$i:$j}!=m ] && [ ${str:$i:$j}!=p ] && [ ${str:$i:$j}!=i ] && [ ${str:$i:$j}!=l ] && [ ${str:$i:$j}!=a ] && [ ${str:$i:$j}!=t ] && [ ${str:$i:$j}!=n ]
then
flag=0
break
fi
let i=i+1
let j=i+1
done
if [ $flag==0 ]
then
continue
fi
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
do
if [ ${str:$i:$j}==c ]
then let count[0]=count[0]+1
fi
let i=i+1
let j=i+1
done
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
do
if [ ${str:$i:$j}==o ]
then let count[1]=count[1]+1
fi
let i=i+1
let j=i+1
done
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
do
if [ ${str:$i:$j}==m ]
then let count[2]=count[2]+1
fi
let i=i+1
let j=i+1
done
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
do
if [ ${str:$i:$j}==p ]
then let count[3]=count[3]+1
fi
let i=i+1
let j=i+1
done
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
do
if [ ${str:$i:$j}==i ]
then let count[4]=count[4]+1
fi
let i=i+1
let j=i+1
done
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
do
if [ ${str:$i:$j}==l ]
then let count[5]=count[5]+1
fi
let i=i+1
let j=i+1
done
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
do
if [ ${str:$i:$j}==a ]
then let count[6]=count[6]+1
fi
let i=i+1
let j=i+1
done
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
do
if [ ${str:$i:$j}==t ]
then let count[7]=count[7]+1
fi
let i=i+1
let j=i+1
done
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
do
if [ ${str:$i:$j}==n ]
then let count[8]=count[8]+1
fi
let i=i+1
let j=i+1
done
i=0
j=1
while [ $i -lt 9 ]
do
if [ $i!=1 ] && [ $i!=4 ] && [ ${count:$i:$j} -gt 1 ]
then flag=0;
fi
let i=i+1
let j=i+1
done
if [ ${count:1:2} -gt 2 ] && [ ${count:4:5} -gt 2 ]
then flag=0
fi
if [ ${#str} -gt 2 ]
then flag=0
fi
if [ $flag==1 ]
then
echo $str
fi
done < words.txt
Here is the code I wrote, but I am gettingt the following error.
[: ./test.sh 152: unbalanced []
where 152 is the line of the code "done < words.txt"
Can someone please tell me where I am making a mistake?
The code works now with no errors(it was a syntax that missed a space), but there is no output been produced.
Code:
typeset words=0
while read str
do
typeset i=0
while [ $i -lt 9 ]
do
count[$i]=0
let i=i+1
done
typeset flag=1
i=0
let j=i+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
do
if [ ${str:$i:$j}!="c" ] && [ ${str:$i:$j}!="o" ] && [ ${str:$i:$j}!="m" ] && [ ${str:$i:$j}!="p" ] && [ ${str:$i:$j}!="i" ] && [ ${str:$i:$j}!="l" ] && [ ${str:$i:$j}!="a" ] && [ ${str:$i:$j}!="t" ] && [ ${str:$i:$j}!="n" ]
then
flag=0
break
fi
let i=i+1
let j=i+1
done
if [ $flag==0 ]
then
continue
fi
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
do
if [ ${str:$i:$j}==c ]
then let count[0]=count[0]+1
fi
let i=i+1
let j=i+1
done
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
do
if [ ${str:$i:$j}==o ]
then let count[1]=count[1]+1
fi
let i=i+1
let j=i+1
done
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
do
if [ ${str:$i:$j}==m ]
then let count[2]=count[2]+1
fi
let i=i+1
let j=i+1
done
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
do
if [ ${str:$i:$j}==p ]
then let count[3]=count[3]+1
fi
let i=i+1
let j=i+1
done
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
do
if [ ${str:$i:$j}==i ]
then let count[4]=count[4]+1
fi
let i=i+1
let j=i+1
done
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
do
if [ ${str:$i:$j}==l ]
then let count[5]=count[5]+1
fi
let i=i+1
let j=i+1
done
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
do
if [ ${str:$i:$j}==a ]
then let count[6]=count[6]+1
fi
let i=i+1
let j=i+1
done
i=0
let j=j+1
while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
do
if [ ${str:$i:$j}==t ]
then let count[7]=count[7]+1
fi
let i=i+1
let j=i+1
done
i=0
let j=j+1
while [ $i -lt 64 ] && [ -n ${str:$i:$j} ] && [ ${str:$i:$j}!="\n" ]
do
if [ ${str:$i:$j}==n ]
then let count[8]=count[8]+1
fi
let i=i+1
let j=i+1
done
i=0
j=1
while [ $i -lt 9 ]
do
if [ $i!=1 ] && [ $i!=4 ] #&& [ ${count:$i:$j} -gt 1 ]
then flag=0;
fi
let i=i+1
let j=i+1
done
if [ $count[1] > 2 ] && [ $count[4] > 2 ]
then flag=0
fi
if [ ${#str} > 2 ]
then flag=0
fi
if [ $flag==1 ]
then echo $str
fi
done < words.txt
Could someone please give me tips on why this is not working?
Suresh, you will not be able to do this using shell tools.
They are not up to the job.
Using my C example I did the anagrams of 'compilation' and it took
11 minutes, and that's just using the whole word not even subwords.
This produces 40 million words; ten million when duplicates are removed.
So including subwords is going to a be 50 million at least maybe.
You will need to use a much more intelligent approach than brute-force.
If 'compilation' produces 50 million combinations then obviously this is
too large a list.
You will need to apply rules to remove impossible combinations, like
say, three letters in a row.
If you prune it down to maybe 1 million It is still larger than your dictionary!
As the dictionary is a very small subset of possible 'words'
maybe turn it on it's head, and use the dictionary entries to search the word.
See if each word in the dictionary can be made from the target word.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.