LinuxQuestions.org
Go Job Hunting at the LQ Job Marketplace
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Reply
 
Search this Thread
Old 09-29-2005, 01:00 PM   #1
suresheva31
LQ Newbie
 
Registered: Oct 2004
Posts: 25

Rep: Reputation: 15
shell script: find subwords with 'compilation' word


Hello,

Can someone please tell me how to write a bash or shell script for the following questions:


Given a word such as compilation, the subword problem can be described as follows: list all of the words which can be created, using only letters in the given word. For example subwords of compilation word include clap, and mop. It would also contain the subword lotion, because there were two o's available to use in the word. In general, if a word has more the two i's or more than two o's or more than one repetition of any other letter of compilation it cannot be a subword.

I have a dictionery with 1000 of words in it, and I need to find all the subword in the compilation.

Only thing that I know is the I need to use grep command with regular expression in it. But I am not quite familiar with regular expression.

Any help would much much appreicatied.

Thanks


Suresh
 
Old 09-29-2005, 01:07 PM   #2
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530

Rep: Reputation: 108Reputation: 108
Homework?
What code did you write so far, and what's the problem with it?
 
Old 09-29-2005, 03:20 PM   #3
suresheva31
LQ Newbie
 
Registered: Oct 2004
Posts: 25

Original Poster
Rep: Reputation: 15
Quote:
Originally posted by Hko
Homework?
What code did you write so far, and what's the problem with it?

Nope, this is for work? I need to find something for my work. that is why?
 
Old 09-29-2005, 03:54 PM   #4
addy86
Member
 
Registered: Nov 2004
Location: Germany
Distribution: Debian Testing
Posts: 332

Rep: Reputation: 31
What you're looking for is an anagram generator. Search the net for it.
 
Old 09-30-2005, 02:38 AM   #5
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,283

Rep: Reputation: 172Reputation: 172
on linux (not POSIX) there is a printf function which does random
strings like you require, can't remember what, but strangely I ran across it yesyerday
while looking up vnsprintf.

it's part of the printf family, it's in a man page
 
Old 09-30-2005, 09:12 AM   #6
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,283

Rep: Reputation: 172Reputation: 172
I got intrigued so knocked this recursive solution up.
It doesn't filter duplicates.

Could it be more elegant? any offers?

Here's an anagram creator. Unfort. it's not quite what you want as
you need sub-words too.
should have read the spec. properly. phew!

Code:
#include<stdio.h>
#include<string.h>

#define BUFSZ 1023

/* 
** (c) George W. Bush
*/

void anagramerizerate(char * so_far, char * remains)
{
    char right_buf[BUFSZ + 1];
    char left_buf[BUFSZ + 1];

    char * p = right_buf;
    static char blob[2];
    int done = 0;

    strncpy (left_buf, so_far, BUFSZ);
    strncpy (right_buf, remains, BUFSZ);

    for (p=right_buf; *p; p++) {

	if (*p == '_') {
		continue;
	}

	blob[0] = *p;
	strcat (left_buf, blob);
	*p = '_';
	anagramerizerate(left_buf, right_buf);
	done++;

	strncpy (left_buf, so_far, BUFSZ);
	strncpy (right_buf, remains, BUFSZ);
    }
   if(!done) puts(so_far);

}

void anagram(const char * string)
{
	anagramerizerate( "", string);
}

int main(int argc, char ** argv, char ** envp)
{
	if (--argc) {

		for(argv++; argc; argc--, argv++) {
			anagram(*argv);
		}
	}

return(0);
}
Code:
$ anagram how do
how
hwo
ohw
owh
who
woh
do
od
 
Old 10-01-2005, 10:40 AM   #7
suresheva31
LQ Newbie
 
Registered: Oct 2004
Posts: 25

Original Poster
Rep: Reputation: 15
Quote:
Originally posted by bigearsbilly
I got intrigued so knocked this recursive solution up.
It doesn't filter duplicates.

Could it be more elegant? any offers?

Here's an anagram creator. Unfort. it's not quite what you want as
you need sub-words too.
should have read the spec. properly. phew!

Code:
#include<stdio.h>
#include<string.h>

#define BUFSZ 1023

/* 
** (c) George W. Bush
*/

void anagramerizerate(char * so_far, char * remains)
{
    char right_buf[BUFSZ + 1];
    char left_buf[BUFSZ + 1];

    char * p = right_buf;
    static char blob[2];
    int done = 0;

    strncpy (left_buf, so_far, BUFSZ);
    strncpy (right_buf, remains, BUFSZ);

    for (p=right_buf; *p; p++) {

	if (*p == '_') {
		continue;
	}

	blob[0] = *p;
	strcat (left_buf, blob);
	*p = '_';
	anagramerizerate(left_buf, right_buf);
	done++;

	strncpy (left_buf, so_far, BUFSZ);
	strncpy (right_buf, remains, BUFSZ);
    }
   if(!done) puts(so_far);

}

void anagram(const char * string)
{
	anagramerizerate( "", string);
}

int main(int argc, char ** argv, char ** envp)
{
	if (--argc) {

		for(argv++; argc; argc--, argv++) {
			anagram(*argv);
		}
	}

return(0);
}
Code:
$ anagram how do
how
hwo
ohw
owh
who
woh
do
od

Is there is any other way we could do this using grep command and regualr expressions.

Suresh
 
Old 10-03-2005, 03:41 AM   #8
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,283

Rep: Reputation: 172Reputation: 172
If I could think of an easy way
I wouldn't have done it in C


It's not really easy to do character level
stuff in any scripting languages, or grep/sed -like tools.
They are more designed for line-oriented operations and strings.
Also, I think, anything you may do would be horribly messy and contrived and
could be incredibly slow.

But I'm willing to be shown otherwise!
 
Old 10-03-2005, 08:33 AM   #9
sajith
Member
 
Registered: Sep 2005
Location: kannur
Posts: 59

Rep: Reputation: 15
hai suresh

please go through the linux man page to know more about grep command

there also egrep and fgrep commands are there


to know about more linux commands please go through the following link

http://linuxreviews.org/man/
 
Old 10-03-2005, 09:46 PM   #10
suresheva31
LQ Newbie
 
Registered: Oct 2004
Posts: 25

Original Poster
Rep: Reputation: 15
Code:
typeset words=0
while read str	
do
	typeset i=0
	while [ $i -lt 9 ]
	do
		count[$i]=0
		let i=i+1
	done
	typeset flag=1
	i=0
	let j=i+1
	while [ $i -lt 64 ] && [ -n ${str:$i:$j} ] && [ ${str:$i:$j}!='\n']
	do
		if [ ${str:$i:$j}!=c ] && [ ${str:$i:$j}!=o ] && [ ${str:$i:$j}!=m ] && [ ${str:$i:$j}!=p ] && [ ${str:$i:$j}!=i ] && [ ${str:$i:$j}!=l ] && [ ${str:$i:$j}!=a ] && [ ${str:$i:$j}!=t ] && [ ${str:$i:$j}!=n ]
		then
			flag=0
			break
		fi
		let i=i+1
		let j=i+1
	done

	if [ $flag==0 ]
	then
		continue
	fi

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
	do
		if [ ${str:$i:$j}==c ] 
		then let count[0]=count[0]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
	do
		if [ ${str:$i:$j}==o ] 
		then let count[1]=count[1]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
	do
		if [ ${str:$i:$j}==m ] 
		then let count[2]=count[2]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
	do
		if [ ${str:$i:$j}==p ] 
		then let count[3]=count[3]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
	do
		if [ ${str:$i:$j}==i ] 
		then let count[4]=count[4]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
	do
		if [ ${str:$i:$j}==l ] 
		then let count[5]=count[5]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
	do
		if [ ${str:$i:$j}==a ] 
		then let count[6]=count[6]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
	do
		if [ ${str:$i:$j}==t ] 
		then let count[7]=count[7]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!='\0' ] && [ ${str:$i:$j}!='\n' ]
	do
		if [ ${str:$i:$j}==n ] 
		then let count[8]=count[8]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	j=1
	while [ $i -lt 9 ]
	do
		if [ $i!=1 ] && [ $i!=4 ] && [ ${count:$i:$j} -gt 1 ]
		then flag=0;
		fi
		let i=i+1
		let j=i+1
	done
	
	if [ ${count:1:2} -gt 2 ] && [ ${count:4:5} -gt 2 ]
	then flag=0
	fi

	if [ ${#str} -gt 2 ]
	then flag=0
	fi

	if [ $flag==1 ]
	then
		echo $str
	fi

done < words.txt
Here is the code I wrote, but I am gettingt the following error.

[: ./test.sh 152: unbalanced []

where 152 is the line of the code "done < words.txt"

Can someone please tell me where I am making a mistake?


thanks

suresh
 
Old 10-03-2005, 10:27 PM   #11
suresheva31
LQ Newbie
 
Registered: Oct 2004
Posts: 25

Original Poster
Rep: Reputation: 15
The code works now with no errors(it was a syntax that missed a space), but there is no output been produced.

Code:
typeset words=0
while read str
do
	typeset i=0
	while [ $i -lt 9 ]
	do
		count[$i]=0
		let i=i+1
	done
	typeset flag=1
	i=0
	let j=i+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
	do
		if [ ${str:$i:$j}!="c" ] && [ ${str:$i:$j}!="o" ] && [ ${str:$i:$j}!="m" ] && [ ${str:$i:$j}!="p" ] && [ ${str:$i:$j}!="i" ] && [ ${str:$i:$j}!="l" ] && [ ${str:$i:$j}!="a" ] && [ ${str:$i:$j}!="t" ] && [ ${str:$i:$j}!="n" ]
		then
			flag=0
			break	
		fi
		let i=i+1
		let j=i+1
	done

	if [ $flag==0 ]
	then
		continue
	fi

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
	do
		if [ ${str:$i:$j}==c ] 
		then let count[0]=count[0]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
	do
		if [ ${str:$i:$j}==o ] 
		then let count[1]=count[1]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
	do
		if [ ${str:$i:$j}==m ] 
		then let count[2]=count[2]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
	do
		if [ ${str:$i:$j}==p ] 
		then let count[3]=count[3]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
	do
		if [ ${str:$i:$j}==i ] 
		then let count[4]=count[4]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
	do
		if [ ${str:$i:$j}==l ] 
		then let count[5]=count[5]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
	do
		if [ ${str:$i:$j}==a ] 
		then let count[6]=count[6]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ ${str:$i:$j}!="\0" ] && [ ${str:$i:$j}!="\n" ]
	do
		if [ ${str:$i:$j}==t ] 
		then let count[7]=count[7]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	let j=j+1
	while [ $i -lt 64 ] && [ -n ${str:$i:$j} ] && [ ${str:$i:$j}!="\n" ]
	do
		if [ ${str:$i:$j}==n ] 
		then let count[8]=count[8]+1
		fi
		let i=i+1
		let j=i+1
	done

	i=0
	j=1
	while [ $i -lt 9 ]
	do
		if [ $i!=1 ] && [ $i!=4 ] #&& [ ${count:$i:$j} -gt 1 ]
		then flag=0;
		fi
		let i=i+1
		let j=i+1
	done
	
	if [ $count[1] > 2 ] && [ $count[4] > 2 ]
	then flag=0
	fi

	if [ ${#str} > 2 ]
	then flag=0
	fi

	if [ $flag==1 ]
	then echo $str
	fi

done < words.txt

Could someone please give me tips on why this is not working?
 
Old 10-05-2005, 04:17 AM   #12
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,283

Rep: Reputation: 172Reputation: 172
Suresh, you will not be able to do this using shell tools.
They are not up to the job.

Using my C example I did the anagrams of 'compilation' and it took
11 minutes, and that's just using the whole word not even subwords.
This produces 40 million words; ten million when duplicates are removed.
So including subwords is going to a be 50 million at least maybe.

You will need to use a much more intelligent approach than brute-force.

If 'compilation' produces 50 million combinations then obviously this is
too large a list.
You will need to apply rules to remove impossible combinations, like
say, three letters in a row.

If you prune it down to maybe 1 million It is still larger than your dictionary!

As the dictionary is a very small subset of possible 'words'
maybe turn it on it's head, and use the dictionary entries to search the word.
See if each word in the dictionary can be made from the target word.

It's not a trivial problem.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Shell Script: Find "Word" Run "Command" granatica Linux - Software 5 07-25-2007 07:42 AM
find shell script help liren Linux - Newbie 3 05-02-2005 03:05 PM
how to find the pid of a perl script from shell script toovato Linux - General 1 12-19-2003 06:25 PM
How to read ans parse MS word file using a Linux Shell script. Alek Linux - General 2 11-10-2003 02:07 PM
Linux can't find a shell script?? jt1020 Linux - General 4 04-27-2003 08:27 AM


All times are GMT -5. The time now is 02:58 PM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration