LinuxQuestions.org
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

Old 01-30-2009, 12:01 PM   #1
dasidongxi
LQ Newbie
 
Registered: Jan 2009
Posts: 7

Rep: Reputation: 0
Post Construct a one-line command which turns a file into a rhyming dictionary


The file "words" is an alphabetically sorted dictionary with nearly 400,000 lines, one word per line. How can I construct and execute a one-line command that turns this file into a rhyming dictionary, in which words with similar endings are grouped together? The rhyming dictionary should be written to a new file called rhyming.txt.
 
Old 01-30-2009, 12:18 PM   #2
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354Reputation: 354Reputation: 354Reputation: 354
Can you define "similar endings" as an algorithm? I can't see how you could possibly solve the (homework?) problem without such a definition. With a good definition the exercise should be trivial.
 
Old 01-30-2009, 01:08 PM   #3
Blank Reg
LQ Newbie
 
Registered: Feb 2008
Posts: 13

Rep: Reputation: 0
Without a proper problem specification, no-one will be able to help you

"Construct a one-line command" ... in what?

C?

Java?

Shell script? - Which shell?

PERL?

Python?

What?

What resources and tools do you have? - Do you have a lookup dictionary of rhyming endings? ... (Without one, it's gonna be pretty tough in any number of lines, never mind in one line only)

Do you have a known subset of words in the unsorted list or is it the entire universe of possible words?

Whether it is a known subset or the universal set, do you know in advance what subset is represented, or is it a blind sort?


You're gonna have to be more specific
 
Old 01-30-2009, 01:29 PM   #4
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Quote:
Originally Posted by dasidongxi View Post
The file "words" is an alphabetically sorted dictionary with nearly 400,000 lines, one word per line. How can I construct and execute a one-line command that turns this file into a rhyming dictionary, in which words with similar endings are grouped together? The rhyming dictionary should be written to a new file called rhyming.txt.
Homework?

Try this:
Code:
rev words|sort|rev >rhyming.txt
It won't be perfect, because for best results you'll need to detect syllables, which may take more than one line.
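
To see what the pipeline does, here is a minimal sketch on a six-word sample. The file names sample_words.txt and sample_rhyming.txt are made up for illustration, and LC_ALL=C is an added assumption that forces plain byte-order handling so the result is the same on any system.

```shell
# Hypothetical six-word stand-in for the real "words" file.
printf '%s\n' cat dog hat log mat fog > sample_words.txt

# Reverse every line, sort the reversed lines, then reverse them back.
# LC_ALL=C is set per command so each stage treats the input as plain bytes.
LC_ALL=C rev sample_words.txt | LC_ALL=C sort | LC_ALL=C rev > sample_rhyming.txt

cat sample_rhyming.txt
```

The three words ending in "og" come out next to each other, followed by the three ending in "at", which is exactly the "sorted by ending" grouping.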

Last edited by ErV; 01-30-2009 at 01:50 PM.
 
Old 01-30-2009, 02:01 PM   #5
dasidongxi
LQ Newbie
 
Registered: Jan 2009
Posts: 7

Original Poster
Rep: Reputation: 0
Thank you guys!

I had considered "rev input|sort|rev >output"; unfortunately, it doesn't work for such a large file (about 400,000 lines)!

Is there any way (using Bash commands only) to solve this problem other than defining an appropriate algorithm?
 
Old 01-30-2009, 02:25 PM   #6
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Quote:
Originally Posted by dasidongxi View Post
Thank you guys!

I had considered "rev input|sort|rev >output"; unfortunately, it doesn't work for such a large file (about 400,000 lines)!
It works on my machine on a file with 444,000 lines.
How exactly does it "not work"?

Quote:
Originally Posted by dasidongxi View Post
Is there any way (using Bash commands only) to solve this problem other than defining an appropriate algorithm?
You could reimplement the whole thing in a bash script (i.e. reverse strings without rev), but it will take more than just one line and it will be much slower.
Also take a look at awk (can't help with awk - I am no awk guru), it might have some useful mechanisms to help with this problem.
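
For what it's worth, one way awk could stand in for rev is sketched below. This is only an illustration, not a tested replacement for the real file: awk builds the reversed string as a sort key, sort orders by that key, and cut throws the key away again. The sample file names are hypothetical, and the sketch assumes that no word contains a tab character.

```shell
# Hypothetical six-word stand-in for the real "words" file.
printf '%s\n' cat dog hat log mat fog > sample_words.txt

# For each word, build the reversed string r and print "r<TAB>word".
# sort then orders the lines by the reversed key, and cut drops the key.
LC_ALL=C awk '{
    r = ""
    for (i = length($0); i > 0; i--) r = r substr($0, i, 1)
    print r "\t" $0
}' sample_words.txt | LC_ALL=C sort | cut -f2- > sample_rhyming_awk.txt

cat sample_rhyming_awk.txt
```

The awk program is split over several lines here for readability, but it can be written on one line.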

Last edited by ErV; 01-30-2009 at 02:29 PM.
 
Old 01-30-2009, 03:01 PM   #7
dasidongxi
LQ Newbie
 
Registered: Jan 2009
Posts: 7

Original Poster
Rep: Reputation: 0
Quote:
It works on my machine on a file with 444,000 lines.
How exactly does it "not work"?
I don't know why, but it seems to work only if the file has fewer than 1000 lines.

$ rev words|sort|rev >rhyming.txt
rev: words: Invalid or incomplete multibyte or wide character
 
Old 01-30-2009, 03:13 PM   #8
anomie
Senior Member
 
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Scientific Linux, Debian, Fedora
Posts: 3,935
Blog Entries: 5

Rep: Reputation: Disabled
Quote:
Originally Posted by dasidongxi
How can I construct and execute a one-line command which turns this file into a rhyming dictionary in which words with similar endings are grouped together.
This doesn't work for the (US) English language. Same-ending words do not always rhyme. Consider, for example:
  • some
  • home

Ask your teacher what he was thinking...
 
Old 01-30-2009, 03:14 PM   #9
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983Reputation: 1983
Quote:
Originally Posted by dasidongxi View Post
rev: words: Invalid or incomplete multibyte or wide character
To me it looks like the problem is not the number of lines in the file, but the way some special characters in the file are treated under your language settings. What is the output of the following?
Code:
echo "$LANG"
And in which language is the dictionary written?
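
A rough way to check both things at once is sketched below: print the locale variables that the tools consult, then ask iconv whether a file really is valid UTF-8. The sample file and its contents are invented for the example. It relies on iconv exiting with a non-zero status when the input is not valid in the declared source encoding.

```shell
# Show the locale variables the tools consult. When set, LC_ALL overrides
# LC_CTYPE, which in turn overrides LANG.
printf 'LANG=%s LC_ALL=%s LC_CTYPE=%s\n' "${LANG:-}" "${LC_ALL:-}" "${LC_CTYPE:-}"

# Hypothetical test file: "café" with é stored as the single Latin-1 byte
# 0xE9 (octal 351), which is not a valid UTF-8 sequence.
printf 'cat\ncaf\351\ndog\n' > sample_latin1.txt

# Converting UTF-8 to UTF-8 is a no-op for valid input and fails otherwise.
if iconv -f UTF-8 -t UTF-8 sample_latin1.txt > /dev/null 2>&1; then
    echo "valid UTF-8" > sample_check.txt
else
    echo "not valid UTF-8" > sample_check.txt
fi

cat sample_check.txt
```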
 
Old 01-30-2009, 03:29 PM   #10
dasidongxi
LQ Newbie
 
Registered: Jan 2009
Posts: 7

Original Poster
Rep: Reputation: 0
Quote:
To me it looks like the problem is not the number of lines in the file, but the way some special characters in the file are treated under your language settings. What is the output of the following?
The language setting is en_US.utf8


Quote:
And in which language is the dictionary written?
Only English words in the dictionary file.
 
Old 01-30-2009, 03:35 PM   #11
Blank Reg
LQ Newbie
 
Registered: Feb 2008
Posts: 13

Rep: Reputation: 0
Have you considered cheating?

Alias a load of shell commands and string them together in a single line.


You'll probably fail, if you do it that way though

Depends on whether you're supposed to find 'the right solution' or ... just 'a solution'

If the latter you might get marks for ingenuity, if the aliased commands could be shown to have a legitimate purpose apart from solving this one task - I wouldn't count on it though


TBH, IRL I just wouldn't attempt this in shell script

This is not a trivial problem and proper linguistic analysis of that sort is usually done with a proper AI solution ... And if it's done at all, it won't be in one line, but with either some kind of phoneme dictionary and a set of rules for rhyming ... a neural net ... or a hybrid of the two - Like I said, it's not a trivial task

As somebody else said - What was your teacher / tutor thinking when they set this task?


If I had to do it with some kind of scripting, rather than a proper solution, I'd do it in Perl. You might get it into a single line with Perl, but I wouldn't want to try debugging it!
 
Old 01-30-2009, 03:47 PM   #12
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Quote:
Originally Posted by dasidongxi View Post
I don't know why, but it seems to work only if the file has fewer than 1000 lines.

$ rev words|sort|rev >rhyming.txt
rev: words: Invalid or incomplete multibyte or wide character
It looks like the file contains an invalid character or uses a different encoding (especially if you took it from a Windows machine or something similar). Probably the 1000th line has the "wrong" character.

For example, if it uses an "Eastern European" 8-bit encoding, you could get such a message on a UTF-8 system. Try to find the line with the broken character by splitting the file, etc. Or make the system temporarily pretend to have the "C" locale by running "export LANG="C"" before launching the "rev" pipeline, or try this:
Code:
LC_ALL=C rev words | LC_ALL=C sort | LC_ALL=C rev >output.txt
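
If you would rather find the offending line than work around it, a slow but simple sketch is to feed each line to iconv and note the line numbers it rejects. The sample file and output file names are made up. On a 400,000-line file this starts one iconv process per line, so expect it to take a while.

```shell
# Hypothetical test file: line 2 holds the Latin-1 byte 0xE9 (octal 351),
# which is not valid UTF-8.
printf 'cat\ncaf\351\ndog\n' > sample_latin1.txt

n=0
: > sample_bad_lines.txt
while IFS= read -r line; do
    n=$((n + 1))
    # iconv fails on any line that is not valid UTF-8.
    if ! printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1; then
        echo "$n" >> sample_bad_lines.txt
    fi
done < sample_latin1.txt

# Line numbers of the lines that are not valid UTF-8.
cat sample_bad_lines.txt
```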
Quote:
Originally Posted by anomie View Post
This doesn't work for the (US) English language. Same-ending words do not always rhyme. Consider, for example:
  • some
  • home

Ask your teacher what he was thinking...
It works, because it sorts words alphabetically by their endings.
As I said, this solution isn't perfect, so if you don't like it, you'll have to spend some time detecting syllables and writing Python scripts (you'll need a phonetic dictionary and a scripting language with dictionary (dictionary object, or "map") support). If it was homework, then I think rev|sort|rev is the expected answer.
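
To illustrate the difference a phonetic dictionary would make, here is a toy sketch. The two-column file, its name, and the sound codes AHM and OWM are all invented for this example and do not come from any real pronunciation dictionary. Sorting on the sound column instead of the spelling separates "some" from "home".

```shell
# Hypothetical mini pronunciation table: word<TAB>ending sound.
# The sound codes (AHM, OWM) are invented labels for this sketch.
printf 'some\tAHM\nhome\tOWM\ncome\tAHM\ndome\tOWM\n' > sample_phonetic.txt

# Sort on the sound column first, then on the word, and keep only the word.
tab=$(printf '\t')
LC_ALL=C sort -t "$tab" -k2,2 -k1,1 sample_phonetic.txt | cut -f1 > sample_by_sound.txt

cat sample_by_sound.txt
```

With rev|sort|rev all four words would land in one "-ome" group. Here "come" and "some" end up together, and "dome" and "home" form a second group.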

Last edited by ErV; 01-30-2009 at 03:50 PM.
 
Old 01-30-2009, 03:56 PM   #13
anomie
Senior Member
 
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Scientific Linux, Debian, Fedora
Posts: 3,935
Blog Entries: 5

Rep: Reputation: Disabled
@ErV: My comments were tongue-in-cheek, and made to a drive-by poster who is obviously posting his homework on the forums.
 
Old 01-30-2009, 04:25 PM   #14
dasidongxi
LQ Newbie
 
Registered: Jan 2009
Posts: 7

Original Poster
Rep: Reputation: 0
Quote:
For example, if it uses an "Eastern European" 8-bit encoding, you could get such a message on a UTF-8 system.
ErV you're right!

I saved the dictionary file as UTF-8, and then it worked.

Thank you!
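
For anyone who hits the same error, the re-encoding can also be done on the command line with iconv. This is a sketch only: the file names are hypothetical, and ISO-8859-1 as the source encoding is an assumption, since the thread never says which 8-bit encoding the file actually used.

```shell
# Hypothetical input: "café" with é stored as the Latin-1 byte 0xE9 (octal 351).
printf 'cat\ncaf\351\nhat\n' > sample_latin1.txt

# Convert to UTF-8. ISO-8859-1 is an assumed source encoding; substitute the
# encoding your file really uses.
iconv -f ISO-8859-1 -t UTF-8 sample_latin1.txt > sample_utf8.txt

# Sanity check: a UTF-8 to UTF-8 conversion succeeds only on valid UTF-8.
iconv -f UTF-8 -t UTF-8 sample_utf8.txt > /dev/null && echo "sample_utf8.txt is valid UTF-8"
```

After that, the rev|sort|rev pipeline can be run on the converted file instead of the original.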
 
  


All times are GMT -5. The time now is 12:03 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration