Construct a one-line command which turns a file into a rhyming dictionary
The file "words" is an alphabetically sorted dictionary, which has nearly 400,000 lines, with one word per line. How can I construct and execute a one-line command which turns this file into a rhyming dictionary in which words with similar endings are grouped together? The rhyming dictionary should be written to a new file called rhyming.txt.
Can you define "similar endings" as an algorithm? I can't see how you could possibly solve the (homework?) problem without such a definition. With a good definition the exercise should be trivial.
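To illustrate what such a definition could look like, here is a minimal sketch that treats "similar ending" as "same last two letters". That definition, the cutoff n=2, and the file names sample_words.txt and grouped_sample.txt are assumptions made for the example, not something from the original task.

```shell
# Hypothetical sample standing in for the real "words" file.
printf 'cat\ndog\ngo\nhat\nhome\nlog\nsome\n' > sample_words.txt

# Prefix each word with its last n letters as a key, sort by key (then by word),
# and strip the key again. Assumes the words contain no tab characters.
awk -v n=2 '{
    key = (length($0) > n) ? substr($0, length($0) - n + 1) : $0
    print key "\t" $0
}' sample_words.txt |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2 |
    cut -f2- > grouped_sample.txt

cat grouped_sample.txt
```

A different definition (last three letters, last syllable, and so on) only changes how the key is computed; the sort-by-key step stays the same.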
Without a proper problem specification, no one will be able to help you.
"Construct a one-line command" ... in what?
C?
Java?
Shell script? - Which shell?
Perl?
Python?
What?
What resources and tools do you have? - Do you have a lookup dictionary of rhyming endings? ... (Without one, it's gonna be pretty tough in any number of lines, never mind in one line only)
Do you have a known subset of words in the unsorted list or is it the entire universe of possible words?
Whether it is a known subset or the universal set, do you know in advance what subset is represented, or is it a blind sort?
Homework?
Try this:
Code:
rev words|sort|rev >rhyming.txt
It won't be perfect, because for best results you'll need to detect syllables, which may take more than one line.
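A small sketch of why this groups endings together: reversing each word puts its ending first, so an ordinary sort clusters words that share an ending, and the second rev restores the spelling. The file names sample_words.txt and rhyming_sample.txt are made up for the demonstration, and LC_ALL=C is added only to make the sort order predictable.

```shell
# Hypothetical sample standing in for the real "words" file.
printf 'cat\ndog\nhat\nhome\nlog\nsome\n' > sample_words.txt

# Reverse each word, sort the reversed words, reverse them back.
rev sample_words.txt | LC_ALL=C sort | rev > rhyming_sample.txt

cat rhyming_sample.txt
# home
# some
# dog
# log
# cat
# hat
```

Note that "home" and "some" end up next to each other because they share letters, not because they rhyme.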
I have considered "rev input|sort|rev >output"; unfortunately, it doesn't work for such a large file (about 400,000 lines).
It works on my machine on a file with 444,000 lines.
How exactly does it "not work"?
Quote:
Originally Posted by dasidongxi
Is there any way (using BASH commands only) to solve this problem other than defining an appropriate algorithm?
You could reimplement the whole thing in a bash script (i.e. reverse strings without rev), but it will take more than just one line and it will be much slower.
Also take a look at awk (can't help with awk - I am no awk guru), it might have some useful mechanisms to help with this problem.
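For what it's worth, a rough sketch of the "without rev" variant could look like this. The function name reverse_line and the sample file names are invented for the example, and it uses only POSIX parameter expansion, so it should run in bash as well as in plain sh. It is far slower than rev on a 400,000-line file.

```shell
# Reverse one string by peeling off its last character repeatedly.
reverse_line() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s%?}            # everything except the last character
        out=$out${s#"$rest"}   # append the last character to the result
        s=$rest
    done
    printf '%s\n' "$out"
}

# Hypothetical sample standing in for the real "words" file.
printf 'cat\ndog\nhat\nhome\nlog\nsome\n' > sample_words.txt

# Same idea as rev|sort|rev, with the shell function in place of rev.
while IFS= read -r w; do reverse_line "$w"; done < sample_words.txt |
    LC_ALL=C sort |
    while IFS= read -r w; do reverse_line "$w"; done > rhyming_norev.txt

cat rhyming_norev.txt
```

The variables inside reverse_line are global here, which is fine for a throwaway script but worth changing if it grows.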
How can I construct and execute a one-line command which turns this file into a rhyming dictionary in which words with similar endings are grouped together.
This doesn't work for the (US) English language. Same-ending words do not always rhyme. Consider, for example:
some
home
Ask your teacher what he was thinking...
rev: words: Invalid or incomplete multibyte or wide character
To me it is not a problem with the number of lines in the file, but with the way some special characters appearing in the file are treated, based on your language settings. What is the output of the following?
Alias a load of shell commands & String them together in a single line
You'll probably fail, if you do it that way though
Depends on whether you're supposed to find 'the right solution' or ... just 'a solution'
If the latter you might get marks for ingenuity, if the aliased commands could be shown to have a legitimate purpose apart from solving this one task - I wouldn't count on it though
TBH, IRL I just wouldn't attempt this in shell script
This is not a trivial problem and proper linguistic analysis of that sort is usually done with a proper AI solution ... And if it's done at all, it won't be in one line, but with either some kind of phoneme dictionary and a set of rules for rhyming ... a neural net ... or a hybrid of the two - Like I said, it's not a trivial task
As somebody else said - What was your teacher / tutor thinking when they set this task?
If I had to do it with some kind of scripting, rather than a proper solution, I'd do it in Perl - You might get it into a single line with Perl, but I wouldn't want to try debugging it!
I don't know why it seems to work only if the file has fewer than 1000 lines.
$ rev words|sort|rev >rhyming.txt
rev: words: Invalid or incomplete multibyte or wide character
It looks like the file contains an invalid symbol or uses a different encoding (especially if you took it from a Windows machine or something similar). Probably the 1000th line has the "wrong" symbol.
For example, if it uses an "Eastern European" 8-bit encoding, you could get such a message on a UTF-8 system. Try to find the line with the broken symbol by splitting the file, etc. Or make the system temporarily pretend to use the "C" locale by running "export LANG="C"" before launching the "rev" command, or try this:
Code:
LANG="C" && rev words |sort|rev >output.txt
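If you would rather locate the broken line than work around it, one possible sketch is to let iconv validate the file as UTF-8. This assumes the system locale is UTF-8 and that the file is in some 8-bit encoding; the sample file and the name bad_lines.txt are made up for the example. The per-line loop starts one iconv process per line, so it will be slow on 400,000 lines.

```shell
# Hypothetical sample: line 2 ends in byte 0xE9 (octal 351), which is
# "e with acute accent" in ISO-8859-1 but not a valid UTF-8 sequence here.
printf 'cat\ncaf\351\ndog\n' > sample_words.txt

# Whole-file check: iconv exits non-zero if the input is not valid UTF-8.
if iconv -f UTF-8 -t UTF-8 sample_words.txt > /dev/null 2>&1; then
    echo "file is valid UTF-8"
else
    echo "file contains bytes that are not valid UTF-8"
fi

# Per-line check: record the number of every line that fails validation.
n=0
: > bad_lines.txt
while IFS= read -r line; do
    n=$((n + 1))
    if ! printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1; then
        echo "$n" >> bad_lines.txt
    fi
done < sample_words.txt

cat bad_lines.txt
```

Once the real encoding is known, converting the file to UTF-8 with iconv before running rev is another option; which source encoding to name is something only the file's origin can tell you.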
Quote:
Originally Posted by anomie
This doesn't work for the (US) English language. Same-ending words do not always rhyme. Consider, for example:
some
home
Ask your teacher what he was thinking...
It works, because it sorts words alphabetically by their endings.
As I said, this solution isn't perfect, so if you don't like it, you'll have to spend some time detecting syllables and writing Python scripts (you'll need a phonetic dictionary and a scripting language with dictionary (dictionary object, or "map") support). If it was homework, then I think rev|sort|rev is the correct result.