LinuxQuestions.org
06-22-2017, 01:43 AM   #1
rblampain
Senior Member
 
Registered: Aug 2004
Location: Western Australia
Distribution: Debian 9.2
Posts: 1,207

Rep: Reputation: 51
Depends: .... but it is not going to be installed


I need "wordlists" in every possible language. All I need is the basic list of newline-terminated one-word strings which, as I understand it, exists for every language that has a keyboard.
As a test, I tried to download the Italian version from the library (I have been waiting for my own Internet connection since February 2017 from the monopoly .au Telstra). Although the download I want is a simple text file, I get about 3 screens of messages like the one in the thread title, and I cannot see any justification for that, at least for what I want to do with it. The purpose is to check that when a visitor submits something in a textarea of a form, what is entered consists only of valid (UTF-8) words in the visitor's language.
My questions are:
1) Is my Debian 7 too old? Or do I need another download approach? I suspect I only need the "master" of aspell-it, not the full package. Or should I look for a version of it suitable for Debian 7?
2) Is there a simpler word list available for each language? (I do not need the dictionary/spell-checker part of it.)

Thank you for your help.
 
06-22-2017, 02:19 AM   #2
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 17,243
Blog Entries: 10

Rep: Reputation: 5160
please show us all the steps & their output.

btw i'm not totally clear what you are trying to achieve here - installing software i presume?
 
1 members found this post helpful.
06-22-2017, 05:37 AM   #3
!!!
Member
 
Registered: Jan 2017
Location: Fremont, CA, USA
Distribution: Trying any&ALL on old/minimal
Posts: 748

Rep: Reputation: 323
Quote:
This is to check that when a visitor submits something in a textarea of a form, what is entered is only valid (utf-8) words in the language of the visitor.
Do you mean you want just the first file from each of these .deb pkgs? https://packages.debian.org/wheezy/wordlist

Or did you want to web-search this: lists of words in each|every language ?

Last edited by !!!; 06-22-2017 at 06:03 AM.
 
1 members found this post helpful.
06-22-2017, 10:41 AM   #4
DavidMcCann
LQ Veteran
 
Registered: Jul 2006
Location: London
Distribution: PCLinuxOS, Debian
Posts: 5,773

Rep: Reputation: 2133
Basically, you need a spell-checker for every possible language. That's a lot of files! This might be one of those cases where a good idea turns out to be impractical.

Hunspell handles things with a dictionary like
accredit/Snd
accreted
accretion/SM
accrual/MS
accrue/SGD
where the codes at the end refer to an affix list like
SFX 6 y iful [^aeiou]y
SFX 6 0 ful [aeiou]y
SFX 6 0 ful [^y]
A simple list of words would be even more bulky, as "accredit/Snd" would have to be entered as "accredit, accreditation, accredits" etc.
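
To make the stem-plus-flag idea concrete, here is a minimal Python sketch. It is not Hunspell's actual implementation: it only applies suffix rule lines of the form "SFX flag strip add condition", like the three quoted above, to one dictionary entry. The entries beauty/6, joy/6 and help/6 are made up for illustration (the dictionary excerpt above uses other flags), and the header line, prefixes and every other feature of a real affix file are ignored.

```python
import re
from typing import List, NamedTuple


class SuffixRule(NamedTuple):
    flag: str       # flag character that switches the rule on for an entry
    strip: str      # characters removed from the end of the stem ("" for none)
    add: str        # characters appended after stripping
    condition: str  # regex-like pattern the end of the stem must match


def parse_sfx(line: str) -> SuffixRule:
    """Parse one rule line such as 'SFX 6 y iful [^aeiou]y' (simplified)."""
    _, flag, strip, add, condition = line.split()
    return SuffixRule(flag,
                      "" if strip == "0" else strip,
                      "" if add == "0" else add,
                      condition)


def expand(entry: str, rules: List[SuffixRule]) -> List[str]:
    """Return the stem plus every form produced by the matching suffix rules."""
    stem, _, flags = entry.partition("/")
    forms = [stem]
    for rule in rules:
        if rule.flag not in flags:
            continue
        if not re.search(rule.condition + "$", stem):
            continue
        if rule.strip and not stem.endswith(rule.strip):
            continue
        base = stem[:len(stem) - len(rule.strip)] if rule.strip else stem
        forms.append(base + rule.add)
    return forms


# The three rules quoted in the post above.
RULES = [parse_sfx(line) for line in (
    "SFX 6 y iful [^aeiou]y",
    "SFX 6 0 ful [aeiou]y",
    "SFX 6 0 ful [^y]",
)]

if __name__ == "__main__":
    for entry in ("beauty/6", "joy/6", "help/6"):  # hypothetical entries
        print(entry, "->", expand(entry, RULES))
```

Each flag on an entry can add one or more forms, which is why a fully expanded plain list ends up so much larger than the stem file.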
 
1 members found this post helpful.
06-23-2017, 12:02 AM   #5
rblampain
Senior Member
 
Registered: Aug 2004
Location: Western Australia
Distribution: Debian 9.2
Posts: 1,207

Original Poster
Rep: Reputation: 51
Thank you for the answers. No, I am not trying to install software, only "wordlists" (alphabetically sorted lists of words, or data files) that I can access with my "home-made" application. These lists need to be "extracted"; they are not a file in the package.
Since I posted, I worked out some of the answers but got more questions, so here it is.

I have been able to install all the Aspell packages available from Debian 7 and get 49 word lists (simple lists of words) with
Code:
aspell -d <iso-639 code-language> dump master > <file>
except for Basque and Portuguese where I get the message:
Code:
..can not be opened for reading
This happens with Aspell-eu-es (Basque), Aspell-pt, Aspell-pt-br (or pt-BR) and Aspell-pt-pt (all Portuguese versions).
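
For anyone repeating this over many languages, the dump command above could be wrapped in a small Python helper. This is only a sketch under assumptions: the function names and the output file naming are made up, it assumes aspell is on the PATH, and it assumes a failed dump (such as the Basque and Portuguese cases) shows up as a non-zero exit status or empty output, which I have not verified.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def dump_command(lang: str) -> List[str]:
    """Build the same command as in the post: aspell -d <lang> dump master."""
    return ["aspell", "-d", lang, "dump", "master"]


def dump_wordlist(lang: str, out_dir: str) -> Optional[Path]:
    """Write the dumped list to <out_dir>/<lang>.txt, or return None on failure.

    Assumption: a dictionary that cannot be opened gives a non-zero exit
    status or no output. The bytes are written unchanged, so the encoding
    of the resulting file still has to be checked separately.
    """
    result = subprocess.run(dump_command(lang), capture_output=True, check=False)
    if result.returncode != 0 or not result.stdout:
        return None
    out_path = Path(out_dir) / f"{lang}.txt"
    out_path.write_bytes(result.stdout)
    return out_path


if __name__ == "__main__":
    for code in ("it", "fr", "pt"):  # example language codes only
        print(code, dump_wordlist(code, "."))
```

Languages for which the helper returns None can then be collected and looked at by hand.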

The CJK languages (Chinese, Japanese, Korean) are not available in Aspell.
Question: How can I get something like a word list (or symbol list?) in these languages? The purpose is to check that what a visitor enters in an HTML form is mostly (allowing for misspelling) in the visitor's language and consists of valid words or text. I do not need spell-checker capability, because that involves too much interaction with the visitor.

Question: How can I get word-lists of these CJK languages?
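
For the languages that do have a dumped word list, the "mostly valid words" check described above might look like the following Python sketch. The function names and the 0.8 threshold are assumptions chosen for illustration. It splits the text on whitespace, so it would not work as it stands for Chinese or Japanese, which do not separate words with spaces.

```python
import string
from typing import Iterable, Set

# Punctuation trimmed from both ends of each token (ASCII plus a few extras).
PUNCTUATION = string.punctuation + "\u00ab\u00bb\u201c\u201d\u2018\u2019\u00bf\u00a1\u2026"


def load_wordlist(lines: Iterable[str]) -> Set[str]:
    """Build a lookup set from newline-terminated words (one word per line)."""
    return {line.strip().casefold() for line in lines if line.strip()}


def known_fraction(text: str, words: Set[str]) -> float:
    """Fraction of whitespace-separated tokens that appear in the word list."""
    tokens = [tok.strip(PUNCTUATION).casefold() for tok in text.split()]
    tokens = [tok for tok in tokens if tok]
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in words)
    return hits / len(tokens)


def looks_like_language(text: str, words: Set[str], threshold: float = 0.8) -> bool:
    """Accept the text if enough tokens are known; misspellings lower the score."""
    return known_fraction(text, words) >= threshold


if __name__ == "__main__":
    # In practice the lines would come from a dumped file, e.g. open("it.txt").
    italian = load_wordlist(["ciao\n", "mondo\n", "bello\n"])
    print(known_fraction("Ciao, mondo bello!", italian))
    print(looks_like_language("Ciao mondx", italian))
```

A threshold below 1.0 is what tolerates the occasional misspelling or proper name without any interaction with the visitor.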

Google has about 100 language translations available and Aspell has about 50 word lists/spell-checkers. This seems to indicate that there are at least 50 languages (considered important enough by Google) whose Internet users are likely to need a major language (English, French, Hindi?) that is supported by the Linux community and used in their country, because their own minor language is not supported yet.

Question: Is there a way to find which major language is used as a replacement for an unsupported minor language without having to research each occurrence on its own? (There are probably hundreds of languages that use common alphabets and have a computer keyboard but no word-list/spell-checker. I prefer not to reinvent the wheel, if there is a wheel, but I suspect the answer is probably "no".)

Question: In the likely event that there is no such list of words for a particular minor (or major) language, is there a way to create a check-list of characters (similar to a word-list) to ensure that what is submitted is a series of valid utf-8 characters of that language? This seems possible but I have not been able to find out how. The point, of course, is that only "letters" and punctuation (which not all languages use) are acceptable, not special characters.
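
One possible approach to such a character check-list, sketched in Python with only the standard unicodedata module: accept whitespace, a small set of punctuation, and any letter or combining mark whose Unicode character name starts with an expected script word such as LATIN or DEVANAGARI. Matching on the name prefix is a heuristic standing in for a real Unicode script lookup, and the function names and the punctuation set are assumptions for illustration.

```python
import unicodedata
from typing import List, Tuple

# Punctuation accepted in any language; adjust per language as needed.
ALLOWED_PUNCTUATION = set(".,;:!?'\"()-")


def is_allowed(ch: str, script_prefixes: Tuple[str, ...]) -> bool:
    """True for whitespace, allowed punctuation, or a letter/mark of the script."""
    if ch.isspace() or ch in ALLOWED_PUNCTUATION:
        return True
    # Only letters (L*) and combining marks (M*) can qualify; digits,
    # symbols and control characters are rejected here.
    if unicodedata.category(ch)[0] not in ("L", "M"):
        return False
    # Heuristic: names look like "LATIN SMALL LETTER A" or "DEVANAGARI LETTER NA".
    name = unicodedata.name(ch, "")
    return name.startswith(tuple(script_prefixes))


def rejected_chars(text: str, script_prefixes: Tuple[str, ...]) -> List[str]:
    """List every character of the text that fails the check, in order."""
    return [ch for ch in text if not is_allowed(ch, script_prefixes)]


if __name__ == "__main__":
    print(rejected_chars("caff\u00e8 <b>", ("LATIN",)))
    print(rejected_chars("\u0928\u092e\u0938\u094d\u0924\u0947", ("DEVANAGARI",)))
```

The same function can be given several prefixes, for example ("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA") for Japanese. It works on text that has already been decoded; whether the submitted bytes are valid UTF-8 at all is a separate step, since decoding them with bytes.decode("utf-8") raises UnicodeDecodeError when they are not.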

What is going to happen here is this:
A website is intended to be world-readable as a Wiki in many unsupported minor languages. Because no translation of the website exists in those languages, it is presented in the major language most likely to be understood in that part of the world (for example Hindi for an Indian dialect). If the visitor is so inclined and capable, she or he can propose in a textarea a correct or better translation of a sentence, and is then also given the original sentence in English as a guide to what is meant. Basic knowledge of 3 languages (which is good enough in this case) is quite common in those parts of the world.

Perhaps someone can offer some answers or suggestions. Perhaps someone has experienced the same requirements.
 
06-26-2017, 12:04 AM   #6
AwesomeMachine
LQ Guru
 
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,513

Rep: Reputation: 1009
I am not aware of word lists in Asian languages. The words are built differently. It's almost impossible to determine what order to put them in.
 
1 members found this post helpful.
  


All times are GMT -5.