LinuxQuestions.org
Old 08-25-2017, 06:05 PM   #1
stf92
Senior Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware
Posts: 3,656

Rep: Reputation: 51
An ASCII dictionary file like those used by some orthographic corrector programs.


Among the files in the Linux distro I use there is one containing a large list of English words in lexicographical order; let's call it a dictionary file. It is a plain ASCII text file, and I know it is used by some spell-checking software on the system. Right at this moment, as I write this post, it is being read by some program which directly or indirectly causes the words I misspell to be underlined in red as I type them.

I am looking for a file just like that but containing the words of a certain non-English language (also written in Latin characters).

[WHAT follows can be skipped] It wouldn't be used in combination with spell-checking software. Imagine somebody just told me, to my amazement, that there are no words in Spanish with the stress falling on the last syllable and ending in a vowel. If only I had a dictionary file, hence plain ASCII, for that language, I could instantly find out whether he was right. And I could do many other things with it which I could never do with files in other, more sophisticated formats.
[WHAT follows can be skipped END]
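
For what it's worth, the check described above could be sketched in a few lines of Python. This is only a rough sketch under some assumptions: the word list has one word per line and is read as UTF-8 (a real Spanish list cannot be pure ASCII), and it relies on the Spanish spelling rule that a word stressed on its last syllable and ending in a vowel carries a written accent on that vowel. The function name and the sample words are made up for illustration.

```python
import io
import unicodedata

# Spanish vowels carrying a written acute accent.
ACCENTED_VOWELS = "áéíóú"


def words_stressed_on_final_vowel(lines):
    """Yield the words whose last letter is an accented vowel."""
    for line in lines:
        # Normalize so that 'e' + combining accent becomes the single letter 'é'.
        word = unicodedata.normalize("NFC", line.strip())
        if word and word[-1].lower() in ACCENTED_VOWELS:
            yield word


# A tiny in-memory stand-in for a real one-word-per-line dictionary file.
sample = io.StringIO("casa\ncafé\nmenú\nárbol\nsofá\n")
matches = list(words_stressed_on_final_vowel(sample))
print(matches)
```

With a real file, the same function could be fed an open file object instead of the in-memory sample, for example one opened with `open(path, encoding="utf-8")`, where `path` is wherever the word list happens to live.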

It is possible that other Linux distros ship a file like that, in which case I would only have to look at Linux distros for non-English speakers. Is that so? Of course, the answer need not have anything to do with Linux. Put more generally: (i) Is it possible to get ASCII dictionary files (please note these are just lists of words belonging to a certain language, like an ordinary dictionary but containing only the headwords), in whatever language? (ii) Where should I begin looking?

I hope I was not too technical in this post.
 
Old 08-25-2017, 06:26 PM   #2
ShadowCat8
Member
 
Registered: Nov 2004
Location: Ontario, CA
Distribution: Gentoo, Arch, (RedHat4.x-9.x, FedoraCore 1.x-4.x, Debian Potato-Sarge, LFS 6.0, etc.)
Posts: 241

Rep: Reputation: 52
Greetings,

I'm not sure if it is exactly what you are looking for, but I would check the available dictionary files for whichever language you need from Apache OpenOffice and/or LibreOffice to start and see if those have what you need. I know I have had to manually edit some of those files in the past and it was just a list of words, like you described.

Again, I am not sure if that's what you are looking for, but I think it would be a good place to start.

HTH. Let us know.

Last edited by ShadowCat8; 08-25-2017 at 06:27 PM.
 
Old 08-25-2017, 07:22 PM   #3
stf92
Senior Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware
Posts: 3,656

Original Poster
Rep: Reputation: 51
Are you sure you understand what ASCII code is? But first, thanks for your post. I have just read about LibreOffice on Wikipedia and do not think it has anything to do with ASCII files. Maybe I'm wrong.

Last edited by stf92; 08-25-2017 at 07:23 PM.
 
Old 08-25-2017, 10:20 PM   #4
frankbell
LQ Guru
 
Registered: Jan 2006
Location: Virginia, USA
Distribution: Slackware, Debian, Mageia, and whatever VMs I happen to be playing with
Posts: 12,885
Blog Entries: 18

Rep: Reputation: 3346
You might take a look at aspell. It offers many dictionaries.

Last edited by frankbell; 08-25-2017 at 10:21 PM.
 
Old 08-25-2017, 11:22 PM   #5
stf92
Senior Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware
Posts: 3,656

Original Poster
Rep: Reputation: 51
Thanks. I visited aspell.net and I'm trying to understand how I can come up with a file containing, say, 300,000 words. It seems one must compile first, which I'd never have thought. Are you sure that by downloading one of their packages I can, after compilation, get something like

a
ass
atheneum
beam
....
zeta,

That is, first word, linefeed, second word, linefeed, and so on?
 
Old 08-25-2017, 11:35 PM   #6
frankbell
LQ Guru
 
Registered: Jan 2006
Location: Virginia, USA
Distribution: Slackware, Debian, Mageia, and whatever VMs I happen to be playing with
Posts: 12,885
Blog Entries: 18

Rep: Reputation: 3346
There's a Slackbuild: https://slackbuilds.org/repository/1...?search=aspell

I haven't tested it.

I have used aspell, though it's been a while and I don't remember the context. As I recall, it worked quite nicely.
 
Old 08-26-2017, 03:19 PM   #7
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,882

Rep: Reputation: 567
The dictionary files are separate from the source code and available here:
ftp://ftp.gnu.org/gnu/aspell/dict/0index.html
Specifically, for Spanish:
ftp://ftp.gnu.org/gnu/aspell/dict/es...1.11-2.tar.bz2
 
Old 08-26-2017, 07:28 PM   #8
stf92
Senior Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware
Posts: 3,656

Original Poster
Rep: Reputation: 51
Quote:
Originally Posted by gnashley
The dictionary files are separate from the source code and available here:
ftp://ftp.gnu.org/gnu/aspell/dict/0index.html
Specifically, for Spanish:
ftp://ftp.gnu.org/gnu/aspell/dict/es...1.11-2.tar.bz2
Thanks a lot. I downloaded that very file (Spanish) yesterday, uncompressed it, and found no ASCII file at all. Perhaps the ASCII text file I am looking for is created after make is run, but I doubt it very much. Is it?
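
One possible route, sketched here in Python: as far as I know, aspell does not keep its word lists as plain text, but an installed dictionary can be written out with the `aspell dump master` command and its affix-compressed entries spelled out with `aspell expand`. The sketch below only wraps those two commands. It assumes aspell and the dictionary for the given language are already installed, the helper names are made up, and the output is kept as raw bytes in whatever encoding aspell emits, so no claim is made that the result is pure ASCII.

```python
import shutil
import subprocess


def aspell_dump_commands(lang):
    """Build the two aspell command lines: dump the master list, then expand affixes."""
    dump_cmd = ["aspell", "-d", lang, "dump", "master"]
    expand_cmd = ["aspell", "-l", lang, "expand"]
    return dump_cmd, expand_cmd


def dump_word_list(lang, out_path):
    """Write one word per line to out_path and return the number of words written."""
    if shutil.which("aspell") is None:
        raise RuntimeError("aspell was not found on PATH")
    dump_cmd, expand_cmd = aspell_dump_commands(lang)
    # First command: write out the installed master dictionary.
    dumped = subprocess.run(dump_cmd, check=True, capture_output=True).stdout
    # Second command: turn each root plus affix flags into the full word forms.
    expanded = subprocess.run(
        expand_cmd, input=dumped, check=True, capture_output=True
    ).stdout
    # expand may put several forms on one line, so split on any whitespace.
    # Sorting is by raw byte value, not by the language's alphabetical order.
    words = sorted(set(expanded.split()))
    with open(out_path, "wb") as out:
        out.write(b"\n".join(words) + b"\n")
    return len(words)
```

Calling `dump_word_list("es", "es_words.txt")` (the output file name is just an example) should then leave a file with first word, linefeed, second word, linefeed, and so on.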
 
Old 09-05-2017, 03:48 PM   #9
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,882

Rep: Reputation: 567
Oh, sorry. But I looked a little bit and found a couple of interesting sites:
Just Words
SCOWL Filter
aspell SCOWL
 
Old 09-26-2017, 07:11 PM   #10
ShadowCat8
Member
 
Registered: Nov 2004
Location: Ontario, CA
Distribution: Gentoo, Arch, (RedHat4.x-9.x, FedoraCore 1.x-4.x, Debian Potato-Sarge, LFS 6.0, etc.)
Posts: 241

Rep: Reputation: 52
Greetings again,

@stf92: Yes, I am very familiar with ASCII, including the differences between the ASCII variants that certain older personal computing systems used. (e.g. What do you do when you are trying to use a document written in Commodore's PET ASCII on a TRS-80 and the code points don't match? Secret Squirrel answer: You write your own program to convert the mismatches. ;-) hehe)

So we are all on the same page here, this is a link to a copy of the ASCII table. And, remember that the first part of UTF-8 *is* the ASCII table... At least as far as the first 128 characters go.

Quote:
Originally Posted by UTF-8 article on Wikipedia
"... It was designed for backward compatibility with ASCII. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. The first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single octet with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well...."
And, with that in mind, the first thing I was concerned about when you indicated that you wanted a Spanish dictionary and kept harping on ASCII was whether *all* of the accented and tilde'd characters would be available to you in straight ASCII. Well, according to the table above, they are not in the basic 128-character set; they are part of the Extended ASCII set. Of course, they are also available in Unicode, so you *should* be able to use a Spanish dictionary file that is UTF-8 encoded without issue. I imagine it might depend on what you are going to use it for.
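
To make the encoding point concrete, here is a small Python sketch. The two sample words are arbitrary, and Latin-1 (ISO-8859-1) is used only as one example of an 8-bit extended set; other extended tables assign the upper 128 values differently.

```python
def is_pure_ascii(text):
    """Return True if every character of text fits in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


ascii_word = "casa"
accented_word = "año"

# Plain ASCII text produces exactly the same bytes when encoded as UTF-8.
same_bytes = ascii_word.encode("ascii") == ascii_word.encode("utf-8")

# The letter ñ takes two bytes in UTF-8 but a single byte in Latin-1.
utf8_bytes = accented_word.encode("utf-8")
latin1_bytes = accented_word.encode("latin-1")

print(same_bytes)                          # True
print(utf8_bytes, len(utf8_bytes))         # b'a\xc3\xb1o' 4
print(latin1_bytes, len(latin1_bytes))     # b'a\xf1o' 3
print(is_pure_ascii(ascii_word), is_pure_ascii(accented_word))  # True False
```

So a Spanish word list can only be "plain ASCII" if every accented letter and ñ has been stripped or transliterated; otherwise the program reading it has to be told which encoding the file uses.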

HTH. Let us know.
 
  

