LinuxQuestions.org


stf92 08-25-2017 05:05 PM

An ASCII dictionary file like those used by some orthographic corrector programs.
 
Among the files contained in the Linux distro I use there is a file containing a large list of English words in lexicographical order; let's call it a dictionary file. It is a plain ASCII text file, and I know it is used by some orthographic corrector software in the system. Right this moment, as I write this post, it is being read by some program which directly or indirectly causes the words I misspell to be underlined in red as I type them.

I am looking for a file just like that but containing words from a certain non-English language (one also written in Latin characters).

[WHAT follows can be skipped] It wouldn't be used in combination with orthographic corrector software. Imagine somebody just told me, to my amazement, that there are no words in Spanish with the stress falling on the last syllable and ending in a vowel. If only I had a dictionary file (hence plain ASCII) for that language, I could instantly know if he was right. And I could do many other things with it which I could never do with files in other, more sophisticated formats.
[WHAT follows can be skipped END]
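With such a list, the check itself would only take a few lines. Here is a minimal sketch in Python, not a definitive tool. It assumes a UTF-8 word list with one word per line, and it relies on the Spanish spelling rule that a word of more than one syllable, stressed on the last syllable and ending in a vowel, is written with an accent mark on that vowel. The function names and the sample words are made up for illustration.

```python
# Sketch: pick out the words whose last letter is an accented vowel
# (e.g. "café", "sofá"). Assumes one word per line, UTF-8 text.
import unicodedata

ACCENTED_VOWELS = "áéíóú"


def ends_in_stressed_vowel(word):
    """True if the word's last letter is a vowel carrying an acute accent."""
    # NFC makes "e" + combining accent compare equal to the single letter "é".
    word = unicodedata.normalize("NFC", word.strip().lower())
    return bool(word) and word[-1] in ACCENTED_VOWELS


def find_oxytones(lines):
    """Return the matching words from any iterable of lines."""
    return [w.strip() for w in lines if ends_in_stressed_vowel(w)]


# Hypothetical sample standing in for the real dictionary file.
sample = ["casa", "café", "sofá", "canción", "menú", "árbol"]
print(find_oxytones(sample))
```

To run it on a real list, an open file can be passed instead of `sample`, for example `find_oxytones(open("es.txt", encoding="utf-8"))`, where `es.txt` is a hypothetical file name. Monosyllables with a diacritical accent (such as "sé") would also match, so the output would still need a look by eye.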

It is possible other Linux distros have a file like that, so I would only have to look at Linux distros for non-English speakers. This is a question. And of course it need not have anything to do with Linux. The question in more general or concise terms is: (i) Is it likely that I can get ASCII dictionary files (please observe these are just lists of words belonging to a certain language, let's say like an ordinary dictionary but containing only the entry headwords), in whatever language? (ii) Where should I begin looking?

I hope I was not too technical in this post.

ShadowCat8 08-25-2017 05:26 PM

Greetings,

I'm not sure if it is exactly what you are looking for, but I would check the available dictionary files for whichever language you need from Apache OpenOffice and/or LibreOffice to start and see if those have what you need. I know I have had to manually edit some of those files in the past and it was just a list of words, like you described.

Again, I am not sure if that's what you are looking for, but I think it would be a good place to start.
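If those files turn out to be in the Hunspell `.dic` format (an assumption, I have not checked the Spanish one), the first line is an entry count and each entry may carry affix flags after a slash. Reducing such a file to a bare word list could look roughly like this Python sketch. The function name and the sample entries are made up for illustration, and escaped slashes inside words are not handled.

```python
# Sketch: strip the count line and the "/FLAGS" suffixes from
# Hunspell-style .dic lines, leaving one bare word per entry.
def dic_to_words(lines):
    """Yield bare words from an iterable of .dic lines."""
    for i, line in enumerate(lines):
        entry = line.strip()
        if not entry:
            continue
        if i == 0 and entry.split()[0].isdigit():
            continue  # first line: approximate number of entries
        # Drop tab-separated morphological fields, then the affix flags.
        word = entry.split("\t", 1)[0].split("/", 1)[0].strip()
        if word:
            yield word


# Hypothetical sample standing in for a real .dic file.
sample = ["3", "casa/S", "café", "cantar/RED\tpo:verb"]
print(list(dic_to_words(sample)))
```

Note that the result holds only the stems listed in the file, not every inflected form that the accompanying affix rules would generate.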

HTH. Let us know.

stf92 08-25-2017 06:22 PM

Are you sure you understand what ASCII code is? But first, thanks for your post. I have just read about LibreOffice on Wikipedia and do not think it has anything to do with ASCII files. Maybe I'm wrong.

frankbell 08-25-2017 09:20 PM

You might take a look at aspell. It offers many dictionaries.

stf92 08-25-2017 10:22 PM

Thanks. I visited aspell.net and I'm trying to understand how I can come out with a file containing, say, 300,000 words. It seems one must compile first; I'd never have thought. Are you sure that by downloading one of their packages I can, after compilation, get something like

a
ass
atheneum
beam
....
zeta,

That is, first word, linefeed, second word, linefeed, and so on?

frankbell 08-25-2017 10:35 PM

There's a Slackbuild: https://slackbuilds.org/repository/1...?search=aspell

I haven't tested it.

I have used aspell, though it's been a while and I don't remember the context; as I recall, it worked quite nicely.

gnashley 08-26-2017 02:19 PM

The dictionary files are separate from the source code and available here:
ftp://ftp.gnu.org/gnu/aspell/dict/0index.html
Specifically, for Spanish:
ftp://ftp.gnu.org/gnu/aspell/dict/es...1.11-2.tar.bz2

stf92 08-26-2017 06:28 PM

Quote:

Originally Posted by gnashley (Post 5752248)
The dictionary files are separate from the source code and available here:
ftp://ftp.gnu.org/gnu/aspell/dict/0index.html
Specifically, for Spanish:
ftp://ftp.gnu.org/gnu/aspell/dict/es...1.11-2.tar.bz2

Thanks a lot. I downloaded that very file (Spanish) yesterday, uncompressed it and found no ASCII file at all. Perhaps the ASCII text file I am looking for is created after make is run, but I doubt it very much. Well, this is a question.

gnashley 09-05-2017 02:48 PM

Oh, sorry. But I looked a little bit and found a couple of interesting sites:
Just Words
SCOWL Filter
aspell SCOWL

ShadowCat8 09-26-2017 06:11 PM

Greetings again,

@stf92: Yes, I am very familiar with ASCII, including the differences between the ASCII variants that certain older personal computing systems used. (e.g. What do you do when you are trying to use a document written in Commodore's PET ASCII on a TRS-80 and the code points don't match? Secret Squirrel answer: You write your own program to convert the mismatches. ;-) hehe)

So that we are all on the same page, this is a link to a copy of the ASCII table. And remember that the first part of UTF-8 *is* the ASCII table, at least as far as the first 128 characters go.

Quote:

Originally Posted by UTF-8 article on Wikipedia
"... It was designed for backward compatibility with ASCII. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. The first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single octet with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well...."

And, with that in mind, the first thing I was concerned about when you indicated that you wanted a Spanish dictionary and kept harping on ASCII was whether *all* of the accented and tilde'd characters would be available to you in straight ASCII. They are not: 7-bit ASCII has no accented letters. According to the table above, they are part of the Extended ASCII set, that is, of some 8-bit extension (ISO 8859-1 is a common one). Of course, they are also available in Unicode, so you *should* be able to use a Spanish dictionary file that is UTF-8 encoded without issue. I imagine it might depend on what you are going to use it for.
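To make that concrete, here is a small Python sketch (the helper name `encoded` is made up for illustration) that shows how a plain letter and two Spanish letters come out in 7-bit ASCII, in ISO 8859-1 (Latin-1, one common 8-bit extension) and in UTF-8:

```python
# Sketch: compare the bytes of a few letters in three encodings.
def encoded(ch, encoding):
    """Return the bytes for ch in the given encoding, or None if it has none."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None


for ch in ("n", "ñ", "é"):
    print(ch, encoded(ch, "ascii"), encoded(ch, "latin-1"), encoded(ch, "utf-8"))
```

The plain "n" is the same single byte in all three, while "ñ" and "é" have no ASCII encoding at all, take one byte in Latin-1 and two bytes in UTF-8. So a word list containing them has to be read with the encoding it was actually written in.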

HTH. Let us know.


All times are GMT -5.