How can I implement a whole new writing system?

MarcvsHdr · 08-11-2010, 10:38 AM

So basically, there is a really cool writing system I have been working on. It could be viewed (for simplification purposes) like an encryption method for the Latin script.

Facts about the writing system:

It has a little over 300 symbols.
It is syllable-driven.
It is highly compositional (e.g. "c", "ca", "cae", "ca " and "ci." all map to different symbols - and NOT by overlaying elements)
Symbols have medium graphical complexity (comparable to Korean Hangul, or Japanese Hiragana)
Has a rather complex set of diacritics (~10, some of which can go on any symbol)
Has no ligatures

How transliteration occurs:
Sequences of Latin symbols map to certain symbols. Example below:

[G][rou][p ][hu][g.]

Character sequences between "[" and "]" map to a single symbol (so it would take only 5 symbols to write "Group hug.").
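A minimal sketch of this mapping in Python. It assumes greedy longest-match segmentation, which the post does not confirm as the actual rule. The five-entry TABLE and its Private Use Area code points (U+E000 onward) are hypothetical stand-ins for the real table of about 300 symbols.

```python
# Hypothetical table: Latin sequences -> single code points in the
# Unicode Private Use Area. The real assignments are not given in the post.
TABLE = {
    "G": "\ue000",
    "rou": "\ue001",
    "p ": "\ue002",
    "hu": "\ue003",
    "g.": "\ue004",
}


def transliterate(text, table=TABLE):
    """Replace the longest matching sequence at each position (assumed rule)."""
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            # No entry matches here: leave the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


encoded = transliterate("Group hug.")
print(len(encoded))  # number of symbols needed for "Group hug."
```

With this table, the 10 Latin characters of "Group hug." collapse to 5 code points, one per bracketed group.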

How I want it to work:
I would like to have a daemon that:

Intercepts all text displayed on the screen.
Converts it to my writing system (replaces letter sequences with individual Unicode code points)
Leaves unsupported symbols unchanged.
Displays all the text on the screen using my font and characters intertwined with the fonts and characters left unchanged.

For example, if you take the following line of C++ code:

for (i = 1; i <= n; i++)

I would like it displayed like this:

[fo][r ]([i ]= [1]; [i ]<= [n]; [i]++)

The bracketed symbols should be rendered in my Unicode font, with special symbols defined for this writing system, and the rest should stay in its original font and encoding.

Also, I would like this encoding to hold for display-purposes only. The data in the memory should remain unaffected.
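One way to picture the display-only requirement is to split the text into runs tagged with the font they need, while the source string itself is never modified. The sketch below is an illustration under assumptions: the function names display_runs and bracketed are made up, the set of supported sequences is hypothetical, and greedy longest-match is an assumed segmentation rule.

```python
def display_runs(text, supported):
    """Split text into (slice, uses_custom_font) runs without altering it."""
    max_len = max(len(seq) for seq in supported)
    runs = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in supported:
                runs.append((chunk, True))
                i += length
                break
        else:
            # Unsupported character: merge it into the previous plain run.
            if runs and not runs[-1][1]:
                runs[-1] = (runs[-1][0] + text[i], False)
            else:
                runs.append((text[i], False))
            i += 1
    return runs


def bracketed(runs):
    """Render runs the way the post writes them, with [] around symbols."""
    return "".join(f"[{s}]" if custom else s for s, custom in runs)


source = "for (i = 1; i <= n; i++)"
# Hypothetical subset of the symbol table, enough for this one line.
runs = display_runs(source, {"fo", "r ", "i ", "i", "1", "n"})
print(bracketed(runs))
# Joining the slices gives back the original text, so stored data is untouched.
print("".join(s for s, _ in runs) == source)
```

A renderer would draw the runs flagged True with the custom font and pass the others through as they are.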

This also means real-time adjustments: if I open a text editor (say, from the OpenOffice Suite) and I start typing, I would like to see what I type encoded with my writing system, even though the document actually contains Latin letters. This also means that the symbol immediately before the cursor may change as you type.

Please tell me if there is any way I can do this (and of course, without reinventing the wheel).

Thanks.

ArthurSittler · 08-11-2010, 11:06 PM

MarcvsHdr,
If you are trying to recognize symbol sequences in an input file and emit tokens representing elements in some alternate language, then you are writing a scanner. You may want to examine the manual for the scanner generator program lex, or, in Linux, flex. The usual method of invoking the scanner is to have the routine that needs the tokens call the scanner to get the next token.
I always build in a test driver main() routine, bracketed with a conditional compilation switch, which simply asks for tokens until the scanner returns an end-of-input token.
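The calling pattern described here can be sketched in Python instead of flex and C, purely for brevity. The Scanner class, the token kinds, and the two-entry table are all hypothetical. The `if __name__ == "__main__":` guard plays the role that a conditional compilation switch around main() would play in C.

```python
END_OF_INPUT = ("EOF", "")


class Scanner:
    """Pull-style scanner: the consumer asks for one token at a time."""

    def __init__(self, text, table):
        self.text = text
        self.table = table
        self.pos = 0
        self.max_len = max(len(seq) for seq in table)

    def next_token(self):
        if self.pos >= len(self.text):
            return END_OF_INPUT
        remaining = len(self.text) - self.pos
        for length in range(min(self.max_len, remaining), 0, -1):
            chunk = self.text[self.pos:self.pos + length]
            if chunk in self.table:
                self.pos += length
                return ("SYMBOL", chunk)
        # Nothing in the table matches: hand back one character as-is.
        ch = self.text[self.pos]
        self.pos += 1
        return ("OTHER", ch)


def drain(scanner):
    """Test driver body: request tokens until end of input is reported."""
    tokens = []
    while True:
        token = scanner.next_token()
        tokens.append(token)
        if token == END_OF_INPUT:
            return tokens


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when imported.
    for token in drain(Scanner("hug!", {"hu", "g."})):
        print(token)
```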

ArthurSittler · 08-18-2010, 04:40 PM

In my earlier response, I suggested you might be trying to write a scanner. Depending on how much you may need to process and rearrange the output relative to the input, you might actually be trying to write a translator. In that case, you may also want to study parsers and parser generators, such as yacc, or, in Linux and other open-source environments, Bison.
Flex and Bison are commonly thought of as compiler writing tools. Based on a simple, orderly description of the form of the input, flex and bison generate a program which interprets input and generates output or other actions depending on the interpretation of the input. Looking at this in a slightly broader view, there are few programs which could not fit the model of interpreting input and producing actions based on that input. I think of these compiler writing tools as program generating tools. In some cases, using flex and bison to write your program may be a bit like using a cannon to kill flies. They may produce fluffier code than a more direct approach. Small, efficient code is not always the most important goal in programming. Usually producing an end result with less programming time is more important. I consider readability and maintainability to be more important than any other characteristics of software. In this usual case, having programs automatically create your programs for you can be a better option.
Flex and bison do not do all the work for you, either. You still have to write code to produce the actions of your program. Even so, understanding these tools well enough to use them can make otherwise daunting tasks much easier.

MarcvsHdr · 08-19-2010, 02:12 PM

Thank you. I think a simple finite automaton might do the trick in this case, so I should just have a look at flex.
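As a rough illustration of the finite-automaton idea, the sketch below builds a trie whose nodes act as states and whose marked nodes act as accepting states, then scans with longest-match and falls back to the last accepting state seen. This is a hand-written stand-in, not what flex generates. The table reuses the sequences from the first post ("c", "ca", "cae", "ca ", "ci.") with hypothetical Private Use Area code points.

```python
ACCEPT = ""  # sentinel key; it can never collide with a one-character edge

# Hypothetical code points for the sequences named in the first post.
TABLE = {
    "c": "\ue000",
    "ca": "\ue001",
    "cae": "\ue002",
    "ca ": "\ue003",
    "ci.": "\ue004",
}


def build_trie(table):
    """Each dict is a state; character keys are transitions."""
    root = {}
    for sequence, symbol in table.items():
        node = root
        for ch in sequence:
            node = node.setdefault(ch, {})
        node[ACCEPT] = symbol  # mark this state as accepting
    return root


def transliterate(text, trie):
    out = []
    i = 0
    while i < len(text):
        node = trie
        last_symbol, last_end = None, i
        j = i
        # Follow transitions as far as possible, remembering the most
        # recent accepting state passed on the way.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if ACCEPT in node:
                last_symbol, last_end = node[ACCEPT], j
        if last_symbol is None:
            out.append(text[i])  # unsupported character, left unchanged
            i += 1
        else:
            out.append(last_symbol)
            i = last_end
    return "".join(out)


trie = build_trie(TABLE)
print(transliterate("cae", trie) == TABLE["cae"])
print(transliterate("ci", trie) == TABLE["c"] + "i")
```

The second call shows the fallback: "ci" starts down the path toward "ci." but never reaches its accepting state, so the scanner emits the symbol for "c" and resumes at "i".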

Of course, though, as life always gets ahead of me, I now find myself working on a Neural Network generator, so I had to ditch everything else for about a month.

I'm hoping I will find time for my writing system project soon. Still, generating the writing was the easy part; I'm more worried about how to send the appropriate text from my (let's call it) scanner-proxy to the X server.

A friend of mine also suggested that I look into the Korean language packages, which might provide some solutions, as long as their behavior isn't hard-coded (which I still don't know).