[SOLVED] UTF-8, not utf-8 or utf8 in locale setting to have SCIM working?
In /etc/profile.d/scim.sh, shipped in the scim package, I see:
Code:
# For SCIM to work, you need to use a UTF-8 locale. Make sure it ends on
# ".UTF-8", not "utf-8"! As an example, you would need to use en_US.UTF-8
# for a US locale (export LANG=en_US.UTF-8), not en_US.
However, "locale -a|grep -i utf" only returns locales ending in .utf8. I understand that utf8 is an alias for UTF-8, but still, I never had an issue setting LANG to fr_FR.utf8, nor did I get a complaint from a Slint user using that form.
I am not a SCIM user myself, so my question is: is it still true that setting LANG to <something>.utf-8 or to <something>.utf8 prevents SCIM from working properly?
I ask because in the Slint installers we use the form <something>.utf8, and I don't want to prevent SCIM from working.
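For an installer that wants to follow the scim.sh advice while still taking locale names from "locale -a" (which lists the .utf8 form), one option is to rewrite the suffix before exporting LANG. The following is a minimal POSIX sh sketch; the function name to_canonical_utf8 is hypothetical and not part of any Slint or Salix tool. It only handles the utf8/utf-8 spellings (in any case), keeps an @modifier if present, and uses only POSIX basic regular expressions in sed.

```shell
#!/bin/sh
# Hypothetical helper (not from Slint or Salix): rewrite a locale name's
# codeset suffix to the ".UTF-8" spelling recommended in
# /etc/profile.d/scim.sh. Uses only POSIX basic regular expressions.
to_canonical_utf8() {
    # Match ".utf8", ".utf-8", ".UTF8", ".Utf-8", ... either at the end of
    # the name or just before an "@modifier", and replace it with ".UTF-8".
    # Names without such a suffix are printed unchanged.
    printf '%s\n' "$1" |
        sed 's/\.[Uu][Tt][Ff]-\{0,1\}8\(@.*\)\{0,1\}$/.UTF-8\1/'
}

# Example: the installers' "fr_FR.utf8" becomes "fr_FR.UTF-8".
to_canonical_utf8 fr_FR.utf8
to_canonical_utf8 ca_ES.utf8@valencia
```

With this, a name such as en_US.UTF-8 passes through untouched, so the function can be applied unconditionally to whatever the user picked.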
Last edited by Didier Spaier; 10-15-2015 at 10:37 AM.
While "locale -a" will show you lowercase ".utf8" suffixes, the commands "locale -m" and "locale charmap" will show you uppercase ".UTF-8".
The LANG, LC_ALL, etc. environment variables need to have uppercase ".UTF-8" in their definitions; at least, that is what all articles claim. However, I have not found definitive proof for that statement. Still, I think it does no harm to stick with the uppercase form.
Yes I have seen such statements like this one in the document you linked to:
Quote:
Please do not write UTF-8 in any documentation text in other ways (such as utf8 or UTF_8), unless of course you refer to a variable name and not the encoding itself.
But I couldn't find any convincing backing for that.
I have downloaded the whole ISO-10646 docs in PDF format (150 megabytes, as that includes the glyphs...) and also looked into the latest Unicode specification (version 8.0.0), and found nothing about the alias.
Also, the POSIX specification doesn't say anything about UTF-8 (or I need better glasses): it only mentions UCS in general terms.
Finally, all I know is that the alias can be used in some programming languages, and I have seen it mentioned in an RFC (I can't remember which at the moment).
Still, I confirm that I haven't had any issue so far (maybe because glibc is lenient?), and I remain curious about what problems, if any, could actually arise in SCIM.
Last edited by Didier Spaier; 10-15-2015 at 01:00 PM.
Why so adamant about going against the advice in the Slackware script? What is there to gain? If things do go wrong because of your use of lowercase .utf-8, people will complain in this forum and not in your mailbox.
My goal is not to go against advice I only discovered today! I am just trying to figure out whether not following that advice (unintentionally) so far could really have hurt a user.
Incidentally, I also discovered today that Salix's localesetup uses the same naming scheme, so I am not alone.
Anyway, I will probably end up checking for myself if no SCIM user posts an answer.
Last edited by Didier Spaier; 10-15-2015 at 02:10 PM.
"locale -a" essentially lists the locale directories, which are indeed named in lowercase without a dash (utf8), as we can see in /usr/lib{,64}/locale. "locale charmap" prints the "correct" name, which is uppercase with a dash (UTF-8).
Quote:
Originally Posted by Didier Spaier
Thanks for your answer Eric.
Yes I have seen such statements like this one in the document you linked to: [...] But I couldn't find any convincing backing for that.
This is a bit different: when the document says "always write it as UTF-8", it is talking about the Unicode encoding (or the Unicode standard, if you like), not the Linux locale.
Code:
% LANG=el_GR.kkk locale > /dev/null
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
% LANG=el_GR.utf locale > /dev/null
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
% LANG=el_GR.utf8 locale > /dev/null
% LANG=el_GR.utf-8 locale > /dev/null
% LANG=el_GR.UTF8 locale > /dev/null
% LANG=el_GR.UTF-8 locale > /dev/null
The correct term to use is .UTF-8, but on Linux (or, more precisely, with glibc) all four spellings above (utf8, utf-8, UTF8 and UTF-8) work, as you can see. For config files, the best practice is to always use the proper term even if others work, because you might reuse the same config on another OS. (I learnt that the hard way some years ago when I copied a config of mine to NetBSD and it took me a long time to find out why it didn't work.)
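Why these spellings are interchangeable can be illustrated by how glibc appears to normalize the codeset part of a locale name before looking it up. From my reading of its internal _nl_normalize_codeset function, letters are lowercased, digits are kept, everything else is dropped, and "iso" is prepended when only digits remain. The shell function below is a hypothetical re-implementation of that rule for illustration only, not glibc code. It is consistent with the session above: .utf8, .utf-8, .UTF8 and .UTF-8 all reduce to utf8, the name used by the locale directories, while .utf and .kkk reduce to names for which no locale is installed.

```shell
#!/bin/sh
# Hypothetical re-implementation (illustration only, not glibc code) of the
# codeset normalization glibc appears to apply to the part after the '.'
# in a locale name: keep only letters and digits, lowercase the letters,
# and prefix "iso" when the result consists of digits only.
normalize_codeset() {
    # LC_ALL=C so that the character classes behave predictably.
    norm=$(printf '%s' "$1" | LC_ALL=C tr -cd '[:alnum:]' |
        LC_ALL=C tr '[:upper:]' '[:lower:]')
    case $norm in
        *[!0-9]*) ;;              # contains a letter: keep as is
        *) norm="iso$norm" ;;     # digits only, e.g. "8859-1" -> "iso88591"
    esac
    printf '%s\n' "$norm"
}

# Show how each spelling from the session above is normalized.
for cs in utf8 utf-8 UTF8 UTF-8 utf kkk 8859-1; do
    printf '%-7s -> %s\n' "$cs" "$(normalize_codeset "$cs")"
done
```

Running the loop maps utf8, utf-8, UTF8 and UTF-8 to utf8, leaves utf and kkk as they are, and turns 8859-1 into iso88591. Other C libraries need not do anything similar, which fits the NetBSD experience described above.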
Thanks for your answer, Imitheos. That seems to confirm my assumption about glibc, although I didn't find anything in the docs about it. I must admit that I didn't dive into the code, where I would have drowned.
I tried SCIM in Slackware-14.1, still with LANG=fr_FR.utf8, and it works. This is not surprising, as /etc/profile.d/scim.sh in Salix-Mate-14.1 was obviously borrowed from Slackware.
I will make a note to reconsider these settings if Slint ever has to migrate to a *BSD...
Meanwhile, I am marking this thread as [SOLVED].
PS: Still, I think you are right that, generally speaking, one should try to make everything as portable as possible.
That was my guideline when writing convtags (see my signature below): strictly following the POSIX specification for sed. For instance, I used only basic regular expressions (although I assume that most, if not all, sed implementations allow extended ones).
Last edited by Didier Spaier; 10-15-2015 at 11:10 PM.
Reason: Typo fix.
I did more testing. It seems that what really counts is that the locale actually uses a UTF-8 encoding, regardless of its name.
For instance, I now have LANG set to fa_IR (locale -a lists no fa_IR.utf8), and as you can see in the three lines below, I can type in Persian, Tamil and Greek:
ُاهس هس حثقسهضد
டொஸ் இஸ் ட்ஃmஇல்
Τηισ ισ Γρεεκ
This also works in xfce4-terminal and kate.
But if I set LANG to fr_FR, it doesn't work everywhere: it works in this online editor and in, e.g., leafpad, geany or kate, but not in terminals such as xfce4-terminal.
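The observation that the actual encoding matters more than the name can be checked directly with "locale charmap", mentioned earlier in the thread. Below is a minimal sh sketch; the function name is_utf8_locale is hypothetical, and it assumes the glibc "locale" utility, which falls back to the C locale (printing warnings on stderr) when the requested locale is not installed. On glibc, a plain fr_FR typically resolves to ISO-8859-1 while fa_IR resolves to UTF-8, which would be consistent with the behavior described above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given locale name resolves to a
# UTF-8 character map, whatever spelling its name uses.
# Assumes the glibc "locale" utility. For an unknown locale it falls back
# to the C locale, whose charmap is not UTF-8, so the check fails; if
# "locale" is missing, the output is empty and the check also fails.
is_utf8_locale() {
    charmap=$(LC_ALL="$1" locale charmap 2>/dev/null)
    [ "$charmap" = "UTF-8" ]
}

for l in fa_IR fr_FR fr_FR.utf8 fr_FR.UTF-8; do
    if is_utf8_locale "$l"; then
        echo "$l: UTF-8"
    else
        echo "$l: not UTF-8 (or not installed)"
    fi
done
```

The results depend on which locales are installed on the machine, so run the loop locally rather than relying on a fixed expected output.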
Last edited by Didier Spaier; 10-16-2015 at 10:48 AM.
Reason: kate mentioned.