LinuxQuestions.org
07-29-2019, 12:48 PM   #1
gda
Member
 
Registered: Oct 2015
Posts: 130

Rep: Reputation: 27
Slackware and UTF-8 encoding


Hi all,

I'm not sure this is really related to Slackware... Anyway, I'm having a problem writing simple C code that handles UTF-8 encoded strings properly. As far as I understand, the C library just uses the encoding defined in the environment it runs in. In particular for Slackware the default encoding is ISO-8859-2 and so I suppose this encoding is used.

Indeed the following code:

Code:
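#include <stdio.h>
#include <string.h>
int main(void) {
    char test[40];
    test[0] = 0;                          /* start with an empty string */
    strcat(test, "øèéö");                 /* bytes depend on how this file is saved */
    fprintf(stdout, "test=%s\n", test);
}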
when run in a terminal with ISO-8859-2 encoding, prints:

test=řčéö

This is totally expected, as the characters "ø" and "è" are mapped to different characters in the ISO-8859-2 table ("ř" and "č" respectively).

The problem is that if I add the following line as the first statement of the above code

Code:
setlocale(LC_ALL,"es_US.utf8");
and run the code in a UTF-8 terminal, I get:

test=����

So why is the UTF-8 string not displayed as written in the code? I have the feeling I'm missing something very basic, but I cannot figure out what it is...

Thanks in advance for your help!
 
07-29-2019, 02:29 PM   #2
Labinnah
Member
 
Registered: May 2014
Location: Łódź, Poland
Distribution: Slackware-current
Posts: 185

Rep: Reputation: 112
This is not a UTF-8 string. You wrote it as ISO-8859-2 and it stays in that form. The setlocale function only tells the C library which locale (and encoding) to assume; it doesn't transcode strings to UTF-8. If you want to display a UTF-8 string, you must write it as UTF-8 (in the editor you use). Or you can use the iconv function:
Code:
man 3 iconv
 
07-29-2019, 03:02 PM   #3
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,500

Rep: Reputation: 2374
I can't offer much help, but I can sympathise. I'm in Ireland.

Some smart-alec wrote that "War is God's way of teaching Americans Geography!"
Slackware isn't marvellously aware of locale eccentricities. I'm afraid I can only suggest you try some other locales. The various attempted ones for Ireland (across several programs) are:
en_IE
en_IE.utf8
en_IE@euro
en_IE.iso8859-15
etc. etc.

es_US is a dead duck unless it's in the glibc locales. But there must be others close to it. Don't worry about having the basic charset, and don't forget the right-hand 'Alt' key, or AltGr. I'd start by changing the 'US' bit. I chose the locale that gave me the best compromise for my situation, where I want mathematical keys and áéíóú for Irish words. I also get '…ºª¯ῶ÷€©™¡≤≥¦×' (a multiplication sign distinct from x) and no doubt a few others, but I lose the cent sign, and our € (euro) currency has cents.

Be aware also that you can set each LC_ setting differently. You usually hear of setting LC_ALL, but that sets them all. You can pick and choose, but it's up to you.
 
07-29-2019, 04:50 PM   #4
ehartman
Senior Member
 
Registered: Jul 2007
Location: Delft, The Netherlands
Distribution: Slackware
Posts: 1,674

Rep: Reputation: 888
Quote:
Originally Posted by gda
In particular for Slackware the default encoding is ISO-8859-2 and so I suppose this encoding is used.
No, nowadays the default encoding is UTF-8:
Quote:
# en_US.UTF-8 is the Slackware default locale.
and some lines further
export LANG=en_US.UTF-8

# 'C' is the old Slackware (and UNIX) default, which is 127-bit ASCII
# with a charmap setting of ANSI_X3.4-1968.
#export LANG=C
(from the file /etc/profile.d/lang.sh.new, out of the etc- package).
Note that the 2nd "export" is commented out (as are all other options).

If your system has ISO-8859-2 as a default, you did that yourself, maybe when you installed your system.

Also, the "str*" functions are not magical: when iso-8859 characters go into them, you will get iso-8859 chars printed, independent of your locale.
So to get utf-8 ones: in an editor running in a utf-8 locale, type the characters in the
"strcat(test,"øèéö");"
line, so that those chars are stored as utf-8 (that is: multibyte) and not as iso-8859 (which is a single-byte character set; it has only 256 values, which is why there are multiple iso-8859-* standards). utf-8 characters can be up to 4 bytes long, so it can represent many more characters, but it does mean the same byte values mean different things, as values 128 through 255 are used differently in iso-8859 and in utf-8.
For more info, see
man iso-8859-2 or
man utf-8
 
  

