LinuxQuestions.org
Slackware This Forum is for the discussion of Slackware Linux.

Old 04-11-2016, 05:44 PM   #16
ethoms
Member
 
Registered: Nov 2011
Posts: 113

Original Poster
Rep: Reputation: Disabled

Quote:
Originally Posted by Didier Spaier View Post
I'm not sure that will work, as I don't see UTF-32 among the proposed encodings in kwrite or kate. I would try in geany instead.

But how can you paste it if it's not already in a file or a file by itself? Do you have an example of some text where I can see one of these emoticons?
I'll try to post here, but not sure it's supported by LQ:

Code:
$ cat test-emote.txt
☺
<body>❤</body>
 
Old 04-11-2016, 05:45 PM   #17
ethoms
Member
 
Registered: Nov 2011
Posts: 113

Original Poster
Rep: Reputation: Disabled
Looks like it worked. Anyways, to answer my question, yes, the emoticon characters are supported by UTF-8. So maybe I can hack the plugin to use UTF-8 instead.
 
Old 04-11-2016, 06:10 PM   #18
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-14.2.1 on Lenovo Thinkpad W520
Posts: 8,581

Rep: Reputation: Disabled
No worries, all 120,672 characters contained in the Unicode Standard, version 8.0, can be encoded in UTF-8 as well as in the 6 other encoding schemes; see page 41 (77 of the PDF).
 
Old 04-11-2016, 07:18 PM   #19
ethoms
Member
 
Registered: Nov 2011
Posts: 113

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Didier Spaier View Post
No worries, all 120,672 characters contained in the Unicode Standard, version 8.0, can be encoded in UTF-8 as well as in the 6 other encoding schemes; see page 41 (77 of the PDF).
Good to know.

Does anyone know if UCS-2 supports the full UTF-8 spectrum?
 
Old 04-11-2016, 08:48 PM   #20
ethoms
Member
 
Registered: Nov 2011
Posts: 113

Original Poster
Rep: Reputation: Disabled
Well, it's nothing to do with Unicode support in python built with UCS-2. It's possibly a bug in Gajim that only affects users with python built with UCS-2 instead of UCS-4.

The following python code is what breaks it for me:

Code:
if string:
    string = re.sub(gajim.interface.invalid_XML_chars_re, '', string)
return string
and the invalid XML chars are:

Code:
self.invalid_XML_chars = u'[\x00-\x08]|[\x0b-\x0c]|[\x0e-\x1f]|' \
    u'[\ud800-\udfff]|[\ufffe-\uffff]'
So, I should be able to fix it and submit a patch upstream, hopefully.
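To illustrate why the surrogate range in that pattern matters, here is a minimal sketch, not Gajim's actual code. It assumes Python 3, where strings are always stored by code point, and imitates a UCS-2 (narrow) build by splitting the text into UTF-16 code units. The names `as_narrow_build` and `strip_invalid` are made up for this example.

```python
import re

# Same character classes as the pattern quoted above. A raw string is used
# so that the re module itself interprets the \x and \u escapes.
INVALID_XML_CHARS = re.compile(
    r'[\x00-\x08]|[\x0b-\x0c]|[\x0e-\x1f]|'
    r'[\ud800-\udfff]|[\ufffe-\uffff]'
)


def as_narrow_build(text):
    """Hypothetical helper: return text as a string of UTF-16 code units,
    which is roughly how a UCS-2 build of python 2 stores it."""
    data = text.encode('utf-16-be', 'surrogatepass')
    units = [int.from_bytes(data[i:i + 2], 'big')
             for i in range(0, len(data), 2)]
    return ''.join(chr(unit) for unit in units)


def strip_invalid(text):
    """Remove everything the pattern considers invalid in XML."""
    return INVALID_XML_CHARS.sub('', text)


emoticon = '\U0001F600'

# UCS-4 view: one code point, which is outside every class in the pattern.
wide_result = strip_invalid(emoticon)

# UCS-2 view: two surrogate code units, each matched by [\ud800-\udfff].
narrow = as_narrow_build(emoticon)
narrow_result = strip_invalid(narrow)
```

In this sketch the emoticon survives in the UCS-4 view and disappears entirely in the UCS-2 view, which would be consistent with the behaviour described above if that is indeed what Gajim runs into.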
 
Old 04-12-2016, 12:20 AM   #21
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-14.2.1 on Lenovo Thinkpad W520
Posts: 8,581

Rep: Reputation: Disabled
Quote:
Originally Posted by ethoms View Post
Does anyone know if UCS-2 supports the full UTF-8 spectrum?
No, and it should now be considered obsolete. Page 882 (918) of the aforementioned specification:
Quote:
UCS-4. UCS-4 stands for “Universal Character Set coded in 4 octets.” It is now treated simply as a synonym for UTF-32, and is considered the canonical form for representation of characters in 10646.
UCS-2. UCS-2 stands for “Universal Character Set coded in 2 octets” and is also known as “the two-octet BMP form.” It was documented in earlier editions of 10646 as the two-octet (16-bit) encoding consisting only of code positions for plane zero, the Basic Multilingual Plane. This documentation has been removed from ISO/IEC 10646:2011, and the term UCS-2 should now be considered obsolete. It no longer refers to an encoding form in either 10646 or the Unicode Standard.
So UCS-2 can only encode the 65536 code points from U+0000 to U+FFFF (see this article). This excludes all characters with a code point of U+10000 and above, as shown e.g. in this table. For instance, the characters drawn with the following glyphs cannot be encoded in UCS-2: 𐀲 𐂃 𐃌 𐄪 𐅪 𐏋 𐒝 𐦀 𐭻 𑇨 😻 🙎 🚈 🚜 🚿 🚁 🚂 🞐 🡽
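The U+FFFF limit can be checked in a few lines. This is only a sketch for Python 3; the function names `fits_in_ucs2` and `utf16_surrogate_pair` are invented here. The second function shows the arithmetic UTF-16 uses to represent a code point above U+FFFF as two 16-bit units, which UCS-2 has no equivalent for.

```python
def fits_in_ucs2(text):
    """True if every character has a code point of U+FFFF or below."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def utf16_surrogate_pair(ch):
    """Return the (high, low) UTF-16 surrogates for a character
    above U+FFFF."""
    code_point = ord(ch)
    if code_point < 0x10000:
        raise ValueError('character is in the BMP, no surrogates needed')
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


bmp_text = '\u263a\u2764'     # the two characters posted earlier in the thread
astral_text = '\U0001F63B'    # one of the glyphs listed above

bmp_ok = fits_in_ucs2(bmp_text)
astral_ok = fits_in_ucs2(astral_text)
pair = utf16_surrogate_pair('\U0001F600')
```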

Last edited by Didier Spaier; 04-12-2016 at 03:54 AM. Reason: Typo fix.
 
Old 04-12-2016, 12:35 PM   #22
ethoms
Member
 
Registered: Nov 2011
Posts: 113

Original Poster
Rep: Reputation: Disabled
Well, I managed to patch it by taking out some of the invalid_XML_chars. It seems that although Gajim mostly works fine with a UCS-2 python, regex operations on unicode behave differently between UCS-2 and UCS-4. They are not compatible. Anyways, it's fixed for me, what a relief. Confidence in Slackware and python somewhat restored.

What's interesting is that characters at U+10000 and above seem to work. For example the following unicode emoticon works fine in Gajim:

http://unicode.org/cldr/utility/char...%98%80&B1=Show
 
Old 04-12-2016, 12:50 PM   #23
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-14.2.1 on Lenovo Thinkpad W520
Posts: 8,581

Rep: Reputation: Disabled
With U+1F600 saved as "emoticon" with encoding UTF-8 in geany:
Code:
/tmp$ LANG=en iconv -f UTF-8 emoticon -t UCS-2
iconv: illegal input sequence at position 0
/tmp$ LANG=en iconv -f UTF-8 emoticon -t UCS-4
�
/tmp$
Which was expected. (The font I use in xterm has no glyph for this emoticon, but at least the conversion to UCS-4 is OK.)
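The same two conversions can be imitated in Python 3. This is a rough sketch, not a reimplementation of iconv: the function names are made up, big-endian byte order is chosen arbitrarily, and the input is assumed to be just the four UTF-8 bytes of U+1F600 with no trailing newline.

```python
def utf8_to_ucs4(data):
    """Decode UTF-8 bytes and re-encode them as 4 bytes per character."""
    return data.decode('utf-8').encode('utf-32-be')


def utf8_to_ucs2(data):
    """Decode UTF-8 bytes and re-encode them as 2 bytes per character,
    refusing anything above U+FFFF as a UCS-2 converter has to."""
    text = data.decode('utf-8')
    for position, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            raise ValueError(
                'character at position %d is above U+FFFF' % position)
    return text.encode('utf-16-be')


emoticon_bytes = b'\xf0\x9f\x98\x80'   # U+1F600 encoded as UTF-8

ucs4_bytes = utf8_to_ucs4(emoticon_bytes)

try:
    utf8_to_ucs2(emoticon_bytes)
    ucs2_failed = False
except ValueError:
    ucs2_failed = True
```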

Last edited by Didier Spaier; 04-12-2016 at 12:53 PM.
 
Old 04-30-2016, 12:15 PM   #24
ethoms
Member
 
Registered: Nov 2011
Posts: 113

Original Poster
Rep: Reputation: Disabled
Is geany a python app? This issue only affects python. And it's only a problem because the regex patterns for unicode characters/strings are not compatible between UCS-2 and UCS-4 compiled python runtimes.
 
Old 04-30-2016, 12:45 PM   #25
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-14.2.1 on Lenovo Thinkpad W520
Posts: 8,581

Rep: Reputation: Disabled
Quote:
Originally Posted by ethoms View Post
Is geany a python app?
No, it's written in C.
 
Old 10-16-2016, 05:03 PM   #26
eduardr
LQ Newbie
 
Registered: Sep 2011
Posts: 17

Rep: Reputation: Disabled
Moving to a ucs4 default in a future Slackware sounds like a good idea. Just ran into this issue trying to use the pre-built Google tensorflow binaries, which expect ucs4. For now I compiled a new python2 with ucs4 enabled and then recompiled numpy and scipy and then was able to use tensorflow.
 
Old 10-16-2016, 05:15 PM   #27
bassmadrigal
LQ Guru
 
Registered: Nov 2003
Location: West Jordan, UT, USA
Distribution: Slackware
Posts: 5,426

Rep: Reputation: 3206
Quote:
Originally Posted by eduardr View Post
Moving to a ucs4 default in a future Slackware sounds like a good idea.
It's already done in -current. It will be default on the next stable release of Slackware. It was just brought up too late in the development of 14.2 and Pat didn't feel comfortable throwing that change in the mix so close to the release.

Code:
+--------------------------+
Thu Sep 8 21:35:02 UTC 2016
d/python-2.7.12-x86_64-1.txz: Upgraded.
       Compiled using --enable-unicode=ucs4.
       The upstream default for Python Unicode is ucs2, but ucs4 is more widely
       used and recommended now. Any Python scripts or binaries that use UCS-2
       will need to be recompiled. These can be identified with the following
       grep command: grep -r -l PyUnicodeUCS2 /usr 2> /dev/null
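For a quick check from Python itself, the sketch below reports how the running interpreter was built and which PyUnicode symbol prefix a compiled file mentions. The function names are invented for this example, and the byte search is only a crude stand-in for the grep command in the changelog, not a real symbol-table inspection.

```python
import sys


def interpreter_unicode_build():
    """Return 'ucs2' for a narrow build and 'ucs4' for a wide one.
    sys.maxunicode is 0xFFFF on narrow python 2 builds and 0x10FFFF
    on wide builds (and on every Python 3.3 or later)."""
    return 'ucs4' if sys.maxunicode > 0xFFFF else 'ucs2'


def unicode_abi(blob):
    """Return the set of Unicode ABIs whose symbol prefix appears
    in the given bytes."""
    found = set()
    if b'PyUnicodeUCS2_' in blob:
        found.add('ucs2')
    if b'PyUnicodeUCS4_' in blob:
        found.add('ucs4')
    return found


def unicode_abi_of_file(path):
    """Convenience wrapper: apply unicode_abi() to a file's contents."""
    with open(path, 'rb') as handle:
        return unicode_abi(handle.read())


build = interpreter_unicode_build()
example = unicode_abi(b'\x7fELF...PyUnicodeUCS4_DecodeUTF8\x00...')
```

If the set returned for an extension module does not contain the interpreter's own build, the module would be expected to fail at import time with an undefined symbol error.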
 
Old 08-08-2018, 10:03 AM   #28
troqnec
LQ Newbie
 
Registered: May 2013
Posts: 25

Rep: Reputation: Disabled
Hi friends, I'm trying to start the acestream engine, but I have a problem: my python2.7 is built with ucs2 and acestream expects ucs4. When I try to start the engine I get this:
Code:
~# /opt/acestream_3.1.16_ubuntu_16.04_x86_64/start-engine --client-console
xx Cannot load libraries: path /opt/acestream_3.1.16_ubuntu_16.04_x86_64/lib
Traceback (most recent call last):
  File "<entry>", line 9, in <module>
ImportError: /opt/acestream_3.1.16_ubuntu_16.04_x86_64/lib/acestreamengine/Core.so: undefined symbol: PyUnicodeUCS4_DecodeUTF8
I tried with:
Code:
# iconv -f UCS-4 -t UTF-8 Core.so Core1.so
���������������������������������iconv: illegal input sequence at position 40
and with:
Code:
# iconv -f UCS-4 -t UCS-2 Core.so Core1.so
iconv: illegal input sequence at position 0
Can I copy this file to a flash drive, open it on some Ubuntu with a ucs4 python, and save it with some other encoding?
 
Old 08-08-2018, 07:51 PM   #29
Richard Cranium
Senior Member
 
Registered: Apr 2009
Location: Carrollton, Texas
Distribution: Slackware64 14.2
Posts: 3,068

Rep: Reputation: 1453
You might get better answers if you ask this on an Ubuntu forum versus a Slackware one.
 
Old 08-09-2018, 02:18 AM   #30
troqnec
LQ Newbie
 
Registered: May 2013
Posts: 25

Rep: Reputation: Disabled
Hi Richard Cranium, I think this is the right place, because Slackware 14.2 64 bit ships python with ucs2 by default, so it's Slackware users who have problems with this, not Ubuntu users. I don't understand python things; I'm just looking for suggestions on how to start this acestream engine on my Slackware 14.2 64 bit desktop machine.
 
  


All times are GMT -5. The time now is 01:11 PM.
