LinuxQuestions.org
07-20-2010, 08:12 AM #1
gacl
xpdf Won't Play Nice With UTF-8


Hello,

I use Vector Linux 6 (based on Slackware 12.1), and I set the encoding to UTF-8 because I use Spanish characters. The problem is that when I try to open files with accent marks in their names with xpdf, they all look garbled. How can I get xpdf to display the names correctly? Thanks.

Gus
 
07-23-2010, 04:28 PM #2
selfprogrammed
Describe garbled.

The usual case is that substitute glyphs appear in place of the wanted characters, which would mean that the font xpdf is using does not contain the Spanish characters. It is possible to have Spanish fonts on your system and yet have programs that do not know enough to use them.
I think many PDFs embed their own fonts, so it may be the fault of the PDF file.
 
07-23-2010, 05:19 PM #3
sneakyimp
I think most Spanish characters are available in the Latin-1 character set. I also think most accented Latin characters take two bytes in UTF-8 encoding but only one byte in Latin-1 (plain ASCII does not contain them at all). It may be that xpdf doesn't understand UTF-8 encoded characters. xpdf may have a config-file setting or preference you can change, or maybe there's a flag you can set somewhere to use UTF-8 encoding. If not, you should probably set your encoding to something xpdf understands.
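The byte counts above are easy to check with nothing but Python's built-in codecs. This minimal sketch encodes a single "é" (U+00E9) in Latin-1, UTF-8 and ASCII and reports the bytes for each:

```python
ch = "\u00e9"  # é, LATIN SMALL LETTER E WITH ACUTE

latin1 = ch.encode("latin-1")  # b'\xe9': one byte
utf8 = ch.encode("utf-8")      # b'\xc3\xa9': two bytes

print(f"U+{ord(ch):04X}")
print("Latin-1:", latin1.hex(" "), f"({len(latin1)} byte)")
print("UTF-8:  ", utf8.hex(" "), f"({len(utf8)} bytes)")

# ASCII only covers U+0000..U+007F, so accented letters cannot be encoded at all.
try:
    ch.encode("ascii")
except UnicodeEncodeError:
    print("ASCII:   not representable")
```

The same check works for any other character by changing `ch`.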
 
07-24-2010, 08:08 AM #4
gacl
The contents of the files are displayed correctly, but not the names. For instance, a file named résumé is displayed as rĩsumĩ in the Open box and in the title bar. I've played around with the .xpdfrc file without success.
 
07-24-2010, 08:48 AM #5
sneakyimp
Quote (originally posted by gacl):
The content of the files is displayed correctly but not the names. For instance, a file named résumé will be displayed as rĩsumĩ in the open box and in the title bar. I've played around with the .xpdfrc file without success.
It sounds to me like xpdf can play nice with UTF-8 but your file system (or whatever system translates a filename into something that appears on your screen) might not. "é" is a two-byte character when UTF-8 encoded but a one-byte character in Latin-1, so when a system that doesn't understand UTF-8 looks at the filename, it thinks each é is two other characters.
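The effect described above can be reproduced in a few lines of Python: take the UTF-8 bytes of a filename and decode them as Latin-1, the way a program that assumes one byte per character would. This is only an illustration; the exact replacement characters depend on which single-byte table the program assumes, so it will not necessarily match the rĩsumĩ seen in xpdf.

```python
name = "résumé"

raw = name.encode("utf-8")       # the bytes actually stored in the directory entry
wrong = raw.decode("latin-1")    # one byte -> one character: each é becomes two characters
right = raw.decode("utf-8")      # a UTF-8-aware program recovers the original name

print(raw)    # b'r\xc3\xa9sum\xc3\xa9'
print(wrong)  # rÃ©sumÃ©
print(right)  # résumé

# The damage is reversible as long as no bytes were lost along the way.
assert wrong.encode("latin-1").decode("utf-8") == name
```

Note that the garbled version is eight characters long while the real name is six: every accented letter turns into two visible characters.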
 
07-24-2010, 01:58 PM #6
selfprogrammed
The sensors program prints a degree symbol. On a text console it prints some garbage character, but in an X terminal it prints a degree sign.
Man pages have had some strange characters in them (for years) that do not print on consoles.

XPDF displays file names (in the Open selection box) using its own window and font.
For filenames, it is likely choosing one of the default fonts set up by the KDE system controls.
KDE has a control setup program in the main KDE menu.
Hunt down the KDE display properties and the fonts that are set up.
There are different fonts for different uses and in various sizes.
Make sure KDE is set up with fonts that have all the characters you need.
If you use GNOME or any other window manager, the same applies.

Still cannot find it?
Write down the settings for all the KDE fonts.
Set them all to some weird, easily recognizable font, a different one for each.
See if any of the weird fonts show up in XPDF.

Still cannot find it?
Look at the XPDF font closely and write down its characteristics:
serif or sans-serif, how the 'm', the 'g', and the 'j' are drawn,
the 'ae' spacing, and what glyph it displays for the special Spanish characters.
Use the font selector program and go through all the fonts looking for an identical one.
If you find one, disable it.
When XPDF is forced to use a different font, you have found it.

If you have disabled all the candidates and still cannot change XPDF's font, then it must be using an internal font. Some programs do that, but it would be very strange for an X Window program. Such fonts cannot be fixed except by getting an updated program.

Get a copy of the XPDF source. Many distributions provide it.
Go into the source and find the font used to display filenames.
Fixing it depends on your programming skills and how deeply the font is built in, and you may find something entirely different.

Try a different PDF viewer; there is more than one.

Addendum: I ran strings on the XPDF binary last night and did NOT see any font names, but did see font function calls.
I looked at the KDE fonts, and the file-listing font looks like half of them. I was not motivated enough to mess up my own fonts trying to find out which one was being used.
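For anyone who wants to repeat the strings experiment without binutils, here is a rough Python equivalent. It collects runs of four or more printable ASCII characters (the same default minimum length as `strings`) and keeps the ones that mention "font". The default path /usr/bin/xpdf is an assumption; pass the real location as the first argument.

```python
import os
import re
import sys

def printable_strings(data: bytes, min_len: int = 4):
    """Yield runs of printable ASCII at least min_len long, roughly like `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")

def font_related(path: str):
    """Return the printable strings in a binary that mention 'font' (case-insensitive)."""
    with open(path, "rb") as f:
        data = f.read()
    return [s for s in printable_strings(data) if "font" in s.lower()]

if __name__ == "__main__":
    # Assumed install location; pass the real path as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/usr/bin/xpdf"
    if os.path.isfile(target):
        for s in font_related(target):
            print(s)
    else:
        print(f"{target} not found")
```

As with `strings`, an empty result only shows that no font name is stored as plain text; the program can still pick its fonts at run time.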

There is a KDE tool to look at all the Unicode characters. Check the Spanish characters and see whether they have one- or two-byte encodings.
I think all of ASCII and the Latin extensions to it encode as one byte, and that UTF-8 only goes to two bytes (or more) for the extension pages used by
the Eastern, African, Asian, Arabic, and other non-Latin languages (and Klingon).
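For reference, the length of a character in UTF-8 depends only on its code point (RFC 3629): only ASCII (U+0000 to U+007F) is a single byte, and everything from U+0080 to U+07FF takes two bytes. That range includes all the accented Spanish letters in the Latin-1 block, which is what post #7 later observed in the character map. A small sketch that computes the length from the code point and cross-checks it against Python's own encoder:

```python
def utf8_length(code_point: int) -> int:
    """Bytes needed to encode a code point in UTF-8 (ranges from RFC 3629)."""
    if code_point < 0:
        raise ValueError("negative code point")
    if code_point <= 0x7F:
        return 1   # ASCII
    if code_point <= 0x7FF:
        return 2   # Latin-1 Supplement, Latin Extended, Greek, Cyrillic, Hebrew, Arabic...
    if code_point <= 0xFFFF:
        return 3   # rest of the Basic Multilingual Plane (e.g. CJK, the euro sign)
    if code_point <= 0x10FFFF:
        return 4   # supplementary planes
    raise ValueError("beyond U+10FFFF")

for ch in "Aáéíóúñü¿¡€":
    n = utf8_length(ord(ch))
    # Cross-check against Python's built-in UTF-8 encoder.
    assert n == len(ch.encode("utf-8"))
    print(f"{ch}  U+{ord(ch):04X}  {n} byte(s)")
```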

Last edited by selfprogrammed; 07-25-2010 at 03:24 PM.
 
07-26-2010, 11:54 AM #7
selfprogrammed
I checked the KDE character map last night. I searched for some Spanish characters and found four. I was surprised to see that it gives their UTF-8 encoding as two bytes, with one having three bytes.
They have UTF-16 values that are well under 256.
Having written a document on UTF-8 encoding, from what I remember there are multiple ways to encode a particular character in UTF-8, with one being considered canonical. Unfortunately, some systems differ on which form is canonical.
It does not matter: the creator of the filenames decided which UTF-8 encoding was used, and you are seeing one glyph or two glyphs.
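Strictly speaking, UTF-8 (RFC 3629) allows only one byte sequence per code point; longer "overlong" forms are invalid. The "multiple ways" that do show up in practice come from Unicode normalization: é can be stored as the single code point U+00E9 (NFC) or as a plain e followed by U+0301 COMBINING ACUTE ACCENT (NFD). The decomposed form is three bytes in UTF-8, which might explain the three-byte result mentioned above, though that is only a guess. A small sketch using Python's standard `unicodedata` module:

```python
import unicodedata

composed = unicodedata.normalize("NFC", "\u00e9")    # one code point: U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")  # 'e' + U+0301 COMBINING ACUTE ACCENT

for label, s in (("NFC", composed), ("NFD", decomposed)):
    code_points = " ".join(f"U+{ord(c):04X}" for c in s)
    utf8 = s.encode("utf-8")
    print(f"{label}: {code_points:<14} UTF-8: {utf8.hex(' ')} ({len(utf8)} bytes)")

# Both render as "é", but they are different strings (and different filenames)
# until one of them is normalized to the other's form.
print(composed == decomposed)                                 # False
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
```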

My character map showed the Spanish character glyphs, so my default fonts
in Slackware Linux (kernel 2.6.33) have those characters.

Going through the XPDF docs (/usr/docs/xpdf-*), I found a long changelog file with many Unicode-related entries. They were very systematic about their use of Unicode and knowledgeable about it. But it is possible that they did not consider Unicode in filenames.

I am a little suspicious of that filename list. It looks as if they might be using some tool (from KDE or GTK) for the open-file dialog; many KDE tools display similar open-file boxes.
This does not help you much, but it points out that it might not be XPDF itself that is messing up the filenames.

If you post some of the bad filenames, we could experiment with them too.
But we are unlikely to solve this without examining the XPDF source code.
A bug report to the XPDF support team might be in order, because they would know how the filenames get displayed. See their documentation in /usr/docs for contact info.
 
07-30-2010, 03:23 PM #8
gacl
Sneakyimp, I think the filesystem is OK because Thunar displays the characters in question just fine.
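One way to confirm that the names on disk are fine, independently of any file manager, is to list a directory's entries as raw bytes and check whether each one is valid UTF-8. If they all are (as Thunar's correct display suggests), the garbling happens on the display side. This is an illustrative sketch; the script and its default directory (the current one) are assumptions.

```python
import os
import sys

def check_names(raw_names):
    """Split raw (bytes) filenames into valid UTF-8 names and undecodable ones."""
    ok, bad = [], []
    for raw in raw_names:
        try:
            ok.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            bad.append(raw)
    return ok, bad

if __name__ == "__main__":
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    # Passing bytes to os.listdir returns the names exactly as stored, undecoded.
    ok, bad = check_names(os.listdir(os.fsencode(directory)))
    for name in ok:
        print("UTF-8 OK: ", name)
    for raw in bad:
        print("not UTF-8:", raw)
```

A name written in Latin-1, such as `b"r\xe9sum\xe9.pdf"`, ends up in the second list, while the UTF-8 form `b"r\xc3\xa9sum\xc3\xa9.pdf"` decodes cleanly.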

Selfprogrammed, I do use Evince, but XPDF is much lighter and faster. VectorLinux uses Xfce. When I go to "Keyboard Preferences", I can easily type accented characters in the test area.

It seems this may be a known bug (link: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=422346).

Thanks.


Gus
 
07-31-2010, 03:13 PM #9
selfprogrammed
I read the bug report. The substitute characters with a tilde (~) must be part of the Latin extensions applied to many character sets, which means that the glyphs are in the font, but the program is using the wrong encoding to reach them.
It looks like it is decoding UTF-8 using an IBM code page, giving two characters per letter because it is not decoding the two bytes as UTF-8.
I could probably track down which IBM code page it is using, but that would not
be of much use. Changing your locale would not help either; the program is missing the UTF-8 decoding.
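Tracking down the code page, as mentioned above, can be sketched by decoding the UTF-8 bytes of a known filename with several candidate single-byte encodings and comparing each result with what the program shows. The candidate list is an assumption (extend it as needed), and, as noted above, identifying the code page would only confirm the diagnosis, not fix xpdf.

```python
# Hypothetical list of single-byte encodings to try; extend as needed.
CANDIDATES = ["latin_1", "cp1252", "cp437", "cp850", "iso8859_4", "iso8859_13"]

def decode_as(name, candidates=CANDIDATES):
    """Show what the UTF-8 bytes of `name` look like under each single-byte encoding."""
    raw = name.encode("utf-8")
    # errors="replace" covers byte values a code page leaves undefined.
    return {enc: raw.decode(enc, errors="replace") for enc in candidates}

def matching_encodings(name, seen, candidates=CANDIDATES):
    """Return the candidate encodings that reproduce the garbled text exactly."""
    return [enc for enc, text in decode_as(name, candidates).items() if text == seen]

if __name__ == "__main__":
    for enc, text in decode_as("résumé").items():
        print(f"{enc:<11} {text}")
    print(matching_encodings("résumé", "rÃ©sumÃ©"))
```

If no candidate reproduces the garbled name exactly, the display path is probably doing something other than a plain one-byte-per-character decode.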

I don't think I can be of much more use, and it looks like a job for the xpdf dev team. Sorry. Disconnecting from thread.

Last edited by selfprogrammed; 07-31-2010 at 03:14 PM.
 
07-31-2010, 06:07 PM #10
gacl
Thank you anyway.


Gus
 
  


All times are GMT -5.
