LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

05-18-2012, 04:09 PM   #1
k_kush
LQ Newbie
 
Registered: May 2012
Posts: 3

Rep: Reputation: Disabled
Issue with Spanish Characters


Hi,

So this application loads these files onto a Linux server. These files have Spanish characters in the name, e.g. Zimbrão.doc

But when it gets loaded on to the server the Spanish characters get changed to ASCII characters: Zimbrão.doc. And once they retrieve these files from the server, they come with the modified names.

How can that be fixed? It looks like the server is unable to understand the character.

Any help will be appreciated.

Thanks.
 
05-18-2012, 04:35 PM   #2
Doc CPU
Senior Member
 
Registered: Jun 2011
Location: Stuttgart, Germany
Distribution: Mint, Debian, Gentoo, Win 2k/XP
Posts: 1,099

Rep: Reputation: 344
Hi there,

Quote:
Originally Posted by k_kush
So this application loads these files onto a Linux Server.
What application? How are the files transferred, and using what protocol?

Quote:
Originally Posted by k_kush
These files have Spanish characters in the name, e.g. Zimbrão.doc
I guess you're using UTF-8 encoding to represent these characters, right?

Quote:
Originally Posted by k_kush
But when it gets loaded on to the server the Spanish characters get changed to ASCII characters: Zimbrão.doc.
That's not ASCII. ASCII doesn't contain these characters; it's a 7-bit code.

Quote:
Originally Posted by k_kush
And once they retrieve these files from the server they come with the modified names.
You have a problem with character encoding, that much is sure. But to tell you how to resolve the problem, more background knowledge is required.
Are we talking about a web application that exchanges data using HTTP? Then possibly the server side of the application just fails to specify the correct encoding in the HTTP response headers.

Quote:
Originally Posted by k_kush
it looks like the server is unable to understand the character.
No, probably the server doesn't have to understand the character; it just doesn't tell the client correctly how to interpret the characters.
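To illustrate what "telling the client" means: in HTTP the encoding normally travels as the charset parameter of the Content-Type header. The following is only a minimal sketch in Python for inspecting such a header value. The helper name declared_charset is made up for this example, and the header strings are invented samples, not taken from your application.

```python
from email.message import Message


def declared_charset(content_type, default=None):
    """Return the charset parameter of a Content-Type value (lower-cased).

    Hypothetical helper for illustration; returns `default` when the
    header does not declare any charset at all.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(failobj=default)


if __name__ == "__main__":
    # A response that declares its encoding ...
    print(declared_charset("text/html; charset=UTF-8"))
    # ... and one that leaves the client guessing.
    print(declared_charset("text/html"))
```

If the second case is what your application sends, the client has to guess the encoding, and a wrong guess produces exactly the kind of garbled names you describe.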

[X] Doc CPU
 
05-18-2012, 05:36 PM   #3
k_kush
LQ Newbie
 
Registered: May 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
Thank you, Doc. I will try to answer most of your questions to give you a better picture.

I am new to this company. This is an application which my company uses. I am not too sure about the protocol.

How can I identify whether I am using the UTF-8 encoding? I tried to recreate the scenario by creating a file on another Linux server:

touch désolé
this is the result - d?sol?

Yes, that much I know: it is a web application that exchanges data using HTTP. When you say server side of the application, does that mean the changes have to be done in the app code, or are they required at the server level?

OK, so the server doesn't have to understand the character, but how do I make it interpret the characters?
 
05-19-2012, 04:43 AM   #4
Doc CPU
Senior Member
 
Registered: Jun 2011
Location: Stuttgart, Germany
Distribution: Mint, Debian, Gentoo, Win 2k/XP
Posts: 1,099

Rep: Reputation: 344
Hi there,

please allow me to put your post in a different order. The conclusion is at the end. ;-)

Quote:
Originally Posted by k_kush
Yes, that much I know: it is a web application that exchanges data using HTTP. When you say server side of the application, does that mean the changes have to be done in the app code, or are they required at the server level?
Well, typical web applications consist of some HTML and some program code on the server side, the program parts often written in PHP or ASP.NET, more rarely in Java. This program code produces HTML output that is sent to the client, so that you can work with the web application in a normal web browser.
When I speak about changes to the server side, I mean changes to this PHP or ASP or Java code, whatever it actually is.

Quote:
Originally Posted by k_kush
This is an application which my company uses. I am not too sure about the protocol.
If it's a web application in the usual meaning, it uses HTTP, so that it works with a plain browser.

Quote:
Originally Posted by k_kush
Ok so the server doesn't have to understand the character, but how to make it interpret the characters?
touch désolé
this is the result - d?sol?
To answer that, let me go back a bit further.

[BEGIN: Character encoding basics]
Text can be stored in many different ways; the characters that make up the text can be coded in different ways.
One of the simplest and oldest encodings is ASCII: it uses one byte per character, with the highest bit unused, so it can represent 128 different characters. 33 of them are control characters (like line feed or escape), which leaves 95 printable characters including the space. These are the 26 letters of the basic English alphabet in upper and lower case, the digits 0..9, and a few essential punctuation marks. That's it. No diacritics, no umlauts, no Greek or Cyrillic letters.
In the 1980s, vendors began to use the previously unused topmost bit and could now represent 256 different characters with a single byte. They still kept the lower half as it was defined by ASCII, but the additional 128 characters were a mess for many years, because they were assigned differently on almost every computer or software.
From the late 1980s on, several specifications came up to standardize the whole set of 256 characters. However, people from different parts of the world, speaking many different languages, each found that a different set of characters was necessary. That's why there are different 8-bit encodings. Today, the most important one, at least in the Western world, is the ISO-8859-x family, with the -x denoting the variants. The lower half is ASCII in all of them, and the Latin variants share many of the upper characters; the variants for other scripts, such as Cyrillic or Greek, do not.
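To make the point about the 8-bit variants concrete, here is a small Python sketch that decodes the same single byte under three ISO-8859 variants. The byte value 0xE4 and the helper name decode_in are just my choices for illustration.

```python
def decode_in(raw, encodings):
    """Decode the same bytes under several encodings (illustrative helper)."""
    return {name: raw.decode(name) for name in encodings}


VARIANTS = ("iso-8859-1", "iso-8859-5", "iso-8859-7")

if __name__ == "__main__":
    # One byte from the upper half: a different letter in each variant.
    for name, text in decode_in(bytes([0xE4]), VARIANTS).items():
        print(name, text)

    # The lower half (the ASCII range) decodes identically everywhere.
    lower = decode_in(bytes(range(128)), VARIANTS)
    print(len(set(lower.values())) == 1)
```

The byte is a Latin, a Cyrillic, or a Greek letter depending on which variant the reader assumes, which is why the encoding always has to be known from somewhere else.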

But still, the ISO-8859-x series didn't contain the full set of characters that were used even within Europe, let alone Asia. Hundreds of other characters were actually needed. As a solution to the Babylonian mix of character sets and encodings that had prevailed so far, experts all around the world created the Unicode character set, which contains a standardized (and still growing) set of many thousands of characters. It's obvious that they cannot be expressed with a single byte any longer; however, the guys made up a clever encoding as a compromise. They invented UTF-8 as one of the standard encodings for Unicode.

UTF-8 represents a character with a variable number of bytes. The 128 ASCII characters are stored as single bytes, as they always were. All characters beyond ASCII need 2, 3 or even 4 bytes. The letter 'ä' for example (used in German) is stored as a two byte sequence 0xC3, 0xA4.
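The variable length is easy to see for yourself. A minimal Python sketch, with sample characters that are only my picks to cover the 1, 2, 3 and 4 byte cases:

```python
def utf8_hex(ch):
    """Return the UTF-8 bytes of a character as a list of hex strings."""
    return [f"0x{b:02X}" for b in ch.encode("utf-8")]


SAMPLES = ("a", "\u00e4", "\u20ac", "\U0001F600")  # a, ä, €, an emoji

if __name__ == "__main__":
    for ch in SAMPLES:
        encoded = utf8_hex(ch)
        print(repr(ch), len(encoded), "byte(s):", ", ".join(encoded))
```

For 'ä' this prints the two byte sequence 0xC3, 0xA4 mentioned above, while the plain 'a' stays a single byte, exactly as in ASCII.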

However, if a program has to display text containing this character, it has to know that this sequence is supposed to be one UTF-8 character. If it doesn't, and assumes a traditional 8-bit encoding like ISO-8859-1, it wouldn't display "Mädchen" [German "girl"] as intended, but instead "Mädchen". It would take the two-byte sequence as two separate characters. A reader who's familiar with the language can still guess what it means, but it looks like garbage. Like the example in your first post.
On the other hand, there are byte sequences that are not valid as a UTF-8 code. If a program expects UTF-8 code and encounters such an invalid sequence, it displays a replacement character, usually a question mark. Like your example above.
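Both effects can be reproduced in a few lines of Python. This is only a sketch: misread and repair are made-up helper names, and I'm assuming ISO-8859-1 as the "wrong" 8-bit encoding; your server may use a different one.

```python
def misread(text, actual="utf-8", assumed="iso-8859-1"):
    """Encode text as `actual`, then decode it as if it were `assumed`.

    Invalid sequences become U+FFFD, the Unicode replacement character,
    which many programs show as a question mark.
    """
    return text.encode(actual).decode(assumed, errors="replace")


def repair(garbled, actual="utf-8", assumed="iso-8859-1"):
    """Undo misread(); only works if no bytes were lost on the way."""
    return garbled.encode(assumed).decode(actual)


if __name__ == "__main__":
    # UTF-8 bytes read as ISO-8859-1: every 2-byte character turns into two.
    print(misread("M\u00e4dchen"))
    print(misread("Zimbr\u00e3o.doc"))
    # ISO-8859-1 bytes read as UTF-8: invalid sequences get replaced.
    print(misread("d\u00e9sol\u00e9", actual="iso-8859-1", assumed="utf-8"))
    # The first kind of damage is reversible.
    print(repair(misread("Zimbr\u00e3o.doc")))
```

The second line of output is the garbled file name from your first post, and the third shows replacement characters where your touch example showed question marks.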
[END: Character encoding basics]

Quote:
Originally Posted by k_kush
How to identify if I am using the UTF-8 encoding?
If that web application that you're using is an established standard in your company, you won't want to change anything about it. Or maybe you're not even allowed.
Instead, you have to know what encoding is supposed to be used with this application - there should be people who know, or documentation that tells you about this. Then you have to adapt your own tools to it. Probably your browser, probably your text editor. If all programs involved in the process use the same encoding, there's no problem, even if one might "forget" to tell the others about it.
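If you want to look at the server yourself first, a rough Python sketch like the one below shows which encoding the environment assumes and whether the stored file names are valid UTF-8 at all. The helper names are invented for this example, and the check has limits: a pure ASCII name is valid in every encoding discussed here, and a name that was already garbled and then stored as UTF-8 also passes.

```python
import locale
import os
import sys


def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_names(directory=b"."):
    """Yield (raw_name, is_valid_utf8) for each entry in a directory.

    Passing the directory as bytes makes os.listdir() return the raw,
    undecoded file names as they are stored on disk.
    """
    for raw_name in os.listdir(directory):
        yield raw_name, is_valid_utf8(raw_name)


if __name__ == "__main__":
    print("preferred encoding:", locale.getpreferredencoding(False))
    print("file system encoding:", sys.getfilesystemencoding())
    for raw_name, ok in check_names():
        print(raw_name, "valid UTF-8" if ok else "NOT valid UTF-8")
```

Run it in the directory where the application stores its uploads. If the names there are fine but come back garbled in the browser, the problem is in what the application tells the client, not in the storage.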

[X] Doc CPU
 
  



All times are GMT -5.
