LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
04-26-2010, 10:58 AM   #1
tailangong
LQ Newbie

Registered: Apr 2010
Posts: 10

Reputation: 0
Linux not reading non-ASCII characters


Hi all,

Well, I have a web application on a Linux server, and all my Java code is there.
Whenever a user enters non-ASCII characters (e.g. ∞, €, ™) in a text field in my web application and I then check the log of my Java code on the Linux server, it shows weird characters.

Suppose a user enters ∞ in the text field. I should get ∞ in my log too. However, I get weird characters instead.

Any idea? Is this a Linux bug?
 
04-26-2010, 11:32 AM   #2
MTK358
LQ 5k Club

Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Reputation: 723
Quote:
Originally Posted by tailangong
Is this a Linux bug?
This has absolutely nothing to do with the Linux OS.

Are you inputting the data using one character encoding and viewing it with another?
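A minimal sketch of the mismatch MTK358 is asking about, using only the Java standard library: the same string is encoded as UTF-8 and then decoded either as UTF-8 or as ISO-8859-1. The class and method names are illustrative and do not come from the thread; the program prints booleans and lengths rather than the characters themselves, so the result does not depend on the terminal's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {

    // Encode text with one charset, then decode the resulting bytes with another.
    // When writeAs and readAs differ, the text can come back garbled.
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String input = "\u221E"; // the infinity sign, U+221E

        // Same charset on both sides: the character survives unchanged.
        String ok = roundTrip(input, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(input)); // true

        // UTF-8 bytes E2 88 9E read back as ISO-8859-1: one character becomes three.
        String garbled = roundTrip(input, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // 3
    }
}
```

Decoding the three UTF-8 bytes of ∞ as ISO-8859-1 yields U+00E2, U+0088 and U+009E, which is the kind of "weird characters" output described in the first post.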
 
04-26-2010, 12:05 PM   #3
Sergei Steshenko
Senior Member

Registered: May 2005
Posts: 4,481

Reputation: 454
Quote:
Originally Posted by tailangong
...
Any idea? Is this a Linux bug?
If you expect a function in a programming language to read something in a particular way, your expectations should be based on the official documentation of that function.

So, based on which official documentation, and specifically on which part of it, do you expect the characters to be read the way you expect them to be read?
 
04-26-2010, 08:57 PM   #4
tailangong
LQ Newbie

Registered: Apr 2010
Posts: 10

Original Poster
Reputation: 0
Quote:
Originally Posted by MTK358
...
Are you inputting the data using one character encoding and viewing it with another?
Everything entered by the user is passed to my Java program to be inserted into the database. On the HTML side, I use UTF-8 as my character encoding. In Java, I use request.getParameter to get the value entered by the user. I don't think I set any encoding for that.

PS: I'm working from the terminal. Can the terminal capture non-ASCII characters? Sorry, I'm new to Linux.
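For context: in the Servlet API, getParameter decodes form data using the request's character encoding, and when the browser does not declare one, containers typically fall back to ISO-8859-1 (the default in the Servlet specification of that time). The usual remedy is to call request.setCharacterEncoding("UTF-8") before the first getParameter call, often from a filter. The sketch below does not use the Servlet API at all; it only simulates the effect with java.net.URLDecoder on a hypothetical percent-encoded form value, and shows a common after-the-fact repair. It assumes Java 10 or later for the Charset overload of URLDecoder.decode.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class FormDecoding {

    // Hypothetical form value as a browser on a UTF-8 page would send it:
    // the infinity sign, percent-encoded as its three UTF-8 bytes.
    static final String RAW_VALUE = "%E2%88%9E";

    // Roughly what getParameter returns when the container falls back to ISO-8859-1.
    static String decodeAsLatin1(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.ISO_8859_1);
    }

    // Roughly what it returns once the request encoding is UTF-8,
    // e.g. after request.setCharacterEncoding("UTF-8").
    static String decodeAsUtf8(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    // After-the-fact repair: ISO-8859-1 maps each byte to exactly one char,
    // so encoding back to ISO-8859-1 recovers the original bytes, which are
    // then decoded as UTF-8.
    static String repair(String latin1Decoded) {
        byte[] originalBytes = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String wrong = decodeAsLatin1(RAW_VALUE);
        System.out.println(wrong.length());                            // 3
        System.out.println(decodeAsUtf8(RAW_VALUE).equals("\u221E")); // true
        System.out.println(repair(wrong).equals("\u221E"));            // true
    }
}
```

Setting the request encoding up front is generally preferable to the repair step, which only works because ISO-8859-1 decoding is lossless byte for byte.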
 
04-26-2010, 09:11 PM   #5
Sergei Steshenko
Senior Member

Registered: May 2005
Posts: 4,481

Reputation: 454
Quote:
Originally Posted by tailangong
...

PS: I'm working from the terminal. Can the terminal capture non-ASCII characters? Sorry, I'm new to Linux.
It depends on the terminal and on the way it is configured.
 
04-26-2010, 09:22 PM   #6
tailangong
LQ Newbie

Registered: Apr 2010
Posts: 10

Original Poster
Reputation: 0
Hi Sergei,

Do you have any idea how to check this?
What command do I need to run to check those configurations?
 
04-27-2010, 01:50 AM   #7
graemef
Senior Member

Registered: Nov 2005
Location: Hanoi
Distribution: Fedora 13, Ubuntu 10.04
Posts: 2,379

Reputation: 148
There are several environment variables that might give you an idea of how the terminal is set up. Try the following commands from your terminal:

$ echo $LANG
$ echo $LC_ALL
$ echo $LC_CTYPE
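The same check can be made from inside Java, which is useful because it is the JVM, not the shell, that decides how text is encoded when it is printed or written with default-charset I/O. A minimal sketch: it prints the three variables above as the JVM sees them, plus the file.encoding property and Charset.defaultCharset(). On Java 18 and later the default charset is UTF-8 by default regardless of locale (JEP 400), so the last two lines are mainly informative on older JVMs like the ones in this thread.

```java
import java.nio.charset.Charset;

public class LocaleCheck {

    // Format one environment variable, marking unset ones explicitly.
    static String describe(String name, String value) {
        return name + "=" + (value == null ? "(not set)" : value);
    }

    public static void main(String[] args) {
        // The same variables as the echo commands above, as seen by the JVM.
        for (String name : new String[] {"LANG", "LC_ALL", "LC_CTYPE"}) {
            System.out.println(describe(name, System.getenv(name)));
        }
        // What the JVM itself uses for default-charset I/O.
        System.out.println("file.encoding=" + System.getProperty("file.encoding"));
        System.out.println("defaultCharset=" + Charset.defaultCharset());
    }
}
```

When more than one is set, LC_ALL overrides LC_CTYPE, which in turn overrides LANG for character classification and encoding.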
 
04-27-2010, 01:58 AM   #8
tailangong
LQ Newbie

Registered: Apr 2010
Posts: 10

Original Poster
Reputation: 0
I found out that my page was encoded in ISO-8859-1, while echo $LANG on Linux returns en_US.UTF-8.
MTK358 was right about it. However, I tried changing my page to use UTF-8 as its character encoding, but to no avail.
The problem still persists. Why??
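One detail worth checking: none of ∞, € or ™ exists in ISO-8859-1 at all, so a page served as ISO-8859-1 cannot submit them as plain bytes in that charset, and what the browser sends instead is browser-dependent. A small sketch using CharsetEncoder.canEncode to confirm which characters each charset can represent; é is included as a sample that ISO-8859-1 does cover. The class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCoverage {

    // True if every character in text can be encoded in the given charset.
    static boolean representable(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Infinity, Euro, trade mark, and e-acute (the last one is in ISO-8859-1).
        String[] samples = {"\u221E", "\u20AC", "\u2122", "\u00E9"};
        for (String s : samples) {
            System.out.printf("U+%04X  ISO-8859-1: %-5b  UTF-8: %b%n",
                    s.codePointAt(0),
                    representable(s, StandardCharsets.ISO_8859_1),
                    representable(s, StandardCharsets.UTF_8));
        }
    }
}
```

ISO-8859-1 should report false for the first three samples and true for é, while UTF-8 reports true for all four. If the server still decodes the request as ISO-8859-1, switching only the page encoding may not be enough.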
 
04-27-2010, 02:11 AM   #9
graemef
Senior Member

Registered: Nov 2005
Location: Hanoi
Distribution: Fedora 13, Ubuntu 10.04
Posts: 2,379

Reputation: 148
Depending on the whole application, you need to ensure that everything is using the right encoding: HTTP, the Java environment, the database(?).

Grab something like Wireshark and see what is actually being sent over the wire; that should help to narrow the problem down.
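As a sketch of making one layer explicit instead of relying on defaults: the application log can be written with an explicit UTF-8 charset, so the bytes in the file no longer depend on the JVM's default charset. The log location here is a hypothetical temporary file, and the database layer has its own, driver-specific encoding settings that this does not cover. It uses java.nio.file (Java 7+).

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class Utf8Log {

    // Append one line to a log file, always encoded as UTF-8
    // regardless of the JVM's default charset.
    static void appendLine(Path logFile, String line) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(logFile, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(line);
            out.newLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("app", ".log"); // hypothetical log location
        appendLine(log, "user input: \u221E \u20AC \u2122");
        // Read the file back with the same explicit charset.
        System.out.println(Files.readAllLines(log, StandardCharsets.UTF_8).size()); // 1
    }
}
```

Whatever reads the log (a terminal, an editor) then needs to decode it as UTF-8 as well.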
 
04-27-2010, 03:30 AM   #10
tailangong
LQ Newbie

Registered: Apr 2010
Posts: 10

Original Poster
Reputation: 0
Wireshark?? Hmm, I don't have enough time to learn a new tool right now.
I was just hoping to find out how to solve this encoding problem.
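If Wireshark is not an option, a lighter way to see what actually arrived is to log code points instead of the characters themselves, so the log stays readable however the terminal decodes it. A minimal sketch; dump is a hypothetical helper name, not an existing API.

```java
public class CodePointDump {

    // Render each character as U+XXXX so the log shows exactly what Java received,
    // independent of how the terminal or log viewer decodes bytes.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump("\u221E"));            // U+221E
        System.out.println(dump("\u00E2\u0088\u009E")); // U+00E2 U+0088 U+009E
    }
}
```

Logging something like dump(request.getParameter("comment")) (the parameter name is hypothetical) separates the two failure modes: U+221E means the input arrived intact and only the display is wrong, while U+00E2 U+0088 U+009E means the UTF-8 bytes were decoded as ISO-8859-1 on the way in.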
 
04-27-2010, 04:40 AM   #11
tailangong
LQ Newbie

Registered: Apr 2010
Posts: 10

Original Poster
Reputation: 0
For your information, when I execute System.out.println("\u20AC"), I should get € (the Euro sign). But what I get is a bunch of weird characters such as �
Can anyone help?
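On JVMs of that era, System.out encodes text with the JVM's default charset, and the terminal then has to decode those bytes with the same charset. A sketch of taking that default out of the equation by wrapping standard output in a PrintStream with an explicit UTF-8 encoding. The PrintStream constructor taking a Charset needs Java 10+; on older JVMs the overload taking a charset name as a String does the same. Passing -Dfile.encoding=UTF-8 to the JVM is another commonly used option. Either way, the terminal itself must still be set to UTF-8 for the bytes to display as €.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Stdout {

    // Wrap any byte stream in a PrintStream that always encodes text as UTF-8,
    // instead of whatever charset System.out was set up with.
    static PrintStream utf8(OutputStream target) {
        return new PrintStream(target, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Standard output of the process, encoded explicitly as UTF-8.
        PrintStream out = utf8(new FileOutputStream(FileDescriptor.out));
        out.println("\u20AC"); // writes the bytes E2 82 AC plus a line separator
    }
}
```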
 
04-27-2010, 05:02 AM   #12
graemef
Senior Member

Registered: Nov 2005
Location: Hanoi
Distribution: Fedora 13, Ubuntu 10.04
Posts: 2,379

Reputation: 148
When I run the code I get a Euro sign.

What do you get with the following?
Code:
import java.nio.charset.Charset;

// Prints the JVM's default charset, then the Euro sign (U+20AC).
public class UnicodeTest
{
   public static void main(String[] args)
   {
      System.out.println(Charset.defaultCharset());  // e.g. UTF-8
      System.out.println("\u20AC");                  // the Euro sign
   }
}
 
04-27-2010, 05:05 AM   #13
Sergei Steshenko
Senior Member

Registered: May 2005
Posts: 4,481

Reputation: 454
Quote:
Originally Posted by tailangong
Hi Sergei,

Do you have any idea how to check this?
What command do I need to run to check those configurations?
To have an idea, I need to know what kind of terminal you are using. Also, see http://www.linuxquestions.org/questi...2/#post3948904 .
 
04-27-2010, 05:25 AM   #14
tailangong
LQ Newbie

Registered: Apr 2010
Posts: 10

Original Poster
Reputation: 0
Hi graemef,

I got this ---> �


Hi Sergei,

FYI,

LANG=en_US.UTF-8
HOSTTYPE=i386-linux
VENDOR=intel
OSTYPE=linux
MACHTYPE=i386
 
04-27-2010, 05:28 AM   #15
graemef
Senior Member

Registered: Nov 2005
Location: Hanoi
Distribution: Fedora 13, Ubuntu 10.04
Posts: 2,379

Reputation: 148
When I run the same program I get:

Code:
UTF-8
€
Does this mean that your Java implementation doesn't have a defaultCharset?
 
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
using extended ascii characters | sasser | Linux - Newbie | 2 | 04-06-2010 08:50 AM
ASCII characters in my script... | Firebar | Programming | 9 | 10-27-2008 04:59 PM
mouse keys and non-ascii characters | elyk | Slackware | 8 | 12-02-2005 12:46 PM
ascii characters | lakshman | Linux - General | 1 | 03-14-2003 11:28 AM
Deleting non ASCII characters | Thinkgeekness | Linux - Networking | 4 | 03-04-2003 01:29 PM

All times are GMT -5.