LinuxQuestions.org
06-15-2011, 05:54 AM   #1
qrange
Senior Member
 
Registered: Jul 2006
Location: Belgrade, Yugoslavia
Distribution: Debian stable/testing, amd64
Posts: 1,061

Rep: Reputation: 47
malformed utf-8 character


Please help.
I need some software that will check an .xml file and tell me which character is malformed UTF-8.

I am using Perl for some parsing.

Thanks.
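
One way to locate the offending bytes is to attempt a strict UTF-8 decode and read the position from the error. The following is a minimal sketch in Python rather than Perl; the function name `find_malformed_utf8` and the sample bytes are made up for illustration, and the reported offsets are byte offsets into the file, not character counts.

```python
def find_malformed_utf8(data: bytes):
    """Return (byte_offset, bad_bytes, reason) for each malformed sequence."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")  # strict decoding raises on bad input
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice we decoded
            start = pos + err.start
            end = pos + err.end
            problems.append((start, data[start:end], err.reason))
            pos = end  # resume scanning after the offending bytes
    return problems


# Hypothetical sample: the second "café" lost the last byte of its "é".
sample = b"<a>caf\xc3\xa9</a><b>caf\xc3</b>"
for offset, bad, reason in find_malformed_utf8(sample):
    print(f"byte {offset}: {bad!r} ({reason})")
```

For a real file, the bytes could be read with `open(path, "rb").read()` and passed to the same function.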
 
06-16-2011, 09:26 AM   #2
qrange
Senior Member
 
Registered: Jul 2006
Location: Belgrade, Yugoslavia
Distribution: Debian stable/testing, amd64
Posts: 1,061

Original Poster
Rep: Reputation: 47
This has been resolved. If I understood correctly, the problem was that the software and the XML parser truncated the text to a certain number of bytes, and UTF-8 uses more bytes for a single letter (what a waste). The cut landed in the middle of the last character, so that character became 'malformed'.
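
A truncation like this can be avoided by moving the cut back to a character boundary. The sketch below is Python, not the software from this thread; `truncate_utf8` is a hypothetical helper and it assumes the input is valid UTF-8 to begin with.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut data to at most max_bytes without splitting a multi-byte character."""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # Continuation bytes have the bit pattern 10xxxxxx. If the first byte we
    # are about to drop is one of them, the cut is inside a character, so
    # step back until the cut sits on the first byte of a character.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


raw = "naïve café".encode("utf-8")   # 12 bytes; "ï" and "é" take 2 bytes each
naive_cut = raw[:11]                  # ends with half of "é": malformed
safe_cut = truncate_utf8(raw, 11)     # drops the whole "é" instead
print(safe_cut.decode("utf-8"))
```

The naive slice keeps only the first byte of the two-byte "é", which is exactly the kind of malformed final character described above.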
 
06-16-2011, 09:45 AM   #3
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 723
Quote:
Originally Posted by qrange
utf8 uses more bytes for a single letter (what a waste).
It's not a waste.

It uses only one byte for ASCII characters, and adds more bytes (up to four in total) only for characters outside the ASCII range.

If not for encodings like UTF-8 that have variable-length characters, every character would have to be big enough to store the highest possible value. Now that would be a waste.
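
To make the comparison concrete, here is a small Python sketch that prints how many bytes a few characters take in UTF-8 and in a fixed-width encoding (UTF-32). The helper name `encoded_sizes` and the choice of characters are just for illustration.

```python
def encoded_sizes(ch: str):
    """Return (UTF-8 byte count, UTF-32 byte count) for one character."""
    # "utf-32-le" is used so that no byte-order mark is added to the count
    return len(ch.encode("utf-8")), len(ch.encode("utf-32-le"))


for ch in ["A", "é", "€", "😀"]:
    utf8_len, utf32_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UTF-32 {utf32_len} bytes")
```

The ASCII letter takes one byte in UTF-8, while the fixed-width encoding spends four bytes on every character.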

Last edited by MTK358; 06-16-2011 at 09:46 AM.
 
  


All times are GMT -5.
