LinuxQuestions.org
12-22-2008, 11:12 PM   #1
navinkaus
LQ Newbie
 
Registered: Dec 2008
Posts: 14

Rep: Reputation: 0
How to get length of UTF8 string


Is there any API which returns the length of a UTF-8 string?


Thanks,
Navin
 
12-23-2008, 01:45 AM   #2
burschik
Member
 
Registered: Jul 2008
Posts: 159

Rep: Reputation: 31
Are we supposed to guess the language?
 
12-23-2008, 03:14 AM   #3
navinkaus
LQ Newbie
 
Registered: Dec 2008
Posts: 14

Original Poster
Rep: Reputation: 0
My program is in C, and I am looking to get the length of a UTF-8 encoded string.
 
12-23-2008, 03:38 AM   #4
navinkaus
LQ Newbie
 
Registered: Dec 2008
Posts: 14

Original Poster
Rep: Reputation: 0
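I am using the following custom function:

#include <stddef.h>

/* Count UTF-8 characters (code points) by skipping continuation bytes. */
size_t strlen_utf8(const char *s)
{
    size_t i = 0, j = 0;
    while (s[i])
    {
        /* Continuation bytes look like 10xxxxxx; count every other byte. */
        if ((s[i] & 0xc0) != 0x80)
            j++;
        i++;
    }
    return j;
}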

Let's say the string contains a Japanese word of 5 characters, which takes 15 bytes in UTF-8 (3 bytes per character): the function will return 5.
If the string contains 5 ASCII characters (5 bytes), it will also return 5.
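
To make those two cases concrete, here is a minimal sketch of the same counting idea. The test string KONNICHIWA is my own illustration, spelled out as explicit bytes so it does not depend on the source file's encoding. Hiragana take 3 bytes each in UTF-8, so 5 of them occupy 15 bytes. One caveat worth knowing: this approach counts code points, so a letter followed by a combining accent counts as 2, and it assumes the input is valid UTF-8.

```c
#include <stddef.h>

/* Count code points: every byte that is not a continuation byte (10xxxxxx). */
static size_t strlen_utf8(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Illustrative data: five hiragana (U+3053 U+3093 U+306B U+3061 U+306F),
 * written as raw UTF-8 bytes, 3 bytes per character. */
static const char KONNICHIWA[] =
    "\xE3\x81\x93" "\xE3\x82\x93" "\xE3\x81\xAB" "\xE3\x81\xA1" "\xE3\x81\xAF";

/* Illustrative data: 'e' followed by U+0301 (combining acute accent).
 * It displays as one character but consists of two code points. */
static const char E_ACUTE_DECOMPOSED[] = "e" "\xCC\x81";
```

With these definitions, strlen_utf8(KONNICHIWA) and strlen_utf8("hello") both give 5, while strlen_utf8(E_ACUTE_DECOMPOSED) gives 2.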

Does anybody see any problem with this?
 
12-23-2008, 03:42 AM   #5
navinkaus
LQ Newbie
 
Registered: Dec 2008
Posts: 14

Original Poster
Rep: Reputation: 0
One more thing: I googled and read on some website that strlen works on Linux, i.e. that it understands UTF-8 encoded strings.

But that's not true. I wrote a hello-world program and found that strlen counts bytes until it reaches '\0'. So if a 5-character Japanese word takes 15 bytes after encoding, strlen will return 15 instead of 5, the number of printable characters in the word.
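
The reason the byte count and the character count differ is that a UTF-8 character occupies 1 to 4 bytes, and the first byte tells you how many. A rough sketch of that rule follows; the function name utf8_sequence_length is made up for illustration, and it only inspects the lead byte without validating the bytes that follow.

```c
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence that starts with `lead`.
 * Returns 0 if `lead` is a continuation byte or cannot start a sequence.
 * Sketch only: overlong lead bytes (0xC0, 0xC1) and values above 0xF4 are
 * not rejected here. */
static size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)
        return 1;               /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx: e.g. kana and most kanji */
    if ((lead & 0xF8) == 0xF0)
        return 4;               /* 11110xxx: characters beyond U+FFFF */
    return 0;                   /* 10xxxxxx or invalid lead byte */
}
```

strlen knows nothing about this structure: it adds up all of those bytes, which is why it reports 3 for a single kanji such as "\xE6\x97\xA5" (U+65E5).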

Any comments?
 
12-23-2008, 05:11 AM   #6
burschik
Member
 
Registered: Jul 2008
Posts: 159

Rep: Reputation: 31
You should not believe everything you read on random websites. A more reliable source of information on UTF-8 and related issues is http://www.cl.cam.ac.uk/~mgk25/unicode.html. For your particular needs, mbstowcs(3) would seem suitable: http://linux.about.com/library/cmd/blcmdl3_mbstowcs.htm
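
A minimal sketch of how mbstowcs could be used for this, under a few assumptions: the helper names mb_char_count and use_utf8_locale are invented here, the locale names "C.UTF-8" and "en_US.UTF-8" may not exist on every system, and passing NULL as the destination to get only the count is POSIX behaviour rather than something plain ISO C guarantees. mbstowcs decodes according to the current LC_CTYPE locale, so a UTF-8 locale has to be selected first.

```c
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>

/* Try to select a UTF-8 LC_CTYPE locale. Returns 1 on success, 0 otherwise.
 * The candidate names are assumptions about what the system provides. */
static int use_utf8_locale(void)
{
    static const char *const candidates[] = { "C.UTF-8", "en_US.UTF-8" };
    size_t i;

    for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        if (setlocale(LC_CTYPE, candidates[i]) != NULL)
            return 1;
    }
    return 0;
}

/* Number of wide characters the multibyte string `s` converts to in the
 * current locale. Returns (size_t)-1 if `s` contains an invalid sequence.
 * With a NULL destination nothing is stored; only the count is returned. */
static size_t mb_char_count(const char *s)
{
    return mbstowcs(NULL, s, 0);
}
```

Call use_utf8_locale() once at startup and check its return value before relying on mb_char_count(). On Linux wchar_t is 32 bits wide, so the result is a code point count; on platforms with a 16-bit wchar_t, characters beyond U+FFFF may count as two.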
 
  

