Which function checks whether a character is ASCII or Unicode in C?

murugesan · 01-20-2009, 11:19 PM

According to the following URL
http://www.codersource.net/win32_unicode_ascii.html
the function IsTextUnicode belongs to the Windows VC++ library.
I would like to know which library or function provides this kind of facility.

PEdroArthur_JEdi · 01-21-2009, 12:58 AM

The code reads 80 bytes from the file and tries to determine whether it is UTF-8 encoded... I don't know how to do that, but there is a simple way to check whether the same text is not plain ASCII.

If you look at the ASCII table, you will see that all values fall in the range 0 to 127. So any byte with a value of 0x80 or above cannot be ASCII, and you can do something like this:

Code:

for (i = 0 ; i < 80 ; i++)
	if ((unsigned char)string[i] >= 0x80)
		return NON_ASCII;
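
For reference, the same check can be wrapped in a small function that takes the buffer length instead of assuming 80 bytes. This is only a minimal sketch: the name is_ascii and its return convention are assumptions for illustration, not a standard library function.

```c
#include <stddef.h>

/* Hypothetical helper (not part of any standard library).
   Returns 1 if every byte in buf[0..len) is 7-bit ASCII (0x00-0x7F),
   otherwise returns 0. */
int is_ascii(const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        /* Cast before comparing: plain char may be signed, in which case
           bytes >= 0x80 would otherwise show up as negative values. */
        if ((unsigned char)buf[i] >= 0x80)
            return 0;
    }
    return 1;
}
```

Calling is_ascii(buffer, n) with n set to the number of bytes actually read avoids scanning past the end of a short file.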

Hope this helps...

murugesan · 01-21-2009, 02:36 AM

Hi,

I found the "UTF-8 octet sequence" definition at the following URL:
http://www.faqs.org/rfcs/rfc3629.html

Checking ch & 0xF0:

switch (ch & 0xF0)
{
	case 0xC0: // UTF-8 octet sequence
	case 0xE0:
	case 0xF0:
		printf("unicode");
		break;
	default:
		printf("ascii");
}

Thanks for the reply.

graemef · 01-23-2009, 10:51 PM

I'm not convinced that your code snippet will work. UTF-8 has a number of different byte sequences depending upon the number of bytes required to represent the Unicode character.

1 byte : 0xxxxxxx This is the same as 7-bit ASCII
2 bytes: 110xxxxx Followed by 10xxxxxx
3 bytes: 1110xxxx Followed by 10xxxxxx 10xxxxxx
4 bytes: 11110xxx Followed by 10xxxxxx 10xxxxxx 10xxxxxx
5 bytes: 111110xx Followed by 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
6 bytes: 1111110x Followed by 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

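Here x stands for the bits of the Unicode character in question, and the fixed ones and zeros are required for the encoding to be well formed. Note that the 5- and 6-byte forms come from the older UTF-8 definition (RFC 2279); RFC 3629 restricts UTF-8 to at most 4 bytes.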
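
To make the patterns above concrete, here is a rough sketch of how the lead byte can be classified and the continuation bytes checked. The function names utf8_seq_len and utf8_well_formed are hypothetical, and the sketch assumes the 4-byte limit of RFC 3629, so 5- and 6-byte lead bytes are treated as invalid. It only checks the bit patterns; it does not reject overlong encodings, surrogates, or values above U+10FFFF. It also shows why masking with 0xF0 alone is not enough: a 2-byte lead byte such as 0xD0 gives 0xD0 & 0xF0 == 0xD0, which matches none of 0xC0, 0xE0 or 0xF0.

```c
#include <stddef.h>

/* Hypothetical helper: length of the UTF-8 sequence introduced by lead,
   or 0 if lead cannot start a sequence. */
int utf8_seq_len(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1;  /* 0xxxxxxx: 7-bit ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;  /* 10xxxxxx continuation byte, or a 5/6-byte lead byte */
}

/* Hypothetical helper: returns 1 if buf[0..len) follows the
   lead-byte / continuation-byte pattern, otherwise 0.
   Structural check only (no overlong or range validation). */
int utf8_well_formed(const char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        int n = utf8_seq_len((unsigned char)buf[i]);
        int k;

        /* Invalid lead byte, or sequence truncated by end of buffer. */
        if (n == 0 || (size_t)n > len - i)
            return 0;

        /* Every following byte must look like 10xxxxxx. */
        for (k = 1; k < n; k++) {
            if (((unsigned char)buf[i + k] & 0xC0) != 0x80)
                return 0;
        }
        i += (size_t)n;
    }
    return 1;
}
```

Pure ASCII text also passes utf8_well_formed, since every ASCII byte is a valid 1-byte sequence, so a result of 1 means "consistent with UTF-8", not "contains non-ASCII characters".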