LinuxQuestions.org


quiescere 06-09-2009 10:05 AM

updating C code for utf8 strings—help with mbtowc()
 
Hi—

I manage a small community server which hosts a local chat program originally written by one of our users about 13 years ago. The last significant update was December of 2001. I am now wading through the code trying to adapt it to handle UTF-8 locales (and cleaning up other obsolescences in the process). Along the way, I hope to allow extended typographical characters, too.

I am stuck on the isalnum() function. Globalyzer says ANSI does not provide an ismbalnum(), so one must convert a multibyte character to wide using mbtowc() and then use iswalnum():
Code:

#include <stdlib.h>   /* mbtowc(), MB_CUR_MAX */
int mbtowc( wchar_t *pwc, const char *s, size_t n );

#include <wctype.h>   /* iswalnum() */
int iswalnum(wint_t c);

n is, I think, MB_CUR_MAX, defined in stdlib.h. It looks like, for a line of original code like this:
Code:

if (isalnum(testchar)) {
I should instead have something more like:
Code:

wchar_t widetestchar;
if (!mbtowc(&widetestchar, testchar, MB_CUR_MAX)) exit(1);
if (iswalnum(widetestchar)) {

Am I on the right track here? The last time I was programming regularly this sort of thing wasn't really on the radar.

Thanks—
q.

(Oh, and I'm aware that there are plenty of modern chat programs already written that I could install instead. That won't fly here.)

David1357 06-09-2009 12:43 PM

Quote:

Originally Posted by quiescere (Post 3568035)
n is, I think, MB_CUR_MAX, defined in stdlib.h.

n is the maximum number of bytes of s that mbtowc() will examine, so it should be the number of bytes available in s.

You want
Code:

wchar_t widetestchar;
int result;

/* result is the number of bytes consumed, 0 for a null byte, or -1 on error */
result = mbtowc(&widetestchar, testchar, sizeof(testchar));

if (0 == result) {
    fprintf(stderr, "testchar points at a null byte\n");
    exit(1);
}
else if (-1 == result) {
    fprintf(stderr, "mbtowc failed\n");
    exit(1);
}

if (iswalnum(widetestchar)) {

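This will handle
Code:

char testchar[TEST_SIZE];
For
Code:

char testchar;
you would have to pass &testchar instead, and a single char can only hold a one-byte character. If "testchar" is a char pointer, sizeof(testchar) is the size of the pointer, not of the string, so pass the number of bytes remaining in the buffer. Your snippet does not show how "testchar" is declared. Also, it does not look like your syntax correctly handles the values returned by "mbtowc".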
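
Putting the pieces of this thread together, the check could be wrapped up roughly as follows. This is only a sketch under assumptions, not code from the chat program: the names init_ctype_locale(), mb_isalnum() and mb_count_alnum() are made up for illustration, and the input is assumed to be a null-terminated string in the locale's encoding. It handles all three kinds of mbtowc() return value (byte count, 0, and -1) and resets the conversion state after an error.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical startup helper: adopt the locale named by the environment
 * (LANG, LC_ALL, ...). Returns 0 on success, -1 if it cannot be set. */
int init_ctype_locale(void)
{
    return setlocale(LC_CTYPE, "") != NULL ? 0 : -1;
}

/* Hypothetical helper: classify the multibyte character at the start of s,
 * examining at most n bytes.
 * Returns 1 if it is alphanumeric in the current locale, 0 if it is not
 * (this includes the terminating null byte), and -1 if the bytes are not a
 * valid, complete character. On a 0 or 1 return, *len holds the number of
 * bytes consumed (0 for the null byte). */
int mb_isalnum(const char *s, size_t n, int *len)
{
    wchar_t wc;
    int result = mbtowc(&wc, s, n);

    if (result < 0) {
        mbtowc(NULL, NULL, 0);   /* reset the internal conversion state */
        return -1;
    }
    *len = result;
    if (result == 0)
        return 0;                /* s points at a null byte */
    return iswalnum((wint_t)wc) ? 1 : 0;
}

/* Hypothetical helper: count the alphanumeric characters in a
 * null-terminated multibyte string. Returns -1 on an invalid sequence. */
long mb_count_alnum(const char *s)
{
    size_t remaining = strlen(s);
    long count = 0;

    while (remaining > 0) {
        int len = 0;
        int r = mb_isalnum(s, remaining, &len);

        if (r < 0)
            return -1;
        if (len == 0)
            break;               /* defensive: should not happen here */
        count += r;
        s += len;
        remaining -= (size_t)len;
    }
    return count;
}
```

The program would call init_ctype_locale() (or setlocale() directly) once at startup. Without that call a C program runs in the "C" locale, where a multibyte UTF-8 sequence is not decoded as a single character. After that, a line like if (isalnum(testchar)) becomes a call such as mb_isalnum(p, bytes_left, &len) == 1, with p advanced by len bytes rather than by one.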

quiescere 06-10-2009 11:11 AM

Doh!

I must have read right over the description of the return codes half a dozen times yesterday and it completely failed to register.

Thanks so much for your quick and helpful response.

----q.

David1357 06-10-2009 01:25 PM

Quote:

Originally Posted by quiescere (Post 3569344)
I must have read right over the description of the return codes half a dozen times yesterday and it completely failed to register.

It happens to everybody. The lack of consistency of API return codes eventually bites every programmer on the derriere.

Quote:

Originally Posted by quiescere (Post 3569344)
Thanks so much for your quick and helpful response.

You are eminently welcome. Please, pay it forward.


All times are GMT -5.