LinuxQuestions.org
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

06-09-2009, 10:05 AM   #1
quiescere
Member
 
Registered: Sep 2003
Distribution: Slackware64 13.1
Posts: 52

Rep: Reputation: 15
Updating C code for UTF-8 strings: help with mbtowc()


Hi,

I manage a small community server that hosts a local chat program originally written by one of our users about 13 years ago; its last significant update was in December 2001. I am now wading through the code, adapting it to handle UTF-8 locales and cleaning up other obsolescences along the way. I also hope to allow extended typographical characters.

I am stuck on the isalnum() function. Globalyzer says ANSI C does not provide an ismbalnum(), so one must convert a multibyte character to a wide character with mbtowc() and then test it with iswalnum():
Code:
#include <stdlib.h>
int mbtowc( wchar_t *pwc, const char *s, size_t n );

#include <wctype.h>
int iswalnum(wint_t c);
n is, I think, MB_CUR_MAX, defined in stdlib.h. It looks like, for a line of original code like this:
Code:
if (isalnum(testchar)) {
I should instead have something more like:
Code:
wchar_t widetestchar;
if (!mbtowc(&widetestchar, testchar, MB_CUR_MAX)) exit(1);
if (iswalnum(widetestchar)) {
Am I on the right track here? The last time I was programming regularly this sort of thing wasn't really on the radar.

Thanks,
q.

(Oh, and I'm aware that there are plenty of modern chat programs I could install instead. That won't fly here.)
 
06-09-2009, 12:43 PM   #2
David1357
Senior Member
 
Registered: Aug 2007
Location: South Carolina, U.S.A.
Distribution: Ubuntu, Fedora Core, Red Hat, SUSE, Gentoo, DSL, coLinux, uClinux
Posts: 1,300
Blog Entries: 1

Rep: Reputation: 107
Quote:
Originally Posted by quiescere View Post
n is, I think, MB_CUR_MAX, defined in stdlib.h.
n is the maximum number of bytes of s that mbtowc() will examine.

You want
Code:
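#include <stdio.h>
#include <stdlib.h>
#include <wctype.h>

wchar_t widetestchar;
int result;

/* Convert the multibyte character at the start of testchar.
 * sizeof(testchar) is the buffer size only when testchar is an array. */
result = mbtowc(&widetestchar, testchar, sizeof(testchar));

if (0 == result) {
    fprintf(stderr, "testchar points at a null byte\n");
    exit(1);
}
else if (-1 == result) {
    fprintf(stderr, "mbtowc failed: invalid or incomplete multibyte sequence\n");
    exit(1);
}
/* Otherwise result is the number of bytes consumed. */

if (iswalnum(widetestchar)) {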
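This will handle
Code:
char testchar[TEST_SIZE];
If testchar is instead declared as
Code:
char testchar;
pass &testchar rather than testchar. With n equal to sizeof(testchar), which is 1, only single-byte (ASCII) characters will convert, since a UTF-8 character can be up to four bytes long. If testchar is a char *, sizeof gives the size of the pointer rather than the buffer, so pass the number of bytes remaining in the string instead.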
Your snippet does not show how "testchar" is declared. Also, your error check does not handle the values returned by "mbtowc" correctly: it returns -1 for an invalid or incomplete sequence, and !mbtowc(...) treats -1 as success.
 
06-10-2009, 11:11 AM   #3
quiescere
Member
 
Registered: Sep 2003
Distribution: Slackware64 13.1
Posts: 52

Original Poster
Rep: Reputation: 15
Doh!

I must have read right over the description of the return codes half a dozen times yesterday and it completely failed to register.

Thanks so much for your quick and helpful response.

----q.
 
06-10-2009, 01:25 PM   #4
David1357
Senior Member
 
Registered: Aug 2007
Location: South Carolina, U.S.A.
Distribution: Ubuntu, Fedora Core, Red Hat, SUSE, Gentoo, DSL, coLinux, uClinux
Posts: 1,300
Blog Entries: 1

Rep: Reputation: 107
Quote:
Originally Posted by quiescere View Post
I must have read right over the description of the return codes half a dozen times yesterday and it completely failed to register.
It happens to everybody. The lack of consistency of API return codes eventually bites every programmer on the derriere.

Quote:
Originally Posted by quiescere View Post
Thanks so much for your quick and helpful response.
You are eminently welcome. Please, pay it forward.
 
  


All times are GMT -5.
