LinuxQuestions.org
03-13-2012, 10:23 AM   #1
johan162
Handling field width in multibyte strings with printf formats?


I use a solution to a problem (that more people than me must have had) that I feel is less than elegant. Therefore I'm curious how others have handled similar situations.

Core problem:
The standard printf() function prints multibyte strings without problems, but it does not handle field widths correctly, because it counts the length of a string in bytes rather than in displayed characters.

For UTF-8 encoded strings this is a problem, since one displayed character may take two or more bytes (up to four in UTF-8). This means that formats such as

Code:
"%-20s : %-10s\n"
(for example) will not align as expected for strings containing multibyte characters, even after setlocale() has selected a UTF-8 locale (observed with glibc).
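A small way to see the effect, assuming a UTF-8 locale has been selected with setlocale(): format a string with "%-*s" and count the characters in the result with mbstowcs(). padded_chars is a hypothetical helper for illustration only, and it assumes strings shorter than its 256-byte buffer.

```c
#include <locale.h>  /* setlocale(): a UTF-8 locale must be active */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format s with "%-*s" and return how many characters
   (not bytes) the padded result contains, or (size_t)-1 on error. */
size_t padded_chars(const char *s, int width) {
  char buf[256];
  int n = snprintf(buf, sizeof buf, "%-*s", width, s);
  if (n < 0 || (size_t)n >= sizeof buf)
    return (size_t)-1;             /* output error or truncation */
  return mbstowcs(NULL, buf, 0);   /* NULL destination: only count characters */
}
```

In a UTF-8 locale, padded_chars("abc", 10) gives 10, but padded_chars("åäö", 10) gives only 7: the six bytes of "åäö" already count as six towards the width, so printf() adds just four spaces.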

Possible solutions (and drawbacks)

1. Use wide characters as the internal format.
This solves the problem but requires a massive, error-prone rewrite of existing code. In addition, the wchar_t representation is not portable in general, so strings have to be printed with the wprintf() family, with the proper locale set so that the output gets the correct encoding. Furthermore, this only works for stream (stdio) input/output; there is no wide-character equivalent of the unbuffered read()/write() calls.
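One way to try the wide-character route without converting the whole program is to use wide characters only at the formatting step. This is a minimal sketch, assuming a UTF-8 locale selected with setlocale(); to_wide and format_pair are hypothetical names, and the 512-character line limit is arbitrary. In the wide printf family the field width of %ls counts wide characters, and wcstombs() turns the result back into an ordinary byte buffer that can be passed to printf("%s") or to write().

```c
#include <locale.h>  /* setlocale(): a UTF-8 locale must be active */
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a multibyte string to a newly allocated
   wide string, or return NULL on an invalid sequence or allocation failure. */
static wchar_t *to_wide(const char *s) {
  size_t n = mbstowcs(NULL, s, 0);   /* count wide characters only */
  if (n == (size_t)-1)
    return NULL;
  wchar_t *w = malloc((n + 1) * sizeof *w);
  if (w != NULL)
    mbstowcs(w, s, n + 1);           /* n characters plus L'\0' */
  return w;
}

/* Hypothetical helper: format "%-*ls : %-*ls" with widths counted in wide
   characters, then convert back to multibyte into out (size outsize).
   Returns the number of bytes stored in out, or -1 on error. */
long format_pair(char *out, size_t outsize, const char *a, const char *b, int width) {
  wchar_t line[512];
  long result = -1;
  wchar_t *wa = to_wide(a), *wb = to_wide(b);
  if (wa != NULL && wb != NULL &&
      swprintf(line, sizeof line / sizeof line[0], L"%-*ls : %-*ls",
               width, wa, width, wb) >= 0) {
    size_t len = wcstombs(out, line, outsize);
    if (len != (size_t)-1 && len < outsize)  /* converted and NUL-terminated */
      result = (long)len;
  }
  free(wa);
  free(wb);
  return result;
}
```

This keeps char * as the program's internal type and only pays for the conversion where formatting happens, at the cost of a temporary allocation per string.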

2. Semi-manual formatting
By manually calculating the displayed width of strings known to contain multibyte characters, it is possible to pre-format them so that they can later be printed with the ordinary printf() family. However, such a calculation has to go through wide characters in some way, since that seems to be the only way to guarantee a correct count of displayed characters regardless of encoding.

The example below illustrates one possible way of doing this:

Code:
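#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Calculate the number of characters (not bytes) in a multibyte string.
   Requires setlocale() to have selected the right (e.g. UTF-8) locale.
   Returns (size_t)-1 if s contains an invalid multibyte sequence. */
size_t mb_charlen(const char *s) {
  mbstate_t t;
  const char *scopy = s;
  memset(&t, 0, sizeof (t));
  /* With a NULL destination, mbsrtowcs() only counts the characters */
  return mbsrtowcs(NULL, &scopy, strlen(scopy), &t);
}

/* Pad a multibyte string in place with spaces to 'pad' characters.
   maxlen is the size of the buffer holding s. Returns 0 on success, -1 on error. */
int mb_pad(char *s, size_t pad, size_t maxlen) {
  size_t mbn = mb_charlen(s);
  size_t n = strlen(s);
  /* Fail on an invalid string, a string already longer than 'pad',
     or when the padding plus the terminating '\0' would not fit */
  if ((size_t)-1 == mbn || mbn > pad || n + (pad - mbn) >= maxlen) return -1;
  memset(s + n, ' ', pad - mbn);
  s[n + pad - mbn] = '\0';
  return 0;
}

/* Possible usage (print_pair is only an illustrative name) */
void print_pair(const char *mystring1, const char *mystring2) {
  enum { bsize = 255 };
  char tmpbuf1[bsize], tmpbuf2[bsize];

  strncpy(tmpbuf1, mystring1, bsize - 1); tmpbuf1[bsize - 1] = '\0';
  strncpy(tmpbuf2, mystring2, bsize - 1); tmpbuf2[bsize - 1] = '\0';
  mb_pad(tmpbuf1, 30, bsize); // Ignore possible error condition for clarity
  mb_pad(tmpbuf2, 30, bsize); // Ignore possible error condition for clarity
  printf("%s : %s\n", tmpbuf1, tmpbuf2);
  // Equivalent to printf("%-30s : %-30s\n", mystring1, mystring2); for non-mb strings.
}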
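Counting characters is not always the same as counting terminal columns: East Asian wide characters such as 日 occupy two columns, and combining accents occupy none. A minimal sketch of a column-based count, assuming a POSIX system where wcswidth() is available (it is an XSI extension, hence the _XOPEN_SOURCE define) and that setlocale() has selected a UTF-8 locale; mb_display_width is a hypothetical name.

```c
#define _XOPEN_SOURCE 700  /* exposes wcswidth() in <wchar.h> */
#include <locale.h>        /* setlocale(): a UTF-8 locale must be active */
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: return the number of terminal columns needed to show
   the multibyte string s, or -1 if it is invalid or contains a
   non-printable character. */
int mb_display_width(const char *s) {
  mbstate_t state;
  const char *src = s;
  memset(&state, 0, sizeof state);
  size_t n = mbsrtowcs(NULL, &src, 0, &state);  /* count wide characters */
  if (n == (size_t)-1)
    return -1;
  wchar_t *ws = malloc((n + 1) * sizeof *ws);
  if (ws == NULL)
    return -1;
  src = s;
  memset(&state, 0, sizeof state);
  mbsrtowcs(ws, &src, n + 1, &state);  /* convert, including L'\0' */
  int width = wcswidth(ws, n);         /* columns, per the locale's tables */
  free(ws);
  return width;
}
```

A padding function like the one above could use this value in place of the character count when the output goes to a terminal.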

Other people must have solved the same problem. How did you handle it? Should it be considered a bug in glibc that the printf family does not take the locale (and multibyte strings) into account?

(I should note that my actual application makes frequent use of the va_list versions of the printf* family, which makes it impossible to implement this "silently" under the hood: that would require pre-parsing the format string and adjusting only the string arguments.)

Thoughts?
 
03-14-2012, 08:39 PM   #2
dwhitney67
Have you looked into using wprintf()?

Edit: I guess you have, and it seems that you have your reasons against pursuing its usage.

Last edited by dwhitney67; 03-14-2012 at 08:41 PM.
 
03-15-2012, 01:27 AM   #3
johan162 (Original Poster)
Yes, this is basically solution 1 as listed in my post. Using wide characters for all internal processing would be possible, but it would require a complete refactoring of the program: every char and char * type has to be changed, and that brings some subtle, error-prone issues. Since I don't really need full wide-character functionality (the UTF-8 half-way house is fine), this is not really a road I want to travel.
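For the UTF-8 "half-way house", characters can also be counted without any locale or wide-character conversion, because every UTF-8 character has exactly one byte that is not a continuation byte (10xxxxxx). A minimal sketch, assuming the input is valid UTF-8; utf8_charlen and utf8_pad are hypothetical names. Like the character count in the first post, it counts characters rather than terminal columns, so wide CJK characters and combining marks would still misalign.

```c
#include <stdio.h>

/* Count characters in a valid UTF-8 string by skipping continuation bytes. */
size_t utf8_charlen(const char *s) {
  size_t count = 0;
  for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p)
    if ((*p & 0xC0) != 0x80)  /* 10xxxxxx marks a continuation byte */
      ++count;
  return count;
}

/* Hypothetical helper: like snprintf(buf, size, "%-*s", width, s), but the
   width counts UTF-8 characters instead of bytes.
   Returns snprintf()'s result (bytes needed, excluding the '\0'). */
int utf8_pad(char *buf, size_t size, const char *s, int width) {
  size_t len = utf8_charlen(s);
  int pad = (width > 0 && len < (size_t)width) ? width - (int)len : 0;
  /* "%*s" with an empty string emits exactly 'pad' spaces */
  return snprintf(buf, size, "%s%*s", s, pad, "");
}
```

The same "%s%*s" trick can be used directly in a printf() call, passing the computed padding as the extra int argument, which avoids copying the string into a temporary buffer.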
 
  


All times are GMT -5.