LinuxQuestions.org
Old 07-24-2010, 11:21 AM   #1
johan162
LQ Newbie
 
Registered: Jan 2009
Location: Stockholm
Distribution: SuSE 11.1
Posts: 27

Rep: Reputation: 0
system() call and character encoding problem


I use the system() call to invoke "/bin/mail" to send notification mail from a Unix daemon server program.

Everything works fine apart from a character encoding problem in the subject. Basically, the notification mail is sent by executing something like

Code:
snprintf(buffer,blen-1,
	"echo '%s' | /usr/bin/mail -s '%s' '%s'",message, subject, to);
buffer[blen-1] = '\0';
int ret = system(buffer);
and this works just fine as long as there is no need to use MIME encoded-word syntax in the subject, i.e.
"=?charset?encoding?encoded text?="
and here lies the problem.
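
For reference, the encoded-word form shown above can also be built by hand, which sidesteps whatever the mail program decides to do. The following is only a minimal sketch: the function names base64_encode() and mime_encode_subject() are made up for illustration, the charset is assumed to be UTF-8, and it ignores the RFC 2047 rule that a single encoded-word may be at most 75 characters long (longer subjects would have to be split into several words).

```c
#include <stdio.h>
#include <string.h>

/* Base64-encode len bytes from src into dst (NUL-terminated).
 * Returns 0 on success, -1 if dst is too small. */
int base64_encode(const unsigned char *src, size_t len,
                  char *dst, size_t dstlen)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t need = 4 * ((len + 2) / 3) + 1;   /* output chars plus NUL */
    size_t i, o = 0;

    if (dstlen < need)
        return -1;
    /* Full 3-byte groups become 4 output characters each. */
    for (i = 0; i + 2 < len; i += 3) {
        dst[o++] = tbl[src[i] >> 2];
        dst[o++] = tbl[((src[i] & 0x03) << 4) | (src[i + 1] >> 4)];
        dst[o++] = tbl[((src[i + 1] & 0x0f) << 2) | (src[i + 2] >> 6)];
        dst[o++] = tbl[src[i + 2] & 0x3f];
    }
    /* One or two leftover bytes are padded with '='. */
    if (i < len) {
        dst[o++] = tbl[src[i] >> 2];
        if (i + 1 < len) {
            dst[o++] = tbl[((src[i] & 0x03) << 4) | (src[i + 1] >> 4)];
            dst[o++] = tbl[(src[i + 1] & 0x0f) << 2];
        } else {
            dst[o++] = tbl[(src[i] & 0x03) << 4];
            dst[o++] = '=';
        }
        dst[o++] = '=';
    }
    dst[o] = '\0';
    return 0;
}

/* Wrap a UTF-8 string as one "B" (Base64) encoded-word.
 * Hypothetical helper: the 256-byte scratch buffer limits the input
 * to 189 bytes. Returns 0 on success, -1 if something does not fit. */
int mime_encode_subject(const char *utf8, char *out, size_t outlen)
{
    char b64[256];
    int n;

    if (base64_encode((const unsigned char *)utf8, strlen(utf8),
                      b64, sizeof b64) != 0)
        return -1;
    n = snprintf(out, outlen, "=?UTF-8?B?%s?=", b64);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The result is plain ASCII, so it could be passed as the -s argument regardless of the locale the mail program runs in. Whether a given mail implementation passes such a pre-encoded subject through untouched is an assumption that would need checking.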

When using "mail" directly from the command line this also works fine but when called upon via the system() call there is a different behavior.

If called from the command line with a subject that requires MIME encoded-word syntax, the mail program correctly does the encoding before sending it on to the MTA. However, this does not seem to work when used through the system() call. When the "mail" command is run through system(), no MIME word-encoding is done at all.

My initial assumption was that the system() call would create a very "bare" environment where the locale (I use a UTF-8 system) was not correct. However, when I run system() and examine the environment, it is the same environment as I have on the command line, i.e. I cannot spot any differences.

However, I still suspect this is the problem even though I couldn't spot any difference.

Is there anyone used to dealing with the complexity of character encoding who could give me a hint on how to address this issue?
 
Old 07-24-2010, 01:48 PM   #2
Mara
Moderator
 
Registered: Feb 2002
Location: Grenoble
Distribution: Debian
Posts: 9,696

Rep: Reputation: 232Reputation: 232Reputation: 232
It should give exactly the same result...

Could you please provide the example command (with values passed)?

Also (you have probably already done it): are you sure the command run by system() is exactly the same as you run in command line? (quotes, special characters...)
 
Old 07-24-2010, 05:54 PM   #3
johan162
LQ Newbie
 
Registered: Jan 2009
Location: Stockholm
Distribution: SuSE 11.1
Posts: 27

Original Poster
Rep: Reputation: 0
Seems to be something interesting

This seems to be a more interesting problem than I first envisioned :-)

If I create a small test program by "lifting" out the mail function from my real program and setting the same compiler flags, the test program works flawlessly (and does the correct encoding). But even when I use the same hardcoded subject in both the test and the "real" program, the problem appears in the real program but not in the test.

I have double checked that:
  • The files are encoded the same way (utf8)
  • The compiler flags are identical e.g. -std=gnu99 -g -O2
  • The preprocessor defines that affect the standard library routines are the same, e.g. #define _GNU_SOURCE.

But still, the test program works perfectly and the real program doesn't. I have to confess that this is the first time in a long time I'm actually baffled. It will be interesting to get to the bottom of this - I'm sure there is a lesson to be learnt here. But right now I'm not making much progress.

The mail function is trivial and included below as reference (with only some internal comments removed to keep it short)

Code:
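int send_mail(const char *subject, const char *to, const char *message) {
    char buffer[MAIL_MAXBUFF_LEN];

    /* snprintf() always NUL-terminates and returns the length the full
       command would need, so compare that against the buffer size */
    int len = snprintf(buffer, sizeof(buffer),
	"echo '%s' | /usr/bin/mail -s '%s' '%s'",message, subject, to);
    if( len < 0 || (size_t)len >= sizeof(buffer) ) {
        /* a cut-off command would end inside a quoted string */
        syslog(LOG_ERR,"Mail command too long, not sending");
        return -1;
    }
    return system(buffer);
}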
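
A side note on the quoting question raised earlier in the thread: send_mail() wraps each argument in single quotes, so a subject or message that itself contains a single quote would end the quoted string early and change the command. A minimal sketch of one way to guard against that is shown here. The helper name shell_quote() is hypothetical, and it only covers quoting for a POSIX shell.

```c
#include <stddef.h>

/* Copy src into dst wrapped in single quotes, rewriting each embedded
 * single quote as '\'' (close quote, escaped quote, reopen quote) so
 * the shell treats it as a literal character.
 * Returns 0 on success, -1 if dst is too small. */
int shell_quote(const char *src, char *dst, size_t dstlen)
{
    size_t o = 0;

    if (dstlen < 3)              /* room for the two quotes plus NUL */
        return -1;
    dst[o++] = '\'';
    for (; *src != '\0'; src++) {
        /* Always keep 2 bytes in reserve for the closing quote and NUL. */
        if (*src == '\'') {
            if (o + 4 + 2 > dstlen)
                return -1;
            dst[o++] = '\'';
            dst[o++] = '\\';
            dst[o++] = '\'';
            dst[o++] = '\'';
        } else {
            if (o + 1 + 2 > dstlen)
                return -1;
            dst[o++] = *src;
        }
    }
    dst[o++] = '\'';
    dst[o] = '\0';
    return 0;
}
```

With something like this, the format string would use a bare %s for each argument and receive the already quoted strings. It does not affect the encoding problem discussed in this thread, but it rules out one class of "the command is not what I think it is" surprises.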
 
Old 07-25-2010, 03:20 AM   #4
Mara
Moderator
 
Registered: Feb 2002
Location: Grenoble
Distribution: Debian
Posts: 9,696

Rep: Reputation: 232Reputation: 232Reputation: 232
Have you already tried to printf() the string before you pass it to system()? There must simply be some difference.
 
Old 07-25-2010, 04:39 AM   #5
johan162
LQ Newbie
 
Registered: Jan 2009
Location: Stockholm
Distribution: SuSE 11.1
Posts: 27

Original Poster
Rep: Reputation: 0
I wish it was that simple :-)

In order to be 100% sure it is identical, I have actually inserted the exact same test code to send the same "hardcoded" mail in the real program as in the test program. Basically, the core of the test program is

Code:
static char X_body[] = "The body of the message\n\n";
static char X_to[] = "some_mail@gmail.com";
static char X_subject[] = "Extended chars: åäö";

send_mail(X_subject, X_to, X_body);
The above code executed in the context of the Unix daemon cannot handle the extended chars while the test program works just fine (!)

I can also add that the test program detaches from the controlling terminal in the same way as the daemon and also runs as the same user as the daemon.

This really irritates me. Normally I'm able to figure out issues like this fairly quickly, but I'm running out of ideas for what else to test. I simply cannot think of anything else that differs between the "real" daemon and the test program. The test program is
  • daemonized the exact same way
  • started as "root" and then switched to the same user as the daemon
  • sends the mail from its own thread as in the real daemon

Logically, the only reason for this behavior is that the daemon gets a shell environment which has an undefined/wrong locale setting, and hence the mail program is unable to detect the extended chars. I just don't know how to prove this, since executing, for example, "locale" in the daemon gives the exact same LC settings.

Can anyone think of any other way to prove and show evidence that the shell locale is different in the daemon?
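
One way to gather such evidence from inside the process itself: system() hands the calling process's environment to the shell it starts, so the values getenv() returns in the daemon are what the shell and the mail program will inherit. A minimal sketch follows. The function name describe_locale_env() and the choice of three variables are just illustrative assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write "NAME=value" pairs for the main locale variables into out,
 * separated by spaces. Unset variables are shown as "(unset)".
 * Returns 0 on success, -1 if out is too small. */
int describe_locale_env(char *out, size_t outlen)
{
    static const char *names[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    size_t used = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        const char *val = getenv(names[i]);
        int n = snprintf(out + used, outlen - used, "%s%s=%s",
                         i ? " " : "", names[i], val ? val : "(unset)");
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

Calling this right before system() and writing the result with syslog(LOG_INFO, "%s", buf) in both the daemon and the test program would put the two environments side by side in the log.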
 
Old 07-25-2010, 04:54 AM   #6
johan162
LQ Newbie
 
Registered: Jan 2009
Location: Stockholm
Distribution: SuSE 11.1
Posts: 27

Original Poster
Rep: Reputation: 0
Partly solved

Following my own advice, I checked the locale one more time in case I was tired when I first checked it.

In the test program the system() executes under a shell that has the locale

Code:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
However, in the daemon the locale is

Code:
LANG=en_US.UTF-8
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=POSIX
This explains the problems. What remains is just to understand why the locale is different.
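
The second listing also shows why LANG alone is not enough to go by: LANG is still en_US.UTF-8, yet every category reports POSIX, because a non-empty LC_ALL takes precedence over the individual LC_* variables, which in turn take precedence over LANG. A small sketch of that lookup order is given here. The function name effective_locale() is made up, and falling back to "POSIX" when nothing is set is an assumption, since the real default is implementation-defined.

```c
#include <stdlib.h>

/* Return the locale name the environment selects for one category,
 * using the lookup order LC_ALL, then the category's own variable
 * (e.g. "LC_CTYPE"), then LANG. Empty values count as unset.
 * The "POSIX" fallback is an assumption made for this sketch. */
const char *effective_locale(const char *category)
{
    const char *order[3];

    order[0] = "LC_ALL";
    order[1] = category;
    order[2] = "LANG";
    for (int i = 0; i < 3; i++) {
        const char *val = getenv(order[i]);
        if (val != NULL && val[0] != '\0')
            return val;
    }
    return "POSIX";
}
```

For the daemon's environment above, effective_locale("LC_CTYPE") would return "POSIX" even though LANG names a UTF-8 locale.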

Last edited by johan162; 07-25-2010 at 04:58 AM.
 
Old 07-25-2010, 06:14 AM   #7
johan162
LQ Newbie
 
Registered: Jan 2009
Location: Stockholm
Distribution: SuSE 11.1
Posts: 27

Original Poster
Rep: Reputation: 0
Problem solved

For reference, for anyone else with a similar issue.

For some reason the system() call in my daemon gets executed with a shell which has a different locale than the default on the system. The system default locale in my case is "en_US.UTF8", but the system() call creates a shell that uses the "POSIX" locale.

This creates a problem since the POSIX locale only handles 7-bit ASCII.

I have not yet fully understood why my daemon system() call has this behavior and my test program doesn't.

A quick fix (hack) for this symptom is to add an LC_ALL setting in the system() call, i.e.

Code:
system("LC_ALL=en_US.UTF8 export LC_ALL; THE_REAL_COMMAND_TO_BE_EXECUTED");
I still have to understand why this change of behavior exists in the system() call in my daemon but not in my (what I thought identical) test program.
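
An alternative to putting the assignment into the command string is to change the daemon's own environment before calling system(), since the shell inherits it. This is only a sketch under assumptions: the wrapper name system_with_locale() is hypothetical, and because setenv() modifies process-wide state and is not safe to call while other threads read the environment, in a threaded daemon the setenv() call would be better done once at startup.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* Run cmd through system() with LC_ALL forced to the given locale.
 * The change applies to the whole process environment and stays in
 * effect afterwards. Returns system()'s result, or -1 if setenv()
 * fails. */
int system_with_locale(const char *locale, const char *cmd)
{
    if (setenv("LC_ALL", locale, 1) != 0)
        return -1;
    return system(cmd);
}
```

A call such as system_with_locale("en_US.UTF-8", buffer) would then replace the plain system(buffer) call, with the locale name ideally read from the daemon's configuration rather than hardcoded.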

Last edited by johan162; 07-25-2010 at 06:15 AM.
 
Old 07-25-2010, 06:54 AM   #8
johan162
LQ Newbie
 
Registered: Jan 2009
Location: Stockholm
Distribution: SuSE 11.1
Posts: 27

Original Poster
Rep: Reputation: 0
Problem fully solved and understood

The core of the problem has to do with how the daemon is started. I normally use the SuSE distribution, but this is similar in other distributions as well.

My daemon uses a custom start script located under "/etc/init.d/", where (according to LFS) the start scripts for system services are located. This script in turn uses the provided templates, common to all these boot scripts, that support starting/stopping/restarting a daemon.

These templates source a common shell script, "/etc/rc.status", and the first three lines of that script are:
Code:
# Do _not_ be fooled by non POSIX locale
LC_ALL=POSIX
export LC_ALL
and there is the issue. All boot scripts are started in an environment with LC_ALL set to POSIX regardless of the system locale.

The reason my test program worked was of course that I was starting it manually from the shell, so the locale settings were inherited from my login shell.

So from this exercise we learn that all boot-time start scripts use the POSIX locale, and hence they, and any daemon they start, know nothing about UTF-8 encodings. I'm not sure I understand the reason for this restriction/practice, but it is interesting new knowledge, at least for me.
 
  

