LinuxQuestions.org

yeye_olive 10-02-2009 03:51 AM

C++ standard wide iostreams code conversion - outputs garbage
 
Hello all,

I am trying to have a C++ program output Unicode strings on std::wcout. I use a UTF-8 based locale (either fr_FR.UTF-8 or en_US.UTF-8). The following code

Code:

#include <iostream>
#include <locale>

int main(int, char **) {
        std::wcout.imbue(std::locale(""));
        std::wcout << L"\x2026\x00e9";
        return 0;
}

where L"\x2026\x00e9" is the same as L"…é" (U+2026 followed by U+00E9), outputs garbage: the byte sequence 2e 2e 2e 3f, i.e. the ASCII characters "...?".
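For comparison, the bytes a UTF-8 locale would be expected to produce for these two code points can be worked out by hand. The following is only a minimal sketch and not part of the program above; `utf8_encode` and `hex_dump` are hypothetical helper names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Encode one Unicode code point as UTF-8 (assumes cp <= 0x10FFFF).
std::string utf8_encode(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx followed by three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Render a byte string as space-separated lowercase hex.
std::string hex_dump(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        std::snprintf(buf, sizeof buf, "%02x",
                      static_cast<unsigned char>(bytes[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

With these helpers, `hex_dump(utf8_encode(0x2026) + utf8_encode(0x00E9))` gives `e2 80 a6 c3 a9`, which is what a correct conversion should write, while `hex_dump("...?")` reproduces the observed `2e 2e 2e 3f`.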

The C equivalent works perfectly though:

Code:

#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

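int main(int argc, char **argv) {
        int orientation;

        setlocale(LC_ALL, "");
        /* Call fwide() outside assert() so it still runs when NDEBUG is defined. */
        orientation = fwide(stdout, 1);
        assert(orientation > 0);
        (void)orientation;
        fputws(L"\x2026\x00e9", stdout);
        return 0;
}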

I confirmed the issue on Gentoo with gcc 4.3.2 and Ubuntu with gcc 4.3.3. Oddly enough, doing the conversion explicitly with std::codecvt works:

Code:

#include <cassert>
#include <cwchar>
#include <iostream>
#include <locale>

typedef std::codecvt<wchar_t, char, std::mbstate_t> CodeCvt;

int main(int, char **) {
        std::mbstate_t state = {0,};

        // sourceEnd includes the terminating L'\0', so dest ends up null-terminated.
        wchar_t source[] = L"\x2026\x00e9";
        const wchar_t *sourceEnd = source + sizeof(source) / sizeof(*source);

        char dest[100];
        char *destEnd = dest + sizeof(dest) / sizeof(*dest);

        std::cout.imbue(std::locale(""));
        const CodeCvt& facet = std::use_facet<CodeCvt>(std::cout.getloc());

        const wchar_t *sourceNext;
        char *destNext;
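        // Run the conversion outside assert() so it still happens when NDEBUG is defined.
        const std::codecvt_base::result result = facet.out(state, source, sourceEnd, sourceNext, dest, destEnd, destNext);
        assert(result == std::codecvt_base::ok && sourceNext == sourceEnd);
        (void)result;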

        std::cout << dest;
        return 0;
}

Does anyone have an explanation for this behaviour? Am I missing something?

smeezekitty 10-02-2009 05:07 PM

Which operating system are you using?

yeye_olive 10-03-2009 01:44 AM

I run Gentoo Linux with gcc 4.3.2 and Ubuntu Linux with gcc 4.3.3 and both are affected.

ta0kira 10-03-2009 01:49 AM

Have you tried compiling the C one with g++? I'd like to know what happens in that case.
Kevin Barry

neonsignal 10-03-2009 08:30 AM

Interestingly, it works if the global locale is set before the stream is used, i.e.
Code:

std::locale::global(std::locale("fr_FR.UTF-8"));
It seems that imbue() affects only formatting, not the encoding.

That would be painful if you wanted to have streams with different locales.
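A minimal sketch of this workaround as a small function, assuming (as the observation above suggests) that the default, stdio-synchronized std::wcout follows the C library locale rather than the imbued one. `use_environment_locale` is a hypothetical name, and `std::locale("")` is used here so the locale comes from the environment instead of being hard-coded to fr_FR.UTF-8.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>

// Hypothetical helper: make the environment's locale the global C++ locale
// and imbue std::wcout with it. Returns false if the environment names a
// locale that is not available on this system.
bool use_environment_locale() {
    try {
        std::locale env("");        // throws std::runtime_error if unavailable
        // For a named locale, global() also calls std::setlocale(LC_ALL, ...),
        // so the C library sees the same locale as the C++ streams.
        std::locale::global(env);
        std::wcout.imbue(env);      // keep number/date formatting consistent
        return true;
    } catch (const std::runtime_error&) {
        return false;               // leave the current locale untouched
    }
}
```

The idea would be to call `use_environment_locale()` as the first statement of `main`, before anything is written to `std::wcout`.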

---edit---
ah, explanation here

Turning off the standard stream synchronization at the start of the program is a nicer way to get the desired behaviour:
Code:

std::ios::sync_with_stdio(false);
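Put together with the imbue() call from the first post, this might look like the sketch below. `setup_unsynced_wide_output` is a hypothetical helper name. The ordering (turn off synchronization first, then imbue) is an assumption on my part: as far as I can tell, libstdc++ swaps in new stream buffers when synchronization is turned off, so a locale imbued earlier may not carry over.

```cpp
#include <iostream>
#include <locale>

// Hypothetical helper: call once, before any other I/O on the standard streams.
void setup_unsynced_wide_output(const std::locale& loc) {
    // Decouple the C++ standard streams from C stdio. This only has a
    // defined effect if it happens before the first read or write.
    std::ios::sync_with_stdio(false);
    // Assumption: imbue after the call above so the locale applies to
    // whatever stream buffer std::wcout uses from now on.
    std::wcout.imbue(loc);
}
```

A program would call `setup_unsynced_wide_output(std::locale(""))` at the top of `main` and then write `std::wcout << L"\x2026\x00e9";` as before. Once synchronization is off, mixing `std::wcout` with stdio calls on `stdout` in the same program is no longer safe.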

yeye_olive 10-04-2009 01:09 PM

@ta0kira

Both gcc and g++ are affected.

@neonsignal

Thank you, that fixed it! That was a nasty one. I suppose the compromise the libstdc++ devs made makes sense, but I wish they had documented it clearly.


All times are GMT -5.