for any istream/ostream/wistream/wostream-based class (notice that both string and wstring are used; normally this gives a compiler error). Obviously, to do this I'll need to overload the ostream/wostream operator<<, and for that I'll need a conversion routine.
I spent some time looking for a solution and found the following:
1) utf8_codecvt_facet. Looks good, but it was meant to be used with wifstreams/wofstreams, so it might be difficult to use for converting between utf8<->wstring.
2) libiconv. Too heavy for a lightweight project, and it would require me to open my source code, so I can't just take one routine from it. I also need only utf8<->wstring conversion, so it looks like overkill.
3) This code:
It is the closest to what I've been looking for, but apparently it doesn't work. I.e., narrow(widen(std::string("фыва"))) on a machine with a utf8 locale doesn't return "фыва", and I don't see the second string ("mnopqrstuvwx") in the terminal on my machine.
4) mbstowcs. It will work, but I'd like to avoid C functions and changing the C locale in this project.
But I found no "pure lightweight C++ solution without external libraries".
Right now I'm writing my own conversion routine using this specification, but I'd like to know whether there is a standard way to convert between utf8 and wstring in pure C++, as I suspect I missed something. Any ideas?
Thanks for your time.
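For readers following along: a hand-written routine of the kind mentioned in this post could look roughly like the sketch below. It is not the poster's code. The name to_utf8 is made up for illustration, and it assumes every wchar_t holds one whole Unicode code point (true where wchar_t is 32 bits, as on typical Linux systems); UTF-16 surrogate pairs are rejected rather than combined.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not from the thread: encode a wide string as UTF-8.
// Assumption: each wchar_t is one complete Unicode code point (32-bit wchar_t).
std::string to_utf8(const std::wstring& wide) {
    std::string out;
    for (std::wstring::size_type i = 0; i < wide.size(); ++i) {
        // A negative wchar_t becomes a huge value here and is rejected below.
        const unsigned long cp = static_cast<unsigned long>(wide[i]);
        if (cp >= 0xD800 && cp <= 0xDFFF)
            throw std::invalid_argument("surrogate code point in input");
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp <= 0x10FFFF) {           // 4 bytes: 11110xxx followed by 3 continuation bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            throw std::invalid_argument("value above U+10FFFF in input");
        }
    }
    return out;
}
```

With this sketch, a wstring holding the single character U+0444 ("ф") comes out as the two bytes 0xD1 0x84. It needs neither a locale nor any external library, which seems to be what the post is asking for.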
Is there any particular reason you have the code at size 1?
I'm not sure what exactly you are talking about. Please explain/elaborate. If you were asking why I'm not using wide-character streams instead of single-character streams, it is because I have code that uses a mixture of wstring/string classes. Moving everything to wstream is not possible, because certain calls require const char*, and some configuration files need to be 8-bit compatible. So I'll run into utf8<->wchar_t conversion anyway; it simply can't be avoided. Also, storing data as utf8 in external files is more compact and less platform-dependent (for example, wchar_t might be 2 bytes long on Windows and 4 bytes long on Linux), even when you use wchar_t-based strings internally.
Anyway, I have run into this conversion problem a few times in the past and avoided it. So right now I'd like to know how to do the conversion the "right" way, in pure C++.
utf8_codecvt_facet should work for any stream. The standard library already provides a wide stringstream, std::wstringstream, which is a typedef of the std::basic_stringstream template (the same goes for its friends).
As I said, I'd like to do the conversion in pure C++, without external libraries (I also had some trouble ripping utf8_codecvt_facet out of boost). C has a mechanism for that (setlocale + wcstombs), so it would be strange if standard C++ didn't allow it.
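For comparison, the C mechanism mentioned in this post can be wrapped in a few lines of C++. This is only a sketch: narrow_with_c_locale is a made-up name, and the function uses whatever C locale is active at the time of the call. Getting UTF-8 out of it would require the caller to run std::setlocale with a UTF-8 locale first, which is exactly the global side effect the poster wants to avoid.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wrapper around std::wcstombs, not from the thread.
// Assumption: the caller has already selected the desired C locale
// (for example via std::setlocale); this function does not change it.
std::string narrow_with_c_locale(const std::wstring& wide) {
    // Worst case: every wide character needs MB_CUR_MAX bytes, plus the terminator.
    std::vector<char> buffer(wide.size() * MB_CUR_MAX + 1);
    const std::size_t written =
        std::wcstombs(buffer.data(), wide.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in the current C locale");
    // Note: conversion stops at the first embedded L'\0' in the input.
    return std::string(buffer.data(), written);
}
```

Since the result depends on process-wide locale state, a locale-independent routine is usually easier to reason about in a library or in multi-threaded code.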
code conversion facet works, but make sure it meets the spec
Quote:
Originally Posted by ErV
I spent some time looking for a solution and found the following:
1) utf8_codecvt_facet. Looks good, but it was meant to be used with wifstreams/wofstreams, so it might be difficult to use for converting between utf8<->wstring.
You could typedef a wide string stream, make a locale with this code conversion facet, and imbue it into the stream; then just reading the stream would do the conversion for you. N.B. The boost utf-8 conversion facet doesn't follow the Unicode spec, and leaves you open to security problems with alternate overly long encodings (as does your implementation below). From the Unicode Standard Version 5.2:
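To illustrate the overlong-encoding concern raised in this reply, here is a minimal decoder sketch that refuses such input. It is not the boost facet and not the original poster's routine. The name from_utf8 is made up, and the code assumes a 32-bit wchar_t so that every code point fits in one element. The key step is comparing the decoded value against the smallest value that legitimately needs that many bytes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not from the thread: decode UTF-8 into a wide string,
// rejecting overlong (non-shortest) forms, surrogates and values above U+10FFFF.
// Assumption: wchar_t is wide enough for any code point (32-bit wchar_t).
std::wstring from_utf8(const std::string& bytes) {
    std::wstring out;
    std::string::size_type i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        unsigned long cp = 0;      // code point being assembled
        std::size_t extra = 0;     // number of continuation bytes expected
        unsigned long min_cp = 0;  // smallest value allowed for this length
        if (lead < 0x80)                { cp = lead;        extra = 0; min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; min_cp = 0x10000; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (bytes.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Overlong check: e.g. 0xC0 0xAF decodes to '/' but must be rejected.
        if (cp < min_cp)
            throw std::invalid_argument("overlong UTF-8 encoding");
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("code point outside the valid range");
        out += static_cast<wchar_t>(cp);
        i += extra + 1;
    }
    return out;
}
```

A decoder without the min_cp check would accept the two bytes 0xC0 0xAF as '/', which can slip past filters that only look for the single byte 0x2F. That is the kind of security problem meant here.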