About text file I/O, UTF-8 and C
How can I read and write text files (encoding: UTF-8, Programming Lang: C, O.S.: Linux)? Is there any ready-made source code? How should ‘\n’ or ‘\t’ … be handled when the encoding is UTF-8?
|
Not sure what you are asking here. Are you trying to write a program that can read from and write to a text file?
|
That's exactly what C does best; for plain 7-bit ASCII text, UTF-8 *is* ASCII, byte for byte:
Code:
#include <stdio.h>
http://www.cprogramming.com/tutorial/unicode.html
http://www.ibm.com/developerworks/li.../l-linuni.html
http://www.linux.org/docs/ldp/howto/...e-HOWTO-6.html
'Hope that helps .. PSM
|
To read any UTF-8 file (except in the trivial case where the file contains only 7-bit ASCII), a UTF-8 library should be used. Since I normally use C++ rather than C, I use the Qt library; however, I'm sure there is a good library within the GNOME project that you could use. For serious work, check out IBM's ICU library. |
|
Hi again, Badry -
I want to emphasize: for the majority of string handling you're likely to do in C (and certainly for single-byte characters like '\n' and '\t'), you can essentially say "the standard C library gives you UTF-8 for free", because the majority of string handling in most "classic" C programs *IS* 7-bit ASCII.
Please follow up on some of the links provided. They're all good. I'd especially recommend this one:
http://www.cprogramming.com/tutorial/unicode.html
Your .. PSM
|
Thanks! I have written similar code. However, it does not work. I want to read a Persian text and, after some simple modifications (such as finding tabs, periods, ...), write it to another file.
|
Whoops, sorry badry!
If you want Persian text, then you're definitely *not* doing 7-bit ASCII. However, you *can* probably do everything you need with UTF-8. And the links that I and others gave you should still be useful.
Also: please look at this exchange:
http://www.mail-archive.com/linux-ut.../msg05595.html
'Hope that helps .. PSM
PS: You're already going UTF-8. That's definitely the right thing to do, and apologies again for confusing things. Here's a quote from the above link emphasizing this point:
Quote:
|
Hi, Badry -
One more addendum - a *very* good article I'd like to see *every* developer read:
http://www.joelonsoftware.com/articles/Unicode.html
IMHO .. PSM
PS: I know it helped clarify more than a couple of things for me! :)
|