How to detect the charset of a string
I can convert a string from one charset to another, for example from Big5 to UTF-8.
But when I read a string in, how can I find out which charset it is in? Thanks a lot. |
First off, what language would you be using?
|
trial and error
The tool you want is iconv: put your text into a file and convert it using iconv (I don't know if there is a GUI tool). Or, if you only want to convert filenames (rename them), you should check out convmv.
Concerning detection of the source charset, I don't know of any tool for the job. As far as I know, your only choice is trial and error, meaning that you guess the source encoding and check whether the output is what you want. |
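The trial-and-error approach described above can be sketched in a few lines of Python. This is only an illustration, not a tool mentioned in the thread: the function name `try_decodings` and the candidate list are assumptions chosen for the example, and a decode that succeeds still has to be checked by eye, because several charsets can accept the same bytes.

```python
# Hypothetical candidate list; adjust it to the charsets you expect to meet.
CANDIDATES = ["utf-8", "big5", "gb2312", "shift_jis", "koi8-r"]

def try_decodings(raw, candidates=CANDIDATES):
    """Return (encoding, text) for every candidate that decodes without error."""
    results = []
    for name in candidates:
        try:
            results.append((name, raw.decode(name)))
        except UnicodeDecodeError:
            # These bytes are not valid in this charset; try the next guess.
            continue
    return results

if __name__ == "__main__":
    sample = "中文".encode("big5")
    for name, text in try_decodings(sample):
        # A human still has to decide which of the surviving guesses looks right.
        print(name, repr(text))
```

A strict decode only rules charsets out. Single-byte charsets such as koi8-r accept almost any byte sequence, so they will usually appear in the result list even when they are wrong.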
Hello.
Try `konwert':
Code:
cat file | konwert any/ru-koi8r | less
Code:
cat file | konwert any/ru-test
From manpage:
Code:
Currently supported languages are cs (Czech), de (German),
Hope this is useful. Bye.
P.S.: I don't know whether there is a C language API to konwert's functionality (iconv has such an API). I think not. |
As far as I know, there is no way to know the charset of a string, and the same goes for a raw text file. The only thing you could try is to guess the charset from what is in it, but not much more. :o
|
To detect the source charset I use the enca package. Homepage: trific.ath.cx/software/enca/ (written this way to work around the URL posting limit).
|
I have studied the Mozilla source code. It contains code that auto-detects the charset of a string, but it is too complex. I would like a simplified algorithm or policy for auto-detecting the charset, like Mozilla's.
Thanks a lot. |
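One possible simplified policy, sketched in Python, is: look for a byte order mark first, then test for pure ASCII, then for valid UTF-8, and only then fall back to a short list of legacy charsets. This is not Mozilla's algorithm, just a minimal sketch. The name `guess_charset` and the fallback list are assumptions made for the example, and the fallback step returns the first charset that decodes cleanly, which may still be the wrong one.

```python
import codecs

# UTF-32 BOMs come first because the UTF-32 LE BOM starts with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def decodes_as(raw, name):
    """Return True if raw is valid in the charset called name."""
    try:
        raw.decode(name)
        return True
    except UnicodeDecodeError:
        return False

def guess_charset(raw, fallbacks=("big5", "gb2312")):
    """Guess a charset for raw bytes; return None if no rule matches."""
    # 1. A byte order mark is the strongest hint available.
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    # 2. Pure ASCII and valid UTF-8 are cheap, strict checks.
    for name in ("ascii", "utf-8"):
        if decodes_as(raw, name):
            return name
    # 3. Hypothetical fallback list: the first charset that decodes wins.
    for name in fallbacks:
        if decodes_as(raw, name):
            return name
    return None
```

For example, `guess_charset("中文".encode("big5"))` falls through the BOM, ASCII and UTF-8 checks and reaches the fallback list. Telling similar legacy charsets apart reliably needs more than validity checks, which is why the order of the fallback list matters here.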