
crisostomo_enrico 03-25-2008 03:47 PM

Converting UTF-16 files to another encoding (such as UTF-8)

I received a bunch (>1700) of scripts generated by Microsoft SQL Server Enterprise Manager and I have to work on them. I think they are UTF-16 files (UTF-16 is the internal text representation of Windows 2000 and later); on Solaris, file just reports them as data.

bash-3.2$ file dbo.tTransactionIncidents.TAB
dbo.tTransactionIncidents.TAB: data
I mean, I cannot grep or sed through them unless I re-encode them. With vim I can :set fileencoding=utf-8 and then write the file, and that works. The problem is that there are so many files that I need to do it from a script, and I'm not aware of any tool or command (not even a way with vim) to do that.

Have you got any suggestions?
Thanks a lot,

bulliver 03-25-2008 05:04 PM



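require 'iconv'
# Iconv.new takes the target encoding first, then the source encoding.
ic = Iconv.new("ASCII", "UTF-16LE") # replace 'ASCII' with 'UTF-8' if you prefer

ARGV.each do |file|
  # Read the whole file in binary mode: splitting UTF-16 on "\n" would cut characters in half.
  in_file = File.open(file, "rb")
  out_file = File.open("#{file}.out", "w")
  out_file.write(ic.iconv(in_file.read))
  in_file.close
  out_file.close
end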

Note: This is untested. It will re-encode all input files to ASCII and write each result as "original_name.out".
You will need to use shell globbing or find/xargs to supply it with all your file names.
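One way to supply the file names is find with -exec ... {} +, which passes as many matching paths as fit on each command line, much like xargs. This is only a sketch: run_on_matches is an illustrative helper name, and the usage line assumes the Ruby code above was saved as convert.rb (a hypothetical file name).

```shell
#!/bin/sh
# Hypothetical helper: run a command on every file under a directory
# whose name matches a pattern.
#   $1 = directory to search
#   $2 = file name pattern (quote it so the shell does not expand it)
#   remaining arguments = the command to run; file names are appended
run_on_matches() {
    dir=$1
    pattern=$2
    shift 2
    # "{} +" batches many file names into each invocation of the command
    find "$dir" -type f -name "$pattern" -exec "$@" {} +
}
```

For example: run_on_matches . '*.TAB' ruby convert.rb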



You can skip the middleman. Ruby iconv is just a wrapper for the iconv C library/utility. Have a look at 'man iconv'.

jlliagre 03-25-2008 05:20 PM

Or simpler:

iconv -f UTF-16 -t UTF-8 file
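Note that the command above writes the converted text to standard output, so for many files it has to be wrapped in a loop that redirects each result to its own file. A minimal sketch, assuming the files carry a byte order mark (so that -f UTF-16 can pick the byte order) and that no file name contains a newline; the function name convert_tree, the ".utf8" suffix and the "*.TAB" default pattern are illustrative choices, not something from this thread.

```shell
#!/bin/sh
# Hypothetical helper: convert every matching UTF-16 file under a
# directory to UTF-8, writing "name.utf8" next to each original.
#   $1 = directory to search
#   $2 = file name pattern (defaults to *.TAB)
convert_tree() {
    dir=$1
    pattern=${2:-*.TAB}
    find "$dir" -type f -name "$pattern" | while IFS= read -r f; do
        # The original file is left untouched; output goes to a new file.
        if ! iconv -f UTF-16 -t UTF-8 "$f" > "$f.utf8"; then
            echo "conversion failed: $f" >&2
            rm -f "$f.utf8"   # do not leave a partial output behind
        fi
    done
}
```

For example: convert_tree /path/to/scripts '*.TAB'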

crisostomo_enrico 03-25-2008 05:30 PM

Thank you very much, to both of you, it works!

