LinuxQuestions.org
Old 03-25-2008, 03:47 PM   #1
crisostomo_enrico
Member
 
Registered: Dec 2005
Location: Madrid
Distribution: Solaris 10, Solaris Express Community Edition
Posts: 547

Rep: Reputation: 35
Converting UTF-16 files to another encoding (such as UTF-8)


Hi.

I received a bunch (>1700) of scripts generated by Microsoft SQL Server Enterprise Manager and I have to work on them. I think they are UTF-16 files, which is the internal text representation of Windows 2000 and later, and on Solaris they just show up as data:
Quote:
bash-3.2$ file dbo.tTransactionIncidents.TAB
dbo.tTransactionIncidents.TAB: data
I mean, I cannot grep or sed through them unless I re-encode them. In vim I can :set fileencoding=utf-8 and then write the file, and that works. The problem is that there are so many files that I need to do it from a script, and I'm not aware of any tool or command (not even vim itself) that can do the job.
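
A quick way to check the UTF-16 guess before converting anything is to look at the first two bytes of a file: UTF-16 files written by Windows tools usually start with a byte-order mark, ff fe for little-endian or fe ff for big-endian. A minimal sketch, assuming an od that accepts the POSIX -A, -t and -N options; bom_of and guess_utf16 are made-up helper names, not standard commands:

```shell
# Print the first two bytes of a file as hex digits, e.g. "fffe".
# bom_of is a hypothetical helper name.
bom_of() {
  od -An -tx1 -N2 "$1" | tr -d ' \t\n'
}

# Classify a file by its byte-order mark (hypothetical helper).
guess_utf16() {
  case "$(bom_of "$1")" in
    fffe) echo "UTF-16LE (BOM ff fe)" ;;
    feff) echo "UTF-16BE (BOM fe ff)" ;;
    *)    echo "no UTF-16 BOM found" ;;
  esac
}

# Report on every file named on the command line (does nothing without arguments).
for f in "$@"; do
  printf '%s: %s\n' "$f" "$(guess_utf16 "$f")"
done
```

A file without a BOM may still be UTF-16, so "no UTF-16 BOM found" only means this particular check is inconclusive.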

Do you have any suggestions?
Thanks a lot,
Enrico.
 
Old 03-25-2008, 05:04 PM   #2
bulliver
Senior Member
 
Registered: Nov 2002
Location: Edmonton AB, Canada
Distribution: Gentoo x86; Gentoo PPC; Gentoo Sparc64; FreeBSD; OS X; Solaris
Posts: 3,731
Blog Entries: 4

Rep: Reputation: 68
Code:
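#!/usr/bin/ruby

# Iconv was part of the standard library up to Ruby 1.9; newer Rubies need the iconv gem.
require 'iconv'

ARGV.each do |file|
  # Read the file whole, in binary mode: readlines would split on the 0x0A
  # byte and cut each UTF-16 line ending in half.
  data = File.open(file, "rb") { |f| f.read }
  # Arguments are (to, from, string). "UTF-16" rather than "UTF-16LE" so that
  # iconv can use a leading byte-order mark, if there is one.
  # Replace 'UTF-8' with 'ASCII' if you prefer and the files are pure ASCII.
  File.open("#{file}.out", "wb") { |out| out.write(Iconv.conv("UTF-8", "UTF-16", data)) }
end
Note: This is untested. It re-encodes every input file to UTF-8 and writes the result as "original_name.out".
You will need to use shell globbing or find/xargs to supply it with all your file names.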

HTH

Edit:

You can skip the middleman. Ruby iconv is just a wrapper for the iconv C library/utility. Have a look at 'man iconv'.

Last edited by bulliver; 03-25-2008 at 05:14 PM.
 
Old 03-25-2008, 05:20 PM   #3
jlliagre
Moderator
 
Registered: Feb 2004
Location: Outside Paris
Distribution: Solaris10, Solaris 11, Mint, OL
Posts: 9,493

Rep: Reputation: 355
Or simpler:
Code:
iconv -f UTF-16 -t UTF-8 file
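
Note that iconv writes the converted text to standard output, so for a whole directory of scripts the command has to be wrapped in a loop with a redirect. A rough sketch, not tested on Solaris: convert_tree is a made-up helper name, the '*.TAB' pattern is only guessed from the file name in the first post, and the ".utf8" suffix is an arbitrary choice so that the originals are left untouched. It assumes file names contain no newlines.

```shell
# convert_tree DIR PATTERN (hypothetical helper):
# re-encode every file under DIR whose name matches PATTERN from UTF-16
# to UTF-8, writing NAME.utf8 next to each original.
convert_tree() {
  dir=${1:-.}
  pattern=${2:-*.TAB}
  find "$dir" -type f -name "$pattern" | while IFS= read -r f; do
    if iconv -f UTF-16 -t UTF-8 "$f" > "$f.utf8"; then
      echo "converted: $f"
    else
      # Do not leave a truncated output file behind when iconv fails.
      echo "failed: $f" >&2
      rm -f "$f.utf8"
    fi
  done
}
```

Called as convert_tree . '*.TAB', this leaves dbo.tTransactionIncidents.TAB in place and creates dbo.tTransactionIncidents.TAB.utf8 beside it, which grep and sed can then read.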
 
Old 03-25-2008, 05:30 PM   #4
crisostomo_enrico
Member
 
Registered: Dec 2005
Location: Madrid
Distribution: Solaris 10, Solaris Express Community Edition
Posts: 547

Original Poster
Rep: Reputation: 35
Thank you very much, to both of you, it works!

Enrico.
 
  


All times are GMT -5.
