LinuxQuestions.org
Old 05-11-2010, 07:53 AM   #1
webhope
Member
 
Registered: Apr 2010
Posts: 184

Rep: Reputation: 30
How to convert files to UTF-8


Hi, I have sites written in Windows-1250 (alias cp1250). I need to convert all the files to UTF-8. I know about iconv, but I found a problem with it: when I tried to convert, I had to make a copy of the original file. I have many files in (sub)folders, so if I wanted to use iconv I would need to convert all files from *.php to *.tmp and then copy or rename them back to *.php. This looks too complicated. Is there another way to do it, something intelligent and simple?

Thanks
 
Old 05-11-2010, 08:11 AM   #2
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
You already have the utility (iconv). You simply have to find the files to be converted and then decide where to put the converted files. (This is the "intelligent and simple" way to do it)

One possibility:
Create a directory where all the converted files will go. Suppose it is "nufiles". Then, simply do this:

Code:
for filename in *; do iconv <options> "$filename" > "nufiles/$filename"; done
If the files are in many different places, then something like this:
Code:
for filename in $(find <path> -name "*.php*"); do
    iconv <options> "$filename" > tmpfile
    mv tmpfile "$filename"
done
 
Old 05-11-2010, 08:27 AM   #3
webhope
Member
 
Registered: Apr 2010
Posts: 184

Original Poster
Rep: Reputation: 30
Thanks, I will try it.
 
Old 05-11-2010, 09:13 AM   #4
webhope
Member
 
Registered: Apr 2010
Posts: 184

Original Poster
Rep: Reputation: 30
I have this code:
Code:
path="www"
clear
for filename in $(find $path -name "*.php" -o -name "*.inc" -o -name "*.txt" -o -name "*.cfg" -o -name "*.dat" -o -name "*.css"); do
 iconv -f WINDOWS-1250 -t UTF-8 < $filename > tmpfile
 mv tmpfile $filename
done
However, it returns some error messages:

- iconv: illegal input sequence at position 263
- file does not exist
- mv: cannot stat „tmpfile“: No such file or directory

Last edited by webhope; 05-12-2010 at 04:32 AM.
 
Old 05-11-2010, 11:56 AM   #5
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
Code:
iconv -f WINDOWS-1250 -t UTF-8 < $filename > tmpfile
The syntax for iconv is:
iconv -f oldformat -t newformat file

I don't know if your form is equivalent, but I would try it first as specified. Adding the redirection "> tmpfile" is fine regardless.
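
For what it's worth, the two forms should behave the same: when iconv gets no file operand it reads standard input, so `iconv -f X -t Y < file` and `iconv -f X -t Y file` convert the same bytes. A minimal sketch to check this, assuming an iconv that knows WINDOWS-1250 (GNU iconv does); the sample file and the name `same_output` are made up for the demonstration:

```shell
#!/bin/sh
# Sketch: compare the redirected form with the file-operand form.
# sample.txt and its contents are made up for the demonstration.
same_output() {
    dir=$(mktemp -d) || return 1
    # \350 is the byte 0xE8, which is "č" in WINDOWS-1250.
    printf 'ko\350ka\n' > "$dir/sample.txt"
    iconv -f WINDOWS-1250 -t UTF-8 < "$dir/sample.txt" > "$dir/stdin.out"
    iconv -f WINDOWS-1250 -t UTF-8 "$dir/sample.txt" > "$dir/arg.out"
    # cmp -s is silent and returns 0 only when the two outputs are identical.
    cmp -s "$dir/stdin.out" "$dir/arg.out"
    status=$?
    rm -rf "$dir"
    return "$status"
}

same_output && echo "both forms give the same output"
```

The forms differ only when the name is left unquoted and contains spaces: in bash, `< $filename` then fails as an ambiguous redirect, while an unquoted operand is split into several file names.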

Last edited by pixellany; 05-11-2010 at 11:58 AM.
 
Old 05-11-2010, 02:44 PM   #6
webhope
Member
 
Registered: Apr 2010
Posts: 184

Original Poster
Rep: Reputation: 30
Well, I tried your syntax and got far fewer errors.

I found this error:

file name: "www/information/select, insert, update.txt"

Returns:
iconv: cannot open input file „www/information/select,“: No such file or directory
iconv: cannot open input file „insert,“: No such file or directory
iconv: cannot open input file „update.txt“: No such file or directory

Edit:
Solution: quote the variable, i.e. "$filename"
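
The quoting matters because the shell splits an unquoted variable on whitespace before the command ever sees it. A minimal sketch that counts the arguments a command receives, assuming the default `IFS`; `count_args` is a made-up helper name:

```shell
#!/bin/sh
# Sketch: show how many arguments a command receives.
# count_args is a hypothetical helper, not an existing tool.
count_args() {
    echo "$#"
}

filename="www/information/select, insert, update.txt"

count_args $filename      # unquoted: split on the spaces, prints 3
count_args "$filename"    # quoted: passed as one word, prints 1
```

This matches the three separate "cannot open" messages above: iconv was handed three file names instead of one.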

Last edited by webhope; 05-12-2010 at 04:30 AM.
 
Old 05-11-2010, 04:49 PM   #7
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
My hunch is that you need to look through all your subfolders and find whatever is causing this, starting with how $filename gets set to that long string.

When you find the directory with the evidence, run the command there.

Be sure to look for hidden files, also...

ls -a
ls -al
 
Old 05-12-2010, 04:26 AM   #8
webhope
Member
 
Registered: Apr 2010
Posts: 184

Original Poster
Rep: Reputation: 30
Well, the first problem disappeared; I don't know how. I just made a fresh copy of the original folder and ran the script again. The php files are converted correctly.

However see this:

Code:
# pwd
www/system/vstupy/+info
# ls -l
total 17
-rwxrwxrwx 1 root root    0 2010-05-12 11:11 info*
-rwxrwxrwx 1 root root 3618 2005-04-17 18:23 info - errors.txt*
-rwxrwxrwx 1 root root  956 2005-04-07 21:37 info - funkce.txt*
-rwxrwxrwx 1 root root  424 2005-04-07 21:31 info - parametry.txt*
-rwxrwxrwx 1 root root 1394 2005-04-17 17:46 info - site calling.txt*
-rwxrwxrwx 1 root root  775 2005-04-07 21:39 info - used variables.txt*
# ls -a
./   info*               info - funkce.txt*     info - site calling.txt*
../  info - errors.txt*  info - parametry.txt*  info - used variables.txt*
Code:
++ iconv -f WINDOWS-1250 -t UTF-8 www/system/vstupy/+info/info
iconv: cannot open input file „www/system/vstupy/+info/info“: No such file or directory
++ mv tmpfile www/system/vstupy/+info/info
++ for filename in '$(find $path -name "*.php" -o -name "*.inc" -o -name "*.txt" -o -name "*.cfg" -o -name "*.dat" -o -name "*.css")'
++ iconv -f WINDOWS-1250 -t UTF-8 -
^C
It stopped on the last line, I think because the file has zero length.
I added -size +10 to the end of the find command.

Edit:
No success.

I deleted the file "info" and ran the script again, and it stopped with the same error. I don't know what to do about it.
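
A note on the `-size +10` attempt: without parentheses, a test appended after a chain of `-o` alternatives binds only to the last `-name`, and `-size +10` means "more than ten 512-byte blocks", which would also skip small files. A minimal sketch, assuming the goal is only to skip empty files and anything that is not a regular file; `list_candidates` is a made-up name:

```shell
#!/bin/sh
# Sketch: list non-empty regular files with the wanted extensions.
# list_candidates is a hypothetical helper name.
list_candidates() {
    # The escaped parentheses group the -o alternatives, so -type f and
    # -size +0 apply to every extension and not only to the last one.
    find "$1" -type f \
        \( -name '*.php' -o -name '*.inc' -o -name '*.txt' \
           -o -name '*.cfg' -o -name '*.dat' -o -name '*.css' \) \
        -size +0
}

# Usage: list_candidates www
```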

Note:
The iconv error "illegal input sequence at position n" meant that the file had already been converted and I was converting it a second time.
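
If double conversion is the cause, one way to avoid it is to test whether a file already decodes as UTF-8 before touching it. A minimal sketch, assuming GNU iconv, which rejects invalid input even when source and target encoding are the same; `is_utf8` is a made-up name. Pure ASCII files also pass the test, which is harmless because ASCII bytes are unchanged by this conversion:

```shell
#!/bin/sh
# Sketch: succeed if the file is already valid UTF-8.
# is_utf8 is a hypothetical helper name.
is_utf8() {
    # Decoding as UTF-8 fails with a non-zero status on invalid bytes;
    # the converted output itself is not needed.
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Usage inside a conversion loop (variable name is illustrative):
#   is_utf8 "$filename" && continue
```

This is a heuristic, not a proof: a Windows-1250 file whose bytes happen to form valid UTF-8 would be skipped, although that is unlikely for ordinary text.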

Last edited by webhope; 05-12-2010 at 04:47 AM.
 
Old 05-12-2010, 07:00 AM   #9
webhope
Member
 
Registered: Apr 2010
Posts: 184

Original Poster
Rep: Reputation: 30
Do you see the file name "info - errors.txt"? The space still acts as a separator even though I have "$filename" in quotes.
Code:
iconv: input file „www/system/vstupy/+info/info“
++ iconv -f WINDOWS-1250 -t UTF-8 -
The first word is "info", which does not exist.
The second is "-", which makes iconv read from standard input, so the process stops and waits like the "read" command.

What do I need to do to fix the syntax?

This is my code:

Code:
path="www"
for filename in $(find $path -name "*.php" -o -name "*.inc" -o -name "*.txt" -o -name "*.cfg" -o -name "*.dat" -o -name "*.css"); do
 iconv -f WINDOWS-1250 -t UTF-8 "$filename" > tmpfile
 mv tmpfile "$filename"
done

Last edited by webhope; 05-12-2010 at 08:26 AM.
 
Old 05-12-2010, 09:50 AM   #10
webhope
Member
 
Registered: Apr 2010
Posts: 184

Original Poster
Rep: Reputation: 30
Maybe the problem is the separator applied to the output of the find command?
Is the "info - errors.txt" file seen as three files? I think so. So I set

Code:
IFS=$'\n'
And it works!
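
The reason this helps: the output of `$(find ...)` is split on the characters in `IFS`, which by default include the space, so find printed one name but the loop received three words. Quoting "$filename" inside the loop comes too late, because the split has already happened. A minimal sketch of the loop with splitting restricted to newlines; it assumes no file name contains a newline, and `list_names` is a made-up name:

```shell
#!/bin/sh
# Sketch: loop over find output, splitting on newlines only.
# Assumes no file name contains a newline; list_names is a made-up name.
list_names() {
    old_ifs=$IFS
    # A literal newline; in bash this is the same as IFS=$'\n'.
    IFS='
'
    set -f                      # no glob expansion of the find output
    for filename in $(find "$1" -name '*.txt'); do
        printf '[%s]\n' "$filename"
    done
    set +f
    IFS=$old_ifs                # restore the previous separator
}

# Usage: list_names www
```

Restoring `IFS` afterwards keeps the rest of the script from being affected by the change.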
 
Old 05-12-2010, 10:19 AM   #11
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
Indeed, spaces in filenames are not good.

I would advise changing them all (a simple script can, e.g., change all the spaces to "_").
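
A minimal sketch of such a script, assuming a POSIX find and shell; `underscore_names` is a made-up name, and existing targets are left alone rather than overwritten:

```shell
#!/bin/sh
# Sketch: rename files and directories so spaces become underscores.
# underscore_names is a hypothetical helper name.
underscore_names() {
    # -depth visits the contents of a directory before the directory,
    # so renaming a directory does not break paths still to be handled.
    find "$1" -depth -name '* *' -exec sh -c '
        for path do
            dir=$(dirname "$path")
            base=$(basename "$path")
            new=$dir/$(printf "%s" "$base" | tr " " "_")
            if [ -e "$new" ]; then
                echo "not renaming, target exists: $path" >&2
            else
                mv "$path" "$new"
            fi
        done
    ' sh {} +
}

# Usage: underscore_names www
```

Any links or includes that refer to the old names would have to be updated separately.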
 
Old 05-12-2010, 10:27 AM   #12
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 713
I also find spaces in filenames a pain in the @$$.

I basically made a rule for myself to treat spaces as if they are illegal filename characters.
 
Old 05-12-2010, 01:42 PM   #13
webhope
Member
 
Registered: Apr 2010
Posts: 184

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by pixellany View Post
Indeed, spaces in filenames are not good.

I would advise changing them all (a simple script can, e.g., change all the spaces to "_").
I prefer spaces when the files are not part of a web site. These files are just descriptive *.txt files.

I think it is possible to rewrite the command with syntax like
Code:
find -name "*.php" -exec iconv <options> {} \;
I am not sure the syntax is correct.
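
Two things to keep in mind with this approach: `-exec` needs a terminator (`\;` or `+`), and iconv on its own writes to standard output rather than back into the file. A minimal sketch of the idea, assuming GNU iconv and `mktemp`; `convert_tree` is a made-up name. Because find passes each name as a single argument, spaces in names need no `IFS` change:

```shell
#!/bin/sh
# Sketch: convert every *.php file under a directory from WINDOWS-1250
# to UTF-8 in place. convert_tree is a hypothetical helper name.
# Running it twice would convert already converted files again.
convert_tree() {
    find "$1" -type f -name '*.php' -exec sh -c '
        for file do
            tmp=$(mktemp) || exit 1
            if iconv -f WINDOWS-1250 -t UTF-8 "$file" > "$tmp"; then
                cat "$tmp" > "$file"   # keeps the permissions of the original
            else
                echo "skipped: $file" >&2
            fi
            rm -f "$tmp"
        done
    ' sh {} +
}

# Usage: convert_tree www
```

The original is overwritten only when iconv succeeds, so a failed conversion leaves the file as it was. Work on a copy of the tree first.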

Last edited by webhope; 05-12-2010 at 02:04 PM.
 
Old 05-12-2010, 01:44 PM   #14
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 713
Quote:
Originally Posted by webhope View Post
I prefer spaces when the files are not part of a web site. These files are just descriptive *.txt files.
I don't understand.
 
Old 05-12-2010, 01:53 PM   #15
webhope
Member
 
Registered: Apr 2010
Posts: 184

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by MTK358 View Post
I don't understand.
I avoid spaces when I create web sites. But if it is not a web site file, I prefer spaces, because the file name should be readable.
 
  


All times are GMT -5.
