LinuxQuestions.org


webhope 05-11-2010 07:53 AM

How to convert files to UTF-8
 
Hi, I have sites written in Win-1250, alias cp1250. I need to convert all the files to UTF-8. I know about iconv, but I ran into a problem with this tool: when I tried to convert, I had to write the output to a copy of the original file. I have many files in (sub)folders, so with iconv I would need to convert every *.php file to a *.tmp file and then copy or rename it back to *.php. This looks too complicated. Is there another way to do it, something intelligent and simple?

Thanx

pixellany 05-11-2010 08:11 AM

You already have the utility (iconv). You simply have to find the files to be converted and then decide where to put the converted files. (This is the "intelligent and simple" way to do it)

One possibility:
Create a directory where all the converted files will go. Suppose it is "nufiles". Then, simply do this:

Code:

for filename in *; do [ -f "$filename" ] && iconv -f WINDOWS-1250 -t UTF-8 "$filename" > "nufiles/$filename"; done
If the files are in many different places, then something like this:
Code:

for filename in $(find . -name "*.php"); do
    iconv -f WINDOWS-1250 -t UTF-8 "$filename" > tmpfile &&
    mv tmpfile "$filename"
done
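If the originals should stay untouched, the "nufiles" idea can be extended to subfolders by recreating the directory layout under the output directory. This is only a sketch: it assumes bash (for `read -d ''`) and the Windows-1250 to UTF-8 conversion used in this thread; the function name `convert_tree` and the demo file are hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch: convert every *.php file under a source tree into a
# parallel tree (here "nufiles"), leaving the originals untouched.
set -euo pipefail

convert_tree() {
    local src=$1 dst=$2 f rel
    # -print0 with read -d '' keeps names containing spaces intact.
    find "$src" -type f -name '*.php' -print0 |
    while IFS= read -r -d '' f; do
        rel=${f#"$src"/}                    # path relative to the source root
        mkdir -p "$dst/$(dirname "$rel")"   # recreate the subdirectory
        iconv -f WINDOWS-1250 -t UTF-8 "$f" > "$dst/$rel"
    done
}

# Demo in a scratch directory.
work=$(mktemp -d)
cd "$work"
mkdir -p www/sub
printf '\350\n' > 'www/sub/my page.php'   # byte 0xE8 is "č" in Windows-1250
convert_tree www nufiles
```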


webhope 05-11-2010 08:27 AM

Thanx I will try

webhope 05-11-2010 09:13 AM

I have this code:
Code:

path="www"
clear
for filename in $(find $path -name "*.php" -o -name "*.inc" -o -name "*.txt" -o -name "*.cfg" -o -name "*.dat" -o -name "*.css"); do
 iconv -f WINDOWS-1250 -t UTF-8 < $filename > tmpfile
 mv tmpfile $filename
done

However, it returns some error messages:

- iconv: illegal input sequence at position 263
- No such file or directory
- mv: cannot stat 'tmpfile': No such file or directory

pixellany 05-11-2010 11:56 AM

Code:

iconv -f WINDOWS-1250 -t UTF-8 < $filename > tmpfile
The syntax for iconv is:
iconv -f oldformat -t newformat file

I don't know if your form is equivalent, but I would try it first as specified. Adding the redirection "> tmpfile" is fine regardless.
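For what it's worth, GNU iconv reads standard input when no input file is given, so `iconv -f WINDOWS-1250 -t UTF-8 < file` and `iconv -f WINDOWS-1250 -t UTF-8 file` should produce the same output; the difference that matters later in this thread is quoting. A quick self-check in a scratch directory (the sample text is made up):

```shell
#!/bin/bash
# Self-check: with no input file, iconv reads standard input, so the
# redirection form and the file-argument form should match.
set -euo pipefail

work=$(mktemp -d)    # throwaway directory for the demo
cd "$work"

# \350 (octal) is byte 0xE8, which is "č" in Windows-1250.
printf 'Ahoj \350au\n' > sample.txt

iconv -f WINDOWS-1250 -t UTF-8 < sample.txt > via_stdin.txt
iconv -f WINDOWS-1250 -t UTF-8 sample.txt > via_arg.txt

if cmp -s via_stdin.txt via_arg.txt; then
    echo "both forms give the same output"
else
    echo "outputs differ"
fi
```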

webhope 05-11-2010 02:44 PM

Well, I tried your syntax and there were far fewer errors.

I found this error:

file name: "www/information/select, insert, update.txt"

Returns:
iconv: cannot open input file 'www/information/select,': No such file or directory
iconv: cannot open input file 'insert,': No such file or directory
iconv: cannot open input file 'update.txt': No such file or directory

Edit:
Solution: quote the variable, i.e. "$filename".

pixellany 05-11-2010 04:49 PM

My hunch is that you need to look through all your subfolders and find whatever is causing this, starting with how $filename gets set to that long string.

When you find the directory with the evidence, run the command there.

Be sure to look for hidden files, also...

ls -a
ls -al

webhope 05-12-2010 04:26 AM

Well, the first problem disappeared; I don't know how. I just copied a fresh (original) folder and ran the script again. The PHP files are converted correctly.

However, see this:

Code:

# pwd
www/system/vstupy/+info
# ls -l
total 17
-rwxrwxrwx 1 root root    0 2010-05-12 11:11 info*
-rwxrwxrwx 1 root root 3618 2005-04-17 18:23 info - errors.txt*
-rwxrwxrwx 1 root root  956 2005-04-07 21:37 info - funkce.txt*
-rwxrwxrwx 1 root root  424 2005-04-07 21:31 info - parametry.txt*
-rwxrwxrwx 1 root root 1394 2005-04-17 17:46 info - site calling.txt*
-rwxrwxrwx 1 root root  775 2005-04-07 21:39 info - used variables.txt*
# ls -a
./  info*              info - funkce.txt*    info - site calling.txt*
../  info - errors.txt*  info - parametry.txt*  info - used variables.txt*

Code:

++ iconv -f WINDOWS-1250 -t UTF-8 www/system/vstupy/+info/info
iconv: cannot open input file 'www/system/vstupy/+info/info': No such file or directory
++ mv tmpfile www/system/vstupy/+info/info
++ for filename in '$(find $path -name "*.php" -o -name "*.inc" -o -name "*.txt" -o -name "*.cfg" -o -name "*.dat" -o -name "*.css")'
++ iconv -f WINDOWS-1250 -t UTF-8 -
^C

On the last line it stopped, I think because the file has zero length.
I added -size +10 to the end of the find command.

Edit:
No success.

I deleted the file "info" and ran the script again, and it stopped with the same error. I don't know what to do about it.
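Two details of find may matter here, assuming GNU find: `-size +10` with no unit suffix counts 512-byte blocks, so it keeps only files larger than 5120 bytes rather than 10 bytes; and a test appended after a chain of `-o` alternatives binds only to the last `-name`, because the implicit AND binds tighter than `-o`. A sketch that groups the alternatives with escaped parentheses and skips only empty files (`-empty` exists in GNU and BSD find); the demo files are made up:

```shell
#!/bin/bash
# Sketch: list the site files by extension, skipping empty files.
# The escaped parentheses group the -o alternatives, so the trailing
# tests (-type f, ! -empty) apply to every extension, not just the last.
set -euo pipefail

work=$(mktemp -d)
cd "$work"
mkdir -p www
: > www/empty.txt                # 0-byte file, should be skipped
printf 'x\n' > www/small.css     # 2 bytes: -size +10 would skip it too
printf 'y\n' > www/page.php

find www \( -name '*.php' -o -name '*.inc' -o -name '*.txt' \
            -o -name '*.cfg' -o -name '*.dat' -o -name '*.css' \) \
     -type f ! -empty -print | sort > found.txt
cat found.txt
```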

Note:
The iconv error "illegal input sequence at position n" meant that the file had already been converted, so I was converting it twice.
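One way to make the script safe to run twice is to skip files that are already valid UTF-8, using `iconv -f UTF-8 -t UTF-8` purely as a validity check. This is a sketch under assumptions: plain ASCII files count as valid UTF-8 and are skipped (harmless, since converting them changes nothing), and a Windows-1250 file that happens to be valid UTF-8 would be skipped as well. The function name `to_utf8_once` is hypothetical.

```shell
#!/bin/bash
# Sketch: convert Windows-1250 -> UTF-8 only if the file is not already
# valid UTF-8, so a second run does not double-convert.
set -euo pipefail

to_utf8_once() {
    local f=$1
    # Validity check only: the output is thrown away.
    if iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
        echo "skip (already UTF-8): $f"
        return 0
    fi
    if iconv -f WINDOWS-1250 -t UTF-8 "$f" > "$f.tmp"; then
        mv "$f.tmp" "$f"
        echo "converted: $f"
    else
        rm -f "$f.tmp"
        echo "conversion failed: $f" >&2
        return 1
    fi
}

work=$(mktemp -d)
cd "$work"
printf 'Ahoj \350au\n' > a.txt   # 0xE8 is "č" in Windows-1250
to_utf8_once a.txt               # first run converts
to_utf8_once a.txt               # second run is skipped
```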

webhope 05-12-2010 07:00 AM

Do you see the file name "info - errors.txt"? The spaces act as separators even though I put "$filename" in double quotes:
Code:

iconv: cannot open input file 'www/system/vstupy/+info/info'
++ iconv -f WINDOWS-1250 -t UTF-8 -

The first word is "info", which does not exist.
The second is "-", and iconv treats "-" as standard input, so the process stopped and waited for input, like the "read" command.

What do I need to do to fix the syntax?

This is my code:

Code:

path="www"
for filename in $(find $path -name "*.php" -o -name "*.inc" -o -name "*.txt" -o -name "*.cfg" -o -name "*.dat" -o -name "*.css"); do
 iconv -f WINDOWS-1250 -t UTF-8 "$filename" > tmpfile
 mv tmpfile "$filename"
done
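The splitting happens in the `for ... in $(find ...)` list itself, before the loop body runs, so quoting "$filename" inside the loop comes too late. A small demo with one file named like the one above (scratch directory, made-up content):

```shell
#!/bin/bash
# Demo: the unquoted $(find ...) result is split on whitespace when the
# for list is built, so one file name becomes three loop iterations.
set -euo pipefail

work=$(mktemp -d)
cd "$work"
mkdir -p www
: > 'www/info - errors.txt'

count=0
for filename in $(find www -name '*.txt'); do
    printf '[%s]\n' "$filename"   # prints [www/info], [-], [errors.txt]
    count=$((count + 1))
done
echo "loop ran $count times"
```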


webhope 05-12-2010 09:50 AM

Maybe the problem is the separator used when the output of the find command is split? Is the "info - errors.txt" file treated as three files? I think so. So I set

Code:

IFS=$'\n'
And it works!
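`IFS=$'\n'` handles names with spaces, but it still breaks on names containing a newline, leaves pathname expansion active, and changes IFS for the rest of the script unless it is restored. A sketch of an alternative that passes NUL-separated names from `find -print0` to bash's `read -d ''` and leaves IFS alone outside the `read`; the extension list comes from the script above, while the function name `convert_site` and the temporary file next to each target are assumptions:

```shell
#!/bin/bash
# Sketch: in-place Windows-1250 -> UTF-8 conversion that survives spaces
# (and newlines) in file names, without changing IFS globally.
set -euo pipefail

convert_site() {
    local root=$1 f
    find "$root" -type f \( -name '*.php' -o -name '*.inc' -o -name '*.txt' \
        -o -name '*.cfg' -o -name '*.dat' -o -name '*.css' \) -print0 |
    while IFS= read -r -d '' f; do
        # Temp file beside the target, so mv stays on the same file system.
        if iconv -f WINDOWS-1250 -t UTF-8 "$f" > "$f.tmp"; then
            mv "$f.tmp" "$f"
        else
            rm -f "$f.tmp"
            echo "conversion failed: $f" >&2
        fi
    done
}

work=$(mktemp -d)
cd "$work"
mkdir -p www/+info
printf '\350\n' > 'www/+info/info - errors.txt'   # 0xE8 is "č" in Windows-1250
convert_site www
```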

pixellany 05-12-2010 10:19 AM

Indeed, spaces in filenames are not good.

I would advise changing them all (a simple script can, for example, change all the spaces to "_").
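A minimal sketch of such a script, assuming bash and an `mv` that supports `-n` (GNU and BSD do); `spaces_to_underscores` is a hypothetical name. `-depth` makes find list a directory's contents before the directory itself, so renaming a parent never breaks a path that is still waiting to be processed:

```shell
#!/bin/bash
# Hypothetical helper: replace spaces with "_" in every file and directory
# name below a root. mv -n refuses to overwrite an existing target.
set -euo pipefail

spaces_to_underscores() {
    local root=$1 path dir base
    find "$root" -depth -name '* *' -print0 |
    while IFS= read -r -d '' path; do
        dir=${path%/*}       # parent directory (left unchanged here)
        base=${path##*/}     # last path component, the part renamed
        mv -n -- "$path" "$dir/${base// /_}"
    done
}

work=$(mktemp -d)
cd "$work"
mkdir -p 'www/my dir'
: > 'www/my dir/info - errors.txt'
spaces_to_underscores www
```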

MTK358 05-12-2010 10:27 AM

I also find spaces in filenames a pain in the @$$.

I basically made a rule for myself to treat spaces as if they are illegal filename characters.

webhope 05-12-2010 01:42 PM

Quote:

Originally Posted by pixellany (Post 3965832)
Indeed---spaces in filenames not good.

I would advise changing them all ( a simple script can--eg-- change all the spaces to "_" )

I prefer spaces when I am not dealing with files for a web site. These files are just descriptive *.txt files.

I think it is possible to rewrite the command with a syntax like
Code:

find . -name "*.php" -exec iconv -f WINDOWS-1250 -t UTF-8 {} \;
I'm not sure the syntax is correct.
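iconv writes its result to standard output, so `-exec iconv ... {}` on its own prints the converted text instead of rewriting the file. One possible `-exec` form wraps iconv in a small `sh -c` script that writes a temporary file and moves it back; this is a sketch assuming the same Windows-1250 to UTF-8 conversion and a POSIX `sh`, with a made-up demo file:

```shell
#!/bin/bash
set -euo pipefail

work=$(mktemp -d)
cd "$work"
mkdir -p www
printf '\350\n' > 'www/select, insert, update.php'   # 0xE8 is "č" in Windows-1250

# Inside sh -c, the word "sh" becomes $0 and the names passed by {} +
# become "$@", so spaces in names survive intact.
find www -type f -name '*.php' -exec sh -c '
    for f in "$@"; do
        iconv -f WINDOWS-1250 -t UTF-8 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
' sh {} +
```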

MTK358 05-12-2010 01:44 PM

Quote:

Originally Posted by webhope (Post 3966054)
I prefer spaces when I am not dealing with files for a web site. These files are just descriptive *.txt files.

I don't understand.

webhope 05-12-2010 01:53 PM

Quote:

Originally Posted by MTK358 (Post 3966056)
I don't understand.

I avoid spaces when I create web sites. But if a file is not part of a web site, I prefer spaces, because the file name is more readable.


All times are GMT -5.