Old 02-08-2012, 11:47 AM   #1
mad_mar
LQ Newbie
 
Registered: Feb 2012
Posts: 5

Rep: Reputation: Disabled
Error trying to convert file encoding


Hello,

I am trying to convert a large number of files from windows-1253 to utf-8 using the following code:

#!/bin/bash
FROM=windows-1253
TO=UTF-8
ICONV="iconv -f $FROM -t $TO"
# Convert
find ToUTF/ -type f -name "*" | while read fn; do
cp ${fn} ${fn}.bak
$ICONV < ${fn}.bak > ${fn}
rm ${fn}.bak
done


When I run the file the result is:
line 10: syntax error near unexpected token `done'
line 10: `done'

Can anyone help me?

Thank you!!
 
Old 02-08-2012, 03:26 PM   #2
Harlin
Member
 
Registered: Dec 2004
Location: Atlanta, GA U.S.
Distribution: I play with them all :-)
Posts: 316

Rep: Reputation: 30
Have you tried:

changing

find ToUTF/ -type f -name "*" | while read fn; do
cp ${fn} ${fn}.bak
$ICONV < ${fn}.bak > ${fn}
rm ${fn}.bak
done

to

while read fn; do
cp ${fn} ${fn}.bak
$ICONV < ${fn}.bak > ${fn}
rm ${fn}.bak
done < find ToUTF/ -type f -name "*"

That may seem like "six or one half-dozen" but I have found that some bashes prefer the input at the end to while. Dunno why.

Also don't hesitate to put your code on paste.bin or dpaste and send the link here. That can help a lot!

</H>
 
Old 02-08-2012, 03:34 PM   #3
Harlin
Member
 
Registered: Dec 2004
Location: Atlanta, GA U.S.
Distribution: I play with them all :-)
Posts: 316

Rep: Reputation: 30
By the way, which Linux distro and version are you using with this? I'd be curious to know.

</H>
 
Old 02-08-2012, 03:39 PM   #4
Harlin
Member
 
Registered: Dec 2004
Location: Atlanta, GA U.S.
Distribution: I play with them all :-)
Posts: 316

Rep: Reputation: 30
You can also try a for loop. More readable. Less weirdness too with modern *nix variances :-D


#!/bin/bash

FROM=windows-1253
TO=UTF-8
ICONV="iconv -f $FROM -t $TO"

for each in $(ls ToUTF/)
do
cp $each $each.bak
cat $each.bak | $ICONV > $each
rm $each.bak
done
 
Old 02-09-2012, 09:16 AM   #5
mad_mar
LQ Newbie
 
Registered: Feb 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
Hello Harlin,

I've tried both of your suggestions, but neither worked.

The result for the first is

line 10: syntax error near unexpected token `done'
line 10: `done < find ToUTF/ -type f -name "*"'


and for the other

line 6: syntax error near unexpected token `$'do\r''
line 6: `do


I am using Gygwin Terminal 1.7.10-1 on Windows Vista.

Are there any other ideas or suggestions for doing this conversion from windows-1253 to UTF-8 in bulk? Otherwise I will have to do it manually for about 10000 files!
 
Old 02-09-2012, 01:32 PM   #6
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,396
Blog Entries: 2

Rep: Reputation: 908
Perhaps there is a Windows/Linux end-of-line issue. You didn't happen to edit the script with a Windows editor, and try to run it on a Linux host, did you? That $'do\r' looks very suspicious.

--- rod.
 
Old 02-10-2012, 07:52 AM   #7
mad_mar
LQ Newbie
 
Registered: Feb 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
I use Notepad++ to edit the script, but I don't have a PC with Linux to try it on.
 
Old 02-10-2012, 09:55 AM   #8
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,396
Blog Entries: 2

Rep: Reputation: 908
I assume by 'Gygwin Terminal' you mean 'Cygwin Terminal', which I believe will want to see Linux-style end-of-lines. Cygwin should have a dos2unix or similar tool to perform the conversion. Or simply create a new version of the file in Cygwin.
--- rod.
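
A minimal sketch of the line-ending check and fix suggested above, for the case where dos2unix is not installed. The sample file is created by the sketch itself so that it can be run as-is; the variable names are hypothetical.

```shell
# Create a sample script with Windows (CRLF) line endings, as a Windows
# editor may save it.
script=$(mktemp)
printf 'echo hello\r\n' > "$script"

cr=$(printf '\r')                      # a literal carriage return

# Detect: any carriage return in the file suggests CRLF line endings.
if grep -q "$cr" "$script"; then
	echo "CRLF line endings found"
fi

# Fix: delete every carriage return, then replace the original file.
tr -d '\r' < "$script" > "$script.unix" && mv -f "$script.unix" "$script"

sh "$script"                           # the converted script still runs
```

On a real script, the same detect and fix commands apply with the script's own path in place of "$script". Where dos2unix is available, running it on the file does the same job in place.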
 
Old 02-10-2012, 10:46 AM   #9
mad_mar
LQ Newbie
 
Registered: Feb 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
I can't create a new file in Cygwin because I only have the terminal, and I have the code in a .txt file. I'll try to convert it and give you feedback.
 
Old 02-10-2012, 12:07 PM   #10
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,396
Blog Entries: 2

Rep: Reputation: 908
Isn't your Cygwin terminal running a shell? If so, you should be able to use vi to re-create your script. Or, if you can use Windows copy/paste into the Cygwin window, simply start cat redirected to your script file, paste the text of the script, and type Ctrl-D to close the file.

--- rod.
 
Old 02-11-2012, 08:17 AM   #11
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1957
Sigh, where to begin?

To start with, everybody here needs to please use [code][/code] tags around your code and data, to preserve formatting and to improve readability. Please do not use quote tags, colors, or other fancy formatting.

Use quote tags when you want to highlight text from a previous post that you want to respond to.

Quote:
Originally Posted by Harlin
That may seem like "six or one half-dozen" but I have found that some bashes prefer the input at the end to while. Dunno why.
It's not a "preference" thing. The main reason to put the redirection at the end is that commands in pipe sequences are executed in subshells. Any changes made in a subshell environment will be lost when the command terminates. In a situation like this, there are no environmental changes that need to be saved, so either way would work.

http://mywiki.wooledge.org/BashFAQ/024
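
A small sketch of the subshell effect described above, assuming bash with its default settings. The counter variables and the sample input are made up for illustration, and a here-document stands in for the redirected input.

```shell
# Pipeline: the while loop runs in a subshell, so its changes are lost.
piped=0
printf 'a\nb\nc\n' | while read -r line; do
	piped=$((piped + 1))
done
echo "after pipeline:    $piped"

# Redirection: the loop runs in the current shell, so the count survives.
redirected=0
while read -r line; do
	redirected=$((redirected + 1))
done <<EOF
a
b
c
EOF
echo "after redirection: $redirected"
```

The first counter is still 0 after its loop, while the second one holds the number of lines read.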

This, however, does not work:
Code:
done < find ToUTF/ -type f -name "*"
Only the first word in a command string is considered a command name, so the find command will not run here. You need to run it inside a process substitution bracket, or use a similar technique.

In any case, reading file names from a command needs to be handled carefully, as word-splitting can cause all sorts of problems. Generally you'll have to use null-separators to ensure that the output of find is read correctly. See here:

http://mywiki.wooledge.org/BashFAQ/001

Next:
Code:
for each in $(ls ToUTF/)
Do not read lines with a for loop, and do not try parsing ls for filenames. The same word-splitting problems affect it as well.

http://mywiki.wooledge.org/DontReadLinesWithFor
http://mywiki.wooledge.org/ParsingLs

Globbing patterns, on the other hand, work very well in for loops.
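
A minimal sketch of a glob-driven for loop. The temporary directory and file names are invented for the demonstration. Note that a plain `*` does not descend into subdirectories, so unlike find it only covers one directory level.

```shell
# Build a throwaway directory with an awkward file name and a subdirectory.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/two words.txt"
mkdir "$demo_dir/subdir"

count=0
for f in "$demo_dir"/*; do
	# Skip directories, and the literal pattern if nothing matched.
	[ -f "$f" ] || continue
	count=$((count + 1))
done
echo "$count regular files"

rm -rf "$demo_dir"
```

Each matched name arrives as a single word, spaces included, because the shell does not word-split the results of pathname expansion.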

Quote:
Originally Posted by mad_mar
Code:
ICONV="iconv -f $FROM -t $TO"
Variables are not designed for storing and running commands. Use a function instead.

http://mywiki.wooledge.org/BashFAQ/050
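
A short sketch of why a command stored in a string misbehaves as soon as one of its arguments needs quoting. The names cmd and print_it are hypothetical.

```shell
# Command kept in a string: the embedded quotes are not parsed again when
# the variable is expanded, so printf gets '"two' and 'words"' as two
# separate arguments, quote characters included.
cmd='printf <%s> "two words"'
from_var=$($cmd)

# The same command as a function: the quoting is parsed normally.
print_it() {
	printf '<%s>' "two words"
}
from_func=$(print_it)

echo "variable: $from_var"
echo "function: $from_func"
```

The OP's ICONV variable happens to contain no quoted arguments, so it splits into the intended words, but the pattern breaks once any argument contains whitespace.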


QUOTE ALL OF YOUR VARIABLE SUBSTITUTIONS. You should never leave the quotes off a variable expansion unless you explicitly want the resulting string to be word-split by the shell. This is a vitally important concept in scripting, so train yourself to do it correctly now. You can learn about the exceptions later.

http://mywiki.wooledge.org/Arguments
http://mywiki.wooledge.org/WordSplitting
http://mywiki.wooledge.org/Quotes
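
A minimal sketch of what word-splitting does to an unquoted expansion. The helper count_args and the file name are hypothetical.

```shell
# Print the number of arguments received.
count_args() {
	echo "$#"
}

file="my report.txt"

unquoted=$(count_args $file)     # split on the space into two words
quoted=$(count_args "$file")     # kept together as one word

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

With a command such as cp or rm in place of count_args, the unquoted form would operate on two nonexistent files, "my" and "report.txt".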


Environment variables are generally all upper-case. So while not absolutely necessary, it's good practice to keep your own user variables in lower-case or mixed-case, to help differentiate them.


One more thing. According to iconv -l on my system there is no "windows-1253", but there are "WINDOWS-1253" and "CP1253". I don't know if capitalization is important in iconv, but I thought I'd suggest it. This assumes the GNU version of iconv, of course.
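
A small sketch for checking which spellings of an encoding name a given iconv reports. The function list_matching_encodings is hypothetical; it reads a listing on standard input, so it does not depend on the exact layout that `iconv -l` prints.

```shell
# Split an encoding listing into one name per line, then keep the names
# that contain the pattern, ignoring case.
list_matching_encodings() {
	tr -s ' ,/' '\n\n\n' | grep -i -- "$1" | sort -u
}

# Typical use:
# iconv -l | list_matching_encodings 1253
```

Any name the filter prints can then be tried as the -f argument.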


The final script as I'd write it:
Code:
#!/bin/bash

convert_it() {
	local from="WINDOWS-1253" to="UTF-8"
	iconv -f "$from" -t "$to" -o "$1.bak" "$1"
	mv -f "$1.bak" "$1"
}

while IFS="" read -r -d "" file; do

	convert_it "$file"

done < <( find ToUTF/ -type f -print0 )
Other than this, I'd say that a dos/unix line-ending problem is likely involved. Run the script through a line-ending converter before running it.


Edit: After a couple of tests, it appears that iconv's -o option can be used to modify the original file in place. So the function can be simplified to this:

Code:
convert_it() {
	local from="WINDOWS-1253" to="UTF-8"
	iconv -f "$from" -t "$to" -o "$1" "$1"
}

Last edited by David the H.; 02-11-2012 at 08:30 AM. Reason: as stated, plus minor corrections
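
A more defensive variant of the function, sketched under the assumption that one would rather not depend on how a particular iconv build handles identical input and output files. It writes to a temporary file and replaces the original only if iconv succeeds. The mktemp call and the error handling are additions for illustration, not part of the script above.

```shell
convert_it() {
	local from="WINDOWS-1253" to="UTF-8" tmp
	# Temporary file next to the original, so mv stays on one filesystem.
	tmp=$(mktemp "$1.XXXXXX") || return 1
	if iconv -f "$from" -t "$to" "$1" > "$tmp"; then
		mv -f "$tmp" "$1"
	else
		# Conversion failed: leave the original untouched.
		rm -f "$tmp"
		return 1
	fi
}
```

Because mktemp creates the file with restrictive permissions, the converted file may end up with different permissions than the original had.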
 
Old 02-13-2012, 06:57 AM   #12
mad_mar
LQ Newbie
 
Registered: Feb 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
Hello,

David the H., I used the code you suggested, with the updated version of convert_it():

Code:
#!/bin/bash

convert_it() {
	local from="WINDOWS-1253" to="UTF-8"
	iconv -f "$from" -t "$to" -o "$1" "$1"
}

while IFS="" read -r -d "" file; do

	convert_it "$file"

done < <( find ToUTF/ -type f -print0 )
I converted the line endings from Windows to Unix format using Notepad++. The result I get is the following:

Code:
line 12: syntax error near unexpected token `<'
line 12: `done < <( find ToUTF/ -type f -print0 )'
Any suggestions?

Quote:
Variables are not designed for storing and running commands. Use a function instead.
I saw that in practice: when I ran my original code, it just emptied my files!
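
One possible cause, offered as an assumption since the thread does not confirm it: process substitution, `<( ... )`, is a bash feature, and this error is typical when the script is interpreted by plain sh (for example via `sh script.sh`) rather than by bash. Running it as `bash script.sh` is one thing to try. Alternatively, the sketch below avoids process substitution altogether by letting find start the conversions itself. The function name convert_tree and the .tmp suffix are hypothetical.

```shell
convert_tree() {
	# find hands batches of file names to the inline script as "$@", so
	# names with spaces or newlines arrive intact without any parsing.
	# Leftover or in-progress *.tmp files are excluded from the search.
	find "$1" -type f ! -name '*.tmp' -exec sh -c '
		for f do
			if iconv -f WINDOWS-1253 -t UTF-8 "$f" > "$f.tmp"; then
				mv -f "$f.tmp" "$f"
			else
				# Conversion failed: discard the partial output.
				rm -f "$f.tmp"
			fi
		done
	' sh {} +
}

# Usage, with the directory from the thread:
# convert_tree ToUTF/
```

Since the loop body uses no bash-only syntax, this form parses the same way under sh and bash.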
 
  

