LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

05-13-2012, 08:56 AM   #1
zomane
Member
 
Registered: Sep 2005
Location: Austria
Distribution: Debian, CentOS, OpenBSD, FreeBSD
Posts: 52

Reputation: 16
perl utf8 to ascii rename


Hello,
I've searched Google a lot, but I can't understand why a simple Perl one-liner that transliterates a file with UTF-8 content to ASCII works:

Initial sample.txt:
Code:
aasdööäääßßßßß.mp3
кирилиски имена тук.mp3
After running perl -C -MText::Unidecode -n -i -e'print unidecode($_)' sample.txt:

Code:
aasdooaaassssssssss.mp3
kiriliski imiena tuk.mp3
But this script doesn't rename the files correctly:

Code:
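#!/usr/bin/perl -w
use utf8;              # affects string literals in this source file only, not @ARGV
use Text::Unidecode;
use warnings;
#use open qw/:std :utf8/;

# No arguments: read file names from STDIN, one per line.
if (!@ARGV) {
    @ARGV = <STDIN>;
    chomp(@ARGV);      # chomp strips only the newline; chop would also eat the
                       # last character of a final line without a newline
}

for (@ARGV) {
    my $file = $_;
    my $new_file_name = unidecode($file);
#   binmode(STDOUT, ":utf8");
    print "$file \n";
    print "$new_file_name \n";
    rename $file, $new_file_name
        or warn "Cannot rename '$file': $!\n";
}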
The result is:

Code:
aasdööäääßßßßß.mp3 
aasdAPAPA$?A$?A$?AAAAA.mp3 
кирилиски имена тук.mp3 
DoD,ND,D>>D,NDoD, D,D1/4DuD1/2Ddeg NNDo.mp3
Why does Text::Unidecode transliterate the file content correctly but not the file names? I can't find any additional information about this in the module's perldoc.
What is wrong with my simple script? Why does it produce junk instead of the same output as the one-liner?
Thank you in advance

P.S. - My locales:
Code:
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
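The junk can be reproduced without Text::Unidecode. A minimal sketch using only the core Encode module (the helper `codepoints` is a hypothetical name): the kernel hands the script its file names as raw UTF-8 bytes, and unless they are decoded, Perl treats each byte as a separate Latin-1 character. So "ö" (bytes 0xC3 0xB6) reaches unidecode as "Ã" and "¶", and "ä" (0xC3 0xA4) as "Ã" and "¤", which would account for the "AP" and "A$" pairs in the script's output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Return the code points of a string as "U+XXXX" labels, one per character.
sub codepoints {
    my ($s) = @_;
    return map { sprintf('U+%04X', ord($_)) } split //, $s;
}

# What the script gets in @ARGV without -CA: the name "ö.mp3" as raw UTF-8 bytes.
my $raw = encode('UTF-8', "\x{F6}.mp3");

# What Text::Unidecode needs: the same name decoded into characters.
my $chars = decode('UTF-8', $raw);

print 'raw:     ', join(' ', codepoints($raw)),   "\n";
print 'decoded: ', join(' ', codepoints($chars)), "\n";
```

It prints `raw:     U+00C3 U+00B6 U+002E U+006D U+0070 U+0033` followed by `decoded: U+00F6 U+002E U+006D U+0070 U+0033`: six "characters" before decoding, five after.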
================================================================

I've found something like a solution:

I had to set PERL_UNICODE to AS:

Code:
PERL_UNICODE=AS ./utf2ascii_renamer.pl *.mp3
aasdööäääßßßßß.mp3 
aasdooaaassssssssss.mp3 
кирилиски имена тук.mp3 
kiriliski imiena tuk.mp3
But this is a little bit ugly.
Does anyone have an idea how to make it look a little better than setting PERL_UNICODE before every run or setting it globally... maybe something with use?
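One way to avoid PERL_UNICODE or a -C switch altogether, sketched under the assumption that the file names are UTF-8 encoded (as the UTF-8 locale suggests): `use open` can set up the standard handles, which is the part -CS covers, but it does not touch @ARGV, so each argument is decoded explicitly with the core Encode module before it reaches unidecode. The helper `decode_arg` is a hypothetical name, and reading names from STDIN, as the original script does, is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
# Standard handles in UTF-8, roughly what -CS does; this is the
# "use open" line that is commented out in the original script.
use open qw(:std :encoding(UTF-8));

# Turn one raw command-line argument into a character string, assuming
# the file names are UTF-8 encoded.
sub decode_arg {
    my ($arg) = @_;
    return decode('UTF-8', $arg);
}

for my $file (@ARGV) {
    # Text::Unidecode is not a core module; load it only when there is work to do.
    require Text::Unidecode;
    my $name = decode_arg($file);                  # characters: safe to transliterate and print
    my $new  = Text::Unidecode::unidecode($name);  # ASCII transliteration
    next if $new eq $name;                         # name is already ASCII
    if (-e $new) {
        warn "Skipping '$name': '$new' already exists\n";
        next;
    }
    print "$name -> $new\n";
    # Pass the untouched argument as the old name, so the OS gets back
    # exactly the bytes it handed to the script.
    rename $file, $new or warn "Cannot rename '$name' to '$new': $!\n";
}
```

The decoded copy is used only for transliteration and printing, while rename receives the original argument; that way the old name does not depend on how Perl stores the decoded string internally.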
=============================================================


Code:
#!/usr/bin/perl -w -CSA
To me, this shebang line looks a little better.

-CSA is the "magical" part:

-C controls Perl's Unicode features
A tells Perl to treat the command-line arguments (@ARGV) as UTF-8 strings
S sets STDIN, STDOUT, and STDERR as UTF-8 filehandles (shorthand for I, O and E)
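perlrun gives each -C letter a numeric value, and perl reports the combined value at run time in the special variable ${^UNICODE}. The small worked sketch below (the helpers `flags_value` and `argv_is_decoded` are hypothetical, and only the letters relevant here are covered) adds them up for the bare -C of the one-liner, which perlrun documents as equivalent to -CSDL, and for the -CSA of the shebang. It suggests why the one-liner worked while the script did not: D makes UTF-8 the default PerlIO layer for input and output streams, which presumably covers the file read by -n -i, but only A affects @ARGV.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Numeric value of each basic -C letter, as listed for -C in perlrun.
my %BIT = (I => 1, O => 2, E => 4, i => 8, o => 16, A => 32, L => 64);

# Shorthand letters that stand for several basic ones.
my %ALIAS = (S => 'IOE', D => 'io');

# Convert a -C letter string such as "SA" to the number that perl
# reports at run time in ${^UNICODE}.
sub flags_value {
    my ($letters) = @_;
    my $value = 0;
    for my $c (split //, $letters) {
        my $expanded = exists $ALIAS{$c} ? $ALIAS{$c} : $c;
        for my $basic (split //, $expanded) {
            die "unknown -C letter '$basic'\n" unless exists $BIT{$basic};
            $value |= $BIT{$basic};
        }
    }
    return $value;
}

# True when the A bit is set, i.e. @ARGV is treated as UTF-8.
sub argv_is_decoded {
    my ($value) = @_;
    return $value & $BIT{A} ? 1 : 0;
}

# Bare -C (same as -CSDL per perlrun) versus the -CSA used in the shebang.
for my $letters ('SDL', 'SA') {
    my $v = flags_value($letters);
    printf "-C%-3s = %2d  (\@ARGV decoded: %s)\n",
        $letters, $v, argv_is_decoded($v) ? 'yes' : 'no';
}

# The setting this script itself was started with.
printf "this run: \${^UNICODE} = %d\n", ${^UNICODE};
```

The first two lines report 95 for -CSDL (no A bit) and 39 for -CSA. One caveat from perlrun: since Perl 5.10.1, a -C on the #! line must also be given on the command line, because the standard streams are already set up by then. Starting the script as ./utf2ascii_renamer.pl passes the #! switches along, but `perl utf2ascii_renamer.pl *.mp3` is expected to stop with a "Too late for "-C" option" error; decoding @ARGV inside the script does not depend on how it is started.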

Last edited by zomane; 05-13-2012 at 10:27 AM. Reason: Solved...looks like
 
  


Similar Threads
Thread Thread Starter Forum Replies Last Post
How can I remove BOM if I am going convert UTF8 to ASCII mlibot Linux - Newbie 3 08-06-2009 08:21 PM
convert file from UTF8 to ASCII encoding graemef Programming 8 12-15-2008 04:45 AM
in linux & c/c++: how do I convert an ascii string to utf8 & vice versa? davidh_uk Programming 2 02-06-2005 05:55 PM
