Hello,
I've searched a lot on Google, but I can't understand why a simple Perl one-liner that transliterates a file's UTF-8 content to ASCII works:
Initial sample.txt:
Code:
aasdööäääßßßßß.mp3
кирилиски имена тук.mp3
After running
perl -C -MText::Unidecode -n -i -e'print unidecode($_)' sample.txt
the file contains:
Code:
aasdooaaassssssssss.mp3
kiriliski imiena tuk.mp3
But this script, which should rename the files the same way, doesn't rename them correctly:
Code:
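#!/usr/bin/perl -w
use strict;
use utf8;          # affects only string literals in this source file, not @ARGV
use Text::Unidecode;
use warnings;
#use open qw/:std :utf8/;

# With no arguments, read file names from STDIN, one per line.
if (!@ARGV) {
    @ARGV = <STDIN>;
    chomp(@ARGV);  # chomp removes only a trailing newline; chop always eats a character
}

for (@ARGV) {
    my $file = $_;
    my $new_file_name = unidecode($file);
    # binmode(STDOUT, ":utf8");
    print "$file \n";
    print "$new_file_name \n";
    rename $file, $new_file_name
        or warn "Cannot rename '$file': $!\n";
}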
Result is:
Code:
aasdööäääßßßßß.mp3
aasdAPAPA$?A$?A$?AAAAA.mp3
кирилиски имена тук.mp3
DoD,ND,D>>D,NDoD, D,D1/4DuD1/2Ddeg NNDo.mp3
Why does Text::Unidecode transliterate the file content correctly but not the filenames? I can't find any additional information about this in the module's perldoc.
What is wrong with my simple script? Why does it produce junk instead of the same output as the one-liner?
Thank you in advance
P.S. - My locales:
Code:
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
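The junk is consistent with unidecode receiving raw UTF-8 bytes instead of decoded characters. Without the A flag of -C (or PERL_UNICODE=A), the elements of @ARGV are byte strings, and `use utf8` only changes how literals in the source file are read. The one-liner works because a bare -C is equivalent to -CSDL, which decodes file content but does not include A. The sketch below reproduces this on a single hard-coded name using the core Encode module. The first line should print AP.mp3, matching the "AP" the script produced for each ö, and the second should print o.mp3.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use Text::Unidecode;

# "ö.mp3" as the raw UTF-8 bytes a script sees in @ARGV when neither
# -CA nor PERL_UNICODE=A is in effect: "ö" is the two bytes 0xC3 0xB6.
my $raw = "\xC3\xB6.mp3";

# Undecoded: each byte is treated as its own Latin-1 character
# (0xC3 and 0xB6), and unidecode transliterates both of them.
print unidecode($raw), "\n";

# Decoded: the string now holds the single character U+00F6,
# which unidecode maps to a plain "o".
my $chars = decode('UTF-8', $raw);
print unidecode($chars), "\n";
```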
================================================================
I've found a solution: set PERL_UNICODE to AS:
Code:
PERL_UNICODE=AS ./utf2ascii_renamer.pl *.mp3
aasdööäääßßßßß.mp3
aasdooaaassssssssss.mp3
кирилиски имена тук.mp3
kiriliski imiena tuk.mp3
But this is a little ugly. Does anyone have an idea how to make it nicer than setting PERL_UNICODE before every run or setting it globally? Maybe something with a use statement?
=============================================================
Code:
#!/usr/bin/perl -w -CSA
This looks a little better to me. -CSA is the "magic" part:
-C controls Perl's Unicode features, selected by the letters that follow it
A makes Perl treat the command-line arguments (@ARGV) as UTF-8 strings
S sets STDIN, STDOUT, and STDERR as UTF-8 filehandles
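-C on the #! line works when the script is run as ./utf2ascii_renamer.pl, but perlrun notes that since Perl 5.10.1 a -C switch on the #! line must also be given on the command line, so `perl utf2ascii_renamer.pl *.mp3` is expected to stop with a 'Too late for "-C" option' error. For the "something with use" wish above, one option that needs neither switches nor an environment variable is to decode @ARGV inside the script with the core Encode module. The following is a minimal sketch, assuming the filenames on disk are UTF-8 (as the en_US.UTF-8 locale suggests); ascii_name is a hypothetical helper name. It should not be combined with -CA/-CS or PERL_UNICODE=AS, because the names would then be decoded twice.

```perl
#!/usr/bin/perl
# Hypothetical rework of utf2ascii_renamer.pl that decodes the file names
# itself instead of relying on -CSA or PERL_UNICODE.
# Assumes the names on disk are UTF-8 encoded.
use strict;
use warnings;
use Encode qw(decode);
use Text::Unidecode;

# Show decoded (character) strings on the terminal as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
binmode STDERR, ':encoding(UTF-8)';

# Turn a raw UTF-8 byte string into its ASCII transliteration.
sub ascii_name {
    my ($raw) = @_;
    return unidecode(decode('UTF-8', $raw));
}

# With no arguments, read names from STDIN, one per line (as bytes).
my @names = @ARGV ? @ARGV : <STDIN>;
chomp @names;

for my $raw (@names) {
    my $name     = decode('UTF-8', $raw);   # characters, for display
    my $new_name = ascii_name($raw);        # pure ASCII

    next if $new_name eq $name;             # already ASCII, nothing to do
    if (-e $new_name) {
        warn "Skipping '$name': '$new_name' already exists\n";
        next;
    }

    print "$name -> $new_name\n";
    # Pass the original bytes so rename finds the existing file on disk.
    rename $raw, $new_name
        or warn "Cannot rename '$name' to '$new_name': $!\n";
}
```

Usage stays the same, e.g. ./utf2ascii_renamer.pl *.mp3. Because rename receives the original byte string, the script does not depend on Perl's internal representation of a decoded string happening to match the name on disk.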