perl UTF8 command line input
the problem is simple: how do i get perl to accept command-line arguments that contain unicode characters (e.g. encoded as UTF-8)?
i started using perl only recently and would call myself a beginner. however, i have done a lot of searching and testing and cannot find a solution.
i'm using perl v5.8 on windows xp, but if someone knows a solution for unix i expect something similar would work on windows. from my testing and searching it seems that it may be a limitation of the console i am using (i.e. cmd.exe); even narrowing this down to a perl problem or a console problem would be a great help.
most of what i found on the internet discusses handling UTF-8 internally (i.e. within a perl script) or I/O with files. i know how to do this but it is not what i want. i want to be able to run something like this: "perl myScript.pl ←" and have the script print the argument to the console properly. the arrow is the unicode character U+2190. the arrow isn't specifically what i want to print; i want to be able to print almost any unicode character. right now i can only get it to handle ASCII and Extended ASCII characters (integral values 1-255).
to me it seems like a console problem because as soon as i read the argument, the character has a value of 63, which is the ASCII code for the question mark. cmd.exe prints a question mark for characters it does not know. i am, however, able to store the hex value for the arrow in a perl script and have it printed, so the font i am using supports the character. again, it just seems to be a problem with how cmd.exe passes the character to perl.
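a minimal sketch of the kind of check described above, to make it easy to reproduce: it prints the value of every character perl received in each argument, so a character that was already replaced before reaching the script is visible. the sub name ordinals is made up for illustration, and the script assumes nothing about the console's encoding; it only reports what arrives in @ARGV.

```perl
use strict;
use warnings;

# hypothetical helper (name is illustrative): return the value of every
# character in a string, formatted as U+XXXX
sub ordinals {
    my ($text) = @_;
    return map { sprintf('U+%04X', ord($_)) } split(//, $text);
}

# dump each command-line argument exactly as perl received it
for my $arg (@ARGV) {
    print join(' ', ordinals($arg)), "\n";
}
```

an argument that was turned into a question mark before perl saw it shows up as U+003F. if the argument arrives as UTF-8 bytes instead, each byte is listed separately, which would point at a decoding issue inside perl rather than at the console.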
i have tried many perl things such as: use utf8, use Encode, utf8::encode/decode, encode/decode_utf8, binmode STDIN/STDOUT ":encoding(UTF-8)". there is also the perl switch "-C" (for example -CA, which is supposed to make perl treat command-line arguments as UTF-8); i have tried it and it doesn't seem to help. specific to windows, i set the font to Lucida Console (as far as i know the only unicode font cmd.exe offers) and the code page to 65001 (which is for UTF-8).
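for reference, a minimal sketch of the Encode-based variant of these attempts, under the assumption that the arguments reach perl as intact UTF-8 bytes. the sub name decode_args is made up for illustration. if the console has already replaced the character with a question mark, there is nothing left to decode and this cannot help.

```perl
use strict;
use warnings;
use Encode qw(decode);

# hypothetical helper (name is illustrative): decode raw argument bytes
# into perl character strings, ASSUMING the bytes are UTF-8
sub decode_args {
    my @raw = @_;
    return map { decode('UTF-8', $_) } @raw;
}

my @args = decode_args(@ARGV);

# encode characters back to UTF-8 on output
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @args;
```

on a perl that supports it, running the script with -CA should have roughly the same effect on @ARGV as the decode step here, so the helper is only useful when the switch is unavailable or the encoding is something other than UTF-8.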
is anyone able to do this, in any operating system? i read somewhere that perl command line input is done only with ISO-8859-1 (the Latin 1 character encoding, which is close to, but not identical to, codepage 1252 in windows), but this allows for only 191 characters, all of which are in the ASCII/Extended ASCII set. if this is the case and perl does not allow unicode characters as command line arguments, i can live with that. it just seems strange because i only read that from one source.
please let me know if you need any more information. your help is greatly appreciated.
Last edited by nadroj; 08-25-2008 at 11:28 PM.