nadroj 08-25-2008 11:21 PM

perl UTF8 command line input
the problem is simple: how do i get perl to accept command-line arguments that are UTF8 or unicode characters?

i started using perl only recently and would call myself a beginner/novice. however, i have done an extensive amount of searching and testing and cannot find a solution.

i'm using perl v5.8 on windows xp, though i expect that a solution for unix would be similar on windows. in my testing and searching it seems that it may be a limitation of the console i am using (i.e. cmd.exe); even narrowing this down to a perl problem or a console problem would be a great help.

most of what i found on the internet discusses handling UTF-8 internally (i.e. within a perl script) or IO with files. i know how to do this but it is not what i want. i want to be able to run something like this: "perl ←" and have the script print the argument to the console properly. the arrow is the unicode character U+2190. the arrow isn't specifically what i want to print; i want to be able to print almost any unicode character. right now i can only get it to handle ASCII and Extended ASCII characters (integral values 1-255).

to me it seems like a console problem because as soon as i read the argument it has a value of 63, which is the ASCII value for question mark. the behaviour of cmd.exe is to print a question mark for characters it does not know. i am, however, able to store the hex value for the unicode character arrow in a perl script and have it printed, so the font i am using supports the character. again it just seems to be a problem with cmd.exe sending the character to perl.
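One way to check where the character is lost is to dump the raw bytes of each argument before any decoding. This is only a minimal sketch; the helper name arg_bytes_hex is made up for illustration. If the arrow shows up as 3f, it was already replaced by a question mark before perl received it; if it shows up as e2 86 90 (the UTF-8 encoding of U+2190), the bytes arrived intact and the remaining problem is on the perl side.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the bytes of a raw (undecoded) argument
# as space-separated hex, e.g. "e2 86 90" for a UTF-8 encoded U+2190.
sub arg_bytes_hex {
    my ($arg) = @_;
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $arg;
}

# Print one line per command-line argument, without decoding anything.
printf "arg %d: %s\n", $_, arg_bytes_hex($ARGV[$_]) for 0 .. $#ARGV;
```

Run it with the arrow as the only argument and compare the output on each console.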

i have tried many perl things such as: use utf8, use Encode, utf8::encode/decode, encode/decode_utf8, binmode STDIN/STDOUT ":encoding(UTF-8)". i have also tried the perl switch "-C" (which is supposed to make perl treat command arguments as UTF-8), but it doesn't seem to help. specific to windows, i set the font to Lucida Console (the only unicode font it supports) and the code page to 65001 (which is for UTF-8).

is anyone able to do this, in any operating system? i read somewhere that perl command line input is done only with ISO-8859-1 (the Latin-1 character encoding; windows codepage 1252 is a close relative of it), but this allows for only 191 printable characters, all of which are in the ASCII/Extended ASCII set. if this is the case and perl does not allow unicode characters as command line arguments, i can live with that. it just seems strange because i only read that from one source.

please let me know if you need any more information. your help is greatly appreciated.

nadroj 08-26-2008 06:23 PM

update: i was able to give this a shot on opensuse. windows disappoints again: my problem is solved in linux with a two-line perl script. i've spent almost two weeks trying to get this to work on windows, and unfortunately that is the target. so this looks like it may be more of a windows-specific problem.
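For reference, a short script of that kind might look roughly like the sketch below. This is an assumption, not the exact script used above: it presumes a terminal that passes arguments as UTF-8 bytes (typical on linux), and the helper name decode_args is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn raw UTF-8 byte strings into perl character strings.
sub decode_args {
    my (@raw) = @_;
    return map { decode('UTF-8', $_) } @raw;
}

# Encode output as UTF-8 so wide characters print without warnings.
binmode STDOUT, ':encoding(UTF-8)';

# Echo each argument together with the code point of its first character.
printf "%s U+%04X\n", $_, ord($_) for decode_args(@ARGV);
```

Running perl with -CA asks it to treat @ARGV as UTF-8 directly, which would replace the explicit decode step.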

so, has anyone been able to receive unicode characters as command line arguments specifically on windows?


chrism01 08-29-2008 03:37 AM

These guys should have the answer:

nadroj 08-29-2008 09:02 PM

first: thanks chris. unfortunately, i think i've read every page on the web that has the words 'perl' and 'utf8' in it. as stated earlier, this IS a windows console (cmd.exe) problem, as the same script works on linux with little to no terminal configuration. what i had to settle for was windows' "code page 1252", whose character set covers the printable characters of ISO 8859-1 and 8859-15 and therefore most western European languages. it's no universal unicode, but it's better than nothing.

though if anyone does ever find a solution, please post it. i will _always_ be interested, especially considering the ~3 weeks i spent researching this. thanks

chrism01 08-31-2008 06:09 PM

Did you go there? It's a Q&A site for ANY Perl question (any platform). Some of the guys who write Perl itself (and the books) hang out there...

nadroj 08-31-2008 09:05 PM

i've searched there during my research, but i didn't go and post. i just made a post now, so i'll let you know if anything turns up. thanks

