How to identify the language of a string in Perl?
Hi sir / hello friends,

I am stuck at a point in Perl. I wrote code to read an Excel file that contains English and Arabic strings. My task is to read each string from the file and determine its language (English or Arabic).

Read each cell from the Excel file:

[code]
use Spreadsheet::ParseExcel;

# Reading needs Spreadsheet::ParseExcel; Spreadsheet::WriteExcel only creates files.
my $parser    = Spreadsheet::ParseExcel->new();
my $workbook  = $parser->parse("test.xls") or die $parser->error(), ".\n";
my $worksheet = $workbook->worksheet(0);
my ( $row_min, $row_max ) = $worksheet->row_range();
my ( $col_min, $col_max ) = $worksheet->col_range();

for my $row ( $row_min .. $row_max ) {
    for my $col ( $col_min .. $col_max ) {
        my $cell = $worksheet->get_cell( $row, $col );
        next unless $cell;
        print "Row, Col = ($row, $col)\n";
        print "Value = ", $cell->value(), "\n";
    }
}
[/code]

How do I get the language of the string stored in $cell->value()? Please help me out.
How is the Excel file stored? Is it an .xls, or something else?
Can you give us a small example?
It is in .xls format.
The data stored in the file looks like this: سبب التاععع 57675 انتلب لبال ععع شلسا لشت allah ishwar is great pray him
See $cell->encoding() and $cell->get_rich_text(); the latter will give you some information about the fonts used.
Hello sir / hi friends,
I am trying to write a Perl script that reads one .xls (Excel) file and writes to another .xls (Excel) file on UNIX. The entries in the input file are in Arabic and English. My script writes the English text successfully, but the Arabic text shows up as special characters in the output Excel file. Please help me out.
Please use [code][/code] tags around your code and data, to preserve formatting and to improve readability. Please do not use quote tags, colors, or other fancy formatting.
Go back and edit your earlier posts to include them too, please. As it stands, the long unbroken lines make my screen side-scroll.

I'm not very familiar with Perl, but as I understand it, there is generally no foolproof way to programmatically determine what language a string was written in. You can only use tricks and statistical methods. Checking which font or encoding is used is one, as suggested earlier. Another is to write a test that checks whether the string contains characters other than those found in standard English. Or, if the file is in a Unicode encoding, test whether the characters fall within the Arabic range.

Also, a quick web search came up with this language detection module: http://search.cpan.org/~ambs/Lingua-...ua/Identify.pm

Otherwise, I think you really need to explain what you are trying to do in more detail if you want more specific help. Provide some realistic examples of the input, the output you want, and any relevant code you have written so far.
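
The Unicode-range test mentioned above could look roughly like the sketch below. It is only an illustration, not a complete language detector: guess_script is a hypothetical helper name, it assumes the cell value has already been decoded into Perl characters (not raw bytes), and it treats any Latin-script text as "English" because the thread only deals with those two languages.

```perl
use strict;
use warnings;
use utf8;                              # this source contains UTF-8 literals
binmode STDOUT, ':encoding(UTF-8)';    # so Arabic prints correctly

# Hypothetical helper: classify a decoded string by counting script matches.
# Assumption: Latin-script text means English, since only two languages occur.
sub guess_script {
    my ($text) = @_;
    my $arabic = () = $text =~ /\p{Arabic}/g;    # number of Arabic characters
    my $latin  = () = $text =~ /\p{Latin}/g;     # number of Latin characters
    return 'unknown' unless $arabic || $latin;   # e.g. digits or punctuation only
    return 'mixed'   if $arabic && $latin;
    return $arabic ? 'Arabic' : 'English';
}

for my $sample ( 'سبب', 'allah ishwar is great', '57675' ) {
    printf "%s => %s\n", $sample, guess_script($sample);
}
```

If the strings arrive as raw UTF-8 bytes instead, they would need to be decoded first (for example with Encode::decode('UTF-8', $bytes)) before the \p{Arabic} test can match.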