LinuxQuestions.org


vijay mishra 04-20-2012 02:06 AM

to identify the language of string in perl?
 
Hi sir / hello friends,
I am stuck at a point in Perl.
I wrote Perl code to read an Excel file that contains English and Arabic strings.

My task is to read each string from the file and determine its language (English/Arabic).
========read each cell from excel file==========================

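use strict;
use warnings;
use Spreadsheet::ParseExcel;    # reads .xls; Spreadsheet::WriteExcel only creates files

my $parser   = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse("test.xls") or die $parser->error(), "\n";

# Take the first worksheet and find the range of used cells
my ($worksheet) = $workbook->worksheets();
my ( $row_min, $row_max ) = $worksheet->row_range();
my ( $col_min, $col_max ) = $worksheet->col_range();

for my $row ( $row_min .. $row_max ) {
    for my $col ( $col_min .. $col_max ) {
        my $cell = $worksheet->get_cell( $row, $col );
        next unless $cell;    # skip empty cells

        print "Row, Col = ($row, $col)\n";
        print "Value = ", $cell->value(), "\n";
    }
}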
==================================================================
How can I get the language of the string stored in $cell->value()?

Please help me out.

pan64 04-20-2012 02:11 AM

How is the Excel file stored? Is it an .xls, or something else?
Can you give us a small example?

vijay mishra 04-20-2012 05:48 AM

It is in .xls format.
The data stored in the file looks like this:
سبب التاععع 57675 انتلب لبال ععع
شلسا لشت allah ishwar is great pray him

pan64 04-20-2012 06:04 AM

See $cell->encoding() and $cell->get_rich_text(); the latter will give you some information about the fonts used.
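
For reference, here is a rough sketch of how the number returned by $cell->encoding() could be interpreted. The three codes are the ones listed in the Spreadsheet::ParseExcel documentation as I understand it; the sub name describe_encoding is made up for illustration. Keep in mind that a result of 2 only says the cell holds text that needed full UTF-16; it does not say the text is Arabic.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the code returned by $cell->encoding()
# into a readable label. Codes assumed from the Spreadsheet::ParseExcel docs.
sub describe_encoding {
    my ($code) = @_;
    my %meaning = (
        1 => 'ASCII or compressed (single-byte) UTF-16',
        2 => 'UTF-16BE',
        3 => 'native encoding',
    );
    return $meaning{ $code // 0 } // 'unknown';
}

# Inside the cell loop this would be called as:
#   print describe_encoding( $cell->encoding() ), "\n";
print describe_encoding($_), "\n" for 1 .. 3;
```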

vijay mishra 04-23-2012 06:41 AM

Hello sir / hi friends,

I am trying to write a Perl script that reads an .xls (Excel) file and writes to another .xls (Excel) file on UNIX.

The entries in the input file are in Arabic and English.

My script writes the English text successfully, but the Arabic text shows up as special characters in the output Excel file.

Please help me out.

David the H. 04-23-2012 09:40 AM

Please use [code][/code] tags around your code and data, to preserve formatting and to improve readability. Please do not use quote tags, colors, or other fancy formatting.

Go back and edit your earlier posts to include them too, please. As it stands, the long unbroken lines make my screen side-scroll.


I'm not very familiar with Perl, but as I understand it, there's no foolproof way to programmatically determine what language a string was written in. You can only use tricks and statistical methods. Checking which font or encoding is used is one such trick, as suggested earlier. Another is to write a test to see whether the string contains characters other than those found in standard English. Or, if the file is in a Unicode encoding, test the characters to see whether they fall within the Arabic range.
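
The last idea, testing whether characters fall within the Arabic range, could look roughly like this in Perl using Unicode script properties. Treat it as a sketch under assumptions: the sub name classify_script is made up for illustration, the strings must already be decoded character strings (not raw bytes), and it detects the script rather than the language, so Persian or Urdu text would also be reported as Arabic.

```perl
use strict;
use warnings;
use utf8;    # this source file contains literal UTF-8 text

# Hypothetical helper: classify a decoded string by the scripts it contains.
# Returns 'arabic', 'latin', 'mixed', or 'unknown' (digits, punctuation, empty).
sub classify_script {
    my ($text) = @_;
    my $has_arabic = $text =~ /\p{Arabic}/ ? 1 : 0;
    my $has_latin  = $text =~ /\p{Latin}/  ? 1 : 0;

    return 'mixed'  if $has_arabic && $has_latin;
    return 'arabic' if $has_arabic;
    return 'latin'  if $has_latin;
    return 'unknown';
}

binmode STDOUT, ':encoding(UTF-8)';
for my $sample ( 'سبب التاععع 57675', 'allah ishwar is great', 'شلسا لشت allah', '57675' ) {
    print classify_script($sample), ": $sample\n";
}
```

In the cell loop from the first post this would be called as classify_script( $cell->value() ), assuming value() hands back decoded text. If it returns bytes instead, decode them first (for example with the core Encode module) before matching.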


Also, a quick web search came up with this language detection plugin:
http://search.cpan.org/~ambs/Lingua-...ua/Identify.pm


Otherwise, I think you really need to explain what you are trying to do in more detail if you want to get more specific help. Provide some realistic examples of the input, the output you want, and any relevant code you have written up so far.


All times are GMT -5.