LinuxQuestions.org
08-02-2012, 08:21 PM   #1
wakatana
Member
 
Registered: Jul 2009
Location: Slovakia
Posts: 141

Win32::OLE in deeper detail in Perl


Hello gurus,
I am a beginner in Perl. I would like to ask several questions; some relate to Perl and its syntax, but most concern Win32::OLE. My main goal is to write a script that checks the structure of a Word document (returning some information about it) and makes some changes to the document, if that is possible. I am not sure whether all of that can be done through Perl and OLE. I apologize for posting questions about Microsoft OLE here, but most of what I have found came from this site. I also tried MSDN, but it seemed very unclear and not useful to me. My latest experiments with OLE are driving me crazy, so I hope somebody can help.


1st
===
How do I properly start Win32::OLE? I found this approach somewhere:

Code:
my $word = Win32::OLE->GetActiveObject('Word.Application')
    || Win32::OLE->new('Word.Application','Quit')
    or die Win32::OLE->LastError();
It seems clear; I just have a few questions to make sure I understand it:
1. Can I replace 'or' with '||'? Don't both stand for logical OR?
2. What is really going on in this code? Does Perl try to attach to a running instance of Word, start its own instance if that fails, and print the error if that also fails? What happens if it attaches to an existing instance, and how is such an instance created? Is it just a Word process that has already been started?
3. I read in the OLE documentation that the second argument is a destructor (it seems not to be mandatory). What is its real purpose? I know a destructor is the opposite of a constructor, but does a method named 'Quit' have to be created somewhere, is it handled automatically, or what is going on?
4. What do the "::" and "->" in the code above stand for? Is this some way of accessing methods in a package or class?
5. Is 'use warnings;' the same as perl -w?
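A small pure-Perl sketch of the precedence difference behind questions 1 and 4. The package 'Launcher' and the subs 'try_existing' and 'start_new' are hypothetical stand-ins for GetActiveObject failing and Win32::OLE->new succeeding; no OLE is needed to run it. '||' binds tighter than '=', while 'or' binds looser, so swapping them changes what gets assigned.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Launcher;    # '::' joins a package name and a name inside it, e.g. Launcher::try_existing

# Hypothetical stand-in for GetActiveObject failing (no running Word instance).
sub try_existing { return undef }

# Hypothetical stand-in for Win32::OLE->new succeeding. It is called as a class
# method with '->', so Perl passes the class name as the first argument.
sub start_new { my ($class) = @_; return "new instance from $class" }

package main;

# '||' binds tighter than '=', so this parses as
#   (my $obj = (try_existing() || start_new())) or die ...
my $obj = Launcher::try_existing() || Launcher->start_new() or die "both failed";
print "with ||: $obj\n";    # with ||: new instance from Launcher

# 'or' binds looser than '=', so this parses as
#   ((my $obj2 = try_existing()) or start_new()) or die ...
# start_new() still runs, but its result is thrown away and $obj2 stays undef.
my $obj2 = Launcher::try_existing() or Launcher->start_new() or die "both failed";
print "with or: ", (defined $obj2 ? $obj2 : 'undef'), "\n";    # with or: undef
```

In the OLE snippet the same rule applies: '||' chooses between the two objects, and the trailing 'or die' guards the whole assignment.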

2nd
===
Some code I found on the internet for OLE continues with the following:

Code:
my $doc = $word->Documents->Open('C:\\Perl\\home\\001f.doc');
But then I also found some strange constructs, such as:
Code:
my $doc = $word->Documents->Open( { FileName => 'C:\\Perl\\home\\001f.doc', ReadOnly =>1 }) or die Win32::OLE->LastError();
$doc->SaveAs({ FileName => 'exampletext.doc', FileFormat => wdFormatDocument });
$doc->Close( { SaveChanges => $wdc->{wdDoNotSaveChanges} } );
The first approach seems pretty clear: it calls the Open method, which is part of Documents (a class, package, or whatever it is). But what do the others do?
What does {FileName => 'C:\\Perl\\home\\001f.doc', ReadOnly =>1} mean in the function arguments, and why the curly braces?
What is the difference between '=>' and '->'? Is it necessary to enclose the properties (or whatever the correct name is) in '{}'?

I have found that VBA has a nearly identical syntax, 'ComputeStatistics(Statistic:=wdStatisticWords)'. I assume the two are related, because with OLE I am in fact using Microsoft technologies from Perl. I also found a solution that uses the same functions/parameters as VBA (but without this strange assignment); here it is: http://www.perlmonks.org/?parent=334960;node_id=3333
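A pure-Perl sketch of what the braces and '=>' do. '{ ... }' builds an anonymous hash and yields a reference to it; as far as the Win32::OLE documentation describes, a hash reference passed as the last argument is treated as named parameters, which is the Perl counterpart of VBA's 'Statistic:=...'. The sub 'open_doc' is hypothetical and only mimics a method that reads named arguments; it does not call Word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for $word->Documents->Open: takes named arguments
# packed into a single hash reference.
sub open_doc {
    my ($args) = @_;                     # $args is a reference to a hash
    my $name = $args->{FileName};        # here '->' dereferences the hash reference
    my $mode = $args->{ReadOnly} ? 'read-only' : 'read-write';
    return "$name ($mode)";
}

# '=>' is a comma that also quotes the bareword on its left,
# so these two lists are identical.
my @fat   = (FileName => 'a.doc');
my @plain = ('FileName', 'a.doc');
print "same list\n" if "@fat" eq "@plain";

# '{ ... }' builds an anonymous hash and returns a reference to it.
my $named = { FileName => 'C:\\Perl\\home\\001f.doc', ReadOnly => 1 };
print ref($named), "\n";                 # HASH
my $result = open_doc($named);
print "$result\n";                       # C:\Perl\home\001f.doc (read-only)
```

So '=>' separates a key from its value inside a list, while '->' calls a method or dereferences a reference.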

3rd
===
This page, http://www.perlmonks.org/index.pl?node_id=422677, mentions that MS Word does not store the number of pages in a document, so every time OLE is used the document is opened and the page count is recalculated according to styles, font size, etc. According to Microsoft this seems to be no problem: http://support.microsoft.com/kb/185513 and
http://support.microsoft.com/kb/185509. Or is the page count also recalculated when the file is opened? I also found two working Perl solutions, one using wdPropertyPages (http://stackoverflow.com/questions/8...l-and-word-vba) and one using wdStatisticPages (http://www.perlmonks.org/?parent=334960;node_id=3333). So where is the truth: does the document contain page count information or not?



4th
===
As I mentioned, the Perl, VBA and PowerShell code I have found seems to have a lot in common (I am not an expert in any of those languages, but they access similar variables). The following page describes how to obtain the number of words and the number of pages in a document. As one user suggested, they can be obtained with the '$selection->Words->{Count};' and '$selection->pagenumbers->{Count};' constructs. However, if I search for 'selection' in the Visual Basic Object Browser (from a Word document press Alt+F11, then F2), I find the following:

Code:
Class Selection
    Member of Word
------------------------------------
	Property Words As Words
    read-only
    Member of Word.Selection

	
	Property Characters As Characters
    read-only
    Member of Word.Selection
==========================
Sub ShrinkDiscontiguousSelection()
    Member of Word.Selection
------------------------------------
Property Words As Words
    read-only
    Member of Word.Selection

Property Characters As Characters
    read-only
    Member of Word.Selection

As you can see, both contain properties (or variables, or whatever the correct name is; please correct me) for 'Characters' and 'Words', but both 'Characters' and 'Words' seem to be members of 'Word.Selection'. How should I understand that? I also searched for 'pagenumbers', as mentioned in the link above, but found nothing except several 'wdPageNumberStyle' entries and 'PageNumbers' (not the lowercase 'pagenumbers'). Nor did I find in the Object Browser that 'Word.Selection.Words' or 'Word.Selection.Characters' has a 'Count' method (or property; which is the correct name?). Where does this method (property) come from? What does the word 'As' mean in the output above? Is it some data type?

Here is the code mentioned above, which I slightly altered:

Code:
#!/usr/bin/perl
use Cwd 'abs_path';
use warnings;
use strict;
use Win32::OLE 'CP_UTF8';
$Win32::OLE::CP = CP_UTF8;
binmode STDOUT, 'encoding(utf8)';

print abs_path($0) . "\n";
print "=========\n";
my $document_name = 'C:\\Perl\\home\\thisIsPerl.doc';
my $word = Win32::OLE->GetActiveObject('Word.Application')
    || Win32::OLE->new('Word.Application')
    or die Win32::OLE->LastError();
	
$word-> {visible} = 0;
$word->Application->Selection;

my $document = $word->Documents->Open( { FileName => $document_name, ReadOnly =>1 }) or die Win32::OLE->LastError();
my $paragraphs = $document->Paragraphs ();
my $n_paragraphs = $paragraphs->Count ();

print "Words:", $word->Selection->Words->{Count}, "\n";
print "Characters:", $word->Selection->Characters->{Count}, "\n";
print "Paragraphs: ", $word->Selection->Paragraphs->{Count}, "\n";

$document->Close();
$word->exit;
$word->Quit;


Administrator@cepido /cygdrive/c/Perl/home
$ ./internet04_pgcnt.pl
/cygdrive/c/Perl/home/internet04_pgcnt.pl
=========
Words:1
Characters:1
Paragraphs: 1



But this code does not work properly: it always returns a word count of 1, no matter how many words are in the document. These investigations lead me to another, probably the most important, question: how are all these OLE objects organized? The Object Browser is unclear to me; I also downloaded the OLE/COM Object Viewer, but had no luck with it either. I know this is not a standard Perl question, but I don't know where else to ask. One idea that comes to mind is to somehow list all the methods (properties, variables, packages) available through OLE from Perl, and then just try several of them based on their names. Is this possible?



5th
===
Is it possible to process a Word document character by character? Or, even better, is it possible to query data from Word as if it were SQL? Simply put, something like: select * from document where Font=Italic?

I think I have managed to read the document word by word here (there are some small mistakes):

Code:
#!/usr/bin/perl -w

use strict;
use warnings;
use Win32::OLE::Const 'Microsoft Word';
my $file = 'C:\\Perl\\home\\thisIsPerl.doc';

my $Word = Win32::OLE->new('Word.Application', 'Quit');

$Word->{'Visible'} = 0;
my $doc = $Word->Documents->Open($file);
my $paragraphs = $doc->Paragraphs() ;
my $n_paragraphs = $paragraphs->Count ();

for my $p (1..$n_paragraphs) {

	my $paragraph = $paragraphs->Item ($p);
    my $words = Win32::OLE::Enum->new( $paragraph->{Range}->{Words} );

    while ( defined ( my $word = $words->Next() ) ) {
        my $font = $word->{Font};
		print "IN_Text:", $word->{Text}, "\n" if $word->{Text} !~ /\r/;
		#print $text;
        #$font->{Bold} = 1 if $word->{Text} =~ /Perl/;
		
    }
	print "=============\n";
}

$Word->ActiveDocument->Close ;
$Word->exit;
$Word->Quit;

It works, but it throws an error at the end and does not process headers and footers.
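Perl itself offers no SQL over a Word document, but once each word's text and font flags have been copied out of OLE (for example inside a loop like the one above, reading $word->{Text} and $word->{Font}->{Italic}), 'grep' can play the role of a WHERE clause. The records below are hypothetical sample data, not output from Word, and the key names 'text' and 'italic' are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical records, shaped as they might look after copying Text and
# Font->Italic out of each Word range; the values are made up.
my @words = (
    { text => 'This', italic => 0 },
    { text => 'is',   italic => 0 },
    { text => 'Perl', italic => 1 },
    { text => 'code', italic => 1 },
);

# Roughly: SELECT text FROM words WHERE italic
my @italic = map { $_->{text} } grep { $_->{italic} } @words;
print join(', ', @italic), "\n";    # Perl, code
```

Word's own Find object can also match on formatting (through its Font and Format properties), which may avoid copying every word out first; that route is not shown here.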






6th
===
I searched and found the following in the Object Browser:
Const wdNumberOfPagesInDocument = 4
Member of Word.WdInformation

Const wdStatisticPages = 2
Member of Word.WdStatistic

What do those numbers mean? I am sure they do not correspond to the actual number of pages in the Word document (I was experimenting with working code from http://www.perlmonks.org/?parent=334960;node_id=3333).
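Those numbers are enumeration values, i.e. selector codes rather than results: wdStatisticPages (2) tells ComputeStatistics which statistic to compute, and wdNumberOfPagesInDocument (4) tells Information which item to return; the page count comes back as the method's return value (in OLE code, something like $doc->ComputeStatistics(wdStatisticPages) after 'use Win32::OLE::Const 'Microsoft Word''). The pure-Perl sketch below mimics that pattern with a hypothetical 'compute_statistics' sub and a made-up page count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Selector value as listed in Word's Object Browser (WdStatistic enum).
use constant wdStatisticPages => 2;

# Hypothetical document summary; in real code Word computes these values.
my %doc_stats = (
    wdStatisticPages() => 17,    # made-up page count, keyed by the selector 2
);

# Hypothetical stand-in for $doc->ComputeStatistics($which):
# the argument selects the statistic, the return value is the count.
sub compute_statistics {
    my ($which) = @_;
    return $doc_stats{$which};
}

my $pages = compute_statistics(wdStatisticPages);
print "selector: ", wdStatisticPages, ", pages: $pages\n";    # selector: 2, pages: 17
```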






7th
===
Finally, the last question: I read somewhere that OLE needs the full path to open a Word document. I would like to pass the document to the script as an argument without having to specify the full path (the directory should be prepended to the name after it is passed to the script). I found somewhere that 'abs_path($0)' is used for something similar, but I had no luck with it. Also, on Windows the backslashes must be escaped, and so on.
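'abs_path($0)' resolves the script's own path ($0), which is why the earlier output printed /cygdrive/c/Perl/home/internet04_pgcnt.pl rather than the document's path. A minimal sketch, assuming the document name arrives as the first command-line argument, resolves that argument instead with File::Spec->rel2abs, which prefixes the current working directory to a relative path and leaves an absolute path unchanged. The fallback name 'thisIsPerl.doc' is only there so the sketch runs without arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Take the document name from the command line; fall back to a sample
# name (illustrative only) so the sketch also runs without arguments.
my $arg = @ARGV ? $ARGV[0] : 'thisIsPerl.doc';

# rel2abs() joins a relative path onto the current working directory and
# returns an absolute path unchanged; it does not check that the file exists.
my $full = File::Spec->rel2abs($arg);

# Backslashes only need doubling inside Perl string literals in source code,
# not in values that arrive through @ARGV.
print "$full\n";
```

Under Cygwin's perl, as the earlier prompt suggests, the result will be a /cygdrive-style path; Word presumably expects a native Windows path, which Cygwin's 'cygpath -w' utility can produce. That conversion is not shown here.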




Sorry for the long post, but I am stuck at the points described above. I hope somebody knows the answers. Thanks a lot for any ideas.
All times are GMT -5.