LinuxQuestions.org


agrosu 02-22-2006 05:21 AM

Converting Windows doc/ppt files to HTML files under a Linux environment
 
Hello everybody,
I have to convert Windows Word documents and ppt documents into HTML documents. This has to be done in a Linux environment with C++ as the programming language, and the target is a standalone application, which may use other libraries. I have already done the conversion from pdf to html using xpdf, which provides the pdf structure. Yesterday I spent all day searching for ideas and came across OpenOffice, which apparently could give me the structure of a doc/ppt file, but for this the OpenOffice server must be running... so, no more standalone app. Can someone please point me to some documents to read about this?
Thank you so much and have a good day!

jlliagre 02-22-2006 07:51 AM

Since OpenOffice is more a standalone application than a server, why not leverage it and use it for batch processing?

See:
http://www.xml.com/pub/a/2006/01/11/...penoffice.html
http://www.indesko.com/en/downloads/ooo2dbk

agrosu 02-23-2006 03:28 AM

Quote:

Originally Posted by jlliagre
Since OpenOffice is more a standalone application than a server, why not leverage it and use it for batch processing?

Thanks for responding, jlliagre. I wanted to see how OO handles conversion anyway, so I took your advice and used OO Basic (which seems very similar to VB) to convert some doc files into html files, applying the HTML (StarOffice) filter. It will not work in my case, simply because of the time needed for the conversion. It took OO too long to convert a doc into html (around 5-6 seconds for a 2 MB file).
I also found the wvware application, which seems to do a pretty good job, on speed as well. Maybe I can hack on this application to reach my goal. But anyway, this will solve only the doc part. The ppt part will still remain. Can anybody help me with this, please?

jlliagre 02-23-2006 04:53 AM

OO handles ppt too.

agrosu 02-23-2006 05:29 AM

Quote:

Originally Posted by jlliagre
OO handles ppt too.

Yep, I'm aware of this, but as I said, OO can't be used because of the large amount of time the conversion takes. For the same document (around 2 MB), OO took 5.732 seconds, but wvware took 1.62 seconds.
I got as far as ppthtml (which is part of xlhtml). It does the conversion from ppt to html, but poorly. I'll keep researching....
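
For anyone who wants to reproduce a comparison like the 5.732 s versus 1.62 s figures above from inside a C++ program, here is a minimal sketch of timing one run of an external converter. The function name `time_command` is made up for this example, and it measures wall-clock time for the whole process, including startup, which may or may not match how the numbers in this thread were taken.

```cpp
#include <chrono>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the wall-clock seconds taken by one run of a
// shell command. The raw return value of std::system is written to *status
// when status is non-null.
double time_command(const std::string& command, int* status = nullptr) {
    const auto start = std::chrono::steady_clock::now();
    const int rc = std::system(command.c_str());
    const auto stop = std::chrono::steady_clock::now();
    if (status != nullptr) {
        *status = rc;
    }
    // duration<double> with the default period counts in seconds.
    return std::chrono::duration<double>(stop - start).count();
}
```

A call might look like `time_command("wvHtml sample.doc sample.html")`. That command line is an assumption (the wvHtml front end from the wvWare package, with placeholder file names); substitute whatever converter invocation is actually installed. Averaging several runs over a few representative files would give a fairer picture than a single measurement.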

jlliagre 02-23-2006 06:53 AM

5.7 seconds doesn't look like a "big amount of time" to me; you are an impatient (wo)man! ;)

Moreover, you didn't try to convert the ppt, did you?

agrosu 02-23-2006 10:09 AM

Quote:

Originally Posted by jlliagre
5.7 seconds doesn't look like a "big amount of time" to me; you are an impatient (wo)man! ;)

Moreover, you didn't try to convert the ppt, did you?

Well, what can I say... 5.7 secs looks like a lot of time to me when I see that wvware does the same thing in 1.6 secs :). And yes, I tried the conversion from a ppt, and it worked well, but ppt documents are smaller than Word documents. And I really have to think about these two aspects: the size of the file and the time spent converting it, because I have to convert a loooooot of documents, like 10,000 or more per day. And as we all know, the majority of the documents are large, right?... because they are documents after all :)
Also, I found some other applications that convert MS docs into either text or html. These are antiword, catdoc and word2x. Maybe I could draw some inspiration from them.... :scratch:
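
To put the 10,000-documents-per-day figure next to the two measured times, here is a small back-of-the-envelope sketch. It assumes every document takes about as long as the 2 MB sample and that conversions run one after another on a single machine; the function name `hours_per_day` is made up for this example.

```cpp
#include <cassert>

// Hypothetical helper: hours of sequential conversion work per day.
// Assumes a constant per-document time and no parallelism.
double hours_per_day(double seconds_per_doc, long docs_per_day) {
    assert(seconds_per_doc >= 0.0 && docs_per_day >= 0);
    return seconds_per_doc * static_cast<double>(docs_per_day) / 3600.0;
}
```

Under those assumptions, `hours_per_day(5.732, 10000)` comes to roughly 15.9 hours for OO and `hours_per_day(1.62, 10000)` to 4.5 hours for wvware. Both fit inside a 24-hour day, but the OO figure leaves little headroom if the daily volume grows or the real documents are larger than the sample.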

jlliagre 02-23-2006 03:07 PM

Well, if performance is an issue, use a faster machine!

