LinuxQuestions.org
Old 02-22-2006, 06:21 AM   #1
agrosu
LQ Newbie
 
Registered: Feb 2006
Posts: 4

Rep: Reputation: 0
Converting Windows doc/ppt files to HTML files in a Linux environment


Hello everybody,
I have to convert Windows Word (.doc) and PowerPoint (.ppt) documents to HTML. This has to be done in a Linux environment, using C++ as the programming language, and the target is a standalone application, which may use other libraries. I have already done the conversion from PDF to HTML using xpdf, which exposes the PDF structure. Yesterday I spent the whole day searching for an idea and came across OpenOffice, which apparently could give me the structure of a doc/ppt file, but for this the OpenOffice server must be running... so, no more standalone app. Can someone please point me to some documents to read about this?
Thank you so much and have a good day!
 
Old 02-22-2006, 08:51 AM   #2
jlliagre
Moderator
 
Registered: Feb 2004
Location: Outside Paris
Distribution: Solaris10, Solaris 11, Mint, OL
Posts: 9,523

Rep: Reputation: 365
Since OpenOffice is more of a standalone application than a server, why not leverage it and use it for batch processing?

See:
http://www.xml.com/pub/a/2006/01/11/...penoffice.html
http://www.indesko.com/en/downloads/ooo2dbk
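As a rough illustration of the batch-processing idea, here is a minimal C++17 sketch of the first step of such a driver: picking the .doc and .ppt files out of a directory so they can be handed to a converter one by one. The function names `isOfficeDocument` and `collectOfficeFiles` are hypothetical, and matching on the file extension alone is an assumption, not something the linked articles prescribe.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Returns true when the file name ends in .doc or .ppt (case-insensitive).
// Hypothetical helper: it only looks at the extension, not the file content.
bool isOfficeDocument(const fs::path& file) {
    std::string ext = file.extension().string();
    std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return ext == ".doc" || ext == ".ppt";
}

// Collects the .doc/.ppt files directly inside `dir` (no recursion),
// sorted by path so that a batch run processes them in a stable order.
std::vector<fs::path> collectOfficeFiles(const fs::path& dir) {
    std::vector<fs::path> files;
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.is_regular_file() && isOfficeDocument(entry.path())) {
            files.push_back(entry.path());
        }
    }
    std::sort(files.begin(), files.end());
    return files;
}
```

A batch run would then loop over the result of `collectOfficeFiles("incoming")` and start the chosen converter for each path; which converter to start, and how, is left open here.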
 
Old 02-23-2006, 04:28 AM   #3
agrosu
LQ Newbie
 
Registered: Feb 2006
Posts: 4

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by jlliagre
Since OpenOffice is more of a standalone application than a server, why not leverage it and use it for batch processing?
Thanks for responding, jlliagre. I wanted to see how OO handles the conversion anyway, so I took your advice and used OO Basic (which seems very similar to VB) to convert some doc files into HTML files, applying an HTML (StarOffice) filter. It will not work in my case, simply because of the time needed for the conversion. OO took too long to convert a doc into HTML (around 5-6 seconds for a 2 MB file).
I also found the wvware application, which seems to do a pretty good job, including on speed. Maybe I can hack on this application to reach my target. But anyway, this will solve only the doc part. The ppt part will still remain. Can anybody help me with this, please?
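Short of hacking on the converter's sources, one way to drive an external converter from a C++ program is to build a shell command and run it. The sketch below only builds the command string. The helper names are hypothetical, and the "tool input output" argument order is an assumption about the converter's command line (wvWare ships a `wvHtml` script that is believed to accept this form), so check the tool's manual before relying on it.

```cpp
#include <string>

// Wraps `arg` in single quotes for a POSIX shell. An embedded single quote
// becomes '\'' (close the quote, emit an escaped quote, reopen the quote).
std::string shellQuote(const std::string& arg) {
    std::string quoted = "'";
    for (char c : arg) {
        if (c == '\'') {
            quoted += "'\\''";
        } else {
            quoted += c;
        }
    }
    quoted += "'";
    return quoted;
}

// Builds "<tool> '<input>' '<output>'". The two-argument form is an
// assumption about the converter's interface, not taken from its manual.
std::string buildConvertCommand(const std::string& tool,
                                const std::string& input,
                                const std::string& output) {
    return tool + " " + shellQuote(input) + " " + shellQuote(output);
}
```

The resulting string could be passed to `std::system` from `<cstdlib>`, for example `std::system(buildConvertCommand("wvHtml", "report.doc", "report.html").c_str())`. Quoting the paths matters because document names often contain spaces.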

Last edited by agrosu; 02-23-2006 at 04:29 AM.
 
Old 02-23-2006, 05:53 AM   #4
jlliagre
Moderator
 
Registered: Feb 2004
Location: Outside Paris
Distribution: Solaris10, Solaris 11, Mint, OL
Posts: 9,523

Rep: Reputation: 365
OO handles ppt too.
 
Old 02-23-2006, 06:29 AM   #5
agrosu
LQ Newbie
 
Registered: Feb 2006
Posts: 4

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by jlliagre
OO handles ppt too.
Yep, I'm aware of this, but as I said, OO can't be used because of the large amount of time it needs for converting. For the same document (around 2 MB), OO took 5.732 seconds, but wvware took 1.62 seconds.
I came across ppthtml (which is part of xlhtml). It does the conversion from ppt to HTML, but poorly. I'll keep researching....
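For comparisons like the one above, a small wall-clock timer around each conversion call keeps the measurements consistent. This is a minimal sketch, not the method actually used for the figures in this thread. `secondsTaken` is a hypothetical helper, and the task passed to it stands in for whatever starts the converter.

```cpp
#include <chrono>
#include <functional>

// Runs `task` once and returns the elapsed wall-clock time in seconds.
// steady_clock is used because it never jumps backwards during a run.
double secondsTaken(const std::function<void()>& task) {
    const auto start = std::chrono::steady_clock::now();
    task();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Timing the same file several times and comparing the results helps separate one-off startup cost from the conversion itself.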
 
Old 02-23-2006, 07:53 AM   #6
jlliagre
Moderator
 
Registered: Feb 2004
Location: Outside Paris
Distribution: Solaris10, Solaris 11, Mint, OL
Posts: 9,523

Rep: Reputation: 365
5.7 seconds doesn't look like a "big amount of time" to me; you are an impatient (wo)man!

Moreover, you didn't try to convert the ppt, did you?
 
Old 02-23-2006, 11:09 AM   #7
agrosu
LQ Newbie
 
Registered: Feb 2006
Posts: 4

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by jlliagre
5.7 seconds doesn't look like a "big amount of time" to me; you are an impatient (wo)man!

Moreover, you didn't try to convert the ppt, did you?
Well, what can I say... 5.7 secs looks like a lot of time to me when I see that wvware does the same thing in 1.6 secs. And yes, I tried the conversion from a ppt, and it worked well, but ppt documents are smaller than Word documents. And I really have to think about these two aspects: the size of the file and the time spent converting it, because I have to convert a loooooot of documents, like 10,000 or more per day. And as we all know, the majority of the documents are big, right?... because they are documents after all.
Also, I found some other applications that convert MS docs into either text or HTML. These are antiword, catdoc and word2x. Maybe I could draw inspiration from them....
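To put the two timings next to the daily volume, here is a back-of-the-envelope sketch. It assumes every document takes as long as the roughly 2 MB sample measured earlier and that conversions run one after another in a single process. Both are simplifications, and the function names are hypothetical.

```cpp
// Seconds in one day, used to express the workload as a share of a day.
constexpr double kSecondsPerDay = 24.0 * 60.0 * 60.0;

// Total conversion time in hours for a day's worth of documents.
double totalHours(int documentsPerDay, double secondsPerDocument) {
    return documentsPerDay * secondsPerDocument / 3600.0;
}

// Share of a 24-hour day spent converting (1.0 means the whole day).
double fractionOfDay(int documentsPerDay, double secondsPerDocument) {
    return documentsPerDay * secondsPerDocument / kSecondsPerDay;
}
```

With 10,000 documents, 5.732 seconds each adds up to about 15.9 hours (roughly two thirds of a day), while 1.62 seconds each adds up to 4.5 hours (under a fifth of a day). Under these assumptions both fit in a day on one machine, but the slower converter leaves much less headroom for larger files or higher volume.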
 
Old 02-23-2006, 04:07 PM   #8
jlliagre
Moderator
 
Registered: Feb 2004
Location: Outside Paris
Distribution: Solaris10, Solaris 11, Mint, OL
Posts: 9,523

Rep: Reputation: 365
Well, if performance is an issue, use a faster machine!
 
  

