02-22-2006, 05:21 AM
|
#1
|
LQ Newbie
Registered: Feb 2006
Posts: 4
Rep:
|
Converting Windows doc/ppt files to HTML files under Linux
Hello everybody,
I have to convert Windows Word documents and ppt documents to HTML. This has to be done under Linux, with C++ as the programming language, and the target is a standalone application, which may use some other libraries. I have already done the conversion from PDF to HTML using xpdf, which provides the PDF structure. Yesterday I spent all day searching for an idea and came across OpenOffice, which apparently could give me the structure of a doc/ppt file, but for this the OpenOffice server must be running... so, no more standalone app. Can someone please point me to some documents to read about this?
Thank you so much and have a good day!
|
|
|
02-23-2006, 03:28 AM
|
#3
|
LQ Newbie
Registered: Feb 2006
Posts: 4
Original Poster
Rep:
|
Quote:
Originally Posted by jlliagre
Since OpenOffice is more a standalone application than a server, why not leverage it and use it for batch processing?
|
Thanks for responding, jlliagre. I wanted to see how OO handles the conversion anyway, so I took your advice and used OO Basic (which seems very similar to VB) to convert some doc files into HTML files, applying the HTML (StarOffice) filter. It will not work in my case, simply because of the time needed for the conversion: OO took too long to convert a doc to HTML (around 5-6 seconds for a 2 MB file).
I also found the wvware application, which seems to do a pretty good job, on conversion time too. Maybe I can hack on this application to reach my goal. But that would solve only the doc part; the ppt part would still remain. Can anybody help me with this, please?
Last edited by agrosu; 02-23-2006 at 03:29 AM.
|
|
|
02-23-2006, 04:53 AM
|
#4
|
Moderator
Registered: Feb 2004
Location: Outside Paris
Distribution: Solaris 11.4, Oracle Linux, Mint, Debian/WSL
Posts: 9,789
|
OO handles ppt too.
|
|
|
02-23-2006, 05:29 AM
|
#5
|
LQ Newbie
Registered: Feb 2006
Posts: 4
Original Poster
Rep:
|
Quote:
Originally Posted by jlliagre
OO handles ppt too.
|
Yep, I'm aware of this, but as I said, OO can't be used because the conversion takes too long. For the same document (around 2 MB), OO took 5.732 seconds, but wvware took 1.62 seconds.
I came across ppthtml (which is part of xlhtml). It converts ppt to HTML, but poorly. I'll keep researching....
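A comparison like the one above (5.732 seconds against 1.62 seconds) can be reproduced from C++ with a small timing helper. This is only a sketch under assumptions: `time_command` is a hypothetical name, the converter command line is whatever you would type in a shell, and the measurement includes process start-up, which may account for part of the OO figure.

```cpp
#include <chrono>
#include <cstdlib>
#include <string>

// Runs `command` through the shell and returns the elapsed wall-clock time
// in seconds. The raw value returned by std::system() is stored in `status`
// so that a failed conversion is not mistaken for a fast one.
// (Hypothetical helper, not part of any converter's API.)
double time_command(const std::string& command, int& status) {
    const auto start = std::chrono::steady_clock::now();
    status = std::system(command.c_str());
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling it once per converter on the same input file, and repeating the run a few times, gives numbers that are easier to compare than a single measurement.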
|
|
|
02-23-2006, 06:53 AM
|
#6
|
Moderator
Registered: Feb 2004
Location: Outside Paris
Distribution: Solaris 11.4, Oracle Linux, Mint, Debian/WSL
Posts: 9,789
|
5.7 seconds doesn't look like a "big amount of time" to me, you are an impatient (wo)man!
Moreover, you didn't try to convert the ppt, did you?
|
|
|
02-23-2006, 10:09 AM
|
#7
|
LQ Newbie
Registered: Feb 2006
Posts: 4
Original Poster
Rep:
|
Quote:
Originally Posted by jlliagre
5.7 seconds doesn't look like a "big amount of time" to me, you are an impatient (wo)man!
Moreover, you didn't try to convert the ppt, did you?
|
Well, what can I say... 5.7 seconds looks like a lot of time to me when I see that wvware does the same thing in 1.6 seconds. And yes, I tried converting a ppt, and it worked well, but ppt documents are smaller than Word documents. I really have to think about these two aspects, the size of the file and the time spent converting it, because I have to convert a loooooot of documents, like 10,000 or more per day. And as we all know, the majority of the documents are big, right?... because they are documents after all.
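To put the daily volume in perspective, here is a rough estimate of the total conversion time per day. It assumes, purely for illustration, that every document takes as long as the 2 MB sample and that the files are converted one after another on a single machine; `hours_per_day` is a hypothetical helper name.

```cpp
// Hours needed to convert `docs_per_day` documents sequentially, given an
// average of `seconds_per_doc` seconds per document.
// Assumption: every file behaves like the measured 2 MB sample.
double hours_per_day(long docs_per_day, double seconds_per_doc) {
    return docs_per_day * seconds_per_doc / 3600.0;
}
```

With the timings quoted in this thread, `hours_per_day(10000, 5.732)` comes to about 15.9 hours and `hours_per_day(10000, 1.62)` to about 4.5 hours, so under these assumptions both converters would fit within a day, but the slower one leaves little headroom if the volume grows.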
Also, I found some other applications that convert MS docs into either text or HTML: antiword, catdoc and word2x. Maybe I could take some inspiration from them....
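One way to try these tools from C++ before borrowing any of their code is to run them as child processes and capture what they print. The sketch below assumes a POSIX system and that `antiword` is installed and on the PATH; `shell_quote`, `run_capture` and `doc_to_text` are hypothetical helper names. Note that this shells out to an external program, so it is not yet the standalone application asked for at the start of the thread.

```cpp
#include <stdio.h>   // popen, pclose (POSIX)
#include <stdexcept>
#include <string>

// Wraps `arg` in single quotes for a POSIX shell, escaping any embedded
// single quote, so file names with spaces or quotes are passed through intact.
std::string shell_quote(const std::string& arg) {
    std::string out = "'";
    for (char c : arg) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Runs `command` via popen() and returns everything it wrote to stdout.
// Throws if the command cannot be started or exits with a non-zero status.
std::string run_capture(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) throw std::runtime_error("popen failed: " + command);
    std::string output;
    char buffer[4096];
    size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0) {
        output.append(buffer, n);
    }
    if (pclose(pipe) != 0) throw std::runtime_error("command failed: " + command);
    return output;
}

// Hypothetical wrapper: antiword prints the text of a Word file to stdout
// when given the file name as its only argument.
std::string doc_to_text(const std::string& path) {
    return run_capture("antiword " + shell_quote(path));
}
```

The same `run_capture` helper should work for any converter that writes its result to standard output; only the command string changes.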
|
|
|
02-23-2006, 03:07 PM
|
#8
|
Moderator
Registered: Feb 2004
Location: Outside Paris
Distribution: Solaris 11.4, Oracle Linux, Mint, Debian/WSL
Posts: 9,789
|
Well, if performance is an issue, use a faster machine!
|
|
|
All times are GMT -5.