LinuxQuestions.org


ramakrishnankt 11-11-2009 01:20 AM

How to convert word or pdf to html
 
Hi,
I want to use a utility to convert Word and PDF files to HTML while keeping the same formatting.
The utility should also be able to run from the command line, because I want to integrate it into a web page. Which utility is suitable for this?

markush 11-11-2009 01:58 AM

Hello ramakrishnankt and welcome to LQ,

This will not work. PDF is a binary format from which it is not possible to extract an HTML file. The only way I see to do this would be to copy the text from the PDF file paragraph by paragraph into an HTML file. If you have more elaborate formatting, such as tables, you'll have to get your hands dirty.

With the Word document it may help to save the file as a text file and convert that into HTML. But be aware that all formatting will be lost.

Markus

vijaush 11-11-2009 02:37 AM

Try using the pdftohtml app.

r3sistance 11-11-2009 02:38 AM

Quote:

Originally Posted by markush (Post 3752580)
This will not work. PDF is a binary format from which it is not possible to extract an HTML file. The only way I see to do this would be to copy the text from the PDF file paragraph by paragraph into an HTML file. If you have more elaborate formatting, such as tables, you'll have to get your hands dirty.

I don't buy that one at all. While you'd never get an exact copy, you can at least automate the text section. The reason I say that is that Google automatically converts the PDFs it finds into HTML. So I decided to take a quick look and found that Adobe has a tool for this. I personally am not a great fan of PDFs, so I do not know how good it is.

It seems it's designed more as an HTML viewer than as a standard converter, however.

ramakrishnankt 11-11-2009 11:16 PM

How to convert pdf,doc to html
 
Quote:

Originally Posted by vijaush (Post 3752605)
Try using the pdftohtml app.

How do I get it?

vijaush 11-12-2009 04:42 AM

Just have a look at these links, they do what you want:

http://thelinuxsociety.org.uk/conten...rt-pdf-to-html


and http://www.ubuntugeek.com/howto-conv...tml-files.html

jschiwal 11-12-2009 05:10 AM

pdftohtml is supplied by the poppler-tools package. The results won't preserve the formatting, but you will preserve the links. The sections will have generic headers. From what I've seen on the web, a page from a PDF would be converted to a graphic image. It would be better to use the source of the PDF (e.g. texi, LaTeX, xml, TeX) and produce an output format with the appropriate tool, or to post links to the PDF document instead.
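
For anyone who wants to call pdftohtml from a script or a web backend, here is a minimal Python sketch. It assumes pdftohtml is installed and on the PATH. The function names build_pdftohtml_command and convert_pdf and the html_out directory are made up for illustration. The -noframes and -c options are pdftohtml options, but the exact output file names can vary between versions, so the sketch just lists whatever .html files appear.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftohtml_command(pdf_path, out_base, keep_layout=False):
    """Return the argument list for a pdftohtml run (nothing is executed here).

    Hypothetical helper, not part of poppler.
    """
    cmd = ["pdftohtml", "-noframes"]  # -noframes: plain HTML output without a frameset
    if keep_layout:
        cmd.append("-c")  # -c: "complex" output mode
    cmd += [str(pdf_path), str(out_base)]
    return cmd


def convert_pdf(pdf_path, out_dir="html_out"):
    """Run pdftohtml on one PDF and return the .html files found in out_dir."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found; install the poppler tools package")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    out_base = out / Path(pdf_path).stem  # output name without an extension
    subprocess.run(build_pdftohtml_command(pdf_path, out_base), check=True)
    # Assumption: file naming differs between versions, so list what was written.
    return sorted(out.glob("*.html"))
```

Calling convert_pdf("manual.pdf") would then leave the generated HTML in html_out, where a web page could pick it up.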

H_TeXMeX_H 11-12-2009 07:12 AM

For .doc to .html use:
http://wvware.sourceforge.net/
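
The wv package ships a wvHtml command that takes an input .doc file and an output .html file name. A minimal Python sketch of driving it from a script follows, assuming wvHtml is installed and on the PATH. The function names build_wvhtml_command and convert_doc and the html_out directory are hypothetical, and how much formatting survives depends on the document.

```python
import shutil
import subprocess
from pathlib import Path


def build_wvhtml_command(doc_path, html_name):
    """Return the argument list for wvHtml: input .doc, then output .html name.

    Hypothetical helper, not part of wv.
    """
    return ["wvHtml", str(doc_path), str(html_name)]


def convert_doc(doc_path, target_dir="html_out"):
    """Run wvHtml on one Word file and return the path of the expected HTML file."""
    if shutil.which("wvHtml") is None:
        raise RuntimeError("wvHtml not found; install the wv package")
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    doc = Path(doc_path).resolve()  # absolute path, since we change directory below
    html_name = doc.stem + ".html"
    # Run inside target_dir so the output (and any extracted images) lands there.
    subprocess.run(build_wvhtml_command(doc, html_name), cwd=target, check=True)
    return target / html_name
```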

craigevil 11-12-2009 06:15 PM

poppler-utils and/or xpdf-utils

You can also try http://www.pdfdownload.org/free-pdf-to-html.aspx, a nice Firefox extension that works pretty well.
http://www.adobe.com/products/acroba...linetools.html
Convert pdf to html - http://www.convertpdftohtml.net/

Actually Google returns quite a few online conversion sites.
http://tinyurl.com/ykhntus

ramakrishnankt 11-13-2009 03:06 AM

How to convert pdf to html
 
Thanks to all.
How do I use the wv utilities?
And is there any utility for converting doc to HTML?

sandyago 06-05-2012 03:18 AM

Well, I totally disagree with the claim that the PDF format cannot be converted to an HTML page, because I have tried it before and it works very well. What you need is a PDF to HTML converter. Here are some instructions about how to convert PDF to HTML; I hope they help you solve your problem.

As for how to convert Word to HTML, I have never used that kind of tool, but there must be one that can solve the problem. Just google your question to find one.

pan64 06-05-2012 04:05 AM

MS Word can itself save files in HTML format, so using macros you can open a Word document and save it as HTML. OpenOffice can do this too.
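
On the office-suite route, the conversion can also be run without opening a window. This is a rough Python sketch, assuming a LibreOffice-style soffice binary on the PATH that accepts --headless, --convert-to and --outdir. Older OpenOffice builds may spell these options differently. The function names build_soffice_command and convert_with_office are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_soffice_command(doc_paths, out_dir):
    """Return the argument list for a headless office conversion to HTML.

    Hypothetical helper; assumes LibreOffice-style command-line options.
    """
    cmd = ["soffice", "--headless", "--convert-to", "html", "--outdir", str(out_dir)]
    cmd += [str(p) for p in doc_paths]  # one or more input documents
    return cmd


def convert_with_office(doc_paths, out_dir="html_out"):
    """Convert the given documents and return the .html files found in out_dir."""
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found; install LibreOffice or adjust the binary name")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_soffice_command(doc_paths, out), check=True)
    return sorted(out.glob("*.html"))
```

Because the office suite uses its own Word import filter, this route usually keeps more of the formatting than saving as plain text first.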


All times are GMT -5.