LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 11-11-2009, 02:20 AM   #1
ramakrishnankt
Member
 
Registered: Nov 2009
Location: India
Posts: 32

Rep: Reputation: 15
How to convert word or pdf to html


Hi,
I want to use a utility to convert Word and PDF files to HTML with the same formatting.
The utility should also be able to run from the command line, because I want to integrate it into a web page. Which utility is suitable for this?
 
Old 11-11-2009, 02:58 AM   #2
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,979

Rep: Reputation: 850
Hello ramakrishnankt and welcome to LQ,

This will not work. PDF is a binary format from which it is not possible to extract an HTML file. The only way I see to do this would be to copy the text from the PDF file paragraph by paragraph into an HTML file. If you have more elaborate formatting, such as tables, you'll have to get your hands dirty.

With the Word document, it may help to save the file as a text file and convert that into HTML. But be aware that all formatting will be lost.
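
The text-file route could look something like the sketch below. It is only an illustration, not a tool mentioned in this thread: the function name `text_to_html` and the default title are made up, and it assumes that paragraphs in the saved text file are separated by blank lines. As noted, all formatting is lost; only the paragraph breaks survive.

```python
import html
import re


def text_to_html(text, title="Converted document"):
    """Wrap plain text in a minimal HTML page, one <p> per paragraph.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    # Normalise Windows line endings, then split on blank lines.
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n"))
    paragraphs = []
    for block in blocks:
        # Collapse line breaks and repeated spaces inside a paragraph.
        flat = " ".join(block.split())
        if flat:
            # Escape <, > and & so the text cannot break the markup.
            paragraphs.append("<p>" + html.escape(flat) + "</p>")
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        "<title>" + html.escape(title) + "</title>\n"
        "</head>\n<body>\n"
        + "\n".join(paragraphs)
        + "\n</body>\n</html>\n"
    )
```

You would read the saved text file, pass its contents to `text_to_html`, and write the returned string to a `.html` file.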

Markus
 
Old 11-11-2009, 03:37 AM   #3
vijaush
LQ Newbie
 
Registered: Feb 2007
Posts: 8

Rep: Reputation: 0
Try using the pdftohtml app.
 
Old 11-11-2009, 03:38 AM   #4
r3sistance
Senior Member
 
Registered: Mar 2004
Location: UK
Distribution: CentOS 5.4, Mac OS 10.4 (tiger)
Posts: 1,005

Rep: Reputation: 79
Quote:
Originally Posted by markush
this will not work. PDF is a binary format from which it is not possible to extract a html-file. The only way I see to do this would be to copy the text from the PDF-file paragraph by paragraph into a html-file. If you have a more elaborate formatting such as tables you'll have to get your hands dirty.
I don't buy that one at all. While you'd never get an exact copy, you can at least automate the text part. The reason I say that is that Google automatically converts the PDFs it finds into HTML. So I took a quick look and found that Adobe has a tool for this. I personally am not a great fan of PDFs, so I do not know how good it is.

It seems to be designed more as an HTML viewer than as a standard converter, however.
 
Old 11-12-2009, 12:16 AM   #5
ramakrishnankt
Member
 
Registered: Nov 2009
Location: India
Posts: 32

Original Poster
Rep: Reputation: 15
How to convert pdf,doc to html

Quote:
Originally Posted by vijaush
try using pdftohtml app.
How do I get it?
 
Old 11-12-2009, 05:42 AM   #6
vijaush
LQ Newbie
 
Registered: Feb 2007
Posts: 8

Rep: Reputation: 0
Just have a look at these links, they do what you want:

http://thelinuxsociety.org.uk/conten...rt-pdf-to-html


and http://www.ubuntugeek.com/howto-conv...tml-files.html
 
Old 11-12-2009, 06:10 AM   #7
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 670
pdftohtml is supplied by the poppler-tools package. The results won't preserve the formatting, but the links will be preserved. The sections will have generic headers. From what I've seen on the web, a page from a PDF would be converted to a graphic image. It would be better to use the source of the PDF (e.g. texi, LaTeX, xml, TeX) and produce the output format with the appropriate tool, or to post links to the PDF document instead.
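
Since the original question was about calling a converter from a web page, here is a rough sketch of how pdftohtml could be driven from Python. The function names and the example file names are made up for illustration. It assumes pdftohtml is on the PATH, and it uses the `-noframes` and `-c` options; `-c` asks for "complex" output that tries to keep the page layout. The exact names of the files written can differ between versions and options, so check the output directory rather than relying on a fixed name.

```python
import shutil
import subprocess


def build_pdftohtml_command(pdf_path, out_base, complex_layout=True):
    """Return the argument list for a pdftohtml run.

    -noframes asks for output without an HTML frameset;
    -c asks for "complex" output that tries to keep the page layout.
    """
    cmd = ["pdftohtml", "-noframes"]
    if complex_layout:
        cmd.append("-c")
    cmd += [str(pdf_path), str(out_base)]
    return cmd


def convert_pdf(pdf_path, out_base, complex_layout=True):
    """Run pdftohtml; raises if the tool is missing or exits with an error."""
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError(
            "pdftohtml not found; install the poppler tools package"
        )
    # Passing a list (no shell) avoids quoting problems with odd file names.
    subprocess.run(
        build_pdftohtml_command(pdf_path, out_base, complex_layout),
        check=True,
    )
```

For example, `convert_pdf("report.pdf", "report")` would run `pdftohtml -noframes -c report.pdf report`. Passing the arguments as a list matters if the file names come from a web upload.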
 
Old 11-12-2009, 08:12 AM   #8
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1285
For .doc to .html use:
http://wvware.sourceforge.net/
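
The wv package ships a wrapper script called wvHtml that takes the input .doc and the name of the HTML file to write. A minimal sketch of calling it from Python follows. The function names and file names are only placeholders, and it assumes wvHtml is installed and on the PATH; check `man wvHtml` on your system for the options your version supports.

```python
import shutil
import subprocess


def build_wvhtml_command(doc_path, html_name, target_dir="."):
    """Return the argument list for a wvHtml run.

    --targetdir selects the directory the HTML file (and any extracted
    images) is written to; html_name is the output file name.
    """
    return [
        "wvHtml",
        "--targetdir=" + str(target_dir),
        str(doc_path),
        str(html_name),
    ]


def convert_doc_with_wv(doc_path, html_name, target_dir="."):
    """Run wvHtml; raises if the tool is missing or exits with an error."""
    if shutil.which("wvHtml") is None:
        raise FileNotFoundError("wvHtml not found; install the wv package")
    subprocess.run(
        build_wvhtml_command(doc_path, html_name, target_dir),
        check=True,
    )
```

For example, `convert_doc_with_wv("letter.doc", "letter.html", "/tmp/out")` would run `wvHtml --targetdir=/tmp/out letter.doc letter.html`.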
 
Old 11-12-2009, 07:15 PM   #9
craigevil
Senior Member
 
Registered: Apr 2005
Location: OZ
Distribution: Debian Sid
Posts: 4,734
Blog Entries: 12

Rep: Reputation: 461
poppler-utils and/or xpdf-utils

You can also try http://www.pdfdownload.org/free-pdf-to-html.aspx, a nice Firefox extension that works pretty well.
http://www.adobe.com/products/acroba...linetools.html
Convert pdf to html - http://www.convertpdftohtml.net/

Actually Google returns quite a few online conversion sites.
http://tinyurl.com/ykhntus
 
Old 11-13-2009, 04:06 AM   #10
ramakrishnankt
Member
 
Registered: Nov 2009
Location: India
Posts: 32

Original Poster
Rep: Reputation: 15
How to convert pdf to html

Thanks to all.
How do I use the wv utilities?
And is there any other utility for converting doc to HTML?

Last edited by ramakrishnankt; 11-13-2009 at 04:09 AM. Reason: Adding content
 
Old 06-05-2012, 04:18 AM   #11
sandyago
LQ Newbie
 
Registered: Feb 2012
Posts: 2

Rep: Reputation: Disabled
Well, I totally disagree with the claim that the PDF format cannot be converted to an HTML page, because I have tried it before and it works very well. What you need is a PDF to HTML converter. Here are some instructions on how to convert PDF to HTML; I hope they help you solve your problem.

As for converting Word to HTML, I have never used that kind of tool, but there must be one that can solve the problem. Just google your question to find one.
 
Old 06-05-2012, 05:05 AM   #12
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian i686 (solaris)
Posts: 8,124

Rep: Reputation: 2271
MS Word can itself save files in HTML format, so using macros you can open a Word document and save it as HTML. OpenOffice can do it as well.
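
For the OpenOffice route on a server without a GUI, something along these lines might work. This is a sketch under assumptions: it uses the `soffice --headless --convert-to html --outdir` form that LibreOffice accepts, and older OpenOffice.org builds may need different options or a macro instead. The function names and file names are made up for illustration, and the expected output path is only the usual naming (input name with an .html extension), so verify it on your system.

```python
import shutil
import subprocess
from pathlib import Path


def build_soffice_command(doc_path, out_dir="."):
    """Return the argument list for a headless office conversion to HTML."""
    return [
        "soffice",
        "--headless",
        "--convert-to", "html",
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def expected_html_path(doc_path, out_dir="."):
    """Guess where the converted file lands: <out_dir>/<input stem>.html."""
    return Path(out_dir) / (Path(doc_path).stem + ".html")


def convert_doc_with_office(doc_path, out_dir="."):
    """Run the conversion and return the path the output is expected at."""
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice not found; install the office suite")
    subprocess.run(build_soffice_command(doc_path, out_dir), check=True)
    return expected_html_path(doc_path, out_dir)
```

For example, `convert_doc_with_office("docs/letter.doc", "out")` would run the conversion and return the path `out/letter.html`.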
 
  


All times are GMT -5. The time now is 02:32 AM.
