11-11-2009, 02:20 AM
|
#1
|
Member
Registered: Nov 2009
Location: India
Posts: 32
Rep:
|
How to convert word or pdf to html
Hi,
I want to use a utility to convert Word and PDF documents to HTML with the same formatting.
The utility must also be able to run from the command line, because I want to integrate it into a web page. Which utility is suitable for this?
|
|
|
11-11-2009, 02:58 AM
|
#2
|
Senior Member
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,979
Rep:
|
Hello ramakrishnankt and welcome to LQ,
this will not work. PDF is a binary format from which it is not possible to extract an HTML file. The only way I see to do this would be to copy the text from the PDF file paragraph by paragraph into an HTML file. If you have more elaborate formatting, such as tables, you'll have to get your hands dirty.
With the Word document, it may help to save the file as a text file and convert that into HTML. But be aware that all formatting will be lost.
Markus
|
|
|
11-11-2009, 03:37 AM
|
#3
|
LQ Newbie
Registered: Feb 2007
Posts: 8
Rep:
|
Try using the pdftohtml app.
|
|
|
11-11-2009, 03:38 AM
|
#4
|
Senior Member
Registered: Mar 2004
Location: UK
Distribution: CentOS 6/7
Posts: 1,375
|
Quote:
Originally Posted by markush
this will not work. PDF is a binary format from which it is not possible to extract an HTML file. The only way I see to do this would be to copy the text from the PDF file paragraph by paragraph into an HTML file. If you have more elaborate formatting, such as tables, you'll have to get your hands dirty.
|
I don't buy that one at all. While you'd never get an exact copy, you can at least automate the text section. The reason I say that is that Google automatically converts the PDFs it finds into HTML. So I decided to take a quick look and found that Adobe has a tool for this. I personally am not a great fan of PDFs, so I do not know how good it is.
It seems it's designed more as an HTML viewer than as a standard converter, however.
|
|
|
11-12-2009, 12:16 AM
|
#5
|
Member
Registered: Nov 2009
Location: India
Posts: 32
Original Poster
Rep:
|
How to convert pdf,doc to html
Quote:
Originally Posted by vijaush
Try using the pdftohtml app.
|
How do I get it?
|
|
|
11-12-2009, 05:42 AM
|
#6
|
LQ Newbie
Registered: Feb 2007
Posts: 8
Rep:
|
|
|
|
11-12-2009, 06:10 AM
|
#7
|
LQ Guru
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
|
pdftohtml is supplied by the poppler-tools package. The results won't preserve the formatting, but they will preserve the links. The sections will have generic headers. From what I've seen on the web, a page from a PDF would be converted to a graphic image. It would be better to use the source of the PDF (e.g. texi, LaTeX, XML, TeX) and produce the output format with the appropriate tool, or to post links to the PDF document instead.
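Since the original question was about calling the converter from a web page, here is a minimal sketch of how pdftohtml might be wrapped in Python. The function names and file names are made up for illustration, and it assumes poppler's pdftohtml is on the PATH. The arguments are passed as a list rather than a shell string, so a file name coming from a web form is never interpreted by a shell.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftohtml_command(pdf_path, html_path):
    """Return the argument list for a single-file pdftohtml run.

    Hypothetical helper, not part of poppler itself.
    """
    # -noframes: write one HTML document instead of a frameset
    # -q: do not print informational messages
    return ["pdftohtml", "-noframes", "-q", str(pdf_path), str(html_path)]


def convert_pdf_to_html(pdf_path, html_path):
    """Run pdftohtml if it is installed; return True only on success."""
    if shutil.which("pdftohtml") is None:
        return False  # assumption: the tool must be found on PATH
    if not Path(pdf_path).is_file():
        return False  # nothing to convert
    result = subprocess.run(build_pdftohtml_command(pdf_path, html_path))
    return result.returncode == 0
```

Calling `convert_pdf_to_html("report.pdf", "report.html")` would then return True or False depending on whether the conversion ran cleanly. As noted above, do not expect the layout to survive.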
|
|
|
11-12-2009, 08:12 AM
|
#8
|
LQ Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
|
|
|
|
11-13-2009, 04:06 AM
|
#10
|
Member
Registered: Nov 2009
Location: India
Posts: 32
Original Poster
Rep:
|
How to convert pdf to html
Thanks to all.
How do I use the wv utilities?
And is there any utility for converting doc to HTML?
Last edited by ramakrishnankt; 11-13-2009 at 04:09 AM.
Reason: Adding content
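On the wv question: the wv package ships a wvHtml program that takes a Word file and an output file name. A rough sketch of calling it from Python follows. The helper names are invented for this example, and as far as I know wv only handles the older binary .doc format, so treat it as a starting point rather than a complete solution.

```python
import shutil
import subprocess
from pathlib import Path


def build_wvhtml_command(doc_path, html_path):
    """Return the argument list for wvHtml (hypothetical helper)."""
    # wvHtml expects the input document first, then the output file name
    return ["wvHtml", str(doc_path), str(html_path)]


def convert_doc_with_wv(doc_path, html_path):
    """Run wvHtml if it is installed; return True only on success."""
    if shutil.which("wvHtml") is None:
        return False  # assumption: the wv package must be installed
    if not Path(doc_path).is_file():
        return False  # nothing to convert
    result = subprocess.run(build_wvhtml_command(doc_path, html_path))
    return result.returncode == 0
```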
|
|
|
06-05-2012, 04:18 AM
|
#11
|
LQ Newbie
Registered: Feb 2012
Posts: 2
Rep:
|
Well, I totally disagree with the claim that PDF cannot be converted to an HTML page, because I have tried it before and it works very well. What you need is a PDF to HTML converter. Here are some instructions on how to convert PDF to HTML; I hope they help you solve your problem.
As for converting Word to HTML, I have never used that kind of tool, but there must be one that can solve the problem. Just google your question to find one.
|
|
|
06-05-2012, 05:05 AM
|
#12
|
LQ Addict
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 23,491
|
MS Word can itself save files in HTML format, so using macros you can open a Word document and save it as HTML. OpenOffice can do it too.
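For the command-line case, one way this could look is a headless office run driven from Python. This sketch assumes LibreOffice (the OpenOffice successor) with its `soffice` binary on the PATH; the helper names and paths are made up for illustration, and older OpenOffice releases may need a macro or a separate tool instead.

```python
import shutil
import subprocess
from pathlib import Path


def build_soffice_command(doc_path, out_dir):
    """Return the argument list for a headless HTML export (hypothetical helper)."""
    return [
        "soffice",
        "--headless",             # run without opening a window
        "--convert-to", "html",   # target format
        "--outdir", str(out_dir), # directory that receives the HTML file
        str(doc_path),
    ]


def convert_doc_to_html(doc_path, out_dir):
    """Run the export if soffice is installed; return True only on success."""
    if shutil.which("soffice") is None:
        return False  # assumption: LibreOffice must be found on PATH
    if not Path(doc_path).is_file():
        return False  # nothing to convert
    result = subprocess.run(build_soffice_command(doc_path, out_dir))
    return result.returncode == 0
```

The output file takes its name from the input, so `convert_doc_to_html("letter.doc", "out")` would be expected to leave `out/letter.html` behind if the run succeeds.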
|
|
|
All times are GMT -5.
|