LinuxQuestions.org


ilhbutshm 10-22-2004 05:29 PM

CMD line tool for PDF -> HTML or JPG/PNG/GIF
 
Is there a command-line tool that can convert PDF files to JPEG/PNG/GIF, or perhaps even HTML?

Samsara 10-22-2004 06:25 PM

If the document is not secret, you could use Google to do it, since Google offers a "View as HTML" version of the PDFs it indexes. Put the PDF on a website (use an invisible link for obfuscation if needed, e.g. coloured white), put the URL in your forum sig, and wait till the bot comes round.

Regards,

Samsara

UsualTuxpect 10-22-2004 07:53 PM

I don't know of any on Linux as yet... you could be the first to get the Windows ones working under Wine.
Tell us how it goes...


Hmm, last I heard Google re-caches every 45 days or so... and when the bot comes, make sure you allow it to cache your files by editing robots.txt. :p

Samsara 10-22-2004 08:50 PM

Some replies I stole off a mailing list, where the same question came up:

first reply> The latest KOffice can edit PDFs; I haven't tried saving as .doc afterwards, though...

original poster> Thanks, I tried it, but KWord makes a bit of a mess of the PDF.

third reply> There's one breathtakingly horrible solution, but it might be the only practicable one: write a quick script which uses Ghostscript to turn the PDF into a sequence of PNGs, then outputs an RTF file with said PNGs embedded, one to a page.
third reply>
third reply> RTFs (according to the v1.5 spec) can also contain "Enhanced Metafiles" or EMFs, which seem to be some kind of Microsoft vector format. I think OOo Draw can import these, but I don't know of any sensible way to convert PDF to EMF. EPS, it seems, is not supported :-(.
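The third reply's "horrible solution" could look roughly like the Python below. This is a minimal sketch, assuming Ghostscript is installed as `gs` with its `png16m` device; the helper names (`ghostscript_cmd`, `png_size`, `pngs_to_rtf`, `pdf_to_rtf`) are hypothetical. Each page image is hex-encoded into an RTF `\pngblip` picture, with a `\page` break between pictures; page margins and image scaling are left to the word processor.

```python
import struct
import subprocess
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def ghostscript_cmd(pdf, out_pattern="page-%03d.png", dpi=150):
    """Build a Ghostscript command that renders every PDF page to its own PNG."""
    return [
        "gs", "-q", "-dSAFER", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=png16m",              # 24-bit colour PNG output
        f"-r{dpi}",                     # rendering resolution in DPI
        f"-sOutputFile={out_pattern}",  # %03d is replaced by the page number
        str(pdf),
    ]


def png_size(data):
    """Read width and height from the IHDR chunk that follows the PNG signature."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", data[16:24])


def pngs_to_rtf(pages):
    """Embed each PNG (given as bytes) in an RTF document, one picture per page."""
    parts = [r"{\rtf1\ansi"]
    for i, data in enumerate(pages):
        if i:
            parts.append(r"\page")  # page break before every picture but the first
        w, h = png_size(data)
        # \picw/\pich give the bitmap size in pixels; the image bytes are hex-encoded
        parts.append(r"{\pict\pngblip\picw%d\pich%d %s}" % (w, h, data.hex()))
    parts.append("}")
    return "\n".join(parts)


def pdf_to_rtf(pdf, workdir):
    """Render the PDF into workdir with Ghostscript, then wrap the PNGs in RTF."""
    subprocess.run(ghostscript_cmd(Path(pdf).resolve()), cwd=workdir, check=True)
    pngs = sorted(Path(workdir).glob("page-*.png"))  # zero-padded names sort in page order
    return pngs_to_rtf(p.read_bytes() for p in pngs)
```

Usage would be something like `Path("out.rtf").write_text(pdf_to_rtf("doc.pdf", "/tmp/pages"))`. Hex-encoded bitmaps make the RTF roughly twice the size of the PNGs, which is part of why the reply calls it horrible.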

sadistic reply> Try reading "Digital Typography" by Donald Knuth.

fourth reply> If the issue is getting from LaTeX to Word, then using Hevea (http://pauillac.inria.fr/~maranget/hevea/) to generate HTML as a common format is a workable, if not perfect, solution, even for moderately complex documents, although some hand editing afterwards is likely.

fifth> Check out http://www.scansoft.com/pdfconverter/

sixth> Sure: pdftotext, for instance, uses the xpdf code to extract the text from a PDF document. As I said originally, it's *potentially* hard, but most of the time you can get the text out.
sixth>
sixth> But the original poster wanted the full formatting, which is a bit more difficult, and RTF doesn't really cut the mustard.
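Since the original question also asked about HTML, the `pdftotext` route could be stretched into a crude, text-only HTML page. A minimal sketch, assuming xpdf's `pdftotext` is on the PATH; it relies on `pdftotext` writing to stdout when the output file is `-` and ending each page with a form-feed character. The helper names are hypothetical, and all formatting is lost, which is exactly the limitation the reply points out.

```python
import html
import subprocess


def pdftotext_cmd(pdf, layout=True):
    """Build a pdftotext command that writes the extracted text to stdout ("-")."""
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # try to keep the physical layout (column alignment)
    return cmd + [str(pdf), "-"]


def text_to_html(text, title="Converted PDF"):
    """Wrap extracted text in minimal HTML, one <pre> block per page."""
    # pages are separated by form feeds (\f); drop empty chunks such as the trailing one
    pages = [p for p in text.split("\f") if p.strip()]
    body = "\n<hr>\n".join(f"<pre>{html.escape(p)}</pre>" for p in pages)
    return (f"<html><head><title>{html.escape(title)}</title></head>\n"
            f"<body>\n{body}\n</body></html>\n")


def pdf_to_html(pdf):
    """Run pdftotext and return the result as a single HTML string."""
    result = subprocess.run(pdftotext_cmd(pdf), capture_output=True, text=True, check=True)
    return text_to_html(result.stdout, title=str(pdf))
```

Plain text in `<pre>` blocks is about as bandwidth-friendly as it gets, at the cost of images, fonts and tables.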

There are other tools for other situations, like latex2html...

UsualTuxpect 10-22-2004 09:09 PM

lol... man, I love this site!!!

ilhbutshm 10-23-2004 04:18 AM

Hi! Thanks for the replies, but the solutions above don't really fit my situation:

a) Google really won't do any good, because the PDF documents will be password-protected on an intranet.
b) I tried the address, and it only offers PDF -> MS .doc conversion... what I need is PDF -> JPEG/PNG/GIF or, even better, HTML.

I am currently trying to get something working with ImageMagick, which converts image files to other image formats... but I'd prefer something that converts PDF to HTML/XML or some other less bandwidth-consuming form.
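For the ImageMagick route, a minimal sketch, assuming ImageMagick's `convert` is installed along with Ghostscript (which ImageMagick calls to rasterize PDF pages). `-density` has to come before the input file to set the rendering resolution, and `-quality` sets the compression quality; lowering both is the obvious lever for the bandwidth concern. The function names and default values are illustrative assumptions.

```python
import subprocess


def imagemagick_cmd(pdf, out_pattern="page-%d.jpg", density=100, quality=75, pages=None):
    """Build an ImageMagick `convert` command that rasterizes a PDF.

    `pages` is an optional 0-based range such as "0-2", appended to the
    input name as ImageMagick's [..] frame selector.
    """
    source = str(pdf) if pages is None else f"{pdf}[{pages}]"
    return [
        "convert",
        "-density", str(density),  # DPI used when rasterizing; must precede the input
        source,
        "-quality", str(quality),  # compression quality; meaning depends on output format
        out_pattern,               # %d is replaced by the page (frame) index
    ]


def convert_pdf(pdf, **kwargs):
    """Run convert; raises CalledProcessError if ImageMagick or Ghostscript fails."""
    subprocess.run(imagemagick_cmd(pdf, **kwargs), check=True)
```

For example, `convert_pdf("report.pdf", out_pattern="p-%d.png", density=72, pages="0-2")` would render only the first three pages as PNGs at screen resolution.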


All times are GMT -5.