CMD line tool for PDF -> HTML or JPG/PNG/GIF
Is there a command-line tool that can convert PDF files to JPEG/PNG/GIF, or perhaps even HTML?
|
If the document is not secret, you could use Google to do it. Put it on a website (use an invisible link for obfuscation if needed, i.e. coloured white), put the URL in your forum sig, and wait till the bot comes round.
Regards, Samsara |
I don't know of any on Linux as yet... you could be the first to get one working under Wine.
Tell us how it goes! Last I heard, Google re-caches every 45 days or so, and when the bot comes, make sure you allow it to cache your files by editing robots.txt. :p |
Some replies I stole off a mailing list, where the same question came up:
first reply> The latest KOffice can edit PDFs; I have not tried saving as .doc afterwards, though...
original poster> Thanks, tried it. But KWord makes a bit of a mess of the PDF.
third reply> There's one breathtakingly horrible solution, but it might be the only practicable one: write a quick script which uses Ghostscript to turn the PDF into a sequence of PNGs, then outputs an RTF file with said PNGs embedded, one to a page.
third reply> RTFs (according to the v1.5 spec) can also contain "Enhanced Metafiles" or EMFs, which seems to be some kind of Microsoft vector format. I think OOo Draw can import these, but I don't know if there's any sensible way to convert PDF to EMF. EPS, it seems, is not supported :-(.
sadistic reply> Try reading "Digital Typography" by Donald Knuth.
fourth reply> If the issue is getting from LaTeX to Word, then using Hevea http://pauillac.inria.fr/~maranget/hevea/ to generate HTML as a common format is a workable, if not perfect, solution, even for moderately complex documents, although some hand editing afterwards is likely.
fifth> Check out http://www.scansoft.com/pdfconverter/
sixth> Sure: pdftotext uses the xpdf code to extract the text from a PDF document, for instance. As I said originally, it's *potentially* hard, but most of the time you can get the text out.
sixth> But the original poster wanted the full formatting, which is a bit more difficult, and RTF doesn't really cut the mustard. There are other tools for other situations, like latex2html... |
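For anyone who wants to try the Ghostscript and pdftotext suggestions quoted above, here is a minimal sketch in Python. It assumes the `gs` and `pdftotext` binaries are installed and on the PATH; the function names, the `page-%03d.png` naming pattern and the 150 dpi default are illustrative choices of mine, not anything from the mailing list.

```python
import subprocess
from pathlib import Path


def ghostscript_png_command(pdf_path, out_dir, dpi=150):
    """Build a Ghostscript command that renders each PDF page to its own PNG."""
    # Ghostscript itself expands %03d to the page number (001, 002, ...).
    out_pattern = Path(out_dir) / "page-%03d.png"
    return [
        "gs",
        "-dSAFER", "-dBATCH", "-dNOPAUSE",  # run non-interactively and exit when done
        "-sDEVICE=png16m",                   # 24-bit colour PNG output device
        f"-r{dpi}",                          # rendering resolution in dpi
        f"-sOutputFile={out_pattern}",
        str(pdf_path),
    ]


def pdftotext_command(pdf_path, txt_path):
    """Build a pdftotext command; -layout tries to keep the physical text layout."""
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def pdf_to_pngs(pdf_path, out_dir, dpi=150):
    """Run Ghostscript and return the generated PNG paths, sorted by file name."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ghostscript_png_command(pdf_path, out_dir, dpi), check=True)
    return sorted(Path(out_dir).glob("page-*.png"))
```

Calling `pdf_to_pngs("report.pdf", "pages")` (hypothetical file names) would leave one PNG per page in `pages/`. Swapping `png16m` for `jpeg` and changing the file extension gives JPEGs instead.
|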
lol... man i love this site!!!!!!!!!!!!
|
Hi! Thanks for the replies, but the above solutions are not really appropriate for my situation:
a) Google really won't do any good, because the PDF documents will be password-protected on an intranet.
b) I tried the address and it only offers PDF -> MS .doc conversion... what I need is PDF -> JPEG/PNG/GIF or, even better, HTML.
I am currently trying to accomplish something with ImageMagick, which converts image files to other image files, but I'd prefer something that converts PDF to HTML/XML or some other less bandwidth-consuming form. |
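On the ImageMagick route mentioned above, a rough sketch of what the call could look like, wrapped in Python. It assumes ImageMagick's `convert` binary is on the PATH (newer releases name it `magick`) and that Ghostscript is installed, since ImageMagick hands PDF rendering to it. The function names and the default dpi and quality values are hypothetical, picked with the bandwidth concern in mind.

```python
import subprocess


def imagemagick_command(pdf_path, out_pattern="page-%03d.jpg", dpi=100, quality=80):
    """Build an ImageMagick command that writes one JPEG per PDF page."""
    return [
        "convert",
        "-density", str(dpi),      # given before the input so it sets the rasterisation resolution
        str(pdf_path),
        "-quality", str(quality),  # JPEG quality: lower means smaller files
        out_pattern,               # %03d is replaced with the page index
    ]


def pdf_to_jpegs(pdf_path, out_pattern="page-%03d.jpg", dpi=100, quality=80):
    """Run the conversion; raises CalledProcessError if convert fails."""
    subprocess.run(imagemagick_command(pdf_path, out_pattern, dpi, quality), check=True)
```

Lowering `dpi` and `quality` shrinks the images, at the cost of legibility for small print, so the values are worth tuning against a sample document.
|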