LinuxQuestions.org


limnephilidae 06-25-2003 01:50 PM

PDF to Text Conversion
 
I am currently working on an open source project called GOOP. GOOP chews up documents and other text forms in order to create metadata. This metadata is compared and shared over a P2P network (via JXTA), and the GOOP application automatically performs data comparisons (via the metadata) with nodes that it encounters. PDF files present a challenge because I need some way to convert them to text so that GOOP can access them.


What I need: An open source script or binary to convert PDF files to text. I would love it if it was in Java but at this point I'll take anything.


Many thanks to any and all who can help.....

Lim.

goop.jxta.org

**Reposted from Linux Software section**

rshaw 06-25-2003 02:05 PM

http://www.foolabs.com/xpdf/about.html - Xpdf includes a utility (pdftotext) that extracts the text of a .pdf into a .txt file.

rshaw 06-25-2003 02:06 PM

I've never used it, so your mileage may vary.

GtkUser 06-25-2003 04:40 PM

As far as I know, PDF is not a format that should be convertible to text, because that would allow people to tamper with the original document. You can publish a PDF or PS document with OOo Writer, but thankfully you can't do anything about editing an existing PDF.

lackluster 06-26-2003 03:24 PM

... unless you have the Adobe writer ... even then, poorly made PDFs (e.g. scanned ones) won't be editable.

gare 01-03-2012 08:22 AM

solution here - pdftotext
 
http://www.cyberciti.biz/faq/convert...ormat-command/ has a walk-through on using pdftotext on Linux.
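
Since the original question asked for something usable from Java, here is a rough sketch of calling pdftotext from Java as an external process. It assumes a pdftotext binary (from Xpdf or Poppler) is installed and on the PATH. The class and method names (PdfToTextSketch, buildCommand, convert) are made up for illustration and are not part of GOOP or of pdftotext itself.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: shells out to an installed pdftotext binary.
class PdfToTextSketch {

    // Builds the argument list: pdftotext -enc UTF-8 <input.pdf> <output.txt>
    static List<String> buildCommand(Path pdf, Path txt) {
        return Arrays.asList("pdftotext", "-enc", "UTF-8",
                pdf.toString(), txt.toString());
    }

    // Runs pdftotext and waits for it to finish.
    // Throws IOException if the binary is missing or exits with a non-zero status.
    static void convert(Path pdf, Path txt) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf, txt))
                .inheritIO() // pass pdftotext's own error messages through to the console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext exited with status " + exitCode);
        }
    }
}
```

A call such as PdfToTextSketch.convert(Paths.get("paper.pdf"), Paths.get("paper.txt")) should leave the extracted text in paper.txt, where it can be read like any other text file. Scanned PDFs that contain only page images will most likely produce little or no text this way.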


All times are GMT -5.