
limnephilidae 06-25-2003 01:50 PM

PDF to Text Conversion
I am currently working on an open source project called GOOP. GOOP chews up documents and other text forms in order to create metadata. This metadata is compared and shared over a P2P network (via JXTA), and the GOOP application automatically performs data comparisons (via the metadata) with nodes that it encounters. PDF files present a challenge because I need some way to convert them to text so I can access them with GOOP.

What I need: An open source script or binary to convert PDF files to text. I would love it if it was in Java but at this point I'll take anything.

Many thanks to any and all who can help.....


**Reposted from Linux Software section**
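As a rough illustration of how a pure-Java converter could work: in most PDFs the visible text lives in page content streams (often Flate-compressed) as string operands of text-showing operators such as `Tj`. The sketch below is deliberately naive and uses only the standard library; the class name `NaivePdfText` is hypothetical. It assumes streams are either uncompressed or plain FlateDecode, and it ignores `TJ` arrays, the `'` and `"` operators, hex strings, unescaped nested parentheses, font encodings and CMaps, so output on real-world files will be incomplete or garbled. A maintained library or an external tool is the safer choice for anything serious.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class NaivePdfText {
    // Body between the "stream" and "endstream" keywords (EOL handling simplified).
    private static final Pattern STREAM =
            Pattern.compile("stream\\r?\\n(.*?)\\r?\\nendstream", Pattern.DOTALL);
    // A literal string followed by the Tj operator, e.g. (Hello) Tj.
    // Escaped characters like \) are allowed; unescaped nested parens are not.
    private static final Pattern TJ =
            Pattern.compile("\\(((?:\\\\.|[^\\\\)])*)\\)\\s*Tj");

    // Try zlib/Flate decompression; if the bytes are not a complete zlib
    // stream, return them unchanged (assumed to be uncompressed content).
    static byte[] tryInflate(byte[] data) {
        Inflater inf = new Inflater();
        inf.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && (inf.needsInput() || inf.needsDictionary())) break;
                out.write(buf, 0, n);
            }
            return inf.finished() ? out.toByteArray() : data;
        } catch (DataFormatException e) {
            return data; // not Flate-compressed (or corrupt): use as-is
        } finally {
            inf.end();
        }
    }

    // Resolve the PDF string escapes \n \r \t \ddd (octal) and \( \) \\.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) { sb.append(c); continue; }
            char n = s.charAt(++i);
            switch (n) {
                case 'n': sb.append('\n'); break;
                case 'r': sb.append('\r'); break;
                case 't': sb.append('\t'); break;
                default:
                    if (n >= '0' && n <= '7') {
                        int end = i;
                        while (end < s.length() && end < i + 3
                                && s.charAt(end) >= '0' && s.charAt(end) <= '7') end++;
                        sb.append((char) Integer.parseInt(s.substring(i, end), 8));
                        i = end - 1;
                    } else {
                        sb.append(n); // \( \) \\ and unknown escapes
                    }
            }
        }
        return sb.toString();
    }

    // Return every Tj string found in every stream, one per line.
    public static String extract(byte[] pdf) {
        // ISO-8859-1 maps each byte to one char, so binary data survives the regex.
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        StringBuilder text = new StringBuilder();
        Matcher sm = STREAM.matcher(raw);
        while (sm.find()) {
            byte[] body = sm.group(1).getBytes(StandardCharsets.ISO_8859_1);
            String content = new String(tryInflate(body), StandardCharsets.ISO_8859_1);
            Matcher tm = TJ.matcher(content);
            while (tm.find()) {
                if (text.length() > 0) text.append('\n');
                text.append(unescape(tm.group(1)));
            }
        }
        return text.toString();
    }
}
```

Typical use would be `NaivePdfText.extract(Files.readAllBytes(Path.of("paper.pdf")))`. Decoding as ISO-8859-1 is a design shortcut: it keeps a one-to-one byte/char mapping for the regex, but it also means any text in non-Latin fonts will not come out readable.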

rshaw 06-25-2003 02:05 PM

It includes a utility to convert .pdf files to .txt.

rshaw 06-25-2003 02:06 PM

never used it, your mileage may vary

GtkUser 06-25-2003 04:40 PM

As far as I know, PDF is not a format that should be convertible to text, because that would allow people to tamper with the original document. You can publish a PDF or PS document with OOo Writer, but thankfully you can't edit an existing PDF.

lackluster 06-26-2003 03:24 PM

... unless you have the Adobe writer ... even then, poorly made PDFs (i.e., scanned ones) won't be editable.

gare 01-03-2012 08:22 AM

Solution here: pdftotext has a walk-through on using pdftotext on Linux.
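Since the original request preferred Java, one option is to drive the pdftotext command-line tool from Java. A minimal sketch, assuming a pdftotext binary (from Xpdf or Poppler) is on the PATH; the class and method names `PdfToTextRunner`, `command` and `convert` are hypothetical, and this is not taken from the linked walk-through. Passing `-` as the output file makes pdftotext write the text to standard output, and `-enc UTF-8` selects the output encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

public class PdfToTextRunner {
    // Builds: pdftotext -enc UTF-8 <pdf> -
    // The trailing "-" sends the extracted text to stdout instead of a file.
    static List<String> command(Path pdf) {
        return List.of("pdftotext", "-enc", "UTF-8", pdf.toString(), "-");
    }

    // Runs pdftotext (assumed to be installed) and returns its stdout as a String.
    public static String convert(Path pdf) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(pdf));
        // Let pdftotext's warnings go to our stderr so stdout can be read fully.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();
        String text;
        try (InputStream in = p.getInputStream()) {
            text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("pdftotext exited with status " + exit);
        }
        return text;
    }
}
```

Shelling out keeps the Java side trivial and reuses a mature parser; the trade-off is an external dependency that has to be installed on every GOOP node.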

All times are GMT -5.