LinuxQuestions.org
Old 02-06-2009, 12:01 PM   #1
aarav2306
Member
 
Registered: Jan 2009
Posts: 55

Rep: Reputation: 15
PDF to ODF/Word converters


Hi,
I have some scanned PDF files that I wish to convert to Word/ODF. Any suggestions? Please help. I tried pdftotext, but it did not work.
 
Old 02-06-2009, 01:04 PM   #2
rtspitz
Member
 
Registered: Jan 2005
Location: germany
Distribution: suse, opensuse, debian, others for testing
Posts: 307

Rep: Reputation: 33
Isn't a PDF document _the_ cross-platform format for sharing documents? There are free viewers, and it looks the same everywhere. What's so bad about a PDF?

I regularly curse people for sending me .docx files.

Last edited by rtspitz; 02-06-2009 at 01:06 PM.
 
Old 02-06-2009, 01:29 PM   #3
bmsiller
LQ Newbie
 
Registered: Mar 2006
Posts: 5

Rep: Reputation: 0
If you're looking to open a PDF for editing, you can use the OpenOffice PDF Import Extension to open your PDF in OpenOffice Draw (and save it as an .odg file). I don't know of any direct way to convert from odg to odf, however.
 
Old 02-06-2009, 02:15 PM   #4
TITiAN
Member
 
Registered: Mar 2008
Location: NRW, Germany
Distribution: Arch Linux, using KDE/Plasma
Posts: 392

Rep: Reputation: 49
The PDF and PS viewer Evince (part of GNOME) allows copying text.
 
Old 02-06-2009, 02:19 PM   #5
farslayer
LQ Guru
 
Registered: Oct 2005
Location: Northeast Ohio
Distribution: linuxdebian
Posts: 7,247
Blog Entries: 5

Rep: Reputation: 191
If none of that works, convert the document to a TIFF and OCR it with tesseract.

pdf2tiff
http://python.net/~gherman/pdf2tiff.html

tesseract
http://code.google.com/p/tesseract-ocr/

At least you'll get the text that way, but probably without the formatting.
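
A minimal sketch of the TIFF-then-OCR route described above. It assumes Ghostscript (`gs`) is used for the PDF-to-TIFF step in place of pdf2tiff, that an English language pack is installed for tesseract, and that `scan.pdf` and `pages` are hypothetical names. Setting `DRY_RUN=1` only prints the commands, which is handy for checking them before anything is written.

```shell
#!/bin/sh
# Sketch: rasterise a scanned PDF to one TIFF per page, then OCR each page.
# Assumes Ghostscript (gs) and tesseract are installed; all file names are hypothetical.

# run: print a command, and execute it unless DRY_RUN=1 is set.
run() {
    printf '%s\n' "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# ocr_scanned_pdf INPUT.pdf OUTDIR
ocr_scanned_pdf() {
    pdf=$1
    outdir=$2
    run mkdir -p "$outdir" || return 1
    # 300 dpi greyscale TIFF, one file per page: page-001.tif, page-002.tif, ...
    run gs -q -dNOPAUSE -dBATCH -sDEVICE=tiffgray -r300 \
        -sOutputFile="$outdir/page-%03d.tif" "$pdf" || return 1
    for tif in "$outdir"/page-*.tif; do
        [ -e "$tif" ] || continue   # nothing was produced (for example in a dry run)
        # tesseract appends .txt to the output base name by itself
        run tesseract "$tif" "${tif%.tif}" -l eng || return 1
    done
}
```

Calling `ocr_scanned_pdf scan.pdf pages` should leave one `.txt` file next to each page image. The resolution and the `tiffgray` device are guesses at reasonable defaults; some older tesseract builds are reported to be picky about TIFF compression, so a different Ghostscript TIFF device may be needed.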
 
Old 02-13-2009, 05:42 AM   #6
aarav2306
Member
 
Registered: Jan 2009
Posts: 55

Original Poster
Rep: Reputation: 15
Hi,
Sorry, I was away for a week.
I tried Tesseract: I downloaded the archive, extracted the contents to my home folder, and ran the command
$ sh ./configure
which gave the following result:
Quote:
checking build system type... i686-pc-linux-gnu
checking host system type... i686-pc-linux-gnu
checking for cl.exe... no
checking for g++... no
checking for C++ compiler default output file name... configure: error: C++ compiler cannot create executables
See `config.log' for more details.
The config.log appears empty when I open it in a text editor.
Should I have downloaded tesseract-2.01.tar.gz instead of 2.03?

I installed Evince using Synaptic Package Manager, but how do I start using it? When I try to open a PDF document with a program other than Document Viewer, Evince is not listed as an option, and I cannot find Evince in the Applications menu.
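
The configure output quoted above reports `checking for g++... no`, which suggests that no C++ compiler is installed rather than a problem with the tesseract version. A minimal sketch of a check to run before `./configure`; the helper name `have_cxx` is hypothetical, and the suggested package name assumes Ubuntu or Debian.

```shell
#!/bin/sh
# Sketch: look for a C++ compiler before running ./configure.

# have_cxx NAME...: succeed if any named compiler is on PATH, printing its path.
have_cxx() {
    for cxx in "$@"; do
        if command -v "$cxx" >/dev/null 2>&1; then
            command -v "$cxx"
            return 0
        fi
    done
    return 1
}

if cxx_path=$(have_cxx g++ c++ clang++); then
    echo "C++ compiler found: $cxx_path"
else
    # Assumption: on Ubuntu/Debian the build-essential package pulls in g++.
    echo "No C++ compiler found; try: sudo apt-get install build-essential"
fi
```

If the second message appears, installing a compiler and re-running `sh ./configure` is the first thing to try.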
 
Old 02-13-2009, 07:41 AM   #7
TITiAN
Member
 
Registered: Mar 2008
Location: NRW, Germany
Distribution: Arch Linux, using KDE/Plasma
Posts: 392

Rep: Reputation: 49
aarav: what distro do you use? For Debian, you can install it with apt (according to this package search: deb search).
 
Old 02-13-2009, 08:15 AM   #8
farslayer
LQ Guru
 
Registered: Oct 2005
Location: Northeast Ohio
Distribution: linuxdebian
Posts: 7,247
Blog Entries: 5

Rep: Reputation: 191
When you go to open a document, you can right-click and choose "open with other application". If Evince is not in the list, select "use a custom command", then browse to the evince executable.

Code:
user@it-lenny:~$ which evince
/usr/bin/evince
After you open one PDF file with Evince, it should appear in the right-click "open with" menu for future use.
 
Old 02-15-2009, 04:43 AM   #9
aarav2306
Member
 
Registered: Jan 2009
Posts: 55

Original Poster
Rep: Reputation: 15
Hi
I am using Ubuntu 8.10. I was able to install Tesseract using Synaptic, thanks, and I am able to extract the text.
I tried Evince: it does not work with scanned images saved as PDF, but works well with other PDF files. Was that also the reason why pdftotext didn't work?
 
Old 02-15-2009, 09:02 AM   #10
TITiAN
Member
 
Registered: Mar 2008
Location: NRW, Germany
Distribution: Arch Linux, using KDE/Plasma
Posts: 392

Rep: Reputation: 49
There is also gocr (it should also be available via apt).
Could you try it and see whether it works better?
 
Old 02-15-2009, 12:48 PM   #11
farslayer
LQ Guru
 
Registered: Oct 2005
Location: Northeast Ohio
Distribution: linuxdebian
Posts: 7,247
Blog Entries: 5

Rep: Reputation: 191
Tesseract actually does a better job at OCR than GOCR does, but Tesseract requires the original document to be a TIFF file.

A scanned document converted to PDF is just an image file, so extracting text from it is pretty much impossible other than via an OCR program.
PDF files created through other means can be indexed and searched because they actually contain text, so yes, I would say that was part of your problem.

Glad you got it all working!
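
The distinction above (image-only scans versus PDFs that contain real text) can be checked before reaching for OCR. A minimal sketch, assuming `pdftotext` is installed; the function names and `scan.pdf` are hypothetical. It treats a PDF as image-only when `pdftotext` extracts nothing but whitespace.

```shell
#!/bin/sh
# Sketch: guess whether a PDF has a text layer or is only scanned images.
# Assumes pdftotext is installed; file names are hypothetical.

# count_text_chars: count the non-whitespace characters read from stdin.
count_text_chars() {
    tr -d '[:space:]' | wc -c | tr -d ' '
}

# has_text_layer FILE.pdf: succeed if pdftotext extracts any visible characters.
# Also fails when pdftotext is missing or cannot read the file.
has_text_layer() {
    n=$(pdftotext "$1" - 2>/dev/null | count_text_chars)
    [ "${n:-0}" -gt 0 ]
}
```

For example, `has_text_layer scan.pdf && echo "contains text" || echo "image only, needs OCR"`. This is only a heuristic: a scan that someone has already run through OCR carries a hidden text layer and will count as containing text.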
 
  


All times are GMT -5. The time now is 05:18 AM.
