02-06-2009, 12:01 PM
|
#1
|
Member
Registered: Jan 2009
Posts: 55
Rep:
|
PDF to ODF/Word converters
Hi,
I have some scanned PDF files that I want to convert to Word/ODF. Any suggestions? Please help. I tried pdftotext, but it did not work.
|
|
|
02-06-2009, 01:04 PM
|
#2
|
Member
Registered: Jan 2005
Location: germany
Distribution: suse, opensuse, debian, others for testing
Posts: 307
Rep:
|
Isn't a PDF document _the_ cross-platform format for sharing documents? Free viewers, and it looks the same everywhere. What's so bad about a PDF?
I regularly curse people for sending me .docx files.
Last edited by rtspitz; 02-06-2009 at 01:06 PM.
|
|
|
02-06-2009, 01:29 PM
|
#3
|
LQ Newbie
Registered: Mar 2006
Posts: 5
Rep:
|
If you're looking to open a PDF for editing, you can use the OpenOffice PDF Import Extension to open your PDF in OpenOffice Draw (and save it as an .odg file). I don't know of any direct way to convert from .odg to a text document format, however.
|
|
|
02-06-2009, 02:15 PM
|
#4
|
Member
Registered: Mar 2008
Location: NRW, Germany
Distribution: Arch Linux, using KDE/Plasma
Posts: 392
Rep:
|
The PDF and PS viewer Evince (part of GNOME) allows copying text.
|
|
|
02-13-2009, 05:42 AM
|
#6
|
Member
Registered: Jan 2009
Posts: 55
Original Poster
Rep:
|
Hi,
Sorry, I was away for a week.
I tried Tesseract: I downloaded the archive, extracted its contents to my home folder, and ran the command
$ sh ./configure
which gave the following result:
Quote:
checking build system type... i686-pc-linux-gnu
checking host system type... i686-pc-linux-gnu
checking for cl.exe... no
checking for g++... no
checking for C++ compiler default output file name... configure: error: C++ compiler cannot create executables
See `config.log' for more details.
|
The config.log is empty when I open it in a text editor.
Should I have downloaded tesseract-2.01.tar.gz instead of 2.03?
I installed Evince using Synaptic Package Manager, but how do I start using it? When I try to open a PDF document with a program other than Document Viewer, Evince isn't listed as an option, and I can't find Evince in the Applications menu.
|
|
|
02-13-2009, 07:41 AM
|
#7
|
Member
Registered: Mar 2008
Location: NRW, Germany
Distribution: Arch Linux, using KDE/Plasma
Posts: 392
Rep:
|
aarav: Which distro do you use? On Debian, you can install it with apt (according to this package search: deb search).
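For reference, a minimal sketch of the apt route on Debian or Ubuntu. The package names evince, tesseract-ocr and gocr are assumptions and may differ between releases, and missing_tools is a made-up helper name for this example. The script only reports which commands are absent and prints a suggested install line; it does not install anything itself.

```shell
#!/bin/sh
# Print each command name from the arguments that is NOT found on PATH.
# "missing_tools" is a hypothetical helper name, not part of any package.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Commands mentioned in this thread.
missing=$(missing_tools evince tesseract gocr)
if [ -n "$missing" ]; then
    echo "Not installed:" $missing
    # Assumed Debian/Ubuntu package names; the tesseract command is
    # expected to ship in the tesseract-ocr package.
    echo "Try: sudo apt-get install evince tesseract-ocr gocr"
fi
```

The same packages can be selected in Synaptic if you prefer a graphical tool.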
|
|
|
02-13-2009, 08:15 AM
|
#8
|
LQ Guru
Registered: Oct 2005
Location: Northeast Ohio
Distribution: linuxdebian
Posts: 7,249
Rep: 
|
When you go to open a document, you can right-click and choose "open with other application". If evince is not in the list, select "use a custom command" and browse to the evince executable.
Code:
user@it-lenny:~$ which evince
/usr/bin/evince
After you open one PDF file with evince, it should appear in the right-click "open with" menu for future use.
|
|
|
02-15-2009, 04:43 AM
|
#9
|
Member
Registered: Jan 2009
Posts: 55
Original Poster
Rep:
|
Hi,
I am using Ubuntu 8.10. I was able to install Tesseract using Synaptic, thanks, and I am able to extract the text.
I tried evince. It does not work with scanned images saved as PDF, but it works well with other PDF files. Was that the reason why pdftotext didn't work either?
|
|
|
02-15-2009, 09:02 AM
|
#10
|
Member
Registered: Mar 2008
Location: NRW, Germany
Distribution: Arch Linux, using KDE/Plasma
Posts: 392
Rep:
|
There is also gocr (it should also be available via apt).
Could you try it and see whether it works better?
|
|
|
02-15-2009, 12:48 PM
|
#11
|
LQ Guru
Registered: Oct 2005
Location: Northeast Ohio
Distribution: linuxdebian
Posts: 7,249
Rep: 
|
Tesseract actually does a better job at OCR than GOCR does, but Tesseract requires the original document to be a TIFF file.
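As a rough sketch of what that TIFF requirement means for a scanned PDF: render each page to a TIFF first, then run tesseract on each image. This assumes Ghostscript (gs) is installed for the rendering step. The function name ocr_scanned_pdf, the 300 dpi resolution and the page-NNN file naming are choices made for this example, not anything Tesseract itself requires.

```shell
#!/bin/sh
# Sketch: OCR a scanned PDF by rendering pages to TIFF, then running tesseract.
# Assumes gs (Ghostscript) and tesseract are on PATH.
# ocr_scanned_pdf is a hypothetical helper name used only in this example.
ocr_scanned_pdf() {
    pdf=$1        # input: scanned PDF
    out=$2        # output: plain-text file
    tmp=$(mktemp -d) || return 1

    # Render every page as a greyscale TIFF at 300 dpi (page-001.tif, ...).
    gs -q -dNOPAUSE -dBATCH -sDEVICE=tiffgray -r300 \
       -sOutputFile="$tmp/page-%03d.tif" "$pdf" || { rm -rf "$tmp"; return 1; }

    # tesseract writes its result to <outputbase>.txt.
    for tif in "$tmp"/page-*.tif; do
        [ -e "$tif" ] || continue
        tesseract "$tif" "${tif%.tif}" >/dev/null 2>&1 || { rm -rf "$tmp"; return 1; }
    done

    # Join the per-page text in page order.
    cat "$tmp"/page-*.txt > "$out"
    status=$?
    rm -rf "$tmp"
    return $status
}

# Usage: ocr_scanned_pdf scanned.pdf scanned.txt
```

The resulting text file can then be opened in OpenOffice Writer and saved as ODF or Word. Expect to proofread it, since OCR output is rarely error-free.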
A scanned document converted to PDF is just an image file, so extracting text from it is pretty much impossible other than via an OCR program.
PDF files created through other means can be indexed and searched because they actually contain text, so yes, I would say that was part of your problem.
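A quick way to tell the two kinds of PDF apart is to see whether pdftotext produces any words at all. This is a minimal sketch, assuming pdftotext is installed; has_text_layer is a made-up helper name.

```shell
#!/bin/sh
# Succeeds (exit 0) if pdftotext extracts at least one word from the PDF,
# fails otherwise. A scan saved as PDF normally yields no words.
# has_text_layer is a hypothetical helper name for this example.
has_text_layer() {
    # "-" sends the extracted text to standard output.
    words=$(pdftotext "$1" - 2>/dev/null | wc -w | tr -d ' ')
    [ "$words" -gt 0 ]
}

# Usage:
#   if has_text_layer file.pdf; then
#       pdftotext file.pdf file.txt
#   else
#       echo "no text layer: needs OCR"
#   fi
```

Note that the check also fails when pdftotext itself is missing, so a "no text layer" answer is only meaningful if the tool is installed.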
Glad you got it all working!
|
|
|
All times are GMT -5. The time now is 11:44 PM.