LinuxQuestions.org
Old 05-18-2010, 08:56 PM   #1
damgar
Senior Member
 
Registered: Sep 2009
Location: dallas, tx
Distribution: Slackware - current multilib/gsb Arch
Posts: 1,949
Blog Entries: 8

Rep: Reputation: 201
I need OCR software.


I'm looking for OCR software. I've installed gocr, and I'm not sure if I'm missing something or not, but the resulting text file was completely illegible, even though the original document was just a few sentences typed by my son's teacher.

Any recommendations?
 
Old 05-18-2010, 10:19 PM   #2
kurwongbah
Member
 
Registered: Apr 2010
Posts: 82

Rep: Reputation: 23
This has always done a pretty darn good job for me!

http://code.google.com/p/tesseract-ocr/
 
Old 05-19-2010, 12:48 AM   #3
damgar
Senior Member
 
Registered: Sep 2009
Location: dallas, tx
Distribution: Slackware - current multilib/gsb Arch
Posts: 1,949
Blog Entries: 8

Original Poster
Rep: Reputation: 201
Thanks. I've been trying to get tesseract to work all night. I finally got it to build, but I get a seg fault each time I try to test it. I'm not really sure what the problem is.
 
Old 05-19-2010, 01:15 AM   #4
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,563
Blog Entries: 29

Rep: Reputation: 1179
Quote:
Originally Posted by damgar
Thanks. I've been trying to get tesseract to work all night. I finally got it to build, but I get a seg fault each time I try to test it. I'm not really sure what the problem is.
Try giving it a really simple input file name like x.tif
 
Old 05-19-2010, 05:38 AM   #5
kurwongbah
Member
 
Registered: Apr 2010
Posts: 82

Rep: Reputation: 23
Come to think of it, I believe it was in my distro...
"yum install tesseract" did the trick!
 
Old 05-19-2010, 08:43 AM   #6
damgar
Senior Member
 
Registered: Sep 2009
Location: dallas, tx
Distribution: Slackware - current multilib/gsb Arch
Posts: 1,949
Blog Entries: 8

Original Poster
Rep: Reputation: 201
Quote:
Originally Posted by catkin
Try giving it a really simple input file name like x.tif
Same results.
Quote:
bash-4.1# tesseract /home/dtest/x.tif /home/dtest/x.txt -l eng
Tesseract Open Source OCR Engine
Segmentation fault
 
Old 05-19-2010, 10:37 AM   #7
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,563
Blog Entries: 29

Rep: Reputation: 1179
Quote:
Originally Posted by damgar
Same results.
tesseract 2.04 (built it using a slightly modified tesseract 2.03 SlackBuild) is working for me on Slackware 13.0 32-bit but did segfault on long input names. The command line that worked was tesseract z2.tif z2 (z2 is kind of catchy huh?).
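
If long names are what trips it, one workaround is to run tesseract on a short-named copy inside a scratch directory. This is only a rough sketch: ocr_short is a made-up helper name, it assumes the input already is a TIFF, and it assumes mktemp -d is available. Note that the second argument to tesseract is an output base, to which ".txt" gets appended, so passing x.txt would produce x.txt.txt.

```sh
#!/bin/sh
# ocr_short IMAGE OUT_TXT [LANG]
# Hypothetical helper: OCR a short-named copy of IMAGE, write text to OUT_TXT.
ocr_short() {
    img=$1 out=$2 lang=${3:-eng}
    work=$(mktemp -d) || return 1
    # Assumption: IMAGE is already a TIFF, so a plain copy is enough.
    cp "$img" "$work/x.tif" || { rm -rf "$work"; return 1; }
    # Run inside the scratch directory so both arguments stay short.
    # "x" is the output base; tesseract itself appends ".txt".
    ( cd "$work" && tesseract x.tif x -l "$lang" ) || { rm -rf "$work"; return 1; }
    mv "$work/x.txt" "$out"
    status=$?
    rm -rf "$work"
    return $status
}
```

Usage would be something like: ocr_short /home/dtest/x.tif /home/dtest/x.txt eng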
 
Old 05-19-2010, 07:29 PM   #8
kurwongbah
Member
 
Registered: Apr 2010
Posts: 82

Rep: Reputation: 23
Gave it a go on my work pc. I was able to install from yum.
Still seems to work reasonably well.
I remembered it was very sensitive to the input resolution and file format.
The best results I'm getting are with TIFF at 600 dpi.
How are you going?
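
For anyone who wants to try the tif/600dpi route from a PDF or another image format, here is a rough sketch using ImageMagick's convert. The function name to_ocr_tiff is made up, and the grayscale and uncompressed settings are my assumptions about what older tesseract builds prefer (as far as I recall, 2.x only reads compressed TIFF when built with libtiff). Putting -density before the input only sets the rasterization resolution for vector input such as PDF; a low-resolution scan is better rescanned than upsampled.

```sh
#!/bin/sh
# to_ocr_tiff IN OUT.tif
# Hypothetical helper: produce a 600 dpi, grayscale, uncompressed TIFF.
to_ocr_tiff() {
    # -density 600 : rasterize vector input (e.g. PDF) at 600 dpi
    # -colorspace Gray, -compress None : assumptions, see the note above
    convert -density 600 "$1" -colorspace Gray -compress None "$2"
}
```

Usage would be something like: to_ocr_tiff letter.pdf x.tif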
 
Old 05-19-2010, 09:13 PM   #9
damgar
Senior Member
 
Registered: Sep 2009
Location: dallas, tx
Distribution: Slackware - current multilib/gsb Arch
Posts: 1,949
Blog Entries: 8

Original Poster
Rep: Reputation: 201
Quote:
Originally Posted by catkin
tesseract 2.04 (built it using a slightly modified tesseract 2.03 SlackBuild) is working for me on Slackware 13.0 32-bit but did segfault on long input names. The command line that worked was tesseract z2.tif z2 (z2 is kind of catchy huh?).
Yes, I do like the name. I'm thinking it's probably a slackware-almost-current+tesseract issue. I had to do some manual patching to both the source and the slackbuild, the slacky.eu packages give errors about libjpeg versions, and their slackbuild does a weird timeout thing. I don't have the time it would likely take me to figure this out, but with slack 13.1 just around the corner I'm hoping the slackbuild maintainers will know more about it than I do.
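
A quick way to see which image libraries a given tesseract binary is linked against, and whether any are unresolved, is ldd. This is just a diagnostic sketch; linked_image_libs is a made-up name, and it will not by itself explain a version mismatch reported at run time.

```sh
#!/bin/sh
# linked_image_libs BINARY
# Hypothetical helper: show the libjpeg/libtiff lines and any unresolved
# libraries from ldd's output for BINARY.
linked_image_libs() {
    ldd "$1" 2>/dev/null | grep -E 'libjpeg|libtiff|not found'
}
```

Usage would be something like: linked_image_libs "$(command -v tesseract)"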
 
Old 05-20-2010, 04:27 AM   #10
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,563
Blog Entries: 29

Rep: Reputation: 1179
On standard Slackware 13.0 the build was very simple. All I did to modify the SlackBuild from 2.03 to 2.04 was edit tesseract.SlackBuild:
  • changed the version.
  • removed the patch commands.
I put the desired language file (tesseract-2.00.eng.tar.gz) in the build directory in the normal way and ran the modified tesseract.SlackBuild.

Hopefully you are right and all will be well for you on Slackware 13.1.
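
The two edits described above could be sketched with sed roughly like this. It is only an illustration: bump_slackbuild is a made-up name, it assumes the script sets the version on a line starting with VERSION= and that the patch commands start their own lines, and it comments the patch lines out instead of deleting them so the change is easy to review. The result goes to stdout, not back into the file.

```sh
#!/bin/sh
# bump_slackbuild FILE OLD NEW
# Hypothetical helper: print FILE with the version bumped from OLD to NEW
# and any line beginning with "patch " commented out.
bump_slackbuild() {
    src=$1 old=$2 new=$3
    # Escape the dots so "2.03" is matched literally by sed.
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    sed -e "s/^VERSION=\(.*\)$old_re/VERSION=\1$new/" \
        -e 's/^[[:space:]]*patch /# &/' "$src"
}
```

Usage would be something like: bump_slackbuild tesseract.SlackBuild 2.03 2.04 > tesseract-2.04.SlackBuild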
 
Old 09-30-2010, 04:56 PM   #11
ilnli
Member
 
Registered: Jul 2004
Location: Pakistan
Distribution: Slackware 10.0, SUSE 9.1, RH 7, 7.3, 8, 9, FC2
Posts: 413

Rep: Reputation: 32
Or you can use http://www.ocrconvert.com online to convert your PDF file into text. I've found it very fast, and the conversion is quite accurate.
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration