08-04-2004, 09:16 PM
|
#1
|
|
Member
Registered: Aug 2004
Distribution: Ubuntu 8.04.1 desk; Red Hat 9.0 server
Posts: 291
Rep:
|
Best OCR application?
Hi--
I would like to do a lot of document imaging in my small office.
What is the best OCR app for use with scanned images in Linux?
|
|
|
|
08-04-2004, 09:19 PM
|
#2
|
|
Moderator
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,903
|
I only know GOCR and its front end Kooka ...
Not overly exciting if you've ever used OmniPage
or Recognita.
[edit]
Did a freshmeat.net search:
http://freshmeat.net/search/?q=%2BOC...y_percent_DESC
[/edit]
Cheers,
Tink
Last edited by Tinkster; 08-04-2004 at 09:24 PM.
|
|
|
|
08-04-2004, 09:33 PM
|
#3
|
|
Member
Registered: Aug 2004
Distribution: Ubuntu 8.04.1 desk; Red Hat 9.0 server
Posts: 291
Original Poster
Rep:
|
Tink--
Thanks for doing that search! As I spend more time with Linux, I will know where to go to do those sorts of searches.
BTW, what is the reference to kiwis in your sig file?
|
|
|
|
08-04-2004, 09:43 PM
|
#4
|
|
Moderator
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,903
|
New Zealanders are commonly referred to as Kiwis,
the Kiwi bird is their national symbol.
Cheers,
Tink
|
|
|
|
09-30-2010, 08:14 PM
|
#5
|
|
Member
Registered: Jul 2004
Location: Pakistan
Distribution: Slackware 10.0, SUSE 9.1, RH 7, 7.3, 8, 9, FC2
Posts: 406
Rep:
|
Quote:
Originally Posted by dgermann
Tink--
Thanks for doing that search! As I spend more time with Linux, I will know where to go to do those sorts of searches.
BTW, what is the reference to kiwis in your sig file?
|
Use tesseract-ocr, or if you want an easy-to-use service, use this online optical character recognition tool.
|
|
|
|
09-30-2010, 09:35 PM
|
#6
|
|
Member
Registered: Aug 2004
Distribution: Ubuntu 8.04.1 desk; Red Hat 9.0 server
Posts: 291
Original Poster
Rep:
|
ilnli--
Thanks!
Glad you found this thread! I am still looking.
What I have done in the interim is use a WinXP machine for both the scanning and the OCR work, via ABBYY FineReader. That has been satisfactory for me, but it is about the only daily reason I have to keep any Windows-based machine on the premises.
I'll have to check into how reliable tesseract-ocr is currently. I cannot use the online service because of confidentiality needs. Do you use either?
:- Doug.
|
|
|
|
10-01-2010, 05:23 AM
|
#7
|
|
Member
Registered: Jul 2004
Location: Pakistan
Distribution: Slackware 10.0, SUSE 9.1, RH 7, 7.3, 8, 9, FC2
Posts: 406
Rep:
|
I've used tesseract-ocr, which is good, but it runs on Linux and you have to tweak your images to get good results from it. As the stuff I work on is confidential, I mainly used ocrconvert.com, which works for me.
|
|
|
|
10-01-2010, 05:46 AM
|
#8
|
|
Senior Member
Registered: Oct 2005
Location: UK
Distribution: Slackware
Posts: 1,843
Rep: 
|
The best out-of-the-box solution I've found is WatchOCR. It's a liveCD distro whose sole purpose is OCR. You put your images in a watch directory, and then a little script converts them into searchable PDFs. With some tweaking, it ought to be possible to save the text as well as the searchable PDF. For OCR it uses Cuneiform, and layout analysis is done with ExactCode.
It's presumably possible to get Cuneiform and ExactCode installed on an existing system, though my understanding is that Cuneiform is difficult to get working.
Otherwise, there's OCRopus, which I haven't used, but which seems promising.
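For anyone curious what a watch-directory script of that kind might look like, here is a minimal sketch in Python. It is not WatchOCR's actual script, which I have not seen. The names `process_new_images`, `watch_forever`, and `IMAGE_SUFFIXES` are made up for illustration, and `convert` is a hypothetical callback standing in for whatever OCR and PDF commands you end up using.

```python
import time
from pathlib import Path

# Extensions treated as scanned pages; an assumption, adjust to your scanner.
IMAGE_SUFFIXES = {".png", ".tif", ".tiff", ".jpg", ".pnm"}


def process_new_images(watch_dir, done_dir, convert):
    """Run convert() on each image in watch_dir, then move it to done_dir.

    convert is a hypothetical callback that receives the image Path; it
    stands in for the real OCR/PDF step. Returns the names processed.
    """
    watch_dir, done_dir = Path(watch_dir), Path(done_dir)
    done_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    # sorted() takes a snapshot, so moving files during the loop is safe.
    for image in sorted(watch_dir.iterdir()):
        if image.is_file() and image.suffix.lower() in IMAGE_SUFFIXES:
            convert(image)
            image.rename(done_dir / image.name)  # so it is not picked up again
            processed.append(image.name)
    return processed


def watch_forever(watch_dir, done_dir, convert, interval=5.0):
    """Poll the watch directory every `interval` seconds until interrupted."""
    while True:
        process_new_images(watch_dir, done_dir, convert)
        time.sleep(interval)
```

Polling is the simplest thing that works on any filesystem. Files that are not images are left where they are, and processed images are moved out of the way so a second pass does not convert them again.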
|
|
|
|
10-01-2010, 09:45 AM
|
#9
|
|
Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,614
|
I've used tesseract and ocrad in the past, and you can get decent quality out of them if the input quality is good. Also check unpaper:
http://unpaper.berlios.de/
It will help the OCR work better. Sometimes you can also help it by using image filters like white balance and auto-levels, etc.
I don't think you can get results as good as, say, ABBYY, but it can be close if the input is good.
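Roughly, the unpaper-then-tesseract sequence could be scripted like the sketch below. Treat it as an illustration under assumptions: the scan is a PNM-family image (which is what unpaper reads), your tesseract build can read that format too, and the `-clean` file naming and the function names `build_ocr_commands` and `run_ocr` are my own inventions. It only uses the basic forms `unpaper INPUT OUTPUT` and `tesseract IMAGE OUTPUTBASE`, with no extra options.

```python
import subprocess
from pathlib import Path


def build_ocr_commands(scan, workdir="."):
    """Return two argv lists: the unpaper cleanup, then the tesseract OCR.

    scan should be a PNM-family image (.pbm/.pgm/.ppm). The "-clean"
    naming scheme is an illustrative assumption.
    """
    scan = Path(scan)
    cleaned = Path(workdir) / f"{scan.stem}-clean{scan.suffix}"
    text_base = Path(workdir) / scan.stem  # tesseract appends ".txt" itself
    return [
        ["unpaper", str(scan), str(cleaned)],
        ["tesseract", str(cleaned), str(text_base)],
    ]


def run_ocr(scan, workdir="."):
    """Run both steps in order, raising if either tool reports an error."""
    for argv in build_ocr_commands(scan, workdir):
        subprocess.run(argv, check=True)
    return Path(workdir) / f"{Path(scan).stem}.txt"
```

Calling `run_ocr("page.pgm")` would need both tools installed and should leave the recognised text in `page.txt`. Keeping the command construction separate from the execution makes it easy to print the commands first and check them by hand.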
|
|
|
|
10-12-2010, 09:02 PM
|
#10
|
|
Member
Registered: Aug 2004
Distribution: Ubuntu 8.04.1 desk; Red Hat 9.0 server
Posts: 291
Original Poster
Rep:
|
H_TeXMeX_H--
Thanks!
I had not heard of unpaper before. I see it is in the repos for Ubuntu.
All of this stuff together still looks a little much for our production environment. We scan some pages every day, maybe only a dozen or two on most days, but then there are some days when we need to scan a hundred or so in an hour. It is important to be able to do reliable searches on the scanned documents.
So far, it sounds like having the scanner attached to a WinXP machine using ABBYY is still the easiest thing to have a non-technical person running: she merely feeds the paper in, chooses in the GUI whether to scan one side or two, then lets it rip. When all are scanned, she comes back to the ABBYY main screen and saves them to a file after the OCR does its work. Pretty simple, and it allows some turning of pages and rearranging the order of pages.
I think CLI would blow a couple of my people away!
Oh well, another reason to keep at least one Windows box on the system for another year or two....
Thanks, H_TeXMeX_H!
|
|
|
|
10-13-2010, 02:17 PM
|
#11
|
|
Member
Registered: Jul 2006
Location: Belgrade, Serbia
Distribution: Debian
Posts: 571
Rep:
|
I tried tesseract but it was a disappointment. It couldn't OCR a .png screenshot.
ABBYY FineReader is probably the best, but not free. They even charge by page!
|
|
|
|
10-13-2010, 09:11 PM
|
#12
|
|
Member
Registered: Aug 2004
Distribution: Ubuntu 8.04.1 desk; Red Hat 9.0 server
Posts: 291
Original Poster
Rep:
|
qrange--
Thanks for that tip about tesseract, qrange!
I have never had a per-page charge from ABBYY, so I'm not sure what you're experiencing. It is a really good program. I just wish it were available for Linux. Hopefully some day soon--they have an SDK for Linux.
:- Doug.
|
|
|
|
10-14-2010, 01:17 AM
|
#13
|
|
Member
Registered: Jul 2006
Location: Belgrade, Serbia
Distribution: Debian
Posts: 571
Rep:
|
@dgermann
Well, I was talking about the Linux version (it exists!):
http://www.ocr4linux.com/en ricing
There's a trial version.
|
|
|
|
10-14-2010, 07:39 PM
|
#14
|
|
Member
Registered: Aug 2004
Distribution: Ubuntu 8.04.1 desk; Red Hat 9.0 server
Posts: 291
Original Poster
Rep:
|
qrange--
OIC! Thanks!
It does appear to actually be an ABBYY site. But I agree it is pretty pricey. At 12,000 pages per year it prices out to 1.75 cents per page at current exchange rates.
That's a lot particularly since you can buy it for Windows and have it forever, for $400--about 2 years' cost of the Linux version.
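To spell out that arithmetic, here is a quick check in Python using only the figures quoted above (12,000 pages per year, 1.75 cents per page, $400 for the Windows version). The variable names are just for illustration.

```python
PAGES_PER_YEAR = 12_000       # yearly page volume quoted above
CENTS_PER_PAGE = 1.75         # per-page price at the exchange rate of the time
WINDOWS_LICENSE_USD = 400     # one-time purchase price quoted above

# Yearly cost of the per-page Linux pricing, converted from cents to dollars.
linux_usd_per_year = PAGES_PER_YEAR * CENTS_PER_PAGE / 100

# Years of Linux per-page charges that add up to one Windows purchase.
break_even_years = WINDOWS_LICENSE_USD / linux_usd_per_year

print(f"Linux version: ${linux_usd_per_year:.2f} per year")
print(f"Windows purchase equals {break_even_years:.1f} years of that")
```

That works out to $210.00 per year and about 1.9 years, which is where the "about 2 years' cost" figure comes from.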
Knowing that there is a Linux version gives me hope that there will be reasonable pricing and perhaps some other commercial products soon. And maybe a GUI Linux version!
Thanks, qrange, for pointing this out!
|
|
|
|
All times are GMT -5.