LinuxQuestions.org
08-04-2004, 09:16 PM   #1
dgermann
Member
 
Registered: Aug 2004
Distribution: Ubuntu 8.04.1 desk; Red Hat 9.0 server
Posts: 296

Rep: Reputation: 30
Best OCR application?


Hi--

I would like to do a lot of document imaging in my small office.

What is the best OCR app for use with scanned images in Linux?
 
08-04-2004, 09:19 PM   #2
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,965
Blog Entries: 11

Rep: Reputation: 865
I only know GOCR and the front-end Kooka ...
Not overly exciting if you've ever used OmniPage
or Recognita.

[edit]
Did a freshmeat.net search:
http://freshmeat.net/search/?q=%2BOC...y_percent_DESC
[/edit]


Cheers,
Tink

Last edited by Tinkster; 08-04-2004 at 09:24 PM.
 
08-04-2004, 09:33 PM   #3
dgermann
Member
 
Registered: Aug 2004
Distribution: Ubuntu 8.04.1 desk; Red Hat 9.0 server
Posts: 296

Original Poster
Rep: Reputation: 30
Tink--

Thanks for doing that search! As I spend more time with Linux, I will know where to go to do those sorts of searches.

BTW, what is the reference to kiwis in your sig file?
 
08-04-2004, 09:43 PM   #4
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,965
Blog Entries: 11

Rep: Reputation: 865
New Zealanders are commonly referred to as Kiwis;
the kiwi bird is their national symbol.


Cheers,
Tink
 
09-30-2010, 08:14 PM   #5
ilnli
Member
 
Registered: Jul 2004
Location: Pakistan
Distribution: Slackware 10.0, SUSE 9.1, RH 7, 7.3, 8, 9, FC2
Posts: 413

Rep: Reputation: 32
Quote:
Originally Posted by dgermann View Post
Tink--

Thanks for doing that search! As I spend more time with Linux, I will know where to go to do those sorts of searches.

BTW, what is the reference to kiwis in your sig file?

Use tesseract-ocr or, if you want an easy-to-use service, use this online optical character recognition tool.
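
For anyone who wants to try tesseract locally, so that nothing leaves the machine, here is a minimal sketch in Python. It assumes the `tesseract` command-line program is installed and on the PATH. The helper names `build_tesseract_command` and `ocr_to_text` are made up for illustration and are not part of tesseract.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, lang="eng"):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_to_text(image, lang="eng"):
    """OCR one image on the local machine and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    image = Path(image)
    output_base = image.with_suffix("")  # e.g. scan.png -> scan
    subprocess.run(build_tesseract_command(image, output_base, lang), check=True)
    # tesseract appends ".txt" to the output base on its own
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `ocr_to_text("scan.png")` (a hypothetical file name) would write `scan.txt` next to the image and return its contents. How good the text is depends heavily on the quality of the scan.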
 
09-30-2010, 09:35 PM   #6
dgermann
Member
 
Registered: Aug 2004
Distribution: Ubuntu 8.04.1 desk; Red Hat 9.0 server
Posts: 296

Original Poster
Rep: Reputation: 30

ilnli--

Thanks!

Glad you found this thread! I am still looking.

What I have done for the interim is to use a WinXP machine to support both the scanning and then the OCR work via ABBYY FineReader. That has been satisfactory for me, but it is about the only daily reason I have to have any Windows-based machine on the premises.

I'll have to check into how reliable tesseract-ocr currently is. I cannot use the online service because of confidentiality needs. Do you use either?

:- Doug.
 
10-01-2010, 05:23 AM   #7
ilnli
Member
 
Registered: Jul 2004
Location: Pakistan
Distribution: Slackware 10.0, SUSE 9.1, RH 7, 7.3, 8, 9, FC2
Posts: 413

Rep: Reputation: 32
I've used tesseract-ocr, which is good, but it runs on Linux and you have to tweak your images to get good results from it. As the stuff I work on is confidential, I mainly used ocrconvert.com, which works for me.
 
10-01-2010, 05:46 AM   #8
pwc101
Senior Member
 
Registered: Oct 2005
Location: UK
Distribution: Slackware
Posts: 1,847

Rep: Reputation: 128
The best out-of-the-box solution I've found is WatchOCR. It's a liveCD distro whose sole purpose is OCR. You put your images in a watch directory, and then a little script converts them into searchable PDFs. With some tweaking, it ought to be possible to save the text as well as the searchable PDF. For OCR it uses Cuneiform, and layout analysis is done with ExactCode.
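
The watch-directory idea can be sketched in a few lines of Python. This is not WatchOCR's actual script: it swaps in the `tesseract` command line instead of Cuneiform, and it assumes a Tesseract build recent enough to support the `pdf` and `txt` output configs. The function names, the polling interval, and the list of file suffixes are all illustrative assumptions.

```python
import subprocess
import time
from pathlib import Path

# Assumed set of scan formats; adjust to whatever the scanner produces.
IMAGE_SUFFIXES = {".png", ".tif", ".tiff", ".jpg", ".jpeg"}


def pending_images(watch_dir, done_dir):
    """List images in watch_dir that have no matching PDF in done_dir yet."""
    watch_dir, done_dir = Path(watch_dir), Path(done_dir)
    return sorted(
        p for p in watch_dir.iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
        and not (done_dir / (p.stem + ".pdf")).exists()
    )


def convert(image, done_dir):
    """Write a searchable PDF and a plain-text file for one image."""
    output_base = Path(done_dir) / Path(image).stem
    # "pdf" and "txt" are tesseract output configs; each adds its own extension.
    subprocess.run(
        ["tesseract", str(image), str(output_base), "pdf", "txt"],
        check=True,
    )


def watch(watch_dir, done_dir, interval_seconds=5.0):
    """Poll forever, converting each image that has not been converted yet."""
    Path(done_dir).mkdir(parents=True, exist_ok=True)
    while True:
        for image in pending_images(watch_dir, done_dir):
            convert(image, done_dir)
        time.sleep(interval_seconds)
```

Something like `watch("/srv/scans/incoming", "/srv/scans/pdf")` (hypothetical paths) would then run in the background while the operator just drops scans into the incoming folder.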

It's presumably possible to get Cuneiform and ExactCode installed on an existing system, though my understanding is that Cuneiform is difficult to get working.

Otherwise, there's OCRopus, which I haven't used but which seems promising.
 
10-01-2010, 09:45 AM   #9
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1269
I've used tesseract and ocrad in the past, and you can get decent quality out of them if the input quality is good. Also check unpaper:
http://unpaper.berlios.de/

It will help the OCR work better. Sometimes you can also help it by using image filters like white balance and auto-levels, etc.
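
To show what an auto-levels filter does in its simplest form, here is a small pure-Python sketch working on a plain list of 8-bit grayscale values. Real tools usually do something more refined (for example clipping outliers first), so treat `auto_levels` and `binarize` as illustrative helpers, not as what unpaper or any image editor actually runs.

```python
def auto_levels(pixels):
    """Stretch grayscale values linearly so the darkest becomes 0 and the brightest 255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)  # flat image: nothing to stretch
    return [(v - lo) * 255 // (hi - lo) for v in pixels]


def binarize(pixels, threshold=128):
    """Turn grayscale values into pure black (0) or white (255)."""
    return [255 if v >= threshold else 0 for v in pixels]


# A washed-out scan: "ink" at 90-120, "paper" at 180-210.
washed_out = [90, 120, 180, 210]
stretched = auto_levels(washed_out)   # [0, 63, 191, 255]
print(binarize(stretched))            # [0, 0, 255, 255]
```

After stretching, ink and paper sit much further apart, which gives the OCR engine a cleaner separation between text and background.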

I don't think you can get as good as, say, ABBYY, but it can be close if the input is good.
 
10-12-2010, 09:02 PM   #10
dgermann
Member
 
Registered: Aug 2004
Distribution: Ubuntu 8.04.1 desk; Red Hat 9.0 server
Posts: 296

Original Poster
Rep: Reputation: 30

H_TeXMeX_H--

Thanks!

I had not heard of unpaper before. I see it is in the repos for Ubuntu.

All of this stuff together still looks a little much for our production environment. We scan some pages every day, maybe only a dozen or two on most days, but then there are some days when we need to scan a hundred or so in an hour. It is important to be able to do reliable searches on the scanned documents.

So far, it sounds like having the scanner attached to a WinXP machine using ABBYY is still the easiest thing to have a non-technical person running: she merely feeds the paper in, chooses in the GUI whether to scan one side or two, then lets it rip. When all are scanned, she comes back to the ABBYY main screen and saves them to a file after the OCR does its work. Pretty simple, and it allows some turning of pages and rearranging the order of pages.

I think CLI would blow a couple of my people away!

Oh well, another reason to keep at least one Windows box on the system for another year or two....

Thanks, H_TeXMeX_H!
 
10-13-2010, 02:17 PM   #11
qrange
Member
 
Registered: Jul 2006
Location: Belgrade, Yugoslavia
Distribution: Debian
Posts: 719

Rep: Reputation: 29
I tried tesseract but it was a disappointment. It couldn't OCR a .png screenshot.
ABBYY FineReader is probably the best, but not free. They even charge by page!
 
10-13-2010, 09:11 PM   #12
dgermann
Member
 
Registered: Aug 2004
Distribution: Ubuntu 8.04.1 desk; Red Hat 9.0 server
Posts: 296

Original Poster
Rep: Reputation: 30

qrange--

Thanks for that tip about tesseract, qrange!

Have never had a per-page charge from ABBYY, so not sure what you're experiencing. It is a really good program. Just wish it were available in Linux. Hopefully some day soon--they have an SDK for Linux.

:- Doug.
 
10-14-2010, 01:17 AM   #13
qrange
Member
 
Registered: Jul 2006
Location: Belgrade, Yugoslavia
Distribution: Debian
Posts: 719

Rep: Reputation: 29
@dgermann

Well, I was talking about the Linux version (it exists!):
http://www.ocr4linux.com/enricing

There's a trial version.
 
10-14-2010, 07:39 PM   #14
dgermann
Member
 
Registered: Aug 2004
Distribution: Ubuntu 8.04.1 desk; Red Hat 9.0 server
Posts: 296

Original Poster
Rep: Reputation: 30

qrange--

OIC! Thanks!

It does appear to actually be an ABBYY site. But I agree it is pretty pricey. At 12,000 pages per year it prices out to 1.75 cents per page at current exchange rates.

That's a lot, particularly since you can buy it for Windows and have it forever, for $400--about 2 years' cost of the Linux version.
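
A quick check of that comparison, using only the figures quoted in this post (12,000 pages per year, 1.75 cents per page, $400 for the Windows license). The variable names are just for illustration, and the yearly total is derived from those numbers rather than taken from a price list.

```python
PAGES_PER_YEAR = 12_000
CENTS_PER_PAGE = 1.75
WINDOWS_LICENSE_USD = 400

# Yearly cost of the per-page Linux pricing, in dollars.
linux_cost_per_year = PAGES_PER_YEAR * CENTS_PER_PAGE / 100

# How many years of Linux pricing equal one Windows license.
break_even_years = WINDOWS_LICENSE_USD / linux_cost_per_year

print(f"Linux version: ${linux_cost_per_year:.2f} per year")
print(f"Windows license equals {break_even_years:.1f} years of the Linux version")
```

That works out to $210 per year, so the one-time Windows price is recovered in a little under two years.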

Knowing that there is a Linux version gives me hope that there will be reasonable pricing and perhaps some other commercial products soon. And maybe a GUI Linux version!

Thanks, qrange, for pointing this out!
 
All times are GMT -5.