LinuxQuestions.org
02-12-2013, 04:55 PM   #1
rnturn
Senior Member
 
Registered: Jan 2003
Location: Illinois (SW Chicago 'burbs)
Distribution: openSUSE, Raspbian, Slackware. Older: Coherent, MacOS, Red Hat, Big Iron IXs: AIX, Solaris, Tru64
Posts: 2,181

Rep: Reputation: 365
Looking for OCR software tips. Anybody got any?


I have some very old typewritten documents that I've been trying to scan and convert to text using xsane->PBM->ocrad, and I've been getting really, really awful results. So bad, in fact, that retyping the documents looks like a better way to go than wasting any more time on OCR. I realize that some cleanup of the OCR output is almost a given, but what I'm seeing is more like 99% of it would have to be rekeyed. I haven't used any OCR software since the Win3.11 days and, while OCR results weren't 100% accurate back then, they were orders of magnitude better than what I'm seeing now. I would have expected typewritten text to be much easier to convert than, say, a photocopied magazine article with proportional fonts, kerning, etc.

"ocrad" recommends having at least 20 pixels per character. I've scanned the original documents at resolutions ranging from 128dpi to 2400dpi (maybe excessive, I know) and the results stink no matter what.
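
A rough way to check that guideline against a given scan resolution is to work out how many pixels one character spans. The sketch below is only illustrative: it assumes the 20-pixel figure refers to character height, and the 0.080 inch letter height is a placeholder value, not a measurement from these documents. The function names are hypothetical.

```python
def pixels_per_char(dpi, char_height_mils):
    """Pixels covered by one character at a given scan resolution.

    char_height_mils is the printed character height in thousandths
    of an inch (an assumed value; measure it on the real page).
    """
    return dpi * char_height_mils / 1000


def min_dpi(char_height_mils, target_px=20):
    """Smallest whole-number resolution that yields target_px pixels.

    Uses integer ceiling division to avoid floating-point rounding.
    """
    return -(-target_px * 1000 // char_height_mils)


if __name__ == "__main__":
    assumed_height = 80  # 0.080 inch, placeholder for a typewritten letter
    for dpi in (128, 300, 600):
        print(dpi, "dpi ->", pixels_per_char(dpi, assumed_height), "px")
    print("minimum dpi for 20 px:", min_dpi(assumed_height))
```

Under those assumptions a 128 dpi scan gives only about 10 pixels per character, roughly half the guideline, while anything from about 250 dpi upward meets it.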

Has anyone used this combination of software and gotten reasonable results? Is there a better OSS OCR package than ocrad?

TIA...

--
Rick
 
02-12-2013, 10:34 PM   #2
ArfaSmif
Member
 
Registered: Oct 2008
Location: Brisbane Australia
Distribution: Fedora, Centos
Posts: 298

Rep: Reputation: 65
I haven't used any of these, but the following are "popular" and noted in the literature. You can try "tesseract" and/or "gocr"; both are command-line tools. There are a few GUIs for these command-line OCR engines, for example "gImageReader" and "OcrGui". It looks like you use an RPM-based Linux, so you may get lucky at rpmfind.net or rpm.pbone.net. Good luck. Hope it helps. Let us know how you go.
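
One way to compare the engines mentioned in this thread on the same scan is to run each in turn and read the resulting text files side by side. The following is a minimal sketch, assuming the tools are installed under their usual command names and that the input image is in a format each of them accepts (the PBM file name is only an example). The helper names `build_commands` and `run_available` are hypothetical.

```python
import shutil
import subprocess


def build_commands(image, out_prefix):
    """Return one command line per OCR engine, keyed by engine name."""
    return {
        # tesseract takes an output base name and appends ".txt" itself
        "tesseract": ["tesseract", image, f"{out_prefix}_tesseract"],
        # gocr: -i names the input image, -o the output text file
        "gocr": ["gocr", "-i", image, "-o", f"{out_prefix}_gocr.txt"],
        # ocrad: -o names the output text file
        "ocrad": ["ocrad", "-o", f"{out_prefix}_ocrad.txt", image],
    }


def run_available(image, out_prefix):
    """Run every engine found on PATH; map name -> exit code or None."""
    results = {}
    for name, cmd in build_commands(image, out_prefix).items():
        if shutil.which(cmd[0]) is None:
            results[name] = None  # engine not installed, skipped
            continue
        results[name] = subprocess.run(cmd).returncode
    return results
```

Calling `run_available("page.pbm", "page")` would leave one text file per installed engine next to the scan, which makes it easy to see whether any of them copes better with a particular page.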
 
02-12-2013, 10:42 PM   #3
jefro
Moderator
 
Registered: Mar 2008
Posts: 20,304

Rep: Reputation: 3213
I've tried all the free Linux OCR software (that I know of) without success. I have not tested any of the current Windows apps, either in Wine or in Windows.

I too played with some very old Windows OCR and a few very specialized OCR uses. I have no idea why the current Linux OCR is so bad. Remember when you could actually see it trying to resolve each single character?

At one time we used special type disks or balls for typing in OCRX or some of the other fonts.

I gave up, then found that a not-for-profit I help has a Xerox copier that somehow gets fantastic results after a few minutes of waiting.
 
02-13-2013, 03:06 AM   #4
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1292
Use tesseract, make sure the scanned images are lined up properly (not skewed), and use GIMP to get the pages as close to pure black text on a white background as possible, reducing noise if needed.

Possibly useful:
https://github.com/Flameeyes/unpaper
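
The contrast cleanup described above can also be done without a GUI. As a minimal sketch, the code below picks a global threshold with Otsu's method (a swapped-in standard technique, not something GIMP or unpaper is claimed to use) and writes a plain-text PBM in which ink is black and paper is white. It assumes the page has already been loaded as rows of 8-bit gray values; loading the scan and removing noise are left out, and the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that best separates ink from paper.

    Chooses the level that maximizes the between-class variance of the
    two groups: pixels <= level (dark) and pixels > level (light).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_var = 0, -1.0
    w_dark = 0    # number of pixels at or below the candidate level
    sum_dark = 0  # sum of their gray values
    for level in range(256):
        w_dark += hist[level]
        sum_dark += level * hist[level]
        if w_dark == 0:
            continue
        w_light = total - w_dark
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_level = var_between, level
    return best_level


def to_pbm(rows, threshold):
    """Convert rows of gray values to plain PBM text (1 = black ink)."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    lines = ["P1", f"{width} {height}"]
    for row in rows:
        lines.append(" ".join("1" if p <= threshold else "0" for p in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    page = [[30, 40, 220], [210, 35, 230]]  # tiny made-up sample
    level = otsu_threshold([p for row in page for p in row])
    print(to_pbm(page, level), end="")
```

A single global threshold is a simplification: pages with uneven lighting or yellowed edges may need a locally adaptive threshold instead, so treat the result as a starting point to compare against a GIMP-cleaned page.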
 
  


All times are GMT -5.
