LinuxQuestions.org
Old 11-22-2010, 11:24 AM   #1
taylorkh
Senior Member
 
Registered: Jul 2006
Location: North Carolina
Distribution: CentOS 6, CentOS 7 (with Mate), Ubuntu 16.04 Mate
Posts: 2,127

Rep: Reputation: 174
gscan2pdf - does an image at the top of a document prevent OCR?


Just installed gscan2pdf 0.9.20 on Ubuntu 10.04. It is hooked up to a Brother MFC 240c scanner, which scans fine with xsane and also with gscan2pdf. I started with a document I printed to my laser printer - B&W at 300 dpi. GOCR produced very poor results. I switched to Tesseract and the OCR was 100% accurate.

Then I tried to scan the document I was actually interested in converting to a searchable PDF. NOTHING! The only differences I see between the two documents are as follows:

- document 2 has an image (State seal) at the top
- document 2 has various sizes of type
- document 2 has a signature in ink

I would have expected the program to at least OCR some of the document. It should have been able to find SOME text.

I have tried increasing the resolution of the scan to 400 then 600 dpi. No help. Set it to 1200 dpi - still waiting for the OCR to run.
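
A rough sketch of why the 1200 dpi run takes so long: the pixel count grows with the square of the resolution. The page size below (US Letter, 8.5 x 11 in) is an assumption, as is the function name; the thread does not say what paper was scanned.

```python
def scan_pixels(dpi, width_in=8.5, height_in=11.0):
    """Return (width_px, height_px, total_px) for a page scanned at `dpi`.

    The default page size is US Letter, which is an assumption.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px


if __name__ == "__main__":
    base = scan_pixels(300)[2]
    for dpi in (300, 400, 600, 1200):
        w, h, total = scan_pixels(dpi)
        # Show how much more data the OCR engine has to chew through
        print(f"{dpi:>4} dpi: {w} x {h} px, {total / base:.1f}x the 300 dpi scan")
```

Going from 300 to 1200 dpi means 16 times as many pixels for the OCR engine to process, so a long wait at that setting is expected whether or not the text is eventually recognised.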

I am at a loss. Any suggestions?

TIA,

Ken

p.s. I have an OLD version of Omnipage. I guess I will dig it out and install it on a VMware XP guest.

p.p.s. The 1200 DPI OCR just ran - nothing.
 
Old 11-22-2010, 12:12 PM   #2
taylorkh
Well, I just installed Omnipage on an XP virtual machine - not my OLD purchased full version, but the free, stripped-down teaser version that came with the MFC. It scanned the offending document, and a second one on the same letterhead, at 99+% accuracy. The only issue was with apostrophes and quotes, but that may have been a problem with the resulting text documents when I moved them to the Linux host; I have not gone back and looked at them in Windows.
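
One possible explanation for the apostrophe and quote problem, offered as a guess rather than a diagnosis: Windows programs often save text in the Windows-1252 code page, where curly quotes are single bytes (0x91 to 0x94) that are not valid UTF-8 on the Linux side. A minimal sketch of re-encoding such a file, assuming it really is Windows-1252 (the function names and file paths are hypothetical):

```python
def windows_text_to_str(raw: bytes) -> str:
    """Decode bytes assumed to be Windows-1252 (an assumption about the source file)."""
    return raw.decode("cp1252")


def convert_file(src_path: str, dst_path: str) -> None:
    """Rewrite a Windows-1252 text file as UTF-8. Paths are placeholders."""
    with open(src_path, "rb") as src:
        text = windows_text_to_str(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)


if __name__ == "__main__":
    sample = b"It\x92s a \x93quoted\x94 word"
    print(windows_text_to_str(sample))
```

If the quotes look right after this conversion, the OCR output was fine and only the encoding was lost in the move.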

Boy am I depressed. I wish I could find a good OCR program for Linux.

Ken
 
Old 11-22-2010, 12:24 PM   #3
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
Unfortunately there is no 100% solution, nor a 99%, nor 90%, but maybe 70-80%.

I would use:
http://unpaper.berlios.de/
http://www.gnu.org/software/ocrad/ocrad.html

You may also want to use imagemagick or gimp to run a few filters like white balance, and maybe brightness/contrast. Sometimes unsharp mask, threshold, levels, or despeckle help too.

You kinda have to give it standard black text on a white background, almost perfectly aligned, with no specks. Then it may work up to 80% or so, sometimes more.
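
A minimal sketch of that kind of cleanup, driven from Python. It assumes ImageMagick's `convert` is installed and uses its `-colorspace`, `-despeckle` and `-threshold` options; the function names, file names and the 50% threshold are placeholders to tune per document, not recommended values.

```python
import shutil
import subprocess


def build_cleanup_command(src, dst, threshold_percent=50):
    """Return an ImageMagick argv that greyscales, despeckles and thresholds a scan."""
    return [
        "convert", src,
        "-colorspace", "Gray",                  # drop colour information
        "-despeckle",                           # remove isolated specks
        "-threshold", f"{threshold_percent}%",  # force pure black on pure white
        dst,
    ]


def clean_scan(src, dst, threshold_percent=50):
    """Run the cleanup if `convert` is on PATH; return False if it is missing."""
    if shutil.which("convert") is None:
        return False
    subprocess.run(build_cleanup_command(src, dst, threshold_percent), check=True)
    return True


if __name__ == "__main__":
    # "scan.png" and "scan_clean.png" are hypothetical file names
    print(" ".join(build_cleanup_command("scan.png", "scan_clean.png")))
```

The cleaned image can then be handed to whichever OCR engine is being tried. Comparing results at a few threshold values is usually quicker than rescanning.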

Tesseract is OK too. What was wrong with it? I don't get it.

Last edited by H_TeXMeX_H; 11-22-2010 at 12:25 PM.
 
Old 11-22-2010, 04:39 PM   #4
taylorkh
When I had GOCR selected the program ran unpaper first then OCR - at least that is what I recall seeing as the message windows flashed up.

Ken
 
Old 11-23-2010, 06:33 AM   #5
H_TeXMeX_H
When you scan, make sure to run a preview scan first; it will auto-adjust some contrast and brightness settings, which might help. Either way, try ocrad.
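
To illustrate what a contrast auto-adjust does in principle, here is a simple linear stretch on a list of grey values. This is a generic technique shown as an assumption about the idea, not what the scanner front end actually runs; the function name is made up for the example.

```python
def stretch_contrast(pixels):
    """Linearly rescale 8-bit grey values so the darkest becomes 0 and the lightest 255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A perfectly flat image has no contrast to stretch
        return list(pixels)
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


if __name__ == "__main__":
    # A washed-out scan: "black" ink reads as 100, "white" paper as 200
    print(stretch_contrast([100, 120, 200]))
```

After the stretch, faint grey text sits much closer to true black, which makes a later threshold step far less sensitive to the exact cut-off.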
 
  

