LinuxQuestions.org
06-18-2006, 02:32 AM   #1
General
Member
 
Registered: Aug 2005
Distribution: Debian 7
Posts: 526

Rep: Reputation: 31
Computer ==> Paper ==> Computer


Computers seem to have a tough time recognizing text scanned from a paper document. Just look at any "Search inside this book" feature on Amazon.com and you will find a lot of gibberish that is typical of text scanners.

Now, if someone wanted to print out some records, what would be the best way to print the data so that it could be scanned back into a computer without errors? Are there special fonts designed with this in mind?

Last edited by General; 06-18-2006 at 02:34 AM.
 
06-18-2006, 02:44 AM   #2
Jeebizz
Senior Member
 
Registered: May 2004
Distribution: Slackware14.2 64-Bit Desktop, Devuan 2.0 ASCII Toshiba Satellite Notebook
Posts: 2,674

Rep: Reputation: 733
I would have thought that most scanners these days would have good OCR capability, but I could be wrong. I don't know that it is a font issue on the computer side, because I assume that when a document is scanned, the software gives you the option to output it as plain text, a word processing format, PDF, etc. All I can really suggest is searching around for a device made specifically for text scanning, since an ordinary scanner is a general-purpose device that handles images as well. I hope this gives you some kind of direction, but maybe somebody else here has something to add?

[edit]

It could also be the software itself. I am sure there is dedicated OCR software out there that would do a better job than a general image program like Photoshop.

Last edited by Jeebizz; 06-18-2006 at 02:45 AM.
 
06-18-2006, 03:12 AM   #3
aysiu
Senior Member
 
Registered: May 2005
Distribution: Ubuntu with IceWM
Posts: 1,775

Rep: Reputation: 86
The best way to do it is to scan it as an image.
 
06-18-2006, 03:21 AM   #4
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian + kde 4 / 5
Posts: 6,842

Rep: Reputation: 2003
I'm not an expert at this, but I believe OCR does best with simple fonts that are evenly spaced. Any clear, large, fixed-width sans-serif font should do, such as a terminal font. The more regular the design, the better. I don't know if making it bold would help, but the characters need to be spaced widely enough that each one stands out clearly (not too widely though, or the software might think there are spaces where there shouldn't be).

The quality of the scanning is also a factor. You need to have a fairly high resolution at a decent contrast. I don't know what the optimal settings would be though. Also, different OCR programs use different recognition algorithms, so they may perform differently on the same text. You may want to print out some samples in a few different fonts as a test.
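One rough way to score such a font test is to compare each OCR result against the text that was originally printed. The following is only a minimal sketch using Python's standard difflib module; the function names, font labels, and sample strings are made up for illustration, and the OCR text would have to come from whatever program you are actually testing.

```python
import difflib


def char_accuracy(original: str, ocr_output: str) -> float:
    """Return a similarity ratio in [0, 1]; 1.0 means the texts are identical."""
    # autojunk=False stops difflib from ignoring very frequent characters
    # when comparing long texts.
    matcher = difflib.SequenceMatcher(None, original, ocr_output, autojunk=False)
    return matcher.ratio()


def rank_fonts(original: str, ocr_results: dict) -> list:
    """Return (font, score) pairs sorted from most to least accurate."""
    scores = {font: char_accuracy(original, text) for font, text in ocr_results.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    original = "Invoice 1001: total 250.00"
    # Hypothetical OCR output for two test printouts (not real measurements).
    ocr_results = {
        "fixed-width sans": "Invoice 1001: total 250.00",
        "decorative serif": "lnvoice l00l: tota1 25O.OO",
    }
    for font, score in rank_fonts(original, ocr_results):
        print(f"{font}: {score:.3f}")
```

A higher score means fewer recognition errors for that font and scanner setting. The ratio is only a coarse similarity measure, so for records that must come back without any flaw you would still want to check that the score is exactly 1.0.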

The main point I guess is simply to remove anything that might confuse the reader and give it the simplest, clearest text you can. Unfortunately, OCR on Linux is one of those areas still in need of improvement. Just be thankful you're using a simple alphabet and don't need to scan a Japanese document or something.
 
  

