02-02-2009, 11:08 AM
|
#1
|
|
Senior Member
Registered: Sep 2003
Location: UK
Distribution: Debian
Posts: 1,215
Rep:
|
Does OCR work in practice, kooka ocrad?
I've just started using ocrad to run optical character recognition on scanned images inside kooka. To put it mildly, it does not work well: a clean letter (i.e. one that came through the post) came out as gibberish. Very occasionally it nearly gets a word right.
I gather it is possible to use ocrad with some success. Could someone give me a starting point?
|
|
|
|
02-02-2009, 02:45 PM
|
#2
|
|
Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,612
|
Well, I have to admit that it's not that good an OCR program. But you can make it significantly better if you do a bit of image filtering and enhancing with GIMP or ImageMagick first. I did some tests a while ago, and basic color balancing, white balancing, and a few other filters can make a big difference. Another useful program is:
http://freshmeat.net/projects/unpaper/
You should also try messing around with the ocrad command line options to fine tune results.
In the end you can get quite decent results using these methods. Experiment and see what works best.
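For anyone wanting a concrete starting point, here is a rough sketch of that kind of pipeline. The file names and the particular filter choices are my own assumptions, not a tested recipe: ImageMagick's convert makes a greyscale, contrast-stretched PGM, unpaper tidies it if it happens to be installed, and ocrad reads the result. The --scale value is only a first guess to experiment with.

```shell
#!/bin/sh
# Sketch only: clean up a scan, then hand it to ocrad.
# preprocess_and_ocr is a hypothetical helper name; adjust the filters to taste.
preprocess_and_ocr() {
    in=$1
    base=${in%.*}

    # Greyscale plus contrast stretch. ocrad reads PNM input (pbm/pgm/ppm).
    convert "$in" -colorspace Gray -normalize "$base.pgm" || return 1

    # Optional clean-up pass, used only if unpaper is available.
    if command -v unpaper >/dev/null 2>&1; then
        unpaper "$base.pgm" "$base-clean.pgm" &&
            mv "$base-clean.pgm" "$base.pgm"
    fi

    # Enlarging the image is one of the ocrad options worth trying;
    # the threshold option (-T) is another.
    ocrad --scale=2 -o "$base.txt" "$base.pgm"
}

# Usage: preprocess_and_ocr scan.png   (leaves the text in scan.txt)
```

Compare the output with and without each step; which filters help depends a lot on the scanner and the paper.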
|
|
|
|
02-03-2009, 10:11 AM
|
#3
|
|
Senior Member
Registered: Sep 2003
Location: UK
Distribution: Debian
Posts: 1,215
Original Poster
Rep:
|
Discovered that most of the problem was kooka, the K graphical scanner thing. I think it is the most buggy program I have ever come across - it has more bugs than my grandmother's pubes, as they say in Newcastle.
So I ditched kooka and used the command line. Scanned at a resolution of 600, converted to .pnm (this seems essential) and the results improved dramatically. A clean letter now comes out almost without error. I'll try unpaper.
(Incidentally is it the most effective bit of propaganda ever that anarchy = no law?)
Thanks.
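In case it helps someone, the command-line route described above might look roughly like this. It is a sketch under assumptions: scanimage (from SANE) is taken to be the scanning tool, the output name is made up, and --resolution is a backend-specific option, so check scanimage --help for what your scanner actually accepts.

```shell
#!/bin/sh
# Sketch only: scan straight to PNM and run ocrad on it.
# scan_and_ocr is a hypothetical helper name; "letter" is an assumed default.
scan_and_ocr() {
    out=${1:-letter}

    # scanimage writes PNM to standard output by default,
    # so no separate conversion step is needed here.
    scanimage --resolution 600 > "$out.pnm" || return 1

    # -o names the text file ocrad should write.
    ocrad -o "$out.txt" "$out.pnm"
}

# Usage: scan_and_ocr letter   (produces letter.pnm and letter.txt)
```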
|
|
|
|
02-03-2009, 10:44 AM
|
#4
|
|
Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,612
|
I've never used kooka, and I have recently gotten rid of all KDE programs and found replacements for them. I did this because many were either very buggy, very bloated and slow, or very annoying (they would add themselves to the taskbar without my permission and give me pop-ups and sounds that I didn't want, yuck).
[offtopic]
Quote:
(Incidentally is it the most effective bit of propaganda ever that anarchy = no law?)
Indeed it is, that's one of the reasons I found that definition, the only definition in any dictionary (that I know of) that comes close to being accurate. Anarchy is not really about laws or chaos or destruction, but instead about government and especially coercive, hierarchical, bureaucratic, mechanistic, and corrupt government. The more power the more corruption.
|
|
|
|
02-03-2009, 11:08 AM
|
#5
|
|
Guru
Registered: Oct 2005
Location: Willoughby, Ohio
Distribution: linuxdebian
Posts: 7,231
Rep: 
|
Might want to look at other OCR options as well. I've always heard that Tesseract is the best OCR for Linux. Tesseract was originally written by HP, but is now open source and is one of the Google Code projects, available under the Apache license.
Quote:
http://www.mscs.dal.ca/~selinger/ocr-test/
* Tesseract gives extremely good output at a reasonable speed. It is the clear overall winner of the test.
* Ocrad gives reasonable output at extremely high speed. It can be useful in applications where speed is more important than accuracy.
* GOCR gives poor output at a slow speed.
|
Quote:
http://groundstate.ca/ocr
The combination of Tesseract and Ocropus is clearly the project we can most rely on to provide the missing elements of a full-featured Free OCR suite.
|
http://www.linuxjournal.com/article/9676
http://www.linux-archive.org/debian-...ocr-works.html
http://www.linux.com/articles/57222
Just figured that if you were going to put effort into working with Linux OCR, you might want to check out what is reported as one of the more accurate programs.
http://code.google.com/p/tesseract-ocr/
tesseract-ocr is in the Debian repositories.
|
|
|
|
02-03-2009, 12:00 PM
|
#6
|
|
Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,612
|
Thanks, didn't know about Tesseract, must try it ...
|
|
|
|
02-04-2009, 11:52 AM
|
#7
|
|
Senior Member
Registered: Sep 2003
Location: UK
Distribution: Debian
Posts: 1,215
Original Poster
Rep:
|
I'll try tesseract again. Probably doing something wrong before. Thanks.
Tried it and got better results than ocrad.
For reference:
The way I do it: in GIMP, File -> Acquire -> XSane -> <scanner name>
In XSane: select the correct bit and clean it up a bit with the eyedroppers. Use defaults except set to grey.
In GIMP, clean it up some more if necessary (Colours -> Levels is useful), get rid of logos, and save as image.tif (note only one f).
Then:
$ tesseract image.tif image
It saves a file called image.txt. This has few errors but no layout.
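For a letter of several pages, the same command can be looped. This is only a sketch: the helper name and file names are made up, and it assumes each page has already been saved as its own .tif as described above.

```shell
#!/bin/sh
# Sketch only: OCR several TIFF pages and join the text into one file.
# ocr_pages is a hypothetical helper; the first argument is the combined
# output file, the remaining arguments are the page images.
ocr_pages() {
    out=$1
    shift
    : > "$out"                     # start with an empty output file

    for tif in "$@"; do
        base=${tif%.tif}
        # tesseract appends .txt to the output base name itself.
        tesseract "$tif" "$base" || return 1
        cat "$base.txt" >> "$out"
    done
}

# Usage: ocr_pages letter.txt page1.tif page2.tif
```

Pick a combined file name that differs from the per-page names (not page1.txt, say), or the loop will overwrite its own output.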
Last edited by lugoteehalt; 02-07-2009 at 04:47 AM.
|
|
|
|
02-04-2009, 12:12 PM
|
#8
|
|
Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,612
|
Just to follow up, I did manage to install tesseract and it works quite well. It only seems to work on TIFF, but that's not too much of an issue.
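If the scan is in some other format, converting it first is a one-liner with ImageMagick. A sketch, with assumed file names; the greyscale and uncompressed settings are my guess at what is safest for this version of tesseract, not something I have verified as required.

```shell
#!/bin/sh
# Sketch only: convert any image ImageMagick can read to TIFF, then OCR it.
# to_tiff_and_ocr is a hypothetical helper name.
to_tiff_and_ocr() {
    in=$1
    base=${in%.*}

    # Write a greyscale, uncompressed TIFF next to the original.
    convert "$in" -colorspace Gray -compress None "$base.tif" || return 1

    # Output text lands in $base.txt.
    tesseract "$base.tif" "$base"
}

# Usage: to_tiff_and_ocr page.png   (produces page.tif and page.txt)
```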