moxieman99 11-22-2005 06:49 PM

OCR woes with Kooka
Tried Google for answers and saw that others had the problems I have, but no solutions.

I have Kooka .44 installed as part of FC4 (2.6.11-1 kernel).

I am trying to do some OCR (optical character recognition) stuff with the scanner.

I installed GOCR (a/k/a JOCR) and OCRAD.

Three questions:

1. Kooka insists on saving the scanned piece as an image/picture (jpeg or some other format) before I can try OCR on it. Is this how it should be, and if not, how do I change it?

2. Kooka does not use GOCR, even though I set it to use GOCR. It immediately fires up OCRAD. I have the path to GOCR in the settings and GOCR as the app to use, and even removed OCRAD entirely. No luck. It still tries for OCRAD and gives me no option to use GOCR. What am I overlooking?

3. What specific file from my OCRAD install should I put in as the final part of the OCRAD path? I figured out which file to use in GOCR (which I can't get kooka to use), but can't figure out which OCRAD file to use.

I get no error message. A screen comes up for OCRAD and the little KDE wheels spin, but it goes for hours with nothing happening.

Any help appreciated.


aikempshall 11-25-2005 08:03 AM

Possibly the file that you're trying to OCR is too noisy. Save the scanned file in JPG format and try running OCRAD from the command line. I'm not at my machine right now, but for noisy files I ran the image through a filter to clean it before putting it through OCRAD.

moxieman99 11-25-2005 12:22 PM

Thanks, I will try that (cleaning up the image first), but why no choice and no ability to use GOCR?


aikempshall 11-25-2005 04:01 PM

I sometimes scan financial papers such as the Financial Times, which is printed on "pink" paper.

I scan in Gray mode at a resolution of 600 and save as a jpeg.
I clean the image with the following command:
jpegtopnm kscan_0001.jpeg | pamditherbw -threshold -value 0.50 | pamtopnm > kscan_0001.pbm
Then, in Kooka, I OCR the kscan_0001.pbm image.
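If there are several scans to clean, the same pipeline can be wrapped in a small loop. This is only a sketch: the kscan_*.jpeg naming follows the example above, and the pbm_name and clean_scan helpers are names made up here for illustration. The 0.50 threshold is the value from the command above and may need adjusting for other paper.

```shell
#!/bin/sh
# Derive the .pbm output name from a scan's file name (hypothetical helper).
pbm_name() {
    printf '%s.pbm\n' "${1%.*}"
}

# Threshold one grayscale JPEG to a black-and-white PBM,
# using the same Netpbm pipeline as the command above.
clean_scan() {
    jpegtopnm "$1" | pamditherbw -threshold -value 0.50 | pamtopnm > "$(pbm_name "$1")"
}

# Convert every matching scan in the current directory.
# If no file matches, the glob stays literal and is skipped.
for f in kscan_*.jpeg; do
    [ -e "$f" ] || continue
    clean_scan "$f"
done
```

Each kscan_NNNN.jpeg ends up next to a kscan_NNNN.pbm that can then be opened for OCR.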

Both gocr and ocrad work on my system. ocrad gives by far the better result. For me ocrad has a 3% error rate, whereas gocr on the same document has a 30% error rate.

I would suggest that you scan your document and try ocrad and gocr at the command line.
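A rough sketch of what trying both at the command line could look like, assuming a cleaned image named kscan_0001.pbm and that both programs are on the PATH. The out_name and run_engines helpers and the .ocrad.txt / .gocr.txt output names are invented for this example.

```shell
#!/bin/sh
# Build an output name such as kscan_0001.ocrad.txt (hypothetical naming scheme).
out_name() {
    printf '%s.%s.txt\n' "${2%.*}" "$1"
}

# Run whichever of the two engines is installed on one cleaned image.
run_engines() {
    img=$1
    if command -v ocrad >/dev/null 2>&1; then
        ocrad -o "$(out_name ocrad "$img")" "$img"
    fi
    if command -v gocr >/dev/null 2>&1; then
        gocr -i "$img" -o "$(out_name gocr "$img")"
    fi
}

# Print the full path of each executable, or say that it is missing.
for tool in ocrad gocr; do
    command -v "$tool" || echo "$tool not found on PATH"
done

# Process the example image only if it exists.
if [ -e kscan_0001.pbm ]; then
    run_engines kscan_0001.pbm
fi
```

Comparing the two text files side by side shows which engine copes better with a given scan. The path printed by command -v is the executable itself, which is presumably what an OCR path setting in a front end would need to point to.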


Looking at the code in ksaneocr.cpp, my impression is that Kooka will only ever use ocrad or, if available, Kadmos. This page suggests that there may be a change on the way.

