Linux - Software: This forum is for Software issues.
Distribution: openSUSE, Raspbian, Slackware. Previous: MacOS, Red Hat, Coherent, Consensys SVR4.2, Tru64, Solaris
Looking for OCR software tips. Anybody got any?
I have some very old typewritten documents that I've been trying to scan and convert to text using xsane->PBM->ocrad, and I've been getting really, really awful results. So bad, in fact, that re-typing the documents looks like a better way to go than wasting any more time on OCR. I realize that some cleanup of the OCR output is almost a given, but what I'm seeing is more like 99% would have to be rekeyed. I haven't used any OCR software since the Win3.11 days and, while OCR results weren't 100% accurate back then, they were orders of magnitude better than what I'm seeing. I would have expected typewritten text to be a piece of cake to convert compared to, say, a photocopied magazine article with proportional fonts, kerning, etc.
"ocrad" recommends having at least 20 pixels per character, and I've scanned the original documents at resolutions ranging from 128 dpi to 2400 dpi (maybe excessive, I know), and the results stink no matter what.
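The 20-pixel guideline can be checked with simple arithmetic: one typographic point is 1/72 inch, so a font of S points scanned at R dpi spans roughly R × S / 72 pixels for its nominal body size. The sketch below assumes a 12 pt typewriter face (a common Pica size; the thread doesn't say what the documents actually use), and the function names are hypothetical. The visible letters are somewhat shorter than the nominal point size.

```python
POINTS_PER_INCH = 72.0  # one typographic (PostScript) point is 1/72 inch


def char_height_px(dpi: float, font_size_pt: float = 12.0) -> float:
    """Approximate pixel height of the font's nominal body size at a scan resolution.

    The 12 pt default is an assumption; measure your own documents.
    Actual letter shapes are shorter than the nominal point size.
    """
    return dpi * font_size_pt / POINTS_PER_INCH


def min_dpi_for(target_px: float, font_size_pt: float = 12.0) -> float:
    """Lowest resolution at which the nominal body size reaches target_px pixels."""
    return target_px * POINTS_PER_INCH / font_size_pt


if __name__ == "__main__":
    for dpi in (128, 300, 600, 2400):
        print(f"{dpi:5d} dpi -> about {char_height_px(dpi):.0f} px per 12 pt body")
    print(f"20 px needs at least {min_dpi_for(20):.0f} dpi")
```

By this estimate the 128 dpi scans are borderline (the letters themselves will come out under 20 px), while 300 dpi and above clear the guideline comfortably, so at those resolutions pixel count is probably not what is limiting ocrad.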
Has anyone used this combination of software and gotten reasonable results? Is there a better OSS OCR package than ocrad?
I haven't used any of these, but the following are "popular" and noted in the literature. You can try "tesseract" and/or "gocr", both command-line tools. There are a few GUIs for these command-line OCR engines, for example "gImageReader" and "OcrGui". It looks like you use an RPM-based Linux, so you may get lucky at rpmfind.net or rpm.pbone.net. Good luck. Hope it helps. Let us know how you go.
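For trying tesseract from a script rather than by hand, a minimal Python wrapper might look like the sketch below. It assumes tesseract 4 or later is installed and on PATH (the 3.x series spells the option `-psm`) and that the English language data (`eng`) is installed; the helper names are hypothetical. Tesseract writes its result to `<out_base>.txt`.

```python
import shutil
import subprocess
from typing import List, Optional


def tesseract_cmd(image: str, out_base: str, lang: str = "eng",
                  psm: Optional[int] = None) -> List[str]:
    """Build a tesseract command line: tesseract IMAGE OUTBASE -l LANG [--psm N]."""
    cmd = ["tesseract", image, out_base, "-l", lang]
    if psm is not None:
        # Page segmentation mode; 6 means "assume a single uniform block of text".
        cmd += ["--psm", str(psm)]
    return cmd


def run_ocr(image: str, out_base: str, lang: str = "eng",
            psm: Optional[int] = None) -> str:
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract was not found on PATH")
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(tesseract_cmd(image, out_base, lang, psm), check=True)
    with open(out_base + ".txt", encoding="utf-8") as f:
        return f.read()
```

For example, `run_ocr("page001.png", "page001", psm=6)` (hypothetical file names) would OCR one scanned page and return its text. Page segmentation mode 6 is often worth trying on plain single-column typewritten pages, but the default mode may do just as well.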
I've tried all the free Linux OCR stuff (that I know of) without success. I have not tested any of the current Windows apps, either in Wine or on Windows.
I too played with some very old Windows OCR and a few very specialized OCR uses. I have no idea why current Linux OCR is so bad. Remember when you could actually see it trying to resolve each single character?
At one time we used special type disks or type balls for typing in OCRX or some of the other OCR fonts.
I gave up and found that a not-for-profit I help has a Xerox copier that somehow seems to get fantastic results after a few minutes of waiting.
Use tesseract, try to make sure the images you scanned are lined up properly (not skewed), and use GIMP to make the pages as close to pure black and white as possible and to reduce noise if needed.
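The cleanup step can also be automated instead of done by hand. The sketch below is pure Python working on a grayscale page stored as a list of rows of 0-255 values; loading and saving the image are left out, and the function names are hypothetical. A 3x3 median filter removes isolated specks, and Otsu's method picks one global threshold so every pixel becomes pure black (0) or pure white (255), roughly what GIMP's Despeckle and Threshold tools do. It does not correct skew, which still has to be fixed separately.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black ink, 255 = white paper


def median3x3(img: Image) -> Image:
    """Replace each pixel with the median of its 3x3 neighbourhood (clipped at the edges)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
            out[y][x] = window[len(window) // 2]
    return out


def otsu_threshold(img: Image) -> int:
    """Return the gray level that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    weight_dark = 0  # pixels at or below the candidate threshold
    sum_dark = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img: Image) -> Image:
    """Denoise, then map pixels at or below the Otsu threshold to 0 and the rest to 255."""
    clean = median3x3(img)
    t = otsu_threshold(clean)
    return [[0 if p <= t else 255 for p in row] for row in clean]
```

A single global threshold works well on evenly lit scans; old pages with yellowed or uneven backgrounds may need an adaptive (local) threshold instead.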