OCRAD returns gibberish EVERY time - is there a good HowTo?
I am running into issues with ocrad and kooka. No matter what image I feed it, regardless of format, typeface, DPI, or color depth, all ocrad returns is gibberish. Is there a HOWTO out there that explains ocrad troubleshooting?
I am currently at:
ocrad 0.11
kooka 0.44
KDE 3.4.1
I have never been successful in getting any OCR package to work correctly under Linux. I've searched both Google and Yahoo for info on ocrad (searched for "ocrad howto", "ocrad how to", and "ocrad troubleshooting"), and all I can find are download sites for ocrad and doorway pages (search engine spam).
Can anyone point me in the right direction? Is there any sort of "benchmark" image used to calibrate ocrad so we can compare results?
Based on that, I presume I have to switch the system to iso-8859-15? Doing a quick search (man -k), I see that this is defined in /etc/sysconfig/language, and the setting I need to change is:
Correct, that would be my interpretation also. If the output from ocrad is ISO-8859 by design, there may be a tool that will convert the output into UTF-8.
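One such tool is iconv. Here is a minimal sketch, assuming ocrad's output really is ISO-8859-15 (nothing in this thread confirms that); to_utf8 is just a made-up helper name:

```shell
# Convert text on stdin from ISO-8859-15 (the ASSUMED ocrad output
# encoding) to UTF-8 on stdout. to_utf8 is a hypothetical helper name.
to_utf8() {
    iconv -f ISO-8859-15 -t UTF-8
}

# Quick check: byte 0xE9 (octal 351) is "é" in ISO-8859-15. After
# conversion it becomes the two-byte UTF-8 sequence c3 a9, followed
# here by the newline byte 0a.
printf '\351\n' | to_utf8 | od -An -tx1
```

In practice you would pipe ocrad through it, something like ocrad page.pbm | to_utf8 > page.txt, where page.pbm and page.txt are placeholder names.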
I think what you need to do, on a temporary basis, is set LC_ALL to the locale you want, export LC_ALL, and then run locale to check the result.
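Roughly like this, as a sketch. en_US.UTF-8 is only an example value and has to be one of the locales installed on your system (locale -a lists them):

```shell
# Set the locale for the current shell session only; nothing is
# written to any config file. en_US.UTF-8 is an example value.
LC_ALL=en_US.UTF-8
export LC_ALL

# Exported variables are inherited by child processes, so any program
# started from this shell sees the new value.
sh -c 'printf "LC_ALL in child: %s\n" "$LC_ALL"'

# LC_ALL overrides the individual LC_* categories, so locale (where
# the command is available) should show the new value for each.
if command -v locale >/dev/null 2>&1; then
    locale
fi
```

The setting disappears when the shell exits, so it is safe to experiment with.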
When I did that for en_US.UTF-8, the change was reflected in locale but made no difference to the output of ocrad!
ocrad --help
returns
GNU Ocrad - Optical Character Recognition program.
Reads pbm file(s), or standard input, and sends text to standard output.
Usage: ocrad [options] [files]
Options:
  -h, --help               display this help and exit
  -V, --version            output version information and exit
  -a, --append             append text to output file
  -b, --block=<n>          process only the specified text block
  -c, --charset=<name>     try `--charset=help' for a list of names
  -f, --force              force overwrite of output file
  -F, --format=<fmt>       output format (byte, utf8)
  -i, --invert             invert image levels (white on black)
  -l, --layout=<n>         layout analysis, 0=none, 1=column, 2=full
  -o <file>                place the output into <file>
  -s, --scale=[-]<n>       scale input image by [1/]<n>
  -t, --transform=<name>   try `--transform=help' for a list of names
  -v, --verbose            be verbose
  -x <file>                export OCR Results File to <file>
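Going by that help text, asking ocrad for UTF-8 directly might look like the sketch below. Only the -F and -o options come from the help output; ocr_to_utf8 is a made-up helper name, the file names are placeholders, and I have not verified this against ocrad 0.11:

```shell
# Hypothetical helper: run ocrad on a PBM file ($1) and write UTF-8
# text to an output file ($2). -F utf8 selects the output format and
# -o names the output file, both as listed in the help text.
ocr_to_utf8() {
    ocrad -F utf8 -o "$2" "$1"
}

# Example call (placeholder file names):
#   ocr_to_utf8 page.pbm page.txt
```

Note that the help text says ocrad reads pbm files, so the scan has to be in that format before it gets here.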
I have not tried that (ocrad from the command line); I'm trying to find a solution anyone can use. Oh, and I just exported LC_ALL with the ISO setting. Here is the output from locale now:
Is it possible to reconfigure kooka, etc. to call ocrad or gocr with the command-line argument that forces Unicode output (is it defined in a config file somewhere?), or is that bit hard-coded into kooka, requiring recompilation?
Also, if I were to set an alias for, say, ocrad pointing to ocrad -F utf8, will applications pick up the alias and use that, or will their command-line arguments completely override the alias?
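On the alias question: aliases belong to the shell that defines them and are not passed on to child processes, so a program that launches ocrad itself would not be expected to see one. A small sketch that shows this (alias_visible_in_child is a made-up name; how kooka actually starts ocrad is an assumption here):

```shell
# Define the alias in the current shell.
alias ocrad='ocrad -F utf8'

# Hypothetical check: start a child shell and ask whether it knows an
# alias called ocrad. Aliases are not inherited, so it should not.
alias_visible_in_child() {
    if sh -c 'alias ocrad >/dev/null 2>&1'; then
        echo yes
    else
        echo no
    fi
}

alias_visible_in_child
```

This prints "no" unless a shell startup file happens to define the same alias, which is why an alias is unlikely to help with a GUI front end.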
I suppose as a last resort I could rename or move ocrad and gocr and write a shell script that takes kooka's arguments and passes them on, substituting only the desired argument for UTF-8.
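A rough sketch of such a wrapper for ocrad. The path /usr/bin/ocrad.real is an assumption (wherever the renamed binary ends up), run_real_ocrad is a made-up name, and this version simply adds -F utf8 in front of the caller's arguments rather than substituting anything:

```shell
#!/bin/sh
# Location of the renamed original binary. ASSUMED path; adjust it to
# wherever the real ocrad was moved.
REAL_OCRAD=${REAL_OCRAD:-/usr/bin/ocrad.real}

# Hypothetical helper: call the real ocrad with -F utf8 followed by
# all arguments the caller supplied, unchanged.
run_real_ocrad() {
    "$REAL_OCRAD" -F utf8 "$@"
}

# In the installed wrapper script, the last line would be:
#   run_real_ocrad "$@"
```

If the caller passes its own -F option, which of the two wins depends on how ocrad parses its options; I have not checked that.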