LinuxQuestions.org
Old 07-17-2005, 12:08 AM   #1
KimVette
Senior Member
 
Registered: Dec 2004
Location: Lee, NH
Distribution: OpenSUSE, CentOS, RHEL
Posts: 1,794

Rep: Reputation: 46
OCRAD returns gibberish EVERY time - is there a good HowTo?


I am running into issues with ocrad and kooka. No matter what image I feed it, regardless of format, typeface, DPI, or color depth, all ocrad returns is gibberish. Is there a howto out there that explains ocrad troubleshooting?

I am currently at:

ocrad 0.11
kooka 0.44
KDE 3.4.1

I have never managed to get any OCR package to work correctly under Linux. I've searched both Google and Yahoo for info on ocrad (ocrad howto, ocrad how to, and ocrad troubleshooting), and all I can find are download sites for ocrad and doorway pages (search engine spam).

Can anyone point me in the right direction? Is there any sort of "benchmark" image used to calibrate ocrad so we can compare results?
 
Old 07-17-2005, 05:28 PM   #2
KimVette
Senior Member
 
Registered: Dec 2004
Location: Lee, NH
Distribution: OpenSUSE, CentOS, RHEL
Posts: 1,794

Original Poster
Rep: Reputation: 46
I tried gocr as well after posting this and got the same results. I am thinking of trying OmniPage under wine, if it will run.
 
Old 08-17-2005, 05:32 PM   #3
iamjiwjr
LQ Newbie
 
Registered: Jan 2004
Posts: 1

Rep: Reputation: 0
I gave up on Kooka and went to Vuescan. It works perfectly for me. It wasn't free, but it is very good.

OCR is unformatted text only, but accurate.

Good luck.
 
Old 08-23-2005, 06:32 AM   #4
aikempshall
Member
 
Registered: Nov 2003
Location: Bristol, Britain
Distribution: Slackware
Posts: 379

Rep: Reputation: 37
See my reply to

http://www.linuxquestions.org/questi...87#post1814987

Regards
 
Old 08-23-2005, 08:36 PM   #5
KimVette
Senior Member
 
Registered: Dec 2004
Location: Lee, NH
Distribution: OpenSUSE, CentOS, RHEL
Posts: 1,794

Original Poster
Rep: Reputation: 46
Hmm so that is my problem. How do I fix it?
 
Old 08-24-2005, 08:42 AM   #6
aikempshall
Member
 
Registered: Nov 2003
Location: Bristol, Britain
Distribution: Slackware
Posts: 379

Rep: Reputation: 37
I assume you mean the UTF-8 / Suse issue?
 
Old 08-24-2005, 08:46 PM   #7
KimVette
Senior Member
 
Registered: Dec 2004
Location: Lee, NH
Distribution: OpenSUSE, CentOS, RHEL
Posts: 1,794

Original Poster
Rep: Reputation: 46
Yes
 
Old 08-25-2005, 05:03 AM   #8
aikempshall
Member
 
Registered: Nov 2003
Location: Bristol, Britain
Distribution: Slackware
Posts: 379

Rep: Reputation: 37
If I type locale at the command line I get -

LANG=en_GB.iso88591
LC_CTYPE="en_GB"
LC_NUMERIC="en_GB"
LC_TIME="en_GB"
LC_COLLATE="en_GB"
LC_MONETARY="en_GB"
LC_MESSAGES="en_GB"
LC_PAPER="en_GB"
LC_NAME="en_GB"
LC_ADDRESS="en_GB"
LC_TELEPHONE="en_GB"
LC_MEASUREMENT="en_GB"
LC_IDENTIFICATION="en_GB"
LC_ALL=en_GB

What response do you get?

Also at command line ocrad --charset=help

What response?

Last edited by aikempshall; 08-25-2005 at 05:09 AM.
 
Old 08-25-2005, 09:04 PM   #9
KimVette
Senior Member
 
Registered: Dec 2004
Location: Lee, NH
Distribution: OpenSUSE, CentOS, RHEL
Posts: 1,794

Original Poster
Rep: Reputation: 46
Code:
kim@kimp4:~> locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
kim@kimp4:~> ocrad --charset=help
Valid charset names are:  ascii  iso-8859-9  iso-8859-15
kim@kimp4:~>
Based on that, I presume I have to switch the system to iso-8859-15? Doing a quick search (man -k), I see that this is defined in /etc/sysconfig/language and the setting I need to change is:

Code:
RC_LANG="en_US.UTF-8"
to iso-8859-15

However, then it wouldn't be unicode, right?


Is this correct?
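
One thing worth checking before editing /etc/sysconfig/language: the names printed by ocrad --charset=help look like ocrad character sets rather than system locale names. A locale name normally pairs a language with an encoding, as in en_US.ISO-8859-15. The sketch below lists what is installed. locale_available is a hypothetical helper name, and the sketch assumes a system that provides the locale command.

```shell
# Succeeds if the given name appears in `locale -a` exactly as typed.
# Spellings differ between systems (for example en_US.utf8 versus en_US.UTF-8),
# so list the installed names first and copy one from that list.
locale_available() {
    locale -a 2>/dev/null | grep -qxF -- "$1"
}

# Show the installed en_US variants, or say so if there are none.
locale -a 2>/dev/null | grep -i '^en_US' || echo "no en_US locales listed"
```

If the name you want is missing from that list, setting it in a config file is unlikely to change what locale reports.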

Last edited by KimVette; 08-25-2005 at 09:14 PM.
 
Old 08-26-2005, 04:03 AM   #10
aikempshall
Member
 
Registered: Nov 2003
Location: Bristol, Britain
Distribution: Slackware
Posts: 379

Rep: Reputation: 37
Correct, that would be my interpretation too. If ocrad's output is ISO-8859 by design, there may be a tool that will convert the output into UTF-8.
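
One such tool is iconv. A minimal sketch, assuming the text really is ISO-8859-15 encoded; the sample string stands in for OCR output, and the variable name converted is only for illustration.

```shell
# Build a sample "café" where the e-acute is the single ISO-8859-15 byte 0xE9
# (octal 351), then convert the stream to UTF-8.
converted=$(printf 'caf\351\n' | iconv -f ISO-8859-15 -t UTF-8)

# Print the converted text; a UTF-8 terminal should render the accent correctly.
printf '%s\n' "$converted"
```

The same iconv call can read a file instead of a pipe, for example iconv -f ISO-8859-15 -t UTF-8 input.txt > output.txt, where both file names are placeholders.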
 
Old 08-26-2005, 09:11 PM   #11
KimVette
Senior Member
 
Registered: Dec 2004
Location: Lee, NH
Distribution: OpenSUSE, CentOS, RHEL
Posts: 1,794

Original Poster
Rep: Reputation: 46
Thanks, aikempshall
 
Old 08-26-2005, 09:46 PM   #12
KimVette
Senior Member
 
Registered: Dec 2004
Location: Lee, NH
Distribution: OpenSUSE, CentOS, RHEL
Posts: 1,794

Original Poster
Rep: Reputation: 46
still getting:

Code:
kim@kimp4:~> locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
kim@kimp4:~>
after changing /etc/sysconfig/language... and of course gibberish from gocr (and ocrad).

Trying RC_LANG="iso-8859-15" next.
 
Old 08-26-2005, 09:57 PM   #13
KimVette
Senior Member
 
Registered: Dec 2004
Location: Lee, NH
Distribution: OpenSUSE, CentOS, RHEL
Posts: 1,794

Original Poster
Rep: Reputation: 46
Changed it again (this time to en_US.ISO-8859-1) and rebooted (again!). No dice.

Code:
kim@kimp4:/etc/sysconfig> locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
 
Old 08-27-2005, 06:34 AM   #14
aikempshall
Member
 
Registered: Nov 2003
Location: Bristol, Britain
Distribution: Slackware
Posts: 379

Rep: Reputation: 37
I think what you need to do, on a temporary basis, is set LC_ALL to the locale you want, export LC_ALL, and then run locale.

When I did that for en_US.UTF-8, the change was reflected in locale but made no difference to the output of ocrad!
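
A minimal sketch of that temporary change, for the current shell session only. The locale name is a hypothetical choice and has to be one that locale -a lists on your system.

```shell
# Hypothetical target locale; replace it with a name from `locale -a`.
target="en_US.ISO-8859-1"

# Assign and export in one step so programs started from this shell inherit it.
export LC_ALL="$target"

# LC_ALL takes precedence over LANG and the other LC_* variables.
# Show only the LC_ALL line of the report.
locale | grep '^LC_ALL='
```

The setting disappears when the shell exits, so nothing has to be undone afterwards.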

ocrad --help

returns

GNU Ocrad - Optical Character Recognition program.
Reads pbm file(s), or standard input, and sends text to standard output.

Usage: ocrad [options] [files]
Options:
  -h, --help               display this help and exit
  -V, --version            output version information and exit
  -a, --append             append text to output file
  -b, --block=<n>          process only the specified text block
  -c, --charset=<name>     try `--charset=help' for a list of names
  -f, --force              force overwrite of output file
  -F, --format=<fmt>       output format (byte, utf8)
  -i, --invert             invert image levels (white on black)
  -l, --layout=<n>         layout analysis, 0=none, 1=column, 2=full
  -o <file>                place the output into <file>
  -s, --scale=[-]<n>       scale input image by [1/]<n>
  -t, --transform=<name>   try `--transform=help' for a list of names
  -v, --verbose            be verbose
  -x <file>                export OCR Results File to <file>

The -F flag suggests that ocrad can write its output as UTF-8.

Have you tried ocrad at the command line?
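
A sketch of such a command-line test, using only options from the help text above. ocr_to_utf8 is a hypothetical helper name, and page.pbm stands for a scan you have saved in pbm format yourself.

```shell
# Run ocrad directly on a pbm scan and request UTF-8 output.
ocr_to_utf8() {
    # $1: input image in pbm format, $2: text file to write
    ocrad -F utf8 -o "$2" "$1"
}

# Hypothetical usage:
# ocr_to_utf8 page.pbm page.txt
```

If the text file looks right here but kooka still shows gibberish, the problem is more likely in how kooka calls ocrad or reads its output than in ocrad itself.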
 
Old 08-27-2005, 12:03 PM   #15
KimVette
Senior Member
 
Registered: Dec 2004
Location: Lee, NH
Distribution: OpenSUSE, CentOS, RHEL
Posts: 1,794

Original Poster
Rep: Reputation: 46
I have not tried that (ocrad from the command line). I'm trying to find a solution anyone can use. Oh, and I just exported LC_ALL with the ISO setting. Here is the output from locale now:

Code:
LC_ALL=en_US.ISO-8859-1
kim@kimp4:~> export LC_ALL
kim@kimp4:~> locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
kim@kimp4:~>
Is it possible to reconfigure kooka, etc. to call ocrad or gocr with that command-line argument to force Unicode (is it defined in a config file somewhere), or is that bit hard-coded into kooka, requiring recompilation?

Also, if I were to set an alias for, say, ocrad pointing to ocrad -F utf8, will applications pick up the alias and use it, or will their command-line arguments completely override the alias?

I suppose as a last resort I could rename or move ocrad and gocr and write a shell script that takes kooka's arguments and passes them on, substituting only the desired argument for UTF-8.
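
On the alias idea: aliases are expanded only by the interactive shell that defines them, so a program that launches ocrad itself would not see one. The wrapper approach could be sketched as below. Everything here is an assumption for illustration: the path /usr/bin/ocrad.real for the renamed binary, the REAL_OCRAD override variable, and the wrapper file name. Whether kooka passes its own -F option, which might take precedence, is not something this thread establishes.

```shell
# Write a small wrapper script that forces UTF-8 output and forwards
# every argument it receives unchanged to the real binary.
wrapper="${TMPDIR:-/tmp}/ocrad-wrapper.sh"

cat > "$wrapper" <<'EOF'
#!/bin/sh
# Assumption: the real ocrad binary was renamed to /usr/bin/ocrad.real.
# REAL_OCRAD is a hypothetical override for trying the wrapper out.
exec "${REAL_OCRAD:-/usr/bin/ocrad.real}" -F utf8 "$@"
EOF

chmod +x "$wrapper"
```

To use it, the wrapper would be installed under the name ocrad somewhere in the PATH that kooka searches, after the original binary has been moved aside.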
 
  


All times are GMT -5.