LinuxQuestions.org
Old 05-16-2008, 09:49 AM   #1
modafine
LQ Newbie
 
Registered: May 2008
Posts: 3

Rep: Reputation: 0
Including GOCR in a C++ program


Hello everybody,

I'm working on a C++ project.

I want to extract the text contained in an image.

To do that, I need to run OCR on my input image to convert it into a text document, so I want to integrate GOCR into my C++ program.

Could you help me find the steps needed to integrate GOCR into my program?

Thank you for your help.
 
Old 05-17-2008, 10:11 AM   #2
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
In my experience gocr doesn't give nearly as accurate results as tesseract. There's API documentation for tesseract here.
 
Old 05-19-2008, 04:45 AM   #3
modafine
LQ Newbie
 
Registered: May 2008
Posts: 3

Original Poster
Rep: Reputation: 0
Thank you, matthewg42.

I downloaded tesseract-2.01 and installed it. The installation steps were:

./configure
make
make install
export TESSDATA_PREFIX="usr/local/share/"


But when I run "tesseract phototest.tif phototest -l eng"

I get this error message:

Unable to load unicharset file /usr/local/share/tessdata/eng.unicharset



Can you help me test this OCR? I have to choose one of these programs (tesseract or gocr) to integrate into my C++ program.

Thanks.

Last edited by modafine; 05-19-2008 at 04:47 AM.
 
Old 05-19-2008, 10:07 AM   #4
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
I only ever installed it from the Ubuntu repositories, and it 'just worked'.
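
For the unicharset error, one thing worth checking is whether the file named in the message actually exists and is readable. Going by the error text, tesseract seems to look for <TESSDATA_PREFIX>/tessdata/<lang>.unicharset; that layout is inferred from the message, not taken from the documentation. Note also that the export line as posted has no leading slash, although the error shows an absolute path. Below is a minimal C++ sketch of such a check. The helper names (unicharset_path, is_readable, tessdata_prefix_from_env) are made up for illustration and are not part of tesseract.

```cpp
#include <cstdlib>
#include <fstream>
#include <string>

// Build the path that the error message suggests tesseract looks for:
//   <prefix>/tessdata/<lang>.unicharset
// (Assumption: layout inferred from the error text in this thread.)
std::string unicharset_path(const std::string& prefix, const std::string& lang) {
    std::string path = prefix;
    // Add a trailing slash to the prefix if it is missing.
    if (!path.empty() && path.back() != '/') {
        path += '/';
    }
    return path + "tessdata/" + lang + ".unicharset";
}

// Return true if the file can be opened for reading.
bool is_readable(const std::string& path) {
    std::ifstream in(path);
    return in.good();
}

// Read TESSDATA_PREFIX from the environment; empty string if it is unset.
std::string tessdata_prefix_from_env() {
    const char* value = std::getenv("TESSDATA_PREFIX");
    return value ? std::string(value) : std::string();
}
```

Calling is_readable(unicharset_path(tessdata_prefix_from_env(), "eng")) from a small test program should show whether the language data was installed where the variable points. If it returns false, the data files were probably not copied by "make install", or the prefix is wrong.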

To use the command line tool, you need to convert input images into tiff format first (or "MDR", whatever that is). I used the ImageMagick convert program to do this, e.g. using a page of text from the Distributed Proofreaders project:
Code:
wget http://www.pgdp.net/projects/projectID47d3b81d1228b/005.png
convert 005.png 005.tiff
tesseract 005.tiff 005
This produces the file 005.txt containing the OCR'd text.

I don't know how easy or otherwise it will be to use it from a program, rather than with the command line program.
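
That said, the simplest route from C++ is probably to run the command line program from your code and read the text file it writes. Here is a minimal sketch, not a use of the tesseract API. It assumes a POSIX shell, that tesseract is on the PATH, that it writes <output_base>.txt as in the example above, and that it exits with a non-zero status on failure (I have not verified that last point). The function names are hypothetical.

```cpp
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// Wrap a string in single quotes for a POSIX shell, escaping embedded quotes,
// so file names with spaces or special characters are passed through intact.
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') {
            out += "'\\''";  // close quote, escaped quote, reopen quote
        } else {
            out += c;
        }
    }
    out += "'";
    return out;
}

// Build the same command used above: tesseract <image> <output_base>
std::string tesseract_command(const std::string& image,
                              const std::string& output_base) {
    return "tesseract " + shell_quote(image) + " " + shell_quote(output_base);
}

// Read a whole file into a string; returns an empty string if it cannot be opened.
std::string read_file(const std::string& path) {
    std::ifstream in(path);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Run the tesseract command line program and load the text it produced.
// Returns false if the command reported failure (assumed: non-zero status).
bool ocr_image(const std::string& image, const std::string& output_base,
               std::string& text) {
    if (std::system(tesseract_command(image, output_base).c_str()) != 0) {
        return false;
    }
    text = read_file(output_base + ".txt");
    return true;
}
```

With the files from the example above, ocr_image("005.tiff", "005", text) would run the same tesseract command and then load 005.txt into text. Shelling out like this is slower than linking against the library, but it avoids depending on the API of a particular tesseract version.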

Like most OCR programs, it's not perfect, but it's pretty good compared to other free OCR software.
 
  

