LinuxQuestions.org
09-25-2016, 08:27 PM   #1
Pedroski
Senior Member
 
Registered: Jan 2002
Location: Nanjing, China
Distribution: Ubuntu 20.04
Posts: 1,928

Rep: Reputation: 70
command line tesseract-ocr


I'm trying to OCR this image, which contains Chinese text. Something isn't working.

What am I doing wrong?

Quote:
pedro@pedro-275E4E-275E5E:~$ tesseract -l chi-sim --tessdata-dir /usr/share/tesseract-ocr /home/pedro/Desktop/ghostInstructions1.jpg /home/pedro/Desktop/ghostInstructions1
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Error opening data file /usr/share/tesseract-ocr/tessdata/chi-sim.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'chi-sim'
Tesseract couldn't load any languages!
Could not initialize tesseract.
pedro@pedro-275E4E-275E5E:~$
 
09-25-2016, 08:36 PM   #2
John VV
LQ Muse
 
Registered: Aug 2005
Location: A2 area Mi.
Posts: 17,531

Rep: Reputation: 2622
Try converting the JPG to a TIFF or PPM.

Also, if there are a lot of JPEG artifacts, it will not output a good text file.

JPG should be illegal.

Also, how did you install tesseract?
Your package manager should have set the system paths for it.

Last edited by John VV; 09-25-2016 at 08:39 PM.
 
09-25-2016, 11:50 PM   #3
Pedroski
Senior Member
 
Registered: Jan 2002
Location: Nanjing, China
Distribution: Ubuntu 20.04
Posts: 1,928

Original Poster
Rep: Reputation: 70
Can't remember; it was a while ago. From a tarball, I think.

How can I set TESSDATA_PREFIX?

I tried it like this:

pedro@pedro-275E4E-275E5E:~$ $TESSDATA_PREFIX=/usr/share/tesseract-ocr
bash: =/usr/share/tesseract-ocr: No such file or directory
pedro@pedro-275E4E-275E5E:~$ $TESSDATA_PREFIX = /usr/share/tesseract-ocr
=: command not found
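
Both attempts above fail for the same reason: in bash, `$TESSDATA_PREFIX` expands the variable's current (empty) value, so the shell is left trying to run `=/usr/share/tesseract-ocr` or `=` as a command. A minimal sketch of the usual assignment form follows. The path is the one from the attempts above; whether it is the right value on this machine is an assumption, since the error message in post #1 only says the variable should point to the parent directory of "tessdata".

```shell
# Assign with the bare name: no leading "$" and no spaces around "=".
# "export" makes the variable visible to programs started from this shell.
export TESSDATA_PREFIX=/usr/share/tesseract-ocr

# The "$" form is only for reading the value back.
echo "$TESSDATA_PREFIX"
```

The setting lasts only for the current terminal session unless the `export` line is also added to a startup file such as `~/.bashrc`.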
 
09-26-2016, 05:35 AM   #4
Pedroski
Senior Member
 
Registered: Jan 2002
Location: Nanjing, China
Distribution: Ubuntu 20.04
Posts: 1,928

Original Poster
Rep: Reputation: 70
Turns out I don't need TESSDATA_PREFIX.

I remembered I had done this on an old laptop. I started it up, opened a terminal and looked through the history until I found the command. It goes like this (I will save it in my brand new 'Linux command line' file for future reference):

Quote:
pedro@pedro-275E4E-275E5E:~$ tesseract /home/pedro/Desktop/ghostInstructions1.tif -l chi_sim /home/pedro/Desktop/ghostInstructions1
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Page 1
Detected 68 diacritics
pedro@pedro-275E4E-275E5E:~$
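
Comparing this command with the one in post #1, one visible difference is the language code: `chi_sim` with an underscore here, `chi-sim` with a hyphen there, and the error in post #1 was about a missing `chi-sim.traineddata` file. A small sketch for checking which spelling has a data file on disk follows. `have_lang` is a hypothetical helper, not part of tesseract, and the directory is an assumption taken from that error message.

```shell
# Hypothetical helper (not part of tesseract): succeed if the data file
# for language $2 exists in the tessdata directory $1.
have_lang() {
    [ -f "$1/$2.traineddata" ]
}

# Assumed location, copied from the error message in post #1.
tessdata=/usr/share/tesseract-ocr/tessdata

# Report which spelling of the language code has a matching data file.
for lang in chi-sim chi_sim; do
    if have_lang "$tessdata" "$lang"; then
        echo "$lang: found"
    else
        echo "$lang: missing"
    fi
done
```

Running `tesseract --list-langs` should also print the language codes the installed tesseract can find, which avoids guessing the directory.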
 
  


All times are GMT -5.
