LinuxQuestions.org

Pedroski 09-25-2016 09:27 PM

command line tesseract-ocr
 
I'm trying to OCR this image as Chinese text, but something isn't working.

What am I doing wrong?

Quote:

pedro@pedro-275E4E-275E5E:~$ tesseract -l chi-sim --tessdata-dir /usr/share/tesseract-ocr /home/pedro/Desktop/ghostInstructions1.jpg /home/pedro/Desktop/ghostInstructions1
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Error opening data file /usr/share/tesseract-ocr/tessdata/chi-sim.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'chi-sim'
Tesseract couldn't load any languages!
Could not initialize tesseract.
pedro@pedro-275E4E-275E5E:~$

John VV 09-25-2016 09:36 PM

Try converting the jpg to a tif or ppm.

Also, if there are a lot of jpg artifacts, it will not output a good text file.

jpg should be illegal.

Also, how did you install tesseract?
Your package manager should have set the system paths for it.

Pedroski 09-26-2016 12:50 AM

Can't remember, it was a while ago. From a tarball, I think.

How can I set TESSDATA_PREFIX?

I tried it like this:

pedro@pedro-275E4E-275E5E:~$ $TESSDATA_PREFIX=/usr/share/tesseract-ocr
bash: =/usr/share/tesseract-ocr: No such file or directory
pedro@pedro-275E4E-275E5E:~$ $TESSDATA_PREFIX = /usr/share/tesseract-ocr
=: command not found
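
Both errors above appear to come from the leading `$`. Bash expands `$TESSDATA_PREFIX` (which is empty) before it runs the line, so it ends up trying to execute `=/usr/share/tesseract-ocr`, or `=` with an argument, as a command. A minimal sketch of the usual assignment form follows. The path is the one from the first post; whether it is the right value for this particular install is an assumption, since the error message only says the variable should point at the parent directory of `tessdata`.

```shell
# Assign without a leading "$" and without spaces around "=".
# "export" makes the variable visible to child processes such as tesseract.
# The path is taken from the first post and is an assumption for this install.
export TESSDATA_PREFIX=/usr/share/tesseract-ocr

# Use "$" only when reading the value back.
echo "TESSDATA_PREFIX is $TESSDATA_PREFIX"
```

The setting lasts only for the current shell session unless the `export` line is also added to a startup file such as `~/.bashrc`.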

Pedroski 09-26-2016 06:35 AM

Turns out I don't need TESSDATA_PREFIX.

I remembered doing this on an old laptop, so I started it up, opened a terminal, and looked through the shell history until I found the command. It goes like this (I will save it in my brand new 'Linux command line' file for future reference):

Quote:

pedro@pedro-275E4E-275E5E:~$ tesseract /home/pedro/Desktop/ghostInstructions1.tif -l chi_sim /home/pedro/Desktop/ghostInstructions1
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Page 1
Detected 68 diacritics
pedro@pedro-275E4E-275E5E:~$
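
Compared with the failing command in the first post, this one differs in a few visible ways: the language is spelled `chi_sim` (underscore) rather than `chi-sim`, the `--tessdata-dir` option is gone, and the input is a .tif rather than a .jpg. The earlier error shows Tesseract looking for `/usr/share/tesseract-ocr/tessdata/chi-sim.traineddata`, so a quick check of which spelling exists on disk might have pointed at the problem. The sketch below assumes that layout; `have_traineddata` is a hypothetical helper, not part of Tesseract.

```shell
# Hypothetical helper: succeeds if <lang>.traineddata exists under
# <prefix>/tessdata, the location named in the error message above.
have_traineddata() {
    td_prefix=$1
    td_lang=$2
    [ -f "$td_prefix/tessdata/$td_lang.traineddata" ]
}

# Try both spellings from this thread against the prefix from the first post.
# The prefix is an assumption; adjust it to wherever tessdata actually lives.
for lang in chi-sim chi_sim; do
    if have_traineddata /usr/share/tesseract-ocr "$lang"; then
        echo "$lang: found"
    else
        echo "$lang: missing"
    fi
done
```

Running `tesseract --list-langs` should also print the language names Tesseract can load from its default data directory, which is another way to confirm the exact spelling.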


All times are GMT -5.