gscan2pdf + tesseract error message not helping / applicable

JZL240I-U · 09-13-2020, 03:22 PM

This is on Tumbleweed. The error after scanning a page is this:

Code:

[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 0:(null) score is 0.358755
[DS] Selected Device[1]: "(null)" (Native)
Error opening data file /usr/share/tessdata/[DS] Device[1] 0:(null) score is 0.358755.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language '[DS] Device[1] 0:(null) score is 0.358755'
Tesseract couldn't load any languages!
Could not initialize tesseract.

I did

Code:

export TESSDATA_PREFIX=/usr/share/tessdata

which is where all the language files reside. I even downloaded the newest eng.traineddata, all to no avail. There are a ton of complaints on the net about exactly this error, but none had solutions other than the two I already tried. Anyone here with ideas?
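The "language" that tesseract fails to load in the log above is literally one of the `[DS]` OpenCL status lines, which suggests those lines end up where a language name is expected. Below is a minimal sketch for checking whether they show up in tesseract's own language listing. `count_ds_lines` is a made-up helper name, and whether gscan2pdf actually reads this listing is an assumption, not something confirmed here.

```shell
#!/bin/sh
# count_ds_lines: read text on stdin and print how many lines start with "[DS]".
# (Hypothetical helper, only for this check.)
count_ds_lines() {
    # grep -c exits non-zero when the count is 0, so ignore its exit status.
    grep -c '^\[DS\]' || true
}

# Only run the real check if tesseract is installed.
if command -v tesseract >/dev/null 2>&1; then
    # Merge stderr into stdout so the check sees everything tesseract prints.
    n=$(tesseract --list-langs 2>&1 | count_ds_lines)
    echo "[DS] lines mixed into the language listing: $n"
fi
```

If the count is non-zero, the OpenCL status output is at least a plausible source of the bogus language name, and the problem is unlikely to be the TESSDATA_PREFIX setting itself.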

P.S.: The scanner is a Brother MFC-L2710DW 4-in-1.

business_kid · 09-14-2020, 08:47 AM

I had my best success without gscan2pdf

tesseract file.jpg stdout >> something.txt
Open txt in word processor, correct & format
Export to pdf if you want to treble the size.

Tesseract sucks imo. But the other options (gscan2pdf, gocr) suck much worse. Get tesseract 4.0+. If you have one or two projects where tesseract fails and you need the best OCR, ABBYY (a proprietary program) did a Linux version with a one-month free trial. It's probably the best option performance-wise (only).

JZL240I-U · 09-14-2020, 10:58 AM

I already have tesseract 4.1.1 (AFAIR, not sure about the last minor digit). Tesseract itself can't be too bad, Google uses it for its book scanning thingy. The allure of tesseract integrated into gscan2pdf is that one can (ahmm, could, in my case) convert a scan on the fly. And I hate it when software throws a spoke into my wheels. It should work, darn it all (including the error messages, which have been misleading for years and years).

business_kid · 09-14-2020, 12:37 PM

Don't try anything fancy, I'd advise. And don't expect it to work, except 1 page at a time.
As for the scanner, scan big: 600 dpi or bigger. I wrote a script to give it one page at a time.
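A rough sketch of what such a one-page-at-a-time script could look like. This is not the script mentioned above; the function name `ocr_pages`, the `scan-*.png` file names and the `eng` language are assumptions for illustration.

```shell
#!/bin/sh
# ocr_pages: OCR each image given as an argument, one tesseract run per page.
# tesseract appends ".txt" to the output base, so scan-01.png gives scan-01.txt.
ocr_pages() {
    for img in "$@"; do
        base="${img%.*}"    # strip the extension: scan-01.png -> scan-01
        echo "OCR: $img -> $base.txt"
        # Keep going with the next page if one page fails.
        tesseract "$img" "$base" -l eng || echo "failed: $img" >&2
    done
}

# Hypothetical usage, with pages already scanned at 600 dpi:
# ocr_pages scan-*.png
```

Running each page as its own tesseract call means a bad page only costs that page, and the resulting text files can then be corrected one by one in a word processor.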

I had one project recently, a play set in the 1950s in rural Ireland. There were hand edits over pale typewritten pages. For me, the word processor stage was essential. The error messages are a bit like hieroglyphics - you figure them out, then forget.