gscan2pdf + tesseract error message not helping / applicable
This is on Tumbleweed. After scanning a page, I get this error:
Code:
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 0:(null) score is 0.358755
[DS] Selected Device[1]: "(null)" (Native)
Error opening data file /usr/share/tessdata/[DS] Device[1] 0:(null) score is 0.358755.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language '[DS] Device[1] 0:(null) score is 0.358755'
Tesseract couldn't load any languages!
Could not initialize tesseract.
I did
Code:
export TESSDATA_PREFIX=/usr/share/tessdata
which is where all the language files reside. I even downloaded the newest eng.traineddata, all to no avail. There are tons of complaints on the net about exactly this error, but none offered solutions other than the two I had already tried. Does anyone here have any ideas?
P.S.: The scanner is a Brother MFC-L2710DW 4-in-one.
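The log hints at what went wrong: the language tesseract was asked to load is literally `[DS] Device[1] 0:(null) score is 0.358755`, which is the OpenCL device-selection line printed just above it. The tessdata path itself looks fine, since tesseract is searching /usr/share/tessdata. One plausible reading (an assumption, not confirmed in this thread) is that gscan2pdf builds its language list from the output of `tesseract --list-langs`, and these `[DS]` lines ended up in that list. The shell sketch below, with hypothetical helper names `clean_langs` and `has_lang`, strips those lines from the language list and checks that a `.traineddata` file actually exists in the tessdata directory.

```shell
#!/bin/sh
# Sketch: clean tesseract's language list and check for a traineddata file.
# Helper names are hypothetical; the "[DS]" filter assumes those OpenCL
# device-selection lines are what polluted gscan2pdf's language list.

# Read `tesseract --list-langs` output on stdin and print only language codes,
# dropping the "[DS] ..." lines and the "List of available languages" header.
clean_langs() {
  grep -v -e '^\[DS]' -e '^List of available languages'
}

# Succeed if <lang>.traineddata exists in the tessdata directory
# (TESSDATA_PREFIX if set, otherwise /usr/share/tessdata as in the post).
has_lang() {
  dir=${TESSDATA_PREFIX:-/usr/share/tessdata}
  [ -f "$dir/$1.traineddata" ]
}

# Example usage (needs tesseract installed):
#   tesseract --list-langs 2>&1 | clean_langs
#   has_lang eng && echo "eng.traineddata found"
```

If `clean_langs` shows the expected codes (e.g. `eng`) while the raw output also contains `[DS]` lines, that would support the theory that the OpenCL diagnostics, not the tessdata setup, are the problem.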
Tesseract sucks, IMO, but the alternatives (gscan2pdf, gocr) suck much worse. Get tesseract 4.0+. If you have one or two projects where tesseract fails and you need the best OCR, ABBYY (a proprietary program) made a Linux version with a one-month free trial. It's probably the best option, performance-wise (only).
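To check the "4.0+" advice on a given machine, a small shell sketch can parse the first line of `tesseract --version`. It assumes that line looks like `tesseract 4.1.1` (some builds print a `v` prefix, e.g. `tesseract v5...`); the helper names `tess_major` and `tess_is_4_or_newer` are hypothetical.

```shell
#!/bin/sh
# Sketch: check whether a tesseract version string is 4.x or newer.
# Helper names are hypothetical. Assumed format of the first line:
# "tesseract 4.1.1" or "tesseract v5.0.0-alpha".

# Print the major version number from `tesseract --version` output on stdin.
tess_major() {
  head -n 1 | sed -n 's/^tesseract v\{0,1\}\([0-9][0-9]*\).*/\1/p'
}

# Succeed if the version read on stdin is 4 or newer.
tess_is_4_or_newer() {
  major=$(tess_major)
  [ -n "$major" ] && [ "$major" -ge 4 ]
}

# Example usage (2>&1 in case the version text goes to stderr):
#   tesseract --version 2>&1 | tess_is_4_or_newer && echo "4.0+ OK"
```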
Original Poster
I already have tesseract 4.1.1 (if I remember the last digit correctly). Tesseract itself can't be too bad; Google uses it for its book-scanning thingy. The allure of tesseract integrated into gscan2pdf is that one can (ahem, could, in my case) convert a scan on the fly. And I hate it when software puts a spoke in my wheel. It should work, darn it all (and the error messages have been misleading for years and years).
My advice: don't try anything fancy, and don't expect it to work except one page at a time.
As for the scanner, scan big: 600 dpi or more. I wrote a script to feed it one page at a time.
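The script itself isn't shown in the thread. As a hypothetical illustration of the "scan big, one page at a time" approach (not the poster's script), here is a shell sketch. It assumes SANE's `scanimage` can reach the scanner as its default device (otherwise add `-d <device>`) and that the backend accepts `--resolution 600`; tesseract's `pdf` config then writes one searchable PDF per page. The names `page_base`, `scan_and_ocr`, `scan_loop`, `DPI` and `LANG_CODE` are made up for this example.

```shell
#!/bin/sh
# Hypothetical one-page-at-a-time scan + OCR loop (not the poster's script).
# Assumes scanimage (SANE) and tesseract 4.x are installed and the scanner
# backend accepts --resolution 600.

DPI=600
LANG_CODE=eng

# Zero-padded base name for page N, e.g. page-007
page_base() {
  printf 'page-%03d' "$1"
}

# Scan one page to TIFF, then OCR it into a searchable single-page PDF.
scan_and_ocr() {
  base=$(page_base "$1")
  scanimage --resolution "$DPI" --format=tiff > "$base.tif" || return 1
  tesseract "$base.tif" "$base" -l "$LANG_CODE" pdf
}

# Prompt before each page so the next sheet can be placed on the glass.
scan_loop() {
  n=1
  while printf 'Place page %d and press Enter (q to quit): ' "$n" && read -r ans; do
    [ "$ans" = q ] && break
    scan_and_ocr "$n" || echo "page $n failed" >&2
    n=$((n + 1))
  done
}
```

Source the file (e.g. `. ./scan-pages.sh`, a hypothetical name) and run `scan_loop`. Keeping each page separate means one bad scan doesn't ruin the batch; the single-page PDFs could afterwards be merged with a tool such as `pdfunite page-*.pdf out.pdf` from poppler-utils.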
I had one project recently, a play set in rural Ireland in the 1950s, with hand edits over pale typewritten pages. For me, the word-processor stage was essential. The error messages are a bit like hieroglyphics: you figure them out, then forget them.