Compiling tesseract-2.03: error: ‘INT32’ was not declared in this scope
'tesseract-2.03' compiles with no errors on PCLinuxOS2007 (g++-4.1)
and on Suse 10.3 (g++-4.2).
I used this source, mainly to look at the patching: http://packages.debian.org/lenny/tesseract-ocr
Under "Links for tesseract-ocr" there are
[tesseract_2.03.orig.tar.gz] and [tesseract_2.03-2.diff.gz],
but I didn't use the patch. (It doesn't seem to patch dawg.cpp.)
Why not try again, from scratch, in /home/<username>/,
with a clean 'tesseract-2.03' ?
.....
(Libs used: zlib-devel, libjpeg-devel, libpng-devel)
.....
You were right: every single patch I applied (including the one recommended on the Tesseract Release Notes page, which replaces unicharset_extractor.cpp) caused a compile error.
So now, with the patches removed, it compiles successfully, but without the patches it still has all the old bugs. For example, wordlist2dawg segfaults when it processes a word list while I am training a new language.
So I am back where I started today:
Code:
Building DAWG from word list in file, 'magyar.gyakori.txt'
Compacting the DAWG
Szegmens hiba
("Szegmens hiba" is the Hungarian-locale message for "Segmentation fault".) So I am no closer to Hungarian language support.
So the patch only creates a debian sub-directory under the tesseract-2.03 directory and puts some files there; no other files in the source tree are patched. I checked that with diff. And the segmentation fault is still there, of course.
I also tried to apply the patch to the sources in tesseract_2.03.orig.tar.gz downloaded from packages.debian.org, but with no success.
I did the same, and I just cannot imagine why the .cpp files mentioned in the .diff file remain unpatched.
When you run the patch, is a debian directory created under tesseract-2.03?
If so, then I am applying the patch in the right place.
Anyway, I think the patch does not concern dawg file creation, so it would be of no use for me.
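Which files the .diff would touch can be checked without applying it, by reading the "+++" header lines of the diff. Below is a minimal sketch of that check. The helper names list_patched_files and sample_diff are made up for illustration, and the sample diff is hypothetical data, not the real contents of tesseract_2.03-2.diff.gz. One possible explanation for the unpatched .cpp files, which is an assumption and not confirmed in this thread, is that the Debian diff carries its source changes as separate patch files under debian/ that are only applied when the package is built.

```shell
#!/bin/sh
# List the files a unified diff would create or modify, without applying it.
# Reads the diff on stdin and prints each target path once, sorted.
list_patched_files() {
    # Every file section of a unified diff has a "+++ <path>" header line,
    # optionally followed by a tab and a timestamp.
    # Caveat: an added line whose own text starts with "++ " would also match.
    sed -n 's/^+++ \([^[:space:]]*\).*/\1/p' | sort -u
}

# Hypothetical sample diff, used only so this sketch can run on its own.
sample_diff() {
cat <<'EOF'
--- tesseract-2.03.orig/debian/rules
+++ tesseract-2.03/debian/rules
@@ -0,0 +1,1 @@
+hypothetical rules line
--- tesseract-2.03.orig/debian/changelog
+++ tesseract-2.03/debian/changelog
@@ -0,0 +1,1 @@
+hypothetical changelog line
EOF
}

sample_diff | list_patched_files
```

With the real file from the thread, the same function would be fed with `zcat tesseract_2.03-2.diff.gz | list_patched_files`. If only paths under debian/ show up, the diff by itself does not change dawg.cpp or any other .cpp file directly.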
Some Polish guys reported the same problem as me, and the developers answered that the bug will be addressed in the 3.x versions of tesseract. Since there is still no Polish support, I suppose they are waiting for that version, too.
I am really curious how the most recent language support files (fra, deu, etc.) were created, given that this function of tesseract is broken. Did they do it with the Windows version? That would be a shame. Or with an older version of tesseract? Either way, it is a shame to leave a bug that prevents the creation of new language support files unpatched until the 3.x versions.
If that function really is broken, I think users deserve a warning, so they do not waste days creating training files that cannot be used in the end.
But what can we expect from developers who issue a release that simply does not work? (The 2.02 version was like that.) OCR support on Linux is advancing really slowly, and compared to other areas I do not see any big change with the emergence of tesseract, either.