LinuxQuestions.org
Linux - Software: This forum is for software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.

03-01-2018, 03:01 AM   #1
joboy
Member
 
Registered: Jul 2009
Distribution: Debian, Ubuntu, Puppy, Mint
Posts: 655

Rep: Reputation: 7
Reliable OCR software


Hi there,

I am looking for an OCR app to convert PDFs or graphics to text. By reliable I mean it will not crash or hang easily. I tried LIOS; it worked but took forever to read a page, and sometimes it just seemed to loop, reading without ever finishing. So I want another one that actually works, regardless of recognition accuracy. Any ideas?
 
03-01-2018, 10:54 AM   #2
DavidMcCann
LQ Veteran
 
Registered: Jul 2006
Location: London
Distribution: PCLinuxOS, Debian
Posts: 6,140

Rep: Reputation: 2314
Have a look here
http://linuxappfinder.com/graphics/ocr
I've used Tesseract successfully.
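For reference, a minimal sketch of driving the Tesseract command-line tool from Python, assuming `tesseract` is installed and on the PATH with English language data. Since the original complaint was about hanging, each call is wrapped in a timeout so a stuck page is abandoned instead of waited on forever. The function names and the 120-second default are illustrative assumptions, not part of Tesseract.

```python
import subprocess


def build_tesseract_cmd(image_path, lang="eng"):
    """Command line that prints the OCR text of one image to stdout."""
    # Using "stdout" as the output base makes tesseract write to standard output
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng", timeout_s=120):
    """Run tesseract on one image; return None instead of hanging forever."""
    try:
        result = subprocess.run(
            build_tesseract_cmd(image_path, lang),
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=True,  # raise CalledProcessError if tesseract exits with an error
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child process at this point
        return None
    return result.stdout
```

For example, `ocr_image("page-1.png")` returns the recognised text, or `None` if Tesseract did not finish within the timeout.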
 
03-01-2018, 11:51 AM   #3
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,634

Rep: Reputation: 7965
Quote:
Originally Posted by DavidMcCann View Post
Have a look here
http://linuxappfinder.com/graphics/ocr
I've used Tesseract successfully.
Agreed; I've had good luck with tesseract as well. But "reliable" and "OCR" don't really belong in the same sentence.
 
03-01-2018, 01:40 PM   #4
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,980

Rep: Reputation: 3624
The best I have found is in business copier machines. If you only need this a few times, maybe go to a copy shop.

I've never tried the online sites.

Any Linux program you use will require you to check its results.

Last edited by jefro; 03-02-2018 at 01:16 PM.
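Checking the results can be partly automated. Assuming Tesseract 3.05 or later, which can write word-level TSV with a per-word confidence column (for example `tesseract page.png out tsv` writes `out.tsv`), the sketch below lists the words that fall under a confidence threshold so you know where to look first. The function name and the 60-point threshold are arbitrary assumptions; low confidence is only a hint, and confidently wrong words will still slip through.

```python
import csv
import io


def low_confidence_words(tsv_text, threshold=60.0):
    """Return (word, confidence) pairs below threshold from Tesseract TSV output."""
    # QUOTE_NONE so quote characters in recognised text are kept literally
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    flagged = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        # Non-word rows (page, block, line) carry conf -1 and no text
        if text and 0 <= conf < threshold:
            flagged.append((text, conf))
    return flagged
```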
 
03-01-2018, 11:22 PM   #5
joboy
Member
 
Registered: Jul 2009
Distribution: Debian, Ubuntu, Puppy, Mint
Posts: 655

Original Poster
Rep: Reputation: 7
Thanks for the tip, I'll check that out.
As long as it does not crash or hang, it should solve my problem.

Quote:
Originally Posted by DavidMcCann View Post
Have a look here
http://linuxappfinder.com/graphics/ocr
I've used Tesseract successfully.
 
03-02-2018, 07:09 AM   #6
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,659
Blog Entries: 4

Rep: Reputation: 3940
I agree that OCR usually does not work very well.

However, if you have an unencrypted PDF, there might be tools which can extract the text out of that file. (Essentially, PDF is a page-description language which tells the printer or display how to "draw" the text, and it customarily does so through actual (Unicode) text with reference to fonts that may or may not be embedded in the PDF.) You wouldn't have to OCR it, because the text is actually in there ... as text.

(But PDFs can also contain raster images of pages – bitmaps.)

Last edited by sundialsvcs; 03-02-2018 at 07:10 AM.
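A quick way to tell which kind of PDF you have, sketched in Python and assuming poppler-utils' `pdftotext` is installed: extract the text layer and see whether anything meaningful comes back. If almost nothing does, the pages are probably raster images and need OCR after all. `extract_pdf_text`, `looks_scanned`, and the 20-character threshold are hypothetical choices for illustration.

```python
import subprocess


def extract_pdf_text(pdf_path, timeout_s=60):
    """Pull the embedded text layer out of a PDF with poppler's pdftotext."""
    # "-" as the output file sends the text to stdout; -layout keeps columns aligned
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return result.stdout


def looks_scanned(extracted_text, min_chars=20):
    """Heuristic: almost no extractable text suggests the pages are images."""
    # pdftotext ends each page with a form feed; ignore those and all whitespace
    visible = "".join(extracted_text.replace("\f", "").split())
    return len(visible) < min_chars
```

Typical use is `looks_scanned(extract_pdf_text("file.pdf"))`; `True` means there is no usable text layer and OCR is the way to go.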
 
03-02-2018, 09:34 AM   #7
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,634

Rep: Reputation: 7965
Quote:
Originally Posted by sundialsvcs View Post
I agree that OCR usually does not work very well.

However, if you have an unencrypted PDF, there might be tools which can extract the text out of that file. (Essentially, PDF is a page-description language which tells the printer or display how to "draw" the text, and it customarily does so through actual (Unicode) text with reference to fonts that may or may not be embedded in the PDF.) You wouldn't have to OCR it, because the text is actually in there ... as text.

(But PDFs can also contain raster images of pages – bitmaps.)
Indeed, but THAT can be a thorny issue too. Multi-column text is... interesting... to try to deal with. In most cases the software can't tell a space from a column break. You *CAN* monkey with things, but your mileage may vary.
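One rough way to monkey with two-column pages, assuming the text came from poppler's `pdftotext -layout` (which pads lines with spaces to mimic the page layout): look for a vertical strip of blank character positions running through every non-empty line and treat it as the gutter between the columns. `find_gutter`, `split_columns`, and the 3-character minimum gutter width are hypothetical; this only handles a clean two-column page and will misfire on tables or headings that span the gutter.

```python
def find_gutter(lines, min_width=3):
    """Return (start, end) of the widest blank strip shared by all lines, or None."""
    rows = [ln for ln in lines if ln.strip()]
    width = max((len(ln) for ln in rows), default=0)
    # A character position is blank if every non-empty line has a space
    # there or is too short to reach it
    blank = [all(i >= len(ln) or ln[i] == " " for ln in rows) for i in range(width)]
    best, start = None, None
    for i, is_blank in enumerate(blank + [False]):
        if is_blank and start is None:
            start = i
        elif not is_blank and start is not None:
            # Skip shared leading indentation and trailing padding
            interior = start > 0 and i < width
            if interior and i - start >= min_width:
                if best is None or i - start > best[1] - best[0]:
                    best = (start, i)
            start = None
    return best


def split_columns(layout_text, min_width=3):
    """Re-order two-column layout text: the whole left column, then the right."""
    lines = layout_text.splitlines()
    gutter = find_gutter(lines, min_width)
    if gutter is None:
        return layout_text  # no gutter found; leave the text alone
    start, end = gutter
    left = [ln[:start].rstrip() for ln in lines]
    right = [ln[end:].strip() for ln in lines]
    return "\n".join(part for part in left + right if part)
```

Given `"Alpha one     Beta one\nAlpha two     Beta two"`, `split_columns` returns the two "Alpha" lines followed by the two "Beta" lines instead of interleaving them.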
 
03-02-2018, 11:35 PM   #8
joboy
Member
 
Registered: Jul 2009
Distribution: Debian, Ubuntu, Puppy, Mint
Posts: 655

Original Poster
Rep: Reputation: 7
If there were any text embedded in the PDF, I would not have asked the question; it is purely graphics.

Quote:
Originally Posted by sundialsvcs View Post
I agree that OCR usually does not work very well.

However, if you have an unencrypted PDF, there might be tools which can extract the text out of that file. (Essentially, PDF is a page-description language which tells the printer or display how to "draw" the text, and it customarily does so through actual (Unicode) text with reference to fonts that may or may not be embedded in the PDF.) You wouldn't have to OCR it, because the text is actually in there ... as text.

(But PDFs can also contain raster images of pages – bitmaps.)
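For a graphics-only PDF, the usual route is to render each page to an image and OCR the images. A sketch, assuming poppler-utils (`pdftoppm`) and `tesseract` are installed: `pdftoppm -r 300 -png` writes one PNG per page named `<prefix>-<page>.png`, and each page is then passed to Tesseract with its own timeout so one bad page cannot hang the whole job. The helper names, the 300 dpi resolution, and the timeout are assumptions; run it in an empty directory so stale page images from an earlier run are not picked up.

```python
import glob
import subprocess


def rasterize_cmd(pdf_path, out_prefix, dpi=300):
    """pdftoppm command that renders every page to <out_prefix>-<page>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


def ocr_scanned_pdf(pdf_path, out_prefix="page", lang="eng", timeout_s=120):
    """Render a graphics-only PDF to PNGs, then OCR each page in order.

    Returns one entry per page: the recognised text, or None if
    tesseract timed out on that page.
    """
    subprocess.run(rasterize_cmd(pdf_path, out_prefix), check=True)
    # Recent pdftoppm versions zero-pad page numbers to a common width,
    # so a plain sort keeps the pages in order
    pages = sorted(glob.glob(glob.escape(out_prefix) + "-*.png"))
    texts = []
    for page in pages:
        try:
            result = subprocess.run(
                ["tesseract", page, "stdout", "-l", lang],
                capture_output=True,
                text=True,
                timeout=timeout_s,
                check=True,
            )
            texts.append(result.stdout)
        except subprocess.TimeoutExpired:
            # The hung tesseract process has already been killed; move on
            texts.append(None)
    return texts
```

Pages that come back as `None` can be retried individually, for example at a lower resolution.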
 
  







All times are GMT -5.
