LinuxQuestions.org
Old 10-17-2014, 01:39 PM   #1
bscho
Member
 
Registered: Nov 2012
Location: London
Distribution: Mint 20, Kali, Peppermint, Ubuntu, MakuluFlash, Fedora 32, Windows 12 Lite, MakuluLinux
Posts: 821

Rep: Reputation: 28
What software for scanning OCR


Hi,
I have Linux Mint 17 and had my PC stolen with all my valuable writings.
I have saved a lot of my writings on A4 sheets.

I want to scan in my pages and recreate my writings.
I am told that I can get Optical Character Software (OCR) to do this.

What if any software is available for scanning in my A4 pages?
 
Old 10-17-2014, 03:34 PM   #2
jefro
Moderator
 
Registered: Mar 2008
Posts: 22,130

Rep: Reputation: 3639
Yes and no. There are about 4 good OCR apps. One is commercial and may be best. The free ones are pretty crummy. I gave up and just used a business copy machine that seemed to be almost perfect.

Tesseract was promised by Google, but development seemed to stop.

ABBYY seems to have the best test results.

There are a number of other apps around.

https://help.ubuntu.com/community/OCR
 
Old 10-17-2014, 04:58 PM   #3
c0d3d
Member
 
Registered: Aug 2012
Posts: 74

Rep: Reputation: 12
Try Tesseract. It should be able to read your files.

@jefro: Tesseract is still being developed. The last code change was on October 14, 2014.

Last edited by c0d3d; 10-17-2014 at 05:04 PM.
 
Old 10-17-2014, 05:57 PM   #4
metaschima
Senior Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 1,982

Rep: Reputation: 492
Tesseract is the best FLOSS one. I recommend doing some preprocessing of the images before feeding them in, or use a program that does that. Try to get them black text on pure white background.
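
One way to get close to black text on a pure white background is ImageMagick. The sketch below assumes its `convert` tool is installed; the helper name `clean_scan`, the file names, and the 60% threshold are placeholders to tune against your own scans, not recommended values.

```shell
# Sketch: flatten a scan to pure black-on-white before OCR.
# Assumes ImageMagick's `convert` is on PATH; clean_scan is a hypothetical helper name.
clean_scan() {
    src=$1
    dst=$2
    # Convert to grayscale, stretch the contrast, then force each pixel to black or white.
    convert "$src" -colorspace Gray -normalize -threshold 60% "$dst"
}

# Example (hypothetical file names):
#   clean_scan page01.png page01-clean.png
```

If faint text disappears, a lower threshold may keep more of it; if the paper texture shows through, a higher one may help.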
 
Old 10-17-2014, 06:02 PM   #5
John VV
LQ Muse
 
Registered: Aug 2005
Location: A2 area Mi.
Posts: 17,639

Rep: Reputation: 2653
I have used "Tesseract" in the past
for almost all normal fonts it has no problems

handwriting?
if it is handwritten "block" text, that will work well (mostly)
 
Old 10-18-2014, 04:28 PM   #6
bscho
Member
 
Registered: Nov 2012
Location: London
Distribution: Mint 20, Kali, Peppermint, Ubuntu, MakuluFlash, Fedora 32, Windows 12 Lite, MakuluLinux
Posts: 821

Original Poster
Rep: Reputation: 28
How do I install Tesseract

Quote:
Originally Posted by John VV View Post
I have used "Tesseract" in the past
for almost all normal fonts it has no problems

handwriting?
if it is handwritten "block" text, that will work well (mostly)
It seems that Tesseract is popular. I tried to install it using apt-get,
but the package is not found when I run sudo apt-get install Tesseract.

Is this not available with apt-get?
 
Old 10-18-2014, 05:29 PM   #7
c0d3d
Member
 
Registered: Aug 2012
Posts: 74

Rep: Reputation: 12
@bscho: The command should be "sudo apt-get install tesseract-ocr"
 
Old 10-18-2014, 05:31 PM   #8
John VV
LQ Muse
 
Registered: Aug 2005
Location: A2 area Mi.
Posts: 17,639

Rep: Reputation: 2653
do not know about ubuntu / mint
but on suse
Code:
su -
zypper in tesseract
just do a search
" zypper se tesseract"
displays a TON of downloads including different languages


from
https://help.ubuntu.com/community/AptGet/Howto
Code:
apt-get update
apt-cache search tesseract
 
Old 10-18-2014, 06:01 PM   #9
Habitual
LQ Veteran
 
Registered: Jan 2011
Location: Abingdon, VA
Distribution: Catalina
Posts: 9,374
Blog Entries: 37

Rep: Reputation: Disabled
Code:
sudo apt-get install tesseract-ocr
67 languages too
Code:
tesseract-ocr-afr - tesseract-ocr language files for Afrikaans
tesseract-ocr-ara - tesseract-ocr language files for Arabic
tesseract-ocr-aze - tesseract-ocr language files for Azerbaijani
tesseract-ocr-bel - tesseract-ocr language files for Belarusian
tesseract-ocr-ben - tesseract-ocr language files for Bengali
tesseract-ocr-bul - tesseract-ocr language files for Bulgarian
tesseract-ocr-cat - tesseract-ocr language files for Catalan
tesseract-ocr-ces - tesseract-ocr language files for Czech
tesseract-ocr-chi-sim - tesseract-ocr language files for Simplified Chinese
tesseract-ocr-chi-tra - tesseract-ocr language files for Traditional Chinese
tesseract-ocr-chr - tesseract-ocr language files for Cherokee
tesseract-ocr-dan - tesseract-ocr language files for Danish
tesseract-ocr-deu - tesseract-ocr language files for German
tesseract-ocr-deu-frak - tesseract-ocr language files for German Fraktur
tesseract-ocr-dev - transitional dummy package
tesseract-ocr-ell - tesseract-ocr language files for Greek
tesseract-ocr-eng - tesseract-ocr language files for English
tesseract-ocr-enm - tesseract-ocr language files for Middle English
tesseract-ocr-epo - tesseract-ocr language files for Esperanto
tesseract-ocr-equ - tesseract-ocr language files for equations
tesseract-ocr-est - tesseract-ocr language files for Estonian
tesseract-ocr-eus - tesseract-ocr language files for Basque
tesseract-ocr-fin - tesseract-ocr language files for Finnish
tesseract-ocr-fra - tesseract-ocr language files for French
tesseract-ocr-frk - tesseract-ocr language files for Frankish
tesseract-ocr-frm - tesseract-ocr language files for Middle French
tesseract-ocr-glg - tesseract-ocr language files for Galician
tesseract-ocr-grc - tesseract-ocr language files for ancient Greek
tesseract-ocr-heb - tesseract-ocr language files for Hebrew
tesseract-ocr-hin - tesseract-ocr language files for Hindi
tesseract-ocr-hrv - tesseract-ocr language files for Croatian
tesseract-ocr-hun - tesseract-ocr language files for Hungarian
tesseract-ocr-ind - tesseract-ocr language files for Indonesian
tesseract-ocr-isl - tesseract-ocr language files for Icelandic
tesseract-ocr-ita - tesseract-ocr language files for Italian
tesseract-ocr-ita-old - tesseract-ocr language files for Old Italian
tesseract-ocr-jpn - tesseract-ocr language files for Japanese
tesseract-ocr-kan - tesseract-ocr language files for Kannada
tesseract-ocr-kor - tesseract-ocr language files for Korean
tesseract-ocr-lav - tesseract-ocr language files for Latvian
tesseract-ocr-lit - tesseract-ocr language files for Lithuanian
tesseract-ocr-mal - tesseract-ocr language files for Malayalam
tesseract-ocr-mkd - tesseract-ocr language files for Macedonian
tesseract-ocr-mlt - tesseract-ocr language files for Maltese
tesseract-ocr-msa - tesseract-ocr language files for Malay
tesseract-ocr-nld - tesseract-ocr language files for Dutch
tesseract-ocr-nor - tesseract-ocr language files for Norwegian
tesseract-ocr-osd - tesseract-ocr language files for script and orientation
tesseract-ocr-pol - tesseract-ocr language files for Polish
tesseract-ocr-por - tesseract-ocr language files for Portuguese
tesseract-ocr-ron - tesseract-ocr language files for Romanain
tesseract-ocr-rus - tesseract-ocr language files for Russian
tesseract-ocr-slk - tesseract-ocr language files for Slovak
tesseract-ocr-slk-frak - tesseract-ocr language files for Slovak Fractur
tesseract-ocr-slv - tesseract-ocr language files for Slovenian
tesseract-ocr-spa - tesseract-ocr language files for Spanish
tesseract-ocr-spa-old - tesseract-ocr language files for Old Spanish
tesseract-ocr-sqi - tesseract-ocr language files for Albanian
tesseract-ocr-srp - tesseract-ocr language files for Serbian
tesseract-ocr-swa - tesseract-ocr language files for Swahili
tesseract-ocr-swe - tesseract-ocr language files for Swedish
tesseract-ocr-tam - tesseract-ocr language files for Tamil
tesseract-ocr-tel - tesseract-ocr language files for Telugu
tesseract-ocr-tgl - tesseract-ocr language files for Tagalog
tesseract-ocr-tha - tesseract-ocr language files for Thai
tesseract-ocr-tur - tesseract-ocr language files for Turkish
tesseract-ocr-ukr - tesseract-ocr language files for Ukranian
tesseract-ocr-vie - tesseract-ocr language files for Vietnamese
 
Old 10-18-2014, 07:43 PM   #10
c0d3d
Member
 
Registered: Aug 2012
Posts: 74

Rep: Reputation: 12
You probably only want the English language files for Tesseract (tesseract-ocr-eng).
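
Putting the package names from this thread together, a minimal sketch for a Debian-family system (Ubuntu, Mint) might look like the following. The function names are made up for illustration, and the check assumes your Tesseract build supports `--list-langs`.

```shell
# Sketch: install the engine plus English data on a Debian-family system.
# Package names are the ones quoted in this thread; the function names are hypothetical.
install_tesseract() {
    sudo apt-get update &&
        sudo apt-get install tesseract-ocr tesseract-ocr-eng
}

# Succeeds if "eng" appears in the list of installed language data.
# Some releases print this list on stderr, hence the 2>&1.
has_english() {
    tesseract --list-langs 2>&1 | grep -qx 'eng'
}

# Example:
#   install_tesseract && has_english && echo "English data found"
```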
 
Old 10-19-2014, 07:07 AM   #11
bscho
Member
 
Registered: Nov 2012
Location: London
Distribution: Mint 20, Kali, Peppermint, Ubuntu, MakuluFlash, Fedora 32, Windows 12 Lite, MakuluLinux
Posts: 821

Original Poster
Rep: Reputation: 28
How do I install Tesseract

Quote:
Originally Posted by c0d3d View Post
@bscho: The command should be "sudo apt-get install tesseract-ocr"
I have tried that and the terminal says:
tesseract-ocr is already the newest version
tesseract-ocr set to manually installed.

It recognizes the program but doesn't download.
Any suggestions?
 
Old 10-19-2014, 01:02 PM   #12
c0d3d
Member
 
Registered: Aug 2012
Posts: 74

Rep: Reputation: 12
It means that it is already downloaded. Are you sure you downloaded the English language files ("apt-get install tesseract-ocr-eng")?

Keep in mind that this software has no GUI. It is terminal-based only (you might be able to get a custom GUI for it from here if you really want one).
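
Since it is terminal-based, a typical run names an input image and an output base name. The loop below is only a sketch: `ocr_pages` is a hypothetical helper, and it assumes the scans are PNG files and that the English data is installed.

```shell
# Sketch: OCR every PNG scan in a directory into a matching .txt file.
# ocr_pages is a hypothetical helper; the call shape is `tesseract IMAGE OUTPUTBASE -l LANG`.
ocr_pages() {
    dir=${1:-.}
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue   # skip the literal pattern when nothing matches
        # tesseract appends ".txt" itself, so pass the name without an extension
        tesseract "$img" "${img%.png}" -l eng
    done
}

# Example (hypothetical directory):
#   ocr_pages ~/scans
```

Each page01.png should then have a page01.txt next to it, ready to be pasted into a document and proofread.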
 
Old 10-19-2014, 02:33 PM   #13
bscho
Member
 
Registered: Nov 2012
Location: London
Distribution: Mint 20, Kali, Peppermint, Ubuntu, MakuluFlash, Fedora 32, Windows 12 Lite, MakuluLinux
Posts: 821

Original Poster
Rep: Reputation: 28
GUI

Quote:
Originally Posted by c0d3d View Post
It means that it is already downloaded. Are you sure you downloaded the English language files ("apt-get install tesseract-ocr-eng")?

Keep in mind that this software has no GUI. It is terminal-based only (you might be able to get a custom GUI for it from here if you really want one).
No thanks, I did not know it had no GUI. You are right: it is installed and lists 10 command options.
I found that its output is in English.

Can you tell me if there is a help file for it somewhere?
 
Old 10-19-2014, 02:46 PM   #14
c0d3d
Member
 
Registered: Aug 2012
Posts: 74

Rep: Reputation: 12
"man tesseract-ocr" should give a help file.
 
Old 10-19-2014, 03:41 PM   #15
bscho
Member
 
Registered: Nov 2012
Location: London
Distribution: Mint 20, Kali, Peppermint, Ubuntu, MakuluFlash, Fedora 32, Windows 12 Lite, MakuluLinux
Posts: 821

Original Poster
Rep: Reputation: 28
ocr software

Quote:
Originally Posted by c0d3d View Post
"man tesseract-ocr" should give a help file.
Thanks for your prompt reply. I have tried that and I get
No manual entry for tesseract-ocr.

Any other way of getting the manual?
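
Manual pages are usually filed under the command name rather than the package name, so `man tesseract` is the likely candidate here. To confirm what the package actually installed, something like the sketch below should work on a Debian-family system; `list_manpages` is a hypothetical helper name.

```shell
# Sketch: find the manual pages a Debian-family package installed.
# list_manpages is a hypothetical helper; dpkg -L lists every file a package owns.
list_manpages() {
    pkg=$1
    dpkg -L "$pkg" | grep '/man/'
}

# Example:
#   list_manpages tesseract-ocr
# Then open the page by its command name, for instance:
#   man tesseract
```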
 
  


All times are GMT -5. The time now is 02:21 PM.
