LinuxQuestions.org
10-24-2014, 07:45 PM   #1
metaschima
Senior Member

Registered: Dec 2013
Distribution: Slackware
Posts: 1,982

Rep: Reputation: 492
imagemagick or something that will leave only highest width characters in image


Let's say I've got an image with letters in it (a captcha). Is there a function that specifically leaves only the highest width characters (the ones in the foreground)? Maybe something that eats away at the edges of non-white (usually black) characters and thus leaves only the highest width characters?

I've tried other things like convolve, blur and sharpen, but these are unreliable. No Java programs or proprietary programs please. I'm just testing and trying to learn about captchas.
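The "eat away at the edges" idea described above is usually called morphological erosion, and ImageMagick has a -morphology operator for that family of operations (it is not used anywhere in this thread). What follows is only a minimal pure-Python sketch of the idea, assuming a binary image stored as a list of strings where '#' is ink. The function name `erode` and the test image are made up for illustration.

```python
def erode(grid, ink="#", paper="."):
    """One erosion pass: keep an ink pixel only if all 4 neighbours are ink."""
    h, w = len(grid), len(grid[0])

    def is_ink(r, c):
        # Pixels outside the image count as paper.
        return 0 <= r < h and 0 <= c < w and grid[r][c] == ink

    out = []
    for r in range(h):
        row = ""
        for c in range(w):
            keep = is_ink(r, c) and all(
                is_ink(r + dr, c + dc)
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
            )
            row += ink if keep else paper
        out.append(row)
    return out


# Hypothetical image: a 1-pixel-wide stroke (left) and a 3-pixel-wide stroke (right).
image = [
    ".........",
    ".#...###.",
    ".#...###.",
    ".#...###.",
    ".#...###.",
    ".#...###.",
    ".........",
]

for line in erode(image):
    print(line)
```

After one pass the thin stroke is gone and only the core of the thick stroke is left, which is roughly the "leave only the highest width characters" effect asked about. On a real captcha the number of passes and the neighbourhood shape would need tuning.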
 
10-25-2014, 06:14 AM   #2
ondoho
LQ Addict

Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
sorry, i don't understand.
can you give some examples? convert command, before and after images?
 
10-25-2014, 11:22 AM   #3
metaschima
Senior Member

Registered: Dec 2013
Distribution: Slackware
Posts: 1,982

Original Poster
Rep: Reputation: 492
Well, I was looking at one example in particular and I actually found something that works well. I'm looking into image filtering that will help me clean up images before running them through an OCR engine like tesseract. I figure that if this works on captchas, the same method, tweaked a bit, should work on documents too. I have attached an example that I made using GIMP, and here are the imagemagick filter and the tesseract commands.
Code:
convert owntest.jpg -convolve 0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0 -level 0,30% -unsharp 30x900 t2.jpg
tesseract -psm 7 t2.jpg t2 config
tesseract -psm 8 t2.jpg t2 config
The config file has
Code:
load_system_dawg F
load_freq_dawg F
tessedit_char_whitelist 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
That is for captchas at least; for documents the whitelist can be adjusted or left out.

The output is
Code:
 D O J C Y
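For anyone trying to read the -convolve argument in the command above: it has 49 values, so it is presumably meant as a 7x7 kernel. A small Python sketch that lays the values out as a grid (the variable names are mine, not from the post):

```python
import math

# Kernel values copied from the convert command in this post.
kernel_arg = "0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0"

values = [int(v) for v in kernel_arg.split(",")]
size = math.isqrt(len(values))  # side length of the square kernel
if size * size != len(values):
    raise ValueError("kernel is not square")

# Split the flat list into rows and print them as a grid.
rows = [values[i * size:(i + 1) * size] for i in range(size)]
for row in rows:
    print(" ".join(str(v) for v in row))
```

Laid out this way, the only non-zero entries sit in the middle column, so the filter combines each pixel with the three pixels above and the three below it, i.e. a purely vertical smear. How ImageMagick scales those weights is not something this sketch covers.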
Attached Thumbnails:
owntest.jpg (6.8 KB, ID 16757)
t2.jpg (5.8 KB, ID 16758)

Last edited by metaschima; 10-25-2014 at 11:23 AM.
 
10-25-2014, 03:22 PM   #4
ondoho
LQ Addict

Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
i think i understand what you mean with "highest width" - you want to graphically separate and isolate the actual letters.
fwiw, i tried your convert command with owntest.jpg from above, but the result is very different.
Code:
convert --version
Version: ImageMagick 6.8.9-8 Q16 x86_64 2014-10-09 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2014 ImageMagick Studio LLC
Features: DPC HDRI Modules OpenCL OpenMP
Delegates: bzlib cairo fontconfig freetype gslib jng jp2 jpeg lcms lqr ltdl lzma pangocairo png ps rsvg tiff webp wmf x xml zlib
Attached Images
 
 
10-25-2014, 05:28 PM   #5
metaschima
Senior Member

Registered: Dec 2013
Distribution: Slackware
Posts: 1,982

Original Poster
Rep: Reputation: 492
Hmm, that's not good. I was hoping this would be portable, at least within imagemagick. I have version
Code:
bash-4.2$ convert --version
Version: ImageMagick 6.8.6-10 2013-09-18 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2013 ImageMagick Studio LLC
Features: DPC
Delegates: bzlib cairo djvu fftw fontconfig freetype jng jp2 jpeg lcms lzma openexr pango pangocairo png png rsvg tiff x xml zlib
The command is correct, I checked it.
 
10-27-2014, 01:35 PM   #6
ondoho
LQ Addict

Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
my bad, i used the wrong file - i used the thumbnail instead of the actual image.
however, that shows that the command works differently with different resolutions.

attaching the proper result.
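One plausible reason for the resolution dependence: the kernel in the convert command is a fixed 7 pixels tall, and the -unsharp radius is also given in pixels, so the same numbers cover a very different share of a small image than of a large one. A rough Python sketch of that arithmetic follows. The two image heights are hypothetical, since the thread does not give the real sizes of the attachment or its thumbnail.

```python
KERNEL_HEIGHT_PX = 7  # rows in the -convolve kernel from the command in post #3


def kernel_coverage(image_height_px):
    """Fraction of the image height spanned by the fixed-size kernel."""
    return KERNEL_HEIGHT_PX / image_height_px


# Hypothetical heights, for illustration only.
for label, height in (("thumbnail", 40), ("full size", 120)):
    print(f"{label}: kernel spans {kernel_coverage(height):.1%} of the height")
```

If that is what is going on, the kernel size and the unsharp radius would have to be scaled along with the input before the command gives comparable results.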
Attached Thumbnails:
t2.jpg (5.5 KB, ID 16774)
 
10-27-2014, 03:00 PM   #7
metaschima
Senior Member

Registered: Dec 2013
Distribution: Slackware
Posts: 1,982

Original Poster
Rep: Reputation: 492
Yeah, it's not too reliable, but can be tweaked for similar inputs. It's really no good for captcha, but may be of some use for document OCR.
 
  

