LinuxQuestions.org
01-26-2012, 08:00 PM   #1
LAPIII
Member
 
Registered: Mar 2009
Location: Virginia, US
Distribution: Ubuntu 10.10 & Debian 6.0.3,
Posts: 350

Rep: Reputation: 7
Is there a way to extract pictures from image files?


I want to extract text from image files that also have pictures. I'd rather not manually remove the pictures using an image editor; I want to automate this.

Last edited by LAPIII; 01-27-2012 at 11:15 AM.
 
01-26-2012, 09:59 PM   #2
MS3FGX
LQ Guru
 
Registered: Jan 2004
Location: NJ, USA
Distribution: Slackware, Debian
Posts: 5,852

Rep: Reputation: 361
An image file that has a picture? Isn't that the same thing? I'm not sure I follow you here.
 
01-26-2012, 10:51 PM   #3
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 374
I'm with MS3FGX...

What file format are you trying to work with?

When you say text in the file, do you mean metadata?

Multiple images in one file with text... sounds like a PDF or a web page... maybe a TIFF? Each of those needs a different approach to extract the components.

So, let us know what file format you're using, and we can probably point you in the right direction.
 
01-27-2012, 11:13 AM   #4
LAPIII
Member
 
Registered: Mar 2009
Location: Virginia, US
Distribution: Ubuntu 10.10 & Debian 6.0.3,
Posts: 350

Original Poster
Rep: Reputation: 7
Quote:
Originally Posted by Dark_Helmet
I'm with MS3FGX...
What file format are you trying to work with?
TIFFs and PDFs.
Quote:
Originally Posted by Dark_Helmet
When you say text in the file, do you mean metadata?
No.

I can get the text out of the TIFFs and PDFs with Tesseract OCR. I just want to know if there's an easier way to get the pictures out of the images, so that Tesseract can do its job better.

-EDIT-

I'm reading, from Understanding the PDF File format – images, that:

Quote:
Images are not stored inside a PDF file as Tiff or PNG or JPG images. They are stored as the binary pixel data along with the Colorspace used by that data.
I've also read that ImageMagick has PDF handling that will extract images, but like most PDF image extractors, it probably works in terms of BMP, PNG, JPEG, etc.

There is also a command-line utility called pdfimages, and I guess the same applies to it as to ImageMagick.
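
For anyone wanting to try pdfimages from a script, here is a minimal Python sketch. It assumes pdfimages (shipped with Xpdf and poppler-utils) is installed and on the PATH; the helper names build_pdfimages_cmd and extract_images, and the "img" file prefix, are made up for illustration. Note that this only saves the embedded pictures as separate files. It does not remove them from the page, so it is a starting point rather than a full answer to the question above.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_pdfimages_cmd(pdf: str, prefix: str,
                        first_page: Optional[int] = None,
                        last_page: Optional[int] = None,
                        keep_jpeg: bool = True) -> List[str]:
    """Build the argument list for: pdfimages [options] PDF-file image-root."""
    cmd = ["pdfimages"]
    if keep_jpeg:
        # -j writes JPEG-encoded images as .jpg instead of converting them
        # to the default PPM/PBM output.
        cmd.append("-j")
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to scan
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to scan
    cmd += [pdf, prefix]
    return cmd


def extract_images(pdf: str, out_dir: str, **kwargs) -> List[Path]:
    """Run pdfimages and return the files it wrote (hypothetical helper)."""
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages not found on PATH")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    prefix = str(out / "img")  # output files are named img-NNN.<ext>
    subprocess.run(build_pdfimages_cmd(pdf, prefix, **kwargs), check=True)
    return sorted(out.glob("img-*"))
```

Usage would look something like extract_images("scan.pdf", "pictures", first_page=1, last_page=3), where scan.pdf is a placeholder file name. Building the command as a list (rather than a shell string) avoids quoting problems with file names that contain spaces.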

Last edited by LAPIII; 01-27-2012 at 12:02 PM. Reason: Updated info
 
  


All times are GMT -5.
