Is there a way to extract pictures from image files?
I want to extract text from image files that also have pictures. I'd rather not manually remove the pictures using an image editor; I want to automate this.
When you say text in the file, do you mean metadata?
Multiple images in one file with text... sounds like a PDF or a web page... maybe a TIFF? All of those have different approaches to extract the components.
So, let us know what file format you're using, and we can probably point you in the right direction.
I'm with MS3FGX...
What file format are you trying to work with?
TIFFs and PDFs.
Quote:
Originally Posted by Dark_Helmet
When you say text in the file, do you mean metadata?
No
I can already turn TIFFs and PDFs into text with Tesseract OCR. I just want to know if there's an easier way to extract the pictures from the images, so that Tesseract can do its job better.
Images are not stored inside a PDF file as TIFF, PNG, or JPG files. They are stored as binary pixel data along with the color space used by that data.
I've also read that ImageMagick has PDF handling that will extract images, but like most PDF image extractors, it probably writes them out as BMP, PNG, JPEG, etc.
There is also a command-line utility called pdfimages; as with ImageMagick, I guess it works the same way.
Last edited by LAPIII; 01-27-2012 at 12:02 PM.
Reason: Updated info
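For anyone wanting to script the pdfimages and Tesseract steps mentioned above, here is a rough sketch in Python. It assumes the pdfimages from poppler-utils (for the -list and -png options) and a tesseract binary on the PATH. The file names scan.pdf, img and page are made up for illustration, and the helper functions are not part of any library. Note that this only pulls out images embedded in a PDF; it does not cut pictures out of a scanned page.

```python
import shutil
import subprocess
from pathlib import Path


def pdfimages_cmd(pdf, prefix, list_only=False):
    """Build a pdfimages command line (poppler-utils version assumed)."""
    if list_only:
        # -list prints a table of the embedded images instead of writing files
        return ["pdfimages", "-list", str(pdf)]
    # -png writes each embedded image as <prefix>-NNN.png
    return ["pdfimages", "-png", str(pdf), str(prefix)]


def tesseract_cmd(image, out_base, lang="eng"):
    """Build a tesseract command; the text is written to <out_base>.txt."""
    return ["tesseract", str(image), str(out_base), "-l", lang]


def run(cmd):
    """Run cmd if the tool is installed; return True only on exit status 0."""
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]} not found, skipping: {' '.join(cmd)}")
        return False
    return subprocess.run(cmd, check=False).returncode == 0


if __name__ == "__main__":
    pdf = Path("scan.pdf")  # hypothetical input file
    if pdf.exists():
        run(pdfimages_cmd(pdf, "img", list_only=True))  # see what is embedded
        run(pdfimages_cmd(pdf, "img"))                  # write img-000.png, ...
        for png in sorted(Path(".").glob("img-*.png")):
            run(tesseract_cmd(png, png.with_suffix("")))  # OCR each image
    else:
        print(f"{pdf} not found; nothing to do")
```

Building the commands as lists keeps file names with spaces safe, and checking for the tool first means the script just reports a missing program instead of crashing.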