Solved: Question about ImageMagick's convert utility and high quality output
Hello,
I am trying to split a PDF into component pages that are of equal quality to the original.
If I use display mypdf.pdf, each page of the PDF is shown at acceptable quality. The problem is that I have to save each page individually.
If, however, I use convert mypdf.pdf mypdf.bmp, I get the individual pages of the PDF in BMP format (which is fine, though not exactly what I want), but the quality is substantially lower than the original.
I've tried dozens of option combinations to improve the quality, to no avail.
Even if I do convert mypdf.pdf mypdfagain.pdf, there is a big loss of quality.
Is anyone familiar with a way to split a PDF into individual pages without losing quality?
Ideally, I would just save all the "scenes/frames" from display, but that feature unfortunately doesn't exist (though I may try to add it myself if no formal solution exists).
NOTE: I think part of my problem might be resolution: identify mypdf.pdf shows the resolution specified in the PDF, and after conversion the resolution is much lower. This could be a source of the quality loss, but I'm not familiar enough with image conversion to say for sure.
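The resolution hunch is the usual explanation: ImageMagick has no PDF renderer of its own and hands PDFs to Ghostscript, which rasterizes them at 72 dpi unless told otherwise. A minimal sketch of asking for a higher resolution, assuming ImageMagick 6's convert and Ghostscript are installed. The function name pdf_to_pngs and the DRY_RUN switch (print the command instead of running it) are hypothetical helpers, not part of ImageMagick:

```shell
# pdf_to_pngs INPUT [PREFIX] [DPI]   (hypothetical helper)
# Rasterize every page of INPUT to PREFIX-000.png, PREFIX-001.png, ...
pdf_to_pngs() {
  local input=$1 prefix=${2:-page} dpi=${3:-300}
  # -density must come BEFORE the input file: it sets the resolution at
  # which the PDF is rasterized. Without it, PDFs are rendered at 72 dpi.
  # DRY_RUN (hypothetical): when non-empty, print the command instead of running it.
  ${DRY_RUN:+echo} convert -density "$dpi" "$input" "${prefix}-%03d.png"
}
```

For example, pdf_to_pngs mypdf.pdf page 300 should write page-000.png, page-001.png, and so on. 300 dpi is a common starting point for OCR; higher values give sharper text at the cost of much larger files. The same -density placement applies to convert mypdf.pdf mypdfagain.pdf, though the output PDF is then a rasterized copy of the original.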
Solution----------------
Oh, it might help to read the man page all the way through.
display -write outfile.pdf infile.pdf
It will do an entire book at once.
Whatever this command does, it removes the extra layer (or whatever it is) that prevents OCR from succeeding. I'd really like to understand that technology. What is it about a PDF that lets someone embed metadata into every page, so that the only thing seen by OCR or a text search function is the embedded text?
Last edited by Dogs; 06-29-2011 at 12:59 AM.
Reason: SOLVED
I bought an ebook that requires DRM software to read. I have found a way around the DRM software, but the quality issue prevents me from using OCR software to turn the page images into text satisfactorily.
My current situation: I have a free PDF that is of high quality, but I am unable to OCR the PDF directly because of some kind of layering mechanism...
As far as I can tell, this layer is the only thing the OCR software is able to "see", and the only thing on it is an embedded e-mail address. So OCR gives me page after page containing only an e-mail address, when what I'm looking at is clearly the pages of the book I purchased (the publisher conveniently left out the part about DRM until AFTER the purchase, and the ebook is only available from the publisher anyway, so I had no choice if I wanted an electronic copy)...
However, if I split the PDF into pages and/or flatten it and/or convert it to image files, then I can OCR that just fine if the quality is sufficient.
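The thread doesn't name its OCR program. As an illustration only, assuming Tesseract is installed and the pages have already been saved as page-NNN.png images, a loop like this runs OCR on every page. tesseract takes an input image and an output base name and appends .txt itself. The function name ocr_pages and the DRY_RUN switch are hypothetical:

```shell
# ocr_pages [PREFIX]   (hypothetical helper)
# Run tesseract on every PREFIX-*.png, writing PREFIX-NNN.txt next to each image.
ocr_pages() {
  local prefix=${1:-page} img
  for img in "$prefix"-*.png; do
    [ -e "$img" ] || continue   # the glob matched no files
    # Output base name = image name without .png; tesseract adds .txt.
    # DRY_RUN (hypothetical): when non-empty, print the command instead of running it.
    ${DRY_RUN:+echo} tesseract "$img" "${img%.png}"
  done
}
```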
What's cool is: if I open the PDF in the Ghostscript viewer, I can save individual pages as excellent copies with the layering mechanism mitigated. Now I just need to figure out how to split all 675 pages automatically...
The gs command provided by Mr. Smoker seems to be just what the doctor ordered; however, I haven't had time to figure out which device to use if pdfwrite isn't available.
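The thread does not quote Mr. Smoker's gs command, so the following is not it. As a hedged sketch for the case where pdfwrite isn't available: Ghostscript's raster devices (pnggray, png16m, tiffgray, tiffg4, and others) write one image per page when the output name contains %d, which is all an OCR program needs. The function name gs_pdf_to_images and the DRY_RUN switch are hypothetical; gs -h lists which devices your build actually includes.

```shell
# gs_pdf_to_images INPUT [DEVICE] [DPI]   (hypothetical helper)
# Render every page with Ghostscript directly, one image file per page.
gs_pdf_to_images() {
  local input=$1 device=${2:-pnggray} dpi=${3:-300}
  local ext=png
  case $device in tiff*) ext=tif ;; esac   # match extension to device family
  # -r sets the rendering resolution; %03d in the output name makes gs
  # write page 1 to page-001.<ext>, page 2 to page-002.<ext>, and so on.
  # DRY_RUN (hypothetical): when non-empty, print the command instead of running it.
  ${DRY_RUN:+echo} gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE="$device" -r"$dpi" \
    -sOutputFile="page-%03d.$ext" "$input"
}
```

Grayscale PNG (pnggray) or bilevel TIFF (tiffg4) at 300 dpi keeps the files reasonably small for a 675-page book while staying sharp enough for OCR.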
Just write a script for it that will extract all those pages.
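A minimal sketch of such a script, assuming Ghostscript with the pdfwrite device and pdfinfo from poppler-utils (used only to read the page count) are installed. It writes each page to its own PDF by restricting gs to one page at a time with -dFirstPage/-dLastPage. The function name split_pdf and the PAGES and DRY_RUN variables are hypothetical conveniences, the latter two for trying it without real files:

```shell
# split_pdf INPUT   (hypothetical helper)
# Write INPUT's pages to page-001.pdf, page-002.pdf, ...
split_pdf() {
  local input=$1 pages i
  # Page count from pdfinfo (assumed installed); PAGES (hypothetical) overrides it.
  pages=${PAGES:-$(pdfinfo "$input" | awk '/^Pages:/ {print $2}')}
  [ -n "$pages" ] || { echo "could not read page count" >&2; return 1; }
  i=1
  while [ "$i" -le "$pages" ]; do
    # -dFirstPage/-dLastPage limit this gs run to page $i only.
    # DRY_RUN (hypothetical): when non-empty, print the command instead of running it.
    ${DRY_RUN:+echo} gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
      -dFirstPage="$i" -dLastPage="$i" \
      -sOutputFile="$(printf 'page-%03d.pdf' "$i")" "$input"
    i=$((i + 1))
  done
}
```

For a 675-page book, split_pdf book.pdf should write page-001.pdf through page-675.pdf. If pdfwrite is missing, the same loop works with a raster device such as pnggray and a .png output name.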