LinuxQuestions.org - rasterize pdf made up of multiple bitmap slices / page

- - rasterize pdf made up of multiple bitmap slices / page (https://www.linuxquestions.org/questions/linux-software-2/rasterize-pdf-made-up-of-multiple-bitmap-slices-page-776748/)

Hi,

I have a PDF file made up of 4 vertically-stacked bitmap images per page (output from a scan job). Running "pdfimages" on it produces 4 separate images for each page.

I want to produce a single bitmap image for each page, WITHOUT resampling the embedded bitmaps.

ImageMagick's "convert" is not a solution, because it resamples (resizes) the embedded bitmaps. By default it renders at 72 dpi. The dpi can be changed with the '-density' option, but the "true" dpi is a fractional value (205.72685432...), so the resolution would always be inexact.
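To make the rounding problem concrete, here is a minimal sketch of the arithmetic. It assumes the usual PDF convention of 72 points per inch. The pixel and point widths are hypothetical numbers chosen for illustration, not values taken from the file discussed here.

```python
from fractions import Fraction

POINTS_PER_INCH = 72  # PDF user-space units per inch


def intrinsic_dpi(pixels, points):
    """Exact resolution of a bitmap that is `pixels` wide and is drawn
    `points` wide on the page."""
    return Fraction(pixels) * POINTS_PER_INCH / Fraction(points)


def rendered_pixels(points, dpi):
    """Exact (possibly fractional) pixel width a renderer has to produce
    when it rasterizes `points` of page width at `dpi`."""
    return Fraction(points) * Fraction(dpi) / POINTS_PER_INCH


# Hypothetical scan: 1749 pixels across a US Letter page (612 pt wide).
width_px, width_pt = 1749, 612
dpi = intrinsic_dpi(width_px, width_pt)
print("intrinsic dpi:", float(dpi))

# Any integer density gives a different pixel count, which forces a resample.
for density in (205, 206):
    print(density, "dpi ->", float(rendered_pixels(width_pt, density)), "px")
```

Only the exact fractional density maps the page back onto the original pixel grid. Any rounded value changes the pixel count, so the renderer has to interpolate.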

Is there a way to just preserve the resolution of the embedded bitmaps when rendering each page? Both 'convert' and Ghostscript annoyingly ignore the intrinsic resolution of the bitmaps and default to 72 dpi.

It seems like such a simple operation, equivalent to displaying each page at 100% in xpdf or acroread. But I don't see any command-line tools to do it.

Hi

Why not use pdfimages to pull out the 4 images, and then ImageMagick to combine them? If the PDF files always have the same sizes, it shouldn't be very difficult to write a little script to do the job.

For information about doing this with ImageMagick:

http://www.imagemagick.org/Usage/montage/

Maybe play with the montage command until you find one that will do the job? And then put it in a little script.
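As an alternative to montage, the stacking step can be done with no resampling at all by concatenating the raw rasters. The sketch below assumes the slices are binary PGM (P5) or PPM (P6) files of equal width with no comment lines in the header. 1-bit PBM output is not handled. The function names are made up for this example.

```python
import re


def read_pnm(data):
    """Parse a binary PGM (P5) or PPM (P6) held in memory.

    Returns (magic, width, height, maxval, raster). Header comments
    ('#' lines) are not supported in this sketch.
    """
    m = re.match(rb"(P[56])\s+(\d+)\s+(\d+)\s+(\d+)\s", data)
    if m is None:
        raise ValueError("not a binary PGM/PPM header")
    magic = m.group(1)
    width, height, maxval = int(m.group(2)), int(m.group(3)), int(m.group(4))
    channels = 1 if magic == b"P5" else 3
    sample_bytes = 1 if maxval < 256 else 2
    size = width * height * channels * sample_bytes
    raster = data[m.end():m.end() + size]
    if len(raster) != size:
        raise ValueError("truncated raster")
    return magic, width, height, maxval, raster


def stack_vertically(slices):
    """Stack same-width images top to bottom by concatenating their rows.

    `slices` is a list of bytes objects in top-to-bottom order. No pixel
    value is touched, so nothing is resampled.
    """
    parsed = [read_pnm(s) for s in slices]
    if not parsed:
        raise ValueError("no slices given")
    magic, width, _, maxval, _ = parsed[0]
    for m, w, _, mv, _ in parsed:
        if (m, w, mv) != (magic, width, maxval):
            raise ValueError("slices differ in format, width or maxval")
    total_height = sum(p[2] for p in parsed)
    header = b"%s\n%d %d\n%d\n" % (magic, width, total_height, maxval)
    return header + b"".join(p[4] for p in parsed)
```

Reading each slice file in binary mode and writing the result of stack_vertically() back out gives one image per page. This only works when the slices really are contiguous on the page, since any gap between them is simply not represented.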

*sigh*

I ended up doing something like that... However it seems that the slices are not necessarily vertically contiguous (i.e. where there was a vertical gap on the page, the scanning software skipped it). Therefore, after assembling the slices, I obtained pages of varying height.

The fact that pdfimages doesn't report which images came from which page was a big nuisance (the original PDF had some pages with 4 slices and some with a single slice).

At the very least I wish pdfimages could report the location (page, x and y) of the various images it pulls out of a PDF, instead of dumping a heap of sequentially-numbered image files. It seems like such a silly omission. The other thing that I found annoying is ImageMagick's propensity to silently resample images.
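For what it's worth, newer Poppler builds of pdfimages have a "-list" option that prints a table with one row per image, including the page number. It may not exist in the version used in this thread, and as far as I know it still does not give the x/y position on the page. The sketch below groups images by page from that table. The column layout (page, num, type, width, height, ...) and the sample listing are assumptions for illustration, not output from the file discussed here.

```python
from collections import defaultdict


def images_by_page(listing):
    """Group rows of a `pdfimages -list` style table by page.

    Returns {page: [(image_number, width, height), ...]}. Only the first
    five columns are read; the header and separator lines are skipped.
    """
    pages = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        # Data rows start with two integers: page and image number.
        if len(fields) < 5 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        page, num = int(fields[0]), int(fields[1])
        width, height = int(fields[3]), int(fields[4])
        pages[page].append((num, width, height))
    return dict(pages)


# Hypothetical listing: page 1 has four slices, page 2 has a single image.
SAMPLE = """\
page   num  type   width height color comp bpc  enc interp
-----------------------------------------------------------
   1     0 image    1749   600  gray    1   8  image  no
   1     1 image    1749   600  gray    1   8  image  no
   1     2 image    1749   600  gray    1   8  image  no
   1     3 image    1749   463  gray    1   8  image  no
   2     4 image    1749  2263  gray    1   8  image  no
"""

for page, images in sorted(images_by_page(SAMPLE).items()):
    print("page", page, "->", [num for num, _, _ in images])
```

With the page grouping known, the sequentially-numbered files can be matched to pages even when some pages have four slices and others have one.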

Oh well, I guess we're lucky to have even these tools...