LinuxQuestions.org
Old 02-08-2022, 07:44 AM   #1
littlebigman
Member
 
Registered: Aug 2008
Location: France
Posts: 658

Rep: Reputation: 35
[GhostScript] Any way to shrink scanned page further?


Hello,

This is a question for GhostScript experts.

Is it possible to seriously shrink an 800-page PDF of a scanned book + OCRed text layer from its original ~100MB? FWIW, the whole book is in black and white.

The following test on a single page only reduces the size by ~15% (100kB → 85kB):
Code:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=single.output.pdf single.input.pdf
Thank you.

Last edited by littlebigman; 02-08-2022 at 08:22 AM.
 
Old 02-08-2022, 01:32 PM   #2
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,286

Rep: Reputation: 2322
I've done a bit of that in my time, and the thing leaping out at me is: Get out of the pdf format if you possibly can.

Furthermore, if you have and need an OCR'ed layer for text, I presume it's because the book isn't very readable in its present format. Now I suppose you were up to at least 400dpi for OCR, but that work is done, so why not reduce the dpi of the image now? You don't need such a great image of the book if you have text overlaid. You seem good with gs, but I would use console tools - ffmpeg or something.
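
Before choosing a lower dpi, it may help to check what resolution the scans are actually stored at. A minimal sketch, assuming poppler-utils' pdfimages is installed (a tool not mentioned in this thread; the function name and the default page range are only for illustration):

```shell
# Hypothetical helper: list the images embedded in a page range of a PDF.
# pdfimages -list prints one row per image, including its pixel size,
# encoding and effective x/y ppi on the page.
list_images() {
    pdf=$1
    first=${2:-1}          # first page to inspect (default: page 1)
    last=${3:-$first}      # last page to inspect (default: same as first)
    pdfimages -list -f "$first" -l "$last" "$pdf"
}
```

For example, `list_images single.input.pdf` looks at page 1 only. The x-ppi and y-ppi columns are the numbers to compare against whatever target resolution you pick.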

Last edited by business_kid; 02-08-2022 at 01:34 PM.
 
Old 02-08-2022, 02:59 PM   #3
littlebigman
Member
 
Registered: Aug 2008
Location: France
Posts: 658

Original Poster
Rep: Reputation: 35
The text layer is to make the text copy/pastable.

I guess there's no way to make it much smaller with no access to the source pictures.

Thank you.
 
Old 02-09-2022, 04:36 AM   #4
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,286

Rep: Reputation: 2322
You should be able to copy a pdf with no text layer, afaik. Tools that make PDFs also often reduce the resolution of pictures automatically. If you put a 1200dpi picture into a pdf, the resolution may well be reduced. If you go back to your gs man page, you can probably set how much it's reduced by. Make sure to keep a high resolution copy, but the sizes I've seen are 600dpi (highest) and 300dpi (normal). You could probably set your chosen resolution.
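
On the gs side, the pdfwrite device takes the image resolution as separate parameters for colour, gray and monochrome images. A rough sketch for a black-and-white scan, assuming the pages are stored as 1-bit or grayscale images. The 150 dpi default and the function name are placeholders, and whether anything shrinks depends on the resolution the scans already have:

```shell
# Sketch: rewrite a PDF with its scanned images downsampled to a chosen dpi.
# These parameters only apply to images, so the OCR text is not resampled.
shrink_scan() {
    in=$1
    out=$2
    dpi=${3:-150}    # placeholder target resolution, not a recommendation
    # A threshold of 1.0 asks gs to downsample any image above the target dpi.
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH \
       -dDownsampleMonoImages=true -dMonoImageDownsampleType=/Subsample \
       -dMonoImageResolution="$dpi" -dMonoImageDownsampleThreshold=1.0 \
       -dDownsampleGrayImages=true -dGrayImageDownsampleType=/Bicubic \
       -dGrayImageResolution="$dpi" -dGrayImageDownsampleThreshold=1.0 \
       -sOutputFile="$out" "$in"
}
```

Something like `shrink_scan single.input.pdf single.output.pdf 200` on one page, followed by a visual check, is probably the safest way to find the lowest dpi that is still readable.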
 
Old 02-09-2022, 08:50 AM   #5
littlebigman
Member
 
Registered: Aug 2008
Location: France
Posts: 658

Original Poster
Rep: Reputation: 35
Thanks.

I have another idea: Since there's a text layer that was created by the OCR, can I just remove the "picture" layer, and see if the text is in a readable shape? That would make for a much smaller PDF.

---
Edit: Easy enough :-p

Code:
gs -sDEVICE=txtwrite -o output.txt input.pdf
As expected, it's raw text, so some post-editing is required to get something that looks close to the original.
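
A first pass at that post-editing can be scripted. This is only a sketch of one possible clean-up (strip trailing whitespace, squeeze runs of blank lines into one); the function name is made up, and real OCR output will likely need more than this:

```shell
# Hypothetical clean-up for the text produced by txtwrite.
clean_text() {
    # 1. sed removes trailing spaces and tabs from every line.
    # 2. awk prints non-empty lines as-is and keeps only the first
    #    blank line of each run of blank lines.
    sed 's/[[:space:]]*$//' "$1" |
        awk 'NF { blank = 0; print; next } !blank++ { print "" }'
}
```

Usage would be along the lines of `clean_text output.txt > output.clean.txt`.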

Last edited by littlebigman; 02-09-2022 at 08:55 AM.
 
Old 02-09-2022, 09:31 AM   #6
teckk
LQ Guru
 
Registered: Oct 2004
Distribution: Arch
Posts: 5,137
Blog Entries: 6

Rep: Reputation: 1826
Let me see, cert.pdf is already a small 17-page text .pdf, no images.
Code:
du -c cert.pdf
212     cert.pdf
212     total

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -sOutputFile=Out.pdf cert.pdf
...
Page 13
Page 14
Page 15
Page 16
Page 17

du -c Out.pdf
240     Out.pdf
240     total

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -sOutputFile=Out.pdf cert.pdf

du -c Out.pdf
240     Out.pdf
240     total

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -sOutputFile=Out.pdf cert.pdf

du -c Out.pdf
244     Out.pdf
244     total

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/prepress -sOutputFile=Out.pdf cert.pdf

du -c Out.pdf
240     Out.pdf
240     total
So for that .pdf, which is already tight and small, that's about as good as you can get with -dPDFSETTINGS.
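
The same comparison can be run as a loop. A sketch, with a made-up function name and output naming scheme; it prints byte counts from wc -c instead of du, since du counts disk blocks and small differences can disappear in the rounding:

```shell
# Hypothetical helper: try each -dPDFSETTINGS preset on one PDF and print
# "<preset><TAB><size in bytes>" for the original and for every result.
bytes() { wc -c < "$1" | tr -d ' '; }

compare_presets() {
    in=$1
    printf '%s\t%s\n' original "$(bytes "$in")"
    for preset in screen ebook printer prepress; do
        out="${in%.pdf}.$preset.pdf"    # e.g. cert.pdf -> cert.screen.pdf
        gs -dNOPAUSE -dBATCH -dQUIET -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
           -dPDFSETTINGS=/"$preset" -sOutputFile="$out" "$in" || return 1
        printf '%s\t%s\n' "$preset" "$(bytes "$out")"
    done
}
```

Running `compare_presets cert.pdf` leaves one output file per preset next to the input, so the results can also be compared by eye.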

Make the page size smaller.
Code:
gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -sPAPERSIZE=a6 -dFIXEDMEDIA -dPDFFitPage -sOutputFile=Out.pdf cert.pdf

du -c Out.pdf
244     Out.pdf
244     total
Change the dpi.
Code:
gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -r200 -sOutputFile=Out.pdf cert.pdf

du -c Out.pdf
236     Out.pdf
236     total
OK, let's make a poor-quality pdf with a small page size.
Code:
gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -r100 -sPAPERSIZE=a10 -dFIXEDMEDIA -dPDFFitPage -sOutputFile=Out.pdf cert.pdf

du -c Out.pdf
224     Out.pdf
224     total
It was already a really small pdf to start with. You'd probably get more reduction in size with a busy pdf.

Last edited by teckk; 02-09-2022 at 09:32 AM.
 
  

