LinuxQuestions.org
06-28-2017, 02:45 PM #1
SaintDanBert
Senior Member
Registered: Jan 2009
Location: "North Shore" Louisiana USA
Distribution: Mint-20.1 with Cinnamon
Posts: 1,718
Blog Entries: 3

seeking help processing PDF documents


Can someone help me make sense of processing PDF documents so that they all end up with mostly the same set of PDF details? I want something I can use from a script to "scrub" a group of PDF files as a batch, under the control of 'cron' or 'incron' or some other method you might suggest.

If you have a favorite desktop application that is wonderful at PDF file normalization, I'd like to learn about it, but I'm much more interested in scriptable utilities and filters.

I get a large number of PDF documents. It seems that every application with some sort of "save as PDF" option uses a different set of parameters and options for the PDF's internal details. For example:
  • Is the file 'hybrid' or 'tagged'?
  • Does it 'export bookmarks' or 'export placeholders'?
  • Is it 'single page' or 'continuous'?
  • ...and so on.
I know that some options have extreme effects on the meat of the document content. However, where possible, I'd like to do something like this:
Code:
# The following is pseudo-code
    while ( there_are_files_to_process )
    {
        select_a_file
        if ( required )
            discover( PDF_parameters )
        change( PDF_parameters )
        export( new_PDF_file, changed_PDF_parameters )
    }
NOTE -- please don't critique the pseudo-code. Instead, use it to understand what I'm trying to accomplish. Feel free to question what I mean.
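The loop above could be sketched as a small Python script that cron or incron runs. This is only a minimal sketch under stated assumptions: the three directories (an inbox, a folder of untouched originals, and an output folder) and the `normalize(src, dst)` hook are hypothetical, and the default hook merely copies the file; in practice it would call one of the tools suggested later in this thread.

```python
"""Hypothetical batch driver for the pseudo-code loop (a sketch, not the OP's script)."""
from __future__ import annotations

import shutil
import sys
from pathlib import Path
from typing import Callable

# Hypothetical hook: normalize(src, dst) writes a normalized copy of src to dst.
Normalizer = Callable[[Path, Path], None]


def copy_only(src: Path, dst: Path) -> None:
    """Placeholder normalizer: copies the file unchanged."""
    shutil.copy2(src, dst)


def process_batch(inbox: Path, originals: Path, normalized: Path,
                  normalize: Normalizer = copy_only) -> list[Path]:
    """Back up and normalize each new *.pdf in inbox; return the files written."""
    originals.mkdir(parents=True, exist_ok=True)
    normalized.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for src in sorted(inbox.glob("*.pdf")):   # while there_are_files_to_process
        backup = originals / src.name
        if backup.exists():                   # handled on an earlier run; skip it
            continue
        shutil.copy2(src, backup)             # keep the original for recovery
        dst = normalized / src.name
        normalize(src, dst)                   # change + export
        written.append(dst)
    return written


# Run only when given exactly three directory arguments (e.g. from cron).
if __name__ == "__main__" and len(sys.argv) == 4:
    for path in process_batch(*(Path(arg) for arg in sys.argv[1:])):
        print(path)
```

A crontab entry such as `*/15 * * * * python3 /home/dan/bin/pdf_batch.py /home/dan/pdf/inbox /home/dan/pdf/originals /home/dan/pdf/normalized` (all paths hypothetical) would run it every 15 minutes; incron could instead trigger it on `IN_CLOSE_WRITE` events in the inbox. Because a file counts as done once its backup exists, a hook that fails partway leaves the backup behind and the file is skipped on the next run; a sturdier version might record success separately.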

Thanks in advance,
~~~ 0;-Dan
 
06-28-2017, 06:24 PM #2
hydrurga
LQ Guru
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 20 MATE
Posts: 8,048
Blog Entries: 5

Regarding metadata, you can use exiftool to change or clean PDF metadata, or mat to clean it.

Regarding content/format, have a look at ghostscript.

I'm sure that there are plenty more (pdftk and cpdf, for example), but these are the command-line tools I use.
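To show how these tools might slot into such a script, the Python sketch below only builds argument lists for ghostscript and exiftool and leaves running them to a small `run` helper. The options used (`-sDEVICE=pdfwrite`, `-dPDFSETTINGS`, `-dCompatibilityLevel`, `-sOutputFile`, and exiftool's `-all=` and `-overwrite_original`) are standard, but the default values and the choice of Title and Author as the normalized tags are assumptions. One caveat worth checking against the exiftool documentation: exiftool writes PDF changes as incremental updates, so the previous metadata can still be recovered from the file; re-writing the file afterwards, for example through ghostscript's pdfwrite device (which also sets its own Producer), is one way to drop the old revision.

```python
"""Hypothetical helpers that build (but do not run) command lines."""
from __future__ import annotations

import subprocess
from pathlib import Path


def gs_rewrite_cmd(src: Path, dst: Path, settings: str = "/prepress",
                   compat: str = "1.7") -> list[str]:
    """Ghostscript pdfwrite: re-emit src as a freshly written PDF at dst."""
    return [
        "gs", "-q", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        f"-dCompatibilityLevel={compat}",
        f"-dPDFSETTINGS={settings}",  # /screen, /ebook, /printer, /prepress or /default
        f"-sOutputFile={dst}",
        str(src),
    ]


def exiftool_metadata_cmds(pdf: Path, title: str, author: str) -> list[list[str]]:
    """exiftool: first delete all writable tags, then set Title and Author."""
    return [
        ["exiftool", "-overwrite_original", "-all=", str(pdf)],
        ["exiftool", "-overwrite_original",
         f"-PDF:Title={title}", f"-PDF:Author={author}", str(pdf)],
    ]


def run(cmd: list[str]) -> None:
    """Execute one command; raises CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

For example, `run(gs_rewrite_cmd(Path("in.pdf"), Path("out.pdf")))` would rewrite a single file. pdftk and cpdf could be wrapped in the same way.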
 
06-28-2017, 08:09 PM #3
AwesomeMachine
LQ Guru
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,513

Poppler-utils, a set of command-line tools built on the Poppler PDF rendering library (pdfinfo, pdffonts, pdftotext and others), might come in handy.
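One of the poppler-utils programs, pdfinfo, prints document-level details such as Producer, Tagged, Page size and PDF version as `Key: value` lines (`pdfinfo -meta` adds the XMP metadata), which is close to the identify-style report the OP asks for later. A minimal Python sketch that runs it and parses that output into a dictionary follows; the exact fields vary between poppler versions, so no particular key should be assumed present.

```python
"""Sketch: run poppler's pdfinfo and parse its "Key: value" lines."""
from __future__ import annotations

import subprocess


def parse_pdfinfo(text: str) -> dict[str, str]:
    """Split each line at its first colon; values such as dates keep later colons."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore blank or colon-less lines
            info[key.strip()] = value.strip()
    return info


def pdfinfo(path: str) -> dict[str, str]:
    """Run pdfinfo (poppler-utils must be installed) and parse its output."""
    result = subprocess.run(["pdfinfo", path], capture_output=True,
                            text=True, check=True)
    return parse_pdfinfo(result.stdout)
```

For instance, `parse_pdfinfo("Tagged:         no\nPDF version:    1.4\n")` returns `{'Tagged': 'no', 'PDF version': '1.4'}`.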
 
06-29-2017, 03:21 PM #4
SaintDanBert (Original Poster)

Follow-Up
I found an Adobe(R) article describing PDF Conversion Settings. It includes a section on how to "Create a Custom PDF Settings File".

If someone knows of a Linux utility for this, please tell me. I want a utility that will scan a PDF file and simply report the status of whatever "settings" or "options" are present. I would use it to extract those details and create one of these custom PDF settings files as a record of the original document state. I'd then 'convert' the document to my desired normalized format using my own custom PDF settings file; the original could be used to recover the document if needed.
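The record-then-convert workflow described here can be sketched without any PDF library: save the detected settings as a JSON sidecar next to the backup, then compare them with a desired profile to decide which settings need changing. The sidecar naming scheme and the `DESIRED` profile (its keys borrowed from pdfinfo's field names) are hypothetical stand-ins for an Adobe-style custom settings file.

```python
"""Hypothetical sketch: record detected settings, then diff them against a profile."""
from __future__ import annotations

import json
from pathlib import Path

# Hypothetical "house" profile; keys follow pdfinfo's field names.
DESIRED = {"Tagged": "yes", "PDF version": "1.7"}


def record_settings(pdf: Path, settings: dict[str, str]) -> Path:
    """Write settings to '<name>.settings.json' beside the PDF; return that path."""
    sidecar = pdf.with_name(pdf.name + ".settings.json")
    sidecar.write_text(json.dumps(settings, indent=2, sort_keys=True))
    return sidecar


def load_settings(sidecar: Path) -> dict[str, str]:
    """Read back a previously recorded settings file."""
    return json.loads(sidecar.read_text())


def deviations(actual: dict[str, str],
               desired: dict[str, str] = DESIRED) -> dict[str, tuple[str | None, str]]:
    """Return {key: (actual, desired)} for every profile key whose value differs."""
    return {key: (actual.get(key), want)
            for key, want in desired.items()
            if actual.get(key) != want}
```

For example, `deviations({"Tagged": "no", "PDF version": "1.7"})` returns `{'Tagged': ('no', 'yes')}`, so only tagging would need attention for that file.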


Thanks in advance,
~~~ 0;-Dan

Last edited by SaintDanBert; 06-29-2017 at 03:26 PM.
 
06-29-2017, 04:15 PM #5
SaintDanBert (Original Poster)
More Follow-Up:
There is a utility called identify (part of the ImageMagick(R) tools) that will scan an image file (JPG, PNG, etc.) and report its internal details.
Code:
prompt$  identify  logo_Beavers.jpg

logo_Beavers.jpg JPEG 80x58 80x58+0+0 8-bit sRGB 2.48KB 0.000u 0:00.010

prompt$  identify -verbose logo_Beavers.jpg

Image: logo_Beavers.jpg
  Format: JPEG (Joint Photographic Experts Group JFIF format)
  Mime type: image/jpeg
  Class: DirectClass
  Geometry: 80x58+0+0
  Units: Undefined
  Type: TrueColor
  Endianess: Undefined
  Colorspace: sRGB
  Depth: 8-bit
  Channel depth:
    red: 8-bit
    green: 8-bit
    blue: 8-bit
  Channel statistics:
    Pixels: 4640
    Red:
      min: 0 (0)
      max: 255 (1)
      mean: 179.76 (0.70494)
      standard deviation: 82.3175 (0.322814)
      kurtosis: -1.06973
      skewness: -0.643877
    Green:
      min: 0 (0)
      max: 255 (1)
      mean: 152.345 (0.597432)
      standard deviation: 98.2931 (0.385463)
      kurtosis: -1.60369
      skewness: -0.274345
    Blue:
      min: 0 (0)
      max: 255 (1)
      mean: 134.707 (0.528261)
      standard deviation: 105.818 (0.414973)
      kurtosis: -1.75612
      skewness: -0.116158
  Image statistics:
    Overall:
      min: 0 (0)
      max: 255 (1)
      mean: 155.604 (0.610211)
      standard deviation: 95.9777 (0.376383)
      kurtosis: -1.39202
      skewness: -0.398804
  Rendering intent: Perceptual
  Gamma: 0.454545
  Chromaticity:
    red primary: (0.64,0.33)
    green primary: (0.3,0.6)
    blue primary: (0.15,0.06)
    white point: (0.3127,0.329)
  Background color: white
  Border color: srgb(223,223,223)
  Matte color: grey74
  Transparent color: black
  Interlace: None
  Intensity: Undefined
  Compose: Over
  Page geometry: 80x58+0+0
  Dispose: Undefined
  Iterations: 0
  Compression: JPEG
  Quality: 75
  Orientation: Undefined
  Properties:
    comment: CREATOR: gd-jpeg v1.0 (using IJG JPEG v62), quality = 75

    date:create: 2017-04-20T18:18:02-05:00
    date:modify: 2011-05-25T09:48:09-05:00
    jpeg:colorspace: 2
    jpeg:sampling-factor: 2x2,1x1,1x1
    signature: b17f85a6b045243debc93945a4c6d3b855193b1ba8cb67f56143124416e5fed1
  Artifacts:
    filename: logo_Beavers.jpg
    verbose: true
  Tainted: False
  Filesize: 2.48KB
  Number pixels: 4.64K
  Pixels per second: 4.64EB
  User time: 0.000u
  Elapsed time: 0:01.000
  Version: ImageMagick 6.8.9-9 Q16 x86_64 2017-05-26 http://www.imagemagick.org

I'm looking for a similar command-line tool for PDF files.

Thanks in advance,
~~~ 0;-Dan
 
06-29-2017, 04:23 PM #6
SaintDanBert (Original Poster)
Does anyone have any experience with the Origami-PDF utility? It claims to be able to slice and dice PDF files.

Thanks in advance,
~~~ 0;-Dan
 
  



All times are GMT -5.