Toonses82 07-22-2010 12:34 PM

PDF Editing and Archiving

I'm looking for an application that will give me some advanced tools for editing PDFs. Here are the features I'm looking for:
  • Editing metadata (tagging with keywords)
  • Merge multiple PDFs
  • Rearrange page order
  • PDF bookmarking
  • Optical Character Recognition

I can give further clarification on these items if needed. My goal is to convert all of my paper files into digital files that I can store on my server. In order to effectively do this, I need the tools listed above.

Is there anything in the Linux world that will give me these PDF editing abilities?

b0uncer 07-22-2010 01:45 PM

This Wikipedia page lists some common PDF software, which you may find useful. Although you might be after graphical user interfaces, I recommend getting familiar with command line tools (e.g. pdftk) as well, because they're often more powerful and perhaps more advanced than the GUIs, which might depend on the very same utilities anyway.

Personally I think it's funny how often people want to edit PDF files, considering that (or so it seems to me) the format was designed to be a sort of a "final product", something to view and maybe fill in, but not edit. Office documents, TeX files and such are for editing; PDF files ought to be the distribution copies of those. Therefore, if you're doing digital-to-digital work, I recommend obtaining the original non-PDF file, modifying it and re-exporting it as PDF. Scanning from paper is then a whole other story, and I gather you'll have some work to do if you intend to get a clean, organized result. But good luck anyway... I'd recommend first getting a good scanner and good OCR software (you may have to pay for it); those will solve the biggest problems. Things like moving pages around should then be easy.

Take a look at pdftk first, if you haven't already.
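To give a rough idea of what the pdftk route looks like for two of the items on the list (merging files and rearranging pages), here is a minimal sketch in Python that only builds and prints the command lines. It assumes pdftk's `cat` operation and its `output` keyword; the helper names and the file names (brochure.pdf and so on) are made up for illustration.

```python
import shlex


def merge_command(inputs, output):
    """Build a pdftk command that concatenates whole files in the given order."""
    return ["pdftk", *inputs, "cat", "output", output]


def reorder_command(source, page_order, output):
    """Build a pdftk command that writes pages of one file in a new order."""
    pages = [str(p) for p in page_order]
    return ["pdftk", source, "cat", *pages, "output", output]


# Hypothetical file names; nothing is executed here, the commands are only printed.
merge = merge_command(["brochure.pdf", "quote.pdf", "specs.pdf"], "packet.pdf")
reorder = reorder_command("packet.pdf", [3, 1, 2], "packet_reordered.pdf")

print(shlex.join(merge))
print(shlex.join(reorder))
```

If pdftk is installed and the input files exist, either list can be handed to `subprocess.run(command, check=True)`, or the printed line can be pasted into a shell. Bookmarks, metadata and OCR are not covered by this sketch.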

Toonses82 07-22-2010 09:58 PM

Thanks for the recommendations.

PDFs aren't really any more of a final product than printing something to paper, and we need to do things like this to paper documents all the time. PDF stands for Portable Document Format and it is designed to be the equivalent of electronic paper.

For example, I'm in sales and I need to compile things like brochures, quotes, spec sheets (all of which are PDF) into one bookmarked document that I can send to the customer. Then when I close a deal, I need to create a sales packet that has the credit approval, lease documents, product configuration, etcetera. If I'm going to truly work in a paperless environment, I need to be able to merge and bookmark these various PDF files. I can't submit a sales packet as 10 separate files. If I can't merge PDFs, my only other alternative is to print everything out and then scan it together right back to my computer. I'm sure I don't need to tell you that printing something for the purpose of scanning it is a bit silly.

With regard to OCR and metadata tagging, these features are critical for a decent document management system. My goal is to have all of my files stored digitally on my server and most of these things will be scanned from paper (mail, invoices, etc.). I need to be able to retrieve the appropriate document on demand and tags are the only way to effectively do that.

I do need a GUI. I've worked with enterprise solutions in the past and I know I can't do the types of functions I need from the command line. I'll check out your link. It's very possible this kind of thing doesn't exist for free. Small businesses pay thousands for this type of software in a Windows environment.

Guttorm 07-23-2010 05:35 AM


Slashdot just had this story:

Toonses82 07-23-2010 12:17 PM

Interesting. It's not what I'm looking for, but it's better than the options I've seen so far. I might give it a try.
