LinuxQuestions.org


alaios 02-28-2017 08:58 AM

compare two pdf files
 
Hi there,
What software is available on openSUSE 13.1 to compare two PDF files? There are some minor text differences between the two documents that I would like to find. An X (GUI) application would be much appreciated.

I would like to thank you for your reply
Regards
Alex

TB0ne 02-28-2017 09:19 AM

Quote:

Originally Posted by alaios (Post 5677112)
Hi there,
What software is available on openSUSE 13.1 to compare two PDF files? There are some minor text differences between the two documents that I would like to find. An X (GUI) application would be much appreciated.

You're talking about two different things here. There are many X/GUI applications for displaying PDFs (evince, okular, etc.), but they are not going to be able to compare two documents. It is more difficult still without knowing what KIND of PDF file you have; for example, one can scan a page of a book and save it as a PDF, but you will NOT be able to read the text automatically, because that's an image wrapped in a PDF container.

The best way to do this would be to use a utility called pdftotext to convert each PDF into text, which *IS* readable/comparable. Then a tool like diff can easily scan for differences. Tools like meld or kdiff3 are GUI based and can compare two text files.
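
For anyone who wants to script the two steps described above, here is a minimal sketch in Python. It assumes pdftotext (from poppler-utils) is installed and on the PATH; the function names pdf_to_text and diff_texts are made up for this illustration and are not part of any tool mentioned in the thread.

```python
import difflib
import subprocess
import sys


def pdf_to_text(pdf_path):
    """Extract text by calling pdftotext; "-" sends the output to stdout."""
    # Assumption: pdftotext is installed. -layout tries to keep the page layout.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def diff_texts(old_text, new_text, old_name="old", new_name="new"):
    """Return a unified diff of two strings as a list of lines."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


# Only runs when called with exactly two PDF paths on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    old_pdf, new_pdf = sys.argv[1], sys.argv[2]
    for line in diff_texts(pdf_to_text(old_pdf), pdf_to_text(new_pdf), old_pdf, new_pdf):
        print(line)
```

Saved under a name of your choosing and run with two PDF paths as arguments, it prints a unified diff of the extracted text. This only works for PDFs that contain real text, not scanned images.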

alaios 02-28-2017 09:26 AM

But my PDF files are pure text produced by LaTeX.

TB0ne 02-28-2017 09:35 AM

Quote:

Originally Posted by alaios (Post 5677127)
But my PDF files are pure text produced by LaTeX.

Great...so did you read what I told you above???? And if they're LaTeX now, why not compare them BEFORE converting them to PDF format?

alaios 02-28-2017 09:47 AM

Because if you have a small typo in LaTeX, you might not be able to spot it in the source. For me it is easier to compare the "end" product.
Or am I wrong?

hydrurga 02-28-2017 10:18 AM

You may want to look at the package diffpdf. It's available in the Ubuntu 16.04 repos, but I don't know if it is available in binary format for openSUSE 13.1.

TB0ne 02-28-2017 10:44 AM

Quote:

Originally Posted by alaios (Post 5677144)
Because if you have a small typo in LaTeX, you might not be able to spot it in the source. For me it is easier to compare the "end" product. Or am I wrong?

A small typo will REMAIN a small typo, the format doesn't make it better/correct it. And you're asking how to spot differences in files...diff'ing source will still do that. Again, you can convert to text easily and compare those files easily.
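
If re-wrapped lines make a plain line-by-line diff of the source (or of the extracted text) noisy, a word-level comparison can help. The following is only a sketch using Python's standard difflib module; the function name changed_words and the sample strings are invented for this example.

```python
import difflib


def changed_words(old_text, new_text):
    """List (operation, old_words, new_words) for every word-level change."""
    # Splitting on whitespace ignores where the line breaks fall.
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old_words[i1:i2], new_words[j1:j2]))
    return changes


# Same sentence, wrapped differently, with one typo in the second version.
old_source = "The quick brown fox\njumps over the lazy dog."
new_source = "The quick brown fox jumps\nover the lazy dgo."
for change in changed_words(old_source, new_source):
    print(change)
```

Here the moved line break is ignored and only the dog./dgo. replacement is reported.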
Quote:

Originally Posted by hydrurga
You may want to look at the package diffpdf. It's available in the Ubuntu 16.04 repos, but I don't know if it is available in binary format for openSUSE 13.1.

Nice one; never heard of that before. Would save a step, but I couldn't find it for 13.1. Available for Leap 42 and Tumbleweed, but not in the factory repos.

hydrurga 02-28-2017 10:52 AM

Quote:

Originally Posted by TB0ne (Post 5677162)
Nice one; never heard of that before. Would save a step, but I couldn't find it for 13.1. Available for Leap 42 and Tumbleweed, but not in the factory repos.

13.1 (unsupported distribution) version here: https://software.opensuse.org/packag...h_term=diffpdf (at the bottom of the page).

Would be interested to hear if it's any good. Text-wise, it probably works in a very similar fashion to what you're suggesting (and what I would have done in the OP's situation), i.e. first extracting the text from both PDFs, then diffing it.

shane25119 03-04-2017 12:23 AM

You could run those documents through plagiarism-checking software. WCopyfind comes to mind. It does not have a Linux version, but the Windows version works fine under Wine.


All times are GMT -5.