LinuxQuestions.org


dedec0 05-01-2017 03:15 PM

PDF to XML. But how do I convert xmlpdf back to another pdf?
 
Today I have discovered PDFEdit. Great program! It allows me to fix (or change, if you prefer) some PDFs I have around.

It also has the menu "Tools -> Pdf to xml", which makes an XML file from a PDF!

Great! With this tool and a bit of skimming in the generated file, I saw that I can (probably and easily) make a shell script (or, easier than that, a :%s command in Vim) to remove an unwanted signature on every page of a PDF with 250+ pages.

But then I would need to convert the new XML back to PDF. Is there a way to do this? PDFEdit does not seem to do that - or I have missed it!

An edited XML of one converted page is:

Code:

<?xml version='1.0' encoding='utf-8'?>
<!-- jmisutka -->

<xmlpdf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://jm.ignac.org/pdfedit/schema/pdftoxml.xml">
<page number="1" >
<column bbox="xleft: 210 yleft: 786.63 xright: 293.91 yright: 777.38" >
        <line bbox="xleft: 210 yleft: 786.63 xright: 293.91 yright: 777.38" >
                <word bbox="xleft: 210 yleft: 786.63 xright: 293.91 yright: 777.38" >
                <!-- there was something here. I removed -->
                </word>
        </line>
</column>
</page>
</xmlpdf>
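
The cleanup step described above (dropping the unwanted signature from every page of the XML) could be sketched roughly as below. This is only an illustration: the function name `strip_signature` and the sample text are hypothetical, and it assumes the signature appears as text inside a `<word>` element, which the sample above does not confirm because that content was removed. It only edits the XML; it does not convert anything back to PDF.

```python
import xml.etree.ElementTree as ET


def strip_signature(xml_text, signature):
    """Return xml_text with every <word> whose text contains `signature` removed.

    Assumption: the signature is plain text inside <word> elements.
    """
    root = ET.fromstring(xml_text)
    # ElementTree keeps no parent pointers, so visit every element as a
    # potential parent and remove the matching children from it.
    for parent in list(root.iter()):
        for child in list(parent):
            text = "".join(child.itertext())
            if child.tag == "word" and signature in text:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-word page: one word carries the signature, one does not.
sample = (
    '<xmlpdf><page number="1"><column><line>'
    "<word>SIGNED BY X</word><word>keep</word>"
    "</line></column></page></xmlpdf>"
)
cleaned = strip_signature(sample, "SIGNED BY")
```

Note that ElementTree's default parser drops XML comments, so comments in the input would not survive the round trip.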


Didier Spaier 05-01-2017 03:42 PM

Here (Slackware) the xmlto application is shipped in the linuxdoc-tools package.

Otherwise, to get a package, see here:
https://admin.fedoraproject.org/pkgd...ge/rpms/xmlto/

dedec0 05-01-2017 03:55 PM

Quote:

Originally Posted by Didier Spaier (Post 5704801)
Here (Slackware) the xmlto application is shipped in the linuxdoc-tools package.

Thank you for this quick reply. I am on an old Ubuntu. It has the package xmlto (described as "XML-to-any converter") that says:

Quote:

XML-to-any converter

xmlto is a front-end to an XSL toolchain. It chooses an appropriate
stylesheet for the conversion you want and applies it using an external
XSLT processor (currently, only xsltproc is supported). It also performs
any necessary post-processing.

It supports converting from DocBook XML to DVI, XSL-FO, HTML (multiple
pages), HTML (one page), man page, PDF, PostScript and plain text. It
supports converting from XSL-FO to DVI, PDF and PostScript.

DVI output requires dblatex or PassiveTeX. Other formats can be produced
with any of the supported toolchains - dblatex, PassiveTeX or
docbook-xsl/fop (but may require some extensions).

Will it work for that? Is the generated XML I showed here DocBook XML? /-: It does not look like it.
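
One rough way to check the question above is to look at the root element of the file. The package description says xmlto takes DocBook XML or XSL-FO as input, so a file whose root is `<xmlpdf>` would be neither. The sketch below is an assumption-laden heuristic, not part of xmlto: the function name `guess_input_kind` is hypothetical, and the list of DocBook 4 root elements is deliberately incomplete.

```python
import xml.etree.ElementTree as ET

DOCBOOK5_NS = "http://docbook.org/ns/docbook"   # DocBook 5 namespace
XSLFO_NS = "http://www.w3.org/1999/XSL/Format"  # XSL-FO namespace
# Common DocBook 4 root elements (no namespace); not an exhaustive list.
DOCBOOK4_ROOTS = {"book", "article", "chapter", "refentry", "set", "part", "section"}


def guess_input_kind(xml_text):
    """Guess whether xml_text is 'docbook', 'xsl-fo', or 'other' from its root element."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced tags as "{namespace}localname".
    if root.tag.startswith("{"):
        ns, _, local = root.tag[1:].partition("}")
    else:
        ns, local = "", root.tag
    if ns == DOCBOOK5_NS or (ns == "" and local in DOCBOOK4_ROOTS):
        return "docbook"
    if ns == XSLFO_NS and local == "root":
        return "xsl-fo"
    return "other"


# The PDFEdit output shown earlier has <xmlpdf> as its root element.
kind = guess_input_kind('<xmlpdf><page number="1"/></xmlpdf>')
```

If the result is "other", a stylesheet that maps the `xmlpdf` elements to DocBook or XSL-FO would presumably be needed before xmlto could do anything useful with the file.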

Didier Spaier 05-01-2017 04:32 PM

The answer is: try.

Unfortunately for me this is a dead end, as Slackware doesn't ship pdfxmltex, which seems to be needed in this case.

dedec0 05-01-2017 04:51 PM

In debianish distros, the package that contains pdfxmltex is xmltex:

Quote:

Originally Posted by xmltex
XMLTeX is a "non-validating, namespace-aware XML parser" written in TeX.
It allows TeX to directly process XML files.

This package also contains the extension PassiveTeX, see
http://www.tei-c.org.uk/Software/passivetex/ for more details.


And I need to install most TeX packages too...

Didier Spaier 05-01-2017 05:09 PM

Quote:

Originally Posted by dedec0 (Post 5704831)
And I need to install most TeX packages too...

Don't let that fall on your feet, that's heavy :D

dedec0 05-01-2017 06:22 PM

Quote:

Originally Posted by Didier Spaier (Post 5704838)
Don't let that fall on your feet, that's heavy :D

hahaha... it is not too light, I checked.

I just imagined that someone would know if those would work (or not) to rebuild a converted PDF. I will not try it today.


All times are GMT -5.