LinuxQuestions.org
Old 05-01-2017, 03:15 PM   #1
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,143

Rep: Reputation: 43
Question PDF to XML. But how do I convert xmlpdf back to another pdf?


Today I have discovered PDFEdit. Great program! It allows me to fix (or change, if you prefer) some PDFs I have around.

It also has the menu "Tools -> Pdf to xml", which makes an XML file from a PDF!

Great! With this tool and a bit of skimming in the generated file, I saw that I can (probably and easily) make a shell script (or, easier than that, a :%s command in Vim) to remove an unwanted signature on every page of a PDF with 250+ pages.

But then I would need to convert the new XML back to PDF. Is there a way to do this? PDFEdit does not seem to do that - or I have missed it!

An edited XML of one converted page is:

Code:
<?xml version='1.0' encoding='utf-8'?>
<!-- jmisutka -->

<xmlpdf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://jm.ignac.org/pdfedit/schema/pdftoxml.xml">
<page number="1" >
<column bbox="xleft: 210 yleft: 786.63 xright: 293.91 yright: 777.38" >
	<line bbox="xleft: 210 yleft: 786.63 xright: 293.91 yright: 777.38" >
		<word bbox="xleft: 210 yleft: 786.63 xright: 293.91 yright: 777.38" >
                <!-- there was something here; I removed it -->
		</word>
	</line>
</column>
</page>
</xmlpdf>
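
For the removal step itself, here is a minimal Python sketch of the idea, as an alternative to a Vim :%s command. It assumes the signature shows up as the text content of a <word> element, which is a guess because the original content was removed from the sample above. The function name strip_signature, the placeholder text "UNWANTED" and the second <word> with its bbox values are all made up for illustration. It only edits the XML; it does not answer how to get back to a PDF.

```python
import xml.etree.ElementTree as ET


def strip_signature(xml_text, signature):
    """Remove every <word> whose text equals `signature`.

    Returns (new_xml_string, number_of_words_removed).
    Assumption: the unwanted text is stored as the text of <word>.
    """
    root = ET.fromstring(xml_text)
    removed = 0
    # ElementTree has no parent pointers, so visit each <line>
    # and drop the matching <word> children from it.
    for line in list(root.iter("line")):
        for word in list(line.findall("word")):
            if (word.text or "").strip() == signature:
                line.remove(word)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


# Hypothetical sample modelled on the xmlpdf structure shown above.
SAMPLE = """<xmlpdf>
<page number="1">
<column bbox="xleft: 210 yleft: 786.63 xright: 293.91 yright: 777.38">
  <line bbox="xleft: 210 yleft: 786.63 xright: 293.91 yright: 777.38">
    <word bbox="xleft: 210 yleft: 786.63 xright: 293.91 yright: 777.38">UNWANTED</word>
    <word bbox="xleft: 300 yleft: 786.63 xright: 340 yright: 777.38">keep</word>
  </line>
</column>
</page>
</xmlpdf>"""

cleaned, count = strip_signature(SAMPLE, "UNWANTED")
print(count, "word(s) removed")
```

If the signature spans several <word> elements, the exact-match test would have to be adapted, for example by matching on the bbox attribute instead.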

Last edited by dedec0; 05-01-2017 at 04:02 PM.
 
Old 05-01-2017, 03:42 PM   #2
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-14.2.1.2 on Lenovo Thinkpad W520
Posts: 9,994

Rep: Reputation: Disabled
Here (Slackware) the xmlto application is shipped in the linuxdoc-tools package.

Otherwise, to get a package, see here:
https://admin.fedoraproject.org/pkgd...ge/rpms/xmlto/

Last edited by Didier Spaier; 05-01-2017 at 03:50 PM.
 
Old 05-01-2017, 03:55 PM   #3
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,143

Original Poster
Rep: Reputation: 43

Quote:
Originally Posted by Didier Spaier View Post
Here (Slackware) the xmlto application is shipped in the linuxdoc-tools package.
Thank you for this quick reply. I am on an old Ubuntu. It has the package xmlto, whose description says:

Quote:
XML-to-any converter

xmlto is a front-end to an XSL toolchain. It chooses an appropriate
stylesheet for the conversion you want and applies it using an external
XSLT processor (currently, only xsltproc is supported). It also performs
any necessary post-processing.

It supports converting from DocBook XML to DVI, XSL-FO, HTML (multiple
pages), HTML (one page), man page, PDF, PostScript and plain text. It
supports converting from XSL-FO to DVI, PDF and PostScript.

DVI output requires dblatex or PassiveTeX. Other formats can be produced
with any of the supported toolchains - dblatex, PassiveTeX or
docbook-xsl/fop (but may require some extensions).
Will it work for that? Is the generated XML I showed here DocBook XML? /-: It does not look like it.
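
One rough way to check is to look at the root element. The sketch below is only a heuristic under stated assumptions: DocBook 5 documents use the namespace http://docbook.org/ns/docbook, XSL-FO documents have a root element named root in the namespace http://www.w3.org/1999/XSL/Format, and DocBook 4 documents are recognised here by a short, non-exhaustive list of common root element names (they are really identified by their DOCTYPE, which ElementTree does not expose). The function name guess_format is made up for illustration.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"  # DocBook 5 namespace
FO_NS = "http://www.w3.org/1999/XSL/Format"   # XSL-FO namespace
# Common DocBook 4 root elements (no namespace); not an exhaustive list.
DOCBOOK4_ROOTS = {"book", "article", "chapter", "refentry", "set"}


def guess_format(xml_text):
    """Guess 'docbook', 'xsl-fo' or 'unknown' from the root element only."""
    tag = ET.fromstring(xml_text).tag
    if tag.startswith("{" + DOCBOOK_NS + "}"):
        return "docbook"
    if tag == "{" + FO_NS + "}root":
        return "xsl-fo"
    if tag in DOCBOOK4_ROOTS:
        return "docbook"
    return "unknown"


print(guess_format('<xmlpdf><page number="1"/></xmlpdf>'))
```

The root element of the PDFEdit output is xmlpdf, which matches neither of the two input formats named in the xmlto description.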

Last edited by dedec0; 05-01-2017 at 03:58 PM.
 
Old 05-01-2017, 04:32 PM   #4
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-14.2.1.2 on Lenovo Thinkpad W520
Posts: 9,994

Rep: Reputation: Disabled
The answer is: try.

Unfortunately for me this is a dead end, as Slackware doesn't ship pdfxmltex, which seems to be needed in this case.
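
For anyone who wants to try, the attempt boils down to one xmlto call of the form "xmlto -o OUTDIR FORMAT FILE". Below is a small Python wrapper as a sketch; the helper names build_xmlto_command and try_xmlto and the file name page.xml are made up for illustration. It only reports what xmlto returns, and nothing here suggests the conversion will succeed on xmlpdf input.

```python
import shutil
import subprocess


def build_xmlto_command(xml_path, fmt="pdf", outdir="."):
    """Return the argument list for: xmlto -o OUTDIR FORMAT FILE."""
    return ["xmlto", "-o", outdir, fmt, xml_path]


def try_xmlto(xml_path, fmt="pdf", outdir="."):
    """Run xmlto if it is installed.

    Returns (returncode, stderr), or None when xmlto is not on PATH.
    """
    if shutil.which("xmlto") is None:
        return None
    result = subprocess.run(
        build_xmlto_command(xml_path, fmt, outdir),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr
```

Calling try_xmlto("page.xml") and printing the returned stderr should show whether the failure comes from the input format or from a missing backend such as pdfxmltex.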
 
Old 05-01-2017, 04:51 PM   #5
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,143

Original Poster
Rep: Reputation: 43
In Debian-based distros, the package that contains pdfxmltex is xmltex:

Quote:
Originally Posted by xmltex
XMLTeX is a "non-validating, namespace-aware XML parser" written in TeX.
It allows TeX to directly process XML files.

This package also contains the extension PassiveTeX, see
http://www.tei-c.org.uk/Software/passivetex/ for more details.

And I need to install most TeX packages too...
 
Old 05-01-2017, 05:09 PM   #6
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-14.2.1.2 on Lenovo Thinkpad W520
Posts: 9,994

Rep: Reputation: Disabled
Quote:
Originally Posted by dedec0 View Post
And I need to install most TeX packages too...
Don't let that fall on your feet, that's heavy
 
Old 05-01-2017, 06:22 PM   #7
dedec0
Senior Member
 
Registered: May 2007
Posts: 1,143

Original Poster
Rep: Reputation: 43
Quote:
Originally Posted by Didier Spaier View Post
Don't let that fall on your feet, that's heavy
hahaha... it is not too light, I checked.

I just imagined that someone would know if those would work (or not) to rebuild a converted PDF. I will not try it today.
 
  


All times are GMT -5.
