LinuxQuestions.org
Old 03-16-2019, 08:06 AM   #1
Michael Uplawski
Member
 
Registered: Dec 2015
Location: Normandy, France
Distribution: Debian “testing” with very little “unstable”
Posts: 780
Blog Entries: 25

XML/XSD Schemavalidation of an OOXML document


Good afternoon.

I am programmatically generating OOXML documents for routine use. As my knowledge of OOXML is based entirely on online resources, I make errors, and I would like to validate the code in my template files (document.xml, header.xml) and styles (styles.xml) against the referenced schema definitions. For now these are only the following three:
  • xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  • xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
  • xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"

My knowledge of xmllint is insufficient, and online validators report each of my documents as valid, even where I close a container tag before one of the elements that it must contain. All they actually do is confirm that the XML is well-formed.
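
To make the distinction between well-formedness and validity concrete, here is a minimal sketch, assuming xmllint is installed. The two function names are made up for illustration; the options --noout and --schema are the ones used elsewhere in this thread.

```shell
#!/bin/sh
# Well-formedness only: the parser checks syntax (matching tags, quoting,
# declared prefixes) but knows nothing about which elements may appear where.
is_well_formed() {
    xmllint --noout "$1"
}

# Validity: the same parse, plus a check against an XSD.
# $1 = schema file (for example wml.xsd), $2 = document part.
is_valid() {
    xmllint --noout --schema "$1" "$2"
}

# Usage:
#   is_well_formed document.xml && echo "well-formed"
#   is_valid wml.xsd document.xml && echo "valid"
```

A document whose container tag is closed too early can still pass the first check; only the second one, once the schema itself compiles, is in a position to reject it.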

Can you point me to a resource which explains how this type of document is best validated against the named schemas? Or tell me where I can download the XSD for each schema, if I want to feed them to xmllint?

Amongst others, I have seen:
  1. http://www.datypic.com/sc/ooxml/ss.html gives the same kind of overview that I find elsewhere, but no XSD code.
  2. https://www.ecma-international.org/p...s/Ecma-376.htm - I do not know what to download here. An attempt to validate against “ECMA-376 5th edition Part 1” fails with the following errors:
    Code:
    user@machine:~$ xmllint --schema wml.xsd styles.xml 
    shared-math.xsd:154: element attribute: Schemas parser error : attribute use (unknown), attribute 'ref': The QName value '{http://www.w3.org/XML/1998/namespace}space' does not resolve to a(n) attribute declaration.
    wml.xsd:1663: element attribute: Schemas parser error : attribute use (unknown), attribute 'ref': The QName value '{http://www.w3.org/XML/1998/namespace}space' does not resolve to a(n) attribute declaration.
    WXS schema wml.xsd failed to compile
    
    (...)

Last edited by Michael Uplawski; 03-16-2019 at 08:34 AM. Reason: three schemas, not four.
 
Old 03-17-2019, 11:37 AM   #2
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 3,803

Could you please give an example xml (or a link to it)?
 
Old 03-17-2019, 01:10 PM   #3
Michael Uplawski
Member
 
Registered: Dec 2015
Location: Normandy, France
Distribution: Debian “testing” with very little “unstable”
Posts: 780

Original Poster
Blog Entries: 25

Quote:
Originally Posted by NevemTeve
Could you please give an example xml (or a link to it)?
Here is an archive with authentic “templates”, i.e. a document.xml which is completed during the execution of my routine, a styles.xml, a header.xml and a _rels directory, which links header.xml and document.xml.

I do not know if that is the kind of example you wish to see.
 
Old 03-18-2019, 12:22 AM   #4
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 3,803

OK, let's try document.xml:
Code:
<?xml version="2.0" encoding="utf-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:type w:val="continuous" />
    <w:sectPr>
      <w:headerReference w:type="default" r:id="rId1" />
...
Well, here is the first problem reported by xmllint:
Code:
document.xml:1: parser error : Unsupported version '2.0'
<?xml version="2.0" encoding="utf-8" standalone="yes"?>
The second one:
Code:
document.xml:6: namespace error : Namespace prefix r for id on headerReference is not defined
      <w:headerReference w:type="default" r:id="rId1" />
The trivial fix would be this:
Code:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<w:document
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <w:body>
...
But that's not enough: since xmllint accepts only one schema, you should create an XML catalog. I'll try to give more details.
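
An undeclared prefix like the r above can also be spotted in a template before xmllint is run. The following is a rough, text-based sketch and not a parser: it can be fooled by text content or attribute values that happen to look like prefix:name, so xmllint remains the authority. The function name is invented for this example.

```shell
#!/bin/sh
# Rough heuristic: list namespace prefixes that are used on element or
# attribute names in file $1 but never declared via xmlns:prefix.
undeclared_prefixes() {
    work=$(mktemp -d) || return 1
    # Prefixes in use: a name followed by ':' directly after '<', '/',
    # whitespace or the start of a line. xml and xmlns need no declaration.
    grep -oE '(^|[</[:space:]])[A-Za-z_][A-Za-z0-9_.-]*:[A-Za-z_]' "$1" |
        sed -e 's/^[</[:space:]]*//' -e 's/:.*$//' |
        grep -vE '^(xml|xmlns)$' | sort -u > "$work/used"
    # Prefixes declared anywhere in the file.
    grep -oE 'xmlns:[A-Za-z_][A-Za-z0-9_.-]*' "$1" |
        cut -d: -f2 | sort -u > "$work/declared"
    # Keep the lines that appear only in the first list.
    comm -23 "$work/used" "$work/declared"
    rm -rf "$work"
}

# Usage:
#   undeclared_prefixes document.xml
```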

PS: I found the xsd files here: https://jar-download.com/cache_jars/.../jar_files.zip

Last edited by NevemTeve; 03-18-2019 at 01:19 AM.
 
Old 03-18-2019, 05:56 AM   #5
Michael Uplawski
Member
 
Registered: Dec 2015
Location: Normandy, France
Distribution: Debian “testing” with very little “unstable”
Posts: 780

Original Poster
Blog Entries: 25

Quote:
Originally Posted by NevemTeve
But that's not enough: as xmllint accepts only one schema, one should create an xml-catalog. I'll try to give more details.

PS: I found the xsd files here: https://jar-download.com/cache_jars/.../jar_files.zip
Thank you for your time and effort. I was impatient to read your response.

Are you accustomed to this kind of problem, or how did you come across jar-download.com? Even if XML and Java are close friends, I always hope for a generally applicable procedure and would not have thought of searching for a JAR archive, of all things (or a zip, if it must be called that). Anyway.

Last edited by Michael Uplawski; 03-18-2019 at 06:11 AM.
 
Old 03-18-2019, 06:58 AM   #6
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 3,803

Well, I'd suggest this:

As root:
1. If you don't have the file /usr/local/etc/xml/catalog, create it:
Code:
# mkdir -p /usr/local/etc/xml
# cat >/usr/local/etc/xml/catalog <<DONE
<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <nextCatalog catalog="file:///etc/xml/catalog"/>
</catalog>
DONE
2. If this file has no line referring to name="http://www.w3.org/XML/1998/namespace", insert a line like this (before the nextCatalog line):
Code:
  <uri name="http://www.w3.org/XML/1998/namespace" uri="file:///usr/local/etc/xml/xml_2009_01.xsd"/>
Also download the file that this line points to:
Code:
wget -O /usr/local/etc/xml/xml_2009_01.xsd http://www.w3.org/2009/01/xml.xsd
Switch back to your normal user.
3. Put the OOXML XSD files into a sub-directory of your working directory, e.g. ooxml_xsd.

4. Some modifications are required to make xmllint work:
4.1. wml.xsd -- missing schemaLocation
Code:
-  <xsd:import id="xml" namespace="http://www.w3.org/XML/1998/namespace" />
+  <xsd:import id="xml" namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/XML/1998/namespace"/>
4.2. dml-wordprocessingDrawing.xsd -- duplicate import for the same namespace
Code:
-  <xsd:import schemaLocation="dml-graphicalObject.xsd"    namespace="http://schemas.openxmlformats.org/drawingml/2006/main" />
-  <xsd:import schemaLocation="dml-documentProperties.xsd" namespace="http://schemas.openxmlformats.org/drawingml/2006/main" />
+  <xsd:import schemaLocation="dml-wordprocessingDrawing_import.xsd" namespace="http://schemas.openxmlformats.org/drawingml/2006/main" />
Then create this dml-wordprocessingDrawing_import.xsd file:
Code:
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema targetNamespace="http://schemas.openxmlformats.org/drawingml/2006/main"
   elementFormDefault="qualified"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:include schemaLocation="dml-graphicalObject.xsd"/>
  <xsd:include schemaLocation="dml-documentProperties.xsd"/>
</xsd:schema>
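
Steps 1, 2 and 4.1 above can also be scripted without root access. This is only a sketch under assumptions: the function names and the directory are hypothetical, the catalog location is later passed through XML_CATALOG_FILES, and the sed pattern expects the xml: import to sit on a single line, as in the diff for step 4.1.

```shell
#!/bin/sh
XML_NS='http://www.w3.org/XML/1998/namespace'

# Write a catalog into directory $1. It maps the xml: namespace to a local
# copy of xml.xsd (expected as $1/xml_2009_01.xsd) and then falls through
# to the system catalog. Paths with spaces are not handled in this sketch.
write_catalog() {
    dir=$(cd "$1" && pwd) || return 1   # absolute path for the file:// URI
    cat > "$dir/catalog" <<EOF
<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="$XML_NS" uri="file://$dir/xml_2009_01.xsd"/>
  <nextCatalog catalog="file:///etc/xml/catalog"/>
</catalog>
EOF
}

# Add a schemaLocation to the xml: namespace import in XSD file $1.
# $2 is the location to write; it defaults to the namespace URI itself,
# which the catalog then redirects to the local file.
patch_xml_import() {
    xsd=$1
    loc=${2:-$XML_NS}
    # Do nothing if the import line already carries a schemaLocation.
    if grep "namespace=\"$XML_NS\"" "$xsd" | grep -q 'schemaLocation='; then
        return 0
    fi
    sed "s|\(<xsd:import[^>]*namespace=\"$XML_NS\"\) */>|\1 schemaLocation=\"$loc\"/>|" \
        "$xsd" > "$xsd.tmp" && mv "$xsd.tmp" "$xsd"
}

# Usage (the directory name is an example):
#   mkdir -p "$HOME/xmlcat"
#   wget -O "$HOME/xmlcat/xml_2009_01.xsd" http://www.w3.org/2009/01/xml.xsd
#   write_catalog "$HOME/xmlcat"
#   patch_xml_import ooxml_xsd/wml.xsd
#   export XML_CATALOG_FILES="$HOME/xmlcat/catalog"
```

Passing a relative file name as the second argument of patch_xml_import, for example xml_2009_01.xsd placed next to wml.xsd, would be one way to try the setup without a catalog.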
Now you can invoke xmllint:
Code:
$ export XML_CATALOG_FILES=/usr/local/etc/xml/catalog
$ xmllint --noout --debugent --schema ooxml_xsd/wml.xsd document.xml

new input from file: ooxml_xsd/wml.xsd
new input from file: ooxml_xsd/shared-customXmlSchemaProperties.xsd
new input from file: ooxml_xsd/shared-math.xsd
new input from file: ooxml_xsd/dml-wordprocessingDrawing.xsd
new input from file: ooxml_xsd/dml-wordprocessingDrawing_import.xsd
new input from file: ooxml_xsd/dml-graphicalObject.xsd
new input from file: ooxml_xsd/dml-documentProperties.xsd
new input from file: ooxml_xsd/dml-baseTypes.xsd
new input from file: ooxml_xsd/shared-relationshipReference.xsd
new input from file: ooxml_xsd/dml-shapeGeometry.xsd
new input from file: file:///usr/local/etc/xml/xml_2009_01.xsd
new input from file: document.xml
document.xml:6: element type: Schemas validity error :
Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}type':
This element is not expected.
Expected is ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}sectPr ).
document.xml fails to validate
DOCUMENT
No entities in internal subset
No entities in external subset
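
Since the templates in question consist of several parts, the same call can be wrapped in a loop. This is a minimal sketch, assuming that wml.xsd declares the root elements of header.xml and styles.xml as well (the first post's attempt with styles.xml suggests so); the function name is made up.

```shell
#!/bin/sh
# Validate several parts against one schema and keep going after a failure.
# $1 = schema file, remaining arguments = XML parts.
# Returns non-zero if at least one part fails to validate.
validate_parts() {
    schema=$1
    shift
    failed=0
    for part in "$@"; do
        # xmllint prints its own "validates" / "fails to validate" message.
        if ! xmllint --noout --schema "$schema" "$part"; then
            failed=1
        fi
    done
    return "$failed"
}

# Usage (paths as in the invocation above):
#   export XML_CATALOG_FILES=/usr/local/etc/xml/catalog
#   validate_parts ooxml_xsd/wml.xsd document.xml header.xml styles.xml
```

The schema is compiled again for every part, which keeps the sketch simple at the cost of some run time.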

Last edited by NevemTeve; 03-18-2019 at 07:04 AM.
 
Old 03-20-2019, 04:34 PM   #7
Michael Uplawski
Member
 
Registered: Dec 2015
Location: Normandy, France
Distribution: Debian “testing” with very little “unstable”
Posts: 780

Original Poster
Blog Entries: 25

Thank you so much!

I can validate now.
And promptly my “templates” need revising, as I had left out some namespaces, at least for the attributes of <w:headerReference/>. Up to now I was lucky: the word processor which reads my final documents corrects errors upon saving, and my requirements were simple.

It is, however, surprising that the validation needs so much preparation.

Last edited by Michael Uplawski; 03-23-2019 at 03:20 AM. Reason: upon
 
Old 07-28-2019, 05:30 PM   #8
Ricky Rocker
LQ Newbie
 
Registered: Jul 2019
Posts: 3


Hi @NevemTeve,

I "think" I've got everything right. (I've modified your approach slightly by placing the xml_2009_01.xsd file in the same folder as the Word docs for testing, hopefully removing the need for the catalog.)

I'm getting the following:
Code:
root@dev:/Development # xmllint --schema /Development/OfficeOpenXML-XMLSchema-Strict/wml.xsd testdoc.xml --noout --debugent
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/wml.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/dml-wordprocessingDrawing.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/dml-main.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/shared-relationshipReference.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/shared-commonSimpleTypes.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/dml-diagram.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/dml-chart.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/dml-chartDrawing.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/dml-picture.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/dml-lockedCanvas.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/shared-math.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/xml_2009_01.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/shared-customXmlSchemaProperties.xsd
new input from file: testdoc.xml
testdoc.xml:1: element document: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}document': No matching global declaration available for the validation root.
testdoc.xml fails to validate
DOCUMENT
No entities in internal subset
No entities in external subset
root@dev:/Development #

So the schemas all seem happy, but for some reason testdoc.xml, with the following w:document node, fails as shown above (XMLSpy validates it fine):

Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
xmlns:wpc="http://schemas.openxmlformats.org/office/word/2010/wordprocessingCanvas" 
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
xmlns:o="urn:schemas-microsoft-com:office:office" 
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" 
xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" 
xmlns:v="urn:schemas-microsoft-com:vml" 
xmlns:wp14="http://schemas.openxmlformats.org/office/word/2010/wordprocessingDrawing" 
xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" 
xmlns:w10="urn:schemas-microsoft-com:office:word" 
xmlns:w14="http://schemas.openxmlformats.org/office/word/2010/wordml" 
xmlns:w15="http://schemas.openxmlformats.org/office/word/2012/wordml" 
xmlns:wpg="http://schemas.openxmlformats.org/office/word/2010/wordprocessingGroup" 
xmlns:wpi="http://schemas.openxmlformats.org/office/word/2010/wordprocessingInk" 
xmlns:wne="http://schemas.openxmlformats.org/office/word/2006/wordml" 
xmlns:wps="http://schemas.openxmlformats.org/office/word/2010/wordprocessingShape">

Any ideas would be greatly appreciated!

thanks so much

Ricky

Last edited by Ricky Rocker; 07-29-2019 at 03:36 PM.
 
Old 07-29-2019, 02:56 PM   #9
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 3,803

You might want to edit your post to add [code] and [/code] tags.
 
Old 07-29-2019, 03:40 PM   #10
Ricky Rocker
LQ Newbie
 
Registered: Jul 2019
Posts: 3


Hi,

Thanks very much for the formatting tip.

I've tidied up the post, but I no longer need assistance, as I have resolved the issue described above. It was actually a PHP DOMDocument issue caused by this PHP bug:


schemaValidate ignores namespaces dynamically added to a DOMDocument https://bugs.php.net/bug.php?id=78352

Last edited by Ricky Rocker; 07-29-2019 at 03:52 PM.
 
Tags
ooxml, validate, word-processor, xml, xsd

