LinuxQuestions.org


Michael Uplawski 03-16-2019 08:06 AM

XML/XSD Schemavalidation of an OOXML document
 
Good afternoon.

I generate OOXML documents programmatically for routine use. As my knowledge of OOXML is based entirely on online resources, I make errors and would like to validate the code in my template files (document.xml, header.xml) and styles (styles.xml) against the schema definitions they reference, which are for now only these three:
  • xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  • xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
  • xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"

My knowledge of xmllint is insufficient, and online validators appear to report each of my documents as valid, even where I close a container tag before one of the elements that it must contain. All they actually check is the “well-formedness” of the XML.

Can you point me to a resource that explains how this type of document is best validated against the named schemas? Or tell me where I can download the XSD file for each schema, if I want to feed them to xmllint?

Amongst others, I have seen:
  1. http://www.datypic.com/sc/ooxml/ss.html gives the same kind of overview that I find elsewhere, but no XSD-code.
  2. https://www.ecma-international.org/p...s/Ecma-376.htm - I do not know what to download here; an attempt to validate against “ECMA-376 5th edition Part 1” fails with the following errors:
    Code:

    user@machine:~$ xmllint --schema wml.xsd styles.xml
    shared-math.xsd:154: element attribute: Schemas parser error : attribute use (unknown), attribute 'ref': The QName value '{http://www.w3.org/XML/1998/namespace}space' does not resolve to a(n) attribute declaration.
    wml.xsd:1663: element attribute: Schemas parser error : attribute use (unknown), attribute 'ref': The QName value '{http://www.w3.org/XML/1998/namespace}space' does not resolve to a(n) attribute declaration.
    WXS schema wml.xsd failed to compile

    (...)


NevemTeve 03-17-2019 11:37 AM

Could you please give an example xml (or a link to it)?

Michael Uplawski 03-17-2019 01:10 PM

Quote:

Originally Posted by NevemTeve (Post 5974725)
Could you please give an example xml (or a link to it)?

Here is an archive with authentic “templates”, i.e. a document.xml which is completed during the execution of my routine, a styles.xml, a header.xml and a _rels directory, which links header.xml and document.xml.

I do not know if that is the kind of example you wish to see.

NevemTeve 03-18-2019 12:22 AM

Ok, let's try document.xml
Code:

<?xml version="2.0" encoding="utf-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:type w:val="continuous" />
    <w:sectPr>
      <w:headerReference w:type="default" r:id="rId1" />
...

Well, here is the first problem reported by xmllint:
Code:

document.xml:1: parser error : Unsupported version '2.0'
<?xml version="2.0" encoding="utf-8" standalone="yes"?>

the second one:
Code:

document.xml:6: namespace error : Namespace prefix r for id on headerReference is not defined
      <w:headerReference w:type="default" r:id="rId1" />

The trivial fix would be this:
Code:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<w:document
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <w:body>
...

But that's not enough: as xmllint accepts only one schema, one should create an xml-catalog. I'll try to give more details.

PS: I found the xsd files here: https://jar-download.com/cache_jars/.../jar_files.zip

Michael Uplawski 03-18-2019 05:56 AM

Quote:

Originally Posted by NevemTeve (Post 5974911)
But that's not enough: as xmllint accepts only one schema, one should create an xml-catalog. I'll try to give more details.

PS: I found the xsd files here: https://jar-download.com/cache_jars/.../jar_files.zip

Thank you for your time and effort. I was impatient to read your response.

Are you accustomed to this kind of problem, or how did you make the connection to jar-download.com? Even if XML and Java are close friends, I always hope for a generally applicable procedure and would not have thought of searching for a jar archive, of all things (or a zip, if it must be). Anyway.

NevemTeve 03-18-2019 06:58 AM

Well, I'd suggest this:

As root:
1. If you don't have the file /usr/local/etc/xml/catalog, create it:
Code:

$ mkdir -p /usr/local/etc/xml
$ cat >/usr/local/etc/xml/catalog <<DONE
<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <nextCatalog catalog="file:///etc/xml/catalog"/>
</catalog>
DONE

2. If this file has no line referring to name="http://www.w3.org/XML/1998/namespace", insert a line like this (before the nextCatalog line):
Code:

  <uri name="http://www.w3.org/XML/1998/namespace" uri="file:///usr/local/etc/xml/xml_2009_01.xsd"/>

Also download the file itself:
Code:

wget -O /usr/local/etc/xml/xml_2009_01.xsd http://www.w3.org/2009/01/xml.xsd

Switch back to your normal user.

3. Put the OOXML XSD files into a sub-directory of your working directory, e.g. ooxml_xsd.

4. Some modifications are required to let xmllint work:
4.1. wml.xsd -- missing schemaLocation
Code:

-  <xsd:import id="xml" namespace="http://www.w3.org/XML/1998/namespace" />
+  <xsd:import id="xml" namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/XML/1998/namespace"/>

4.2. dml-wordprocessingDrawing.xsd -- duplicate import for the same namespace
Code:

-  <xsd:import schemaLocation="dml-graphicalObject.xsd"    namespace="http://schemas.openxmlformats.org/drawingml/2006/main" />
-  <xsd:import schemaLocation="dml-documentProperties.xsd" namespace="http://schemas.openxmlformats.org/drawingml/2006/main" />
+  <xsd:import schemaLocation="dml-wordprocessingDrawing_import.xsd" namespace="http://schemas.openxmlformats.org/drawingml/2006/main" />

Then create this dml-wordprocessingDrawing_import.xsd file:
Code:

<?xml version="1.0" encoding="utf-8"?>
<xsd:schema targetNamespace="http://schemas.openxmlformats.org/drawingml/2006/main"
  elementFormDefault="qualified"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:include schemaLocation="dml-graphicalObject.xsd"/>
  <xsd:include schemaLocation="dml-documentProperties.xsd"/>
</xsd:schema>
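
The edit from step 4.1 can also be scripted. What follows is only a minimal sketch, assuming the import line in wml.xsd looks exactly like the one shown in 4.1 (apart from the spacing before "/>"). The function name patch_xml_import and the throw-away demo file are made up for illustration:

```shell
#!/bin/sh
# Sketch only: adds a schemaLocation to the xml-namespace import (step 4.1).
# patch_xml_import is a hypothetical helper name, not part of any tool.
patch_xml_import() {
    # -i.bak edits the file in place and keeps the original as FILE.bak
    sed -i.bak \
        's|\(<xsd:import id="xml" namespace="http://www.w3.org/XML/1998/namespace"\) */>|\1 schemaLocation="http://www.w3.org/XML/1998/namespace"/>|' \
        "$1"
}

# Demonstration on a stand-in file that contains only the relevant line.
demo_dir=$(mktemp -d)
cat >"$demo_dir/wml.xsd" <<'EOF'
  <xsd:import id="xml" namespace="http://www.w3.org/XML/1998/namespace" />
EOF
patch_xml_import "$demo_dir/wml.xsd"
cat "$demo_dir/wml.xsd"
```

On the real schema the call would be patch_xml_import ooxml_xsd/wml.xsd. A line that already carries a schemaLocation no longer matches the pattern, so running the helper twice should leave the file unchanged.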

Now you can invoke xmllint:
Code:

$ export XML_CATALOG_FILES=/usr/local/etc/xml/catalog
$ xmllint -noout -debugent -schema ooxml_xsd/wml.xsd document.xml

new input from file: ooxml_xsd/wml.xsd
new input from file: ooxml_xsd/shared-customXmlSchemaProperties.xsd
new input from file: ooxml_xsd/shared-math.xsd
new input from file: ooxml_xsd/dml-wordprocessingDrawing.xsd
new input from file: ooxml_xsd/dml-wordprocessingDrawing_import.xsd
new input from file: ooxml_xsd/dml-graphicalObject.xsd
new input from file: ooxml_xsd/dml-documentProperties.xsd
new input from file: ooxml_xsd/dml-baseTypes.xsd
new input from file: ooxml_xsd/shared-relationshipReference.xsd
new input from file: ooxml_xsd/dml-shapeGeometry.xsd
new input from file: file:///usr/local/etc/xml/xml_2009_01.xsd
new input from file: document.xml
document.xml:6: element type: Schemas validity error :
Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}type':
This element is not expected.
Expected is ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}sectPr ).
document.xml fails to validate
DOCUMENT
No entities in internal subset
No entities in external subset
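
If root access is not available, the same catalog can presumably live anywhere that XML_CATALOG_FILES points to. A minimal sketch, assuming a per-user directory (the CATALOG_DIR variable and its default path are my own choice, nothing xmllint prescribes) that contains no spaces, and that xml.xsd gets downloaded into it as in step 2:

```shell
#!/bin/sh
# Sketch: per-user XML catalog, no root needed.
# CATALOG_DIR and its default value are assumptions made for this example.
# Note: this overwrites any existing file named "catalog" in that directory.
CATALOG_DIR="${CATALOG_DIR:-${HOME:-/tmp}/.local/share/xml}"
mkdir -p "$CATALOG_DIR"

# Unquoted EOF on purpose, so that $CATALOG_DIR is expanded inside the catalog.
cat >"$CATALOG_DIR/catalog" <<EOF
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://www.w3.org/XML/1998/namespace" uri="file://$CATALOG_DIR/xml_2009_01.xsd"/>
  <nextCatalog catalog="file:///etc/xml/catalog"/>
</catalog>
EOF

# Download once, as in step 2 (left commented out so the sketch runs offline):
# wget -O "$CATALOG_DIR/xml_2009_01.xsd" http://www.w3.org/2009/01/xml.xsd

# libxml2-based tools such as xmllint read this variable to locate catalogs.
export XML_CATALOG_FILES="$CATALOG_DIR/catalog"
echo "XML_CATALOG_FILES=$XML_CATALOG_FILES"
```

After that, the xmllint call shown above should work unchanged in the same shell session.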


Michael Uplawski 03-20-2019 04:34 PM

Thank you so much!

I can validate now.
And promptly it turns out that my “templates” must be revised, as I had skipped some namespace declarations, at least for the attributes of <w:headerReference/>. Up to now I was lucky: the word processor that reads my final documents corrects errors upon saving, and my requirements were simple.

It is, however, surprising that the validation needs so much preparation.

Ricky Rocker 07-28-2019 05:30 PM

Hi @nevemTeve

I "think" I've got everything right. I've modified your approach slightly by placing the xml_2009_01.xsd file in the same folder as the word docs for testing, hopefully without needing the catalog, and I'm getting the following:
Code:

root@dev:/Development # xmllint --schema /Development/OfficeOpenXML-XMLSchema-Strict/wml.xsd testdoc.xml --noout --debugent
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/wml.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/dml-wordprocessingDrawing.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/dml-main.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/shared-relationshipReference.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/shared-commonSimpleTypes.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/dml-diagram.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/dml-chart.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/dml-chartDrawing.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/dml-picture.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/dml-lockedCanvas.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/shared-math.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/xml_2009_01.xsd
new input from file: /Development/OfficeOpenXML-XMLSchema-Strict/shared-customXmlSchemaProperties.xsd
new input from file: testdoc.xml
testdoc.xml:1: element document: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}document': No matching global declaration available for the validation root.
testdoc.xml fails to validate
DOCUMENT
No entities in internal subset
No entities in external subset
root@dev:/Development #


...so the schemas all seem happy, but for some reason testdoc.xml, with the following w:document node, fails as above (XMLSpy validates it fine):

Code:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
xmlns:wpc="http://schemas.openxmlformats.org/office/word/2010/wordprocessingCanvas"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:wp14="http://schemas.openxmlformats.org/office/word/2010/wordprocessingDrawing"
xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:w14="http://schemas.openxmlformats.org/office/word/2010/wordml"
xmlns:w15="http://schemas.openxmlformats.org/office/word/2012/wordml"
xmlns:wpg="http://schemas.openxmlformats.org/office/word/2010/wordprocessingGroup"
xmlns:wpi="http://schemas.openxmlformats.org/office/word/2010/wordprocessingInk"
xmlns:wne="http://schemas.openxmlformats.org/office/word/2006/wordml"
xmlns:wps="http://schemas.openxmlformats.org/office/word/2010/wordprocessingShape">


Any ideas would be greatly appreciated!

thanks so much

Ricky
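
For the message "No matching global declaration available for the validation root", one thing that may be worth comparing is the targetNamespace of the schema passed to --schema and the namespace of the document's root element. If the two differ, the schema typically declares no global element that the root could match. A rough sketch using grep on stand-in files follows. Both file contents are illustrative assumptions and not taken from the posts above; the schema namespace shown is the one I believe the Strict variant of wml.xsd declares, so check your own copy:

```shell
#!/bin/sh
# Sketch: compare a schema's targetNamespace with the namespace bound to the
# document's w: prefix. Both sample files are made-up stand-ins.
work=$(mktemp -d)
cat >"$work/schema.xsd" <<'EOF'
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://purl.oclc.org/ooxml/wordprocessingml/main"/>
EOF
cat >"$work/doc.xml" <<'EOF'
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"/>
EOF

# Take the first match in each file and keep only the quoted value.
schema_ns=$(grep -o 'targetNamespace="[^"]*"' "$work/schema.xsd" | head -n 1 | cut -d'"' -f2)
doc_ns=$(grep -o 'xmlns:w="[^"]*"' "$work/doc.xml" | head -n 1 | cut -d'"' -f2)

if [ "$schema_ns" = "$doc_ns" ]; then
    result=match
else
    result=mismatch
fi
echo "schema: $schema_ns"
echo "doc:    $doc_ns"
echo "result: $result"
```

With real files, point the two grep calls at wml.xsd and testdoc.xml instead of the stand-ins. This is a text-level comparison only and assumes the root element uses the w: prefix.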

NevemTeve 07-29-2019 02:56 PM

You might want to edit your post to add [code] and [/code] tags.

Ricky Rocker 07-29-2019 03:40 PM

Hi,

Thanks very much for the formatting tip.

I've tidied up the post, but I no longer need assistance, as I have resolved the issue explained above. It was actually a PHP DOMDocument issue caused by this PHP bug:


schemaValidate ignores namespaces dynamically added to a DOMDocument https://bugs.php.net/bug.php?id=78352


All times are GMT -5.