(OOXML/XSL) Write man-pages in a modern text-processor

Posted 09-08-2018 at 12:56 AM by Michael Uplawski
Updated 09-08-2018 at 05:54 AM by Michael Uplawski (Corrections. Style sheet improved.)

On the topic of writing man-pages as reStructuredText, see also simple-write-man-pages-docutils-and-tweaks (in this blog).
Introduction
In short, I have written an XSL-stylesheet to transform the XML-code from an OOXML document (M$ Word® format docx or SoftMaker®'s tmdx) into a reStructuredText file.

The current version produces enough formatting for a man-page. Tables are not supported yet, although some enhancements should allow their inclusion.
Text-Processor Templates
In the original OOXML document, formatted paragraphs or words must use paragraph- or character-styles from a list of known styles. These are identified during the xsl transformation.
To provide the styles, I have first created a template for my text-processor that I use for each new documentation-project.

Apart from the transformation to reStructuredText and then to the troff-code for a man-page, this procedure lets you add further styles and text to the original document. These are *NOT* honored in the transformation, but they can be used in other contexts, e.g. to create READMEs, HTML or any other kind of document from the same source-file.

As I can upload neither authentic document-templates nor compressed archives to LQ, here are the simple rules for a template that corresponds to the xsl-stylesheet included in this blog:
  • Paragraph-styles in the styles.xml of the template are para1, para2 ... to para7
  • In the OOXML-document, paragraph styles are used as follows
    • para1 - Global header
    • para2 - Sub-title or short description
    • para3 - Section header
    • para4 - Bold text
    • para5 - Definition list (option list in a man-page)
    • para6 - Examples (normal text, probably)
    • para7 - Field list
    • para0 - “Ordinary” text. Not formatted in the RST output, it should correspond to the default paragraph style in the text processor.
  • Character-styles are
    • char1 - An option in the option-list
    • char2 - A field in a field list
    • char0 - Normal text, default character style in the text processor.
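
To make the role of these style names more concrete, here is a minimal Python sketch (standard library only) that reads the paragraph style and the character styles from a hand-written WordprocessingML fragment. The fragment and the helper name styles_of are assumptions made for illustration. A real word/document.xml carries far more markup, but the style names sit in the same places (w:pPr/w:pStyle and w:r/w:rPr/w:rStyle), which are also the paths that the style sheet further down selects.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hand-written sample: one option-list paragraph (para5) whose first
# run carries the character style char1, then one unstyled paragraph.
# Purely illustrative, not taken from a real document.
SAMPLE = f"""
<w:document xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="para5"/></w:pPr>
      <w:r><w:rPr><w:rStyle w:val="char1"/></w:rPr><w:t>-v</w:t></w:r>
      <w:r><w:tab/><w:t>be verbose</w:t></w:r>
    </w:p>
    <w:p><w:r><w:t>Ordinary text.</w:t></w:r></w:p>
  </w:body>
</w:document>
"""

def styles_of(paragraph):
    """Return (paragraph style, [character styles], text) for one w:p."""
    p_style = paragraph.find("w:pPr/w:pStyle", NS)
    # Attributes are namespaced as well, hence the {uri}val spelling.
    para = p_style.get(f"{{{W}}}val") if p_style is not None else None
    chars = [r.get(f"{{{W}}}val")
             for r in paragraph.findall("w:r/w:rPr/w:rStyle", NS)]
    text = " ".join(t.text or "" for t in paragraph.findall("w:r/w:t", NS))
    return para, chars, text

root = ET.fromstring(SAMPLE)
for p in root.findall("w:body/w:p", NS):
    print(styles_of(p))
```

The second paragraph reports no paragraph style at all. That is the case of the implicitly applied default style (para0 in the list above).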
XSL-Processor
As an xsl-processor you can use xsltproc, which should be available in the package repositories of your Linux-distribution. If you use Apache-FOP for xsl/fo transformations, use the -foout option to skip the FO-processing.
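
The xsl-processor needs the bare XML as input, and a docx file is a zip archive that keeps the main text in the member word/document.xml. The following Python sketch extracts that member with the standard library. The function name and the file names in the usage note are hypothetical, and that a tmdx file is laid out the same way is an assumption that should be checked against a real file.

```python
import io
import zipfile

def extract_document_xml(docx):
    """Return the bytes of word/document.xml from a docx path or file object."""
    with zipfile.ZipFile(docx) as archive:
        return archive.read("word/document.xml")

# Self-contained demo: build a tiny stand-in archive in memory
# instead of relying on a real document.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")
demo.seek(0)
print(extract_document_xml(demo).decode("utf-8"))
```

With the extracted XML saved as document.xml, a call such as "xsltproc ooxml2rst.xsl document.xml > manual.rst" runs the transformation. Both the style sheet name and the output name are placeholders.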
The Style-Sheet

Code:
<?xml version="1.0" encoding="UTF-8"?>
<!-- 
  ©2018-2018 Michael Uplawski <michael.uplawski@uplawski.eu>
  Created with nothing.  
-->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" >
  <!-- XSLT 1.0 knows the output methods xml, html and text -->
  <xsl:output method="text" omit-xml-declaration="yes"/>

  <!-- match the root-tag -->
  <xsl:template match="/">
    <!-- matches the root-tag -->
    <xsl:message>
      <xsl:text>handling root</xsl:text>
    </xsl:message>
    <xsl:apply-templates select="//w:body"/>
  </xsl:template>

  <!-- traverse the xml-structure to where we want to go.
      This can serve other purposes, later -->
  <xsl:template match="w:body">
    <xsl:message>
      <xsl:text>handling w:body</xsl:text>
    </xsl:message>
    <xsl:apply-templates select="w:p"/>
  </xsl:template>

  <!-- match an actual paragraph -->
  <xsl:template match="w:p">
    <xsl:message>
      <xsl:text>handling w:p </xsl:text>
    </xsl:message>

    <!-- While we only want to process the w:t tag,
         the styles are identified in child structures
         under the w:pPr- and w:r-tags. 

        I just communicate them to the next template. -->
    <xsl:apply-templates select="w:r/w:t">
      <xsl:with-param name="paraStyle" select="w:pPr/w:pStyle/@w:val"/>
      <xsl:with-param name="charStyle" select="w:r/w:rPr/w:rStyle/@w:val"/>
    </xsl:apply-templates>
  </xsl:template> 

  <!-- *** Real Work Starts Here *** -->
  <xsl:template match="w:t">
    <!-- expect two parameters -->
    <xsl:param name="paraStyle"/> 
    <xsl:param name="charStyle"/>
    <!-- Get the actual content of this tag -->
    <xsl:variable name="content" select="."/>

    <xsl:message>
      <xsl:text>handling w:t</xsl:text>
      <xsl:text> pStyle is |</xsl:text>
      <xsl:value-of select="normalize-space($paraStyle)"/> 
      <xsl:text>|</xsl:text>
    </xsl:message>

    <!-- decide, what to do for each of the 
    set style-names. Put another way:
    Produce Restructured Text -->
    <xsl:choose>
      <xsl:when test="$paraStyle='para1'">
        <xsl:text>&#xA;============&#xA;</xsl:text> 
        <xsl:value-of select="$content"/>
        <xsl:text>&#xA;============&#xA;</xsl:text> 
      </xsl:when>
      <xsl:when test="$paraStyle='para2'">
        <xsl:text>&#xA;----------------------------------------------------&#xA;</xsl:text> 
        <xsl:value-of select="normalize-space($content)"/>
        <xsl:text>&#xA;----------------------------------------------------&#xA;</xsl:text> 
      </xsl:when>
      <xsl:when test="$paraStyle='para3'">
        <xsl:text>&#xA;</xsl:text> 
        <xsl:value-of select="normalize-space($content)"/>
        <xsl:text>&#xA;----------------------------------------------------&#xA;</xsl:text> 
      </xsl:when>
      <xsl:when test="$paraStyle='para4'">
        <xsl:text>&#xA;**</xsl:text> 
        <xsl:value-of select="normalize-space($content)"/>
        <xsl:text>**&#xA;</xsl:text> 
      </xsl:when>
      <!-- Definition-lists are always a chore.
      You must know the original xml-file a bit to understand this
      when-branch. An option-list, first -->
      <xsl:when test="$paraStyle='para5'">
        <xsl:choose>
          <!-- Definition Item -->
          <xsl:when test="position()=1 and $charStyle='char1'">
            <xsl:text>&#xA;</xsl:text>
            <xsl:text>**</xsl:text> 
            <xsl:value-of select="normalize-space($content)"/>
            <xsl:text>** </xsl:text> 
          </xsl:when>
          <!-- Not a Definition Item -->
          <xsl:otherwise>
            <xsl:choose>
              <!-- Definition -->
              <xsl:when test="./preceding-sibling::w:tab">
                <xsl:text>           </xsl:text>
                <xsl:value-of select="normalize-space($content)"/>
                <xsl:text>&#xA;</xsl:text> 
              </xsl:when>
              <!-- Something following the item, but not a definition -->
              <xsl:otherwise>
                <xsl:text> </xsl:text>
                <xsl:value-of select="normalize-space($content)"/>
                <xsl:text> </xsl:text>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <!-- Now just a field-list. -->
      <xsl:when test="$paraStyle='para7'">
        <xsl:choose>
          <xsl:when test="position()=1 and $charStyle='char2'">
            <xsl:text>&#xA;</xsl:text>
            <xsl:text>*</xsl:text> 
            <xsl:value-of select="normalize-space($content)"/>
            <xsl:text>* </xsl:text> 
          </xsl:when>
          <xsl:otherwise>
            <xsl:choose>
              <xsl:when test="./preceding-sibling::w:tab">
                <xsl:text>           </xsl:text>
                <xsl:value-of select="normalize-space($content)"/>
                <xsl:text>&#xA;</xsl:text> 
              </xsl:when>
              <xsl:otherwise>
                <xsl:text> </xsl:text>
                <xsl:value-of select="normalize-space($content)"/>
                <xsl:text> </xsl:text>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <!-- Anything else is just text.
          paraStyle is in reality == 'para0', but
          only implicitly applied. A test would fail.
      -->
      <xsl:when test="string-length($paraStyle) = 0">
        <xsl:message><xsl:text>    assuming normal text</xsl:text>
        </xsl:message>
        <xsl:text>&#xA;</xsl:text> 
        <xsl:value-of select="normalize-space($content)"/>
        <xsl:text>&#xA;</xsl:text> 
      </xsl:when>
      <!-- The remainder may be unsupported styles. 
      Ignore these -->
      <xsl:otherwise>
        <xsl:message><xsl:text>    Skipping unsupported style</xsl:text></xsl:message>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- This is unused... TODO: Keep or remove -->
  <xsl:template match="w:tab">
    <xsl:text>        </xsl:text>
  </xsl:template>
</xsl:stylesheet>
<!-- EOF -->
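
One thing to watch in the output: the style sheet writes the title lines for para1, para2 and para3 with a fixed length, while reStructuredText expects the adornment to be at least as long as the title text. The Python sketch below is not part of the style sheet. It only shows, with hypothetical function names, how the adornment could be sized to the title, for example in a post-processing step on the generated file.

```python
def normalize(text):
    """Collapse whitespace, roughly like the XPath function normalize-space()."""
    return " ".join(text.split())

def overlined_title(text, char="="):
    """Title with over- and underline, as written for the global header."""
    title = normalize(text)
    line = char * len(title)  # adornment grows with the title
    return f"\n{line}\n{title}\n{line}\n"

def underlined_title(text, char="-"):
    """Title with underline only, as written for a section header."""
    title = normalize(text)
    return f"\n{title}\n{char * len(title)}\n"

print(overlined_title("A rather long program name"))
print(underlined_title("  SYNOPSIS "))
```

In XSLT 1.0 the same effect needs a little more effort, for example taking a substring of a long enough adornment string with string-length() of the title as the length. That variant is left out here, because it has not been tried against the style sheet.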
Ω