Paradigms exist to be broken

Posted 10-29-2016 at 12:39 AM by Michael Uplawski
Updated 11-16-2016 at 05:02 PM by Michael Uplawski (typos, Kraut2English)

Tags html, paradigm, pdf, programming, xslt

Paradigms exist to be broken
or:
How to create a Dynamic bookmark-tree with Apache-FOP

Introduction

This page will eventually explain how you can dynamically generate a bookmark-tree in PDF-documents with the Apache-FOP XSL/FO-processor.

But before I show you the mere technicalities, you need to realize what's special in this procedure and what it means to break a paradigm. In fact, the alternative titles on this page should appeal to those who need a bookmark-tree just as much as to those who seek in their creations, in code or other media, the maximal possible freedom of expression.

XSL/FO

Maybe you need a quick introduction or recapitulation of the basics. If not, jump to the next chapter, below.
An XSL/FO-processor like FOP reads files in XML-format and produces a document of arbitrary type. FOP is called an XSL/FO-processor because it performs two transformations in succession:

First it reads the XML of the original file and creates a different one. This new file is in FO-format (Formatting Objects), which describes how the final document presents its contents. This first transformation follows user-defined rules to ensure that an arbitrary number of different documents will be formatted alike. The rules are read from an XSL-style-sheet that you need to provide.
During the second transformation, the FO-file is transformed into one of the output-types which the FO-processor supports. FOP can create many different documents from one and the same FO-file.
In the context of this page and the generation of the bookmark-tree, the second transformation from FO to PDF is of no interest. We concentrate on the XSL-style-sheet and the formatting-rules defined therein.

XML-code defines a hierarchical structure in which the document-content is organized. XSL in contrast defines how the nodes of the XML-structure shall be treated to generate the final PDF-output. As the input-document is organized hierarchically, the author of the XSL-style-sheet must honor the relation between the XML-tags. When a container-tag is handled, this implies that the children of the container be handled, too.

A container-structure can include more hierarchically structured sub-divisions, and child-tags can be the parent-tags of others. The rule-set in the XSL-style-sheet must take this possibility into account and automatically delegate the processing of sub-tags to the pertinent formatting-rule, even to itself if a child-tag is of the same type as the one currently being processed. Recursion is not merely tolerated but actively sought when writing the XSL-style-sheet.

Recursion helps to keep the number of processing-rules small, and it ensures that the same kind of content is always formatted in the same way, no matter how big the original XML-document is, at which position the content occurs or how often it needs to be produced.
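
In XSL, such a rule is written as a template (explained in the next chapter). Here is a minimal sketch of a rule that delegates to itself, assuming a hypothetical source format in which <section> tags may contain <title>, <para> and further <section> tags. These tag names are invented for the illustration, and the fragment belongs inside an xsl:stylesheet with the usual xsl and fo namespace declarations:

Code:

<!-- Hypothetical input: <section> may contain <title>, <para> and nested <section> -->
<xsl:template match="section">
    <fo:block space-before="1em">
        <!-- Hand every child to whichever template matches it;
             a nested <section> arrives in this very template again -->
        <xsl:apply-templates/>
    </fo:block>
</xsl:template>

<!-- The title of a section, whatever the nesting depth -->
<xsl:template match="section/title">
    <fo:block font-weight="bold"><xsl:value-of select="."/></fo:block>
</xsl:template>

<!-- An ordinary paragraph -->
<xsl:template match="para">
    <fo:block><xsl:apply-templates/></fo:block>
</xsl:template>

Three rules are enough here, however deeply the sections are nested in the source-document.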

XSL is a language and it lets you choose your style to express the formatting-rules. There are even alternative ways to define the same rule.

The paradigm: XSL-templates

An XSL-template defines a processing-rule to be applied, when the xsl-processor finds in a source-document

a certain tag,

a certain tag within a defined neighbourhood of other tags

or a tag carrying certain attributes

For example, you can define that the text within a tag <house/> shall always be of red color in the resulting PDF-file. The template will match all the tags <house/> and transform the included text-node into red text in the PDF.

Code:

<xsl:template match="house">
    <fo:inline color="#ff0000"><xsl:value-of select="."/></fo:inline>
</xsl:template>

Once defined, the same template will automatically become active for each and every <house/>-tag in the XML-file.
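
The other two kinds of rule listed above can be sketched in the same way. The parent-tag <village> and the attribute abandoned are invented for this illustration only:

Code:

<!-- A <house> that is a direct child of a (hypothetical) <village> tag -->
<xsl:template match="village/house">
    <fo:inline color="#0000ff"><xsl:value-of select="."/></fo:inline>
</xsl:template>

<!-- A <house> carrying the (hypothetical) attribute abandoned="yes" -->
<xsl:template match="house[@abandoned='yes']">
    <fo:inline color="#808080"><xsl:value-of select="."/></fo:inline>
</xsl:template>

By the default priority rules of XSLT, both patterns take precedence over the plain match="house". If one and the same tag could match both of them, an explicit priority attribute on one of the templates should settle the conflict.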

An alternative way to handle one kind of tag always in the same way is a for-each statement. To the beginner, it appears simpler to just state clearly that the content of each <house/>-tag shall be printed in red, and that would be it...

Code:

<xsl:template name="redhouse">
    <xsl:for-each select="//house">
        <fo:inline color="#ff0000"><xsl:value-of select="."/></fo:inline>
    </xsl:for-each>
</xsl:template>

Unfortunately, to honor the hierarchical structure of the original XML, many nested for-each statements would have to be written and if there is a doll's house anywhere in the big house, real problems begin.

So XSL-templates are not just best practice; they are really useful and necessary to keep the XSL-code efficient and readable.

As often happens when incomparable techniques are evaluated as if they were just two ways of doing the same thing, one of the two gets the blessing of the experts' opinion and the label right, the other contempt and the label wrong. Ask in an XSL-centric community which way to process XML-tags is the best one, for-each or a template, and without having to explain your current XSL/FO project in much detail, you will receive the answer: “Use a template!”

Bookmark-tree

The bookmark-tree is the vertical structure which, when you see it in a PDF-reader, references the chapters of the document by naming their headers. You can even click on any bookmark to access the pertinent chapter directly.

The definition of such a bookmark includes the following details:

The target header in the text

The title of the bookmark which is identical to the header in the text

The hierarchical position, below or above one or more other bookmarks

Bookmark-tree with Apache-FOP

With Apache-FOP, bookmark-trees are usually created apart from the remainder of the PDF-document, that is: in a template which is called explicitly to create the structure, and not in a succession of templates which are triggered by tags in the source-document.

Although the opposite is possible, there is hardly any advantage to be taken from it. The reason is that bookmarks are mostly constructed explicitly with a jump target AND the text of the bookmark in mind.

You have to comprehend my last statement, so I summarize briefly and put it another way: While the XSL/FO-processor is there to automate the production of a PDF-document from an arbitrary XML-file, the bookmark-tree is defined explicitly by naming the text of the bookmark AND the position in the content to which a mouse-click shall catapult us. I hope you notice how this is rather dumb. But here is an excerpt of an exemplary template like dozens that you find on the Internet, often published to explain how you create bookmark-trees with Apache-FOP:

Code:

<fo:bookmark-tree >
   <fo:bookmark internal-destination="toc" >
      <fo:bookmark-title> Table des matières </fo:bookmark-title>
      <fo:bookmark internal-destination="chapitre1">
            <fo:bookmark-title>Mon premier chapitre </fo:bookmark-title>
      </fo:bookmark>
      <fo:bookmark internal-destination="chapitre2">
            <fo:bookmark-title>Mon deuxieme chapitre </fo:bookmark-title>
      </fo:bookmark>
   </fo:bookmark>
</fo:bookmark-tree>

So, where is the problem? Why is this dumb?

This way of creating the bookmarks is like buying a fine power-drill and then pounding on it with a mallet when you want it to pierce a wall. With each new document a new definition of the bookmark-tree is due, even if you keep the template for its generation in a separate style-sheet. Even then, you are obliged to make sure each time that the style-sheet you link in corresponds to the document that you are about to create. Also, you must adapt the bookmark-tree whenever a chapter is added to or removed from the source-document or its header is changed. Unnerving work, when you consider that all the information that you need in the bookmark-tree is always present in the source-document!

Bookmark-tree with Apache-FOP, the real thing

What I want from an XSL-style-sheet is that it relieves me of the obligation to read the source-document. Furthermore, I want to blindly apply the XSL/FO-transformation to just any suitable file and expect that a bookmark-tree is created in the resulting PDFs.

Now the bitter pill: XSL-templates are just not up to the task. It appears to be impossible (at least I have not been shown any example that could convince me of the opposite) to map the headers of the chapters in a document dynamically to the hierarchical structure of the bookmark-tree by means of matching templates.

My alternative approach takes into consideration that the three properties of any bookmark (target, title and position) must be anticipated before the XSL/FO-processor has had an opportunity to identify them in the document. Here is an excerpt of a style-sheet which provides a bookmark-tree for XHTML-files and transforms the headers from <h1> to <h5> into bookmarks:

Code:

 <!-- template for the bookmark-tree -->
  <xsl:template name="bookmark">
	  <xsl:for-each select="//h1">
		  <xsl:variable name="h1_id">
			  <xsl:value-of select="concat('h1_', .)" />
		  </xsl:variable>
		  <fo:bookmark internal-destination="{$h1_id}">
			  <fo:bookmark-title>
				  <xsl:value-of select="." />
			  </fo:bookmark-title>
			  <xsl:variable name="h1_text">
				  <xsl:value-of select="." />
			  </xsl:variable>
			  <xsl:for-each select="//h2">
				  <xsl:if test="$h1_text = preceding-sibling::h1[1]">
					  <xsl:variable name="h2_id">
						  <xsl:value-of select="concat('h2_', .)" />
					  </xsl:variable>
					  <fo:bookmark internal-destination="{$h2_id}">
						  <fo:bookmark-title>
							  <xsl:value-of select="self::node()" />
						  </fo:bookmark-title>
						  <xsl:variable name="h2_text">
							  <xsl:value-of select="." />
						  </xsl:variable>
						  <xsl:for-each select="//h3">
							  <xsl:if test="$h2_text = preceding-sibling::h2[1]">
								  <xsl:variable name="h3_id">
									  <xsl:value-of select="concat('h3_', .)" />
								  </xsl:variable>
								  <fo:bookmark internal-destination="{$h3_id}">
									  <fo:bookmark-title>
										  <xsl:value-of select="self::node()" />
									  </fo:bookmark-title>
									  <xsl:variable name="h3_text">
										  <xsl:value-of select="." />
									  </xsl:variable>
									  <xsl:for-each select="//h4">
										  <xsl:if test="$h3_text = preceding-sibling::h3[1]">
											  <xsl:variable name="h4_id">
												  <xsl:value-of select="concat('h4_', .)" />
											  </xsl:variable>
											  <fo:bookmark internal-destination="{$h4_id}">
												  <fo:bookmark-title>
													  <xsl:value-of select="self::node()" />
												  </fo:bookmark-title>
												  <xsl:variable name="h4_text">
													  <xsl:value-of select="." />
												  </xsl:variable>
												  <xsl:for-each select="//h5">
													  <xsl:if test="$h4_text = preceding-sibling::h4[1]">
														  <xsl:variable name="h5_id">
															  <xsl:value-of select="concat('h5_', .)" />
														  </xsl:variable>
														  <fo:bookmark internal-destination="{$h5_id}">
															  <fo:bookmark-title>
																  <xsl:value-of select="self::node()" />
															  </fo:bookmark-title>
														  </fo:bookmark>
													  </xsl:if>
												  </xsl:for-each>
											  </fo:bookmark>
										  </xsl:if>
									  </xsl:for-each>
								  </fo:bookmark>
							  </xsl:if>
						  </xsl:for-each>
					  </fo:bookmark>
				  </xsl:if>
			  </xsl:for-each>
		  </fo:bookmark>
	  </xsl:for-each>
  </xsl:template>
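
Because this template carries a name instead of a match-pattern, it has to be called explicitly. A sketch of how that might look in the template for the document root follows. The page geometry, the master-name "page" and the selection of //body are my assumptions for the illustration, not taken from the original style-sheet:

Code:

<xsl:template match="/">
    <fo:root>
        <fo:layout-master-set>
            <!-- Assumed page layout: A4 with a single body region -->
            <fo:simple-page-master master-name="page"
                    page-height="297mm" page-width="210mm" margin="20mm">
                <fo:region-body/>
            </fo:simple-page-master>
        </fo:layout-master-set>
        <!-- The bookmark-tree stands after the layout-master-set and before
             the first page-sequence. It must contain at least one bookmark,
             so it is skipped for documents without any <h1> -->
        <xsl:if test="//h1">
            <fo:bookmark-tree>
                <xsl:call-template name="bookmark"/>
            </fo:bookmark-tree>
        </xsl:if>
        <fo:page-sequence master-reference="page">
            <fo:flow flow-name="xsl-region-body">
                <!-- The content itself is still produced by matching templates -->
                <xsl:apply-templates select="//body"/>
            </fo:flow>
        </fo:page-sequence>
    </fo:root>
</xsl:template>

The bookmark-tree is thus written before the processor has touched a single header in the flow, which is exactly the anticipation described above.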

Note that the target of each bookmark is identified by means of a unique id. This id is simply created upon handling any header-tag from <h1> to <h5>, in the templates which appear somewhere else in the XSL-file:

Code:

 <!-- Exemplary generation of ids for the headers h1 to h2 -->
  <xsl:template match="h1">
          <xsl:variable name="h1_id">
                  <xsl:value-of select="concat('h1_', .)" />
          </xsl:variable>
          <fo:block font-size="3em" font-style="italic" color="#003050" margin="1em 2em 1em 0"
                  id="{$h1_id}">
                  <xsl:value-of select="." />
          </fo:block>
  </xsl:template>
  <xsl:template match="h2">
          <xsl:variable name="h2_id">
                  <xsl:value-of select="concat('h2_', .)" />
          </xsl:variable>
          <fo:block font-size="2em" id="{$h2_id}" keep-with-next="always" font-style="italic"
                  color="#005050" margin="1em 2em 1em 0">
                  <xsl:value-of select="." />
          </fo:block>
  </xsl:template>
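
Ids built from the header text stay unique only as long as no two headers of the same level share the same wording. A possible variation, which is not the method used above, lets the XSLT function generate-id() produce the id. Within one transformation it returns the same string for the same node, so bookmark and header should agree as long as both call it on the same <h1>:

Code:

<!-- In the bookmark template, inside <xsl:for-each select="//h1"> -->
<fo:bookmark internal-destination="{generate-id(.)}">
    <fo:bookmark-title><xsl:value-of select="."/></fo:bookmark-title>
</fo:bookmark>

<!-- In the matching template for the same header -->
<xsl:template match="h1">
    <fo:block id="{generate-id(.)}" font-size="3em">
        <xsl:value-of select="."/>
    </fo:block>
</xsl:template>

The price is that the ids are no longer readable and may differ from one run to the next, which only matters if something outside the style-sheet refers to them.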

Finally

I have used many words to explain the creation of a bookmark-tree, but they are not worth much if I failed to demonstrate something else... Use a for-each loop where a for-each loop is due! This is just one example of a situation that I have encountered often during my professional career as a software-developer. A structure/expression/design-pattern/custom which was recognized as useful at one point in time is recommended to anybody who is confronted with a new task which may bear only slight resemblance to the previous one. I can list a few of these annoying paradigms, as that's what they represent:

“Use an XSL-template, no matter what.”
While a for-each statement may be better adapted to the task at hand!

“Make your class inherit (from any other arbitrary class).”
There are people who state cold-bloodedly that inheritance is the best and original core-feature of object oriented programming (and OOP began with Java, you know? I hope you did not). Avoid them like the plague.

“Avoid threads (to avoid problems with threads).”
And, by all means, avoid demonstrating that with a little investment in good documentation, anybody can master threads and write more efficient software.

“Assign a value to just any variable (to keep the variable from being NULL).”
Because the fact that a variable has not yet had an opportunity to adopt a value is generally not interesting to anybody. Base your programming work on assumptions; that is what everybody does. And see in which state they left IT. Do not be like everybody.

“Make sure that each function returns a value.”
It does not matter which value, just return anything, because someone might have a use for it, some day, somewhere... or find a use for it... and pay... What do I know?
[Edit: Likewise, try to understand the difference between procedure and function and try to integrate that knowledge into your programming activity; see the comments-section below. On the other hand, people who do it right are not concerned by this whole topic anyway.]

“Always try to catch and handle each possible exception.”
Let's say an OutOfMemory-Exception... You do not want your users to notice what your program just achieved on their system, do you? Read the history of the Apache-Tomcat Web-Container and hear from me how everybody came to accept the frequent crashes as part of their everyday work. Do not be like everybody.

It rests with you to note where the recommendation fails to meet the requirements. Dare!

To conclude, I present you with a PDF-file that may look familiar to you, apart from the fact that you have not yet had the opportunity to admire the bookmark-tree: Paradigms exist to be broken.

Ω
©2012-2016 Michael Uplawski <michael.uplawski@uplawski.eu>

Posted in Incomprehension and Antagonism, Boundless Praise, Programming and creativity


Comments

I think the difficulty comes from the fact your headers are not nested, right? e.g.

Quote:

Originally Posted by Michael Uplawski

No, it is rather that the XML or XHTML documents are read from top to bottom and the templates first applied in the order of the appearance of triggering tags, then within a template for the same reason and in the same way.

This means that you would have to process the same XML-document twice: first to apply the templates to all the headers and produce the bookmark-tree, then once again to produce the transformed output. What I do is rather “anticipate” the headers. Or even: presume that my template for headers will add id-attributes. These are my own creation and thus my for-each (or group) loop trusts me that the ids will be there. I decouple bookmark-tree from content and am still “dynamic” in that an arbitrary number of arbitrarily structured headers will be honored...

if this is English.

Posted 10-29-2016 at 04:57 AM by ntubski

Updated 10-29-2016 at 06:02 PM by ntubski (Attribution)

Quote:

Originally Posted by ntubski

I have seen things like "every function returns a value", meaning if you write something that doesn't return a value, it's not a function, it's a "procedure" (i.e., descriptive, not prescriptive). There's also the idea from the ML-family of languages, that every function returns a value, even if it's the "unit" value which contains 0 bits of information (it's equivalent to "void" in C, but treating it as a value helps reduce special cases in the way functions are handled, so I think it's rather nice).

As your background is complete and your reflection organ functional and oiled, you can ignore this point. ;-)

I have worked with C/C++, Java and Ruby only, and only a few years with each. All three universes are populated with people who do not know the difference or have been educated to ignore it. Ada, VB and JavaScript could actually help to enlighten those with a restricted knowledge of programming languages, like myself.

In the context of the document, above, your statement is important and well placed below my text. I may be erring but believe the general character of my explanations appeals to a few types of people with diverse programming experience.

There is also a chance of cultural differences between Europe/Germany and your own professional environment. In this case, I would have to adapt the entry and enlarge on procedures versus functions. We have had this kind of discussion in a forum of linguists, and such discussions never end with a satisfactory conclusion...

But hey. Maybe the others just read this comments-section and it will be fine, too... ;-)

Posted 10-29-2016 at 05:59 AM by Michael Uplawski

Updated 10-29-2016 at 06:00 AM by Michael Uplawski