Monday 18 February 2008

Re-arrange the order of elements in an XML document

Often people need to reorganize their XML for further processing, and this is a great opportunity to clarify available techniques.

You can use xsl:copy and/or xsl:copy-of, although I doubt your goal in practice will always be as simple as the example below.


Input XML:
<ROOT>
<A1>A</A1>
<B1>B</B1>
<C1>C</C1>
<D1>D</D1>
</ROOT>

Required output XML:
<ROOT>
<A1>A</A1>
<D1>D</D1>
<B1>B</B1>
<C1>C</C1>
</ROOT>


If you need to reorder the children of the root element, for example, and you're confident the XML you're restructuring is this simple, you could do something like this:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Rebuild ROOT, copying its children in the required order -->
  <xsl:template match="/ROOT">
    <xsl:copy>
      <xsl:copy-of select="A1"/>
      <xsl:copy-of select="D1"/>
      <xsl:copy-of select="B1"/>
      <xsl:copy-of select="C1"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

The code above produces exactly the required output. If you have a much larger document, though, and what you really want is to move a few nodes around while copying the rest of the hierarchy generically, you could use the technique in the example below.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Any element without a more specific rule is copied as-is -->
  <xsl:template match="*">
    <xsl:apply-templates select="self::*" mode="copy"/>
  </xsl:template>

  <!-- Output A1, then pull D1 up into place right after it -->
  <xsl:template match="A1">
    <xsl:apply-templates select="self::*" mode="copy"/>
    <xsl:apply-templates select="../D1" mode="copy"/>
  </xsl:template>

  <!-- Output B1, then C1 right after it -->
  <xsl:template match="B1">
    <xsl:apply-templates select="self::*" mode="copy"/>
    <xsl:apply-templates select="../C1" mode="copy"/>
  </xsl:template>

  <!-- Suppress D1 and C1 at their original positions -->
  <xsl:template match="D1"/>
  <xsl:template match="C1"/>

  <!-- Shallow-copy the element, then process its attributes and children -->
  <xsl:template match="*" mode="copy">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy attributes unchanged; without this rule they would be lost -->
  <xsl:template match="@*">
    <xsl:copy/>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>


In this case you copy the document structure generically, whatever it contains, and reposition specific elements at strategic places.
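A widely used alternative is the identity transform: one template copies every attribute and node unchanged, and you only add overrides for the elements you want to move. Here's a minimal sketch for the same A1, D1, B1, C1 reordering, assuming the element names from the example input. Since B1 and C1 are already in the right relative order, only D1 needs to move:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Identity template: copy every attribute and node, recursing into children -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy A1 as usual, then emit its sibling D1 straight after it -->
  <xsl:template match="A1">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    <xsl:copy-of select="../D1"/>
  </xsl:template>

  <!-- Suppress D1 at its original position -->
  <xsl:template match="D1"/>

</xsl:stylesheet>
```

xsl:copy-of copies D1 verbatim. If D1's descendants need templates of their own, an xsl:apply-templates with a separate mode is the more flexible choice.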

Thursday 14 February 2008

When to use xsl:if and xsl:when

In the last few weeks work colleagues have asked me for advice on this subject a couple of times, so I guess there are more people out there searching the web for answers, or who aren't entirely clear about when xsl:if or xsl:when should be used and the constraints of each.


I am sure most people understand that xsl:if represents a conditional statement, as in most languages; in fact I can't think of a language that doesn't have one. Even XPath 2.0 now offers conditional expressions (if ... then ... else).

<xsl:if test="/pubs/pub[@name='The Elephants Head']/@status = 'closed'">dammit, what do we do now?</xsl:if>

If The Elephants Head is closed, the conditional outputs a good question as a text node: "dammit, what do we do now?".

The difference between conditional statements in XSL and in other languages is that xsl:if has no else branch. Else-if and else are represented by a different construct: xsl:when and xsl:otherwise, both children of xsl:choose. An xsl:choose contains one or more xsl:when elements, optionally followed by a single xsl:otherwise.

<xsl:choose>
<xsl:when test="/pubs/pub[@name='The Elephants Head']/@status = 'opened'">top banana, I'll have a pint of Krony please</xsl:when>
<xsl:otherwise>pants, we'll drink at The Mixer instead</xsl:otherwise>
</xsl:choose>

There we go: a far more structured conditional statement. If The Elephants Head is open, the output is "top banana, I'll have a pint of Krony please"; if it's closed (or the test fails for any other reason), I know exactly what to do next: "pants, we'll drink at The Mixer instead".

So let's say I am an indecisive guy who's not quite sure which pub to go to, but I have some solid criteria to help my decision, alongside a psychic bit of information in the form of an XML document:
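For comparison, here is roughly how the same either/or looks as the XPath 2.0 conditional expression mentioned earlier. This is a sketch: it assumes an XSLT 2.0 processor (Saxon, for example) and the /pubs/pub structure of the document used later in this post. Unlike xsl:choose, the else branch is mandatory:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="/">
    <!-- XPath 2.0 if/then/else: the else branch cannot be omitted.
         Inside a single-quoted XPath 2.0 string, '' stands for one apostrophe. -->
    <xsl:value-of select="if (/pubs/pub[@name = 'The Elephants Head']/@status = 'opened')
                          then 'top banana, I''ll have a pint of Krony please'
                          else 'pants, we''ll drink at The Mixer instead'"/>
  </xsl:template>
</xsl:stylesheet>
```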


<?xml version="1.0" encoding="UTF-8"?>
<pubs>
<pub name="The Elephants Head" matesInside="0" status="opened"/>
<pub name="The Devenshire Arms" matesInside="0" status="closed"/>
<pub name="The Worlds End" matesInside="2" status="opened"/>
<pub name="The Good Mixer" matesInside="3" status="opened"/>
</pubs>


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/pubs">
<xsl:choose>
<xsl:when test="pub[@name='The Elephants Head'][@status = 'opened'][@matesInside &gt; 0]">top banana, I'll have a pint of Krony please</xsl:when>
<xsl:when test="pub[@name='The Devenshire Arms'][@status='opened'][@matesInside &gt; 0]">head off to the Devenshire, and see dead people</xsl:when>
<xsl:when test="pub[@name='The Worlds End'][@status='opened'][@matesInside &gt; 0]">head off to the Worlds End, to check out the gigs</xsl:when>
<xsl:otherwise>pants, we'll drink at The Mixer instead, someone will turn up soon enough</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>


So here I go, walking down Camden High Street. As I approach The Elephants Head, I look inside to see whether the pub is open and whether any of my mates are in. If both conditions are true, mate, I'm on my way to the bar to get that cold pint of Krony.

Now let's consider the XML document above: as you can see, it's Billy No Mates me at the Elephants, there's no one in, dammit. This is where xsl:choose acts as a built-in, hierarchical decision engine: the xsl:when elements are tested in document order, the first one whose test is true wins, and the remaining ones are skipped.

Next I check whether the Devenshire Arms is open, and whether I'm facing yet another Billy No Mates scenario. According to the XML document the Dev is closed, so no one's there at all; bloody goths, probably sitting in a corner at home feeling sorry for themselves.

Ok . . . what's the next prospect here . . . we have The Worlds End: it's open, and guess what, 2 of my mates are inside enjoying a beverage . . . hurrah! I guess "head off to the Worlds End, to check out the gigs" is the option I'd go for in this instance.

Oblivious as I was (bless), 3 other mates had already been comfortably on the booze at the Mixer for some time, and had all of the previous options failed, my fate would indeed have been "pants, we'll drink at The Mixer instead, someone will turn up soon enough". Note that xsl:otherwise doesn't test anything, so the Mixer's entry in the document never actually gets checked.

So, to recap: xsl:if evaluates a single XPath test expression as a boolean (that expression can combine several conditions with and/or), while xsl:choose applies the same boolean evaluation in order across a series of xsl:when elements, with xsl:otherwise as a graceful fallback.
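To see the difference side by side, here's a small sketch against the pubs document above. Both The Worlds End and The Good Mixer have mates inside, so both sibling xsl:if tests fire, but xsl:choose only instantiates the first xsl:when that is true. The output strings are made up for the illustration:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/pubs">
    <!-- Independent xsl:if elements: every test that is true produces output -->
    <xsl:if test="pub[@name='The Worlds End'][@matesInside &gt; 0]">Worlds End has mates. </xsl:if>
    <xsl:if test="pub[@name='The Good Mixer'][@matesInside &gt; 0]">Mixer has mates. </xsl:if>

    <!-- xsl:choose: only the first xsl:when whose test is true is instantiated -->
    <xsl:choose>
      <xsl:when test="pub[@name='The Worlds End'][@matesInside &gt; 0]">Going to the Worlds End.</xsl:when>
      <xsl:when test="pub[@name='The Good Mixer'][@matesInside &gt; 0]">Going to the Mixer.</xsl:when>
      <xsl:otherwise>No mates anywhere, staying in.</xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

With the pubs document above this should print "Worlds End has mates. Mixer has mates. Going to the Worlds End." Swap the two xsl:when elements and the decision changes, even though the data doesn't.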

I have also frequently been asked about combining xsl:if and xsl:choose/xsl:when: is this bad practice? The answer is: it depends on the circumstances. It's often necessary to have both in the same algorithm to make sure you deliver accurate output, and the combination can sometimes save unnecessary work.
For example, the code above doesn't need to run at all unless at least one of the pubs is open, so I could wrap the xsl:choose in an xsl:if:

<xsl:if test="pub[@status = 'opened']">
<xsl:choose>
<xsl:when test="....
</xsl:choose>
</xsl:if>

This would probably be the case past 3:00am or so, when I wouldn't even be considering any of those options. The outer test can save some processing, and it clearly documents that unless at least one of the pubs is open, there's no reason to consider any of the options at all.

Sometimes it is also useful to nest xsl:if inside an xsl:when to filter your options further:
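Put together with the stylesheet above, the guarded version could look like this sketch; the xsl:when tests are copied from that stylesheet, and only the outer xsl:if is new:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/pubs">
    <!-- Guard: skip the whole decision when no pub is open -->
    <xsl:if test="pub[@status = 'opened']">
      <xsl:choose>
        <xsl:when test="pub[@name='The Elephants Head'][@status = 'opened'][@matesInside &gt; 0]">top banana, I'll have a pint of Krony please</xsl:when>
        <xsl:when test="pub[@name='The Devenshire Arms'][@status='opened'][@matesInside &gt; 0]">head off to the Devenshire, and see dead people</xsl:when>
        <xsl:when test="pub[@name='The Worlds End'][@status='opened'][@matesInside &gt; 0]">head off to the Worlds End, to check out the gigs</xsl:when>
        <xsl:otherwise>pants, we'll drink at The Mixer instead, someone will turn up soon enough</xsl:otherwise>
      </xsl:choose>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```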

<xsl:choose>
<xsl:when test="...
<xsl:when test="pub[@name='The Worlds End'][@status='opened'][@matesInside &gt; 0]">
<xsl:if test="pub[@name='The Worlds End']/@timeToShut = 'last orders'">get to the bar as quickly as possible</xsl:if>
</xsl:when>
<xsl:otherwise...
</xsl:choose>

or even an xsl:choose inside an xsl:when:

<xsl:choose>
<xsl:when test="...
<xsl:when test="pub[@name='The Worlds End'][@status='opened'][@matesInside &gt; 0]">
<xsl:choose>
<xsl:when test="pub[@name='The Worlds End']/mate/pint[@status='nearly finished']">
<xsl:text>order </xsl:text>
<xsl:value-of select="count(pub[@name='The Worlds End']/mate/pint[@status='nearly finished']) + 1"/>
<xsl:text> pints.</xsl:text>
</xsl:when>
<xsl:otherwise>it's only one pint for me, thank you.</xsl:otherwise>
</xsl:choose>
</xsl:when>
<xsl:otherwise...
</xsl:choose>
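The nested fragments above rely on data the pubs document doesn't contain. Here's a self-contained sketch combining both of them, assuming a hypothetical extension where a pub has a timeToShut attribute and mate children holding pint elements (the names Dave and Sue are made up). Because xsl:when doesn't change the context node, which is still pubs here, the selected pub is kept in a variable so that every test and the count are scoped to The Worlds End:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input assumed by this sketch:
<pubs>
  <pub name="The Worlds End" matesInside="2" status="opened" timeToShut="last orders">
    <mate name="Dave"><pint status="nearly finished"/></mate>
    <mate name="Sue"><pint status="full"/></mate>
  </pub>
</pubs>
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/pubs">
    <!-- Select the pub once; the context node stays pubs inside xsl:when -->
    <xsl:variable name="worldsEnd" select="pub[@name='The Worlds End'][@status='opened'][@matesInside &gt; 0]"/>
    <xsl:choose>
      <xsl:when test="$worldsEnd">
        <!-- Nested xsl:if: an extra filter inside the chosen branch -->
        <xsl:if test="$worldsEnd/@timeToShut = 'last orders'">get to the bar as quickly as possible. </xsl:if>
        <!-- Nested xsl:choose: decide how many pints to order -->
        <xsl:choose>
          <xsl:when test="$worldsEnd/mate/pint[@status='nearly finished']">
            <xsl:text>order </xsl:text>
            <!-- one for each nearly finished pint, plus one for me -->
            <xsl:value-of select="count($worldsEnd/mate/pint[@status='nearly finished']) + 1"/>
            <xsl:text> pints.</xsl:text>
          </xsl:when>
          <xsl:otherwise>it's only one pint for me, thank you.</xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <xsl:otherwise>pants, we'll drink at The Mixer instead</xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

With that hypothetical input it should print "get to the bar as quickly as possible. order 2 pints."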

Tuesday 12 February 2008

XPath expressions special characters in XSL

There is a common misunderstanding about how special characters are handled in XPath expressions written in XSL compared with XPath expressions used from other languages; I'll try to clear up some of the myths in this post.

In Java or C#, the expression would be passed as a string argument to a constructor or some other method, or simply stored in a string variable and later passed to one of those methods. Simple. However, in the following example you'd have to escape the double quote characters with a backslash:

String xPathExpr = "/root/my/node[@attribute='\"this & that value in quotes\"']";

XML has similar constraints. If you put a double quote character inside an attribute value delimited by double quotes, the XML parser thinks the attribute value ends at that character, so the document isn't well-formed. Unless, of course, you replace the character with the equivalent character reference &#34; (or the predefined entity reference &quot;).

Because XSL documents are XML documents, the same rules apply to them. The XML specification predefines five entities that every conforming XML parser, and therefore every XSLT processor, recognizes without any declaration: amp, lt, gt, quot and apos. For reference, this is how three of them are declared in the HTML 4.01 DTD (SGML syntax):
<!ENTITY amp CDATA "&#38;" -- ampersand, U+0026 ISOnum -->
<!ENTITY lt CDATA "&#60;" -- less-than sign, U+003C ISOnum -->
<!ENTITY gt CDATA "&#62;" -- greater-than sign, U+003E ISOnum -->

This means you can use them by preceding the entity name with an "&" character and terminating the entity reference with a ";" character:
&amp; = &
&lt; = <
&gt; = >
&quot; = "
&apos; = '

So if you were to use the earlier XPath expression in XSL it'd look like this instead:

<xsl:variable name="myNode" select="/root/my/node[@attribute='&#34;this &amp; that value in quotes&#34;']"/>
or
<xsl:variable name="myNode" select="/root/my/node[@attribute='&#34;this &#38; that value in quotes&#34;']"/>
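Since &quot; is predefined, the same expression can also be written with it. And because XPath 1.0 string literals have no escape mechanism at all, a value containing both ' and " can't be written as a single literal; keeping the value in a variable sidesteps the problem. A sketch of both, reusing the illustrative /root/my/node path from above:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Same test as above, written with the predefined quot entity -->
    <xsl:variable name="viaEntity"
                  select="/root/my/node[@attribute='&quot;this &amp; that value in quotes&quot;']"/>

    <!-- The value as element content: a double quote needs no escaping here,
         only & and < do. This builds a result tree fragment holding the text. -->
    <xsl:variable name="wanted">"this &amp; that value in quotes"</xsl:variable>
    <xsl:variable name="viaVariable" select="/root/my/node[@attribute = string($wanted)]"/>

    <!-- Both variables should select the same nodes, so the counts should match -->
    <xsl:value-of select="count($viaEntity)"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="count($viaVariable)"/>
  </xsl:template>
</xsl:stylesheet>
```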

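The characters > and < are often used in XPath expressions that compare numerical values. Strictly speaking, only < has to be escaped inside an attribute value; a literal > is allowed, but writing &gt; is common and harmless.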

<xsl:variable name="isMyOtherAttributeGreater" select="/root/my/node/@attribute &gt; /root/my/other/node/@otherAttribute"/>
This expression returns a boolean: true when the value of @attribute selected by the first path is greater than the value of @otherAttribute selected by the second. Both values are converted to numbers for the comparison, and if either path selects several nodes, the result is true when any pair of values satisfies it.

You can also combine < and > with the "=" character to build the greater-than-or-equal and less-than-or-equal comparisons:
@attribute &gt;= @otherAttribute
@attribute &lt;= @otherAttribute

<xsl:variable name="myNode" select="/root/my/node[@attribute &gt;= ../node[1]/@otherAttribute]"/>
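A small sketch that exercises these comparisons, assuming a hypothetical input (shown in the comment) where node/@attribute is 5 and other/node/@otherAttribute is 3. It also shows that a literal > is well-formed inside select, while < must always be written as &lt;:

```xml
<!-- Hypothetical input:
<root>
  <my>
    <node attribute="5"/>
    <other><node otherAttribute="3"/></other>
  </my>
</root>
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:variable name="a" select="/root/my/node/@attribute"/>
    <xsl:variable name="b" select="/root/my/other/node/@otherAttribute"/>
    <!-- Escaped greater-than: 5 > 3 -->
    <xsl:value-of select="$a &gt; $b"/>
    <xsl:text> </xsl:text>
    <!-- Less-than-or-equal: the < must be escaped as &lt; -->
    <xsl:value-of select="$a &lt;= $b"/>
    <xsl:text> </xsl:text>
    <!-- A literal > is also well-formed in an attribute value -->
    <xsl:value-of select="$a >= 5"/>
  </xsl:template>
</xsl:stylesheet>
```

With that input it should print "true false true"; xsl:value-of turns each boolean into the string "true" or "false".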


See the following links for further information on entities and character encoding in XML:
http://www.xml.com/pub/a/98/08/xmlqna0.html
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
http://www.w3.org/TR/html401/sgml/entities.html