Discover the Wonders of XSLT

By Benoît Marchal

This is the first article in a new series introducing XSLT. XSLT stands for Extensible Stylesheet Language Transformations, but I believe the W3C should change it to XML Scripting Language. Over the years, I have used XSLT to publish Web sites, to generate PDFs from documentation, to prepare e-commerce transactions, to build Web services, to import documents into databases, to construct UML models, to pre- or post-process articles, to generate Java code... you name it. If it involves manipulating an XML document, chances are XSLT is my favorite solution.

Obviously, there's nothing you can do with XSLT that can't be done with straight Java or C#. Why bother learning a new language, then? Because XSLT is highly specialized, you will find that coding is faster and the result is more maintainable.

Getting the Tools
Before going any further, you need to install an XSLT processor. Chances are there's already one on your machine because both Microsoft Windows and Java ship with one. Microsoft's XSLT processor is MSXML. There's a command line interface that is great for testing, or you can call the processor from your application through the .NET run-time. On Java 1.4 or above, the XSLT processor is available via the javax.xml.transform package.

For this series, I recommend that you install Eclipse and the ananas.org XM plugin. Eclipse is an IDE available on most platforms (Windows, Linux, and MacOS X). Refer to "Using XML for Web Publishing" for more details.

XSLT Basics
Listing 1 is a very simple stylesheet to show you what XSLT looks like. It takes an XML article and publishes it as an HTML page. Download the listings for a sample XML document.

Listing 1: basic.xsl

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:a="http://psol.com/2004/article"
                    version="1.0">
      <xsl:output method="html"/>
      <xsl:template match="a:article">
        <html><xsl:apply-templates/></html>
      </xsl:template>
      <xsl:template match="a:body">
        <body>
          <xsl:apply-templates/>
          <p>This page was made with XML and XSLT.</p>
        </body>
      </xsl:template>
      <xsl:template match="a:para">
        <p><xsl:apply-templates/></p>
      </xsl:template>
      <xsl:template match="a:section">
        <xsl:apply-templates/><hr/>
      </xsl:template>
      <xsl:template match="a:info/a:title">
        <head><title><xsl:apply-templates/></title></head>
      </xsl:template>
      <xsl:template match="a:section/a:title">
        <h1><xsl:apply-templates/></h1>
      </xsl:template>
    </xsl:stylesheet>

An XSLT stylesheet is an XML document itself (this has several implications, as we will see in a minute). The root of the stylesheet is the <xsl:stylesheet> element. It needs a version attribute and the value must be "1.0." The instructions must appear in the http://www.w3.org/1999/XSL/Transform namespace. If you encounter problems with a stylesheet, make sure the namespace has been declared properly; it's the number one cause of problems that my students have.

Below the root comes the <xsl:output> element that specifies whether the result is an HTML, XML, or text document.

Then come the templates. Each template is a rule that transforms one or more elements from the source document into one or more elements in the result. For example, the template:

    <xsl:template match="a:article">
      <html><xsl:apply-templates/></html>
    </xsl:template>

specifies that <a:article> in the source becomes the <html> in the result. In other words, the root of the XML document becomes the HTML root.

The match attribute selects the source elements to which the template applies. In most cases, that's an element name. When there's a risk of confusion, you can specify a path (or conditions, as we'll see next month) to test on the element ancestor. The a:section/a:title path selects the <a:title> elements as a child of <a:section>. Note that it's <a:title> as a child of <a:section>, and not the opposite.

The <xsl:apply-templates/> instruction is a placeholder for the content of the element. In the above example, the processor inserts the article content between the <html> tags. The position of <xsl:apply-templates/> in the template is important because it determines where the element content appears in the result. Look at the following template:

    <xsl:template match="a:section">
      <xsl:apply-templates/><hr/>
    </xsl:template>

It inserts a horizontal line after the section content, because <xsl:apply-templates/> represents the section content. If <hr/> is placed before the <xsl:apply-templates/>, the line would appear before the section.
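To make the point about position concrete, here is a minimal sketch of a stylesheet in which <hr/> comes before <xsl:apply-templates/>. It is an illustration only, not part of the downloadable listings: it reuses the a: namespace from Listing 1 and is trimmed to two templates for brevity, so in practice you would keep the other templates from Listing 1 alongside it.

```xml
<?xml version="1.0"?>
<!-- Sketch only: a variant of the a:section template from Listing 1,
     with <hr/> moved in front of <xsl:apply-templates/>. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:a="http://psol.com/2004/article"
                version="1.0">
  <xsl:output method="html"/>

  <!-- The rule now writes the line first, then the section content. -->
  <xsl:template match="a:section">
    <hr/>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Same paragraph rule as in Listing 1. -->
  <xsl:template match="a:para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```

With this variant, each section in the result should start with a horizontal line instead of ending with one; nothing else about the rule changes.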

Finally, I'd like to draw your attention to syntax issues. A stylesheet is an XML document and it must respect the XML syntax, which means that:

• Elements need both a starting and ending tag (in HTML you often dispense with the ending tag).
• An empty element follows the XML convention, so <hr> is written as <hr/>. (Don't worry, the processor will remove the trailing slash.)

Testing and Exercise
I encourage you to download the listing and run the example for yourself. The listings also include a small exercise so you can practice what you have learned.

As you work with the listings, you will notice that the XML documents start with the following processing instruction:

    <?xml-stylesheet href="basic.xsl" type="text/xsl"?>

It tells the processor which stylesheet applies to the document. To avoid any misunderstanding, let me stress that the processing instruction appears in the XML document, not the stylesheet! So, if you want to apply another stylesheet to a document, you need to modify the document.

Next month, we will cover XPath, attributes, and more XSLT instructions.

Discover the Wonders of XSLT: XPaths

By Benoît Marchal

This is Part 2 of the developer.com introduction to XSLT. The first part was about tools and the basic syntax. I recommend you read it first. Make sure you download the updated listings before reading any further.

XPaths
The style sheet language is made up of two W3C recommendations:

• XPath, which is a querying language; it allows you to retrieve values from the input document.
• XSLT itself, which is a scripting language with an XML syntax. It offers instructions to create elements, attributes and other XML markup in the output.

A style sheet describes how to convert the input document into the output. XPath deals with the input; XSLT deals with generating the output.

XPaths are not unlike file paths and URLs, but are adapted to the XML syntax. For example, download the listings and open the sample2.xml file. The path to the document title is the following:

    /a:article/a:info/a:title
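If you don't have the listings at hand, the following hypothetical document shows the kind of structure that path walks through. The element names and the namespace come from the style sheets in this series; the text content, and the exact layout of the real sample2.xml, are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- The processing instruction lives in the document, not in the style sheet. -->
<?xml-stylesheet href="basic.xsl" type="text/xsl"?>
<!-- Hypothetical sample; the real sample2.xml ships with the listings. -->
<a:article xmlns:a="http://psol.com/2004/article">
  <a:info>
    <!-- Reached by /a:article/a:info/a:title -->
    <a:title>Discover the Wonders of XSLT</a:title>
  </a:info>
  <a:body>
    <a:section>
      <a:title>XPaths</a:title>
      <a:para>XPath is a querying language.</a:para>
      <a:para>See the <a:link uri="http://www.w3.org/TR/xslt">XSLT
        recommendation</a:link> for <a:em role="bold">details</a:em>.</a:para>
    </a:section>
  </a:body>
</a:article>
```

Reading the path from left to right mirrors the nesting: the root a:article, then its a:info child, then the a:title inside it. The section title sits on a different branch, under a:body, so this path does not select it.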

Essentially, an XPath lists all the elements that lead to the one you're interested in, just like the way that a file path lists all the directories leading to the file you're interested in. The separator is the forward slash, /.

The element names in an XPath must be fully qualified, i.e., they must include both the namespace prefix and the local name. Make sure you declare the namespace prefix in the style sheet as well (see the example below).

An XPath returns a node set, i.e., a list of nodes that match the XPath. The node set for the XPath above contains only one node (the article title). A node set may contain zero (which most likely indicates an error in the XPath), one, or more nodes.

Relative XPaths
The previous example was for an absolute path because it starts from the root of the document. XPaths may also be relative to the current node. Again, the concept is very similar to file paths that can either start from the root (or a disk under Windows) or be relative to the current directory.

As it interprets the style sheet, the XSLT processor keeps track of the current node. Some instructions, such as xsl:apply-templates and xsl:for-each (see below), change the current node.

Absolute XPaths start with the forward slash; relative XPaths start with an element name. Assuming the current node is /a:article, the following XPath points to the article title:

    a:info/a:title

You may recognize this XPath from the style sheet in the previous article. Indeed, the template match attribute contains an XPath, in most cases a relative one.

The single and double dot (. and ..) represent the current element and the parent of the current element respectively. If the current element is a paragraph,

    ../a:para

selects all the paragraphs in the section. The .. selects the paragraph's parent (the section) and, from there, the XPath selects all the paragraphs in the section. Note that this XPath may return a node set with several nodes; in fact, as many nodes as paragraphs in the section. To select all the paragraphs in the body, use this XPath:

    ../../a:section/a:para

Attributes and other special cases
To include an attribute in an XPath, prefix its name with the @ character. The following (relative) XPath selects the link's URI if the current node is a section:

    a:para/a:link/@uri

The @ is not a separator but a prefix identifying attributes. Therefore, you still need the forward slash between the attribute name and its parent.

Using two slashes as a separator, //, selects amongst the descendants, as opposed to the children, of the element. The descendants include the children, the children of the children, the children of the children of the children, and so on. The following absolute XPath selects all the titles (article and section titles):

    /a:article//a:title

Predicates
To conclude this section on XPaths, let's look at predicates. Predicates allow you to specify conditions that must apply to an element. The predicate appears between square brackets, [ and ], immediately after the element on which the condition applies. The following XPath selects links pointing to the XSLT recommendation:

    //a:link[@uri='http://www.w3.org/TR/xslt']

Predicates allow you to compare an XPath (@uri in this example) with a literal or another XPath. A whole set of functions also is available (see an XSLT reference for a complete list of functions). For example, this XPath uses the count function to select the paragraph from a section that has only one paragraph:

    //a:section[count(a:para) = 1]/a:para

Note that the predicate appears after the element on which it applies, which is not necessarily the last element in the XPath. Be careful not to confuse the separator, /, with the predicate indicators, [ and ].

Attributes
Attributes have a weird syntax in XSLT:

    <a href="{@uri}">

The curly brackets, { and }, mark the content of the attribute as an XPath. If the curly brackets are missing, the processor assumes that the content is a literal. A quick debugging tip: If you can't get what you want in an attribute, make sure you have not forgotten the curly brackets.

Students of XSLT often confuse the curly brackets and the at symbol. Both are related to attributes, but they serve completely different roles. The curly brackets are part of XSLT; they indicate that the content of the attribute is an XPath. The at symbol is part of XPath; it indicates the path points to an attribute.

Regular Structure
When working on a style sheet, the output may be structured and repetitive. Then, it may be easier to use the xsl:for-each and xsl:value-of instructions. Used together, they allow you to loop over and format the result of an XPath. xsl:for-each loops over the node set. xsl:value-of prints the content of the first element in a node set. The XPath should return one node only. If it returns several nodes, the processor retains only the first one.

For example, to print the paragraphs, you could write:

    <xsl:for-each select="/a:article/a:body/a:section/a:para">
      <p><xsl:value-of select="."/></p>
    </xsl:for-each>

Be warned that xsl:for-each changes the current node, so it is crucial that you use a relative XPath in the loop! An absolute path would select data outside of the loop, which is most likely not what you want.

A New Style Sheet
Listing 1 is an updated style sheet that demonstrates the techniques introduced in this article:

• The template for the body now prints the article title using an xsl:value-of and a table of contents through an xsl:for-each.
• The style sheet inserts hyperlinks by using the special syntax for attribute contents.
• A predicate differentiates the templates for bold and italics.

Listing 1: updated style sheet

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:a="http://psol.com/2004/article">
      <xsl:output method="html"/>
      <xsl:template match="a:article">
        <html><xsl:apply-templates/></html>
      </xsl:template>
      <xsl:template match="a:body">
        <body>
          <h1><xsl:value-of select="../a:info/a:title"/></h1>
          <h2>Table of Contents</h2>
          <ul>
            <xsl:for-each select="a:section">
              <li><xsl:value-of select="a:title"/></li>
            </xsl:for-each>
          </ul><hr/>
          <xsl:apply-templates/>
          <p>This page was made with XML and XSLT.</p>
        </body>
      </xsl:template>
      <xsl:template match="a:para">
        <p><xsl:apply-templates/></p>
      </xsl:template>
      <xsl:template match="a:section">
        <xsl:apply-templates/><hr/>
      </xsl:template>
      <xsl:template match="a:info/a:title">
        <head><title><xsl:apply-templates/></title></head>
      </xsl:template>
      <xsl:template match="a:section/a:title">
        <h2><xsl:apply-templates/></h2>
      </xsl:template>
      <xsl:template match="a:link">
        <a href="{@uri}"><xsl:apply-templates/></a>
      </xsl:template>
      <xsl:template match="a:em">
        <i><xsl:apply-templates/></i>
      </xsl:template>
      <xsl:template match="a:em[@role='bold']">
        <b><xsl:apply-templates/></b>
      </xsl:template>
    </xsl:stylesheet>

Testing and exercise
I encourage you to download the listing and run the example for yourself. Remember to adapt the processing instruction, as explained in Part 1, if you change the style sheet. The listings also include a small exercise so that you can practice what you have learned.

Next month, we will cover more XSLT instructions.

Discover the Wonders of XSLT: Advanced Techniques

By Benoît Marchal

Welcome to the third installment of Developer.com's introduction to XSLT. The first two parts (Part One and Part Two) have introduced the most fundamental XSLT instructions:

• templates and loops as a means to transform an XML document into either HTML or another XML document
• XPaths and predicates as a querying language to extract data from an XML document

Together, the first two parts cover 70% of XSLT coding needs and you can write many fine stylesheets using only these techniques. This month, we will cover more advanced techniques that simplify XSLT coding.

Tests
One could argue that we have covered testing already through predicates. Yet there are cases where a simple if/then/else would do the job faster and more cleanly than predicates. XSLT offers two instructions for tests:

• xsl:if is the standard if statement
• xsl:choose is a switch statement that allows you to combine multiple tests and implement if/then/else

The simplest test looks like the following:

    <xsl:if test="count(a:para) > 1">
      <p><xsl:value-of select="count(a:para)"/> paragraphs</p>
    </xsl:if>

As the name implies, the test attribute holds the test. The processor will output the content of the xsl:if element if test evaluates to true. The content can be any combination of text literals, XML elements, and XSLT instructions.

A quick tip: The empty node set evaluates to false so, to test for the presence of an element, it suffices to write the appropriate XPath in the test attribute. If the element exists, the XPath will return a non-empty node set; if the element does not exist, the XPath will return an empty node set.

The xsl:choose statement is a more sophisticated test, as follows:

    <xsl:choose>
      <xsl:when test="not(a:para)">
        <p>no paragraphs</p>
      </xsl:when>
      <xsl:when test="count(a:para) = 1">
        <p>one paragraph</p>
      </xsl:when>
      <xsl:otherwise>
        <p><xsl:value-of select="count(a:para)"/> paragraphs</p>
      </xsl:otherwise>
    </xsl:choose>

The processor will output the content of the first xsl:when whose test attribute evaluates to true or, failing that, the content of xsl:otherwise. Be careful with the order of xsl:when statements because the processor will output only the first one that is true. Also, the xsl:otherwise statement is optional. In the above example, I use the empty node set technique in the first xsl:when. To implement an if/then/else statement, you could use an xsl:choose with a single xsl:when and a single xsl:otherwise.

Generating Output: Text Literals
So far, in templates, loops, and tests, you learned to write the text literals and XML elements as you want them to appear in the output. A typical template mixes text literals, XML elements, and XSLT statements:

    <xsl:template match="a:body">
      <body>
        <h1><xsl:value-of select="../a:info/a:title"/></h1>
        <h2>Table of Contents</h2>
        <ul>
          <xsl:for-each select="a:section">
            <li><xsl:value-of select="a:title"/></li>
          </xsl:for-each>
        </ul><hr/>
        <xsl:apply-templates/>
        <p>This page was made with XML and XSLT.</p>
      </body>
    </xsl:template>

There are cases where you need more control over the output. XSLT offers special instructions to generate text literals, elements, and attributes.

xsl:text is an XSLT statement that generates a text literal. It is mostly identical to just typing the text literal with one simple difference: xsl:text preserves the spaces. The XSLT processor will normalize most text literals (in particular, it strips text in the style sheet that consists only of white space), which is the sensible behavior in most cases. If you absolutely need the spaces though, use xsl:text. In practice, the most common application of xsl:text is the following:

    <xsl:text> </xsl:text>

to insert one blank space (without the xsl:text instruction, the processor could remove the space as part of the normalization process).

Generating output: attributes
xsl:attribute adds an attribute to the current element. In most cases, you just want to write the attribute as a literal, like this:

    <a href="{@uri}">

But xsl:attribute is an XSLT statement, so it can appear wherever a statement can appear. It is useful mostly for tests. The following example marks hyperlinks to my Web site in red:

    <a href="{@uri}">
      <xsl:if test="starts-with(@uri,'http://www.marchal.com')">
        <xsl:attribute name="style">color: red;</xsl:attribute>
      </xsl:if>
      <xsl:apply-templates/>
    </a>

The xsl:attribute has a name parameter with the attribute's name and an optional namespace parameter with the attribute's namespace. xsl:attribute must appear before any other children; it is a mistake to insert a text literal or any instruction that will insert text before xsl:attribute. I like to think that the processor has not yet closed the start tag (>) when it encounters the xsl:attribute statement.

Generating Output: elements
For completeness, note that the xsl:element statement exists, but it is seldom needed in practice. It is mostly similar to xsl:attribute. About the only sensible application is to compute an element name:

    <xsl:element name="record-{position()}">
      <xsl:apply-templates/>
    </xsl:element>

Output
While we're on the matter of generating output, let's return to the very first instruction in any stylesheet: xsl:output.

As you learned in Part 1, xsl:output controls whether the processor generates an HTML, text, or XML document. xsl:output supports more attributes that give you a lot of control over the output document. The most useful attributes are:

• encoding, which specifies the encoding. The default is UTF-8, but you can specify any valid encoding, such as UTF-16, ISO-8859-1 (Latin-1), and more.
• indent set to yes tells the processor to indent the code. This is handy for debugging.
• omit-xml-declaration removes the XML declaration from the output document. It is not used often.
• doctype-public and doctype-system control the DOCTYPE statement required by some XML vocabularies.

Congratulations! You have learned how to create efficient style sheets to process your XML documents. Keep practicing.

Discover the Wonders of XSLT: XSLT Quirks

By Benoît Marchal

You've made it to Part 4 of this XSLT introduction at developer.com. After a few exercises, you begin to appreciate how versatile the language is. Truly, XSLT is a tool with many uses. As easy as it is to hack a style sheet quickly to reformat a document, it is equally easy to integrate style sheets in more serious production environments, such as processing orders in an e-commerce setup.

XSLT Quirks
When I teach XSLT in seminars, students appreciate the power of the language but few like the syntax. In my experience, the most frequent complaint is that XSLT is a verbose language. My answer is twofold:

• First, you may want to consider a good XSLT editor. Many students like XML Spy (www.altova.com). On the Eclipse platform, I recommend XML Buddy (www.xmlbuddy.com). I do most of my coding with BBEdit (www.barebones.com) or Boxer (www.texteditors.com). Both offer syntax coloring.
• The XSLT syntax has one distinct advantage: It makes it almost impossible for your style sheets to produce invalid XML documents. With time, as your experience with XSLT grows, I trust that you will come to appreciate this.

Coding Styles
XSLT is a very specialized language with a distinct declarative flavor. The declarative flavor is in sharp contrast with more generic languages (Java, C#, C++, Basic, or Pascal) where the developer has to call methods explicitly. Still, XSLT's declarative flavor is the source of some confusion.

The templates, introduced in Part 1, are a declarative set of rules to render and format XML elements. The developer need not explicitly call the templates; instead, each template specifies (in its match attribute) to which element it applies and the XSLT processor automatically calls the relevant templates.

Part 2 and Part 3 of this series introduced a more procedural flavor to XSLT with the for-each, if, and choose instructions. Some of my students jump on these instructions and never look back to declarative templates again, until they hit a wall...

To avoid problems, you should realize that there are two different coding styles in XSLT: the declarative flavor that uses templates and predicates, and the procedural flavor with loops and tests. XSLT, as a language, is biased towards the first and this bias should be reflected in your style sheets. You will find that the declarative style is more powerful than the procedural one. Some concepts are trivial in the declarative style and nearly impossible to write with the procedural style.

So, what should you do with the looping and test instructions? Think of them as shorthand for documents with a regular structure. A table would be a good example. A table has a very regular structure (columns and cells within columns), so using the shorthand notation of a loop will be more readable. If in doubt, stick to the declarative style.

Rewriting
Specifically, you might be tempted to write the following code:

    <xsl:template match="a:para">
      <xsl:choose>
        <xsl:when test="@type='bold'">
          <p><b><xsl:apply-templates/></b></p>
        </xsl:when>
        <xsl:otherwise>
          <p><xsl:apply-templates/></p>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>

Don't. The preceding code uses one template and then a test. The more correct expression in XSLT is to use two templates, one for each condition, and let the processor select the more appropriate one:

    <xsl:template match="a:para[@type='bold']">
      <p><b><xsl:apply-templates/></b></p>
    </xsl:template>
    <xsl:template match="a:para">
      <p><xsl:apply-templates/></p>
    </xsl:template>

In the long run, it will be more readable and more maintainable.

Discover the Wonders of XSLT: Workflows

This article concludes the introduction to XSLT at developer.com. In the previous four articles, the series has covered the essentials of XSLT coding. The final article moves to more advanced subjects such as working with functions and multiple files.

Functions

Functions are implemented in XPath so they are valid wherever an XPath is valid. We have already encountered functions, such as count() and not():

    count(a:para)

A function takes zero, one, or more arguments and computes a result. The result may be a number, a string, or a node set. Much of the power of functions arises from their integration with XPaths. Functions can appear in predicates or, for those that return node sets, in place of an element:

    a:section[count(a:para) = 1]
    current()/a:para

Because functions appear in XPaths, to use the result, you turn to the familiar value-of instruction:

    <xsl:value-of select="count(xxx)"/>

As always, if the function/XPath returns multiple values, you will need the for-each or apply-templates instructions instead:

    <xsl:for-each select="a:section[count(a:para) > 1]">

Predefined Functions and Extensions
XPath and XSLT include functions to cover most common needs: string manipulation (substring, length), number manipulation (sum, conversion), boolean (negation), indexing (key search), and more. The XPath and XSLT recommendations themselves do a good job at documenting the functions. I suggest you bookmark the recommendations.

As you become more familiar with XSLT, you will find yourself looking for your favorite "insert name here" function. If this is unacceptable to you, there are two workarounds.

First, check EXSLT. EXSLT defines standards for the most commonly requested extensions. Unless your needs are really exotic, chances are EXSLT covers them. Unfortunately, because it's a voluntary effort and not part of the official W3C recommendations, not every processor supports EXSLT, although the major ones do. Again, check your processor documentation.

If you still need a function, XSLT offers a half-baked extension mechanism that links with functions written in Java, JavaScript, C#, Python, or other languages. However, the W3C has not fully defined the extension mechanism; much is left as implementation details that create serious incompatibilities among XSLT processors. Therefore, to implement a function, you must forego portability and tie yourself to one specific XSLT implementation. Still, if at all possible, don't use extensions; you will find that many algorithms are best implemented through XSLT native (and portable) templates.

Many Documents
The default workflow with XSLT is to process one file through one style sheet. While this simple workflow is appropriate for basic applications, you may want something more sophisticated.

Figure 1: Four common XSLT workflows

Figure 1 illustrates four common workflow options, clockwise:

• The default workflow: one XML document is the input for a style sheet that produces one document.
• The document() function (see below) opens multiple input documents but it still produces one output only.
• XSLT 2.0 (see below) supports multiple outputs. Think of a photo gallery where the style sheet generates as many HTML pages as there are photos in the input document.
• Finally, a batch engine extends the XSLT processor to work with directories and file hierarchies instead of isolated files. If you followed the exercises throughout the series, you have been using such a batch engine (XM).

document() Function
The document() function opens a second (or a third, fourth, and so on) input document. The function takes the URI to the file, opens the file, parses it, and returns a node set with the file content. Because the result is a node set, you can query the result with an XPath, as we saw in the Functions section:

    document('params.xml')/p:Parameters/p:Param[@id='1']

The usual combination of for-each and apply-templates instructions offers many options to process the second document:

    <xsl:for-each select="document('params.xml')/p:Parameters/p:Param">

Typically, document() accesses parameter files. It is also handy to combine several documents into one output.
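To show how these pieces might fit into a complete style sheet, here is a minimal sketch that prints the article title followed by a list of parameters read from a second file. Several things in it are assumptions made for illustration: the namespace URI bound to the p: prefix (http://example.com/params is a placeholder), the idea that each p:Param carries an id attribute and a text value, and the existence of a params.xml file next to the style sheet. The exclude-result-prefixes attribute is a standard XSLT 1.0 attribute that keeps the a: and p: namespace declarations out of the HTML result.

```xml
<?xml version="1.0"?>
<!-- Sketch only. Assumes a params.xml file shaped like:
       <p:Parameters xmlns:p="http://example.com/params">
         <p:Param id="1">some value</p:Param>
       </p:Parameters>
     The p: namespace URI is a placeholder, not taken from the listings. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:a="http://psol.com/2004/article"
                xmlns:p="http://example.com/params"
                exclude-result-prefixes="a p"
                version="1.0">
  <xsl:output method="html"/>

  <xsl:template match="a:article">
    <html>
      <body>
        <!-- Data from the main input document. -->
        <h1><xsl:value-of select="a:info/a:title"/></h1>
        <!-- Data from the second document, opened with document(). -->
        <ul>
          <xsl:for-each select="document('params.xml')/p:Parameters/p:Param">
            <!-- Inside the loop, the current node is a p:Param. -->
            <li>
              <xsl:value-of select="@id"/>
              <xsl:text>: </xsl:text>
              <xsl:value-of select="."/>
            </li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

When the argument is a plain string, as here, XSLT 1.0 resolves a relative URI against the location of the style sheet, which is why the sketch expects params.xml to sit beside it. Note also that the XPaths inside the loop are relative, because for-each has moved the current node into the second document.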

result-document
What about the opposite, taking an XML document and splitting it in multiple output documents? There is no solution with XSLT 1.0 but support for multiple output documents will be added in XSLT 2.0. At the time of writing, the draft XSLT 2.0 proposes the result-document element. Basically, anything that appears within a result-document element is written to a separate file.

    <xsl:result-document href="photo-{@id}.html">
      <!-- ... -->
    </xsl:result-document>

A word of warning: XSLT 2.0 has not been formally approved at the time of writing, so this feature may still change. Furthermore, chances are your XSLT processor does not implement it (most processors have a proprietary alternative, though). Again, consult your processor documentation if you need this feature.

So far, dates are displayed in the ISO format: 2004-02-08. By using the substring-before() function, you can reformat it to the more common 02/08/2004 format.
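One possible way to do that reformatting is sketched below. It combines substring-before() with two other standard XPath 1.0 functions, substring-after() and concat(). The a:date element is hypothetical (the text above does not say where the date lives in the document), so adapt the match attribute to your own markup.

```xml
<?xml version="1.0"?>
<!-- Sketch only: a:date is a hypothetical element holding an ISO date
     such as 2004-02-08, with no surrounding white space. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:a="http://psol.com/2004/article"
                version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="a:date">
    <!-- substring-before(., '-')                    is the year
         substring-after(., '-')                     is month-day
         substring-before(substring-after(...), '-') is the month
         substring-after(substring-after(...), '-')  is the day -->
    <xsl:value-of select="concat(
        substring-before(substring-after(., '-'), '-'), '/',
        substring-after(substring-after(., '-'), '-'), '/',
        substring-before(., '-'))"/>
  </xsl:template>
</xsl:stylesheet>
```

For an element such as <a:date>2004-02-08</a:date>, the template writes 02/08/2004. Nesting the function calls avoids the need for variables, at the cost of a longer expression.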
