
XML and XSL Overview

by Alex Chaffee
alex@jguru.com, http://www.purpletech.com/
Purple Technology: Open source development
jGuru: Java online resource, FAQs and News and other cool stuff

XML
eXtensible Markup Language
Replacement for HTML
Metalanguage: used to create other languages
Has become a universal data exchange format

Advantages of XML
Human-readable
Machine-readable (easy to parse)
Standard format for data interchange
Possible to validate
Extensible
  can represent any data
  can add new tags for new data formats
Hierarchical structure (nesting)

Why not HTML?


Browsers are too lenient
This led to sloppy HTML code all over the Web
<imG src="foo.gif> is "legal" HTML

Told HTML, "go to your room and don't come out until it's clean"
Out came XML

XML Searching and Agents


An early motivation for XML
Allows detailed queries of disparate data sources
  Find the best price for a certain product
  Search for properties with different real estate brokers

HTML insufficient
  Good for humans, bad for computers
  Doesn't scale

XML Example
<?xml version="1.0"?>
<!DOCTYPE menu SYSTEM "menu.dtd">
<menu>
  <meal name="breakfast">
    <food>Scrambled Eggs</food>
    <food>Hash Browns</food>
    <drink>Orange Juice</drink>
  </meal>
</menu>

XML Languages
MML - musical scores
CML - chemicals
HRMML - Human Resource Management (???)
MathML - equations
RSS - web syndication

Tag vs. Element


A tag is a name, enclosed by angle brackets, with optional attributes
<foo id="123">

An element is a tree, containing an open tag, contents, and a close tag


<foo id="123">This is <bar>an element</bar></foo>

XML Syntax
Tags properly nested
Tag names case-sensitive
All tags must be closed
  or self-closing: <foo/> is the same as <foo></foo>
Attributes enclosed in quotes
Document consists of a single (root) element
A few other details
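These rules can be checked by handing small strings to an XML parser. The sketch below assumes the DOM parser bundled with the JDK (JAXP, javax.xml.parsers); the class WellFormedCheck and its sample strings are hypothetical, made up for illustration. Each malformed sample breaks one of the rules listed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Parses a string with the JDK's non-validating DOM parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal (syntax) errors without printing them to stderr.
        db.setErrorHandler(new DefaultHandler());
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // True if the string follows the XML syntax rules, i.e. is well-formed.
    static boolean isWellFormed(String xml) {
        try {
            parse(xml);
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Number of child nodes under the root element.
    static int rootChildCount(String xml) {
        try {
            return parse(xml).getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<foo><bar/></foo>",      // well-formed
            "<foo><bar></foo></bar>", // tags not properly nested
            "<Foo></foo>",            // tag names are case-sensitive
            "<foo><bar></foo>",       // <bar> is never closed
            "<foo id=123/>",          // attribute value not quoted
            "<foo/><foo/>"            // more than one root element
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
        // <foo/> and <foo></foo> both give an empty foo element: prints "0 0".
        System.out.println(rootChildCount("<foo/>") + " " + rootChildCount("<foo></foo>"));
    }
}
```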

Well-Formed vs. Valid


Well-Formed:
Structure follows XML syntax rules

Valid:
Well-formed, and structure conforms to a DTD

DTD
Document Type Definition
A grammar for XML documents
Defines
  which elements can contain which other elements
  which attributes are allowed or required on which elements

DTD and Data Exchange


Both sides must agree on the DTD ahead of time
DTD can be part of the document or stored separately

DTD Example
<?xml version="1.0" encoding="US-ASCII"?>
<!ELEMENT menu (meal)*>
<!ATTLIST menu name CDATA #IMPLIED>
<!ELEMENT meal (food|drink)*>
<!ATTLIST meal name CDATA #REQUIRED>
<!ELEMENT food (#PCDATA)*>
<!ELEMENT drink (#PCDATA)*>
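A validating parser checks a document against this DTD. The sketch below assumes the JDK's JAXP parser; instead of reading a real menu.dtd file, a hypothetical EntityResolver serves the DTD text from a string (the text declaration is left out there). The ErrorHandler turns validity errors, which the parser would otherwise only report, into a false result. MenuValidator is an illustrative name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class MenuValidator {
    // The menu DTD from the slide, held in a string for this sketch.
    static final String MENU_DTD =
            "<!ELEMENT menu (meal)*>"
            + "<!ATTLIST menu name CDATA #IMPLIED>"
            + "<!ELEMENT meal (food|drink)*>"
            + "<!ATTLIST meal name CDATA #REQUIRED>"
            + "<!ELEMENT food (#PCDATA)*>"
            + "<!ELEMENT drink (#PCDATA)*>";

    // True if the document is well-formed AND valid against MENU_DTD.
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check against the DTD, not just the syntax
            DocumentBuilder db = factory.newDocumentBuilder();
            // Hypothetical resolver: answer requests for menu.dtd from the string above.
            db.setEntityResolver((publicId, systemId) ->
                    systemId != null && systemId.endsWith("menu.dtd")
                            ? new InputSource(new StringReader(MENU_DTD))
                            : null);
            db.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or well-formed but invalid
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String doctype = "<!DOCTYPE menu SYSTEM \"menu.dtd\">";
        // Valid: every meal has its required name attribute.
        System.out.println(isValid(doctype
                + "<menu><meal name=\"breakfast\"><food>Scrambled Eggs</food></meal></menu>"));
        // Invalid: meal is missing the required name attribute.
        System.out.println(isValid(doctype + "<menu><meal><food>Eggs</food></meal></menu>"));
    }
}
```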

Why isn't a DTD in XML?


It will be someday: XSchema

XML Namespaces
A single document can use multiple DTDs
But! Two DTDs can use the same element name with different rules
Solution: Namespaces
  Must prefix each tag name with a namespace prefix, which is bound to a namespace URI
e.g. <xsl:apply-templates select="."/>
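With namespace processing turned on, a parser identifies each element by its namespace URI plus its local name; the prefix is only a local alias. This sketch assumes the JDK's JAXP DOM parser; the urn:example:mine namespace and the NamespaceDemo class are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceDemo {
    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    // Counts elements by namespace URI and local name; the prefix plays no part.
    static int count(String xml, String namespaceUri, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(namespaceUri, localName).getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Two vocabularies both define a "template" element.
        String doc = "<root xmlns:xsl=\"" + XSL_NS + "\" xmlns:my=\"urn:example:mine\">"
                + "<xsl:template/><my:template/><my:template/></root>";
        System.out.println(count(doc, XSL_NS, "template"));             // 1
        System.out.println(count(doc, "urn:example:mine", "template")); // 2
    }
}
```

Binding a different prefix (say x) to the same XSLT URI would still match the first count, since only the URI is compared.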

Entities
Macros / constants
Values defined once, used in document
<!DOCTYPE foo SYSTEM "foo.dtd" [
  <!ENTITY background "#99FFFF">
]>
<BODY BGCOLOR="&background;">
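The parser expands entity references while reading, so the application only sees the final value. This JAXP-based sketch (hypothetical class EntityDemo) keeps only the internal subset, since the parser would otherwise try to fetch foo.dtd.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class EntityDemo {
    // Parses the document and returns an attribute of the root element,
    // after the parser has replaced any entity references in it.
    static String rootAttribute(String xml, String name) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getAttribute(name);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String doc = "<!DOCTYPE BODY [ <!ENTITY background \"#99FFFF\"> ]>"
                + "<BODY BGCOLOR=\"&background;\"/>";
        System.out.println(rootAttribute(doc, "BGCOLOR")); // #99FFFF
    }
}
```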

SML / Minimal XML


Simplified Markup Language
Subset of XML, but stripped down
Easier to understand, parse
No
  DTDs
  Attributes
  Processing instructions
  etc.

XSL: XML Transformation

XSL
The eXtensible Stylesheet Language
Transforms XML into HTML
Actually, parses XML into a tree, turns that tree into another tree, then outputs the new tree (as XML, HTML, or text)

XSL Architecture
XSL Stylesheet + XML Source -> XSL Processor -> HTML Output
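In Java, a JAXP Transformer plays the part of the XSL processor: it is built from the stylesheet and then applied to the XML source. A minimal sketch of that pipeline, assuming the XSLT 1.0 processor bundled with the JDK; XslPipeline and the sample stylesheet are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslPipeline {
    // XML source + XSL stylesheet -> XSL processor -> output text.
    static String transform(String xmlSource, String xslStylesheet) {
        try {
            // Compile the stylesheet into a processor object.
            Transformer processor = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslStylesheet)));
            StringWriter output = new StringWriter();
            processor.transform(new StreamSource(new StringReader(xmlSource)),
                    new StreamResult(output));
            return output.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<menu><meal name=\"breakfast\"/><meal name=\"snack\"/></menu>";
        String xsl = "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/\">Meals: <xsl:value-of select=\"count(//meal)\"/></xsl:template>"
                + "</xsl:stylesheet>";
        System.out.println(transform(xml, xsl)); // Meals: 2
    }
}
```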

XML is a Tree
<?xml version="1.0"?>
<!DOCTYPE menu SYSTEM "menu.dtd">
<menu>
  <meal name="breakfast">
    <food>Scrambled Eggs</food>
    <food>Hash Browns</food>
    <drink>Orange Juice</drink>
  </meal>
  <meal name="snack">
    <food>Chips</food>
  </meal>
</menu>

menu
  meal (name = "breakfast")
    food: "Scrambled Eggs"
    food: "Hash Browns"
    drink: "Orange Juice"
  meal (name = "snack")
    food: "Chips"

XML Is A Tree
Nodes
  Branch nodes contain children
  Leaf nodes contain content
  Attributes, values, entities, etc.

DOM provides API-based access to tree models
XSL turns one tree into a different tree
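A recursive walk over the DOM reproduces the tree picture above: element (branch) nodes recurse into their children, text (leaf) nodes print their content. A sketch assuming the JDK's JAXP DOM parser; TreeWalk is a hypothetical name, and whitespace-only text nodes are skipped.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TreeWalk {
    // Parses the XML and renders its tree as indented text, one node per line.
    static String render(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            walk(root, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Depth-first walk; attributes are shown on their element's line.
    static void walk(Node node, int depth, StringBuilder out) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) indent.append("  ");
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(indent).append(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                out.append(" @").append(a.getNodeName()).append('=').append(a.getNodeValue());
            }
            out.append('\n');
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.append(indent).append('"').append(text).append("\"\n");
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(render("<menu><meal name=\"breakfast\">"
                + "<food>Scrambled Eggs</food><food>Hash Browns</food>"
                + "<drink>Orange Juice</drink></meal>"
                + "<meal name=\"snack\"><food>Chips</food></meal></menu>"));
    }
}
```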

Command Line Invocation


Apache Xalan
java org.apache.xalan.xslt.Process -IN faq.xml -XSL faq.xsl -OUT faq.html

IBM LotusXSL
java com.lotus.xsl.xml4j.ProcessXSL -in servletfaq.xml -xsl faq.xsl -out faq.html

And so on

Formatting Objects
Forget about it for now

XSLT
The meat of XSL
Syntax for making XSL template files
Pattern matching
Output formatting
Rules-based (like Prolog)

XPath
The stuff inside the quotes in XSL patterns
"/person/name/firstname"

A sensible way to locate content in an XML document
More straightforward than walking a DOM tree or waiting for a SAX callback
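The javax.xml.xpath API in the JDK evaluates such an expression directly against a document, with no tree walking or callbacks. A minimal sketch; the XPathDemo class and the person document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathDemo {
    // Evaluates an XPath expression and returns the string value of the result
    // (for a node-set: the text of the first node, or "" if nothing matched).
    static String evaluate(String xml, String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String person = "<person><name><firstname>Ada</firstname>"
                + "<lastname>Lovelace</lastname></name></person>";
        System.out.println(evaluate(person, "/person/name/firstname")); // Ada
    }
}
```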

XPath Syntax
book/title
title child of book child of current node

/book/title
title child of book child of document root

@language
language attribute of current node

chapter/@language
language attribute of chapter child of current node

XPath Syntax (cont.)


chapter[3]/para
all the para children of the third chapter

book/*/title
all title children of all children of book (but not of their children)

chapter//para
all para descendants (at any depth) of the chapter children of the current node

../../title
title child of parent of parent
unabbreviated: parent::node()/parent::node()/child::title
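The expressions on these two slides can be tried against a small sample book. This sketch assumes the JDK's javax.xml.xpath API; the XPathSyntaxDemo class and the book document are hypothetical. contextPath picks the "current node" with an absolute path, and expr is then evaluated relative to it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSyntaxDemo {
    // Hypothetical sample: the second chapter nests one para inside a section.
    static final String BOOK =
            "<book language=\"en\"><title>XSL</title>"
            + "<chapter language=\"en\"><title>One</title><para>a</para></chapter>"
            + "<chapter><title>Two</title><para>b</para><section><para>c</para></section></chapter>"
            + "<chapter><title>Three</title><para>d</para><para>e</para></chapter>"
            + "</book>";

    static final XPath XPATH = XPathFactory.newInstance().newXPath();
    static final Document DOC = parse(BOOK);

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Selects the current node with an absolute path such as "/" or "/book".
    static Node context(String contextPath) throws Exception {
        return (Node) XPATH.evaluate(contextPath, DOC, XPathConstants.NODE);
    }

    // Number of nodes selected by expr, relative to the context node.
    static int count(String contextPath, String expr) {
        try {
            NodeList nodes = (NodeList) XPATH.evaluate(expr, context(contextPath),
                    XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // String value of expr, relative to the context node.
    static String string(String contextPath, String expr) {
        try {
            return XPATH.evaluate(expr, context(contextPath));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/book", "chapter/para"));  // direct children only: 4
        System.out.println(count("/book", "chapter//para")); // any depth: 5
        System.out.println(string("/book/chapter[2]/section/para", "../../title")); // Two
    }
}
```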

XPath Abbreviations
. is short for self::node()
.. is short for parent::node()
// is short for /descendant-or-self::node()/
@ is short for attribute::

XPath Functions
para[1] or para[position()=1]
the first para child of the current node

para[last()]
the last para child of the current node

para[count(child::note)>0]
all para children with one or more note children

id("abstract")
selects the element whose ID attribute (declared as type ID in the DTD) is "abstract", e.g. <para id="abstract">

para[@type='secret'] or para[attribute::type='secret']
selects all child nodes like <para type="secret">

XPath Functions (cont.)


para[not(title)]
selects all child paragraphs with no title child element

para[position() >= 2 and position() < last()]
selects all but the first and last paragraphs

para[lang("en")]
matches <para xml:lang="en-uk"></para>

note[contains(., "alex")]
. is the context node; its string value includes the text of all its descendants, so the children's content is tested too, recursively

note[starts-with(., "hello")]
selects notes whose text starts with "hello"
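Most of these predicates can be checked the same way. A sketch assuming the JDK's javax.xml.xpath API on a hypothetical chapter document (XPathFunctionsDemo is an illustrative name); id() is left out because it only finds attributes declared as type ID in a DTD. The parser is made namespace-aware so that xml:lang is recognized as the lang attribute in the XML namespace, which is what lang() looks for.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathFunctionsDemo {
    // Hypothetical sample; the context node for every expression is <chapter>.
    static final String CHAPTER = "<chapter>"
            + "<para type=\"secret\">alpha</para>"
            + "<para><title>Beta</title>beta text</para>"
            + "<para xml:lang=\"en-uk\">gamma<note>hello alex</note></para>"
            + "<para>delta</para>"
            + "</chapter>";

    static final XPath XPATH = XPathFactory.newInstance().newXPath();
    static final Element CONTEXT = parse(CHAPTER).getDocumentElement();

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Number of nodes the expression selects from <chapter>.
    static int count(String expr) {
        try {
            return ((NodeList) XPATH.evaluate(expr, CONTEXT, XPathConstants.NODESET)).getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // String value of the expression evaluated from <chapter>.
    static String string(String expr) {
        try {
            return XPATH.evaluate(expr, CONTEXT);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(string("para[1]"));                           // alpha
        System.out.println(string("para[last()]"));                      // delta
        System.out.println(count("para[not(title)]"));                   // 3
        System.out.println(count("para[contains(., 'alex')]"));          // 1, text is in <note>
    }
}
```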

XPath Disadvantages
Not XML
  Not hierarchical
  New syntax rules
  Weird mix of /, [], (), *, :, ::, ., .., @
New function set
Not Java
Concepts like "axis" not always clear

XSLT Syntax

XSL Rules
XSL is a series of rules or templates
Each template matches a pattern (usually an element name)
Templates can contain XSL commands

XSL Commands: apply-templates


Main rule: apply-templates
  for each selected node (by default, the children of the current node), looks for a matching template and applies it

Usually the template calls apply-templates recursively on its children
If not, processing of that node's descendants stops there (its siblings are still processed by whatever templates match them)

Default Rule
For a text node, output its text
For the root or an element, apply templates to its children (recursively, including the default rule)
Attributes are not visited by default, so their values are not output
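A stylesheet with no templates at all exercises only the default rules. In this sketch (the XSLT processor bundled with the JDK via JAXP; DefaultRuleDemo and the sample are hypothetical) only the text survives, and the meal's name attribute never appears.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DefaultRuleDemo {
    // A stylesheet with no templates: only the built-in rules apply.
    static final String EMPTY_XSL =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String menu = "<menu><meal name=\"breakfast\">"
                + "<food>Scrambled Eggs</food><drink>Orange Juice</drink>"
                + "</meal></menu>";
        // Prints the concatenated text only: Scrambled EggsOrange Juice
        System.out.println(transform(menu, EMPTY_XSL));
    }
}
```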

Some XSL Commands


value-of
grabs raw value, good for text elements and attributes

if
executes conditionally

number
counts position of element in group
good for ordered list numbering, table of contents, etc.

XSL Example
<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
  <!ENTITY background "#99FFFF">
]>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

Example (cont.)
<xsl:template match="menu">
  <HTML>
    <HEAD>
      <TITLE>Menu: <xsl:value-of select="@name"/></TITLE>
    </HEAD>
    <BODY BGCOLOR="&background;">
      <H1>Menu <xsl:value-of select="@name"/></H1>
[Note: the menu name is output twice, in the title and the heading; CSS cannot reuse content like this]

Example (cont.)
      <xsl:apply-templates />
    </BODY>
  </HTML>
</xsl:template>

Example (cont.)
<xsl:template match="meal">
  <H2><xsl:value-of select="@name"/></H2><br />
  <UL>
    <xsl:apply-templates/>
  </UL>
</xsl:template>

Example (cont.)
<xsl:template match="food">
  <LI><xsl:apply-templates/></LI>
</xsl:template>
<xsl:template match="drink">
  <LI><xsl:apply-templates/></LI>
</xsl:template>
</xsl:stylesheet>
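The complete stylesheet can be run from Java through JAXP. This sketch assumes the XSLT 1.0 processor bundled with the JDK and a hypothetical menu named "Diner"; MenuToHtml is an illustrative name. The stylesheet text is the one from these slides (with the XSLT 1.0 namespace), concatenated into a string.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class MenuToHtml {
    // The menu stylesheet from the slides.
    static final String MENU_XSL =
            "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE xsl:stylesheet [ <!ENTITY background \"#99FFFF\"> ]>"
            + "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:template match=\"menu\">"
            + "<HTML><HEAD><TITLE>Menu: <xsl:value-of select=\"@name\"/></TITLE></HEAD>"
            + "<BODY BGCOLOR=\"&background;\">"
            + "<H1>Menu <xsl:value-of select=\"@name\"/></H1>"
            + "<xsl:apply-templates/>"
            + "</BODY></HTML>"
            + "</xsl:template>"
            + "<xsl:template match=\"meal\">"
            + "<H2><xsl:value-of select=\"@name\"/></H2><br/>"
            + "<UL><xsl:apply-templates/></UL>"
            + "</xsl:template>"
            + "<xsl:template match=\"food\"><LI><xsl:apply-templates/></LI></xsl:template>"
            + "<xsl:template match=\"drink\"><LI><xsl:apply-templates/></LI></xsl:template>"
            + "</xsl:stylesheet>";

    // Hypothetical input: the breakfast menu, given a name attribute.
    static final String SAMPLE_MENU =
            "<menu name=\"Diner\"><meal name=\"breakfast\">"
            + "<food>Scrambled Eggs</food><food>Hash Browns</food>"
            + "<drink>Orange Juice</drink>"
            + "</meal></menu>";

    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // No xsl:output element: an <HTML> result root makes the processor use html output.
        System.out.println(transform(SAMPLE_MENU, MENU_XSL));
    }
}
```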

Outputting Attributes
From This:
<link>
  <name>Stinky</name>
  <url>http://www.stinky.com/</url>
</link>

We Want This:
<A href="http://www.stinky.com/">Stinky</A>

Outputting Attributes
The Hard Way:
<xsl:element name="A">
  <xsl:attribute name="href">
    <xsl:value-of select="url" />
  </xsl:attribute>
  <xsl:value-of select="name" />
</xsl:element>

The Easy Way:


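<A href="{url}"><xsl:value-of select="name"/></A>
{url} is an attribute value template: the XPath expression inside the braces is evaluated and its value inserted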
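Both forms should produce the same markup. A sketch using the XSLT processor bundled with the JDK via JAXP (AttributeOutputDemo is hypothetical), with xml output and no XML declaration to keep the result easy to compare.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AttributeOutputDemo {
    // Wraps template rules in a stylesheet that emits bare XML.
    static String stylesheet(String templates) {
        return "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
                + templates
                + "</xsl:stylesheet>";
    }

    // The hard way: build the element and attribute with XSL instructions.
    static final String HARD_WAY = stylesheet(
            "<xsl:template match=\"link\">"
            + "<xsl:element name=\"A\"><xsl:attribute name=\"href\">"
            + "<xsl:value-of select=\"url\"/></xsl:attribute>"
            + "<xsl:value-of select=\"name\"/></xsl:element>"
            + "</xsl:template>");

    // The easy way: a literal element with an attribute value template.
    static final String EASY_WAY = stylesheet(
            "<xsl:template match=\"link\">"
            + "<A href=\"{url}\"><xsl:value-of select=\"name\"/></A>"
            + "</xsl:template>");

    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String link = "<link><name>Stinky</name><url>http://www.stinky.com/</url></link>";
        System.out.println(transform(link, HARD_WAY));
        System.out.println(transform(link, EASY_WAY));
    }
}
```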

Copying Subtrees
<xsl:template match="*|@*|text()">
  <xsl:copy>
    <xsl:apply-templates select="*|@*|text()"/>
  </xsl:copy>
</xsl:template>

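How it works: the template matches every element, attribute, and text node; xsl:copy makes a shallow copy of the current node, and apply-templates recurses into its attributes and children, so the whole subtree is rebuilt
By contrast, the default rules output only text, stripping all tags and attributes
Also xsl:copy-of, which copies a whole subtree in one step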
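Running the copy template next to an empty stylesheet shows the difference. A sketch assuming the XSLT processor bundled with the JDK via JAXP; IdentityDemo and the kind attribute in the sample are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IdentityDemo {
    static final String HEADER = "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>";

    // The copy template from the slide: rebuilds every element, attribute, and text node.
    static final String IDENTITY = HEADER
            + "<xsl:template match=\"*|@*|text()\">"
            + "<xsl:copy><xsl:apply-templates select=\"*|@*|text()\"/></xsl:copy>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // No templates at all: only the built-in rules run.
    static final String BUILT_IN_ONLY = HEADER + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String input = "<link kind=\"site\"><name>Stinky</name>"
                + "<url>http://www.stinky.com/</url></link>";
        System.out.println(transform(input, IDENTITY));      // same markup as the input
        System.out.println(transform(input, BUILT_IN_ONLY)); // text only
    }
}
```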

XSL conditionals: if
<xsl:if test="author">
  by <xsl:apply-templates select="author" />
</xsl:if>
Note: no else (?!?); use xsl:choose for if/else logic

XSL Conditionals: choose


Case 1
<link> <name>Stinky</name> <url>http://www.stinky.com/</url> </link>
becomes <A href="http://www.stinky.com/">Stinky</A>

Case 2
<link> <url>http://www.stinky.com/</url> </link>
becomes <A href="http://www.stinky.com/">http://www.stinky.com/</A>

Case 3
<link> <name>Stinky</name> </link>
becomes Stinky

XSL Conditionals: choose


<xsl:choose>
  <xsl:when test="url">
    <A href="{url}">
      <xsl:choose>
        <xsl:when test="name">
          <xsl:value-of select="name" />
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="url" />
        </xsl:otherwise>
      </xsl:choose>
    </A>
  </xsl:when>
  <xsl:otherwise>
    <xsl:value-of select="name" />
  </xsl:otherwise>
</xsl:choose>
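The three cases can be checked against this choose block. A sketch assuming the XSLT processor bundled with the JDK via JAXP; ChooseDemo is hypothetical, and the block is wrapped in a template matching link.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ChooseDemo {
    // The nested choose from the slide, inside a template for link elements.
    static final String CHOOSE_XSL =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
            + "<xsl:template match=\"link\">"
            + "<xsl:choose>"
            + "<xsl:when test=\"url\">"
            + "<A href=\"{url}\">"
            + "<xsl:choose>"
            + "<xsl:when test=\"name\"><xsl:value-of select=\"name\"/></xsl:when>"
            + "<xsl:otherwise><xsl:value-of select=\"url\"/></xsl:otherwise>"
            + "</xsl:choose>"
            + "</A>"
            + "</xsl:when>"
            + "<xsl:otherwise><xsl:value-of select=\"name\"/></xsl:otherwise>"
            + "</xsl:choose>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Renders one link element with the choose logic above.
    static String render(String link) {
        return transform(link, CHOOSE_XSL).trim();
    }

    public static void main(String[] args) {
        System.out.println(render("<link><name>Stinky</name><url>http://www.stinky.com/</url></link>"));
        System.out.println(render("<link><url>http://www.stinky.com/</url></link>"));
        System.out.println(render("<link><name>Stinky</name></link>"));
    }
}
```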

XSL Looping: for-each


<xsl:for-each select="chapter">
  <h2><xsl:value-of select="@title"/></h2>
</xsl:for-each>
Functional overlap with apply-templates
  Difference in programming style
  Use it inside a given template rule
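For a flat list like this, for-each and apply-templates give the same result; the difference is whether the per-chapter output lives inline or in its own template rule. A sketch assuming the XSLT processor bundled with the JDK via JAXP; ForEachDemo and the book document are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ForEachDemo {
    static final String BOOK = "<book><chapter title=\"Intro\"/><chapter title=\"Syntax\"/></book>";

    static String stylesheet(String templates) {
        return "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
                + templates
                + "</xsl:stylesheet>";
    }

    // Loop written inline inside the book template.
    static final String FOR_EACH = stylesheet(
            "<xsl:template match=\"book\"><xsl:for-each select=\"chapter\">"
            + "<h2><xsl:value-of select=\"@title\"/></h2>"
            + "</xsl:for-each></xsl:template>");

    // Same output, with the per-chapter markup moved into its own rule.
    static final String APPLY_TEMPLATES = stylesheet(
            "<xsl:template match=\"book\"><xsl:apply-templates select=\"chapter\"/></xsl:template>"
            + "<xsl:template match=\"chapter\"><h2><xsl:value-of select=\"@title\"/></h2></xsl:template>");

    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(BOOK, FOR_EACH));
        System.out.println(transform(BOOK, APPLY_TEMPLATES));
    }
}
```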

Template Modes
Same element name, different context -> different template, different output
Can invoke apply-templates with a mode; it then matches only the template with the same mode
<h1>Table of Contents</h1>
<ol>
  <xsl:apply-templates select="chapter" mode="toc"/>
</ol>

<xsl:template match="chapter" mode="toc">
  <li><xsl:value-of select="@title"/></li>
</xsl:template>

<xsl:template match="chapter">
  <h1><xsl:value-of select="@title"/></h1>
  <xsl:apply-templates/>
</xsl:template>
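A complete version of the modes example, with the table-of-contents fragment placed in a hypothetical book template so the same chapters are processed twice: once in toc mode, once normally. The sketch assumes the XSLT processor bundled with the JDK via JAXP; ModeDemo and the sample book are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ModeDemo {
    static final String BOOK = "<book>"
            + "<chapter title=\"Intro\"><para>Hi</para></chapter>"
            + "<chapter title=\"Syntax\"><para>Tags</para></chapter>"
            + "</book>";

    static final String MODES =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
            // Hypothetical wrapper: chapters are visited once per mode.
            + "<xsl:template match=\"book\"><html><body>"
            + "<h1>Table of Contents</h1>"
            + "<ol><xsl:apply-templates select=\"chapter\" mode=\"toc\"/></ol>"
            + "<xsl:apply-templates select=\"chapter\"/>"
            + "</body></html></xsl:template>"
            // Only used when apply-templates asks for mode="toc".
            + "<xsl:template match=\"chapter\" mode=\"toc\">"
            + "<li><xsl:value-of select=\"@title\"/></li></xsl:template>"
            // Used for the normal (unmoded) pass; paras fall through to the default rule.
            + "<xsl:template match=\"chapter\">"
            + "<h1><xsl:value-of select=\"@title\"/></h1><xsl:apply-templates/></xsl:template>"
            + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(BOOK, MODES));
    }
}
```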

XSL vs. CSS


Similar problem, different solutions
  CSS takes HTML and applies fonts, styles, positions
  XSL takes any XML and turns it into anything else
XSL more powerful than CSS
  e.g. can use the same content in multiple places in the result document

XSL Disadvantages
Confusing syntax and semantics
Like Prolog+C+XML
It's really a programming language, but using markup language syntax
yuck!

Hard to debug
XSL Trace helps a little

Don't have full power of, say, Java inside templates


No database access, hashtables, methods, objects, etc.

Still need separate .xsl file for each client device

Other XSL-Based Products


LotusXSL
Resin (Caucho)
Cocoon
IBM XSL Trace
Xalan (Apache)
XT
Lots more

Links: XML
XML Spec
http://www.w3.org/TR/REC-xml

XML FAQ
http://www.ucc.ie/xml/

Café con Leche


http://metalab.unc.edu/xml/

XML.com
http://www.xml.com/

Servlet FAQ in XSL


http://www.purpletech.com/servlet-faq/


Q&A
