by Alex Chaffee
alex@jguru.com, http://www.purpletech.com/
Purple Technology: Open source development
jGuru: Java online resource
FAQs and News and other cool stuff
XML
eXtensible Markup Language
Replacement for HTML
Metalanguage: used to create other languages
Has become a universal data-exchange format
Advantages of XML
Human-readable
Machine-readable (easy to parse)
Standard format for data interchange
Possible to validate
Extensible: can represent any data, and new tags can be added for new data formats
Told HTML, "go to your room and don't come out until it's clean"
Out came XML
HTML insufficient
Good for humans, bad for computers
Doesn't scale
XML Example
<?xml version="1.0"?>
<!DOCTYPE menu SYSTEM "menu.dtd">
<menu>
  <meal name="breakfast">
    <food>Scrambled Eggs</food>
    <food>Hash Browns</food>
    <drink>Orange Juice</drink>
  </meal>
</menu>
XML Languages
MML: musical scores
CML: chemicals
HRMML: Human Resource Management (???)
MathML: equations
RSS: web syndication
XML Syntax
Tags properly nested
Tag names case-sensitive
All tags must be closed, or self-closing: <foo/> is the same as <foo></foo>
Attribute values enclosed in quotes
Document consists of a single (root) element
A few other details
Well-formed:
Follows the syntax rules above
Valid:
Well-formed, and its structure also conforms to a DTD
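A quick way to see the syntax rules in action is to hand a few strings to a parser. The sketch below assumes the JDK's built-in JAXP DOM parser (`javax.xml.parsers`), not any particular product from this talk. The class name `WellFormedCheck` and the sample strings are made up for illustration. Any broken rule shows up as a `SAXException`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: reports whether a string is well-formed XML (no DTD checking).
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Turn parse errors into exceptions instead of printing them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // some syntax rule was broken
        } catch (Exception e) {
            throw new RuntimeException(e); // configuration or I/O problem, not a syntax error
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<menu><meal name=\"breakfast\"><food>Eggs</food></meal></menu>", // OK
            "<menu><meal><food>Eggs</meal></food></menu>",                 // improperly nested
            "<menu><Meal></meal></menu>",                                    // case mismatch
            "<menu><meal name=breakfast/></menu>",                          // unquoted attribute
            "<food>Eggs</food><drink>Juice</drink>",                         // two root elements
            "<menu><meal/></menu>"                                           // self-closing tag: OK
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
    }
}
```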
DTD
Document Type Definition
A grammar for XML documents
Defines
  which elements can contain which other elements
  which attributes are allowed or required on which elements
DTD Example
<?xml encoding="US-ASCII"?>
<!ELEMENT menu (meal)*>
<!ATTLIST menu name CDATA #IMPLIED>
<!ELEMENT meal (food|drink)*>
<!ATTLIST meal name CDATA #REQUIRED>
<!ELEMENT food (#PCDATA)>
<!ELEMENT drink (#PCDATA)>
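A validating parser checks a document against a DTD like this one. The sketch below is an assumption-laden illustration, not the talk's own code. It uses JAXP's `setValidating(true)` and inlines the menu grammar as an internal DTD subset, so no `menu.dtd` file is needed. The grammar includes declarations for `food` and `drink`, because a validating parser rejects undeclared elements. JAXP reports validity errors through `ErrorHandler.error`, and by default they do not stop the parse, so the handler rethrows them. The class name `DtdValidation` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdValidation {
    // The menu grammar as an internal DTD subset (hypothetical stand-in for menu.dtd).
    static final String DOCTYPE =
        "<!DOCTYPE menu ["
      + "<!ELEMENT menu (meal)*>"
      + "<!ATTLIST menu name CDATA #IMPLIED>"
      + "<!ELEMENT meal (food|drink)*>"
      + "<!ATTLIST meal name CDATA #REQUIRED>"
      + "<!ELEMENT food (#PCDATA)>"
      + "<!ELEMENT drink (#PCDATA)>"
      + "]>";

    // True if the body is well-formed AND conforms to the DTD above.
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check structure against the DTD, not just syntax
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity errors arrive here; rethrow so they fail the parse.
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<menu><meal name=\"breakfast\"><food>Eggs</food></meal></menu>"));
        System.out.println(isValid("<menu><meal><food>Eggs</food></meal></menu>")); // name is #REQUIRED
    }
}
```

Both documents in `main` are well-formed. Only the first one is also valid.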
XML Namespaces
A single document can mix elements from multiple DTDs (vocabularies)
But! Two DTDs can use the same element name with different rules
Solution: namespaces
Prefix each tag name with a namespace prefix; an xmlns:prefix="URI" declaration binds that prefix to a unique URI
e.g. <xsl:apply-templates select="."/>
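To see how a parser tells two same-named elements apart, the sketch below parses a document containing two different `title` elements. It uses a namespace-aware JAXP parser and prints each element as `{namespace-URI}local-name`. The `urn:example:book` URI and the class name `NamespaceDemo` are hypothetical. The XHTML namespace URI is the real one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceDemo {
    // Two vocabularies that both define "title"; the book URI is made up.
    static final String SAMPLE =
        "<catalog xmlns:book=\"urn:example:book\" xmlns:html=\"http://www.w3.org/1999/xhtml\">"
      + "<html:title>My Catalog</html:title>"
      + "<book:title>Java and XML</book:title>"
      + "</catalog>";

    // Lists every element as {namespace-URI}local-name, in document order.
    static List<String> expandedNames(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP ignores namespaces unless asked
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            NodeList all = doc.getElementsByTagNameNS("*", "*");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                // The prefix is just shorthand; the URI is what identifies the vocabulary.
                names.add("{" + e.getNamespaceURI() + "}" + e.getLocalName());
            }
            return names;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        expandedNames(SAMPLE).forEach(System.out::println);
    }
}
```

The root `catalog` has no namespace, so its URI prints as `null`.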
Entities
Macros / constants
Values defined once, used throughout the document
<!DOCTYPE foo SYSTEM "foo.dtd" [
  <!ENTITY background "#99FFFF">
]>
<BODY BGCOLOR="&background;">
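The parser substitutes entity values before the application ever sees them. The minimal sketch below uses JAXP's DOM parser with an internal DTD subset only; no external `foo.dtd` is assumed. It shows the same entity expanded both inside an attribute and inside text content. The class name `EntityDemo` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class EntityDemo {
    // The internal DTD subset declares the entity, so no external DTD file is needed.
    static final String SAMPLE =
        "<!DOCTYPE BODY [<!ENTITY background \"#99FFFF\">]>"
      + "<BODY BGCOLOR=\"&background;\">Color: &background;</BODY>";

    // Returns {BGCOLOR attribute, text content} of the root element after parsing.
    static String[] expand(String xml) {
        try {
            Element body = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            // The parser has already replaced &background; with its value in both places.
            return new String[] { body.getAttribute("BGCOLOR"), body.getTextContent() };
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = expand(SAMPLE);
        System.out.println("BGCOLOR = " + r[0]); // #99FFFF
        System.out.println("text    = " + r[1]); // Color: #99FFFF
    }
}
```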
XSL
The eXtensible Stylesheet Language
Transforms XML into HTML
Actually, transforms XML into a tree, then turns that tree into another tree, then outputs that tree as XML (or HTML, or plain text)
XSL Architecture
[Diagram: XML Source + XSL Stylesheet -> XSL Processor -> HTML Output]
XML is a Tree
<?xml version="1.0"?>
<!DOCTYPE menu SYSTEM "menu.dtd">
<menu>
  <meal name="breakfast">
    <food>Scrambled Eggs</food>
    <food>Hash Browns</food>
    <drink>Orange Juice</drink>
  </meal>
  <meal name="snack">
    <food>Chips</food>
  </meal>
</menu>
As a tree:
menu
  meal (name = "breakfast")
    food: "Scrambled Eggs"
    food: "Hash Browns"
    drink: "Orange Juice"
  meal (name = "snack")
    food: "Chips"
XML Is A Tree
Nodes
  Branch nodes contain children
  Leaf nodes contain content
  Attributes, values, entities, etc.
DOM provides API-based access to tree models
XSL turns one tree into a different tree
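Walking the tree through the DOM API makes the branch/leaf distinction concrete. The sketch below assumes JAXP's DOM parser and the standard `org.w3c.dom` interfaces, plus Java 11+ for `String.repeat`. It prints each element (branch) with its attributes and each non-blank text node (leaf), indented by depth. `DomTreeWalk` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomTreeWalk {
    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder sb = new StringBuilder();
            walk(doc.getDocumentElement(), 0, sb);
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Depth-first walk: elements are branch nodes, text nodes are leaves.
    private static void walk(Node node, int depth, StringBuilder sb) {
        String indent = "  ".repeat(depth);
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            sb.append(indent).append(node.getNodeName());
            // Attributes hang off the element; they are not among its children.
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                sb.append(" @").append(a.getNodeName()).append("=\"").append(a.getNodeValue()).append('"');
            }
            sb.append('\n');
            for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
                walk(child, depth + 1, sb);
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) { // skip indentation-only whitespace
                sb.append(indent).append('"').append(text).append("\"\n");
            }
        }
    }

    public static void main(String[] args) {
        System.out.print(outline("<menu><meal name=\"breakfast\"><food>Scrambled Eggs</food>"
            + "<drink>Orange Juice</drink></meal></menu>"));
    }
}
```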
IBM LotusXSL
java com.lotus.xsl.xml4j.ProcessXSL -in servletfaq.xml -xsl faq.xsl -out faq.html
And so on
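For comparison, modern JDKs bundle an XSLT 1.0 processor behind the standard JAXP `javax.xml.transform` API. The sketch below does the same in/xsl/out job as the command line above, taking three positional arguments instead of `-in`/`-xsl`/`-out` flags. It is not LotusXSL's API, and `RunStylesheet` is a hypothetical name. Stylesheets run this way are assumed to use the XSLT 1.0 namespace `http://www.w3.org/1999/XSL/Transform`.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical stand-in for the ProcessXSL command line, using JAXP.
class RunStylesheet {
    static void transform(Source xml, Source xsl, Result out) {
        try {
            // Compile the stylesheet, then apply it to the source document.
            TransformerFactory.newInstance().newTransformer(xsl).transform(xml, out);
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    // Same thing for in-memory strings; handy for small experiments.
    static String transformString(String xml, String xsl) {
        StringWriter out = new StringWriter();
        transform(new StreamSource(new StringReader(xml)),
                  new StreamSource(new StringReader(xsl)),
                  new StreamResult(out));
        return out.toString();
    }

    // Usage: java RunStylesheet servletfaq.xml faq.xsl faq.html
    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: java RunStylesheet <in.xml> <style.xsl> <out.html>");
            System.exit(2);
        }
        transform(new StreamSource(new File(args[0])),
                  new StreamSource(new File(args[1])),
                  new StreamResult(new File(args[2])));
    }
}
```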
Formatting Objects
Forget about it for now
XSLT
The meat of XSL
Syntax for writing XSL stylesheets (template files)
Pattern matching
Output formatting
Rules-based (like Prolog)
XPath
The expressions inside the quotes of XSL match and select attributes
"/person/name/firstname"
A sensible way to locate content in an XML document
More straightforward than walking a DOM tree or waiting for a SAX callback
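The JDK also exposes XPath directly through `javax.xml.xpath`. The minimal sketch below evaluates the path above against a small, made-up `person` document. The class name `XPathFirstName` is hypothetical. One call replaces a hand-written loop over `getChildNodes()`.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathFirstName {
    // Evaluates an XPath expression against an XML string and returns its string value.
    static String evaluate(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("cannot evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<person><name><firstname>Alex</firstname>"
                   + "<lastname>Chaffee</lastname></name></person>";
        System.out.println(evaluate(xml, "/person/name/firstname")); // Alex
    }
}
```

A path that matches nothing yields the empty string rather than an error.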
XPath Syntax
book/title
title child of book child of current node
/book/title
title child of book child of document root
@language
language attribute of current node
chapter/@language
language attribute of chapter child of current node
book/*/title
all title children of all children of book (but not of their children)
chapter//para
all para descendants (at any depth) of the chapter children of the current node
../../title
title child of the parent of the parent (unabbreviated: parent::node()/parent::node()/child::title)
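To check the table above, the sketch below evaluates each expression against a small, made-up `book` document, switching the context node where the expression is relative. It assumes JAXP's `javax.xml.xpath` API. `XPathSyntaxDemo` and its sample data are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSyntaxDemo {
    // Hypothetical sample: a book with a chapter that contains a nested section.
    static final String SAMPLE =
        "<book language=\"en\"><title>XML Bible</title>"
      + "<chapter language=\"fr\"><title>Intro</title><para>p1</para>"
      + "<section><title>Details</title><para>p2</para></section>"
      + "</chapter></book>";

    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Number of nodes selected by expr, relative to the context node.
    static int count(Object context, String expr) {
        try {
            return ((NodeList) XPATH.evaluate(expr, context, XPathConstants.NODESET)).getLength();
        } catch (XPathExpressionException e) { throw new IllegalArgumentException(expr, e); }
    }

    // String value of the first selected node ("" if none).
    static String text(Object context, String expr) {
        try {
            return XPATH.evaluate(expr, context);
        } catch (XPathExpressionException e) { throw new IllegalArgumentException(expr, e); }
    }

    // First selected node, to use as a new context.
    static Node node(Object context, String expr) {
        try {
            return (Node) XPATH.evaluate(expr, context, XPathConstants.NODE);
        } catch (XPathExpressionException e) { throw new IllegalArgumentException(expr, e); }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        Node book = node(doc, "/book");
        Node p2 = node(doc, "//section/para");
        System.out.println(text(doc, "book/title"));          // XML Bible (context: document root)
        System.out.println(text(book, "@language"));          // en
        System.out.println(text(book, "chapter/@language"));  // fr
        System.out.println(count(doc, "book/*/title"));       // 1: Intro, but not Details
        System.out.println(count(book, "chapter//para"));     // 2: p1 and the nested p2
        System.out.println(text(p2, "../../title"));          // Intro
    }
}
```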
XPath Abbreviations
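.  = self::node() (the current node)
.. = parent::node()
// = /descendant-or-self::node()/ (any depth)
@  = attribute::
no axis at all = child::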
XPath Functions
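para[1] or para[position()=1]
the first para child of the current node
para[last()]
the last para child of the current node
para[count(child::note)>0]
all para children with one or more note children
id("abstract")
selects the element whose ID-typed attribute (declared as ID in the DTD) is "abstract", e.g. <para id="abstract">; inside a predicate, para[id("abstract")] only tests whether such an element exists
para[@type='secret'] or para[attribute::type='secret']
selects all para children like <para type="secret">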
para[lang("en")]
matches <para xml:lang="en-uk"></para>
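note[contains(., "alex")]
note children whose text contains "alex"; here . is the current note, and its string value includes the text of all its descendants, so the content of child elements is tested too
note[starts-with(., "hello")]
note children whose text starts with "hello"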
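The sketch below runs most of the predicates above against a small, made-up `chapter` document, using JAXP's `javax.xml.xpath`. It parses namespace-aware so that `xml:lang` is in the XML namespace, where `lang()` expects it. It leaves out `id()`, which only works when the DTD declares the attribute as type ID. `XPathPredicateDemo` and its sample text are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathPredicateDemo {
    static final String SAMPLE =
        "<chapter>"
      + "<para type=\"secret\" xml:lang=\"en-uk\">Top secret<note>ask alex</note></para>"
      + "<para>Plain text</para>"
      + "<para>hello, <em>alex</em></para>"
      + "</chapter>";

    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Parses SAMPLE namespace-aware (so lang() can see xml:lang) and returns <chapter>.
    static Node chapter() {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(SAMPLE)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static int count(Object context, String expr) {
        try {
            return ((NodeList) XPATH.evaluate(expr, context, XPathConstants.NODESET)).getLength();
        } catch (XPathExpressionException e) { throw new IllegalArgumentException(expr, e); }
    }

    static String text(Object context, String expr) {
        try {
            return XPATH.evaluate(expr, context);
        } catch (XPathExpressionException e) { throw new IllegalArgumentException(expr, e); }
    }

    public static void main(String[] args) {
        Node ch = chapter();
        String[] predicates = {
            "para[1]", "para[last()]", "para[count(child::note)>0]", "para[@type='secret']",
            "para[lang('en')]", "para[contains(., 'alex')]", "para[starts-with(., 'hello')]"
        };
        for (String p : predicates) {
            System.out.println(p + " -> " + count(ch, p) + " node(s)");
        }
    }
}
```

The `contains` case matches two paragraphs. In both, "alex" sits inside a child element (`note` or `em`), which is why `.` testing descendant text matters.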
XPath Disadvantages
Not XML
Not hierarchical
New syntax rules
Weird mix of /, [], (), *, :, ::, ., .., @
XSLT Syntax
XSL Rules
XSL is a series of rules, or templates
Each template matches an element
Templates contain output markup plus XSL instructions (which are themselves XML elements)
Usually the template calls apply-templates recursively on its children
If not, processing stops at that node (its children are never visited), but continues with its siblings
Default Rule
For a leaf (text) node, output its contents
For a branch node (the root or an element), apply templates to its children recursively (which may hit the default rule again)
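Two tiny stylesheets make the default rules and the "processing stops" behavior visible. The sketch below assumes the JDK's JAXP XSLT processor and uses text output so the result is easy to read. The first stylesheet has no templates, so only the built-in rules fire. The second matches `meal` without calling `apply-templates`. `DefaultRuleDemo` and the sample menu are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DefaultRuleDemo {
    static final String XSL_OPEN =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>";

    // No templates at all: only the built-in (default) rules run.
    static final String DEFAULTS_ONLY = XSL_OPEN + "</xsl:stylesheet>";

    // A meal template that does NOT call apply-templates: its children are never visited.
    static final String STOP_AT_MEAL = XSL_OPEN
      + "<xsl:template match=\"meal\">[<xsl:value-of select=\"@name\"/>]</xsl:template>"
      + "</xsl:stylesheet>";

    static final String MENU =
        "<menu><meal name=\"breakfast\"><food>Eggs</food></meal>"
      + "<meal name=\"snack\"><food>Chips</food></meal></menu>";

    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(MENU, DEFAULTS_ONLY)); // EggsChips: text kept, tags and attributes dropped
        System.out.println(transform(MENU, STOP_AT_MEAL));  // [breakfast][snack]: siblings still processed
    }
}
```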
xsl:if
executes conditionally
xsl:number
counts the position of an element within its group; good for ordered-list numbering, tables of contents, etc.
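The sketch below uses both instructions in one template, run through the JDK's JAXP XSLT processor. `xsl:number` supplies the item's position among its `food` siblings, and `xsl:if` appends a marker only when a (hypothetical) `spicy` attribute is present. `NumberIfDemo` and its sample data are made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NumberIfDemo {
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"food\">"
      + "<xsl:number/>. <xsl:value-of select=\".\"/>"          // 1-based position among food siblings
      + "<xsl:if test=\"@spicy\"> (spicy)</xsl:if>"            // only when the attribute exists
      + "<xsl:text>&#10;</xsl:text>"                           // newline
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static final String MEAL =
        "<meal><food>Eggs</food><food spicy=\"yes\">Salsa</food><food>Toast</food></meal>";

    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(transform(MEAL, XSL));
    }
}
```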
XSL Example
<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
  <!ENTITY background "#99FFFF">
]>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
Example (cont.)
<xsl:template match="menu">
  <HTML>
    <HEAD>
      <TITLE>Menu: <xsl:value-of select="@name"/></TITLE>
    </HEAD>
    <BODY BGCOLOR="&background;">
      <H1>Menu <xsl:value-of select="@name"/></H1>
[Note: Can reuse contents, unlike CSS]
Example (cont.)
      <xsl:apply-templates/>
    </BODY>
  </HTML>
</xsl:template>
Example (cont.)
<xsl:template match="meal">
  <H2><xsl:value-of select="@name"/></H2><br/>
  <UL>
    <xsl:apply-templates/>
  </UL>
</xsl:template>
Example (cont.)
<xsl:template match="food">
  <LI><xsl:apply-templates/></LI>
</xsl:template>
<xsl:template match="drink">
  <LI><xsl:apply-templates/></LI>
</xsl:template>
</xsl:stylesheet>
Outputting Attributes
From This:
<link>
  <name>Stinky</name>
  <url>http://www.stinky.com/</url>
</link>
We Want This:
<A href="http://www.stinky.com/">Stinky</A>
Outputting Attributes
The Hard Way:
<xsl:element name="A">
  <xsl:attribute name="href">
    <xsl:value-of select="url"/>
  </xsl:attribute>
  <xsl:value-of select="name"/>
</xsl:element>
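The shorter alternative in XSLT 1.0 is an attribute value template: inside a literal attribute, `{url}` is evaluated as an XPath expression. The sketch below runs the hard-way template and an attribute-value-template version through the JDK's JAXP processor and shows that both produce the same link. `AttributeOutputDemo` is a hypothetical name, and the XML output method is chosen here only so the result prints exactly as markup.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AttributeOutputDemo {
    static final String HEAD =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>";
    static final String TAIL = "</xsl:stylesheet>";

    // The hard way: build the element and attribute with XSL instructions.
    static final String HARD_WAY = HEAD
      + "<xsl:template match=\"link\">"
      + "<xsl:element name=\"A\">"
      + "<xsl:attribute name=\"href\"><xsl:value-of select=\"url\"/></xsl:attribute>"
      + "<xsl:value-of select=\"name\"/>"
      + "</xsl:element>"
      + "</xsl:template>" + TAIL;

    // Attribute value template: {url} is evaluated inside a literal attribute.
    static final String EASY_WAY = HEAD
      + "<xsl:template match=\"link\">"
      + "<A href=\"{url}\"><xsl:value-of select=\"name\"/></A>"
      + "</xsl:template>" + TAIL;

    static final String LINK =
        "<link><name>Stinky</name><url>http://www.stinky.com/</url></link>";

    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(LINK, HARD_WAY));
        System.out.println(transform(LINK, EASY_WAY));
    }
}
```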
Copying Subtrees
<xsl:template match="*|@*|text()">
  <xsl:copy>
    <xsl:apply-templates select="*|@*|text()"/>
  </xsl:copy>
</xsl:template>
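How it works: the template matches every element, attribute, and text node; xsl:copy copies the current node without its contents, then apply-templates recurses into its attributes and children, so the whole subtree is rebuilt
Compare the default rules, which copy only text and strip all tags/attributes
Also xsl:copy-of, which copies a whole subtree in one step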
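The copying template can be checked with the JDK's JAXP processor. The sketch below runs that exact template and a one-line `xsl:copy-of` alternative over a sample `link` element. Both reproduce the input unchanged. `IdentityCopyDemo` is a hypothetical name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IdentityCopyDemo {
    static final String HEAD =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>";

    // Node-by-node copy: each matched node is copied shallowly, then its attributes/children.
    static final String IDENTITY = HEAD
      + "<xsl:template match=\"*|@*|text()\">"
      + "<xsl:copy><xsl:apply-templates select=\"*|@*|text()\"/></xsl:copy>"
      + "</xsl:template></xsl:stylesheet>";

    // Deep copy in one instruction.
    static final String COPY_OF = HEAD
      + "<xsl:template match=\"/\"><xsl:copy-of select=\"*\"/></xsl:template>"
      + "</xsl:stylesheet>";

    static final String INPUT =
        "<link type=\"external\"><name>Stinky</name><url>http://www.stinky.com/</url></link>";

    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(INPUT, IDENTITY));
        System.out.println(transform(INPUT, COPY_OF));
    }
}
```

Overriding one template (say, for `url`) on top of the identity template is the usual way to change one part of a document and pass everything else through.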
XSL conditionals: if
<xsl:if test="author">
  by <xsl:apply-templates select="author"/>
</xsl:if>
Note: there is no xsl:else; for if/else logic, use xsl:choose with xsl:when and xsl:otherwise
Case 2
From This: <link> <url>http://www.stinky.com/</url> </link>
We Want This: <A href="http://www.stinky.com/">http://www.stinky.com/</A>
Case 3
From This: <link> <name>Stinky</name> </link>
We Want This: Stinky
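Since XSL has no else, one way to handle all three link cases in a single template is `xsl:choose`. The sketch below assumes the JDK's JAXP processor and uses attribute value templates (`{url}`) for the `href`. The branch order matters, because the first `xsl:when` whose test is true wins. `LinkCasesDemo` is a hypothetical name, and this is one possible design, not the talk's.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class LinkCasesDemo {
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"link\">"
      + "<xsl:choose>"
      // Case 1: name and url -> anchor labeled with the name
      + "<xsl:when test=\"url and name\"><A href=\"{url}\"><xsl:value-of select=\"name\"/></A></xsl:when>"
      // Case 2: url only -> anchor labeled with the url itself
      + "<xsl:when test=\"url\"><A href=\"{url}\"><xsl:value-of select=\"url\"/></A></xsl:when>"
      // Case 3: name only -> plain text
      + "<xsl:otherwise><xsl:value-of select=\"name\"/></xsl:otherwise>"
      + "</xsl:choose>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String render(String linkXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(linkXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("<link><name>Stinky</name><url>http://www.stinky.com/</url></link>"));
        System.out.println(render("<link><url>http://www.stinky.com/</url></link>"));
        System.out.println(render("<link><name>Stinky</name></link>"));
    }
}
```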
Template Modes
Same element name, different context -> different template, different output
Invoke apply-templates with a mode; it matches only templates declared with the same mode
<h1>Table of Contents</h1>
<ol>
  <xsl:apply-templates select="chapter" mode="toc"/>
</ol>

<xsl:template match="chapter" mode="toc">
  <li><xsl:value-of select="@title"/></li>
</xsl:template>

<xsl:template match="chapter">
  <h1><xsl:value-of select="@title"/></h1>
  <xsl:apply-templates/>
</xsl:template>
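The sketch below runs a complete version of the table-of-contents idea through the JDK's JAXP processor. The same `chapter` elements are processed twice: once in `toc` mode to produce list items, and once normally to produce headings and body text. The wrapping `div`, the `/book` root template, and the sample chapters are hypothetical additions needed to make a runnable stylesheet, and `ModesDemo` is a made-up name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ModesDemo {
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/book\">"
      + "<div><h1>Table of Contents</h1>"
      + "<ol><xsl:apply-templates select=\"chapter\" mode=\"toc\"/></ol>"  // first pass: TOC
      + "<xsl:apply-templates select=\"chapter\"/>"                         // second pass: body
      + "</div>"
      + "</xsl:template>"
      + "<xsl:template match=\"chapter\" mode=\"toc\"><li><xsl:value-of select=\"@title\"/></li></xsl:template>"
      + "<xsl:template match=\"chapter\"><h1><xsl:value-of select=\"@title\"/></h1><xsl:apply-templates/></xsl:template>"
      + "</xsl:stylesheet>";

    static final String BOOK =
        "<book><chapter title=\"Intro\">Welcome.</chapter><chapter title=\"XPath\">Paths.</chapter></book>";

    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(BOOK, XSL));
    }
}
```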
XSL Disadvantages
Confusing syntax and semantics
Like Prolog + C + XML
It's really a programming language, but using markup-language syntax (yuck!)
Hard to debug
XSL Trace helps a little
Links: XML
XML Spec
http://www.w3.org/TR/REC-xml
XML FAQ
http://www.ucc.ie/xml/
XML.com
http://www.xml.com/
Q&A