eXtensible Markup Language version 1.0
Recommendation, February 1998

An Introduction to XML
Patrice Bonhomme & Laurent Romary
Lucid-IT LORIA
bonhomme@lucid-it.com
romary@loria.fr
Objectives
! Understanding the basic concepts of XML
! Elements, attributes and content
! DTDs (and Schemas)
! Namespaces
! An overview of the main associated
recommendations:
! XML path language (XPath)
! XML pointers and links (XPointer and XLink)
! The transformation language of XSL (eXtensible
Stylesheet Language)
XML in the document chain
[Figure: the document chain, from data structures through data processing to the user perspective]
! Conception: DTD/Schema
! Edition: XML
! Transformation: XSL/XSLT
! Consultation: XML, HTML, XHTML
A quick historical overview
! 1986
! SGML (Standard Generalized Markup Language)
! ISO standard: ISO 8879:1986
! 1987
! TEI (Text Encoding Initiative)
! 1990
! HTML 1.0 (HyperText Markup Language)
! 1997/1998
! XML 1.0 (eXtensible Markup Language)
What XML is:
! XML: eXtensible Markup Language
! A W3C (World Wide Web Consortium)
Recommendation
! A meta-language: it lets you define your
own markup language
! A simplification of the SGML standard
! SGML was intended to represent the “logical”
structure of a document
! HTML was conceived as an application of SGML
A simplified SGML
! An XML document is an SGML document
! With some slight (but essential) differences...
! XML has the expressive power of SGML
without its complexity
! Opens the door to the transmission of
structured documents on the web
! Databases also entered the game...
What can we do with it?
! Data modeling (in complement to UML for
instance)
! Publication of structured data on the web
! Separation of the logical structure of a
document from its actual presentation
! Distributed applications (cf. well-formed vs.
valid documents)
! Integrating data from heterogeneous sources
Why can’t we avoid it?
! Simplicity, which makes it easy to integrate into any
kind of application
! XML specification = 36 pages
! SGML standard, ISO 8879 = 250 pages
! Wide variety of applications already implemented
! Industry: publishing, databases, cataloguing, e-business, etc.
! Science, research: genomics, astronomy, maths, etc.
! Consequence:
! A lot of software available: editors, parsers, bridges from and to
existing editing environments or DBMSs
From HTML to XML - 1
! A simple HTML document:

<B> Patrice Bonhomme </B>


<P>
Patrice.Bonhomme@loria.fr <BR>
tél : 03 83 59 30 52 <BR>
fax : 03 83 41 30 79 <BR>
équipe : Langue et Dialogue (<I>LORIA</I>)<BR>
From HTML to XML - 2
! The XML way:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE MEMBRE SYSTEM "http://…/MEMBRE.dtd">
<!-- A member of LORIA -->
<MEMBRE TYPE="IE" ID="M28">
  <NOM> BONHOMME </NOM>
  <PRENOM> Patrice </PRENOM>
  <MEL> Patrice.Bonhomme@loria.fr </MEL>
  <TEL> 03 83 59 30 52 </TEL>
  <FAX> 03 83 41 30 79 </FAX>
  <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
</MEMBRE>
Some properties of XML
! Emphasis should be put on the “semantics” of
a document
! Underlying model: tree structure
! Possibility to imagine a script language to
access any part of an XML document
e.g.: DB/MEMBRE[28]/MEL/text()
! XML supports Unicode character encodings
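The tree model and the encoding support listed above can be sketched in a few lines of Python. This is only an illustration using the standard `xml.etree.ElementTree` module; the `outline` helper and the shortened sample record are assumptions made for this example, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Bytes as they would be stored on disk, in the encoding named by the
# XML declaration. "\u00e9" is the letter e with an acute accent.
RAW = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    '<MEMBRE TYPE="IE" ID="M28">'
    '<NOM>BONHOMME</NOM>'
    '<EQUIPE LAB="LORIA">\u00e9quipe Langue et Dialogue</EQUIPE>'
    '</MEMBRE>'
).encode("iso-8859-1")


def outline(element, depth=0):
    """Return one indented line per element, in depth-first order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(RAW)
tree_lines = outline(root)        # the document seen as a tree
team = root.find("EQUIPE").text   # decoded to a Unicode string by the parser
```

The parser reads the `encoding` declaration itself, so the application only ever sees Unicode text.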
Elements and their content
<MEMBRE TYPE="IE" ID="M28">
  <LOGIN ID="bonhomme"/>
  <NOM> BONHOMME </NOM>
  <PRENOM> Patrice </PRENOM>
  <MEL> Patrice.Bonhomme@loria.fr </MEL>
  <TEL> 03 83 59 30 52 </TEL>
  <FAX> 03 83 41 30 79 </FAX>
  <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
</MEMBRE>
! Opening tag: <MEMBRE TYPE="IE" ID="M28">
! Empty element: <LOGIN ID="bonhomme"/>
! Element: <FAX> 03 83 41 30 79 </FAX>
! Textual content: Langue et Dialogue
! Closing tag: </MEMBRE>
Elements and their attributes
<MEMBRE TYPE="IE" ID="M28">
  <LOGIN ID="bonhomme"/>
  <NOM> BONHOMME </NOM>
  <PRENOM> Patrice </PRENOM>
  <MEL> Patrice.Bonhomme@loria.fr </MEL>
  <TEL> 03 83 59 30 52 </TEL>
  <FAX> 03 83 41 30 79 </FAX>
  <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
</MEMBRE>
! Attribute name: e.g. TYPE, ID, LAB
! Attribute value: e.g. "IE", "M28", "LORIA"
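To make the vocabulary concrete, here is a minimal sketch of how a program might read element names, attributes and textual content. It uses Python's standard `xml.etree.ElementTree` module; the variable names and the shortened record are assumptions chosen for this example.

```python
import xml.etree.ElementTree as ET

member = ET.fromstring(
    '<MEMBRE TYPE="IE" ID="M28">'
    '<LOGIN ID="bonhomme"/>'
    '<NOM> BONHOMME </NOM>'
    '<EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>'
    '</MEMBRE>'
)

element_name = member.tag                 # name taken from the opening tag
attributes = dict(member.attrib)          # attribute name -> attribute value

login = member.find("LOGIN")
# An empty element has neither textual content nor child elements.
login_is_empty = login.text is None and len(login) == 0

raw_name = member.find("NOM").text        # surrounding spaces are content too
name = raw_name.strip()
lab = member.find("EQUIPE").get("LAB")    # value of one attribute
```

Note that the spaces around BONHOMME belong to the textual content; the application decides whether to strip them.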
Other features
! XML declaration
<?xml version="1.0"?>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
! Comments
<!-- this is a comment -->
! CDATA section
<![CDATA[Langue & Dialogue]]>
! Processing instruction (application specific)
<?edit line="wrap"?>
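As a small illustration of what a CDATA section buys you, the sketch below compares it with the escaped form `&amp;` and with a bare ampersand. It relies on Python's standard `xml.etree.ElementTree` module; the wrapping `EQUIPE` element is an assumption added so that each fragment is a complete document.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, "&" is taken literally.
with_cdata = ET.fromstring("<EQUIPE><![CDATA[Langue & Dialogue]]></EQUIPE>")

# Outside CDATA, the same character must be written as an entity reference.
with_escape = ET.fromstring("<EQUIPE>Langue &amp; Dialogue</EQUIPE>")

# A bare ampersand in ordinary content makes the document ill-formed.
try:
    ET.fromstring("<EQUIPE>Langue & Dialogue</EQUIPE>")
    bare_ampersand_accepted = True
except ET.ParseError:
    bare_ampersand_accepted = False
```

Both accepted forms give the application the same text, so CDATA is a convenience for authors rather than a different kind of content.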
From one document to a class…
! How do I know the structure of my document?
! How may I share this structure with others?
Document Type Definition
! Expresses constraints on:
! Allowed element and attribute names
! Possible content of a given element (“content
model”)
! To which elements a given attribute can be
attached
! Similar to the traditional SGML approach, but:
! Simplified syntax
! The DTD is optional for a document
Example
<!ELEMENT MEMBRE (LOGIN, NOM?, PRENOM?, MEL, TEL+, FAX*, EQUIPE)>
<!ELEMENT LOGIN EMPTY>
<!ATTLIST LOGIN ID ID #REQUIRED>
<!ELEMENT NOM (#PCDATA)>
...
<!ENTITY W3C "World Wide Web Consortium">
<!ENTITY chap1 SYSTEM "http://…/chapitre-1.xml">
<!ENTITY img2 SYSTEM "image2.gif" NDATA gif>
...
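The occurrence indicators in the MEMBRE content model (`?` optional, `+` one or more, `*` zero or more, `,` sequence) behave much like a regular expression over child element names. The sketch below makes that analogy explicit in Python. It is not a DTD validator: the regular expression is a hand translation of this one content model, and `follows_membre_model` is a hypothetical helper written for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of (LOGIN, NOM?, PRENOM?, MEL, TEL+, FAX*, EQUIPE).
# Each child name is followed by a comma so that names cannot run together.
MEMBRE_MODEL = re.compile(r"LOGIN,(NOM,)?(PRENOM,)?MEL,(TEL,)+(FAX,)*EQUIPE,")


def follows_membre_model(xml_text):
    """Check the sequence of child element names against the content model."""
    element = ET.fromstring(xml_text)
    child_names = "".join(child.tag + "," for child in element)
    return MEMBRE_MODEL.fullmatch(child_names) is not None


ok = follows_membre_model(
    "<MEMBRE><LOGIN/><MEL/><TEL/><TEL/><EQUIPE/></MEMBRE>"
)
missing_tel = follows_membre_model(
    "<MEMBRE><LOGIN/><MEL/><EQUIPE/></MEMBRE>"
)
wrong_order = follows_membre_model(
    "<MEMBRE><MEL/><LOGIN/><TEL/><EQUIPE/></MEMBRE>"
)
```

A real validating parser also checks attributes, entity declarations and the content of every child element, which this sketch ignores.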
Using a DTD
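! External DTD, referenced from the document:
<!DOCTYPE MEMBRE SYSTEM "http://…/MEMBRE.dtd">
<MEMBRE TYPE="IE" ID="M28">
  …
</MEMBRE>
! Internal DTD subset, declared inside the document:
<!DOCTYPE MEMBRE [
  <!ELEMENT MEMBRE … >
  …
]>
<MEMBRE TYPE="IE" ID="M28">
  …
</MEMBRE>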
Valid vs. Well-formed
! Well-formed documents
! Tags are properly nested and balanced; no DTD is needed
! Empty element:
<toto></toto> = <toto/>
! Valid documents
! Well-formed and conforming to a DTD (à la SGML)
! Essential difference with SGML
! Extracting and re-using document fragments
! One usually produces valid documents and distributes well-formed ones
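Well-formedness can be checked by any XML parser without a DTD. A minimal sketch in Python, using the standard `xml.etree.ElementTree` module (a non-validating parser); `is_well_formed` is a hypothetical helper name chosen for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML; no DTD is consulted."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


both_empty_forms = is_well_formed("<toto></toto>") and is_well_formed("<toto/>")
crossed_tags = is_well_formed("<a><b></a></b>")   # tags overlap
unclosed = is_well_formed("<a>")                  # closing tag missing
```

Checking validity is a separate step that requires a validating parser and the DTD itself.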
XML namespaces
! Objectives: avoid conflicts between element and
attribute names coming from various sources
! Composite documents
! XSLT instructions, Schema declarations
! Declaration:
<DOC xmlns:mml="http://www.w3.org/Math/MathML/"
xmlns="http://www.ua99.net/DOC/1.0">
<P>blah blah :
<mml:fn mml:definitionURL="mydef.xml">

</mml:fn> re blah blah</P>
</DOC>
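A sketch of how a namespace-aware parser sees this kind of document, written in Python with the standard `xml.etree.ElementTree` module. The document is a shortened copy of the example above, with the elided content of `mml:fn` left out; the `doc` prefix in the lookup table is an arbitrary choice made for this example.

```python
import xml.etree.ElementTree as ET

DOC = (
    '<DOC xmlns:mml="http://www.w3.org/Math/MathML/" '
    'xmlns="http://www.ua99.net/DOC/1.0">'
    '<P>blah blah : <mml:fn mml:definitionURL="mydef.xml"/> re blah blah</P>'
    '</DOC>'
)

root = ET.fromstring(DOC)

# Prefixes are local shorthand; what identifies a name is the namespace URI.
NS = {
    "doc": "http://www.ua99.net/DOC/1.0",
    "mml": "http://www.w3.org/Math/MathML/",
}
fn = root.find("doc:P/mml:fn", NS)

root_name = root.tag   # ElementTree writes qualified names as {uri}local
fn_name = fn.tag
definition = fn.get("{http://www.w3.org/Math/MathML/}definitionURL")
```

Because the parser expands every prefix to its URI, two documents may use different prefixes for the same namespace and still be processed identically.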
Reserved namespaces
! The xml: prefix is reserved by the W3C for specific
attributes:
<title xml:space="default">...</title>
<p xml:lang="FR">…</p>
XPath
! XML Path Language (XPath) 1.0, W3C Recommendation of 16 November 1999
! General-purpose syntax for addressing sub-parts of an
XML document
! Common specification shared by XML pointers
(XPointer) and the XSLT
transformation language
! Allows one to access, select and filter XML
fragments (cf. Tree representation of an XML
document)
Addressing nodes in XPath
! Absolute addressing
! Given: a URL
! id('M28'), / (the document root)
! Relative addressing along axes
! Given: a node
! ancestor, child
! descendant
! preceding-sibling, following-sibling
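The axes listed above are simply directions of travel in the document tree. The sketch below spells them out in plain Python over an `xml.etree.ElementTree` tree. The helper functions are assumptions written for this example, not XPath itself, and the sample data is a shortened two-member database.

```python
import xml.etree.ElementTree as ET

DB = ET.fromstring(
    '<DB>'
    '<MEMBRE TYPE="IE" ID="M28"><LOGIN ID="bonhomme"/>'
    '<EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE></MEMBRE>'
    '<MEMBRE TYPE="CR" ID="M14"><LOGIN ID="romary"/></MEMBRE>'
    '</DB>'
)

# ElementTree elements do not store their parent, so build a child -> parent map.
PARENT = {child: parent for parent in DB.iter() for child in parent}


def children(node):
    return list(node)


def descendants(node):
    return [n for n in node.iter() if n is not node]


def ancestors(node):
    """Nearest ancestor first, root last."""
    result = []
    while node in PARENT:
        node = PARENT[node]
        result.append(node)
    return result


def siblings_after(node):
    same_level = list(PARENT[node]) if node in PARENT else [node]
    return same_level[same_level.index(node) + 1:]


def siblings_before(node):
    """Earlier siblings, listed in document order."""
    same_level = list(PARENT[node]) if node in PARENT else [node]
    return same_level[:same_level.index(node)]


first_member = DB[0]
first_login = first_member[0]
```

For instance, `ancestors(first_login)` walks from the LOGIN element up to MEMBRE and then DB, which is what the ancestor axis does from that node.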
An XML document represents a hierarchical structure
The only view you should ever, ever have of an XML document:

MEMBRE (TYPE="IE", ID="M28")
  LOGIN (ID="bonhomme")
  NOM: BONHOMME
  ...
  EQUIPE (LAB="LORIA"): Langue et Dialogue
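XPath - Examples
<DB>
  <MEMBRE TYPE="IE" ID="M28">
    <LOGIN ID="bonhomme"/>
    ...
    <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
  </MEMBRE>
  <MEMBRE TYPE="CR" ID="M14">
    <LOGIN ID="romary"/>
    ...
  </MEMBRE>
</DB>

! / or /DB: the document root, or its DB element
! /DB/MEMBRE: every MEMBRE child of DB
! /DB/MEMBRE[2]: the second MEMBRE
! /DB/MEMBRE[@ID='M28']/EQUIPE[1]/text(): the text of the first EQUIPE of member M28
! /DB/MEMBRE/LOGIN[@ID='romary']/../@ID: the ID of the member whose login is romary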
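These expressions can be tried out with Python's standard `xml.etree.ElementTree` module, with one caveat: it implements only a subset of XPath. Paths are written relative to the element you start from (here the DB element), and the final `text()` and `@ID` steps are replaced by `.text` and `.get("ID")`. The sample data below is the two-member database of the example, with the elided parts left out.

```python
import xml.etree.ElementTree as ET

db = ET.fromstring(
    '<DB>'
    '<MEMBRE TYPE="IE" ID="M28"><LOGIN ID="bonhomme"/>'
    '<EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE></MEMBRE>'
    '<MEMBRE TYPE="CR" ID="M14"><LOGIN ID="romary"/></MEMBRE>'
    '</DB>'
)

# /DB/MEMBRE
all_members = db.findall("MEMBRE")

# /DB/MEMBRE[2]  (positions start at 1, as in XPath)
second_member = db.find("MEMBRE[2]")

# /DB/MEMBRE[@ID='M28']/EQUIPE[1]/text()
team = db.find("MEMBRE[@ID='M28']/EQUIPE[1]").text

# /DB/MEMBRE/LOGIN[@ID='romary']/../@ID
owner_id = db.find("MEMBRE/LOGIN[@ID='romary']/..").get("ID")
```

A full XPath 1.0 engine would accept the expressions exactly as written on the slide, including the leading `/`.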
XPointer
! In HTML, anchors are needed in the target document:
<A NAME="toto">
http://www.titi.fr/index.html#toto
! In XML, pointers can directly address a
document component:
http://…/doc.xml#xptr(id(M28))
http://…/doc.xml#xptr(/DB/MEMBRE[28]/MEL)
! Advantage: no need to modify the target
document (notion of primary source)
XLink
! In HTML: the elements which may carry links are
known:
<A>, <IMG>, ...
! In XML: any element may carry a simple or
complex link
! This is done by using pre-defined attributes:
<a xlink:type="simple"
xlink:href="http://www.w3.org/">W3C</a>
Visualizing XML documents
! Basically, an XML document does not provide
any information about its presentation
! Visualizing a document may depend on the
target audience, device etc.
! Stylesheets:
! Cascading Style Sheets (CSS 1 and 2)
! Extensible Stylesheet Language (XSL) >> XSLT
eXtensible Stylesheet Language
! Describes the way a
document will be shown,
printed or verbalized…
[Figure: XML + XSL]
XSL: a two-fold proposal
! XSL = Transformations + Visualizing properties
! XSLT: transformation of XML documents
! Allows one to transform an XML document into another
XML document
! Use this to produce well-formed (!) HTML documents
! XSL FO: formatting XML data
! FO = Formatting Objects
! Is supposed to be application independent (Word/RTF, PS,
PDF, MIF, …)
! Not a recommendation yet :-(
General structure of an XSL document
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">

  </xsl:template>

  <xsl:template match="NOM">

  </xsl:template>
</xsl:stylesheet>
Declarative approach
! Sequence of rules (templates) specifying:
! The pattern (XPath) of nodes to which the rule can
be applied
! Actions to be undertaken:
! Elements to be generated in the target document
! Selection of the elements to be further explored in the
source document
! Additional functionalities: testing, sorting, etc.
A simple rule
<xsl:template match='/DB/MEMBRE/NOM'>
  <B>
    <xsl:apply-templates/>
  </B>
</xsl:template>
! Pattern (XPath): /DB/MEMBRE/NOM
! HTML element to be produced: <B>
! The content of <B> will be the one produced by the
<xsl:apply-templates/> instruction
Creating an HTML core document
<xsl:template match="/">
  <HTML>
    <HEAD>
      <TITLE>My directory</TITLE>
    </HEAD>
    <BODY>
      <xsl:apply-templates/>
    </BODY>
  </HTML>
</xsl:template>
Selecting the nodes to be explored
<xsl:template match="MEMBRE">
  <P>
    <xsl:apply-templates select="NOM"/>
    <xsl:text> - </xsl:text>
    <xsl:apply-templates select="EQUIPE"/>
  </P>
</xsl:template>
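To show how template rules cooperate, the sketch below imitates the three rules seen so far (the root rule, the MEMBRE rule and the NOM rule) in plain Python. It is not an XSLT processor, which the Python standard library used here does not provide. Rules are looked up by element name only, whereas XSLT matches full patterns, and all function names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SOURCE = (
    '<DB><MEMBRE TYPE="IE" ID="M28">'
    '<NOM>BONHOMME</NOM>'
    '<EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>'
    '</MEMBRE></DB>'
)


def default_rule(node):
    """Roughly XSLT's built-in rule: copy text, recurse into children."""
    parts = [escape(node.text or "")]
    for child in node:
        parts.append(apply_templates(child))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def nom_rule(node):
    # Imitates the NOM rule: wrap whatever the children produce in <B>.
    return "<B>" + default_rule(node) + "</B>"


def membre_rule(node):
    # Imitates the MEMBRE rule: explore only NOM and EQUIPE, in that order.
    nom = "".join(apply_templates(n) for n in node.findall("NOM"))
    equipe = "".join(apply_templates(n) for n in node.findall("EQUIPE"))
    return "<P>" + nom + " - " + equipe + "</P>"


TEMPLATES = {"NOM": nom_rule, "MEMBRE": membre_rule}


def apply_templates(node):
    """Pick the rule registered for this element, else the default rule."""
    return TEMPLATES.get(node.tag, default_rule)(node)


def transform(xml_text):
    # Imitates the rule for "/": emit the HTML skeleton around the result.
    body = apply_templates(ET.fromstring(xml_text))
    return ("<HTML><HEAD><TITLE>My directory</TITLE></HEAD><BODY>"
            + body + "</BODY></HTML>")


html = transform(SOURCE)
```

The point of the declarative style is visible here: each rule only states what to emit for one kind of node and which nodes to explore next, while `apply_templates` decides which rule fires.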
Conclusion
! XML - a practical format (protocol)
! Next steps:
! Sharing DTDs, resources and tools
! Generic mechanisms for handling families of
documents (cf. Nancy’s presentation)
References
www.oasis-open.org/cover/
www.w3.org/XML/
www.w3.org/TR
www.w3.org/TR/REC-xml
babel.alis.com/web_ml/xml/REC-xml.fr.html
www.xml.com
www.xmlinfo.com
xml.apache.org
