XML

eXtensible Markup Language

Objectives

Introduce XML, including:
    XML Document Basics
    XML Schema
    XML Stylesheets & Transformations (XSL/T)
Explore the XML support in .NET

Contents

Have a Look Back: The pre-XML world
The XML Architecture
    XML & XML Document Basics
    XML Schemata
    Stylesheets & Transformations
.NET Framework Support for XML
    System.Xml and sub-namespaces

Looking Back

Tightly coupled systems and communication
    Proprietary, closed protocols and methods
Data sharing between 3rd party solutions unwieldy
    Non-extensible solutions

XML!

XML technologies introduced:
    XML 1.0 - Document Basics
    XML Schemata
    XSLT: Style sheets and Transformations
.NET & XML:
    The System.Xml Namespace

XML 1.0 - Document Basics

What is XML?
XML Tags and Tag Sets
Components of an XML Document
    Document Instance
XML Document by Example
The XML Parser

What is XML? 1/2

Stands for Extensible Markup Language
Language specification for describing data
    Syntax rules
    Syntax & Grammar for creating Document Type Definitions
Widely used and open standard
    Defined by the World Wide Web Consortium (W3C)
    http://www.w3.org/TR/2000/REC-xml-20001006

What is XML? 2/2

Designed for describing and interchanging data
    Data is logically structured
    Human readable, writeable and understandable text file!
    Easy to Parse; Easy to Read; and Easy to Write!
Metadata:
    Data that describes data; data with semantics
Looks like HTML, but it isn't!
    Uses tags to delimit data and create structure
    Does not specify how to display the data

XML Tag-Sets

Begin with <someTag> and end with </someTag>
    Can have an empty element: <someTag />
Exceptions are:
    XML document declaration: <?xml ... ?>
    Comments: <!-- some comment -->
    The document type declaration: <!DOCTYPE [ ... ]>
    Definition of document elements in an Internal DTD: <!ELEMENT>, <!ATTLIST>, etc
Promote logical structuring of documents and data
    User definable
    Create hierarchically nested structure

Components of an XML Document 1/3

XML Processing Instruction
Document Type Declaration
Document Instance

Components of an XML Document 2/3

XML Processing Instruction
    <?xml version="1.0" encoding="UTF-8" ?>
    version information
    encoding type: UTF-8, UTF-16, ISO-10646-UCS-2, etc
    standalone declaration; indicates if there are external file references
Namespace declaration(s), Processing Instructions (for applications), etc

Components of an XML Document 3/3

Document Type Declaration. Two types:
    An Internal declaration
        <!DOCTYPE CustomerOrder [ <!-- internal DTD goes here! --> ]>
    An External reference
        <!DOCTYPE CustomerOrder SYSTEM "http://www.myco.com/CustOrder.dtd">
Document Instance
    This is the XML document instance
    Read as: the XML-ized data

Document Instance: The Markup

Document Root Element
    Required if a document type declaration exists
    Must have the same name as the declaration
Elements
    Can contain other elements
    Can have attributes assigned to them
    May or may not have a value
Attributes
    Properties that are assigned to elements
    Provide additional element information

XML By Example: A Document

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE CustomerOrder SYSTEM "http://www.myco.com/dtd/order.dtd">
<CustomerOrder>
    <Customer>
        <Person>
            <FName> Olaf </FName>
            <LName> Smith </LName>
        </Person>
        <Address AddrType="shipping"> 91 Park So, New York, NY 10018 </Address>
        <Address AddrType="billing"> Hauptstrasse 55, D-81671 Munich </Address>
    </Customer>
    <Orders>
        <OrderNo> 10 </OrderNo>
        <ProductNo> 100 </ProductNo>
        <ProductNo> 200 </ProductNo>
    </Orders>
    <!-- More <Customer>s ... -->
</CustomerOrder>

XML Data + DTD

DTD
    <!ELEMENT a (b+, c?) >
    <!ELEMENT b (#PCDATA) >
    <!ELEMENT c (#PCDATA) >

Not Valid! (c? allows at most one <c>)
    <!-- XML Data -->
    <a>
        <b> Some </b>
        <c> 100 </c>
        <c> 101 </c>
    </a>

Valid (b+ allows one or more <b>; <c> is optional)
    <!-- XML Data -->
    <a>
        <b> Some </b>
        <b> Thing </b>
    </a>
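
The Valid / Not Valid verdicts above can be reproduced with a validating reader. The following is a minimal sketch, not the deck's own code: it assumes a .NET version that offers XmlReaderSettings.DtdProcessing, and it embeds the slide's DTD as an internal subset. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class DtdValidationSketch
{
    // The DTD from the slide, embedded as an internal subset (an assumption
    // made so the sketch needs no external file).
    const string Dtd =
        "<!DOCTYPE a [" +
        "<!ELEMENT a (b+, c?)>" +
        "<!ELEMENT b (#PCDATA)>" +
        "<!ELEMENT c (#PCDATA)>" +
        "]>";

    // Returns true when the document body validates against the DTD above.
    public static bool IsValid(string body)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,   // DTDs are prohibited by default
            ValidationType = ValidationType.DTD
        };
        bool valid = true;
        settings.ValidationEventHandler += (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error) valid = false;
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(Dtd + body), settings))
        {
            while (reader.Read()) { }   // validation happens while reading
        }
        return valid;
    }

    static void Main()
    {
        Console.WriteLine(IsValid("<a><b>Some</b><c>100</c><c>101</c></a>"));  // two <c>: not valid
        Console.WriteLine(IsValid("<a><b>Some</b><b>Thing</b></a>"));          // valid
    }
}
```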

What's a DTD?

Document Type Definition (DTD)
Defines the syntax, grammar & semantics
Defines the document structure
    What Elements, Attributes, Entities, etc are permitted?
    How are the document elements related & structured?
Referenced by or defined in XML documents, but it's not XML!
Enables validation of XML documents using an XML Parser
Can be referenced by more than one XML document
DTDs may reference other DTDs

DTD By Diagram

[Diagram: tree rooted at CustomerOrder. CustomerOrder contains a Customer and repeated Orders; Customer contains a Person (FName, LName) and repeated Address elements; each Orders contains an OrderNo and repeated ProductNo elements.]

DTD By Example

http://www.myco.com/dtd/order.dtd

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE CustomerOrder [
    <!ELEMENT CustomerOrder (Customer, Orders*) >
    <!ELEMENT Customer (Person, Address+) >
    <!ELEMENT Person (FName, LName) >
    <!ELEMENT FName (#PCDATA) >
    <!ELEMENT LName (#PCDATA) >
    <!ELEMENT Address (#PCDATA) >
    <!ATTLIST Address AddrType ( billing | shipping | home ) "shipping" >
    <!ELEMENT Orders (OrderNo, ProductNo+) >
    <!ELEMENT OrderNo (#PCDATA) >
    <!ELEMENT ProductNo (#PCDATA) >
]>

XML Parser in Action!

[Diagram: the XML Source Document and the XML Schema or DTD feed the XML Parser, which hands a Validated XML Document to the Browser or Application.]

The XML Parser: What is it?

Used to Process an XML Document
    Reads, parses & interprets the DTD and XML document
    Performs substitutions, validation or additional processing
Knows the XML language rules and can determine:
    Is the document Well-Formed?
    Is it Valid?
Creates a Document Object Model (DOM) of the instance
    Provides programmatic access to the DOM or instance
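
The well-formedness question can be put to a parser directly. A minimal sketch using XmlDocument.LoadXml, which throws XmlException on malformed input; the class and method names are hypothetical, and the check covers well-formedness only, not validity.

```csharp
using System;
using System.Xml;

static class WellFormedSketch
{
    // Returns true when the text parses as well-formed XML.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            var doc = new XmlDocument();
            doc.LoadXml(xml);          // throws XmlException on malformed input
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(IsWellFormed("<a><b>Some</b></a>"));   // properly nested
        Console.WriteLine(IsWellFormed("<a><b>Some</a></b>"));   // overlapping tags
    }
}
```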

What is the DOM?

DOM stands for Document Object Model
Programming interface for HTML & XML documents
An in-memory representation of a document
Defines the document structure through an object model
    Tree-view of a document
    Nodes, elements and attributes, text elements, etc
W3C defined the DOM Level 1 and Level 2 Core
    http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/
    http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/

Generating The DOM

[Diagram: the Parser reads the XML Document (<?xml version="1.0"?> ...) and builds the DOM Tree: a Root Element with Child Elements, each holding a Text node.]
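
The tree in the diagram can be made visible by walking the nodes a parser produces. A minimal sketch, assuming a small inline document; the element names Root and Child and the helper names are placeholders chosen to mirror the diagram.

```csharp
using System;
using System.Text;
using System.Xml;

static class DomWalkSketch
{
    // Appends one line per node, indented by its depth in the tree.
    static void Walk(XmlNode node, int depth, StringBuilder output)
    {
        string label = node.NodeType == XmlNodeType.Text ? node.Value : node.Name;
        output.Append(' ', depth * 2).Append(node.NodeType).Append(": ").AppendLine(label);
        foreach (XmlNode child in node.ChildNodes)
            Walk(child, depth + 1, output);
    }

    // Parses the text and returns an indented outline of its DOM tree.
    public static string Describe(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var output = new StringBuilder();
        Walk(doc.DocumentElement, 0, output);
        return output.ToString();
    }

    static void Main()
    {
        Console.Write(Describe("<Root><Child>Text one</Child><Child>Text two</Child></Root>"));
    }
}
```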

Where Do You Find XML Parsers?

Transparently built into XML enabled products
    Internet Explorer, SQL Server 2000, etc
All over the Internet!
    Microsoft XML Parser
        http://msdn.microsoft.com/xml/general/xmlparser.asp
    IBM/Apache Xerces
        http://xml.apache.org
        http://alphaworks.ibm.com

XML Schema

What's a Schema?
Schema vs. DTDs
Datatypes & Structure

XML Documents + XML Schema

Some XML Schema
    <!-- Some XML Schema -->
    <element name="a">
        <complexType>
            <sequence>
                <element name="b" type="string" minOccurs="1" maxOccurs="unbounded"/>
                <element name="c" type="integer" minOccurs="0" maxOccurs="1"/>
            </sequence>
        </complexType>
    </element>

Not Valid! (maxOccurs="1" allows at most one <c>)
    <!-- XML Data -->
    <a>
        <b> Some </b>
        <c> 100 </c>
        <c> 101 </c>
    </a>

Valid (<b> may repeat; <c> is optional)
    <!-- XML Data -->
    <a>
        <b> Some </b>
        <b> Thing </b>
    </a>
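
The same check can be run against a schema. A minimal sketch, under two assumptions: that <b> may repeat (maxOccurs="unbounded") and <c> is optional (minOccurs="0"), which is what the Valid / Not Valid labels imply, and that the schema is written with an explicit xsd prefix. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class SchemaValidationSketch
{
    // Schema for <a>: one or more <b>, then at most one <c> (assumed occurrence limits).
    const string Xsd = @"<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
  <xsd:element name='a'>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name='b' type='xsd:string' minOccurs='1' maxOccurs='unbounded'/>
        <xsd:element name='c' type='xsd:integer' minOccurs='0' maxOccurs='1'/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>";

    // Returns true when the document validates against the schema above.
    public static bool IsValid(string xml)
    {
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));

        bool valid = true;
        settings.ValidationEventHandler += (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error) valid = false;
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }   // validation happens while reading
        }
        return valid;
    }

    static void Main()
    {
        Console.WriteLine(IsValid("<a><b>Some</b><c>100</c><c>101</c></a>"));  // two <c>: not valid
        Console.WriteLine(IsValid("<a><b>Some</b><b>Thing</b></a>"));          // valid
    }
}
```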

What's a Schema?

Webster's Collegiate Dictionary defines it as:
    A diagrammatic presentation; a structured framework
The XML world defines it as:
    A structured framework for your XML Documents!
    A definition language - with its own syntax & grammar
    A means to structure data and enhance it with semantics!
    Best of all: It's an alternative to the DTD!
Composed of two parts:
    Structure: http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/
    Datatypes: http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/

Schema vs. DTDs

Both are XML document definition languages
Schemata are written using XML
Unlike DTDs, XML Schemata are extensible, like XML!
More verbose than DTDs but easier to read & write

Datatypes & Structure

Defining datatypes
    The simple or primitive datatypes
    Complex types
        Based on (or derived from) the Schema datatypes
Facets
Declaring data types by example
<schema>

XML Schema Datatypes

Two kinds of datatypes: Built-in and User-defined
Built-in
    Primitive Datatypes
        string, double, decimal, etc
    Derived Datatypes:
        integer, long, byte, etc
        Derived from the primitive types
        Example: integer is derived from decimal
User-defined
    Derived from built-in or other user-defined datatypes

The Simple Type: <simpleType>

The Simplest Type Declaration:
    <simpleType name="FirstName"> <restriction base="string"/> </simpleType>
Based on a primitive or the derived built-in datatypes
Cannot contain sub-elements or attributes
Can declare constraining properties (facets)
    minLength, maxLength, length, etc
May be used as base type of a complexType

The Complex Type: <complexType>

Used to define a new complex type
May be based on simple or existing complexTypes
May declare elements or element references:
    <element name="..." type="..." />
    <element ref="..." />
May declare attributes or reference attribute groups
    <attribute name="..." type="..." />
    <attributeGroup ref="..." />

Defining a complexType By Example

<complexType name="Customer">
    <sequence>
        <element name="Person" type="Name" />
        <element name="Address" type="Address" />
    </sequence>
</complexType>

<complexType name="Address">
    <sequence>
        <element name="Street" type="string" />
        <element name="City" type="string" />
        <element name="State" type="State_Region" />
        <element name="PostalCode" type="string" />
        <element name="Country" type="string" />
    </sequence>
    <!-- AddrType attribute not shown -->
</complexType>

More Complex Types

Derivation
    simpleContent
    complexContent
Extension & Restriction (we'll see some of this)
Substitution Groups
Abstract Elements and Types

The Many Facets of a Datatype!

A way to constrain datatypes
    Constrain the value space of a datatype
Specify optional properties
Examples of Constraining Facets:
    totalDigits, minLength, enumeration, ...
    http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/

<simpleType name="FirstName">
    <restriction base="string">
        <minLength value="0" />
        <maxLength value="25" />
    </restriction>
</simpleType>

Declaring Elements

Elements are declared using the <element> tag
    <element name="FirstName" type="string" />
Based on either a simple or complex type
    <element name="Address" type="AddressType" />
May contain simple or other complex types
    <element name="Orders">
        <complexType>
            <sequence>
                <element name="OrderNo" type="string" />
                <element name="ProductNo" type="string" />
            </sequence>
        </complexType>
    </element>
May reference an existing element
    <element ref="FirstName" />

Declaring Attributes

Declared using <attribute> tag
Value pairs
Can only be assigned to <complexType> types
May be grouped into an attribute group (more later!)
Based on a <simpleType>, by reference or explicitly
    <attribute name="age" type="integer" />
    <!-- OR -->
    <attribute name="age">
        <simpleType>
            <restriction base="integer">
                <totalDigits value="3" />
            </restriction>
        </simpleType>
    </attribute>

Declaring Attribute Groups 1/2

Way to group related attributes together
Promotes logical organization
Encourages reuse: defined once, referenced many times
Facilitates maintenance
Improves Schema readability
Must be unique within an XML Schema
Referenced from complexType definitions

Declaring Attribute Groups 2/2

<!-- Define the unique group: -->
<attributeGroup name="CreditCardInfo">
    <attribute name="CardNumber" type="integer" use="required" />
    <attribute name="ExpirationDate" type="date" use="required" />
    <attribute name="CardHolder" type="FullName" use="required" />
</attributeGroup>

<!-- Then you can reference it from a complexType: -->
<complexType name="CreditInformation">
    <attributeGroup ref="CreditCardInfo" />
</complexType>

Schema Namespaces

Equivalent to XML namespaces
    http://www.w3.org/TR/1999/REC-xml-names-19990114/
Used to qualify schema elements
<schema> must itself be qualified with the schema namespace
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
Namespace may have a namespace prefix for the schema
    Prefix qualifies elements belonging to the targetNamespace
    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            xmlns:CO="http://www.MyCompany.com/Schema">

<schema> targetNamespace Attribute

<schema> targetNamespace attribute
    Declares the namespace of the current schema
        This must be a universally unique Uniform Resource Identifier (URI)
    Helps the parser differentiate type definitions
        Used during schema validation
        Differentiates differing schema vocabularies in the schema
    Should match the schema namespace declaration
        targetNamespace="some_URI" xmlns:namespace_prefix="some_URI"
Example:
    targetNamespace="http://www.myCo.com/CO" xmlns:CO="http://www.myCo.com/CO"

XML <schema> By Example

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema targetNamespace="http://www.myCo.com/CO"
            xmlns:CO="http://www.myCo.com/CO"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified"
            attributeFormDefault="qualified">

    <!-- Declare the root element of our schema -->
    <xsd:element name="CustomerOrder" type="CO:CustomerOrder"/>
    <!-- Further Definitions & declarations not shown -->
</xsd:schema>

Follow the Yellow Brick XPath

Specification found at:
    http://www.w3.org/TR/1999/REC-xpath-19991116
Language used to address parts of an XML document
Permits selection of nodes in an XML document
Uses a path notation, like with URLs
    Absolute paths: /CustomerOrder/Orders
    Relative paths: Orders

Roadmap To Selection

Location Paths
Location Syntax
    axis::node_test[ predicate ]
    Axis: Defines from where to start navigating
        parent, child, ancestor, attribute, / (the document), etc
    Node test: Selects one or more nodes
        By tag name, node selector or wildcard (*)
        node( ), text( ), comment( ), etc
    Predicates: Optional function or expression enclosed in [...]
        position( ), count( ), etc
        Example: child::Address[@AddrType="billing"]


Taking XPath Shortcuts

Abbreviated Syntax exists
    The following are equivalent
        OrderNo[position()=1]/ProductNo[position()=3]
        OrderNo[1]/ProductNo[3]
    .. instead of parent::node()
    . instead of self::node()
    // instead of /descendant-or-self::node()/

Operators

To select an attribute value use @
    CustomerOrder/Customer/Address[@AddrType]
To reference a variable use $
    CustomerOrder/Orders/ProductNo[. = $ProductNo]
Can compare objects arithmetically
    &lt; (for <), &gt; (for >), &lt;= (for <=), etc
    Must adhere to XML 1.0 quoting rules
Can use logical operators
    and, or
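
The location paths, abbreviations and operators of the last three slides can be tried against a trimmed copy of the CustomerOrder document. A minimal sketch using XmlDocument.SelectSingleNode and SelectNodes; the helper names First and Count are hypothetical. Inside a C# string the < operator needs no &lt; escaping, which is only required when the expression sits in an XML attribute.

```csharp
using System;
using System.Xml;

static class XPathSketch
{
    // Trimmed copy of the deck's CustomerOrder example.
    const string Xml = @"<CustomerOrder>
  <Customer>
    <Address AddrType='shipping'>91 Park So, New York, NY 10018</Address>
    <Address AddrType='billing'>Hauptstrasse 55, D-81671 Munich</Address>
  </Customer>
  <Orders>
    <OrderNo>10</OrderNo>
    <ProductNo>100</ProductNo>
    <ProductNo>200</ProductNo>
  </Orders>
</CustomerOrder>";

    static XmlDocument Load()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc;
    }

    // Text of the first node the expression selects.
    public static string First(string xpath)
    {
        return Load().SelectSingleNode(xpath).InnerText;
    }

    // Number of nodes the expression selects.
    public static int Count(string xpath)
    {
        return Load().SelectNodes(xpath).Count;
    }

    static void Main()
    {
        // Absolute path with an attribute predicate.
        Console.WriteLine(First("/CustomerOrder/Customer/Address[@AddrType='billing']"));
        // Abbreviated positional predicate: [2] is short for [position()=2].
        Console.WriteLine(First("/CustomerOrder/Orders/ProductNo[2]"));
        // .. steps to the parent of the selected node.
        Console.WriteLine(First("//ProductNo[position()=2]/../OrderNo"));
        // Comparison and logical operators inside a predicate.
        Console.WriteLine(First("//ProductNo[. > 100 and . < 300]"));
        // // searches all descendants of the document.
        Console.WriteLine(Count("//ProductNo"));
    }
}
```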

XSLT: Stylesheets & Transformations

What is XSLT?
The Basic Structure
Some Template Rules
More Advanced Structure
More Advanced Template Rules (or Features ;)
Transforming It All

What is XSLT?

Widely used and open standard defined by the W3C
    A sub-specification of XSL
    http://www.w3.org/TR/1999/REC-xslt-19991116
Designed to be used independently of XSL
Designed primarily for the transformation needed in XSL
W3C defines XSLT: "a language for transforming XML documents"
XSLT is more than a language; it's an XML programming language
    Can have rules, evaluate conditions, etc
Offers the ability to transform one XML document into another
    Transform an XDR Schema to an XSD Schema!
    Transform an XML document into an HTML document

The XSLT Process Overview

[Diagram: the XSLT Processor takes the XML Source Document (described by the Source Schema) and the XSLT Style Sheet and produces the XML Target Document (described by the Target Schema).]

Transformation Process Overview

Pass source document to an XSLT processor
    Processor contains a loaded XSLT style-sheet
Processor then:
    Loads the specified Stylesheet templates...
    Traverses the source document, node by node...
    Where a node matches a template...
    Applies the template to the node
    Outputs the (new) XML or HTML result document

Process of Transmutation

XML source:
    <Orders>
        <OrderNo> 10 </OrderNo>
        <ProductNo> 100 </ProductNo>
        <ProductNo> 200 </ProductNo>
    </Orders>
    <Orders>
        <OrderNo> 20 </OrderNo>
        <ProductNo> 501 </ProductNo>
    </Orders>

XSLT Stylesheet + XSLT Processor

HTML result:
    <HTML>
    <BODY>
    <TABLE border="3">
        <TR> <TD> 10 </TD> <TD> 100 </TD> </TR>
        <TR> <TD> 10 </TD> <TD> 200 </TD> </TR>
        <TR></TR>
        <TR> <TD> 20 </TD> <TD> 501 </TD> </TR>
    </TABLE>
    </BODY>
    </HTML>

Alchemy Anyone?

Need to declare the XSLT namespace:
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    xsl:stylesheet is synonymous with xsl:transform
Use patterns to locate nodes in the source document
Then transform the nodes as you like!

The Elements - Templates

<xsl:template/>
    Used for selecting node or node sub-tree
    Use the match attribute to select a specific node
        <xsl:template match="...">
    Then apply changes
<xsl:apply-templates />
    Used to recursively process children of the selected node
<xsl:apply-templates select="..." />
    Used to select all nodes with a specific value

XSLT Alchemy By Example

<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <HTML>
        <BODY>
        <TABLE border="3">
            <!-- One pass per <Orders>, one row per <ProductNo> inside it -->
            <xsl:for-each select="CustomerOrder/Orders">
                <xsl:for-each select="ProductNo">
                    <TR>
                        <TD> <xsl:value-of select="../OrderNo"/> </TD>
                        <TD> <xsl:value-of select="."/> </TD>
                    </TR>
                </xsl:for-each>
                <TR></TR>
            </xsl:for-each>
        </TABLE>
        </BODY>
        </HTML>
    </xsl:template>
</xsl:stylesheet>

Some More Elements

<xsl:value-of select="..." />
    Select a node and insert its value into the output stream
Many, many more XSLT elements enabling:
    Repetition
        <xsl:for-each select="...">
    Conditional processing
        <xsl:if test="...">
        <xsl:choose>, <xsl:when test="...">, <xsl:otherwise>
    Sorting
        <xsl:sort>
    Etc

Brief look at XML in .NET

.NET Support for XML
XML Namespaces in .NET
Some XML Classes in .NET

.NET Supports XML!

XML 1.0

http://www.w3.org/TR/1998/REC-xml-19980210

XML Namespaces

http://www.w3.org/TR/1999/REC-xml-names-19990114/

XML Schemas

http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/ http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/

XPath expressions

http://www.w3.org/TR/1999/REC-xpath-19991116

XSL/T transformations

http://www.w3.org/TR/1999/REC-xslt-19991116

DOM Level 1 and Level 2 Core


http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/ http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/

SOAP 1.1

http://msdn.microsoft.com/xml/general/soapspec.asp

XML Namespaces in .NET

System.Xml
    .Xsl
    .XPath
    .Schema
    .Serialization

System.Xml Namespace

Overall namespace for classes that provide XML support
Classes for creating, editing, navigating XML documents
Reading, writing and manipulating documents via the DOM
    Use the XmlDocument class for XML documents
    Use the XmlDataDocument class for relational or XML data
Classes that correspond to every type of XML element:
    XmlElement, XmlAttribute, XmlComment, etc
    Used by the XmlDocument and XmlDataDocument classes

XmlReader

Abstract base class for reading XML
Fast, forward-only, non-cached XML stream reader
Base class for XmlTextReader
Properties of Interest
    Value: Gets the value of the node
    NodeType: Returns the type of node
    HasValue: Returns true if the node has a value
    LocalName: Gets the name of the node without its prefix
    ReadState: Returns the ReadState of the stream
        Closed, EndOfFile, Error, Initial or Interactive
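
The forward-only reading model and the NodeType, LocalName and Value properties can be seen in a short loop. A minimal sketch, assuming a .NET version with the XmlReader.Create factory (the deck itself constructs XmlTextReader directly); the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class ReaderSketch
{
    // Collects the text of every <ProductNo> element in one forward pass.
    public static List<string> ProductNumbers(string xml)
    {
        var result = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            bool inProduct = false;
            while (reader.Read())            // advances one node at a time, never backwards
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        inProduct = reader.LocalName == "ProductNo";
                        break;
                    case XmlNodeType.Text:
                        if (inProduct) result.Add(reader.Value.Trim());
                        break;
                    case XmlNodeType.EndElement:
                        inProduct = false;
                        break;
                }
            }
        }
        return result;
    }

    static void Main()
    {
        string xml = "<Orders><OrderNo> 10 </OrderNo><ProductNo> 100 </ProductNo><ProductNo> 200 </ProductNo></Orders>";
        Console.WriteLine(string.Join(", ", ProductNumbers(xml)));
    }
}
```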

XmlWriter

Abstract base class for writing XML
Fast, forward-only, non-cached XML stream writer
Base class for XmlTextWriter
Properties of Interest
    WriteState: Returns the WriteState of the stream
        Attribute, Content, Element, etc
    XmlLang: Returns the current xml:lang scope
    XmlSpace: Returns the current xml:space
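
Writing follows the same forward-only pattern. A minimal sketch that emits a small Orders fragment into a string, assuming a .NET version with XmlWriter.Create and XmlWriterSettings; the class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

static class WriterSketch
{
    // Builds a small <Orders> fragment with the forward-only writer.
    public static string BuildOrder()
    {
        var text = new StringWriter();
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(text, settings))
        {
            writer.WriteStartElement("Orders");
            writer.WriteElementString("OrderNo", "10");
            writer.WriteElementString("ProductNo", "100");
            writer.WriteComment("more products would follow");
            writer.WriteEndElement();
        }   // disposing the writer flushes it into the StringWriter
        return text.ToString();
    }

    static void Main()
    {
        Console.WriteLine(BuildOrder());
    }
}
```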

XmlTextReader & XmlTextWriter

Derived from the XmlReader & XmlWriter abstract classes
Implement all the functionality defined by their base classes
Designed to work with a text based stream
    As opposed to an in-memory DOM
Inherit the properties of the XmlReader and XmlWriter
XmlTextReader methods support reading XML elements
    Read, MoveToElement, ReadString, etc
XmlTextWriter methods support writing XML elements
    WriteDocType, WriteComment, WriteName, etc

XmlDocument

Derived from the XmlNode class
Represents an entire (in memory) XML document
Supports DOM Level 1 and Level 2 Core functionality
Reading & writing built on top of XmlReader & XmlWriter
Load a document and generate the DOM
    Using: URI, file, XmlReader, XmlTextReader or Stream

Properties & Methods of Interest

Properties of Interest:
    ChildNodes: Returns all the children of the current node
    DocumentType: Gets the DOCTYPE declaration node
    DocumentElement: Returns the root XmlElement
    XmlResolver: Used to resolve DTD & schema references
    FirstChild: Returns the first child of the current node
    ParentNode: Returns the parent of the current node
    Value: Returns the (string) value of the current node
Methods of Interest:
    CreateComment: Creates a comment node
    CreateElement: Creates an element node
    Load: Loads XML data using a URL, file, Stream, etc
    Save: Saves the XML document to a file, Stream, or writer

XmlDocument & the .NET DOM

[Diagram: the System.Xml namespace, its sub-namespaces and some of its types]
Sub-namespaces: .XPath, .Xsl, .Serialization, .Schema
Types: EntityHandling, Formatting, NameTable, ReadState, TreePosition, Validation, WriteState, XmlAttribute, XmlAttributeCollection, XmlCDataSection, XmlCharacterData, XmlCharType, XmlComment, XmlConvert, XmlDataDocument, XmlDeclaration, XmlDocument, XmlDocumentFragment, XmlDocumentType, XmlElement, XmlEntity, XmlEntityReference, XmlNamedNodeMap, XmlNode, XmlNodeReader, XmlNodeType, XmlNotation, XmlReader, XmlSpace, XmlText, XmlTextReader, XmlTextWriter, XmlUrlResolver, XmlWhitespace, XmlWriter, ...

XmlDocument By Example

using System;
using System.Xml;

// Create an XmlDocument, load it, write it to the Console
// One way:
XmlDocument xDoc = new XmlDocument();
xDoc.Load("C:\\myData.xml");
xDoc.Save(Console.Out);

// Second way (use an XmlTextReader to load the XML):
XmlTextReader reader = new XmlTextReader("C:\\myData.xml");
xDoc.Load(reader);
reader.Close();
xDoc.Save(Console.Out);

// Third way (use an XmlTextWriter to output the XML document):
XmlTextWriter writer = new XmlTextWriter(Console.Out);
writer.Formatting = Formatting.Indented;
xDoc.WriteContentTo(writer);
writer.Flush();
Console.WriteLine();
writer.Close();

System.Xml.Xsl Namespace

Provides support for XSL Transformations
Some of the classes:
    XslTransform: Transforms using a stylesheet
    XsltException: Used to handle transformation exceptions
    XsltContext: The XSLT processor's execution context

XslTransform

Four simple steps to perform a transformation
    Instantiate an XslTransform object
    Load a stylesheet
    Load the data
    Transform!

Transformation By Example

using System.Xml.Xsl;

// 1. Create an XslTransform object
XslTransform xslt = new XslTransform();
// 2. Load an XSL stylesheet
xslt.Load("http://somewhere/favorite.xsl");
// 3 & 4. Load the XML data file & transform!
xslt.Transform("http://somewhere/mydata.xml",
               "C:\\somewhere_else\\TransformedXmlOutput.xml");
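
The four steps can also be run entirely in memory. The sketch below is an assumption-laden variant, not the deck's code: it swaps in XslCompiledTransform, which later .NET versions recommend over XslTransform, and uses an inline stylesheet that writes one table row per ProductNo. The class name, method name and stylesheet are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class TransformSketch
{
    // Inline stylesheet: one <TR> per ProductNo, carrying its order number.
    const string Stylesheet = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='html' indent='no'/>
  <xsl:template match='/'>
    <TABLE>
      <xsl:for-each select='CustomerOrder/Orders'>
        <xsl:for-each select='ProductNo'>
          <TR><TD><xsl:value-of select='../OrderNo'/></TD><TD><xsl:value-of select='.'/></TD></TR>
        </xsl:for-each>
      </xsl:for-each>
    </TABLE>
  </xsl:template>
</xsl:stylesheet>";

    public static string Transform(string xml)
    {
        // 1. Instantiate the transform object.
        var xslt = new XslCompiledTransform();
        // 2. Load the stylesheet.
        xslt.Load(XmlReader.Create(new StringReader(Stylesheet)));
        // 3 & 4. Load the data and transform it into a string.
        var output = new StringWriter();
        xslt.Transform(XmlReader.Create(new StringReader(xml)), null, output);
        return output.ToString();
    }

    static void Main()
    {
        string xml =
            "<CustomerOrder>" +
            "<Orders><OrderNo>10</OrderNo><ProductNo>100</ProductNo><ProductNo>200</ProductNo></Orders>" +
            "<Orders><OrderNo>20</OrderNo><ProductNo>501</ProductNo></Orders>" +
            "</CustomerOrder>";
        Console.WriteLine(Transform(xml));
    }
}
```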

Summary

XML is powerful, flexible, open & extensible
XML is easy to learn, easy to read & easy to use
XML, XML Schema and XSLT combine to let you
    Have data with semantics
    Dictate and enforce your data structure
    Separate data and data representation
    Easily transform your data
.NET is XML-ized
.NET lives on XML!
    Not only exposes XML functionality, but is built using it

Section 4: Q&A

Document Object Model (DOM) 1/2

Use an XML parser to generate and manipulate the DOM
    Load an XML file using a parser
    Use the parser's programming interface to:
        Navigate through the Document Object Model
        Manipulate the DOM: Add, Delete, Move, Modify DOM elements
Using a DOM, the parser can ensure well-formed documents
Some parsers can validate the DOM
    Validating Parser
        By loading and comparing to either a DTD or a Schema
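
The four manipulations named above (Add, Delete, Move, Modify) can be sketched with XmlDocument. The inline Orders fragment and the class and method names are assumptions made for illustration.

```csharp
using System;
using System.Xml;

static class DomEditSketch
{
    public static XmlDocument Edit()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<Orders><OrderNo>10</OrderNo><ProductNo>100</ProductNo><ProductNo>200</ProductNo></Orders>");
        XmlElement root = doc.DocumentElement;

        // Add: create a new element and append it to the root.
        XmlElement added = doc.CreateElement("ProductNo");
        added.InnerText = "300";
        root.AppendChild(added);

        // Modify: change the text of the first child (OrderNo).
        root.FirstChild.InnerText = "11";

        // Delete: remove the first ProductNo element.
        XmlNode firstProduct = root.SelectSingleNode("ProductNo");
        root.RemoveChild(firstProduct);

        // Move: appending a node that is already in the tree relocates it.
        root.AppendChild(root.FirstChild);

        return doc;
    }

    static void Main()
    {
        Console.WriteLine(Edit().OuterXml);
    }
}
```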

Document Object Model (DOM) 2/2

System.Xml contains DOM related classes
    XmlDocument
    XmlDataDocument
    XmlNavigator
    XmlDataNavigator
    etc
.NET supports DOM Level 1 and most of the Level 2 Core
    http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/
    http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/

XML Namespaces 1/2

Another W3C specification
    http://www.w3.org/TR/REC-xml-names/
Create collection of tags that share the same semantics
Used to qualify tags that would otherwise collide
    Multiple documents can use the same tag differently
    For example:
        Document A may use <name/> to designate a person's name
        Document B may use <name/> to designate a file name

XML Namespaces 2/2

A URI is used to uniquely identify a namespace
    xmlns="urn:schemas-microsoft-com:customerdata"
May assign a namespace prefix to the namespace
    xmlns:ms="urn:schemas-microsoft-com:data"
Use the prefix to differentiate elements & attributes
    <ms:name> John Smith </ms:name>
Documents have a default namespace
    Default prefix is xmlns
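
The <name/> collision described on the previous slide can be resolved in code by binding prefixes to namespace URIs before querying. A minimal sketch: the urn:example:files namespace, the query prefixes and the class and method names are hypothetical. Only the URI has to match the document; the prefix used in the query is free to differ.

```csharp
using System;
using System.Xml;

static class NamespaceSketch
{
    // Two <name> elements that differ only by namespace.
    const string Xml = @"<doc xmlns:ms='urn:schemas-microsoft-com:data' xmlns:fs='urn:example:files'>
  <ms:name>John Smith</ms:name>
  <fs:name>report.txt</fs:name>
</doc>";

    // Returns the text of the <name> element that belongs to the given namespace.
    public static string NameIn(string namespaceUri)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("q", namespaceUri);   // query prefix, independent of the document's
        return doc.SelectSingleNode("/doc/q:name", ns).InnerText;
    }

    static void Main()
    {
        Console.WriteLine(NameIn("urn:schemas-microsoft-com:data"));
        Console.WriteLine(NameIn("urn:example:files"));
    }
}
```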
