ISOM

MIS 3150 Data and Info Mgmt:


XML

Arijit Sengupta
Learning Objectives
• Learn what XML is
• Learn the various ways in which XML is used
• Learn the key companion technologies
• See how XML is being used in industry as a meta-language
Agenda
• Overview
• Syntax and Structure
• The XML Alphabet Soup
• XML as a meta-language
Overview
What is XML?
• A tag-based meta-language
• Designed for structured data representation
• Represents data hierarchically (in a tree)
• Provides context to data (makes it meaningful)
 Self-describing data
• Separates presentation (HTML) from data (XML)
• An open W3C standard
• A subset of SGML
 vs. HTML, which is an application of SGML
Overview
What is XML?
• XML is a "use everywhere" data specification

[Diagram: Application X exchanges XML with Configuration, Documents, Repository, and Database]
Overview
Documents vs. Data
• XML is used to represent two main types of things:
 Documents
• Lots of text with tags to identify and annotate portions of the document
 Data
• Hierarchical data structures
Overview
XML and Structured Data
• Pre-XML representation of data:
"PO-1234","CUST001","X9876","5","14.98"
• XML representation of the same data:
<PURCHASE_ORDER>
<PO_NUM> PO-1234 </PO_NUM>
<CUST_ID> CUST001 </CUST_ID>
<ITEM_NUM> X9876 </ITEM_NUM>
<QUANTITY> 5 </QUANTITY>
<PRICE> 14.98 </PRICE>
</PURCHASE_ORDER>
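
The step from the comma-separated record to the tagged record can be sketched in Python with the standard library. This is only an illustration: the `FIELDS` list and the `csv_row_to_xml` helper are assumptions, with the field names copied from the XML example above because the CSV row itself carries no names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed field order; the names come from the XML example, not from the CSV data.
FIELDS = ["PO_NUM", "CUST_ID", "ITEM_NUM", "QUANTITY", "PRICE"]


def csv_row_to_xml(line: str) -> ET.Element:
    """Turn one quoted CSV record into a PURCHASE_ORDER element."""
    row = next(csv.reader(io.StringIO(line)))
    order = ET.Element("PURCHASE_ORDER")
    for name, value in zip(FIELDS, row):
        # Each value becomes the text of a named child element.
        ET.SubElement(order, name).text = value
    return order


order = csv_row_to_xml('"PO-1234","CUST001","X9876","5","14.98"')
xml_text = ET.tostring(order, encoding="unicode")
print(xml_text)
```

The point of the comparison is that the XML form names every value, so a reader (or a program) no longer needs out-of-band knowledge of the column order.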
Overview
Benefits of XML
• Open W3C standard
• Representation of data across
heterogeneous environments
 Cross platform
 Allows for high degree of interoperability
• Strict rules
 Syntax
 Structure
 Case sensitive
Overview
Who Uses XML?
• Submissions by
 Microsoft
 IBM
 Hewlett-Packard
 Fujitsu Laboratories
 Sun Microsystems
 Netscape (AOL), and others…
• Technologies using XML
 SOAP, ebXML, BizTalk, WebSphere, many others…
Agenda
• Overview
• Syntax and Structure
• The XML Alphabet Soup
• XML as a meta-language
Syntax and Structure
Components of an XML Document
• Elements
 Each element has a beginning and ending tag
• <TAG_NAME>...</TAG_NAME>
 Elements can be empty (<TAG_NAME />)
• Attributes
 Describes an element; e.g. data type, data range, etc.
 Can only appear on beginning tag
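• Prolog and processing instructions
 XML declaration with encoding specification (UTF-8 by default)
 Processing instructions, e.g. a stylesheet reference
• Namespace and schema declarations
 Written as attributes on an element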
Syntax and Structure
Components of an XML Document
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="template.xsl"?>
<ROOT>
<ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
<ELEMENT2> </ELEMENT2>
<ELEMENT3 type='string'> </ELEMENT3>
<ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
</ROOT>

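Callouts:
• Prolog: the XML declaration and the xml-stylesheet processing instruction
• Elements: ROOT, ELEMENT1, ELEMENT2 and the sub-elements
• Elements with attributes: ELEMENT3, ELEMENT4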
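
As a rough illustration of how a program sees these components, the sample document can be loaded with Python's `xml.etree.ElementTree`. The `summary` dictionary is an assumption made for this sketch. Note that the parser treats every attribute value as a plain string: nothing at this level checks that `value='9.3'` agrees with `type='integer'`.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="template.xsl"?>
<ROOT>
<ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
<ELEMENT2> </ELEMENT2>
<ELEMENT3 type='string'> </ELEMENT3>
<ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
</ROOT>"""

root = ET.fromstring(DOC)

# For each child of ROOT: its attributes and the names of its sub-elements.
summary = {
    child.tag: (dict(child.attrib), [sub.tag for sub in child])
    for child in root
}

for tag, (attributes, children) in summary.items():
    print(tag, attributes, children)
```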
Syntax and Structure
Rules For Well-Formed XML
• There must be one, and only one, root element
• Sub-elements must be properly nested
 A tag must end within the tag in which it was started
• Attributes are optional
 Defined by an optional schema
• Attribute values must be enclosed in " " or ' '
• Processing instructions are optional
• XML is case-sensitive
 <tag> and <TAG> are not the same type of element
Syntax and Structure
Well-Formed XML?
• No, CHILD2 and CHILD3 do not nest properly

<?xml version="1.0" ?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
<CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
</PARENT>
Syntax and Structure
Well-Formed XML?
• No, there are two root elements

<?xml version="1.0" ?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
</PARENT>
<PARENT>
<CHILD1>This is another element 1</CHILD1>
</PARENT>
Syntax and Structure
Well-Formed XML?
• Yes
<?xml version="1.0" ?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
<CHILD2/>
<CHILD3></CHILD3>
</PARENT>
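
The three quiz documents can be checked mechanically. The sketch below assumes Python's `xml.etree.ElementTree` as the parser; `is_well_formed` is a hypothetical helper name. A parser that accepts a document only proves it is well-formed, not that it is valid against any schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


bad_nesting = (
    "<PARENT><CHILD1>This is element 1</CHILD1>"
    "<CHILD2><CHILD3>Number 3</CHILD2></CHILD3></PARENT>"
)
two_roots = (
    "<PARENT><CHILD1>This is element 1</CHILD1></PARENT>"
    "<PARENT><CHILD1>This is another element 1</CHILD1></PARENT>"
)
good = (
    '<?xml version="1.0" ?>'
    "<PARENT><CHILD1>This is element 1</CHILD1><CHILD2/><CHILD3></CHILD3></PARENT>"
)
# Case sensitivity: <tag> cannot be closed by </TAG>.
bad_case = "<tag>text</TAG>"

for name, doc in [("bad_nesting", bad_nesting), ("two_roots", two_roots),
                  ("good", good), ("bad_case", bad_case)]:
    print(name, is_well_formed(doc))
```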
Syntax and Structure
An XML Document
<?xml version='1.0'?>
<bookstore>
<book genre='autobiography' publicationdate='1981'
ISBN='1-861003-11-0'>
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
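
A minimal sketch of reading this document from a program, assuming Python's `xml.etree.ElementTree`. The `rows` list and the `total` variable are illustrative names only.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)

rows = []
for book in root.findall("book"):
    title = book.findtext("title")
    author = f"{book.findtext('author/first-name')} {book.findtext('author/last-name')}"
    # Element text is always a string; convert explicitly when a number is needed.
    price = float(book.findtext("price"))
    rows.append((title, author, book.get("genre"), price))

total = sum(price for _, _, _, price in rows)
for row in rows:
    print(row)
```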
Syntax and Structure
Namespaces: Overview
• Part of XML's extensibility
• Allow authors to differentiate between tags of the same name (using a prefix)
 Frees author to focus on the data and decide how to best describe it
 Allows multiple XML documents from multiple authors to be merged
• Identified by a URI (Uniform Resource Identifier)
 When a URL is used, it does NOT have to represent a live server
Syntax and Structure
Namespaces: Declaration
Namespace declaration examples:

xmlns:bk="http://www.example.com/bookinfo/"

xmlns:bk="urn:mybookstuff.org:bookinfo"

Parts of a declaration:

xmlns:bk="http://www.example.com/bookinfo/"
 xmlns: namespace declaration
 bk: prefix
 "http://www.example.com/bookinfo/": URI (here a URL)
Syntax and Structure
Namespaces: Examples
<BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
<bk:TITLE>All About XML</bk:TITLE>
<bk:AUTHOR>Joe Developer</bk:AUTHOR>
<bk:PRICE currency='US Dollar'>19.99</bk:PRICE>
</BOOK>

<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
xmlns:money="urn:finance:money">
<bk:TITLE>All About XML</bk:TITLE>
<bk:AUTHOR>Joe Developer</bk:AUTHOR>
<bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE>
</bk:BOOK>
Syntax and Structure
Namespaces: Default Namespace
• An XML namespace declared without a prefix becomes the default namespace for the declaring element and all sub-elements
• All elements without a prefix will belong to the default namespace:
<BOOK xmlns="http://www.bookstuff.org/bookinfo">
<TITLE>All About XML</TITLE>
<AUTHOR>Joe Developer</AUTHOR>
</BOOK>
Syntax and Structure
Namespaces: Scope
• Unqualified elements belong to the inner-most default namespace.
 BOOK, TITLE, and AUTHOR belong to the default book namespace
 PUBLISHER and NAME belong to the default publisher namespace
<BOOK xmlns="http://www.bookstuff.org/bookinfo">
<TITLE>All About XML</TITLE>
<AUTHOR>Joe Developer</AUTHOR>
<PUBLISHER xmlns="urn:publishers:publinfo">
<NAME>Microsoft Press</NAME>
</PUBLISHER>
</BOOK>
Syntax and Structure
Namespaces: Attributes
• Unqualified attributes do NOT belong to any namespace
 Even if there is a default namespace
• This differs from elements, which belong to the default namespace
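
The scoping rules can be observed with a namespace-aware parser. The sketch below assumes Python's `xml.etree.ElementTree`, which reports each element name as `{namespace-URI}local-name`. The `edition` attribute is hypothetical, added only to show that an unqualified attribute stays outside the default namespace.

```python
import xml.etree.ElementTree as ET

DOC = """<BOOK xmlns="http://www.bookstuff.org/bookinfo" edition="2">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>"""

root = ET.fromstring(DOC)

# Element names carry the inner-most default namespace in scope.
tags = [element.tag for element in root.iter()]

# The unqualified attribute has a bare key: no namespace, despite the default one.
attributes = dict(root.attrib)

# Queries must name the namespace; the "bk" prefix here is local to this code.
ns = {"bk": "http://www.bookstuff.org/bookinfo"}
title = root.find("bk:TITLE", ns).text

for tag in tags:
    print(tag)
print(attributes, title)
```

The prefix used in a query does not need to match any prefix in the document; only the URI is compared.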
Syntax and Structure
Entities
• Entities provide a mechanism for textual substitution, e.g.
Entity   Substitution
&lt;     <
&amp;    &
• You can define your own entities
• Parsed entities can contain text and markup
• Unparsed entities can contain any data
 JPEG photos, GIF files, movies, etc.
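
A small sketch of both kinds of substitution, assuming Python's standard library. The `pub` entity and the sample strings are made up for illustration; `escape` and `unescape` come from `xml.sax.saxutils`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Predefined entities: reserved characters must be escaped before they go into markup.
raw = "price < 20 & in stock"
escaped = escape(raw)
note = ET.fromstring(f"<NOTE>{escaped}</NOTE>")

# A user-defined (parsed, internal) entity declared in the document's DTD subset.
DOC = """<!DOCTYPE BOOK [
  <!ENTITY pub "Microsoft Press">
]>
<BOOK><PUBLISHER>&pub;</PUBLISHER></BOOK>"""
book = ET.fromstring(DOC)

print(escaped)
print(note.text)
print(book.findtext("PUBLISHER"))
```

Entity expansion is also a known attack surface, so documents from untrusted sources are usually parsed with expansion limited or disabled.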
Agenda
• Overview
• Syntax and Structure
• The XML Alphabet Soup
• XML as a meta-language
The XML 'Alphabet Soup'
• XML itself is fairly simple
• Most of the learning curve is knowing about all of the related technologies
The XML 'Alphabet Soup'
Acronym   Full name                        Description
XML       Extensible Markup Language       Defines XML documents
Infoset   Information Set                  Abstract model of XML data; definition of terms
DTD       Document Type Definition         Non-XML schema
XSD       XML Schema                       XML-based schema language
XDR       XML Data Reduced                 An earlier XML schema
CSS       Cascading Style Sheets           Allows you to specify styles
XSL       Extensible Stylesheet Language   Language for expressing stylesheets; consists of XSLT and XSL-FO
XSLT      XSL Transformations              Language for transforming XML documents
XSL-FO    XSL Formatting Objects           Language to describe precise layout of text on a page
The XML 'Alphabet Soup'
Acronym        Full name                    Description
XPath          XML Path Language            A language for addressing parts of an XML document, designed to be used by both XSLT and XPointer
XPointer       XML Pointer Language         Supports addressing into the internal structures of XML documents
XLink          XML Linking Language         Describes links between XML documents
XQuery         XML Query Language (draft)   Flexible mechanism for querying XML data as if it were a database
DOM            Document Object Model        API to read, create and edit XML documents; creates in-memory object model
SAX            Simple API for XML           API to parse XML documents; event-driven
Data Island    -                            XML data embedded in an HTML page
Data Binding   -                            Automatic population of HTML elements from XML data
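
The difference between the two parsing APIs in the table can be sketched with Python's standard `xml.dom.minidom` (tree in memory) and `xml.sax` (stream of events). The `TitleHandler` class and the two-book sample are assumptions for illustration.

```python
import xml.dom.minidom
import xml.sax

DOC = (
    "<bookstore>"
    "<book><title>The Autobiography of Benjamin Franklin</title></book>"
    "<book><title>The Confidence Man</title></book>"
    "</bookstore>"
)

# DOM: the whole document is loaded into an object model, then navigated.
dom = xml.dom.minidom.parseString(DOC)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]


# SAX: the parser calls back on each event; no tree is kept in memory.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)

print(dom_titles)
print(handler.titles)
```

DOM is convenient when the document must be edited or revisited; SAX tends to suit large documents that only need one pass.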
The XML 'Alphabet Soup'
Schemas: Overview
• DTD (Document Type Definition)
 Not written in XML
 No support for data types or namespaces
• XSD (XML Schema Definition)
 Written in XML
 Supports data types
 Current standard recommended by W3C
The XML 'Alphabet Soup'
Schemas: Purpose
• Define the "rules" (grammar) of the document
 Data types
 Value bounds
• An XML document that conforms to a schema is said to be valid
 More restrictive than well-formed XML
• Define which elements are present and in what order
• Define the structural relationships of elements
The XML 'Alphabet Soup'
Schemas: DTD Example
• XML document:
<BOOK ISBN="1234567890">
<TITLE>All About XML</TITLE>
<AUTHOR>Joe Developer</AUTHOR>
</BOOK>

• DTD schema:
<!DOCTYPE BOOK [
<!ELEMENT BOOK (TITLE+, AUTHOR) >
<!ATTLIST BOOK ISBN CDATA #REQUIRED >
<!ELEMENT TITLE (#PCDATA) >
<!ELEMENT AUTHOR (#PCDATA) >
]>
The XML 'Alphabet Soup'
Schemas: XSD Example
• XML document:

<Catalog>
<book>
<title>All About XML</title>
<author>Joe Developer</author>
</book>
</Catalog>
The XML 'Alphabet Soup'
Schemas: XSD Example
<xsd:schema id="NewDataSet" targetNamespace="http://tempuri.org/schema1.xsd"
xmlns="http://tempuri.org/schema1.xsd"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
<xsd:element name="book">
<xsd:complexType>
<xsd:all>
<xsd:element name="title" minOccurs="0" type="xsd:string"/>
<xsd:element name="author" minOccurs="0" type="xsd:string"/>
</xsd:all>
</xsd:complexType>
</xsd:element>
<xsd:element name="Catalog" msdata:IsDataSet="True">
<xsd:complexType>
<xsd:choice maxOccurs="unbounded">
<xsd:element ref="book"/>
</xsd:choice>
</xsd:complexType>
</xsd:element>
</xsd:schema>
The XML 'Alphabet Soup'
Schemas: Why You Should Use XSD
• Newest W3C Standard
• Broad support for data types
• Reusable "components"
 Simple data types
 Complex data types
• Extensible
• Inheritance support
• Namespace support
• Ability to map to relational database tables
• XSD support in Visual Studio .NET
The XML 'Alphabet Soup'
Transformations: XSL
• Language for expressing document styles
• Specifies the presentation of XML
 More powerful than CSS
• Consists of:
 XSLT
 XPath
 XSL Formatting Objects (XSL-FO)
The XML 'Alphabet Soup'
Transformations: Overview
• XSLT – a language used to transform XML data into a different form (commonly XML or HTML)

[Diagram: XML input + XSLT stylesheet → XML, HTML, … output]
The XML 'Alphabet Soup'
Transformations: XSLT
• The language used for converting XML documents into other forms
• Describes how the document is transformed
• Expressed as an XML document (.xsl)
• Template rules
 Patterns match nodes in source document
 Templates instantiated to form part of result document
• Uses XPath for querying, sorting, etc.
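
Python's standard library does not include an XSLT processor, so the following is not XSLT. It is a hand-written imitation of the template-rule idea: each rule is matched by element name and produces part of the result, recursing into the children. The `RULES` table, the function names and the HTML shape are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SOURCE = (
    "<bookstore><book>"
    "<title>The Confidence Man</title><price>11.99</price>"
    "</book></bookstore>"
)


def render(node):
    """Apply the rule whose pattern (element name) matches this node."""
    rule = RULES.get(node.tag, render_children)
    return rule(node)


def render_children(node):
    """Default rule: emit nothing for the node itself, only for its children."""
    return "".join(render(child) for child in node)


# Pattern (element name) -> template (function producing part of the result).
RULES = {
    "bookstore": lambda n: f"<ul>{render_children(n)}</ul>",
    "book": lambda n: f"<li>{render_children(n)}</li>",
    "title": lambda n: f"<b>{escape(n.text or '')}</b>",
    "price": lambda n: f" ({escape(n.text or '')})",
}

html = render(ET.fromstring(SOURCE))
print(html)
```

A real stylesheet would express the same rules declaratively as `xsl:template` elements with XPath match patterns.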
The XML 'Alphabet Soup'
XPath (XML Path Language)
• General purpose query language for identifying nodes in an XML document
• Declarative (vs. procedural)
• Contextual – the results depend on current node
• Supports standard comparison, Boolean and mathematical operators (=, <, and, or, *, +, etc.)
The XML 'Alphabet Soup'
XPath Operators
Operator   Usage description
/          Child operator – selects only immediate children (when at the beginning of the pattern, context is root)
//         Recursive descent – selects elements at any depth (when at the beginning of the pattern, context is root)
.          Indicates current context
..         Selects the parent of the current node
*          Wildcard
@          Prefix to attribute name (when alone, it is an attribute wildcard)
[ ]        Applies filter pattern
The XML 'Alphabet Soup'
XPath Query Examples
./author (find all author elements that are children of the current context)

/bookstore (find the bookstore element at the root)

/* (find the root element)

//author (find all author elements anywhere in document)

/bookstore[@specialty = "textbooks"]
(find all bookstores where the specialty attribute = "textbooks")

//book[@style = /bookstore/@specialty]
(find all books where the style attribute = the specialty attribute of the bookstore element at the root)

//book[title='ABCD']/author/name/text()
(find the text node for the author's name of the book titled 'ABCD')

//book[title='ABCD' and price > 30]
(boolean combination of two conditions)

//book[not(.//publisher='Addison Wesley')]
(more like a "not exists": no publisher tag with that name)
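
Some of these queries can be tried with Python's `xml.etree.ElementTree`, which implements only a subset of XPath: paths are relative to an element, and a predicate cannot compare against another path or use `and`/`not()`. The sample document below is made up to fit the queries (a `style` attribute on `book`, a `specialty` attribute on `bookstore`).

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore specialty="novel">
  <book style="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author><name>Benjamin Franklin</name></author>
    <price>8.99</price>
  </book>
  <book style="novel">
    <title>The Confidence Man</title>
    <author><name>Herman Melville</name></author>
    <price>11.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(DOC)

# //author : all author elements at any depth below the root element.
authors = root.findall(".//author")

# //book[@style = /bookstore/@specialty] : the subset cannot compare two paths,
# so the root attribute is read first and substituted into the predicate.
specialty = root.get("specialty")
matching = root.findall(f".//book[@style='{specialty}']")

# //book[title='The Confidence Man']/author/name/text()
names = [n.text for n in root.findall(".//book[title='The Confidence Man']/author/name")]

print(len(authors), [b.findtext("title") for b in matching], names)
```

A full XPath engine would accept the original expressions directly.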
More XPath Examples
Path Expression                      Result
/bookstore/book[1]                   Selects the first book element that is the child of the bookstore element
/bookstore/book[last()]              Selects the last book element that is the child of the bookstore element
/bookstore/book[last()-1]            Selects the last but one book element that is the child of the bookstore element
/bookstore/book[position()<3]        Selects the first two book elements that are children of the bookstore element
//title[@lang]                       Selects all the title elements that have an attribute named lang
//title[@lang='eng']                 Selects all the title elements that have an attribute named lang with a value of 'eng'
/bookstore/book[price>35.00]         Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
/bookstore/book[price>35.00]/title   Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00
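
The positional and attribute predicates can be tried with Python's `xml.etree.ElementTree`. Its XPath subset has no `position()` function and no numeric comparison, so those two rows are emulated in plain Python below. The three-book document and its titles and prices are invented for the sketch.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore>
  <book><title lang="eng">Book A</title><price>29.99</price></book>
  <book><title lang="eng">Book B</title><price>39.95</price></book>
  <book><title>Book C</title><price>49.99</price></book>
</bookstore>"""

root = ET.fromstring(DOC)  # root is the bookstore element

first = root.find("book[1]").findtext("title")
last = root.find("book[last()]").findtext("title")
second_last = root.find("book[last()-1]").findtext("title")

# book[position()<3] : emulated with a list slice.
first_two = [b.findtext("title") for b in root.findall("book")[:2]]

with_lang = [t.text for t in root.findall(".//title[@lang]")]
eng = [t.text for t in root.findall(".//title[@lang='eng']")]

# book[price>35.00]/title : emulated with an explicit numeric test.
expensive = [
    b.findtext("title")
    for b in root.findall("book")
    if float(b.findtext("price")) > 35.00
]

print(first, last, second_last, first_two, with_lang, eng, expensive)
```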
XPath Functions
• Accessor functions:
 node-name, data, base-uri, document-uri
• Numeric value functions:
 abs, ceiling, floor, round, …
• String functions:
 compare, concat, substring, string-length, upper-case, lower-case, starts-with, ends-with, matches, replace, …
• Other functions include functions on boolean values, dates, nodes, etc.
The XML 'Alphabet Soup'
Data Islands
• XML embedded in an HTML document
• Manipulated via client side script or data binding

<XML id="XMLID">
<BOOK>
<TITLE>All About XML</TITLE>
<AUTHOR>Joe Developer</AUTHOR>
</BOOK>
</XML>

<XML id="XMLID" src="mydocument.xml"></XML>
The XML 'Alphabet Soup'
Data Islands
• Can be embedded in an HTML SCRIPT element
• XML is accessible via the DOM:
<SCRIPT language="xml" id="XMLID">
<SCRIPT type="text/xml" id="XMLID">
<SCRIPT language="xml" id="XMLID" src="mydocument.xml">
The XML 'Alphabet Soup'
XML-Based Applications
• Microsoft SQL Server
 Retrieve relational data as XML
 Query XML data
 Join XML data with existing database tables
 Update the database via XML Updategrams
 New XML data type in SQL Server 2005
• Microsoft Exchange Server
 XML is native representation of many types of data
 Used to enhance performance of UI scenarios (for example, Outlook Web Access (OWA))
Agenda
• Overview
• Syntax and Structure
• The XML Alphabet Soup
• XML as a meta-language
XML as a Meta-Language
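[Diagram: XML/DTD at the center as "a language to create languages". Companion technologies around it: SAX, DOM, CSS, DSSSL, XSL, XSLT, XLL, XSchema, XPath, XPointer, XQL. Languages defined with it: GO, CML, MathML, WML, BeanML.]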
Gene Ontology (GO)
• Describing and manipulating information about the molecular function, biological process and cellular component of gene products
• Gene Ontology website:
 http://www.geneontology.org
• GO DTD:
 ftp://ftp.geneontology.org/pub/go/xml/dtd/go.dtd
• GO Browsers and tools:
 http://www.geneontology.org/#tools
• GO Resources and samples:
 http://www.geneontology.org/#annotations
Math ML
• Describing and manipulating mathematical notations
• MathML website
 www.w3.org/Math
• MathML DTD
 www.w3.org/Math/DTD
• MathML Browser
 www.w3.org/Amaya
• MathML Resources
 www.webeq.com/mathml (see sample documents here)
Chemical ML
• Representing molecular and chemical information
• CML website
 www.xml-cml.org
• CML DTD
 www.xml-cml.org/dtdschema/index.html
• CML Browser and Authoring Environment
 www.xml-cml.org/jumbo.html
• CML Resources
 www.xml-cml.org/chimeral/index.html
 see sample documents here
 some require plug-in downloads, can be slow
Wireless ML
• Allows web pages to be displayed on mobile devices
• WML works with WAP to deliver the content
• Underlying model: a deck of cards that the user can sift through
• WAP/WML website
 www.wapforum.org
• WML DTD
 www.wapforum.org/DTD/wml_1.1.xml
• WAP/WML Resources
 www.oasis-open.org/cover/wap-wml.html
 www.w3scripts.com/wap (tutorial on WML; also see WAP Demo)
Scalable Vector Graphics
• Describing vector graphics data for use over the web
• Rendering is done in the browser
• Bandwidth demands are lower and scaling is easier
• SVG website
 www.w3.org/Graphics/SVG
• SVG Plug-Ins
 www.adobe.com/svg
• SVG Resources
 www.irt.org/articles/js176 (1999 article; a good, brief tutorial)
 planet.svg (an example from Deitel)
Bean ML
• Describing software components such as Java Beans
• Defines how the components are interconnected and can be used
• Bean ML Specs and Tools
 www.alphaworks.ibm.com/aw.nsf/techmain/bml
• Bean ML Resources
 www.oasis-open.org/cover/beanML.html
 With Bean ML
• You can mark-up beans using Bean ML
• And invoke different operations on Beans
• Includes BML Scripting Framework
XBRL
• Extensible Business Reporting Language
• Capturing and representing financial and accounting information
• Variety of situations
 e.g. publishing reports, extracting data for analysis, regulatory forms, etc.
• Initiated under the direction of AICPA
• XBRL website
 www.xbrl.org
• XBRL DTDs and Schemas
 http://www.xbrl.org/Core/2000-07-31/default.htm
• Demos and Tools
 http://www.xbrl.org/Demos/demos.htm
 http://www.xbrl.org/Tools.htm
News ML
• Designed to be media-independent
• Initiated by International Press Telecommunications Council
• Enables tracking of news stories over time
• NewsML website
 www.newsml.org
• NewsML DTD
 http://www.oasis-open.org/cover/newsML.html
• SportsML DTD – Derived from NewsML DTD
 http://xml.coverpages.org/sportsML.html
cXML
• CommerceXML from Ariba plus 40 other companies
• cXML website
 www.cxml.org
• Primary Set of Tools/Implementations to support cXML
 http://www.ariba.com/solutions/solutions_overview.cfm
 See also Whitepapers link explaining how these can be used for
• E-procurement
• E-fulfillment
• And others ..
xCBL
• xCBL from Microsoft, SAP, Sun
• xCBL website
 www.xcbl.org
 Marketed as XML component library for B2B e-commerce
• Available Resources (see internal links)
 DTDs and Schemas
 XDK: SOX Parser and an XSLT Engine
 Example Documents
ebXML
• UN/CEFACT: the United Nations body whose mandate covers worldwide policy and technical development in the area of trade facilitation and electronic business.
 www.uncefact.org
• ebXML website
 www.ebxml.org
• Current Endorsements
 http://www.ebxml.org/endorsements.htm
 Still needs buy-in from the larger IS/IT vendors
• Related Effort: RosettaNet
 http://www.rosettanet.org/rosettanet/Rooms/DisplayPages/LayoutInitial
 Business Processes for IT, Component and Chip companies
Conclusion
• Overview
• Syntax and Structure
• The XML Alphabet Soup
• XML as a meta-language
Resources
• http://www.xml.com/
• http://www.w3.org/xml/
• http://www.w3schools.com/
• http://msdn.microsoft.com/xml/
