
XML

MANIPAL UNIVERSITY JAIPUR


What is XML

 XML stands for Extensible Markup Language


 Used to describe documents and data in a standardized, text-based format
 Based on Standard Generalized Markup Language (SGML)
 SGML was created in 1974, became International Organization for Standardization (ISO) standard in 1986
 HTML - a common language for sharing technical documents
 Standardized way of describing document layout and display
 However, there was no standardized way to describe and share the data stored in a document.
 Example: the closing share price of every company
 In 1998, the World Wide Web Consortium (W3C) published the first Extensible Markup Language (XML) Recommendation
What is XML

 A markup language is used to provide information about a document.


 Tags are added to the document to provide the extra information.
 HTML tags tell a browser how to display the document.
 XML tags give a reader some idea what some of the data means.
 A tag-based meta language
 Designed for structured data representation
 Represents data hierarchically (in a tree)
 Provides context to data (makes it meaningful)
 Self-describing data
 Separates presentation (HTML) from data (XML)
 An open W3C standard
What is XML?

 XML is a “use everywhere” data specification

[Diagram: XML links Application X with Configuration, Documents, Repository, and Database]
Example of an HTML Document

<html>
<head><title>Example</title></head>
<body>
<h1>This is an example of a page.</h1>
<h2>Some information goes here.</h2>
</body>
</html>
Example of an XML Document

<?xml version="1.0"?>
<address>
<name>Alice Lee</name>
<email>alee@aol.com</email>
<phone>212-346-1234</phone>
<birthday>1985-03-22</birthday>
</address>
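
A minimal sketch of how a program might read the address document above, assuming Python and its standard xml.etree.ElementTree module. The variable names doc, root, and record are illustrative, not part of the slides.

```python
import xml.etree.ElementTree as ET

# The address record from the slide, held in a string.
doc = """<?xml version="1.0"?>
<address>
  <name>Alice Lee</name>
  <email>alee@aol.com</email>
  <phone>212-346-1234</phone>
  <birthday>1985-03-22</birthday>
</address>"""

root = ET.fromstring(doc)

# Each child element carries its own label (the tag) and its data (the text).
record = {child.tag: child.text for child in root}

print(root.tag)
print(record["name"], record["birthday"])
```

The tag names tell the program what each value means, which is what "self-describing data" refers to.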
Difference Between HTML and XML

 HTML tags have a fixed meaning and browsers know what it is.
 XML tags are different for different applications, and users know what they mean.
 HTML tags are used for display.
 XML tags are used to describe documents and data.
XML Explained

 Simple file format without XML


 Product information (ID, name, and price) 
XML Explained

 You might need to resort to adding a header that indicates the version of the file
XML Explained

 Add a special sequence of characters (##Start##) to show where each new record begins
Improving the List with XML
XML Elements

 Element names can contain letters, numbers, hyphens, underscores, periods, and colons
<element></element>
 Element names cannot contain spaces
 Element names can start with a letter, underscore, or colon, but cannot start with a number or other non-alphabetic character, or with the letters xml (these are reserved)
 Hyphens and periods are allowed in names but are not recommended
 Elements with no attributes or text can also be represented in an XML document like this:
<element/>
XML Attributes

 Attributes contain values that are associated with an element


<element attribute="value"></element>
 An attribute follows the element name in the start tag: the attribute name, then an equals sign (=), then the attribute value in single or double quotes
Text

 Text is located between the opening and closing tags of an element, and usually represents
the actual data associated with the elements and attributes that surround the text:
<element attribute="value">text</element>
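
The three parts just described (element name, attribute, text) can be inspected separately. This is a small sketch assuming Python's standard xml.etree.ElementTree; the names elem and empty are illustrative.

```python
import xml.etree.ElementTree as ET

# An element with one attribute and some text.
elem = ET.fromstring('<element attribute="value">text</element>')
print(elem.tag)               # the element name
print(elem.get("attribute"))  # the attribute value
print(elem.text)              # the text between the tags

# The short form <element/> has no attributes and no text.
empty = ET.fromstring("<element/>")
print(empty.attrib, empty.text)
```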
XML Document Structure

<?xml version="1.0" encoding="UTF-8"?>


<rootelement>
<firstelement position="1">
<level1 children="0">
This is level 1 of the nested
elements
</level1>
</firstelement>
<secondelement position="2">
<level1 children="1">
<level2>
This is level 2 of the nested
elements
</level2>
</level1>
</secondelement>
</rootelement>
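
To make the tree structure of this document visible, the sketch below walks it recursively and records the depth of each element. It assumes Python's standard xml.etree.ElementTree; the helper outline is a hypothetical name introduced for illustration.

```python
import xml.etree.ElementTree as ET

doc = """<rootelement>
  <firstelement position="1">
    <level1 children="0">This is level 1 of the nested elements</level1>
  </firstelement>
  <secondelement position="2">
    <level1 children="1">
      <level2>This is level 2 of the nested elements</level2>
    </level1>
  </secondelement>
</rootelement>"""

def outline(element, depth=0):
    """Return (depth, tag, attributes) for an element and all its descendants."""
    rows = [(depth, element.tag, dict(element.attrib))]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

rows = outline(ET.fromstring(doc))
for depth, tag, attrs in rows:
    # Indent each element by its nesting depth.
    print("  " * depth + tag, attrs)
```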
XML Basics

 XML elements are composed of a start tag (like <Name>) and an end tag (like </Name>)
 Whitespace between elements is usually ignored by applications.
 XML elements are case sensitive
 All elements must be nested in a root element.
 Ex: the root element is <SuperProProductList>
 Every element must be fully enclosed.
<Product><ID></ID></Product> is valid, but
<Product><ID></Product></ID> isn’t
 XML documents should start with an XML declaration like <?xml version="1.0"?> (optional in XML 1.0, but recommended)
XML Basics (Attributes)

 Attributes add extra information to an element.


 The same information can be stored in either subelements or attributes
 You can use single or double quotes around any attribute value.
XML Basics (Comments)

 Comments are bracketed by the <!-- and --> character sequences.


Syntax and Structure
Components of an XML Document
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="template.xsl"?>
<ROOT>
<ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
<ELEMENT2> </ELEMENT2>
<ELEMENT3 type='string'> </ELEMENT3>
<ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
</ROOT>

 Prologue (processing instructions): the first two lines
 Elements: ELEMENT1 and ELEMENT2
 Elements with attributes: ELEMENT3 and ELEMENT4
Syntax and Structure
Rules For Well-Formed XML

 There must be one, and only one, root element


 Sub-elements must be properly nested
 A tag must end within the tag in which it was started
 Attributes are optional
 Defined by an optional schema
 Attribute values must be enclosed in double quotes ("...") or single quotes ('...')
 Processing instructions are optional
 XML is case-sensitive
 <tag> and <TAG> are not the same type of element
Syntax and Structure
Well-Formed XML?

 No, CHILD2 and CHILD3 do not nest properly

<?xml version="1.0" ?>


<PARENT>
<CHILD1>This is element 1</CHILD1>
<CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
</PARENT>
Syntax and Structure
Well-Formed XML?

 No, there are two root elements

<?xml version="1.0" ?>


<PARENT>
<CHILD1>This is element 1</CHILD1>
</PARENT>
<PARENT>
<CHILD1>This is another element 1</CHILD1>
</PARENT>
Syntax and Structure
Well-Formed XML?

 Yes
<?xml version="1.0" ?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
<CHILD2/>
<CHILD3></CHILD3>
</PARENT>
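
The three quiz documents above can be checked mechanically, since a conforming parser rejects anything that is not well-formed. This is a sketch assuming Python's standard xml.etree.ElementTree; is_well_formed is a hypothetical helper name, and the last sample is added only to illustrate case sensitivity.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# CHILD2 is closed before CHILD3, so the tags do not nest properly.
bad_nesting = ("<PARENT><CHILD1>This is element 1</CHILD1>"
               "<CHILD2><CHILD3>Number 3</CHILD2></CHILD3></PARENT>")

# Two top-level PARENT elements: more than one root.
two_roots = ("<PARENT><CHILD1>This is element 1</CHILD1></PARENT>"
             "<PARENT><CHILD1>This is another element 1</CHILD1></PARENT>")

# One root, proper nesting, empty elements written both ways.
good = ("<PARENT><CHILD1>This is element 1</CHILD1>"
        "<CHILD2/><CHILD3></CHILD3></PARENT>")

# Names are case-sensitive, so </TAG> does not close <tag>.
case_mismatch = "<tag></TAG>"

for name, text in [("bad_nesting", bad_nesting), ("two_roots", two_roots),
                   ("good", good), ("case_mismatch", case_mismatch)]:
    print(name, is_well_formed(text))
```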
Syntax and Structure
An XML Document
<?xml version='1.0'?>
<bookstore>
<book genre='autobiography' publicationdate='1981'
ISBN='1-861003-11-0'>
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
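
As a sketch of how an application might use the bookstore document, the code below selects elements by name and by attribute value and totals the prices. It assumes Python's standard xml.etree.ElementTree and decimal modules; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

doc = """<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
    <price>8.99</price>
  </book>
  <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
    <title>The Confidence Man</title>
    <author><first-name>Herman</first-name><last-name>Melville</last-name></author>
    <price>11.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(doc)

# Child elements are found by path; attributes are read with get().
titles = [book.findtext("title") for book in root.findall("book")]
last_names = [book.findtext("author/last-name") for book in root.findall("book")]
novels = root.findall("book[@genre='novel']")

# Element text is always a string, so prices are converted before adding.
total = sum(Decimal(book.findtext("price")) for book in root.findall("book"))

print(titles)
print(last_names)
print(novels[0].get("ISBN"), total)
```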
What is XML

The three pillars of XML are


 Extensibility: XML does a great job of describing structured data as text, and the format is open to
extension. This means that any data that can be described as text and that can be nested in XML
tags will be generally accepted as XML. The only limits are XML's syntax rules and any format
constraints you choose to impose through data validation.
 Structure: The structure of XML is usually complex and hard for human eyes to follow, but it’s
important to remember that it’s not designed for us to read. XML parsers and other types of tools
that are designed to work with XML easily digest XML, even in its most complex forms.
 Validity: Aside from the mandatory syntax requirements that make up an XML document, data
represented by XML can optionally be validated for structure and content, based on two separate
data validation standards. The original XML data validation standard is called Document Type Definition
(DTD), and the more recent evolution of XML data validation is the XML Schema standard.
What Is XML Not?

 While XML facilitates data integration by providing a transport with which to send and receive data in a
common format, XML is not data integration. It’s simply the glue that holds data integration solutions
together with a multi-platform “lowest common denominator” for data transportation.
 XML cannot make queries against a data source or read data into a repository by itself.
 Similarly, data cannot be formatted as XML without additional tools or programming languages that
specifically generate XML data from other types of data.
 Also, data cannot be parsed into destination data formats without a parser or other type of application
that converts data from XML to a compatible destination format.
 It’s also important to point out that XML is not HTML. XML may look like HTML, based on the
similarities of the tags and the general format of the data, but that’s where the similarity ends. While
HTML is designed to describe display characteristics of data on a Web page to browsers, XML is
designed to represent data structures.
Exercise

 Write an XML representation for the following information

address
    name (first, last)
    email
    phone
    birthday (year, month, day)


<?xml version="1.0" ?>
<address>
<name>
<first>Alice</first>
<last>Lee</last>
</name>
<email>alee@aol.com</email>
<phone>123-45-6789</phone>
<birthday>
<year>1983</year>
<month>07</month>
<day>15</day>
</birthday>
</address>
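
XML is not produced by itself; some tool or program has to generate it. The sketch below builds the exercise answer programmatically, assuming Python's standard xml.etree.ElementTree. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Build the tree top-down: address -> name, email, phone, birthday.
address = ET.Element("address")

name = ET.SubElement(address, "name")
ET.SubElement(name, "first").text = "Alice"
ET.SubElement(name, "last").text = "Lee"

ET.SubElement(address, "email").text = "alee@aol.com"
ET.SubElement(address, "phone").text = "123-45-6789"

birthday = ET.SubElement(address, "birthday")
for tag, value in (("year", "1983"), ("month", "07"), ("day", "15")):
    ET.SubElement(birthday, tag).text = value

# Serialize the tree to a text string.
xml_text = ET.tostring(address, encoding="unicode")
print(xml_text)
```

Building the tree through an API means every start tag is matched by an end tag and the nesting is always correct.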
Namespaces
 XML data has to be exchanged between organizations
 Same tag name may have different meaning in different organizations,
causing confusion on exchanged documents
 Specifying a unique string as an element name avoids confusion
 Better solution: use unique-name:element-name
 Avoid using long unique names all over document by using XML Namespaces
<university xmlns:yale="http://www.yale.edu">

<yale:course>
<yale:course_id> CS-101 </yale:course_id>
<yale:title> Intro. to Computer Science</yale:title>
<yale:dept_name> Comp. Sci. </yale:dept_name>
<yale:credits> 4 </yale:credits>
</yale:course>

</university>

XML Namespaces

 Namespaces are a method for separating and identifying duplicate XML element names in an XML document.
 In XML, element names are defined by the developer. This often results in a conflict when trying to mix XML documents from
different XML applications.
 Namespaces can also be used as identifiers to describe data types and other information.
 The namespace can be defined by an xmlns attribute in the start tag of an element.
 The namespace declaration has the following syntax.

xmlns:prefix="URI"
 Define the prefix in the xmlns attribute by inserting a colon (:) followed by the characters you want to use for the prefix.
 Note: The namespace URI is not used by the parser to look up information. The purpose of using a URI is to give the
namespace a unique name. However, companies often use the namespace as a pointer to a web page containing
namespace information.
XML Namespaces

This XML carries information about an HTML table and a piece of furniture. If these XML fragments were
added together without prefixes, there would be a name conflict: both contain a <table> element, but the
elements have different content and meaning, and a user or an XML application would not know how to
handle the difference. The h: and f: prefixes, each bound to its own namespace, keep the two apart.

<root>
<h:table xmlns:h="http://www.w3.org/TR/html4/">
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table xmlns:f="http://www.w3schools.com/furniture">
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
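
A short sketch of how a namespace-aware parser tells the two table elements apart, assuming Python's standard xml.etree.ElementTree. ElementTree reports each qualified name as {URI}localname; the ns mapping and variable names are illustrative.

```python
import xml.etree.ElementTree as ET

doc = """<root>
  <h:table xmlns:h="http://www.w3.org/TR/html4/">
    <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
  </h:table>
  <f:table xmlns:f="http://www.w3schools.com/furniture">
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</root>"""

# Prefix-to-URI mapping used only for the searches below. The prefixes
# chosen here need not match those in the document; the URIs must.
ns = {"h": "http://www.w3.org/TR/html4/",
      "f": "http://www.w3schools.com/furniture"}

root = ET.fromstring(doc)

# Both children have the local name "table", but different namespaces.
tables = [child.tag for child in root]
cells = [td.text for td in root.findall("h:table/h:tr/h:td", ns)]
furniture_name = root.findtext("f:table/f:name", namespaces=ns)

print(tables)
print(cells, furniture_name)
```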
XML Namespaces

 To assign a namespace to all the elements, you can use an xmlns attribute to assign a URI to the default namespace:

<?xml version="1.0" encoding="UTF-8"?>


<rootelement xmlns="www.mit.edu">
<firstelement position="1">
<level1>level1 elements</level1>
</firstelement>
<secondelement position="2">
<level1 children="1">
<level2>level2 elements</level2>
</level1>
</secondelement>
</rootelement>

Here firstelement, secondelement, and their child elements all belong to the same default namespace as rootelement.


Namespaces: Declaration

Namespace declaration examples:


xmlns:bk="http://www.example.com/bookinfo/"

xmlns:bk="urn:mybookstuff.org:bookinfo"

In xmlns:bk="http://www.example.com/bookinfo/", xmlns is the namespace declaration, bk is the prefix, and the quoted value is the URI (here a URL).


Namespaces: Examples

<BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
<bk:TITLE>All About XML</bk:TITLE>
<bk:AUTHOR>Joe Developer</bk:AUTHOR>
<bk:PRICE currency='US Dollar'>19.99</bk:PRICE>
</BOOK>

<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
xmlns:money="urn:finance:money">
<bk:TITLE>All About XML</bk:TITLE>
<bk:AUTHOR>Joe Developer</bk:AUTHOR>
<bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE>
</bk:BOOK>
Namespaces: Default Namespace

 An XML namespace declared without a prefix becomes the default namespace for all
sub-elements
 All elements without a prefix will belong to the default namespace:

<BOOK xmlns="http://www.bookstuff.org/bookinfo">
<TITLE>All About XML</TITLE>
<AUTHOR>Joe Developer</AUTHOR>
</BOOK>
Namespaces: Scope

 Unqualified elements belong to the inner-most default namespace.


 BOOK, TITLE, and AUTHOR belong to the default book namespace
 PUBLISHER and NAME belong to the default publisher namespace

<BOOK xmlns="www.bookstuff.org/bookinfo">
<TITLE>All About XML</TITLE>
<AUTHOR>Joe Developer</AUTHOR>
<PUBLISHER xmlns="urn:publishers:publinfo">
<NAME>Microsoft Press</NAME>
</PUBLISHER>
</BOOK>
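
The scoping rule can be confirmed by asking a parser which namespace each element ends up in. This is a sketch assuming Python's standard xml.etree.ElementTree; split_tag and namespace_of are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

doc = """<BOOK xmlns="www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>"""

def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag

root = ET.fromstring(doc)

# Map each element's local name to the namespace it belongs to.
namespace_of = {}
for element in root.iter():
    uri, local = split_tag(element.tag)
    namespace_of[local] = uri

for local, uri in namespace_of.items():
    print(local, "->", uri)
```

The inner xmlns declaration on PUBLISHER overrides the outer default for PUBLISHER and everything nested inside it.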
XML Schema

 An XML Schema describes the structure of an XML document.


 The XML Schema language is also referred to as XML Schema Definition (XSD).
 XML Schema is itself specified in XML syntax.
 XML Schema is integrated with namespaces
Purpose of XML Schema

 Define the legal building blocks of an XML document


 the elements and attributes that can appear in a document
 the number of (and order of) child elements
 data types for elements and attributes
 default and fixed values for elements and attributes
Strength of XML Schemas

 It is easier to describe allowable document content


 It is easier to validate the correctness of data
 It is easier to define data facets (restrictions on data)
 It is easier to define data patterns (data formats)
 It is easier to convert data between different data types
Example
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="author" minOccurs="0"/>
    <xsd:element name="year"/>
    <xsd:choice>
      <xsd:element name="journal"/>
      <xsd:element name="conference"/>
    </xsd:choice>
  </xsd:sequence>
</xsd:complexType>
Example
An XML document:
<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

An XML Schema describing it:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="https://www.w3schools.com"
xmlns="https://www.w3schools.com"
elementFormDefault="qualified">

<xs:element name="note">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="to" type="xs:string"/>
      <xs:element name="from" type="xs:string"/>
      <xs:element name="heading" type="xs:string"/>
      <xs:element name="body" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

</xs:schema>
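
Because an XML Schema is itself written in XML, an ordinary XML parser can read it. The sketch below lists the element declarations in the note schema, assuming Python's standard xml.etree.ElementTree. It only inspects the schema; it does not validate a document against it, which needs a schema-aware library outside the standard library.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the XML Schema vocabulary (bound to xs: in the schema).
XS = "http://www.w3.org/2001/XMLSchema"

schema_text = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="https://www.w3schools.com"
    xmlns="https://www.w3schools.com"
    elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(schema_text)

# Collect (name, type) for every xs:element declaration, in document order.
declared = [(el.get("name"), el.get("type"))
            for el in schema.iter("{%s}element" % XS)]

print(schema.get("targetNamespace"))
for name, type_name in declared:
    print(name, type_name)
```

The note element has no type attribute because its type is defined inline by the nested xs:complexType.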
Reference to External XSD

<?xml version="1.0"?>

<note
xmlns="https://www.w3schools.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://www.w3schools.com note.xsd">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
XSD Simple Elements

 Defining a Simple Element


 <xs:element name="xxx" type="yyy"/>

 built-in data types


• xs:string
• xs:decimal
• xs:integer
• xs:boolean
• xs:date
• xs:time
Example

 Here are some XML elements:


<lastname>Refsnes</lastname>
<age>36</age>
<dateborn>1970-03-27</dateborn>

 And here are the corresponding simple element definitions:


<xs:element name="lastname" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="dateborn" type="xs:date"/>
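
To illustrate what a declared type buys you, the sketch below converts element text according to the three declarations above. It is a hand-written approximation in Python, not a real XSD validator: the CONVERTERS and DECLARED tables and the convert function are hypothetical, and Python's int and date parsing only roughly match the xs:integer and xs:date lexical rules.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical mapping from a few XSD built-in types to Python converters.
CONVERTERS = {
    "xs:string": str,
    "xs:integer": int,
    "xs:date": date.fromisoformat,
}

# The simple element definitions from the slide: element name -> declared type.
DECLARED = {
    "lastname": "xs:string",
    "age": "xs:integer",
    "dateborn": "xs:date",
}

def convert(xml_text):
    """Parse one simple element and convert its text to the declared type."""
    elem = ET.fromstring(xml_text)
    converter = CONVERTERS[DECLARED[elem.tag]]
    return converter(elem.text)  # raises ValueError if the text does not fit

print(convert("<lastname>Refsnes</lastname>"))
print(convert("<age>36</age>"))
print(convert("<dateborn>1970-03-27</dateborn>"))
```

Text that does not fit the declared type, such as <age>thirty-six</age>, makes the converter raise ValueError, which is the kind of error a schema validator would report.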
Default and Fixed Values for Simple Elements

 <xs:element name="color" type="xs:string" default="red"/>

 <xs:element name="color" type="xs:string" fixed="red"/>


XSD Attributes

 How to Define an Attribute?


 <xs:attribute name="xxx" type="yyy"/>

 Here is an XML element with an attribute:


<lastname lang="EN">Smith</lastname>
 And here is the corresponding attribute definition:
<xs:attribute name="lang" type="xs:string"/>
