
eXtensible Markup Language (XML)

A comment by Tim Bray of Sun Microsystems on the celebration of the 10th anniversary of XML in February 2008:
"There is essentially no computer in the world, desk-top, hand-held,
or back-room, that doesn't process XML sometimes. This is a good
thing, because it shows that information can be packaged and
transmitted and used in a way that's independent of the kinds of
computer and software that are involved. XML won't be the last
neutral information wrapping system; but as the first, it's done very
well."
A Programme Under the compumitra Series
Copyright 2010-14 © Sunmitra Education Technologies Limited, India
Outline

 XML Eye-opener.
 What is XML?
 HTML vs. XML.
 Basic XML Syntax.
 Constituents.
 Some XML Rules.
 Element Vs. Attribute.
 Node Naming Principles.
 Advanced Concepts related to XML
 Future of XML
What is XML-1

 XML is an abbreviation of eXtensible Markup Language.

 XML evolved from the more general-purpose ISO standard SGML (Standard Generalized Markup Language).

 All data needs a description to become useful information. XML provides a neat solution.

 XML looks like normal English, but it has been designed to be machine-readable.
What is XML-2

 XML can store data.

 XML can help standardize the exchange of data.

 User-defined markup tags name the data items.

 Library functions to parse XML are available in most programming languages.

 The syntax looks like this:
<addressbook>
<adrrecord>
<name>Name1</name>
<address>Address1</address>
<city>City1</city>
</adrrecord>
</addressbook>
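
As an illustration of the library support mentioned above, here is a minimal sketch that reads the address book with Python's standard xml.etree.ElementTree module. The function name read_records is a hypothetical helper chosen for this example, not part of any library.

```python
import xml.etree.ElementTree as ET

# The address book from the slide, indented for readability.
ADDRESS_BOOK = """\
<addressbook>
  <adrrecord>
    <name>Name1</name>
    <address>Address1</address>
    <city>City1</city>
  </adrrecord>
</addressbook>
"""


def read_records(xml_text):
    """Return one dict per <adrrecord>, mapping child tag names to their text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in record}
        for record in root.findall("adrrecord")
    ]


if __name__ == "__main__":
    for record in read_records(ADDRESS_BOOK):
        print(record)
```

Each user-defined tag name becomes a dictionary key, which shows how the markup describes the data it carries.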
Understanding Basic XML Syntax

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<COUNTRYLIST>
  <COUNTRY group="G20">
    <NAME>India</NAME>
    <CODE>IN</CODE>
    <ISD>91</ISD>
    <CAPITAL largestcity="No">New Delhi</CAPITAL>
    <LCITY>Mumbai</LCITY>
    <CURRENCY>Indian Rupee</CURRENCY>
    <CURCODE>INR</CURCODE>
  </COUNTRY>
  <COUNTRY group="G5">
    <NAME>Japan</NAME>
    <CODE>JP</CODE>
    <ISD>81</ISD>
    <CAPITAL largestcity="Yes">Tokyo</CAPITAL>
    <LCITY>Tokyo</LCITY>
    <CURRENCY>Yen</CURRENCY>
    <CURCODE>JPY</CURCODE>
  </COUNTRY>
</COUNTRYLIST>

 XML Declaration (the first line):
 Version: the version of XML.
 Encoding: the character set used. UTF-8, an 8-bit encoding of Unicode, is common.
 Standalone: standalone="yes" indicates that no external type definitions are used.
 Root Element Node: <COUNTRYLIST>.
 Element Node: e.g. <COUNTRY> or <CODE>.
 Element Value: e.g. Yen inside <CURRENCY>.
 Attribute Node: e.g. group or largestcity.
 Attribute Value: e.g. "G20" or "Yes".
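
The node types labelled above can be read back programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module on a shortened copy of the country list; the helper name describe is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's document (declaration and some elements omitted).
COUNTRY_XML = """\
<COUNTRYLIST>
  <COUNTRY group="G20">
    <NAME>India</NAME>
    <CAPITAL largestcity="No">New Delhi</CAPITAL>
  </COUNTRY>
  <COUNTRY group="G5">
    <NAME>Japan</NAME>
    <CAPITAL largestcity="Yes">Tokyo</CAPITAL>
  </COUNTRY>
</COUNTRYLIST>
"""


def describe(xml_text):
    """Return the root tag and one tuple per COUNTRY element."""
    root = ET.fromstring(xml_text)           # root element node
    rows = []
    for country in root:                     # child element nodes
        capital = country.find("CAPITAL")
        rows.append((
            country.findtext("NAME"),        # element value
            country.get("group"),            # attribute value
            capital.text,                    # element value
            capital.get("largestcity"),      # attribute value
        ))
    return root.tag, rows


if __name__ == "__main__":
    print(describe(COUNTRY_XML))
```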
XML Constituents
 Elements: Parsable character data (PCDATA) lies between an element's start and end tags, e.g. "somename" below.
<address><name>somename</name></address>
 Attributes: An attribute has a name and a value in quotes.
<Book Version="1.0"><name></name></Book>

 Five predefined entities allow for special characters in the PCDATA area.
> to &gt;
< to &lt;
& to &amp;
' to &apos;
" to &quot;

 CDATA section (character data not to be parsed). This is meant for holding blocks of code or general-purpose data. Even HTML data can be put here.
<![CDATA[ ... ]]>

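 Processing Instructions (PI) or directives are given between <? and ?>.
<?xml-stylesheet type="text/css" href="mySheet.css"?>
The initial declaration shown below uses the same syntax, although the XML specification formally classes it as the XML declaration rather than a PI.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>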
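
The predefined entities and the CDATA section listed above are two routes to the same goal: putting special characters into element content. A minimal sketch in Python, assuming the standard xml.sax.saxutils and xml.etree.ElementTree modules; the <code> element and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

RAW = 'if (a < b && name == "x") { go(); }'

# escape() replaces &, < and > by default; the quote entities are passed explicitly.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def with_entities(text):
    """Wrap text in a <code> element, escaping special characters as entities."""
    return "<code>" + escape(text, QUOTE_ENTITIES) + "</code>"


def with_cdata(text):
    """Wrap text in a CDATA section. Assumes text does not contain ']]>'."""
    return "<code><![CDATA[" + text + "]]></code>"


def read_back(xml_text):
    """Parse the element and return its text content."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    print(with_entities(RAW))
    print(with_cdata(RAW))
```

Both forms give back the same text when parsed; CDATA simply keeps the source readable when the content has many special characters.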
Some XML Rules - 1
 All elements must have closing tags.
<address>invalid syntax
<address>valid syntax</address>

 Element names are case-sensitive.
<Name>incorrect</name>
<Name>correct</Name>

 Elements must be correctly nested.
<address><name>incorrect</address></name>
<address><name>correct</name></address>

 Attribute values must be quoted.
<Book Version=1.0><name></name></Book> (incorrect)
<Book Version="1.0"><name></name></Book> (correct)
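
The rules above can be checked mechanically, since a conforming parser rejects any document that breaks them. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The incorrect and correct examples from the rules above.
EXAMPLES = [
    "<address>invalid syntax",                       # missing closing tag
    "<address>valid syntax</address>",
    "<Name>incorrect</name>",                        # case mismatch
    "<Name>correct</Name>",
    "<address><name>incorrect</address></name>",     # wrong nesting
    "<address><name>correct</name></address>",
    "<Book Version=1.0><name></name></Book>",        # unquoted attribute value
    '<Book Version="1.0"><name></name></Book>',
]

if __name__ == "__main__":
    for text in EXAMPLES:
        print(is_well_formed(text), text)
```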
Some XML Rules - 2
 An XML document must have a root element, and only one root element (it can have any name, though).
<root>
<child>correct</child>
</root>

 Special characters in data values must be written as entities.
> as &gt; < as &lt; & as &amp; ' as &apos; " as &quot;

 Comments have this syntax.
<!-- This is a comment -->
Comments cannot contain -- (two consecutive hyphens) in their text.

 Whitespace is preserved, unlike in HTML. For example, "Hello      World" in HTML would be displayed as "Hello World". In XML the exact spaces specified are retained.

 Empty elements have this optional short format.
<Name /> in place of <Name></Name>
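
The whitespace and empty-element rules can be observed directly. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the <msg> element and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_text(xml_text):
    """Return the text content of the root element exactly as parsed."""
    return ET.fromstring(xml_text).text


def same_empty_element(first, second):
    """Compare tag, text and child count of two single-element documents."""
    a, b = ET.fromstring(first), ET.fromstring(second)
    return (a.tag, a.text, len(a)) == (b.tag, b.text, len(b))


if __name__ == "__main__":
    # The run of spaces between the words survives parsing.
    print(repr(element_text("<msg>Hello      World</msg>")))
    # The short and long forms of an empty element parse to the same thing.
    print(same_empty_element("<Name />", "<Name></Name>"))
```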
XML Practice: Element Vs Attributes - 1

 It is generally possible to define all data as ELEMENT tags in a tree format.
<Library>
<Book>
<ID>201</ID>
<ISBN>8175257660</ISBN>
<Author>Name1</Author>
<Title>Book Title</Title>
</Book>
</Library>
 A neat alternative to the above is to use ATTRIBUTES as follows:
<Library>
<Book ID="201" ISBN="8175257660">
<Author>Name1</Author>
<Title>Book Title</Title>
</Book>
</Library>
XML Practice: Element Vs Attributes -2

 Choosing which method to use calls for some thought.
 Information that is certain to be singular (will not be repeated) and is not domain-specific is recommended as an ATTRIBUTE.
 Information that you are unable to classify, or that can be repeated (e.g. the Author tag in the above example), should be an ELEMENT.

 An even better format for the previous example would be:
<Library>
<Book ID="201">
<ISBN>8175257660</ISBN>
<Author>Name1</Author>
<Title>Book Title</Title>
</Book>
</Library>
This is because ISBN is a book-related property, while ID may be related to a storage place.
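
The practical effect of this choice shows up when the data is read: an attribute yields at most one value per element, while an element can repeat. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the second author Name2 and the helper book_summary are hypothetical additions for illustration.

```python
import xml.etree.ElementTree as ET

# The slide's library with a second, hypothetical <Author> added.
LIBRARY = """\
<Library>
  <Book ID="201">
    <ISBN>8175257660</ISBN>
    <Author>Name1</Author>
    <Author>Name2</Author>
    <Title>Book Title</Title>
  </Book>
</Library>
"""


def book_summary(xml_text):
    """Return a dict for the first Book: single-valued ID, repeatable authors."""
    book = ET.fromstring(xml_text).find("Book")
    return {
        "id": book.get("ID"),                              # attribute: one value
        "isbn": book.findtext("ISBN"),
        "authors": [a.text for a in book.findall("Author")],  # element: may repeat
        "title": book.findtext("Title"),
    }


if __name__ == "__main__":
    print(book_summary(LIBRARY))
```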
XML Node Naming – Begins with

 Node (element or attribute) names must begin with a letter or _ (underscore).
<1STLINE></1STLINE> invalid element naming
<LINE1></LINE1> valid element naming
<BOOK 1Ver="1.00"></BOOK> invalid attribute naming
<BOOK _Ver="1.00"></BOOK> valid attribute naming
XML Node Naming – Consists of
 A name can consist of
 Any English character, or a character of any other language allowed by the encoding given in the declaration.
<Name>Sun</Name>
<नाम>सूरज</नाम>
 A dot (.), a hyphen (-) or an _ (underscore).
<Address.Cityname>Delhi</Address.Cityname>
<Address-Cityname>Delhi</Address-Cityname>
<Address_Cityname>Delhi</Address_Cityname>

 Tabs and spaces are not allowed in XML node names.
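
One quick way to test a candidate name against the naming rules above is to build a tiny element from it and see whether a parser accepts it. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; name_is_accepted is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def name_is_accepted(name):
    """Return True if the parser accepts <name></name> as an element."""
    try:
        ET.fromstring("<{0}></{0}>".format(name))
        return True
    except ET.ParseError:
        return False


# Candidate names taken from the naming rules.
CANDIDATES = [
    "LINE1",              # starts with a letter
    "1STLINE",            # starts with a digit
    "_Ver",               # starts with an underscore
    "Address.Cityname",   # dot inside the name
    "Address-Cityname",   # hyphen inside the name
    "Address_Cityname",   # underscore inside the name
    "Address Cityname",   # space inside the name
    "Address\tCityname",  # tab inside the name
]

if __name__ == "__main__":
    for candidate in CANDIDATES:
        print(name_is_accepted(candidate), repr(candidate))
```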
XML Node Naming – Based on
Namespace
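 A name can belong to a namespace.
 The name "table" may refer to an HTML table or to a piece of furniture. One can resolve this clash by using namespace prefixes as follows. In a complete document, each prefix must also be bound to a namespace URI with an xmlns:h or xmlns:f attribute.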
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>

<f:table>
<f:name>Dining Table</f:name>
<f:width>120</f:width>
<f:length>230</f:length>
</f:table>
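
A minimal sketch of reading the two kinds of table apart, assuming Python's standard xml.etree.ElementTree module. The wrapping <root> element and both namespace URIs (http://example.com/html and http://example.com/furniture) are placeholders invented for this example; real documents would use the URIs defined by their vocabularies.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used for lookups; the URIs are illustrative placeholders.
NS = {
    "h": "http://example.com/html",
    "f": "http://example.com/furniture",
}

DOC = """\
<root xmlns:h="http://example.com/html"
      xmlns:f="http://example.com/furniture">
  <h:table>
    <h:tr>
      <h:td>Apples</h:td>
      <h:td>Bananas</h:td>
    </h:tr>
  </h:table>
  <f:table>
    <f:name>Dining Table</f:name>
    <f:width>120</f:width>
    <f:length>230</f:length>
  </f:table>
</root>
"""


def table_contents(xml_text):
    """Return the HTML table cells and the furniture table's name."""
    root = ET.fromstring(xml_text)
    cells = [td.text for td in root.findall("h:table/h:tr/h:td", NS)]
    furniture_name = root.findtext("f:table/f:name", namespaces=NS)
    return cells, furniture_name


if __name__ == "__main__":
    print(table_contents(DOC))
```

Because each lookup names its prefix, the two table elements never get confused even though they share a local name.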
HTML Vs XML - 1

 Similarities.
 Both use markup tags (elements and attributes), e.g. <H1>Heading1</H1> or <font face="Verdana"></font>.
 Both use entities, e.g. &lt; &gt; etc.
 Both are derived from SGML.
HTML Vs XML - 2

 Differences.
 HTML has predefined tags; XML tags are user-defined.
 HTML is meant for humans, and errors are ignored. XML is meant for computers, as a data store or set of definitions, so errors cannot be ignored.
 HTML is usually not updated by programs, while XML is meant to be written by programs.
 HTML has a large number of entities. XML has just five predefined ones.
XSL (Extensible Stylesheet Language)

 Unlike CSS (Cascading Style Sheets), which styles HTML, XSL styles XML documents, whose tags are user-defined.
 It has three parts:
 XSLT (XSL Transformations): for showing XML data as transformed XHTML on a web page.
 XPath: a way to reach a particular data item in an XML file. This is often useful when reading XML-based configuration files.
 XSL-FO (XSL Formatting Objects): provides a display/print formatting mechanism for XML data.
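
To illustrate the XPath use case for configuration files mentioned above, here is a minimal sketch using the limited XPath subset supported by Python's standard xml.etree.ElementTree module. The configuration layout, host names and the helper port_for are all hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up configuration file used only for this example.
CONFIG = """\
<config>
  <database name="main">
    <host>db1.example.com</host>
    <port>5432</port>
  </database>
  <database name="replica">
    <host>db2.example.com</host>
    <port>5433</port>
  </database>
</config>
"""


def port_for(xml_text, db_name):
    """Return the port of the named database, or None if it is not listed."""
    root = ET.fromstring(xml_text)
    # Path expression: the <port> child of the <database> whose name attribute matches.
    node = root.find("./database[@name='{}']/port".format(db_name))
    return int(node.text) if node is not None else None


if __name__ == "__main__":
    print(port_for(CONFIG, "replica"))
```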
DTD (Document Type Definition)

 A DTD is referred to within a DOCTYPE declaration in an XML file, such as:
<!DOCTYPE note SYSTEM "Note.dtd">
 The DTD declarations have the following format. They are shown here embedded in the DOCTYPE declaration as an internal DTD; an external file such as Note.dtd contains only the <!ELEMENT ...> lines.
<!DOCTYPE note
[
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
 The XML file has a root node named note with four sub-elements.
 The sub-elements have the PCDATA format.
Parsing XML

 The process of reading an XML file and extracting valid data out of it is called "PARSING".
 Parsers are of two types:
 Non-validating parser: checks only that the document is well-formed and does not check it against a DTD.
 Validating parser: also checks the document against its DTD.
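
The difference between the two parser types can be seen with Python's standard xml.etree.ElementTree module, which behaves as a non-validating parser. In the sketch below the note omits <from>, breaking its own DTD, yet it still parses. The function matches_note_rule is a hand-written stand-in for one DTD rule, not a real validator; actual DTD validation needs a validating parser from outside the standard library.

```python
import xml.etree.ElementTree as ET

# <from> is missing, so this note breaks the (to,from,heading,body) rule.
NOTE = """\
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note><to>A</to><heading>Hi</heading><body>Text</body></note>
"""

EXPECTED_CHILDREN = ["to", "from", "heading", "body"]


def parse_without_validation(xml_text):
    """Parse the text; only well-formedness is checked, the DTD is not enforced."""
    return ET.fromstring(xml_text)


def matches_note_rule(note):
    """Hand-written check mirroring the DTD content model of <note>."""
    return [child.tag for child in note] == EXPECTED_CHILDREN


if __name__ == "__main__":
    note = parse_without_validation(NOTE)   # succeeds despite the missing <from>
    print(note.tag, matches_note_rule(note))
```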
Some Advanced Concepts Related to
XML

 XML Schema: relates to defining validation rules in the form of XSD (XML Schema Definition) files, which are themselves in XML format.

 XQuery: a way to search within an XML file and get the selected nodes that match the criteria.
Where to View/Edit

 Browsers: Most browsers are good at viewing XML. Internet Explorer is particularly good at it.

 Editors: Special editors are available that offer good XML viewing and editing facilities. Microsoft's XML Editor and Peter's XML Editor are good at it.

 Office Tools: Tools like MS-Word and FrontPage provide good XML editing. Even MS-Excel supports opening XML files.

 Visual Studio/WebDeveloper: They provide an excellent environment for XML editing and viewing, along with validation support.
Let's Quickly Revise

 2 types of nodes: Elements and Attributes. Elements are repeatable. Attributes can always be expressed as elements; the reverse may not be true.

 Special syntax for non-parsable data: CDATA.

 5 entities for special symbols (<, >, ', ", &).

 HTML-style comments are allowed. <!-- comments -->

 Case-sensitive. Closing tags required.

 One can apply other Processing Instructions (PI), which are enclosed within <? ?>. The first line is usually a version declaration, which is written in the same style.

 Always have a single root node.


Future of XML
 All websites may one day be written in XML. HTML has already been re-standardised as XHTML, which provides better syntax checking and browser compatibility.

 XML promises to be the most open system for storage of information from all IT gadgets, from desktops to mobile phones to iPods to iPads to DVD players to microwave ovens. It is already in use and is expected to be used in more and more devices.

 All office documents and e-books, offline and online, shall ultimately be in XML, as it is the sole non-proprietary format that is simple and able to meet the needs well.
