
XML Basic

Lecture 2

XML video: https://www.youtube.com/watch?v=KeLiQXqVgMI


XML
 XML is the “eXtensible Markup Language”
 XML became a W3C Recommendation in 1998.
 It is a markup language for documents containing
structured information

 X means eXtensible – you can make your own tags.


 XML uses a Document Type Definition (DTD) or an
XML Schema to describe the structure of the data

XML validator: https://www.freeformatter.com/xml-validator-xsd.html


Markup Language
 A markup language defines a set of rules for encoding a
document in a format that is both human-readable and
machine-readable.
XML and HTML
 XML is not a replacement for HTML
 XML and HTML were designed with different goals
 XML is used to describe data and focuses on what the data is
 HTML is used to display data and focuses on how the data looks
 XML was created to structure, store, and send information
HTML | XML
Marks up text so it can be displayed to users | Marks up data so it can be processed by computers
Describes both structure (e.g. <p>, <h2>) and appearance (e.g. <br>, <font>) | Describes only content, or “meaning”
Uses a fixed, unchangeable set of tags | You make up your own tags
Simple XML Document
 How to write XML Document
 Open a text editor, such as Notepad
 Save the file with the “.xml” extension
 Open the XML file in a web browser
Well-formed XML:
<?xml version="1.0" encoding="UTF-8"?>
<Notebook>
<Processor>Intel Core i7-2710QE</Processor>
<Memory>DDR3L 1600</Memory>
<HardDisk>SSHD 1TB</HardDisk>
</Notebook>

Not well-formed XML (the <Memory> start tag is closed by a mismatched </Mem> end tag):
<?xml version="1.0" encoding="UTF-8"?>
<Notebook>
<Processor>Intel Core i7-2710QE</Processor>
<Memory>DDR3L 1600</Mem>
<HardDisk>SSHD 1TB</HardDisk>
</Notebook>
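A browser is one way to see whether a document is well formed; a parser can report the same thing programmatically. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (neither is part of the lecture). The helper name check_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

GOOD = """<?xml version="1.0" encoding="UTF-8"?>
<Notebook>
<Processor>Intel Core i7-2710QE</Processor>
<Memory>DDR3L 1600</Memory>
<HardDisk>SSHD 1TB</HardDisk>
</Notebook>"""

# Same document, except that <Memory> is closed by </Mem>.
BAD = GOOD.replace("</Memory>", "</Mem>")


def check_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, (line, column))."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as err:
        # err.position holds the location where the parser gave up.
        return False, err.position
    return True, None


print(check_well_formed(GOOD))
print(check_well_formed(BAD))
```

The second call reports the line of the mismatched end tag, which is usually enough to locate the mistake in a small file.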


XML Parser
 An XML parser is a software component that reads an XML document,
extracts its content based on the structure, and exposes it to the
application through a programming interface.

 Popular XML Parser APIs:


 DOM (Document Object Model)
 SAX (Simple API for XML)
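The difference between the two APIs can be sketched as follows, assuming Python's standard xml.dom.minidom and xml.sax modules as stand-ins for DOM and SAX implementations (the lecture does not prescribe a language). The class name TitleHandler and the sample data are illustrative only.

```python
import xml.dom.minidom
import xml.sax

XML_TEXT = (
    b"<bookstore>"
    b"<book><title>Data Structure</title></book>"
    b"<book><title>Guide to Travelling</title></book>"
    b"</bookstore>"
)

# DOM: the whole document is loaded into an in-memory tree, then queried.
dom = xml.dom.minidom.parseString(XML_TEXT)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]


# SAX: the parser streams through the document and calls back on each event.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(XML_TEXT, handler)
sax_titles = handler.titles

print(dom_titles)
print(sax_titles)
```

Both approaches yield the same titles. DOM is convenient for random access to a small document, while SAX avoids holding a large document in memory.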
Well-formed XML and Valid XML
 Well-formed XML
 A document that adheres to the syntax rules specified by the
XML 1.0 specification.
 Valid XML
 A well-formed XML document that contains a reference to a Document
Type Definition (DTD), whose elements and attributes are declared in
that DTD and follow the grammatical rules that the DTD specifies for them.

Syntax Rule:
• XML documents must have a root element
• XML elements must have start and end tags
• XML tags are case sensitive
• XML elements must be properly nested
• XML attribute values must always be quoted
XML Example
<?xml version="1.0" encoding="UTF-8"?>      XML declaration
<!-- bookstore.xml -->                      XML comment
<bookstore>                                 Root element start-tag (one and only one root)
  <book ISBN="101223547">                   First child element start-tag (with an attribute of ISBN)
    <title>Data Structure</title>
    <author>Willian Wong</author>
    <year>2016</year>
  </book>                                   First child element end-tag
  <book ISBN="121329748">                   Second child element start-tag
    <title>Guide to Travelling</title>
    <author>Steven Muthu</author>
    <author>Mary Ong</author>
    <author>Ali Ahmad</author>
    <year>2013</year>
  </book>                                   Second child element end-tag
</bookstore>                                Root element end-tag
Logical View of XML
bookstore (root element)
  book (first child element), attribute ISBN = 101223547
    title: Data Structure
    author: Willian Wong
    year: 2016
  book (second child element), attribute ISBN = 121329748
    title: Guide to Travelling
    author: Steven Muthu
    author: Mary Ong
    author: Ali Ahmad
    year: 2013
Diagram notation: each element is drawn as a rectangle, a solid line joins a parent to its child, an attribute is written on a new line under the element name, and the content appears below its element.
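The logical view is the tree a parser builds from the document. The following sketch walks that tree for the bookstore example, assuming Python's standard xml.etree.ElementTree module. The function name describe is hypothetical.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book ISBN="101223547">
    <title>Data Structure</title>
    <author>Willian Wong</author>
    <year>2016</year>
  </book>
  <book ISBN="121329748">
    <title>Guide to Travelling</title>
    <author>Steven Muthu</author>
    <author>Mary Ong</author>
    <author>Ali Ahmad</author>
    <year>2013</year>
  </book>
</bookstore>"""


def describe(element, depth=0):
    """Return one indented line per element: tag, attributes, and text."""
    line = "  " * depth + element.tag
    if element.attrib:
        line += " " + str(dict(element.attrib))
    text = (element.text or "").strip()
    if text:
        line += ": " + text
    lines = [line]
    for child in element:  # iterating an element yields its child elements
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(BOOKSTORE)
tree_lines = describe(root)
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one solid line in the diagram, from a parent element down to a child.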
XML Declaration
<?xml version="1.0" encoding="UTF-8" standalone="no"?>

 It identifies the document as being XML and uses the same <? ... ?> syntax as a processing instruction.
 If present, it must be placed at the very start of the document, before any other content.
Parameters of the XML declaration:
 version (values: 1.0, 1.1)
• mandatory
• specifies the version of the XML document
 encoding (values: UTF-8, UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4, ISO-8859-1 to ISO-8859-9)
• optional
• specifies the character encoding of the XML document
• UTF-8 is the default encoding
 standalone (values: yes or no)
• optional
• informs the parser whether the document relies on information from an external source for its content
• default value is no (the parser will accept external resources)
Element
 An XML element is delimited by tags.
 It acts as a container that can hold text, other elements, attributes,
media objects, or a combination of these.
 An element usually consists of a start tag, content, and an end tag.

 Content is optional.
 An empty element is an element without content.
 Examples of empty elements:
 <product />
 <product></product>
 <product code="1345" />
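As a quick check that the forms above are equivalent to a parser, here is a minimal sketch assuming Python's standard xml.etree.ElementTree module. The helper name is_empty is hypothetical.

```python
import xml.etree.ElementTree as ET

FORMS = ['<product />', '<product></product>', '<product code="1345" />']


def is_empty(element):
    """True when the element has no child elements and no text content."""
    return len(element) == 0 and not (element.text or "").strip()


parsed = [ET.fromstring(form) for form in FORMS]
for form, element in zip(FORMS, parsed):
    print(form, "-> empty:", is_empty(element), "attributes:", element.attrib)
```

All three parse to an element without content. The third shows that an empty element can still carry attributes.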
Root Element
 There is exactly ONE element, called the root, or
document element.
<?xml version="1.0"?>
<Book>
<Title>Java Programming</Title>
<Author>Jimmy Kong</Author>
</Book>

Not well-formed (two top-level elements instead of one root):
<?xml version="1.0"?>
<Title>Java Programming</Title>
<Author>Jimmy Kong</Author>

Not well-formed (text with no root element at all):
<?xml version="1.0"?>
This is an invalid XML document
Element Example
<article>
<author>Gerhard Weikum</author>
<title>The Web in Ten Years</title>
<text>
<abstract>In order to evolve...</abstract>
<section number="1" title="Introduction">
The <index>Web</index> provides the universal...
</section>
</text>
</article>
 Start tag: <article>
 Content of the element (subelements and/or text): everything between the start tag and the end tag
 End tag: </article>
 Element: the start tag, the content, and the end tag together
Attribute
 An XML element can have zero or more attributes.
 Attributes are specified in the start tag of an element.
 An attribute consists of a name and value pair.
 Attribute values must be quoted.

 Example of attributes:
 <student id="1011">John</student>
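Reading the attribute back shows the name and value pair at work. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module. The email attribute used for the default lookup is hypothetical and does not appear in the example.

```python
import xml.etree.ElementTree as ET

student = ET.fromstring('<student id="1011">John</student>')

# Attribute values are always strings, even when they look like numbers.
student_id = student.get("id")

# get() returns a fallback for an attribute the element does not have.
email = student.get("email", "n/a")

# Leaving out the quotes makes the document not well formed.
try:
    ET.fromstring("<student id=1011>John</student>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False

print(student_id, email, unquoted_accepted)
```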
Attribute Example
<article>
<author>Gerhard Weikum</author>
<title>The Web in Ten Years</title>
<text>
<abstract>In order to evolve...</abstract>
<section number="1" title="Introduction">
The <index>Web</index> provides the universal...
</section>
</text>
</article>
The <section> start tag carries two attributes, number and title, each a name and value pair.
Naming Rules of Element and Attribute
 Names can contain letters, digits, hyphens, underscores, and periods.
 Names must start with a letter or underscore, not with a digit or
other punctuation character.
 Names cannot contain spaces.
 Names must not start with the letters xml (in any combination of
upper and lower case); such names are reserved.
 Colon should not be used in XML names except for
namespace purposes.
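The naming rules can be approximated with a short check. This is a simplified sketch in Python that only considers ASCII names, whereas the XML specification also allows many Unicode letters. It deliberately rejects the colon because of the namespace rule. The function name is_valid_name is hypothetical.

```python
import re

# First character: letter or underscore. Remaining characters: letters,
# digits, underscores, periods, or hyphens. ASCII only, by assumption.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_name(name):
    """Apply the slide's naming rules to a candidate element or attribute name."""
    if not _NAME_PATTERN.fullmatch(name):
        return False
    # Names beginning with "xml" in any letter case are reserved.
    return not name.lower().startswith("xml")


for candidate in ["book", "_id", "unit-price.v2", "2nd", "first name", "XMLdata"]:
    print(candidate, is_valid_name(candidate))
```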
Comments
 Comments begin with <!-- and end with -->.
 Comments can be placed anywhere in a document outside
other markup.
 Comments are not part of the textual content of an XML
document.
 The string “--” (double-hyphen) MUST NOT occur within
comments.
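Two of these rules can be observed with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module, whose default parser discards comments. The <note> element is illustrative only.

```python
import xml.etree.ElementTree as ET

# A comment in the middle of text is not part of the textual content.
root = ET.fromstring("<note>one<!-- a reviewer remark -->two</note>")
text = "".join(root.itertext())

# A double hyphen inside a comment makes the document not well formed.
try:
    ET.fromstring("<note><!-- bad -- comment --></note>")
    double_hyphen_accepted = True
except ET.ParseError:
    double_hyphen_accepted = False

print(repr(text), double_hyphen_accepted)
```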
Character Data
 The text between the start and end tags is defined as
‘character data’.
 Character data may contain any legal Unicode character, except that
< and & must be written as entity references; >, " and ' can be
escaped the same way when needed.
 Character data is classified into:
 PCDATA
 CDATA

Examples of Character Data:


<author>Muthu</author>
<author>Muthu & Sammy </author> - not well formed!
XML Example 1
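<?xml version="1.0"?>
<guestbook>
<entry date="19.05.2016">
<author>Siti</author>
<level>great</level>
<email href="siti@gmail.com"/>
</entry>
<entry date="22.06.2016">
<author>Kelly</author>
<level>Not satisfy</level>
<email href="Kelly@gmail.com"/>
</entry>
</guestbook>
<!-- last accessed: 01.7.2016 -->
 XML declaration: <?xml version="1.0"?>
 Document element (aka root element): <guestbook>
 Element: for example, <author>Siti</author>
 Character data: for example, the text great inside <level>
 Empty element: for example, <email href="siti@gmail.com"/>
 Attribute: for example, date="19.05.2016"
 Comment: <!-- last accessed: 01.7.2016 -->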
XML Example 2
Convert the tabular data into an XML document, and vice
versa.
Character Data Sections (CDATA)
 A CDATA section is a block of text that is not parsed by the
parser: characters in it that would otherwise be recognized as markup
are treated as plain text.
 It is used when the data contains a lot of characters (like &
or <) that would be illegal in XML markup.
 The format:
<![CDATA[some text data]]>
 Restrictions of the CDATA section are:
- CDATA section cannot contain the string “]]>”.
- CDATA section nesting is not allowed.
 Example:
<Section>
<![CDATA[
Since this is a CDATA section
I can use all sorts of reserved characters
like > < " and & or write things like
<foo></bar>
but my document is still well formed!
]]>
</Section>
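What the parser hands back for a CDATA section can be checked directly. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module, with a shortened version of the text above.

```python
import xml.etree.ElementTree as ET

DOCUMENT = (
    "<Section>"
    "<![CDATA[Reserved characters: > < \" & and even <foo></bar>]]>"
    "</Section>"
)

section = ET.fromstring(DOCUMENT)

# The text comes back verbatim; nothing inside the section was parsed as markup.
cdata_text = section.text

# In particular, <foo></bar> did not create any child element.
child_count = len(section)

print(cdata_text)
print(child_count)
```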
 Example (a comment compared with a CDATA section in an XHTML page):
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" >
<head> <title>CDATA Example</title></head>
<body>
<h2>Using a Comment</h2>
<div id="commentExample">
<!--
You won't see this in the document
and can use reserved characters like
<>&"
-->
</div>

<h2>Using a CDATA Section</h2>


<div id="cdataExample">
<![CDATA[
You will see this in the document
and can use reserved characters like
<>&"
]]>
</div>
</body>
</html>
Parsed Character Data (PCDATA)
 Parsed Character Data (PCDATA) is text that will be parsed by
the parser: any markup inside it is recognized as XML code
rather than treated as literal text.
 Markup that XML treats as part of the code of the document
includes:
 XML declaration
 Opening and closing tags of an element
 Empty element tags
 Character or entity references
 Therefore, PCDATA must not contain a literal & or <; write them
as entity references (escaping > as well is good practice).
Processing Instructions
 A processing instruction is used to pass information to applications in a way that
escapes most XML rules.
 The format: <? target instruction ?>
 Target names can start with a letter or underscore, followed by
zero or more letters, digits, periods, hyphens, or underscores
 Example:
<?xml version='1.0'?>

<?xml-stylesheet href="mystyle.css" type="text/css"?>

<?xml-stylesheet type="text/xsl" href="show_book.xsl"?>
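An application sees a processing instruction as a target plus an uninterpreted data string. The sketch below assumes Python's standard xml.dom.minidom module. The <book/> root element is a placeholder added only so that the document is complete.

```python
from xml.dom import minidom, Node

document = minidom.parseString(
    b'<?xml version="1.0"?>'
    b'<?xml-stylesheet href="mystyle.css" type="text/css"?>'
    b'<book/>'
)

# Collect (target, data) for every processing instruction at the top level.
instructions = [
    (node.target, node.data)
    for node in document.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]

print(instructions)
```

The data part is a single string. Splitting it into href and type is left to the application that understands the xml-stylesheet target.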


Entity References
 Entity references are a way to shorten and modularize XML documents.
 They are also used to represent specific characters that are difficult
to produce on a standard keyboard or that would be illegal to type
directly in markup.
 Five special characters have predefined entity references:
• &lt; for <
• &gt; for >
• &amp; for &
• &apos; for '
• &quot; for "
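The five predefined entity references (&lt;, &gt;, &amp;, &apos;, &quot;) can be checked with a parser, and text can be escaped before it is placed in a document. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module and xml.sax.saxutils.escape. It reuses the "Muthu & Sammy" text from the Character Data slide.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The parser replaces each predefined entity reference with its character.
parsed = ET.fromstring("<t>&lt; &gt; &amp; &apos; &quot;</t>").text

# escape() rewrites &, < and > so the text can be used as character data.
raw = "Muthu & Sammy"
safe = escape(raw)
author = ET.fromstring("<author>" + safe + "</author>")

print(parsed)
print(safe)
print(author.text)
```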
XML Namespace
 Problem
 Elements or attributes from different XML vocabularies can have identical
names but different definitions, which causes problems of recognition
and collision.

 XML Namespace
 Provides uniquely named elements and attributes to avoid name
conflicts in an XML document
 If each vocabulary is given a namespace, the ambiguity between
identically named elements or attributes can be resolved
 XML Namespace Declaration
 Declared with the ‘xmlns’ attribute in a start tag: xmlns:prefix="namespaceURI"
 Applying Namespaces
 An element or attribute name is qualified by writing the prefix and “:” before it
 The namespace URI is not used by the parser to look up information
 The purpose of using a URI is to give the namespace a unique
name
XML Namespace Example 1
<?xml version="1.0" encoding="UTF-8"?>
<record>
<h:table xmlns:h="http://www.w3.org/TR/table/">
<h:tr>
<h:td>Computer</h:td>
<h:td>Printer</h:td>
</h:tr>
</h:table>
<f:table xmlns:f="https://www.w3schools.com/furniture">
<f:name>Wooden Coffee Table</f:name>
<f:width>80</f:width>
<f:length>90</f:length>
</f:table>
</record>
• The xmlns attribute in the first <table> element gives the h: prefix a qualified namespace.
• The xmlns attribute in the second <table> element gives the f: prefix a qualified namespace.
• When a namespace is defined for an element, all child elements with the same prefix are
associated with the same namespace.
XML Namespace Example 2
Namespaces can also be declared in the XML root element:
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns:h="http://www.w3.org/TR/table/"
        xmlns:f="https://www.w3schools.com/furniture">

<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>

<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>

</record>
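A namespace-aware parser keeps the two kinds of <table> apart by their URIs, not by their prefixes. The sketch below assumes Python's standard xml.etree.ElementTree module and a compacted copy of the record above. The NS mapping is local to the query and its prefixes could be named differently.

```python
import xml.etree.ElementTree as ET

RECORD = """<record xmlns:h="http://www.w3.org/TR/table/"
        xmlns:f="https://www.w3schools.com/furniture">
  <h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>
  <f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</record>"""

# Prefix-to-URI mapping used only for the searches below.
NS = {
    "h": "http://www.w3.org/TR/table/",
    "f": "https://www.w3schools.com/furniture",
}

root = ET.fromstring(RECORD)
html_table = root.find("h:table", NS)
furniture_table = root.find("f:table", NS)

# Internally each tag is stored as {namespaceURI}localname.
print(html_table.tag)
print(furniture_table.tag)

cells = [td.text for td in html_table.iter("{http://www.w3.org/TR/table/}td")]
furniture_name = furniture_table.find("f:name", NS).text
print(cells, furniture_name)
```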
Default Namespace
 It is an unprefixed declaration of Namespace.
 It applies to all unprefixed element names within its
scope.
xmlns="namespaceURI"

 Example:
<?xml version="1.0" encoding="UTF-8"?>
<Notebook xmlns="http://www.notebook.com">
<Processor>Intel Core i7-2710QE</Processor>
<Memory>DDR3L 1600</Memory>
<HardDisk>SSHD 1TB</HardDisk>
</Notebook>
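A default namespace still qualifies every unprefixed element, which matters when searching the document. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module. The nb prefix is hypothetical and exists only in the query mapping.

```python
import xml.etree.ElementTree as ET

NOTEBOOK = """<Notebook xmlns="http://www.notebook.com">
<Processor>Intel Core i7-2710QE</Processor>
<Memory>DDR3L 1600</Memory>
<HardDisk>SSHD 1TB</HardDisk>
</Notebook>"""

root = ET.fromstring(NOTEBOOK)

# The root and all its unprefixed children belong to the default namespace.
print(root.tag)

# A search must name the namespace; the prefix used here is arbitrary.
NS = {"nb": "http://www.notebook.com"}
memory = root.find("nb:Memory", NS).text

# Searching for the bare name finds nothing.
unqualified = root.find("Memory")

print(memory, unqualified)
```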
Well-formed XML
An XML document with correct syntax is called "Well
Formed".
 Every XML document must have a root element
 Every element must have both start tag and end tag
 e.g. <name> ... </name>
 Empty-element tag must be properly closed, e.g., <book />.
 XML tags are case sensitive
 Elements must be properly nested
 e.g. <book><title>...</book></title> is incorrectly nested.
 Attribute values must be enclosed in single or double quotes
 e.g. <time unit="days">
Benefit of using XML
 XML separates data from HTML
 With XML, data can be stored in separate XML files. Thus, changes
in the underlying data will not require any changes to the HTML.
 XML simplifies data sharing
 XML data is stored in plain text format. This provides a software-
and hardware-independent way of storing data.
 This makes it much easier to create data that can be shared by
different applications.
 XML simplifies data transport
 Exchanging data between incompatible systems over the Internet is
troublesome and time-consuming.
 Exchanging data as XML greatly reduces this complexity, since the
data can be read by different incompatible applications.
 XML simplifies platform change
 Upgrading to new systems (hardware or software platforms), is
time consuming. Large amounts of data must be converted and
incompatible data is often lost.
 XML data is stored in text format. This makes it easier to
expand or upgrade to new operating systems, or new
applications, without losing data.
 XML increases data availability
 With XML, data can be made available to all kinds of "reading
machines" (handheld computers, voice machines, etc.), which
makes it more accessible to blind people and people with other
disabilities.
XML-related Technologies
 DTD (Document Type Definition) and XML Schemas are used
to define legal XML tags and their attributes for particular
purposes
 CSS (Cascading Style Sheets) describes how to display HTML
or XML in a browser
 XSLT (eXtensible Stylesheet Language Transformations) and
XPath are used to translate from one form of XML to another
 DOM (Document Object Model), SAX (Simple API for XML),
and JAXP (Java API for XML Processing) are all APIs for
XML parsing
