XML

Introduction to XML
From HTML to XML (eXtensible Markup Language)
 HTML describes the presentation of the content
<h1>Bibliography</h1>
<p><i>Foundations of Databases</i>
Abiteboul, Hull, and Vianu
<br>Addison Wesley, 1995.
 XML describes only the content
<bibliography>
<book>
<title>Foundations of Databases</title>
<author>Abiteboul</author>
<publisher>Addison Wesley</publisher>
<year>1995</year>
</book>
</bibliography>
 Separating content from presentation simplifies content extraction
and allows the same content to be presented in different styles
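As a rough illustration of that separation, the sketch below extracts the book record from the bibliography markup and renders it in two different styles. It uses Python's standard `xml.etree.ElementTree` module; the helper names `extract_books`, `as_html`, and `as_plain_text` are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# The bibliography from the slide, held as a string for the example.
BIBLIOGRAPHY = """
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
"""


def extract_books(xml_text):
    """Return one dict per <book>, keyed by the child element names."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in book}
            for book in root.findall("book")]


def as_html(book):
    """One possible presentation: an HTML paragraph (hypothetical layout)."""
    return (f"<p><i>{book['title']}</i> {book['author']}"
            f"<br>{book['publisher']}, {book['year']}.</p>")


def as_plain_text(book):
    """Another presentation of the same content: a plain-text citation."""
    return (f"{book['author']}. {book['title']}. "
            f"{book['publisher']}, {book['year']}.")


books = extract_books(BIBLIOGRAPHY)
for book in books:
    print(as_html(book))
    print(as_plain_text(book))
```

The content is read once, by element name, and the two rendering functions never need to parse presentation markup to find the title or the year.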
eXtensible Markup Language
 XML stands for “eXtensible Markup Language”
 Unlike Java, it is not a programming language
 In a programming language, instructions and syntactic/semantic rules control the behavior of a program

 It is a markup language: that is, it is a way of annotating text,
like HTML
 this text is <big>large</big>
 but it is extensible: that is, the user can define their own tags
 XML is used to describe the structure of a document and not the
way that it is presented
 There is a great deal of interest in XML
 because it offers a new way of storing and transmitting
information
Features of XML
 Portability: Just like HTML, you can ship XML data
across any platform
 Relational data requires heavy-weight protocols,
e.g., JDBC
 Flexibility: You can represent any information
(structured, semi-structured, documents, ...)
 Relational data is best suited for structured data
 Extensibility: Since data describes itself, you can
change the schema easily
 Relational schema is rigid and difficult to change
 Areas in which XML appears to have potential:
 Structuring data for storage where a relational
database is inappropriate
 Structuring data for presentation on Web pages
Example: Simple XML
<?xml version="1.0"?>
<recipes>
<category type="loaf">
<name>Basic Farmhouse</name>
<ingredient></ingredient>
<cooking>
<time></time>
<setting></setting>
</cooking>
<serves></serves>
<instructions>
<item></item>
</instructions>
</category>
</recipes>
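To see the hierarchy in this example, one option is to load it with a parser and print the element tree. The sketch below uses Python's standard `xml.etree.ElementTree`; the `outline` helper is a hypothetical name introduced for illustration.

```python
import xml.etree.ElementTree as ET

# The recipe document from the slide, with straight quotes around
# attribute values (typographic quotes are not accepted by XML parsers).
RECIPES = """<?xml version="1.0"?>
<recipes>
  <category type="loaf">
    <name>Basic Farmhouse</name>
    <ingredient></ingredient>
    <cooking>
      <time></time>
      <setting></setting>
    </cooking>
    <serves></serves>
    <instructions>
      <item></item>
    </instructions>
  </category>
</recipes>
"""


def outline(element, depth=0):
    """Return the element names as indented lines, one per element."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(RECIPES)
print("\n".join(outline(root)))

category = root.find("category")
print(category.get("type"))        # value of the type attribute
print(category.findtext("name"))   # text content of <name>
```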
Components of an XML Document
An XML document is composed of a number of
components that can be used for representing
information in a hierarchical order. They are:
 Processing instruction
 Tags
 Elements
 Content
 Attributes
 Entities
 Comments
 Comments start with <!-- and end with -->
<!-- This is a comment.
The rules are:
no double hyphens (except in closing --> )
comments can appear anywhere, except inside
a tag or another comment -->
 A processing instruction is addressed to a program that may
read the document (it is used to control applications). It is
written like this:
<?target text ?>
 target is a label that will be recognised by the
program
 text is material that will be meaningful to the application
 <?xml version="1.0" ?>
 Tags
Tags are used to specify the name for a given piece of
information.
 Data is marked using tags.
 <Emp-name>Nick Shaw </Emp-name>
 Entity references
 An entity is a thing that is used as part of the document but
which is not a simple element
 Because some characters have special meanings, you cannot
use them literally inside elements: e.g. it is wrong to use a raw < or &
 Instead, we use entity references
 we represent the above as &lt; and &amp;
 the other predefined entity references are:
 &gt; (the greater-than sign)
 &quot; (double quote)
 &apos; (apostrophe, i.e. single quote)
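The predefined entity references can be produced programmatically rather than typed by hand. A minimal sketch, assuming Python and its standard `xml.sax.saxutils.escape` function (which replaces `&`, `<`, and `>` by default and accepts extra replacements as a dictionary); the sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text containing characters that have special meaning in XML.
raw = "if a < b && c > d"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw)
print(escaped)

# Quotes can be escaped as well by passing extra replacements.
quoted = escape('say "hi"', {'"': "&quot;", "'": "&apos;"})
print(quoted)

# Once escaped, the text can sit inside an element, and a parser
# turns the entity references back into the original characters.
document = f"<code>{escaped}</code>"
round_trip = ET.fromstring(document).text
print(round_trip == raw)
```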


 Elements, attributes
 Elements are basic units that are used to identify and describe data in XML.
 Elements are represented using tags.
 The start-tag + content + end-tag is called an element, e.g. the phone element
 <phone>7424</phone>
 There can be empty elements. For example, “horizontal rule” <HR> or “break” <BR> in HTML become <hr /> and <br /> in XHTML
 Elements can have one or more attributes.
 Attributes provide additional information about elements.
 Attributes can be either mandatory or optional.
 The start-tag can contain attributes: e.g. the href attribute of the anchor tag in XHTML
 <a href = "http://www.cs.stir.ac.uk/">
Department home
</a>
 all attribute values must be quoted (single or double)
 Content
 The information enclosed between the start-tag and end-tag of an
element is referred to as the content of that
element.
 <phone>7424</phone>
 Here 7424 is the content of the phone element.
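The terms element, attribute, and content map directly onto what a parser exposes. The following sketch, assuming Python's standard `xml.etree.ElementTree`, parses the phone and anchor examples used in these slides plus an empty element, and reads back each part.

```python
import xml.etree.ElementTree as ET

# An element with content, an element with an attribute, and an empty element.
phone = ET.fromstring("<phone>7424</phone>")
anchor = ET.fromstring(
    '<a href="http://www.cs.stir.ac.uk/">Department home</a>'
)
rule = ET.fromstring("<hr />")

print(phone.tag)           # element name taken from the tags
print(phone.text)          # content of the element
print(anchor.attrib)       # all attributes as a dictionary
print(anchor.get("href"))  # one attribute value by name
print(rule.text)           # an empty element has no content
```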
 The XML declaration
 We should start the document with the XML
declaration. It looks like a processing instruction and
must be the very first thing in the file (no preceding whitespace)
 <?xml version="1.0" encoding="UTF-8" ?>
 This tells the parser (e.g. the one in a web browser) that it is an
XML document and declares the character encoding used.
Well-formedness

 A document is well-formed if it obeys the rules of
XML
 The most important of these are:
 start-tags must have matching end-tags
 tags must not overlap
 there must be exactly one root element
 attribute values must be quoted
 an element must not have two attributes with the same
name (e.g. two href= in an XHTML anchor tag)
 no comments or processing-instructions inside tags
 no raw < or & signs (use &lt; and &amp; instead)
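The rules above can be checked mechanically, because a conforming parser rejects any document that breaks them. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; `is_well_formed` is a hypothetical helper and the sample fragments are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One small sample per rule; only the first and last obey the rules.
samples = {
    "properly nested": "<a><b></b></a>",
    "missing end-tag": "<a><b></a>",
    "overlapping tags": "<a><b></a></b>",
    "two root elements": "<a/><b/>",
    "unquoted attribute": "<a href=x/>",
    "duplicate attribute": '<a href="1" href="2"/>',
    "comment inside a tag": "<a <!-- c --> />",
    "raw < sign": "<a>1 < 2</a>",
    "escaped < sign": "<a>1 &lt; 2</a>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

This only tests well-formedness. Whether a document is valid is a separate question that depends on a declared specification.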
Valid? Well-formed?

 A document can be well-formed but is not valid unless its
structure conforms to a declared specification.
 These specifications can be written as Document Type
Definitions (DTDs) or in the W3C XML Schema language
 A valid document must declare where its specification can be
found
 an application reading the document may choose to ignore it
 The specification can be either
 contained in the document (useful while debugging a DTD)
 in a separate file (probably accessed over the WWW)
 Valid document: the structure of an XML document is said to
be valid if it conforms to a declared specification (DTD, XSD)
A well-formed document
(with the XML declaration)
 This is a well-formed document:
<?xml version="1.0" encoding="UTF-8"?>
<!-- A simple example of xml-->
<staff>
<staffMember>
<name>Robert Clark</name>
<phone>7427</phone>
</staffMember>
<staffMember>
<name>David Cairns</name>
<phone>7445</phone>
</staffMember>
</staff>
 But in fact this document is not valid: it declares no
specification (DTD or XSD) to which its structure could conform.
Document Type Definition (DTD)

 XML has neither meaning nor context without a grammar against
which it can be validated.
 The grammar is called a Document Type Definition (DTD).
 Document Type Definition: A set of rules for constructing an
XML document.
 A DTD defines the structure of the content of an XML document.
 It specifies the elements that can be present in an XML
document, the attributes, and their arrangement in relation to
each other.
 It also specifies whether an element or an attribute is optional or
mandatory.
 XML documents that conform to a DTD are considered valid
documents.
 DTDs can be classified into two types:
 In-line (Internal)
 In a separate document (External)
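As an illustration of the in-line (internal) form, the sketch below embeds a DTD in the staff document from the earlier slide. The DTD rules themselves are an assumption written for this example and do not come from the original slides. The document is loaded with Python's standard `xml.etree.ElementTree`, which accepts the `<!DOCTYPE ...>` declaration but does not validate against it, so a validating parser would be needed to enforce the rules.

```python
import xml.etree.ElementTree as ET

# Staff document with a hypothetical internal DTD between [ and ].
STAFF_WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE staff [
  <!ELEMENT staff (staffMember+)>
  <!ELEMENT staffMember (name, phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<staff>
  <staffMember>
    <name>Robert Clark</name>
    <phone>7427</phone>
  </staffMember>
  <staffMember>
    <name>David Cairns</name>
    <phone>7445</phone>
  </staffMember>
</staff>
"""

# ElementTree checks well-formedness only; the DTD is read but not enforced.
root = ET.fromstring(STAFF_WITH_DTD)
names = [member.findtext("name") for member in root.findall("staffMember")]
phones = [member.findtext("phone") for member in root.findall("staffMember")]
print(names)
print(phones)
```

In this assumed DTD, `staffMember+` requires at least one staff member, and `(name, phone)` makes both children mandatory and fixes their order.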
Why are DTDs Important?
 In order to communicate, computers need to
agree to specific rules.
 By agreeing to standards, lots of
people/computers can share data.
 Once we can share data, we can create more
powerful applications.