Introduction to XML
From HTML to XML (eXtensible Markup Language)
HTML describes the presentation of the content
<h1>Bibliography</h1>
<p><i>Foundations of Databases</i>
Abiteboul, Hull, and Vianu
<br>Addison Wesley, 1995.
XML describes only the content
<bibliography>
<book>
<title>Foundations of Databases</title>
<author>Abiteboul</author>
<publisher>Addison Wesley</publisher>
<year>1995</year>
</book>
</bibliography>
Separating content from presentation simplifies content extraction
and allows the same content to be presented easily in different styles
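As a rough illustration of that point, the sketch below pulls the book data out of the bibliography XML and renders it in two different styles. It uses Python's standard xml.etree.ElementTree module; the helper names extract_books, as_plain_text and as_html are made up for this example and are not part of the original slides.

```python
import xml.etree.ElementTree as ET

BIBLIOGRAPHY_XML = """
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
"""

def extract_books(xml_text):
    """Return one dict per <book>, keyed by the child tag names."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in book}
            for book in root.findall("book")]

def as_plain_text(book):
    """Hypothetical presentation 1: a plain-text citation."""
    return f"{book['title']}, {book['author']}. {book['publisher']}, {book['year']}."

def as_html(book):
    """Hypothetical presentation 2: HTML in the style of the earlier slide."""
    return (f"<p><i>{book['title']}</i> {book['author']}"
            f"<br>{book['publisher']}, {book['year']}.</p>")

books = extract_books(BIBLIOGRAPHY_XML)
for book in books:
    print(as_plain_text(book))
    print(as_html(book))
```

The extraction step never looks at fonts or line breaks, and each presentation function can change without touching the data.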
eXtensible Markup Language
XML stands for “eXtensible Markup Language”
Unlike Java, it is not a programming language
A programming language has instructions, with syntactic/semantic rules, that control the behavior of a program
database is inappropriate
Structuring data for presentation on Web pages
Example: Simple XML
<?xml version="1.0"?>
<recipes>
<category type="loaf">
<name>Basic Farmhouse</name>
<ingredient></ingredient>
<cooking>
<time></time>
<setting></setting>
</cooking>
<serves></serves>
<instructions>
<item></item>
</instructions>
</category>
</recipes>
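A minimal sketch of how a program might read this recipe document, assuming Python's standard xml.etree.ElementTree module. The outline helper is hypothetical; it only prints the element names indented by depth to make the hierarchy visible.

```python
import xml.etree.ElementTree as ET

RECIPES_XML = """<?xml version="1.0"?>
<recipes>
  <category type="loaf">
    <name>Basic Farmhouse</name>
    <ingredient></ingredient>
    <cooking>
      <time></time>
      <setting></setting>
    </cooking>
    <serves></serves>
    <instructions>
      <item></item>
    </instructions>
  </category>
</recipes>
"""

def outline(element, depth=0):
    """Return the tag names of element and its descendants, indented by depth."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(RECIPES_XML)
category = root.find("category")

print(category.get("type"))       # value of the type attribute
print(category.findtext("name"))  # text content of <name>
print("\n".join(outline(root)))   # the element hierarchy
```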
Components of an XML Document
An XML document is composed of a number of
components that can be used for representing
information in a hierarchical order. They are:
Processing instruction
Tags
Elements
Content
Attributes
Entities
Comments
Comments start with <!-- and end with -->
<!-- This is a comment.
The rules are:
no double hyphens (except in closing --> )
comments can appear anywhere, except inside
a tag or another comment -->
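The comment rules can be checked with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module (which is backed by the expat parser); is_well_formed and the three sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# A comment between elements is fine.
valid = "<note><!-- This is a comment. --><body>text</body></note>"
# A double hyphen inside the comment body breaks the first rule.
double_hyphen = "<note><!-- bad -- comment --><body>text</body></note>"
# A comment inside a start-tag breaks the second rule.
inside_tag = "<note <!-- comment --> ><body>text</body></note>"

for label, sample in [("valid", valid),
                      ("double hyphen", double_hyphen),
                      ("inside a tag", inside_tag)]:
    print(label, is_well_formed(sample))

# With the default settings the parser discards comments,
# so only the <body> element remains as a child of <note>.
children = [child.tag for child in ET.fromstring(valid)]
print(children)
```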
A processing instruction is meant for a program that may
read the document (it is used to control applications). It is
written like this:
<?target text ?>
target is some label that will be recognised by the
program
text is material that will be meaningful to the application
<?xml version="1.0" ?>
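A small sketch of how an application might pick up a processing instruction, assuming Python's standard xml.dom.minidom module. The page-layout target and its columns text are invented for this example, and processing_instructions is a hypothetical helper. Note that minidom does not report the leading xml declaration as a processing-instruction node.

```python
from xml.dom import minidom, Node

# Hypothetical document: "page-layout" is an invented target.
DOC = ('<?xml version="1.0"?>'
       '<?page-layout columns="2"?>'
       '<report><title>Q1</title></report>')

def processing_instructions(xml_text):
    """Return (target, text) pairs for top-level processing instructions."""
    doc = minidom.parseString(xml_text)
    return [(node.target, node.data)
            for node in doc.childNodes
            if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE]

pis = processing_instructions(DOC)
for target, text in pis:
    # An application would act only on targets it recognises.
    print(target, "->", text)
```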
Tags
Tags are used to specify the name for a given piece of
information.
Data is marked using tags.
<Emp-name>Nick Shaw </Emp-name>
Entity references
An entity is a thing which is used as a part of the document but
which is not a simple element
Because some characters have special meanings, you cannot
have them inside elements: e.g. it is wrong to use < or &
Instead, we use entity references
we represent the above as: &lt; and &amp;
the other predefined entity references are:
&gt; (the greater-than sign)
&apos; (apostrophe)
&quot; (double-quote)
in XML.
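A short sketch of entity references in practice, assuming Python's standard xml.sax.saxutils and xml.etree.ElementTree modules. The condition element name and the sample text are made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if a < b & b > c"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)
print(escaped)

# The escaped text parses, and the parser turns the references back
# into the original characters.
element = ET.fromstring(f"<condition>{escaped}</condition>")
round_trip = element.text

# Putting the raw text straight into the markup is rejected.
try:
    ET.fromstring(f"<condition>{raw}</condition>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False

# When a document is built through the API, escaping happens on output.
built = ET.Element("condition")
built.text = raw
serialized = ET.tostring(built, encoding="unicode")
print(serialized)
```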
Elements are represented using tags.
Example: a phone element
<phone>7424</phone>
There can be empty elements. For example, “horizontal rule” <HR>
or “break” <BR> in HTML become <hr /> and <br /> in XHTML
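As a minimal sketch, elements like these can also be created from a program rather than typed by hand. The example assumes Python's standard xml.etree.ElementTree module, which writes an element with no content in the short empty-element form.

```python
import xml.etree.ElementTree as ET

# An element with text content.
phone = ET.Element("phone")
phone.text = "7424"

# An element with no content at all.
hr = ET.Element("hr")

phone_xml = ET.tostring(phone, encoding="unicode")
hr_xml = ET.tostring(hr, encoding="unicode")

print(phone_xml)  # <phone>7424</phone>
print(hr_xml)     # <hr />
```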
Elements can have one or more attributes.
The start-tag can contain attributes: e.g. the href attribute of the