
8/20/2014

Internet Technologies and Web Services
Lecture 1 - XML

Aims of the lecture


• At the end of today’s lecture, students should
be able to:
– Understand what XML is and what it is used for.
– Differentiate between XML and XHTML.
– Write XML code.
– Determine whether XML code is well-formed.


What is XML?
• XML stands for Extensible Markup Language
• XML is a meta-markup language.
– It is a language that lets you make up the tags you
need as you go along.
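– HTML is also a markup language, but its tags are
predefined.
• It is used to carry or store data
• XML is portable: the same XML format can be used to
represent a photograph, a text file, or both
• XML does not do anything by itself; software must be
written to send, receive, store, or display it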

What is XML? (2)


• XML tags are not predefined. You must define
your own tags.
• How can you ensure that an XML document is legal
in that case?
• Make use of a schema or a DTD (Document Type Definition, the older approach), which is
– used to define the legal building blocks of an XML
document.
– used to define the document structure with a list of legal
elements and attributes.


XML vs HTML
• HTML describes presentation - HTML was
designed to display data and to focus on how
data looks.
• XML describes content – XML was designed to
describe data and to focus on what data is.
• XML files do not contain any formatting
information

How to present/format an XML file?


• An XSL style sheet is, like a CSS file, a file that
describes how to display an XML document of
a given type.
• For example, a PDA may render an XML
document differently from a wireless phone or
a PC.


Writing XML files


• XML file names typically end with .xml
• No special software is required to view or modify an
XML document; any text editor can be used, although
some editors have more features
• To process an XML document, an XML parser
(also called an XML processor) is required

XML Parsers
• Parsers check an XML document’s syntax and
enable software programs to process the
marked-up data
• They can support the DOM (Document Object Model) or
SAX (Simple API for XML)
• All modern browsers have a built-in XML
parser.
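
The difference between the two parser styles can be sketched in Python, assuming the standard-library modules xml.dom.minidom (DOM) and xml.sax (SAX). The sample document and the ElementNameCollector class are illustrative assumptions, not part of the lecture material.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, shortened from the feedback example.
XML_TEXT = """<?xml version="1.0"?>
<feedback>
  <studentemail>s1@uom.ac.mu</studentemail>
  <moduleyear>2010</moduleyear>
</feedback>"""

# DOM: the whole document is loaded into an in-memory tree
# that can then be navigated in any order.
dom = minidom.parseString(XML_TEXT)
email_node = dom.getElementsByTagName("studentemail")[0]
dom_email = email_node.firstChild.data


class ElementNameCollector(xml.sax.ContentHandler):
    """SAX: the parser reports events; this handler records each start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


# SAX: the document is read once, front to back, without building a tree.
collector = ElementNameCollector()
xml.sax.parseString(XML_TEXT.encode("utf-8"), collector)
sax_names = collector.names

print(dom_email)
print(sax_names)
```

A DOM parser is convenient when the program needs random access to the document; a SAX parser tends to suit large documents that only need one pass.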


Why use XML?


1. Self-describing data
• E.g. <name>Armadillo</name> vs.
<p>Armadillo</p>
2. Interchange of data among applications
• Non-proprietary and portable; any tool that
understands XML can be used
3. Structured data
• The relations between elements can be specified (e.g. every
contact must have a phone number and an e-mail
address); data can even be rearranged on the fly.

Where to use XML?


• XML can be used to exchange data
– In the real world, computer systems and databases
contain data in incompatible formats. One of the most
time consuming challenges for developers has been to
exchange data between such systems over the
Internet.
– Converting the data to XML can greatly reduce this
complexity and create data that can be read by
different types of applications. XML provides a plain
text format for data and hence is compatible with
different software and hardware.
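
As a rough illustration of this kind of exchange, the sketch below converts a record held by one program into XML text and reads it back as another program might. It assumes Python's standard xml.etree.ElementTree module; the function names record_to_xml and xml_to_record and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Serialize a flat dict into a <feedback> element (sender side)."""
    root = ET.Element("feedback")
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text):
    """Rebuild a dict from the XML text (receiver side)."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


# Hypothetical record; field names follow the feedback example.
record = {"studentemail": "s1@uom.ac.mu", "moduleyear": 2010, "classsize": "Adequate"}
xml_text = record_to_xml(record)
restored = xml_to_record(xml_text)

print(xml_text)
print(restored)
```

Note that everything comes back as plain text (the year is restored as the string "2010"), which is one reason a schema is useful for agreeing on data types.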


Where to use XML? (2)


• XML can be used to store data in files or
databases
– Can replace a database in small-scale applications.
– For real applications, XML makes sense as an
alternative to the database for representing data
structures that do not need to be searched,
broken apart, and recombined, but it might not
make sense for data that is truly relational in
spirit.


Where to use XML? (3)


• XML can also store data inside HTML
documents
– XML data can also be stored inside HTML pages as
"Data Islands". An XML data island is XML data
embedded into an HTML page. This should be
avoided, as XML "Data Islands" only work with
Internet Explorer browsers.


feedback.xml – Example of an XML file representing feedback from students
<?xml version="1.0" encoding="ISO-8859-1"?>
<feedback>
  <studentemail>s1@uom.ac.mu</studentemail>
  <modulename code="CSE2041">Web Tech II</modulename>
  <moduleyear>2010</moduleyear>
  <classsize>Adequate</classsize>
</feedback>


Explanation of feedback.xml (1)


• The XML code in feedback.xml is self-explanatory
• But it does not do anything; it is purely information
• The first line in the document - the XML declaration -
defines the XML version and the character encoding
used in the document. In this case the document
conforms to the 1.0 specification of XML and uses
the ISO-8859-1 (Latin-1/West European) character
set.
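
The effect of the encoding declaration can be sketched as follows, assuming Python's standard xml.etree.ElementTree module. The <name> element and the text "Café" are made-up sample data chosen because "é" is a single byte in ISO-8859-1.

```python
import xml.etree.ElementTree as ET

# The declaration tells the parser how to turn the file's bytes into characters.
document = '<?xml version="1.0" encoding="ISO-8859-1"?><name>Café</name>'
latin1_bytes = document.encode("iso-8859-1")  # bytes as they would be stored on disk

name = ET.fromstring(latin1_bytes).text

# If the declaration does not match the actual bytes, parsing can fail:
# here the bytes are Latin-1 but the declaration claims UTF-8.
mislabelled = document.replace("ISO-8859-1", "UTF-8").encode("iso-8859-1")
try:
    ET.fromstring(mislabelled)
    mislabelled_ok = True
except ET.ParseError:
    mislabelled_ok = False

print(name)
print(mislabelled_ok)
```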


XML declaration
<?xml version="1.0" encoding="ISO-8859-1"?>
<feedback>
  <studentemail>s1@uom.ac.mu</studentemail>
  <modulename code="CSE2041">Web Tech II</modulename>
  <moduleyear>2010</moduleyear>
  <classsize>Adequate</classsize>
</feedback>


Explanation of feedback.xml (2)


• The next line is the opening tag of the root element of the document.
There is only one root element per document, and its closing tag comes at
the end of the document.

<?xml version="1.0" encoding="ISO-8859-1"?>
<feedback>
  <studentemail>s1@uom.ac.mu</studentemail>
  <modulename code="CSE2041">Web Tech II</modulename>
  <moduleyear>2010</moduleyear>
  <classsize>Adequate</classsize>
</feedback>

Explanation of feedback.xml (3)


• The root is followed by 4 child elements of the root
(studentemail, modulename, moduleyear and classsize)

<?xml version="1.0" encoding="ISO-8859-1"?>
<feedback>
  <studentemail>s1@uom.ac.mu</studentemail>
  <modulename code="CSE2041">Web Tech II</modulename>
  <moduleyear>2010</moduleyear>
  <classsize>Adequate</classsize>
</feedback>

XML elements and attributes


• In the above example, the modulename element also
provides information about the module code.
• Hence an element can have attributes.
• These are specified in the start tag, as in HTML
<modulename code="CSE2041">Web Tech II</modulename>
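
A program reads element content and attributes in different ways. The sketch below assumes Python's standard xml.etree.ElementTree module and reuses the feedback.xml document; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# feedback.xml as it would be stored on disk (bytes in the declared encoding).
FEEDBACK_XML = """<?xml version="1.0" encoding="ISO-8859-1"?>
<feedback>
  <studentemail>s1@uom.ac.mu</studentemail>
  <modulename code="CSE2041">Web Tech II</modulename>
  <moduleyear>2010</moduleyear>
  <classsize>Adequate</classsize>
</feedback>""".encode("iso-8859-1")

root = ET.fromstring(FEEDBACK_XML)
module = root.find("modulename")

module_title = module.text          # element content
module_code = module.get("code")    # attribute value
child_names = [child.tag for child in root]

print(root.tag, child_names)
print(module_code, module_title)
```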


Extending the above example to contain more than one feedback
• The feedback.xml example can be extended to
contain more than one feedback entry
• A <module> element can be added for each entry to represent
feedback concerning various modules


feedback.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<feedback>
  <module>
    <studentemail>s1@uom.ac.mu</studentemail>
    <modulename code="CSE2041">Web Tech II</modulename>
    <moduleyear>2010</moduleyear>
    <classsize>Adequate</classsize>
  </module>
  <module>
    <studentemail>s3@uom.ac.mu</studentemail>
    <modulename code="CSE1041">Web Tech I</modulename>
    <moduleyear>2010</moduleyear>
    <classsize>Adequate</classsize>
  </module>
</feedback>
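
Once each entry is wrapped in its own <module> element, a program can loop over the entries. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The extended feedback.xml, as bytes in the declared encoding.
FEEDBACK_XML = """<?xml version="1.0" encoding="ISO-8859-1"?>
<feedback>
  <module>
    <studentemail>s1@uom.ac.mu</studentemail>
    <modulename code="CSE2041">Web Tech II</modulename>
    <moduleyear>2010</moduleyear>
    <classsize>Adequate</classsize>
  </module>
  <module>
    <studentemail>s3@uom.ac.mu</studentemail>
    <modulename code="CSE1041">Web Tech I</modulename>
    <moduleyear>2010</moduleyear>
    <classsize>Adequate</classsize>
  </module>
</feedback>""".encode("iso-8859-1")

root = ET.fromstring(FEEDBACK_XML)

# findall returns the direct <module> children of the root, in document order.
entries = []
for module in root.findall("module"):
    entries.append(
        (module.find("studentemail").text, module.find("modulename").get("code"))
    )

for email, code in entries:
    print(email, code)
```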


XML Tree structure


• The elements in an XML document form a
document tree.
• Any element can have child elements, which can in turn
have children of their own (subchildren)
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
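
Because the elements form a tree, a short recursive function can visit every element. The sketch below assumes Python's standard xml.etree.ElementTree module; the outline function is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring("<root><child><subchild>.....</subchild></child></root>")
tree_lines = outline(root)

print("\n".join(tree_lines))
```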


Tree structure of the feedback.xml example
Root element: <feedback>
  Element: <module>
    Element: <studentemail>
    Element: <classsize>
    Element: <moduleyear>
    Element: <modulename>
      Attribute: code


XML Syntax

• All XML elements must have a closing tag; it cannot be omitted
• XML tags ARE case sensitive
• The tags must be properly nested
– <module><studentemail>…</studentemail></module> - CORRECT
– <module><studentemail>…</module></studentemail> - WRONG
• All XML documents must have a root element and
only one root element
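
These rules can be checked mechanically, because an XML parser rejects any document that breaks them. The sketch below assumes Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


checks = {
    "properly nested": is_well_formed(
        "<module><studentemail>x</studentemail></module>"),
    "badly nested": is_well_formed(
        "<module><studentemail>x</module></studentemail>"),
    "missing closing tag": is_well_formed("<module><studentemail>x</module>"),
    "case mismatch": is_well_formed("<Module>x</module>"),
    "two root elements": is_well_formed("<module/><module/>"),
}

for label, result in checks.items():
    print(label, result)
```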


XML Syntax (2)


• Whitespace IS preserved in XML
<title>Professional P HP 5</title>
will be displayed as
Professional P HP 5
• Whitespace characters before the XML
declaration are an error
• Comments in XML are similar to those in
HTML
<!-- This is a comment -->
• Attribute values must be quoted, with single or double quotes
<modulename code="CSE2041">
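
The two whitespace rules can be observed with a parser. This sketch assumes Python's standard xml.etree.ElementTree module; the <note> element and its text are made-up sample data.

```python
import xml.etree.ElementTree as ET

# Whitespace inside element content reaches the program unchanged.
note_text = ET.fromstring("<note>two   spaces\n  and a line break</note>").text

# Whitespace in front of the XML declaration makes the document ill-formed.
try:
    ET.fromstring(' <?xml version="1.0"?><note>x</note>')
    leading_space_accepted = True
except ET.ParseError:
    leading_space_accepted = False

print(repr(note_text))
print(leading_space_accepted)
```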

Answer
• To be discussed in class.


Answer
• To be discussed in class.


Attributes
• Attributes are simple name/value pairs associated with an
element.
• They are attached to the start-tag, but not to the end-tag.
• Names are separated from values by an equals sign and
optional whitespace.
• Values are enclosed in single or double quotation marks.
• Attributes must have values, even if that value is just an
empty string ("")!
• Order in which attributes are included on an element is
irrelevant.
• E.g. Element person having attribute born as 1912-06-23
<person born="1912-06-23">Alan Turing</person>
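
The attribute rules above can be tried out with a parser. The sketch assumes Python's standard xml.etree.ElementTree module; the parses helper and the <p> test elements are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


person = ET.fromstring('<person born="1912-06-23">Alan Turing</person>')
born = person.get("born")

# Attribute order and quote style make no difference to the parsed result.
same_attributes = (
    ET.fromstring('<p a="1" b="2"/>').attrib == ET.fromstring("<p b='2' a='1'/>").attrib
)

empty_value_ok = parses('<p a=""/>')      # an empty string is still a value
missing_value_ok = parses("<p a/>")       # no value at all
unquoted_value_ok = parses("<p a=1/>")    # value without quotes

print(born, same_attributes)
print(empty_value_ok, missing_value_ok, unquoted_value_ok)
```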


Answer
• To be discussed in class.


XML validation
• An XML document is well-formed if its syntax is correct
(as described above)
• An XML document is valid if it is well-formed and has also
been validated against a schema or a DTD


Viewing XML documents in a browser


• An XML document can be viewed in a browser
• Most browsers have XML viewers
• If no XSLT style sheet is attached to the document,
a message saying so will be displayed, followed by the
XML document itself
• If the document is not well-formed, an error message
will be displayed indicating the location of the
error.


Comments in XML
• Comments start with the string <!-- and end with
the string -->
• E.g.
<name nickname='Shiny John'>
  <first>John</first>
  <!--John lost his middle name in a fire-->
  <middle></middle>
  <last>Doe</last>
</name>

Comments in XML (2)


• A comment cannot be inside a tag, so the
following is illegal:
<middle></middle <!--John lost his middle name in a fire--> >
• The double-dash string (--) cannot be used
inside a comment, so the following is also
illegal:
<!--John lost his middle name -- in a fire-->
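
Both comment rules are enforced by the parser, as the following sketch suggests. It assumes Python's standard xml.etree.ElementTree module; the parses helper is hypothetical, and the illegal samples are wrapped in a <name> root so that each is a complete document.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


legal = (
    "<name nickname='Shiny John'><first>John</first>"
    "<!--John lost his middle name in a fire-->"
    "<middle></middle><last>Doe</last></name>"
)
comment_inside_tag = (
    "<name><middle></middle <!--John lost his middle name in a fire--> ></name>"
)
double_dash_in_comment = (
    "<name><!--John lost his middle name -- in a fire--><middle></middle></name>"
)

legal_ok = parses(legal)
inside_tag_ok = parses(comment_inside_tag)
double_dash_ok = parses(double_dash_in_comment)

# By default ElementTree discards comments, so only elements remain in the tree.
child_names = [child.tag for child in ET.fromstring(legal)]

print(legal_ok, inside_tag_ok, double_dash_ok)
print(child_names)
```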


Empty Elements
• Sometimes an element has no PCDATA (parsed character data). For example, the
<middle> element contains no name in the example that follows:
<name nickname='Shiny John'>
  <first>John</first>
  <!--John lost his middle name in a fire-->
  <middle></middle>
  <last>Doe</last>
</name>
• In this case, the element can also be written using the
special empty element syntax (this syntax is also called a
self-closing tag):
<middle/>
• Here the start-tag does not need a separate end-tag. In all
other cases, both opening and closing tags are required.


Empty Elements (2)
• Recall from earlier that the only place there can
be a space within the tag is before the closing >.
• This rule is slightly different when it comes to
empty elements.
• The / and > characters always have to be
together, so an empty element can be created
like this:
• <middle /> or <middle/> - Both valid.
• but NOT like this
• <middle/ > or <middle / > - Both invalid.
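
The valid and invalid spellings of an empty element can be compared with a parser. The sketch assumes Python's standard xml.etree.ElementTree module; the parses helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


results = {
    "<middle/>": parses("<middle/>"),
    "<middle />": parses("<middle />"),
    "<middle/ >": parses("<middle/ >"),
    "<middle / >": parses("<middle / >"),
}

# The long and short forms describe the same thing: an element with no content.
long_form_text = ET.fromstring("<middle></middle>").text
short_form_text = ET.fromstring("<middle/>").text

for source, ok in results.items():
    print(source, ok)
```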


Answer
• To be discussed in class.


Answer
• To be discussed in class.


Next Week
• Lecture will be on XSD and valid XML
documents




Questions
