Developing Web Applications with XML

Lecture 1
Introduction to the unit
– Staff
– Structure
– Unit outline
Introduction to XML
– Background
– XML markup
– Editors, Parsers and Browsers

300111 DWAX Staff
Unit Coordinator, Lecturer & Tutor
– Heidi Bjering
• Room 1.04, Building 26
• Telephone: (02) 4620 3162
• Email:
• Consultation: Wednesday 10-11, other times by appointment

Please use e-mail for general communication
– Use your UWS student e-mail account
– Put DWAX in the subject line
– Include your full name, student id and practical time/day in the body of the e-mail
DWAX 2010.1


vUWS will be used throughout semester for:
– Communication
• Ensure you check vUWS for updates at least twice a week
• vUWS announcements will be used to broadcast updates and important news
• Some announcements will appear on the main DWAX vUWS page

How is DWAX run?
– Lecture notes will be available on vUWS

Practical classes:
– Discuss tutorial questions
– Work on the weekly practical exercises and assignments

– Lecture notes
– Practical work
– Assignments
– Submission of Assignments and Practical Exercises
– DWAX staff contact details & consultation times

Text book:
– Carey, P. (2007). New Perspectives on XML: Comprehensive. 2nd Ed. Thomson Learning
• A lot of practical work will be based on the textbook


Practical classes
Prac classes are held in TL28 & LR25, BLD17
– You will need a SCM account to use the computers in these labs
– Organise login THIS week (week 1)
• New Accounts:
• Login details will be sent to your UWS email account
– Websites for DWAX will be organised during prac classes

Continuous assessments 60%
Final exam 40%

Continuous Assessment
Task                                      Weight   Due Date & Time
1  Practical Exercises (5)                20%      Weekly
2  Assignment 1 (Practical Assignment)    20%      Thursday 15th April 2010, 4pm
3  Assignment 2 (Practical Assignment)    20%      Thursday 27th May 2010, 4pm
   Total                                  60%


Introduction to XML
What is XML?
– eXtensible Markup Language: a Markup Language that is Extensible
– Not owned or dominated by a single commercial interest; developed by W3C
– Subset of SGML (Standard Generalized Markup Language)
– Metalanguage to create other markup languages

History of XML
– Very early markup languages were proprietary, specific and non-standard, for example the codes that define how an RTF document is displayed in WordPad
– Generalized Markup Language (GML) was developed by Charles F. Goldfarb, Ed Mosher and Ray Lorie in 1969
– The creation of metalanguages, which are used to create markup languages, was the major achievement of the 1970s. This was later formalised as an international standard (ISO 8879): the Standard Generalized Markup Language (SGML)
– In 1989, Tim Berners-Lee created a proposal for a hypertext document system to be used within the CERN community and defined the HTML language, which was created and defined using SGML
– In 1996 a special W3C group headed by Jon Bosak from Sun began working on a simplified version of SGML, and version 1.0 of XML was recommended by W3C in 1998

Markup – term applied to any set of codes or tags added to the contents of a document in order to indicate its meaning or presentation

History of XML ..contd
– XML is a simplified SGML: 80% of the power with 20% of the complexity
[Figure labels: Pre-Markup or Proprietary markup languages; Formalisation; Interoperability]

History of SGML
SGML (Standard Generalized Markup Language)
– Goal was data independence
• Documents transferable between environments without data loss
– Uses Style Sheets
• Format documents
– Structured via document type definition (DTD)
– Enables information interchange within and between some of the world's largest companies
– Used in very large scale/long term applications such as aircraft maintenance information, government regulations etc.
– SGML intro:
– More information:

Extensible Markup Language - XML
– Used for describing and structuring data
– Makes documents human readable
– Makes documents computer manipulable
– Enables separation of content and its presentation
– Text based (anyone can create an XML document)
– Part of a family of technologies that allows for creation of applications
– Enables data sharing between applications

The 10 Primary XML Design Goals
1. XML must be easily usable over the Internet
2. XML must support a wide variety of applications
3. XML must be compatible with SGML
4. It must be easy to write programs that process XML documents
5. The number of optional features in XML must be kept small
6. XML documents should be clear and easily understood
7. The XML design should be prepared quickly
8. The design of XML must be exact and concise
9. XML documents must be easy to create
10. Keeping an XML document size small is of minimal importance
Source:


Ten points to XML … W3C
1. XML is for structuring Data
2. XML looks a bit like HTML
3. XML is text
4. XML is verbose by design
5. XML is a family of technologies
6. XML is new, but not that new
7. XML leads HTML to XHTML
8. XML is modular
9. XML is the basis for RDF and the Semantic Web
10. XML is license-free, platform-independent and well-supported

XML Family
Other specifications complement XML by allowing linking, querying and transformation; together they form the XML Family.
[Figure: the XML Family. Labels: Core XML Family; XML 1.0 + Namespaces; XML Info set; XML Schema; Linking and pointing; XLink; XPointer; XPath; Style and Transformation; Underlying and Object Model; DOM; SAX; Programmatic Interfaces; Complex Data Modelling; Graphic and Multimedia; SMIL; SVG; Remote calls and B2B; Wireless and Voice; Web and Traditional Publishing; Key Applications]
Source:


Why XML?
Limitations of HTML
– Each tag describes the function the text has in the document
– Not designed for dealing with the content of a Web page
– Platform specific formatting – threat to the interoperability and scalability of the Web
– Need for a new, standardised, fully extensible, structurally strict language

Limitations of HTML
• HTML file contains data, structure and the presentation
• HTML was not designed with data in mind
• Not extensible
• Can be inconsistently applied – some browsers require all attribute values to be enclosed within quotes whereas other browsers don’t

<html>
  <head>
    <title>CD collection</title>
  </head>
  <body>
    <h2>Kind of Blue</h2>
    <h3>Miles Davis</h3>
    <OL>Tracks
      <Li>Song 1</Li>
      <Li>Song 2</Li>
      <Li>Song 3</Li>
      <Li>Song 4</Li>
    </OL>
  </body>
</html>

<html>
  <head>
    <title>Grocery Items</title>
  </head>
  <body>
    <h2>HiValue Foods</h2>
    <h3>Fresh Produce</h3>
    <OL>Products
      <Li>Apples</Li>
      <Li>Grapes</Li>
      <Li>Onions</Li>
      <Li>Mushrooms</Li>
    </OL>
  </body>
</html>


HTML and XML cont.
Comparison between XML and HTML
• HTML file contains data, structure and the presentation
– XML breaks these into three files
- Data – XML file
- Presentation – CSS or XSL
- Structure – DTD or Schema
• Tags in XML describe the data in the document. Tags in HTML describe the function the text has in the document.
• In XML we can create custom tags and build a vocabulary that describes the information of a particular company or industry.

Displaying an XML Document in a Web Browser
XML doesn’t do anything!
[Figure: how it looks in the editor]
[Figure: how it looks in the browser]


Some Uses of XML
- Locally, XML can be used to
• store configuration files,
• attach meta-data to documents (information about the document)
in a well-defined, extensible format that can be processed using widely available XML tools.
- XML can be used to create temporary documents used by various entities in a company, or to exchange data between two incompatible databases in a neutral and auto-documented form.
- XML documents can be published
• on the Web,
• in WML for wireless phones,
• on paper, etc.
- A convenient use of XML is to
• include new elements in an HTML document, like price or reference, that will be processed on the server side and rendered in HTML on the Web.
- One of the most important uses of XML is in e-commerce:
• a set of tags can be agreed upon by several companies doing business together.
- XML can also be used as the basic format
• to exchange messages between processes in distributed applications, allowing more flexibility in communications

World Wide Web Consortium (W3C)
Primary goals
– Make the web universally accessible
– Standardization

Process of recommendation
– Working draft
– Candidate Recommendation
– Proposed Recommendation
– Final Recommendation

XML Standards
Versions
– XML version 1.0 was published in 1998 – W3C Recommendation
– Second edition of XML version 1.0 (not a new version) published in October 2000
– Third edition of XML version 1.0 – W3C Recommendation February 2004
– XML version 1.1 – W3C Recommendation February 2004
• Allows the use of the latest Unicode version
See

XML documents
An XML document explained
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!--This Document contains weather information-->
<?xml-stylesheet type="text/xsl" href="weather.xsl"?>
<weather>
   <city id="sydney">
      <yesterday>Rain</yesterday>
      <today>Cloudy</today>
      <tomorrow>Sunny</tomorrow>
   </city>
   <city id="Melbourne">
      <yesterday>Sunny</yesterday>
      <today>Windy</today>
      <tomorrow>Rain</tomorrow>
   </city>
   <city id="Brisbane">
      <yesterday>Sunny</yesterday>
      <today>Rain</today>
      <tomorrow>Rain</tomorrow>
   </city>
</weather>

– Prolog of the document: the XML declaration, comment and processing instruction before <weather>
– Root element of the document: <weather>
– Use of attributes within an element: id on each <city>
– Document elements: everything inside <weather>
– Child elements within an element: <yesterday>, <today> and <tomorrow> within <city>
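To see the parts of the weather document from a program's point of view, here is a minimal sketch in Python. The lecture itself does not use Python; the standard library module xml.etree.ElementTree is chosen only for illustration, and the names WEATHER_XML and todays_forecast are hypothetical.

```python
import xml.etree.ElementTree as ET

# The weather document from the slide (stylesheet instruction omitted for brevity).
WEATHER_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!--This Document contains weather information-->
<weather>
   <city id="sydney">
      <yesterday>Rain</yesterday>
      <today>Cloudy</today>
      <tomorrow>Sunny</tomorrow>
   </city>
   <city id="Melbourne">
      <yesterday>Sunny</yesterday>
      <today>Windy</today>
      <tomorrow>Rain</tomorrow>
   </city>
   <city id="Brisbane">
      <yesterday>Sunny</yesterday>
      <today>Rain</today>
      <tomorrow>Rain</tomorrow>
   </city>
</weather>""".encode("utf-8")


def todays_forecast(xml_bytes):
    """Map each city's id attribute to the text of its <today> child element."""
    root = ET.fromstring(xml_bytes)          # root is the <weather> element
    return {
        city.get("id"): city.findtext("today")
        for city in root.findall("city")     # direct <city> children of the root
    }


root_tag = ET.fromstring(WEATHER_XML).tag
forecast = todays_forecast(WEATHER_XML)
print(root_tag, forecast)
```

The root element, its child elements and the id attribute map directly onto the objects the parser returns.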


XML Markup
– Elements
– Attributes
– Comments
– Characters

<?xml version="1.0"?>
<!-- This is a comment -->
<book>
   <title isbn="0-22-4444">Web Applications</title>
   <author>John Doe</author>
   <chapters>
      <chapter>Introduction</chapter>
      <chapter>ASP</chapter>
      <chapter>XML</chapter>
   </chapters>
</book>

XML markup cont.
XML document:
– Must contain exactly one root element
– Elements must be nested properly

XML elements:
– May or may not contain content
• Child elements, character data, etc

– All elements must have an end tag
- <img src="img.gif"></img>
- <img src="img.gif"/>


XML markup cont.
Attributes
– A feature or characteristic of an element. An attribute describes an element
– Placed within the element’s start tag
– Text strings
– Values must be enclosed in quotes
<title isbn="0-22-4444">Web Applications</title>
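As an illustration of how an application reads an attribute separately from an element's content, here is a small Python sketch. It assumes Python's standard xml.etree.ElementTree module; the variable names and the "edition" attribute are hypothetical and are not part of the lecture's example.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
   <title isbn="0-22-4444">Web Applications</title>
   <author>John Doe</author>
</book>"""

root = ET.fromstring(BOOK_XML)
title = root.find("title")

title_text = title.text                     # character data between the tags
title_attrs = title.attrib                  # all attributes as a dict of strings
isbn = title.get("isbn")                    # one attribute value
edition = title.get("edition", "unknown")   # hypothetical attribute, absent here

print(title_text, title_attrs, isbn, edition)
```

Attribute values always arrive as text strings, which matches the "Text strings" point above.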

XML markup cont.
Processing instructions (PI)
– Passed to the application using the XML document
– Provide application-specific document information
– Delimited by <? and ?>
– Used for example to link to a stylesheet
– Will be discussed more in later lectures
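A rough sketch of how an application might receive processing instructions, assuming Python's standard xml.dom.minidom module (not something the lecture prescribes). The function name processing_instructions is hypothetical.

```python
from xml.dom import minidom, Node

DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="weather.xsl"?>
<weather/>"""


def processing_instructions(xml_text):
    """Return (target, data) pairs for PIs that sit at the top level of the document."""
    doc = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in doc.childNodes
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
    ]


pis = processing_instructions(DOC)
print(pis)
```

The XML declaration uses the same <? ... ?> delimiters but is not reported as a processing instruction here; only the stylesheet link is.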


Creating XML Documents
Structure of an XML document
• XML documents consist of three parts
1. The prolog
2. The document body
3. The epilog
• The prolog provides information about the document itself
• The document body contains the document’s content in a hierarchical tree structure
• The epilog contains any final comments or processing instructions

Creating XML Documents cont
Creating the Prolog
The prolog consists of four parts in the following order:
1. XML declaration
2. Miscellaneous statements (processing instructions) or comments
3. Document type declaration
4. Miscellaneous statements (processing instructions) or comments
This order has to be followed or the parser will generate an error message. Not all four parts are required, but those that are included must follow this order.


Creating XML Documents cont.
XML declaration
• The XML declaration is always the first line of code in an XML document. It tells the processor that what follows is written using XML. It can also provide information about how the parser should interpret the code.
• The complete syntax is:
<?xml version="version number" encoding="encoding type" standalone="yes | no" ?>
Simple Example:
<?xml version="1.0"?>

Markup vs. Data
XML must differentiate between
– Markup text (Elements)
• Enclosed in angle brackets (< and >)
– e.g. Child elements
– Character data
• Text between start tag and end tag
– e.g. John Doe in the earlier example


XML Data
Data in an XML document may consist of:
– ASCII Character Set – Character Data
– Whitespace characters
– Entity References
– Unicode Characters - Enables computers to process characters for several languages
– CDATA sections

Whitespace Characters
– Spaces, tabs, line feeds and carriage returns are considered whitespace characters
– Whitespace may be normalized when a document is processed
o Runs of whitespace collapsed into a single whitespace character
o Sometimes whitespace removed entirely
E.g. <markup>This    is    character    data</markup> after normalization becomes <markup>This is character data</markup>
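Whether whitespace is collapsed depends on the software handling the document. As a hedged illustration, Python's standard xml.etree.ElementTree hands element text back with its whitespace intact, so an application that wants the normalized form does the collapsing itself. The helper normalise_space is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def normalise_space(text):
    """Collapse every run of spaces, tabs and line breaks into one space."""
    return " ".join(text.split())


element = ET.fromstring("<markup>This   is \n\t character    data</markup>")

raw = element.text                 # whitespace as written in the document
normalised = normalise_space(raw)  # what a display application might show

print(repr(raw))
print(repr(normalised))
```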



Reserved Characters
XML-reserved characters
– Ampersand (&)
– Left-angle bracket (<)
– Right-angle bracket (>)
– Apostrophe (')
– Double quote (")

Entity references
– Allow us to use XML-reserved characters
• Begin with ampersand (&) and end with semicolon (;)
– Prevent the parser from misinterpreting character data as markup

Entity References
Built-in entities
– Ampersand (&amp;)
– Left-angle bracket (&lt;)
– Right-angle bracket (&gt;)
– Apostrophe (&apos;)
– Quotation mark (&quot;)
– Mark up characters “<>&” in element message:
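The round trip between reserved characters and entity references can be sketched in Python. This assumes the standard library helpers xml.sax.saxutils.escape and xml.etree.ElementTree; the message text and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_message = 'Mark up characters "<>&" in element message'

# escape() replaces &, < and > with their entity references.
escaped = escape(raw_message)
document = f"<message>{escaped}</message>"

# The parser turns the entity references back into the original characters.
parsed_back = ET.fromstring(document).text

# All five built-in entities are understood without any declaration.
all_five = ET.fromstring("<m>&amp;&lt;&gt;&apos;&quot;</m>").text

print(escaped)
print(parsed_back)
print(all_five)
```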

Unicode Characters
• Unicode characters allow the use of various symbols such as ©, ®, ¶, £, etc. and characters from other languages
• Unicode character information can be found at

CDATA Sections
– A CDATA section is a block of text the XML processor will interpret only as text
– Used for example to display some code in the text
– The syntax to create a CDATA section is:
<![CDATA[ Text Block ]]>
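A short Python sketch, assuming the standard xml.etree.ElementTree module, shows that a CDATA section and entity references are two ways of delivering the same text to the application, while the unprotected text is rejected. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

CODE = "if ( this->getX() < 5 && value[ 0 ] != 3 )"

with_cdata = f"<sample><![CDATA[{CODE}]]></sample>"
with_entities = (
    "<sample>if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )</sample>"
)

text_from_cdata = ET.fromstring(with_cdata).text
text_from_entities = ET.fromstring(with_entities).text

# Without CDATA or entity references, < and & are read as markup.
try:
    ET.fromstring(f"<sample>{CODE}</sample>")
    unprotected_parses = True
except ET.ParseError:
    unprotected_parses = False

print(text_from_cdata)
print(text_from_entities)
print(unprotected_parses)
```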


<?xml version = "1.0"?>

<!-- CDATA section containing C++ code -->
<book title = "C++ How to Program" edition = "3">

   <sample>
      // C++ comment
      if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )
         cerr &lt;&lt; this-&gt;displayError();
   </sample>

   <sample>
      <![CDATA[

         // C++ comment
         if ( this->getX() < 5 && value[ 0 ] != 3 )
            cerr << this->displayError();
      ]]>
   </sample>

   C++ How to Program by Deitel &amp; Deitel

</book>

[Figure: result of XML in browser]


Important things to remember about XML documents
• An XML document begins with the XML declaration (the version line) in the prolog
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
• An XML file is saved with the extension ‘xml’
• An XML document has only one root element, which can contain many child elements, attributes, etc.
• Elements construct the XML document and attributes describe the elements
• Elements must be properly nested and attribute values must be within quotes
• Elements must be properly closed, unlike in HTML: <img src="picture.gif"> is invalid in XML; it should be <img src="picture.gif"/>
• XML is case sensitive

Editors, Parsers and Browsers
• DTD & XSD – for Describing Structure
• XML – for Describing information
• XSLT – for Formatting the data
• HTML – for Displaying information

[Figure labels: Document Type Definition (DTD) or an XML Schema to describe the vocabulary; eXtensible Markup Language (XML): Data Description; eXtensible Style Sheets (XSL): Data Formatting; Data Display]


• XML, DTD, XSD and XSLT are text files with different extensions, just like HTML
• Any editor that supports text files, from Notepad for Windows to advanced tools such as XMLSpy, can be used to edit these files
Some Available Editors:

• An XML processor (also called XML parser) evaluates the document to make sure it conforms to all XML specifications for structure and syntax
• There are two types of parsers available
- DOM (Document Object Model): a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents
  - XML DOM (in the parser) creates a tree structure of the XML document
- SAX (Simple API for XML): event-based API
  - Reads an XML document starting from the top, and creates various events that are passed to event handlers within the program
  - Fast processing, very little memory overhead
API - Application Programming Interface
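The difference between the two styles can be sketched with Python's standard library, which ships both a DOM implementation (xml.dom.minidom) and a SAX implementation (xml.sax). This is only an illustration under that assumption; the names CityHandler, city_ids_dom and city_ids_sax are hypothetical.

```python
import xml.sax
from xml.dom import minidom

WEATHER = """<weather>
   <city id="sydney"><today>Cloudy</today></city>
   <city id="Melbourne"><today>Windy</today></city>
</weather>"""


def city_ids_dom(xml_text):
    """DOM: build the whole tree in memory, then query it."""
    doc = minidom.parseString(xml_text)
    return [city.getAttribute("id") for city in doc.getElementsByTagName("city")]


class CityHandler(xml.sax.ContentHandler):
    """SAX: react to events as the parser reads the document from the top."""

    def __init__(self):
        super().__init__()
        self.city_ids = []

    def startElement(self, name, attrs):
        # Called once for every start tag the parser meets.
        if name == "city":
            self.city_ids.append(attrs.getValue("id"))


def city_ids_sax(xml_text):
    handler = CityHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.city_ids


dom_ids = city_ids_dom(WEATHER)
sax_ids = city_ids_sax(WEATHER)
print(dom_ids, sax_ids)
```

Both give the same answer; the DOM version keeps the full tree available for later updates, while the SAX version only keeps what the handler chooses to store.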

Parsers … contd.
Microsoft’s parser is called MSXML and is built into IE versions 5.0 and above

XML in Internet Explorer 5 and above
– Viewing of XML documents
– Full support for W3C DTD standards
– XML embedded in HTML as Data Islands
– Binding XML data to HTML elements
– Formatting XML with XSL
– Formatting XML with CSS
– Support for CSS Behaviors
– Access to the XML DOM



Well-formed and Valid XML documents
An XML document is well-formed if it
– Contains no syntax errors and
– Fulfills all of the specifications for XML code as defined by the W3C.


An XML document is valid if it is
– Well-formed AND
– Satisfies the rules laid out in the DTD or schema attached to the document
Most browsers (through the parser in the browser) can check well-formedness but not validity against a Schema (some work with a DTD)
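A well-formedness check can be sketched in a few lines of Python, assuming the standard xml.etree.ElementTree module, whose parser raises ParseError on any syntax error. It does not check validity against a DTD or schema; that needs a validating parser. The function name is_well_formed and the sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the text parses as XML; says nothing about validity."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


samples = {
    "properly closed": '<img src="picture.gif"/>',
    "missing end tag": '<img src="picture.gif">',
    "improper nesting": "<a><b></a></b>",
    "case mismatch": "<Title>XML</title>",
    "unquoted attribute": "<img src=picture.gif/>",
    "two root elements": "<a/><b/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the first sample passes; each of the others breaks one of the rules listed under "Important things to remember about XML documents".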



Formatting XML documents
No formatting:

Formatting XML documents
Using Cascading Style Sheets (CSS)
To link a .css file to your XML file
– Add a processing instruction to the prolog:
<?xml-stylesheet type="text/css" href="bookStyle.css"?>


Formatting XML documents
Note: bookStyle.css exists, but so far has no styles listed in it

Formatting XML documents
Using the display property:
– display: block


Formatting XML documents
Determine display type:
– Display an element’s content inline with the contents of other elements in the document:
display: inline

– Display an element’s content in a separate block:
display: block

– Hide an element’s content:
display: none



Formatting XML documents
Display the content of an element in a list:
display: list-item

Formatting XML documents
Other CSS formatting styles:
– Width, Height
– Position
– Font colour, Background colour
– Borders, Margins, Padding
– Font and Text attributes
– Alignment
– Adding Images

Next Week
– More XML
– Document Type Definition (DTD)

– Carey: Tutorial 1
– Carey: Tutorial 5 – Working with Cascading Style Sheets
– “XML: We Ain’t Seen Nothin’ Yet”
• Found at:
