
VICTORIA UNIVERSITY OF WELLINGTON

Te Whare Wananga o te Upoko o te Ika a Maui

An Introduction to XML
Lecturer: Dr. Pavle Mogin
COMP 442 Issues in Databases and Information Systems
A General Plan for the XML Topic

• First:
– Why XML at all
• Next:
– What XML is, and its related meta languages: DTD and XML Schema
– The XML query languages XPath and XQuery
– These come first because, without precise knowledge of the XML data model, you
cannot consider storage, retrieval, and update techniques for
XML databases
• Finally:
– How to store, constrain, query, and update XML databases

COMP 442 Issues in Databases and Information Systems 2008 Intro to XML 1
Plan for Intro into XML Databases

• In general: to answer the question “Why XML?”
• In particular:
– Motives for introducing databases on the Internet
– Database applications on the Web
– New challenges
– Why XML is superior to HTML and SGML
• XML Related Technologies
– Reading:
• Ramakrishnan, Gehrke: Database Management Systems,
Chapter 7 and Chapter 27
• W3C Recommendations at http://www.w3c.org/TR/

Motives for Databases on the Internet
• Internet and corporate intranets offer services like:
– Purchasing books online,
– Online auctions,
– Online submission of bids,
– Distance learning
• These pose new challenges for DBMSs:
1. Large number of concurrent users (scalability),
2. Storing and handling unstructured and semistructured
documents
3. Ranked keyword search
• The first generation of Internet sites were collections
of HTML files and these proved to be inadequate
• Modern electronic commerce sites rely on database
systems
New Requirements
• On-Line Enterprise Reporting (OLER) systems, also
known as Enterprise Information Portals (EIP), provide
a single point of entry to integrated corporate data for
employees, clients, and partners
• These users place a number of new requirements
on database systems, such as:
– Categorization,
– Personalization,
– Publishing,
– Collaboration, and
– Notification
• For all these new requirements, XML and its related
technologies offer a suitable answer

A Classification of e-Businesses

• Business to Customer:
– online shops,
– online banking,…
• Business to Business:
– online bidding,
– online ordering,…
• Business to Administration:
– online tax payment,
– online assurance,…
• Customer to Administration:
– online libraries,
– online car registration,…

Business to Business

• The volume of business-to-business Internet
transactions is approximately 100 times greater than
that of business-to-customer transactions
• Until recently, proprietary form layouts, contents, and
data formats were seriously hindering its growth
• The main reason: considerable effort was needed to
process documents sent by another party using HTML
• The solution is found in the use of the eXtensible Markup
Language (XML)

So, Why is XML Better

• There are three reasons why XML is transforming the
software industry at breakneck speed:
– XML is an open and flexible standard,
– XML is easy to understand and learn, and
– It is driven by the World Wide Web Consortium (W3C)

Open and Free Standard

• XML defines an open and flexible standard for:
– Defining new languages,
– Publishing, and
– Exchanging
any kind of information
• This frees business information from proprietary data
formats (imposed by software vendors) and renders it
readable and understandable by both people and
computers
• Users do not need to wait for a vendor to introduce the new
markup they need – they simply define it
themselves

Easy to Understand

• A well-formed sample of XML code:
<price>
<currency>NZD</currency>
<amount>99.95</amount>
</price>
• Content is marked up with tags, which describe the content
• Contrast this with the bare value ’99.95’ in a relational database field:
– Percentage?
– Price?
– Speed limit?
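
A minimal sketch of how a program can use the self-describing tags from the sample above. It uses Python's standard `xml.etree.ElementTree` module; the function name `read_price` and the choice of `Decimal` for the amount are assumptions made for illustration and are not part of the lecture material.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal
from typing import Tuple

# The <price> sample from the slide, indented for readability.
PRICE_XML = """
<price>
  <currency>NZD</currency>
  <amount>99.95</amount>
</price>
"""

def read_price(xml_text: str) -> Tuple[str, Decimal]:
    """Return (currency, amount) from a <price> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    if root.tag != "price":
        raise ValueError(f"expected <price>, got <{root.tag}>")
    currency = root.findtext("currency")
    amount = root.findtext("amount")
    if currency is None or amount is None:
        raise ValueError("missing <currency> or <amount>")
    # The tag names tell us what each value means, so no guessing is needed.
    return currency.strip(), Decimal(amount.strip())

currency, amount = read_price(PRICE_XML)
print(f"{amount} {currency}")  # prints "99.95 NZD"
```

Because the value travels with its tags, the reader of the document does not have to guess whether 99.95 is a percentage, a price, or a speed limit.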

An HTML Expression

• What is the meaning of the following HTML expression:
<B>Seal</B>

a) A rubber gasket?
b) A rock artist?
c) A marine mammal?
d) An official stamp?
e) An elite group of the US Navy command?

• The only certain thing is:
– It should be displayed in bold letters
Origins
• Completed in early 1998 by the W3C (not by a particular
software vendor)
– The same organization that sets the overall direction for
the Web
• The initial proposal was based on the already existing
Standard Generalized Markup Language (SGML)
• SGML specification: 500 pages
• Initial XML specification: 26 pages
• XML consists of rules and conventions that allow:
– Creating your own corporate or industry-standard markup
language,
– Including data structuring and semantic rules in the new
language,
– Easy understanding between communicating parties
Related Technologies

• Style languages
– Cascading Style Sheets (CSS),
– Extensible Stylesheet Language (XSL)
• Supplemental Technologies:
– XLink,
– XPointer,
– Namespaces, and
– Resource Description Framework
• We shall briefly discuss each of them

Style Languages
• XML markup only specifies what is in a document
• Unlike HTML, it does not say anything about the
presentation
• Information about an XML document's appearance when
printed or viewed in a Web browser is stored in a
style sheet document
• Different style sheet documents may accompany the
same XML document
• So, you can change the appearance of an XML
document by choosing another style sheet
• The two most widely used style sheet languages are:
– Cascading Style Sheets (CSS), and
– Extensible Stylesheet Language (XSL)
Cascading Style Sheets (CSS)

• CSS is a simple language originally designed for use
with HTML
• It applies fixed style rules to the content of XML
elements and attributes
• It provides basic information about:
– Fonts,
– Color,
– Positioning,
– Text properties
• It is well supported by Web browsers for both HTML
and XML

eXtensible Stylesheet Language (XSL)

• XSL is a more complex and powerful style language
than Cascading Style Sheets
• It can not only apply styles to the contents of XML
elements but can also rearrange elements, add
text, and transform a document in an almost arbitrary
way
• XSL is divided into two parts:
– A transformation language (XSLT) for converting XML trees into
alternative trees, and
– A formatting language (XSL-FO) for specifying the appearance of the
elements of an XML tree
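
The idea of converting one XML tree into an alternative tree can be sketched as follows. This is not XSL: it is plain Python using the standard `xml.etree.ElementTree` module as a stand-in, and the `<catalog>` document, its element names, and the function `to_html_list` are all hypothetical. It only illustrates the three operations named above: rearranging elements, adding text, and producing a differently shaped tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document, invented for this illustration.
SOURCE = """
<catalog>
  <book><title>XML Basics</title><price>99.95</price></book>
  <book><title>SQL Basics</title><price>49.50</price></book>
</catalog>
"""

def to_html_list(xml_text: str) -> ET.Element:
    """Build a new <ul> tree from a <catalog> tree, cheapest book first."""
    catalog = ET.fromstring(xml_text)
    # Rearrange: sort the <book> elements by their numeric price.
    books = sorted(catalog.findall("book"),
                   key=lambda b: float(b.findtext("price", default="0")))
    ul = ET.Element("ul")
    for book in books:
        li = ET.SubElement(ul, "li")
        # Add text: the "NZD" label does not appear in the source tree.
        li.text = f"{book.findtext('title')} (NZD {book.findtext('price')})"
    return ul

print(ET.tostring(to_html_list(SOURCE), encoding="unicode"))
```

In XSL the same mapping would be written declaratively as template rules rather than as a loop; the sketch only shows the shape of the input and output trees.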

Supplemental Technologies
• XML-based languages that are layered on top of basic XML
are the supplemental technologies
• These are:
– XLink, which provides multi-directional hypertext links that
are more powerful than the simple HTML <A> tag
– XPointer, which introduces a syntax you can attach to
the end of a URL to link to a particular part of a particular
document
– Namespaces, which use prefixes and URIs to disambiguate
conflicting XML markup
– Resource Description Framework (RDF), an XML
application used to embed metadata in XML and HTML
documents
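
A small sketch of how namespaces disambiguate conflicting markup, using Python's standard `xml.etree.ElementTree` module. The document, the prefixes `furn` and `html`, and the `http://example.com/...` namespace URIs are made up for this illustration: two vocabularies both define a `table` element, and the prefix-to-URI binding keeps them apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies that both use <table>.
DOC = """
<order xmlns:furn="http://example.com/furniture"
       xmlns:html="http://example.com/html">
  <furn:table>Oak dining table</furn:table>
  <html:table>rows and columns</html:table>
</order>
"""

# Prefix-to-URI map used for lookups; the prefixes here are local to this
# script and only the URIs have to match those declared in the document.
NS = {
    "furn": "http://example.com/furniture",
    "html": "http://example.com/html",
}

root = ET.fromstring(DOC)
furniture = root.find("furn:table", NS)
markup = root.find("html:table", NS)

# ElementTree stores the namespace URI as part of the tag name.
print(furniture.tag, "->", furniture.text)
print(markup.tag, "->", markup.text)
```

An unqualified lookup such as `root.find("table")` returns `None` here, because neither element belongs to the empty namespace.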

Summary

• Modern e-business offered via the Internet relies on
databases
• Until recently, HTML documents were almost
exclusively what business parties exchanged via
the Internet
• But HTML proved to be inadequate for this
purpose, since it embeds only the presentation
information of a document, not its structure and meaning
• The eXtensible Markup Language is an emerging open
and free standard that will allow seamless
document exchange between communicating parties

Summary

• XML has meta languages that enable people to build
their own markup and thus add meaning to their
documents
• XML is:
– Easy to learn, since it is simple, and
– Easy to understand, because it carries data and meaning
together (it is self-describing)
• XML is accompanied by a number of supporting
technologies:
– Style languages to program the appearance of an XML
document
– Powerful languages for interlinking XML documents and even
parts of them
