
Introduction to XML

By A. SandanaKaruppan, AP/IT

UNIT I: INTRODUCTION TO XML


Session 1 to 8
COURSE OBJECTIVES

The student should be made to:


1. Learn XML fundamentals.
2. Be exposed to building applications based on XML.
3. Understand the key principles behind SOA.
4. Be familiar with the web services technology elements for realizing SOA.
5. Learn the various web service standards.

COURSE OUTCOMES

Upon successful completion of this course, students will be able to:


1. Build applications based on XML.
2. Develop web services using technology elements.
3. Build SOA-based applications for intra-enterprise and inter-enterprise applications.

OBJECTIVES

To introduce XML.


To mark up data using XML.
To introduce the types of markup languages created with XML.
To introduce the relationships among DTDs, Schemas and XML.
To explain the concept of an XML namespace.
To introduce Web services and related technologies.

OUTCOMES

Understand XML.


Become familiar with marking up data using XML.
Become familiar with the types of markup languages created with XML.
Understand the relationships among DTDs, Schemas and XML.
Understand the concept of an XML namespace.
Become familiar with Web services and related technologies.

PREREQUISITES

Advanced Computer Programming


Data Structures and Algorithms
Database Systems

AGENDA

1. Introduction
2. XML Building blocks
3. Structuring Data
4. XML Namespaces
5. Document Type Definitions (DTDs) and Schemas
6. XML Vocabularies
7. XML Applications

1. INTRODUCTION

What is XML?
eXtensible Markup Language
A simplified version of SGML
Maintains the most useful parts of SGML
Designed so that SGML can be delivered over the Web
XHTML -- a reformulation of HTML 4 in XML 1.0

1. INTRODUCTION

XML (Extensible Markup Language)


Derived from Standard Generalized Markup Language (SGML)
Open technology for electronic data exchange and storage
Used to create other markup languages that describe data in a structured manner
XML documents
Contain only data, not formatting instructions
Highly portable
XML parser
Supports the Document Object Model (DOM) or the Simple API for XML (SAX)
Document Type Definition (DTD) or schema
An XML document can reference another document that defines its proper structure
XML-based markup languages
XML vocabularies

1. INTRODUCTION

HyperText Markup Language (HTML, RFC 1866)


A small SGML application used on the Web (a DTD and a set of processing conventions)
Can only use a predefined set of tags

1. INTRODUCTION

Difference between XML and HTML


XML was designed to carry data, not to display data
XML is not a replacement for HTML.
Different goals:
XML was designed to describe data and to focus on what data is.
HTML was designed to display data and to focus on how data looks.
HTML is about displaying information, XML is about describing information.

1. INTRODUCTION

Why Is XML Important?


Plain Text
Easy to edit
Platform independent
Data Identification
Tells you what kind of data you have
Can be used in different ways by different applications
Easily Processed
Vendor-neutral standard

1. INTRODUCTION

Why Is XML Important?


Stylability
Inherently style-free
XSL (Extensible Stylesheet Language)
Different XSL formats can then be used to display the same data in different ways
Inline Reusability
Can be composed from separate entities
Modularize your documents

1. INTRODUCTION

Why Is XML Important?


Linkability -- XLink and XPointer
Simple unidirectional hyperlinks
Two-way links
Multiple-target links
Expanding links
Hierarchical
Faster to access
Easier to rearrange

2. XML BUILDING BLOCKS

PI (Processing Instruction)
Tags
Elements
Content
Attributes
Entities
Comments

2.1 XML BUILDING BLOCKS--PROLOG

The part of an XML document that precedes the XML data


Includes
A declaration: version [, encoding, standalone]
An optional DTD (Document Type Definition)
Example
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
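To see what a parser does with the prolog, here is a minimal sketch using Python's standard xml.dom.minidom module. The one-element note document is a hypothetical example; only the declaration itself comes from the slide.

```python
from xml.dom import minidom

# Hypothetical one-element document; only the declaration comes from the slide.
DOCUMENT = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><note>hello</note>'

doc = minidom.parseString(DOCUMENT)

# minidom records the three declaration fields on the Document object.
print(doc.version)                  # version given in the declaration
print(doc.encoding)                 # encoding given in the declaration
print(doc.standalone)               # True when standalone="yes"
print(doc.documentElement.tagName)  # the XML data that follows the prolog
```

The declaration is not part of the data: the first element after it becomes the document element.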

2.2 TAGS

Tags are used to specify a name for a given piece of information.


A tag consists of opening and closing angle brackets (< >) that enclose the name of the tag.
Example
<EMP_NAME>Nick Shaw</EMP_NAME>

2.3 ELEMENTS

Elements are represented using tags.


An XML document must always have a root element.
General format:
<element> </element>
Empty element:
<empty-Element />
Example
<Authorname>John Smith</Authorname>

2.3 ELEMENTS

XML Elements are Extensible


XML documents can be extended to carry more information
XML Elements have Relationships
Elements are related as parents and children
Elements have Content
Elements can have different content types: element content, mixed content, simple content, or empty content
Elements can also have attributes
XML elements must follow the naming rules

2.4 CONTENT

Content refers to the information represented by the elements of an XML document.


Character or data content
Element content
Combination or mixed content
Example
<BOOKNAME>The Painted House</BOOKNAME>

2.5 ATTRIBUTES

Located in the start tag of elements


Provide additional information about elements
Often provide information that is not part of the data
Attribute values must be enclosed in quotes
Should I use an element or an attribute?
A common guideline: metadata (data about data) should be stored as attributes, and the data itself should be stored as elements
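The guideline above can be sketched with Python's standard xml.etree.ElementTree module. The employee element and its id attribute are hypothetical names chosen for illustration; EMP_NAME is the tag used in the earlier example.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: "id" is metadata about the record, EMP_NAME is the data itself.
record = ET.fromstring('<employee id="E101"><EMP_NAME>Nick Shaw</EMP_NAME></employee>')

employee_id = record.get("id")               # the attribute lives in the start tag
employee_name = record.findtext("EMP_NAME")  # the data lives in a child element
print(employee_id, employee_name)
```

This is one possible layout, not the only correct one; the same information could also be modelled entirely with child elements.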

2.6 ENTITIES

An entity is a name that is associated with a block of data.


Internal (predefined) entities: &lt; , &gt;
General Entities
General entities are declared in Document Type Definitions (DTD)
Example
<!ENTITY copyright "http://www.ssn.edu.in/entities.dtd.">
Parameter Entities
Are only declared and used in DTDs
<!ENTITY % bool "(yes | no)">
<!ATTLIST member membership %bool; #IMPLIED>
(member is an assumed element name; an ATTLIST declaration must name the element that carries the attribute)
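As a rough illustration of entity expansion, the sketch below parses one document that uses the predefined entities and one that declares a general entity in an internal DTD subset. The entity name company and its replacement text are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Predefined entities need no declaration: &lt; and &gt; stand for < and >.
condition = ET.fromstring("<condition>a &lt; b &gt; c</condition>")
print(condition.text)

# A general entity declared in an internal DTD subset.
# The name "company" and its replacement text are hypothetical.
DOCUMENT = """<!DOCTYPE footer [
  <!ENTITY company "Example College">
]>
<footer>Copyright &company;</footer>"""

footer = ET.fromstring(DOCUMENT)
print(footer.text)  # the parser replaces &company; with its declared text
```

Parameter entities are not shown here because they are used only inside DTDs, not in document content.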

2.7 COMMENTS

Comments are statements used to explain the XML code.


Example
<!--PRODUCTDATA is the root element-->
The text inside a comment cannot contain two consecutive hyphens (--); a single hyphen is allowed:
<!--PRODUCTDATA is the -root element-->

2.8 XML SYNTAX

All XML elements must have a closing tag


XML tags are case sensitive
All XML elements must be properly nested
All XML documents must have a root tag
Attribute values must always be quoted
With XML, white space is preserved
With XML, a new line is always stored as LF
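Several of these rules can be checked by handing small fragments to a parser. The following is a minimal sketch with Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample fragments are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical fragments, one per syntax rule.
checks = {
    "closed and nested": "<a><b>text</b></a>",
    "missing closing tag": "<a><b>text</a>",
    "case mismatch": "<Name>text</name>",
    "improper nesting": "<a><b>text</a></b>",
    "two root elements": "<a/><b/>",
    "unquoted attribute": "<a id=1/>",
}

results = {label: is_well_formed(text) for label, text in checks.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the first fragment is accepted; each of the others breaks one of the rules listed above.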

2.9 ANATOMY OF AN ELEMENT

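Figure: anatomy of the element <p type="rule">Use a hyphen: &#173;.</p>
Start-tag: <p type="rule">
Element type name: p
Attribute name: type
Attribute value: "rule"
Content: Use a hyphen: &#173;.
(Character) entity reference: &#173;
End-tag: </p>
Element: the start-tag, content, and end-tag together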

2.10 XML VALIDATION

"Well Formed" XML document


Has correct XML syntax
"Valid" XML document
Is well formed
Conforms to the rules of a DTD (Document Type Definition)
XML DTD
defines the legal building blocks of an XML document
Can be inline in XML or as an external reference
XML Schema
an XML based alternative to DTD, more powerful
Supports namespaces and data types

2.11 DISPLAYING XML

XML documents do not carry information about how to display the data
We can add display information to XML with
CSS (Cascading Style Sheets)
XSL (eXtensible Stylesheet Language), the preferred option

3. STRUCTURING DATA

XML declaration
version value
Indicates the XML version to which the document conforms
Root element
Element that encompasses all other elements
Container element
Any element that contains other elements
Child elements
Elements inside a container element
Empty element flag
Does not contain any text
DTD documents
End with .dtd extension

3. STRUCTURING DATA - EXAMPLE

<?xml version = "1.0"?>
<!-- students.xml -->
<college>
  <collegeName> ssn </collegeName>
  <dept> information technology</dept>
  <class>
    <name> vii a and b </name>
    <batch> 2013-17 </batch>
  </class>
  <contact> good students </contact>
</college>
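The structure of this example can be explored programmatically. Below is a minimal sketch with Python's standard xml.etree.ElementTree that identifies the root element, its child elements, and the class container element; the variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

STUDENTS_XML = """<?xml version="1.0"?>
<!-- students.xml -->
<college>
  <collegeName> ssn </collegeName>
  <dept> information technology</dept>
  <class>
    <name> vii a and b </name>
    <batch> 2013-17 </batch>
  </class>
  <contact> good students </contact>
</college>"""

root = ET.fromstring(STUDENTS_XML)
root_children = [child.tag for child in root]            # child elements of the root
container = root.find("class")                           # a container element
container_children = [child.tag for child in container]  # its own child elements

print(root.tag)
print(root_children)
print(container_children)
print(root.findtext("collegeName").strip())  # white space is preserved, so strip it for display
```

The comment line is skipped by the default parser, so only elements appear among the children.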

4. XML NAMESPACES

XML
Allows document authors to create custom elements
Naming collisions
XML namespace
Collection of element and attribute names
Uniform resource identifier (URI)
Uniquely identifies the namespace
A string of text for differentiating names
Can be any name except the reserved namespace prefix xml
directory
Root element that contains other elements
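A naming collision and its resolution can be sketched as follows. The prefixes text and image, the urn:example URIs, and the file elements are hypothetical; the URIs only need to be unique strings and are never fetched.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define an element named "file".
# Prefixes, URIs, and file names are hypothetical.
DOCUMENT = """<directory xmlns:text="urn:example:textInfo"
           xmlns:image="urn:example:imageInfo">
  <text:file filename="book.xml"/>
  <image:file filename="funny.jpg"/>
</directory>"""

NAMESPACES = {"text": "urn:example:textInfo", "image": "urn:example:imageInfo"}
root = ET.fromstring(DOCUMENT)

# The prefix selects the namespace, so the two "file" elements stay distinct.
text_file = root.find("text:file", NAMESPACES)
image_file = root.find("image:file", NAMESPACES)

print(text_file.tag, text_file.get("filename"))
print(image_file.tag, image_file.get("filename"))
```

ElementTree stores each name as {URI}localname, which shows that the URI, not the prefix, is what identifies the namespace.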

5. DOCUMENT TYPE DEFINITIONS (DTDS) AND SCHEMAS

Two types of documents for specifying XML document structure


Document Type Definitions (DTDs)
Schemas

5.1 DOCUMENT TYPE DEFINITIONS (DTDS)

Enable an XML parser to verify whether an XML document is valid


Allow independent user groups to check structure and exchange data in standardized format
Express a set of rules for structure using EBNF grammar
ELEMENT type declaration
Defines rules
ATTLIST attribute-list declaration
Defines an attribute

5.1.1 SIMPLE DTD EXAMPLES -

Listing 1: An Internal DTD
<?xml version="1.0"?>
<!DOCTYPE message [
<!ELEMENT message (#PCDATA)>
]>
<message>
Let the good times roll!
</message>

Listing 2: An External DTD
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>
Let the good times roll!
</message>

5.1.1 SIMPLE DTD EXAMPLES -

Document Not Valid According to Defined DTD


<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>
<text>
Let the good times roll!
</text>
</message>
Listing 3

5.1.2 STRUCTURE OF A DOCUMENT TYPE DEFINITION

The syntax is as follows:


<!DOCTYPE rootelement SYSTEM | PUBLIC DTDlocation [ internalDTDelements ] >
The exclamation mark (!) is used to signify the beginning of the declaration.
DOCTYPE is the keyword used to denote this as a Document Type Declaration.
rootelement is the name of the root element or document element of the XML document.
SYSTEM and PUBLIC are keywords used to designate that the DTD is contained in an
external document. Although the use of these keywords is optional, to reference an external
DTD you would have to use one or the other. The SYSTEM keyword is used in tandem with a
URL to locate the DTD. The PUBLIC keyword specifies some public location that will usually
be some application-specific resource reference.
internalDTDelements are internal DTD declarations.
These declarations will always be placed within opening ([) and closing (]) brackets.

5.1.2 STRUCTURE OF A DOCUMENT TYPE DEFINITION

It is possible for a Document Type Declaration to contain both an external DTD subset
and an internal DTD subset.
In this situation, the internal declarations take precedence over the external ones.
In other words, if both the external and internal subsets declare the same entity or attribute list, the
declaration in the internal subset is the one used. (In a valid document, each element type may be declared only once.)
Consider the Document Type Declaration fragment shown in Listing 4.

5.1.3 DTD ELEMENTS

All elements in a valid XML document are defined with an element declaration in the DTD.
An element declaration defines the name and all allowed contents of an element.
Element names must start with a letter or an underscore and may contain any combination of
letters, numbers, underscores, dashes, and periods.
Element names must never start with the string xml. Colons should not be used in element
names because they are normally used to reference namespaces.
Each element in the DTD should be defined with the following syntax:
<!ELEMENT elementname rule >
ELEMENT is the keyword that specifies that this is an element declaration.
elementname is the name of the element.
rule is the definition to which the element's data content must conform.

5.1.3 DTD ELEMENTS - EXAMPLE

The XML document in Listing 6 is a valid document because it follows the rules laid out in Listing 5 for contactlist.dtd.

5.1.3 DTD ELEMENTS - RULE

All data contained in an element must follow a set rule.


The rule is the definition to which the element's data content must conform.
There are two basic types of rules that elements must fall into.
The first type of rule deals with content.
The second type of rule deals with structure.
First, we will look at element rules that deal with content.
Content Rules
The content rules for elements deal with the actual data that defined elements may contain.
These rules include the ANY rule, the EMPTY rule, and the #PCDATA rule.
An element may be defined using the ANY rule or the EMPTY rule.
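The three content rules can be illustrated with a small hand-written check. Python's standard library does not validate against a DTD, so this sketch is not a DTD validator: the function follows_content_rule is a hypothetical helper that tests a single parsed element against a rule given as a string.

```python
import xml.etree.ElementTree as ET

def follows_content_rule(element: ET.Element, rule: str) -> bool:
    """Hand-written check of one element against one DTD content rule."""
    has_children = len(element) > 0
    if rule == "ANY":
        return True                                   # any content is allowed
    if rule == "EMPTY":
        return not has_children and not element.text  # no content at all
    if rule == "(#PCDATA)":
        return not has_children                       # character data only
    raise ValueError(f"rule not covered by this sketch: {rule}")

# Sample elements modelled on the message listings; "empty" is hypothetical.
text_only = ET.fromstring("<message>Let the good times roll!</message>")
nested = ET.fromstring("<message><text>Let the good times roll!</text></message>")
empty = ET.fromstring("<message/>")

print(follows_content_rule(text_only, "(#PCDATA)"))  # text only: conforms
print(follows_content_rule(nested, "(#PCDATA)"))     # child element: does not conform
print(follows_content_rule(empty, "EMPTY"))
print(follows_content_rule(nested, "ANY"))
```

The nested case mirrors the earlier invalid listing: a message declared as (#PCDATA) may not contain a text child element.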

5.2 THE LIMITATIONS OF DTDS

DTDs capture grammatical structure, but have some drawbacks:


Don't capture database datatypes or domains
IDs aren't a good implementation of keys
Why not?
No way of defining OO-like inheritance
Their syntax is almost, but not quite, XML, which makes it inconvenient to build tools for them

5.3 W3C XML SCHEMA DOCUMENTS (XSD)

Schemas
Specify XML document structure
Do not use EBNF grammar
Use XML syntax
Can be manipulated like other XML documents
Require validating parsers
XML schemas
Schema vocabulary the W3C created
Recommendation
Schema valid
XML document that conforms to a schema document
Use .xsd extension

5.4 W3C XML SCHEMA DOCUMENTS

Root element schema


Contains elements that define the XML document structure
targetNamespace
Namespace of XML vocabulary the schema defines
element tag
Defines element to be included in XML document structure
name and type attributes
Specify the element's name and data type, respectively
Built-in simple types
date, int, double, time, etc.

5.5 W3C XML SCHEMA DOCUMENTS

Two categories of data types


Simple types
Cannot contain attributes or child elements
Complex types
May contain attributes and child elements
complexType
Define complex type
Simple content
Cannot have child elements
Complex content
May have child elements
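Because a schema is itself an XML document, it can be read with an ordinary XML parser. The sketch below does not validate anything; it only loads a small hypothetical schema (the target namespace urn:example:books and the element names are assumptions) and reports which top-level elements use a built-in simple type and which define a complex type.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # W3C XML Schema namespace

# Hypothetical schema: names and target namespace are illustrative only.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:books">
  <xs:element name="price" type="xs:double"/>
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="published" type="xs:date"/>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(SCHEMA)
target_namespace = schema.get("targetNamespace")

# Top-level declarations: a type attribute names a built-in simple type,
# while a complexType child allows attributes and child elements.
kinds = {}
for declaration in schema.findall(XS + "element"):
    is_complex = declaration.find(XS + "complexType") is not None
    kinds[declaration.get("name")] = "complex" if is_complex else declaration.get("type")

print(target_namespace)
print(kinds)
```

Checking a document against the schema would require a validating parser, which the Python standard library does not provide.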

6. XML VOCABULARIES

W3C XML Schema


XSL (Extensible Stylesheet Language)
MathML (Mathematical Markup Language)
SVG (Scalable Vector Graphics)
WML (Wireless Markup Language)
XBRL (Extensible Business Reporting Language)
XUL (Extensible User Interface Language)
PDML (Product Data Markup Language)

6.1 MATHML

Describes mathematical notations and expressions


MathML markup
Content markup
Provides tags that embody mathematical concepts
Allows programmers to write mathematical notation specific to different areas of mathematics
Distinguishes between different uses of same symbol
Presentation markup
Directed towards formatting and displaying mathematical notation

6.2 CHEMICAL MARKUP LANGUAGE (CML)

XML vocabulary for representing molecular and chemical information

6.3 MUSIC XML

Music distribution
Simplifies exchange of musical scores over Internet
Developed by Recordare
Marks up all types of music
DTD
Less powerful than Schema
Simpler to program
Relies heavily on elements rather than attributes

6.3 MUSIC XML

Fig. 20.15 MusicXML markup rendered by Finale 2003 (Courtesy of MakeMusic! Inc.).

6.4 RSS

RDF Site Summary


Popular and simple XML format designed to share headlines and Web content between Web sites
RSS file
RSS feed
Container rss element
Denotes the RSS version
Container channel elements
Descriptive tags
Item elements
Describe the news or information
title element
description element
link element
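The rss, channel, and item structure described above can be sketched with a small feed. The feed content and the example.com URLs are hypothetical, and the parsing uses Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed; titles and URLs are placeholders.
FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://www.example.com/</link>
    <description>Hypothetical headlines</description>
    <item>
      <title>First headline</title>
      <description>Summary of the first story</description>
      <link>http://www.example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <description>Summary of the second story</description>
      <link>http://www.example.com/2</link>
    </item>
  </channel>
</rss>"""

rss = ET.fromstring(FEED)
version = rss.get("version")                  # the rss element denotes the RSS version
channel_title = rss.findtext("channel/title")
headlines = [(item.findtext("title"), item.findtext("link")) for item in rss.iter("item")]

print(version, channel_title)
for title, link in headlines:
    print(title, link)
```

A real reader would download the feed first; here the text is embedded so the example stays self-contained.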

7.5 OTHER MARKUP LANGUAGES

8. XML APPLICATION 1 - SEPARATE DATA

XML can Separate Data from HTML


Store data in separate XML files
Using HTML for layout and display
Using Data Islands
Data Islands can be bound to HTML elements

Benefits:
Changes in the underlying data will not require any changes to your HTML

XML APPLICATION 2 - EXCHANGE DATA

XML is used to Exchange Data


Text format
Software-independent, hardware-independent
Exchange data between incompatible systems, provided that they agree on the same tag definitions.
Can be read by many different types of applications

Benefits:
Reduce the complexity of interpreting data
Easier to expand and upgrade a system

XML APPLICATION 3 - STORE DATA

XML can be used to Store Data


Plain text file
Store data in files or databases
Applications can be written to store and retrieve information from the store
Other clients and applications can access your XML files as data sources
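Storing and retrieving data in a plain XML file can be sketched as below. The functions save_books and load_books, the BOOKS root element, and the file name books.xml are hypothetical; BOOKNAME is the tag used in the earlier content example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def save_books(path, titles):
    """Write the titles to an XML file, one BOOKNAME element per title."""
    root = ET.Element("BOOKS")  # hypothetical root element
    for title in titles:
        ET.SubElement(root, "BOOKNAME").text = title
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

def load_books(path):
    """Read the titles back; any XML-aware client could do the same."""
    return [book.text for book in ET.parse(path).getroot().iter("BOOKNAME")]

# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "books.xml")
    save_books(path, ["The Painted House"])
    stored = load_books(path)

print(stored)
```

Because the file is plain text with agreed tag names, a different application could read it without knowing how it was written.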

Benefits:
Accessible to more applications

XML APPLICATION 4 - CREATE NEW LANGUAGE

XML can be used to Create new Languages


WML (Wireless Markup Language) is used to mark up Internet applications for handheld devices such as mobile phones (WAP)
MusicXML is used to publish musical scores
RSS
MathML

XML SUPPORT IN IE 5.0+

Internet Explorer 5.0 has the following XML support:


Viewing of XML documents
Full support for W3C DTD standards
XML embedded in HTML as Data Islands
Binding XML data to HTML elements
Transforming and displaying XML with XSL
Displaying XML with CSS
Access to the XML DOM (Document Object Model)

*Netscape 6.0 also has full XML support

MICROSOFT XML PARSER

Comes with IE 5.0


The parser features a language-neutral programming model that supports:
JavaScript, VBScript, Perl, VB, Java, C++ and more
W3C XML 1.0 and XML DOM
DTD and validation

9. JAVA APIS FOR XML

JAXP: Java API for XML Processing


JAXB: Java Architecture for XML Binding
JDOM: Java DOM
DOM4J: an alternative to JDOM
JAXM: Java API for XML Messaging (asynchronous)
JAX-RPC: Java API for XML-based RPC (Remote Procedure Call) (synchronous)
JAXR: Java API for XML Registries

SUMMARY

XML is a self-descriptive language


XML is a powerful language for describing structured data for web applications
XML is currently applied in many fields
Many vendors already support or will support XML

QUESTIONS???
