You are on page 1of 55

y

nl
O
se
Module 1

rU
te
en
C
Introduction to XML
h
ec
pt
rA
Fo
Module Overview

y
nl
O
In this module, you will learn about:

se
 Introduction to XML

rU
 Exploring XML

te
 Working with XML

en
 XML Syntax

C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 2 of 55


Lesson 1 – Introduction to XML

y
nl
O
In this first lesson, Introduction to XML, you will

se
learn to:

rU
 Outline the features of markup languages and list

their drawbacks.

te
 Define and describe XML.

en
 State the benefits and scope of XML.
C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 3 of 55


Features of Markup Languages 1-3

y
nl
O
 Generalized Markup Language (GML) helps the

se
documents to be edited, formatted, and searched by

rU
different programs using its content-based tags.

te
en
 Standard Generalized Markup Language (SGML) is a
coding scheme for developing specialized markup
languages. C
h
ec
pt

 Hyper Text Markup Language (HTML) is used for


rA

sharing information by using hyperlinked text


documents.
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 4 of 55


Features of Markup Languages 2-3

y
nl
O
se
rU
te
en
C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 5 of 55


Features of Markup Languages 3-3

y
nl
Features

O
 GML describes the document in terms of its format,

se
structure and other properties.

rU
 SGML ensures that system can represent the data

te
in its own way.

en
 HTML used ASCII text, which allows the user to

C
use any text editor.
h
ec

Drawbacks
pt

 GML and SGML were not suited for data


rA

interchange over the web.


 HTML instructions is used to display content rather
Fo

than content they encompass.


@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 6 of 55
Evolution of XML 1-2

y
nl
O
 XML is a W3C recommendation.

se
rU
 It is a set of rules for defining semantic tags that
break a document into parts.

te
en

C
XML was developed over HTML.
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 7 of 55


Evolution of XML 2-2

y
nl
O
HTML XML

se
HTML was designed to display XML was designed to carry

rU
data. data.
HTML displays data and XML describes data and

te
focuses on how data looks. focuses on what data is.

en
HTML displays information. XML describes information.
C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 8 of 55


Features of XML

y
nl
O
 XML stands for Extensible Markup Language

se
 XML is a markup language much like HTML

rU
 XML was designed to describe data

te
XML tags are not predefined

en

XML uses a Document Type Definition (DTD)



C
or an XML Schema to describe data
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 9 of 55


XML Markup 1-2

y
nl
O
 XML markup defines the physical and logical layout of

se
the document.

rU
XML markup divides a document into separate

te

information containers called elements.

en
C
A document consists of one outermost element,
h

ec

called root element that contains all the other


elements, plus some optional administrative
pt
rA

information at the top, known as XML declaration.


Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 10 of 55


XML Markup 2-2

y
nl
O
Code Snippet

se
<?xml version=”1.0” encoding=”iso-8859-1” ?>

rU
- <FlowerPlanet>
<Name>Rose</Name>

te
<Price>$1</Price>

en
<Description>Red in color</Description>
<Number>700</Number>

C
</FlowerPlanet> h
where,
ec

<Name>, <Price>,<Description> and <Number> tags are


elements
pt

<FlowerPlanet> and </FlowerPlanet> are the root elements


rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 11 of 55


Benefits of XML

y
nl
Data independence - separates the content from its presentation

O

se
 Easier to parse - absence of formatting instructions makes it easy to

rU
parse

te
 Reducing Server Load - semantic and structural information enables it

en
to be manipulated by any application, can now be performed by clients

processing tools C
Easier to create – can easily create with the most primitive text
h
ec

 Web Site Content – can be transformed into a number of other formats


pt
rA

 Remote Procedure Calls – protocol that allows objects on one computer


to call objects on another computer
Fo

 E-Commerce - can be used as an exchange format


@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 12 of 55
Lesson 2 – Exploring XML

y
nl
O
In this second lesson, Exploring XML, you will learn

se
to:

rU
 Describe the structure of an XML document.

 Explain the lifecycle of an XML document.

te
en
 State the functions of editors for XML and list the

popularly used editors.


C
 State the functions of parsers for XML and list names
h
ec

of commonly used parsers.


pt

 State the functions of browsers for XML and list the


rA

commonly used browsers.


Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 13 of 55


XML Document Structure 1-4

y
nl
O
 The two sections of an XML document are:

se
 Document Prolog

rU
 Root Element

te
 An XML document consists of a set of unambiguously

en
named "entities".

C
Every document starts with a "root" or document
entity. All other entities are optional.
h
ec

 A single entity name can represent a large amount of


pt

text. The alias name is used each time some text is


referenced and the processor expands the contents
rA

of the alias.
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 14 of 55


XML Document Structure 2-4

y
nl
O
Document Prolog

se
 Document prolog contains metadata and consists of two parts -
XML Declaration and Document Type Declaration.

rU
 XML Declaration specifies the version of XML being used.

te
 Document Type Declaration defines entities’ or attributes’ values

en
and checks grammar and vocabulary of markup.

Root Element C
h
Also called a document element.
ec

 It must contain all the other elements and content in the


pt

document.
rA

 An XML element has a start tag and end tag.


Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 15 of 55


XML Document Structure 3-4

y
nl
O
se
rU
te
en
C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 16 of 55


XML Document Structure 4-4

y
nl
O
Code Snippet

se
<?xml version=”1.0” encoding=”iso-8859-1”?>

rU
<!DOCTYPE Music_Library [
<!ELEMENT Music_Library (Title,Artist,Country,Price,Year)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Artist (#PCDATA)>

te
<!ELEMENT Country (#PCDATA)>
<!ELEMENT Price (#PCDATA)>

en
<!ELEMENT Year (#PCDATA)>
<!ENTITY MS “Thatz life”>

C
]>
<Music_Library>
<Title>YLO</Title>
h
<Artist>&MS; - Jenny Dan</Artist>
ec

<Country>Germanyo</Country>
<Price>$12</Price>
pt

<Year>2002</Year>
</Music_Library>
rA

where,
The first block indicates xml declaration and document type declaration. Music_Library is
Fo

the root element.

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 17 of 55


Logical Structure 1-2

y
nl
The logical structure gives information about the elements and the

O

order in which they are to be included in the document.

se
It shows how a document is constructed rather than what it contains.

rU

Document Prolog forms the basis of the logical structure of the XML

te

document.

en
XML Declaration and Document Type Definition are its components.
C
 h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 18 of 55


Logical Structure 2-2

y
nl
O
Code Snippet

se
<?xml version=”1.0” encoding=”iso-8859-1” ?>

rU
te
Code Snippet

en
<!DOCTYPE Music_Library SYSTEM
“Mlibrary.dtd”>
C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 19 of 55


XML Document Life Cycle

y
nl
O
1. XML document creation

se
2. Scanning

rU
3. Parsing

te
4. Access

en
5. Conversion into Application program
6. ModificationC
h
7. Serialization
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 20 of 55


Editors

y
nl
O
Main Functions:
Add opening and closing tags to the code

se

 Check for validity of XML

rU
 Verify XML against a DTD/Schema
Perform series of transforms over a document

te

 Color the XML syntax

en
 Display the line numbers

Complete the word


C
Present the content and hide the code
h

ec

Some popularly used editors are:


XMLwriter
pt

XML Spy
rA

 XML Pro
Fo

 XMLmind
 XMetal
@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 21 of 55
Parsers 1-2

y
nl
An XML parser:

O

Reads the document

se

 Verifies it for its well-formedness

rU
 After verification, converts it into a tree of elements

te
 Commonly used parsers:

en
 Crimson, Xerces, Oracle XML Parser, JAXP, MSXML

C
h
 Type of parsers:
ec

 Non Validating parser


pt

 Validating parser
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 22 of 55


Parsers 2-2

y
nl
O
Non Validating parser

se
 It checks the well-formedness of the document.

rU
 Reads the document and checks for its conformity

te
with XML standards.

en
Validating parser C
h
 It checks the validity of the document using DTD.
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 23 of 55


Browsers

y
nl
O
 Web browsers can format XML data and display it to

se
the user.

rU
 Other programs like database, Musical Instrument
Digital Interface (MIDI) program or a spreadsheet

te
program may present XML data accordingly.

en
C
Commonly used web browsers:
h

ec

 Netscape, Mozilla, Internet Explorer, Firefox, Opera


pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 24 of 55


Lesson 3 – Working with XML

y
nl
O
In this third lesson, Working with XML, you will learn

se
to:

rU
 Explain the steps towards building an XML document.

 Define what is meant by well-formed XML.

te
en
C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 25 of 55


Creating an XML document

y
nl
O
 An XML document has three main components:

se
 Tags(markup) and text(content)

rU
 DTD or Schema
Formatting or display specifications

te

en
The steps to build an XML document are as follows:
C

1. Create an XML document in an editor


h
2. Save the XML document
ec

3. Load the XML document in a browser


pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 26 of 55


Exploring the XML document 1-3

y
nl
O
 Various building blocks of an XML document:

se
 XML Version Declaration

rU
 Document Type Definition

te
 Document instance in which the content is defined

en
by the mark up
C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 27 of 55


Exploring the XML document 2-3

y
nl
O
se
rU
te
Characters are encoded using
various encoding formats. The

en
character encoding is declared in
encoding declaration.

C XML declaration tells the version of


h
XML, the type of character
encoding being used and the
ec

markup declaration used in the


XML document.
pt
rA

XML declaration should start with


the five character string, <?xml. It
indicates that the document is an
Fo

XML document.

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 28 of 55


Exploring the XML document 3-3

y
nl
O
se
rU
Indicates the presence of external

te
markup declarations. “Yes”
indicates that there are no

en
external markup declarations and
“no” indicate that external markup

C
declarations might exist.

Declares and defines the elements


h
used in the document class.
ec
pt

Defines the content of the XML


document.
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 29 of 55


Meaning in Markup

y
nl
O
 Structure

se
 Semantic

rU
 Style

te
en
C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 30 of 55


Well-formed XML document 1-3

y
nl
O
 XML tags should be valid

se
 Length of the tags depend on the XML

rU
processors

te
 XML attributes should be valid

en
 XML documents should be verified
C
Minimum of one element is required
h

ec

 XML tags are case sensitive


pt

 Every start tag should end with end tag


rA

 XML tags should be nested properly


Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 31 of 55


Well-formed XML document 2-3

y
nl
O
Code Snippet

se
rU
<Book>
<Name>Good XML</Name>

te
<Cost>$20</Cost>
</Book>

en
C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 32 of 55


Well-formed XML document 3-3

y
nl
O
se
rU
te
en
C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 33 of 55


Lesson 4 – XML Syntax

y
nl
O
In this last lesson, XML Syntax, you will learn to:

se
 State and describe the use of comments and

rU
processing instructions in XML.
 Classify character data that is written between tags.

te
en
 Describe entities, DOCTYPE declarations and

attributes.
C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 34 of 55


Comments 1-3

y
nl
O
 Used for the people to give information about the code

se
 Usually made for the content

rU
te
en
C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 35 of 55


Comments 2-3

y
nl
O
 Rules:

se
 Comments should not include “-” or “

rU
 Comments should not be placed within a tag or

entity declaration

te
en
 Comments should not be placed before the XML

declaration
C
 Comments can be used to comment the tag sets
h
ec

 Comment should not be nested


pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 36 of 55


Comments 3-3

y
nl
O
Code Snippet

se
rU
<Name NickName=’John’>
<First>John</First>

te
<!--John is yet to pay the term fees-->
<Last>Brown</Last> <Semester>Final</Semester>

en
</Name>

C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 37 of 55


Processing Instructions 1-2

y
nl
O
 The main objective is to present some special

se
instructions to the application.

rU
 All the processing instructions should begin with an
identifier called target.

te
en
Syntax

<?PITarget <instruction>?>
C
h
ec

where,
pt
rA

PITarget is the name of the application that should receive the processing
instructions.
Fo

<instruction> is the instruction for the application

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 38 of 55


Processing Instructions 2-2

y
nl
O
Code Snippet

se
rU
<Name NickName=’John’>
<First>John</First>
<!--John is yet to pay the term fees-->

te
<Last>Brown</Last>

en
<?feesprocessor SELECT fees FROM
STUDENTFEES?>

C
<Semester>Final</Semester>
</Name>
h
ec

where,
pt

feesprocessor is the name of the application that receives the


rA

processing instruction

SELECT fees FROM STUDENTFEES is the instruction.


Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 39 of 55


Classification of character data

y
nl
O
 Character data describes the document’s actual

se
content with the white space.

rU
 The character data can be classified into:

te
 Character Data (CDATA)

en
 Parsed Character Data (PCDATA)

C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 40 of 55


PCDATA 1-2

y
nl
O
 The data that is parsed by the parser is called as

se
PCDATA.

rU
 The main objective is to present some special
instructions to the application.

te
en
C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 41 of 55


PCDATA 2-2

y
nl
O
se
Code Snippet

rU
<Name nickname=’John’>

te
<First>John</First>

en
<!--John is yet to pay the term fees-->
<Last>Brown</Last>

C
<Semester>Final> 10 & <20</Semester>
</Name>
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 42 of 55


CDATA

y
nl
O
 CDATA part begins with a “<![CDATA[“ and ends

se
with “]]>”.

rU
 CDATA sections cannot be nested.

te
en
Code Snippet

C
<?xml version=”1.0” standalone=”yes”?>
h
<Svg>
ec
<Desc>Three shapes</Desc>
<Para>But the formula was <![CDATA[if (&x1 +
pt

&x2)]]> which
resulted in 7.</Para>
rA

</Svg>>
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 43 of 55


Entities 1-3

y
nl
O
 XML document is made up of large amount of information

se
called as entities.

rU
 Every entity consists a name and a value.

te
 Entity reference consists of an ampersand (&), the entity

en
name, and a semicolon (;).

C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 44 of 55


Entities 2-3

y
nl
O
Predefined Description Output

se
Entity

rU
&lt; produces the left angle <
bracket

te
&gt; produces the right angle >

en
bracket
&amp; C
produces the ampersand &
h
&apos; produces a single quote ‘
ec

character
pt

&quot; produces a double quote “


rA

character
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 45 of 55


Entities 3-3

y
nl
O
Code Snippet

se
<?xml version=”1.0”?>

rU
<!DOCTYPE Letter [
<!ENTITY address “15 Downing St Floor 1”>
<!ENTITY city “New York”>

te
]>

en
<Letter>
<To>&quot;Tom Smith&quot;</To>

C
<Address>&address;</Address>
<City>&city;</City>
h
<Body>
ec

Hi! How are you?


The sum is &gt; $1000
pt

</Body>
rA

<From>ARNOLD</From>
</Letter>
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 46 of 55


Entity Categories 1-3

y
nl
O
se
rU
te
en
C
h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 47 of 55


Entity Categories 2-3

y
nl
O
 General Entity

se
These entities are used within the document

rU

content.

te
Code Snippet

en
C
<<!ENTITY % ADDRESS “text that is to be
represented by an entity”>
h
ec

A well-formed parameter entity will look like a general entity, except that it will
include the “%” specifier.
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 48 of 55


Entity Categories 3-3

y
nl
O
 Parameter Entity

se
These entities are used only in the DTD.

rU

te
Code Snippet

en
C
<!DOCTYPE MusicCollection [
<!ENTITY R “Rock”>
h
<!ENTITY S “Soft”>
ec

<!ENTITY RA “Rap”>
<!ENTITY HH “Hiphop”>
pt

<!ENTITY F “Folk”>
rA

]>
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 49 of 55


DOCTYPE declarations 1-3

y
nl
O
 DTD File:

se
rU
Syntax

te
en
<! DOCTYPE name_of_root_element
SYSTEM “URL of the external DTD subset” [

]> C
Internal DTD subset
h
ec

where,
pt

name_of_root_element is the name of the root element


rA

SYSTEM is the url where the DTD is located


[Internal DTD subset] are declarations in the document
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 50 of 55


DOCTYPE declarations 2-3

y
nl
O
Code Snippet

se
rU
<?xml version=”1.0” encoding=”ISO-8859-1”?>
<!DOCTYPE program SYSTEM “HelloJimmy.dtd”>

te
<Program>

en
<Comments>
This is a simple Java Program. It will display

C
the message “Hello Jimmy,How are you?” on
execution.
h
</Comments>
ec

<Code> public static void main(String[] args) {


System.out.println(“Hello Jimmy,How are you?”);
pt

// Display the string.


}
rA

}
</Code>
Fo

</Program>

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 51 of 55


DOCTYPE declarations 3-3

y
nl
O
 Document Type Definition defines the elements in the document

se
rU
<!ELEMENT Program (comments, code)>
<!ELEMENT Comments (#PCDATA)>

te
<!ELEMENT Code (#PCDATA)>

en
Output:
C
 h
ec
pt
rA
Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 52 of 55


Attributes

y
nl
O
 Attributes are case sensitive and must start with a

se
letter or underscore.

rU
te
Syntax

en
<elementName attName1=”attValue2”

C
attName2=”attValue2”...>
h
Code Snippet
ec

<?xml version=”1.0” ?>


pt

<Player Sex=”male”>
rA

<FirstName>Tom</FirstName>
<LastName>Federer</LastName>
Fo

</Player>

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 53 of 55


Summary 1-2

y
nl
O
 Introduction to XML

se
 XML was developed to overcome the drawbacks of earlier
markup languages.

rU
 XML consists of set of rules that describe the content to
be displayed in the document.

te
 XML markup contains the content in the information

en
containers called as elements.

Exploring XML
C
h

ec

 XML is divided into two parts namely, document prolog


and root element.
pt

 An XML editor creates the XML document and the parser


rA

validates the document.


Fo

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 54 of 55


Summary 2-2

y
nl
O
 Working with XML

se
 XML document is divided into XML Version Declaration,
DTD and the document instance in which the markup

rU
defines the content.
 The XML markup is again categorized into structural,

te
semantic and stylistic.
 The output of the XML document is displayed in the

en
browser if it is well formed.

XML Syntax C
h

 Comments are used in the document to give information


ec

about the line or block of code.


pt

 The content in XML document is divided into markup and


character data.
rA

 The entities in XML are divided into general entities and


parameter entities.
Fo

 A DTD can be declared either internally or externally.

@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 55 of 55

You might also like