
Chapter Three

XML

JU, JiT, Faculty of Computing and Informatics


Overview
• XML stands for eXtensible Markup Language.
• XML was designed to store and transport data.
• XML was designed to be both human- and machine-readable.
• XML is a text-based markup language derived from Standard Generalized Markup Language (SGML).
• XML is a markup language much like HTML.
• XML tags identify the data and are used to store and organize it, whereas HTML tags specify how to display it.
• XML is not going to replace HTML in the near future, but it introduces new possibilities by adopting many successful features of HTML.

What is Markup?
• Markup is information added to a document that enhances its meaning by identifying its parts and how they relate to each other.
• More specifically, a markup language is a set of symbols that can be placed in the text of a document to demarcate and label the parts of that document.

Example 1

<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Example 2

<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>Our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>French Toast</name>
    <price>$4.50</price>
    <description>Thick slices made from our homemade sourdough bread</description>
    <calories>600</calories>
  </food>
  <food>
    <name>Homestyle Breakfast</name>
    <price>$6.95</price>
    <description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
    <calories>950</calories>
  </food>
</breakfast_menu>

• XML is a software- and hardware-independent tool for storing and transporting data.

What is XML?
• XML stands for eXtensible Markup Language.
• XML is a markup language much like HTML.
• XML was designed to store and transport data.
• XML was designed to be self-descriptive.

The Difference between XML and HTML
• XML separates data from presentation: XML does not carry any information about how it should be displayed. The same XML data can be used in many different presentation scenarios, so with XML there is a full separation between data and presentation.
• In many HTML applications, XML is used to store or transport data, while HTML is used to format and display the same data.
• Three important characteristics of XML make it useful in a variety of systems and solutions:

1. XML is extensible: XML essentially allows you to create your own language, or tags, that suit your application.
2. XML separates data from presentation: XML allows you to store content without regard to how it will be presented.
3. XML is a public standard: XML was developed by an organization called the W3C and is available as an open standard.

XML Tree Structure
• XML documents are formed as element trees.
• An XML tree starts at a root element and branches from the root to child elements.
• All elements can have sub-elements (child elements):

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

• The terms parent, child, and sibling are used to describe the relationships between elements.

XML Tree

Parents have children. Children have parents. Siblings are children on the same level (brothers and sisters).

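The parent, child, and sibling relationships can be explored with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the sample document and the outline helper are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, modeled on the <root>/<child>/<subchild> sketch.
DOC = """
<root>
  <child>
    <subchild>one</subchild>
    <subchild>two</subchild>
  </child>
</root>
"""

def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an Element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(DOC)
child = root[0]                      # first (and only) child of the root
siblings = [e.tag for e in child]    # the two <subchild> elements are siblings

print("\n".join(outline(root)))
```

Each level of indentation in the printed outline corresponds to one step from parent to child.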
XML Structure

• XML uses a largely self-describing syntax.
• A prolog defines the XML version and the character encoding.
• The XML prolog is optional. If it exists, it must come first in the document.

Example

<?xml version="1.0" encoding="UTF-8"?>

• The next line is the root element of the document:

<bookstore>

• The <book> elements have 4 child elements: <title>, <author>, <year>, <price>.

<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>

• The next line ends the book element:

</book>

XML Syntax Rules
• The syntax rules of XML are simple and logical. They are easy to learn and easy to use.

1. XML Documents Must Have a Root Element: XML documents must contain one root element that is the parent of all other elements.

2. All XML Elements Must Have a Closing Tag

• In HTML, some elements might work well even with a missing closing tag:

<p>This is a paragraph.
<br>

• In XML, it is illegal to omit the closing tag. All elements must have a closing tag:

<p>This is a paragraph.</p>
<br />

• The XML prolog does not have a closing tag. This is not an error: the prolog is a declaration, not an element of the XML document.

3. XML Tags are Case Sensitive

• XML tags are case sensitive. The tag <Letter> is different from the tag <letter>.
• Opening and closing tags must be written with the same case:

<Message>This is incorrect</message>
<message>This is correct</message>

4. XML Elements Must be Properly Nested

• In HTML, you might see improperly nested elements:

<b><i>This text is bold and italic</b></i>

• In XML, all elements must be properly nested within each other:

<b><i>This text is bold and italic</i></b>

• In the example above, "properly nested" simply means that since the <i> element is opened inside the <b> element, it must be closed inside the <b> element.

5. XML Attribute Values Must be Quoted

• XML elements can have attributes in name/value pairs, just like in HTML. In XML, the attribute values must always be quoted.

CORRECT:

<note date="12/11/2007">
  <to>Tove</to>
  <from>Jani</from>
</note>

INCORRECT:

<note date=12/11/2007>
  <to>Tove</to>
  <from>Jani</from>
</note>

• The error in the second document is that the date attribute in the note element is not quoted.

6. Entity References

• Some characters have a special meaning in XML. If you place a character like "<" inside an XML element, it will generate an error because the parser interprets it as the start of a new element.
• This will generate an XML error:

<message>salary < 1000</message>

• To avoid this error, replace the "<" character with an entity reference:

<message>salary &lt; 1000</message>

• There are 5 pre-defined entity references in XML:

&lt;    <  less than
&gt;    >  greater than
&amp;   &  ampersand
&apos;  '  apostrophe
&quot;  "  quotation mark

• Only < and & are strictly illegal in XML, but it is a good habit to replace > with &gt; as well.

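The entity-reference rule can be checked programmatically. The following is a minimal sketch, assuming Python's standard library: xml.sax.saxutils.escape replaces the special characters before the text is placed in an element, and xml.etree.ElementTree turns the entity reference back into the original character when parsing. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "salary < 1000"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)
document = "<message>" + escaped + "</message>"

# The parser converts the entity reference back to the original character.
parsed = ET.fromstring(document)

# Without escaping, the same text is rejected by the parser.
try:
    ET.fromstring("<message>" + raw + "</message>")
    unescaped_parses = True
except ET.ParseError:
    unescaped_parses = False

print(escaped, "|", parsed.text, "|", unescaped_parses)
```

Escaping at the point where text is written into a document keeps the stored data and the markup from being confused with each other.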
Comments in XML

• The syntax for writing comments in XML is similar to that of HTML:

<!-- This is a comment -->

• Two dashes in the middle of a comment are not allowed:

<!-- This is a -- comment -->

White-space is Preserved in XML

• XML does not truncate multiple white-spaces (HTML truncates multiple white-spaces to one single white-space).

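White-space preservation is easy to observe with a parser. This is a small sketch using Python's standard xml.etree.ElementTree module; the greeting element is a hypothetical example.

```python
import xml.etree.ElementTree as ET

# Hypothetical text with a run of six spaces between the two words.
text = "Hello" + " " * 6 + "Tove"

element = ET.fromstring("<greeting>" + text + "</greeting>")

# The parser hands back the character data unchanged, spaces included.
preserved = element.text
print(repr(preserved))
```

Any collapsing of white-space happens later, in whatever application displays the data, not in the XML parser.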
XML Elements

• An XML element is everything from (including) the element's start tag to (including) the element's end tag.

Example: <price>29.99</price>

An element can contain:

• text
• attributes
• other elements
• or a mix of the above

• An element with no content is said to be empty.
• In XML, you can indicate an empty element like this:

<element></element>

XML Naming Rules

XML elements must follow these naming rules:

• Element names are case-sensitive
• Element names must start with a letter or underscore
• Element names cannot start with the letters xml (or XML, or Xml, etc.)
• Element names can contain letters, digits, hyphens, underscores, and periods
• Element names cannot contain spaces
• Any name can be used; no words are reserved (except xml).

Best Naming Practices

• Create descriptive names, like this: <person>, <firstname>, <lastname>.
• Create short and simple names, like this: <book_title>, not like this: <the_title_of_the_book>.
• Avoid "-". If you name something "first-name", some software may think you want to subtract "name" from "first".
• Avoid ".". If you name something "first.name", some software may think that "name" is a property of the object "first".
• Avoid ":". Colons are reserved for namespaces (more later).
• Non-English letters like éòá are perfectly legal in XML, but watch out for problems if your software doesn't support them.

XML Attributes

• XML elements can have attributes, just like HTML. Attributes are designed to contain data related to a specific element.
• Attribute values must always be quoted. Either single or double quotes can be used.
• For a person's gender, the <person> element can be written like this:

<person gender="female">

XML Elements vs. Attributes

Take a look at these examples:

Example 1

<person gender="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

Example 2

<person>
  <gender>female</gender>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

In the first example, gender is an attribute. In the second, gender is an element. Both examples provide the same information. There are no rules about when to use attributes or when to use elements in XML.

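The two styles carry the same information but are read differently by a program. The sketch below uses Python's standard xml.etree.ElementTree module with the two person examples; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = """<person gender="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>"""

AS_ELEMENT = """<person>
  <gender>female</gender>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>"""

# An attribute is read from the element that carries it.
gender_from_attribute = ET.fromstring(AS_ATTRIBUTE).get("gender")

# A child element is located first, then its text is read.
gender_from_element = ET.fromstring(AS_ELEMENT).findtext("gender")

print(gender_from_attribute, gender_from_element)
```

Only the element form could later grow child elements of its own, which is one reason to prefer elements for data that may become structured.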
XML Attributes

• Some things to consider when using attributes are:
  • Attributes cannot contain multiple values (elements can)
  • Attributes cannot contain tree structures (elements can)
  • Attributes are not easily expandable (for future changes)

Name Conflicts

In XML, element names are defined by the developer. This often results in a conflict when trying to mix XML documents from different XML applications.

Example

This XML carries HTML table information:

<table>
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>

Example

This XML carries information about a table (a piece of furniture):

<table>
  <name>African Coffee Table</name>
  <width>80</width>
  <length>120</length>
</table>

• If these XML fragments were added together, there would be a name conflict.
• Both contain a <table> element, but the elements have different content and meaning.
• A user or an XML application will not know how to handle these differences.

Solving the Name Conflict Using a Prefix

• Name conflicts in XML can easily be avoided using a name prefix.

Example

This XML carries HTML table information:

<h:table>
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>

Example

This XML carries information about a piece of furniture:

<f:table>
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>

In the example above, there will be no conflict because the two <table> elements have different names.

XML Namespaces - The xmlns Attribute

• When using prefixes in XML, a namespace for the prefix must be defined.
• The namespace can be defined by an xmlns attribute in the start tag of an element.
• The namespace declaration has the following syntax: xmlns:prefix="URI".

<root>
  <h:table xmlns:h="http://www.w3.org/TR/html4/">
    <h:tr>
      <h:td>Apples</h:td>
      <h:td>Bananas</h:td>
    </h:tr>
  </h:table>
</root>

• When a namespace is defined for an element, all child elements with the same prefix are associated with the same namespace.
• Namespaces can also be declared in the XML root element:

<root
  xmlns:h="http://www.w3.org/TR/html4/"
  xmlns:f="http://www.w3schools.com/furniture">
  <h:table>
    <h:tr>
      <h:td>Apples</h:td>
      <h:td>Bananas</h:td>
    </h:tr>
  </h:table>
</root>

• Note: The namespace URI is not used by the parser to look up information.
• The purpose of using a URI is to give the namespace a unique name.
• A Uniform Resource Identifier (URI) is a string of characters which identifies an Internet resource.
• The most common URI is the Uniform Resource Locator (URL), which identifies an Internet domain address.

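A parser resolves each prefix to its namespace URI, so the two table elements end up with different full names. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the html and furn keys in the NS mapping are arbitrary local aliases chosen for this illustration and do not need to match the prefixes in the document.

```python
import xml.etree.ElementTree as ET

DOC = """
<root xmlns:h="http://www.w3.org/TR/html4/"
      xmlns:f="http://www.w3schools.com/furniture">
  <h:table>
    <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
  </h:table>
  <f:table>
    <f:name>African Coffee Table</f:name>
  </f:table>
</root>
"""

# Local aliases (assumed names) mapped to the URIs declared in the document.
NS = {
    "html": "http://www.w3.org/TR/html4/",
    "furn": "http://www.w3schools.com/furniture",
}

root = ET.fromstring(DOC)
html_table = root.find("html:table", NS)
furniture_table = root.find("furn:table", NS)

# ElementTree stores the resolved name as "{URI}localname".
cells = [td.text for td in html_table.iter("{http://www.w3.org/TR/html4/}td")]
furniture_name = furniture_table.findtext("furn:name", namespaces=NS)

print(html_table.tag)
print(furniture_table.tag)
```

Because lookups are made by URI, the prefix letters themselves carry no meaning beyond the document in which they are declared.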
Default Namespaces

• Defining a default namespace for an element saves us from using prefixes in all the child elements.
• It has the following syntax: xmlns="namespaceURI"

This XML carries HTML table information:

<table xmlns="http://www.w3.org/TR/html4/">
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>

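With a default namespace, unprefixed child elements still belong to the namespace. The sketch below, again assuming Python's standard xml.etree.ElementTree module, shows that a lookup by the bare name tr finds nothing, while a lookup by the namespace-qualified name succeeds. The HTML constant is an illustrative helper.

```python
import xml.etree.ElementTree as ET

DOC = """<table xmlns="http://www.w3.org/TR/html4/">
  <tr><td>Apples</td><td>Bananas</td></tr>
</table>"""

# Qualified-name prefix in ElementTree's "{URI}localname" notation.
HTML = "{http://www.w3.org/TR/html4/}"

table = ET.fromstring(DOC)

# The bare name does not match, because <tr> inherits the default namespace.
unqualified = table.findall("tr")

# The qualified name matches.
qualified = table.findall(HTML + "tr")

print(table.tag, len(unqualified), len(qualified))
```

This is a common source of confusion: the document looks unprefixed, yet every element name is namespace-qualified.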
Well Formed XML Documents

• An XML document with correct syntax is called "Well Formed".

The syntax rules were described in the previous section:

• XML documents must have a root element
• XML elements must have a closing tag
• XML tags are case sensitive
• XML elements must be properly nested
• XML attribute values must be quoted

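Any conforming parser rejects a document that breaks these rules. The following is a minimal sketch of a well-formedness check using Python's standard xml.etree.ElementTree module; the is_well_formed helper and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# One sample per rule listed above, plus one correct document.
samples = {
    "correct": "<note><to>Tove</to></note>",
    "two root elements": "<a/><b/>",
    "missing closing tag": "<p>This is a paragraph.",
    "case mismatch": "<Message>text</message>",
    "bad nesting": "<b><i>text</b></i>",
    "unquoted attribute": "<note date=12/11/2007></note>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "well formed" if ok else "not well formed")
```

Note that this only checks syntax; it says nothing about whether the document is valid against a DTD or schema.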
Valid XML Documents

• A "well formed" XML document is not the same as a "valid" XML document.
• A "valid" XML document must be well formed. In addition, it must conform to a document type definition.

There are two different document type definitions that can be used with XML:

• DTD - The original Document Type Definition
• XML Schema - An XML-based alternative to DTD

• A document type definition defines the rules and the legal elements and attributes for an XML document.

XML DTD

• An XML document with correct syntax is called "Well Formed".
• An XML document validated against a DTD is both "Well Formed" and "Valid".
• The purpose of a DTD is to define the structure of an XML document.
• A DTD describes:
  • the elements that can appear in an XML document
  • the order in which they can appear
  • element attributes and whether they are optional or mandatory
  • whether attributes can have default values

DTD Example

<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>

The DTD above is interpreted like this:

• !DOCTYPE note defines that the root element of the document is note
• !ELEMENT note defines that the note element must contain the elements "to, from, heading, body"
• !ELEMENT to defines the to element to be of type "#PCDATA"
• !ELEMENT from defines the from element to be of type "#PCDATA"
• !ELEMENT heading defines the heading element to be of type "#PCDATA"
• !ELEMENT body defines the body element to be of type "#PCDATA"

Advantages of using DTD

• Documentation - You can define your own format for the XML files. Looking at this document, a user or developer can understand the structure of the data.
• Validation - It gives a way to check the validity of XML files by checking whether the elements appear in the right order, mandatory elements and attributes are in place, the elements and attributes have not been inserted in an incorrect way, and so on.

Disadvantages of using DTD

• It does not support namespaces.
• It supports only the text string data type.

Types

• A DTD can be classified by where it is declared relative to the XML document:

1. Internal DTD
2. External DTD

• When a DTD is declared within the file it is called an internal DTD; if it is declared in a separate file it is called an external DTD.

Internal DTD

• For an internal DTD, the standalone attribute in the XML declaration can be set to yes, because the document does not depend on external declarations.

Syntax

• The syntax of an internal DTD is as shown:

<!DOCTYPE root-element [element-declarations]>

where root-element is the name of the root element and element-declarations is where you declare the elements.

Rules
• The document type declaration must appear at the start of the document, preceded only by the XML header.
• Similar to the DOCTYPE declaration, the element declarations must start with an exclamation mark.
• The name in the DTD must match the element type of the root element.

Example 1 of internal DTD:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend</body>
</note>

The parts of this example are:
• Declaration: the first line, the XML declaration.
• DOCTYPE declaration: <!DOCTYPE note [ opens the DTD.
• DTD body: the lines between [ and ], where you declare elements, attributes, and entities.
• End of declaration: ]> closes the DTD.

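The rule that the DOCTYPE name must match the root element can be inspected in code. The sketch below uses Python's standard xml.dom.minidom module, which reads the DOCTYPE but, as far as the standard library goes, does not validate the document against the DTD; a validating parser would be needed for that. The variable names are illustrative.

```python
from xml.dom import minidom

DOC = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend</body>
</note>"""

dom = minidom.parseString(DOC)

# Name given in <!DOCTYPE ...> and name of the actual root element.
doctype_name = dom.doctype.name
root_name = dom.documentElement.tagName

print(doctype_name, root_name, doctype_name == root_name)
```

This comparison covers only one of the DTD rules; checking element order and content still requires full validation.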
Example 2 of internal DTD:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE address [
  <!ELEMENT address (name,company,phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<address>
  <name>Tanmay Patil</name>
  <company>TutorialsPoint</company>
  <phone>(011) 123-4567</phone>
</address>

<!ELEMENT name (#PCDATA)> defines the element name to be of type "#PCDATA". Here #PCDATA means parseable text data.

External DTD

• In an external DTD, elements are declared outside the XML file.
• They are accessed through the SYSTEM identifier, which may be either a .dtd file name or a valid URL.
• For an external DTD, the standalone attribute in the XML declaration should be set to no, because the document depends on declarations from an external source.

Syntax

Following is the syntax for external DTD:

<!DOCTYPE root-element SYSTEM "file-name">

where file-name is the file with the .dtd extension.

Example 1 of External DTD:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Here the first line is the XML declaration, the DOCTYPE declaration names the external DTD file (note.dtd), and the <note> element is the XML body.

Example 2 of External DTD:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
  <name>Tanmay Patil</name>
  <company>TutorialsPoint</company>
  <phone>(011) 123-4567</phone>
</address>

Here the first line is the XML declaration, the DOCTYPE declaration names the external DTD file (address.dtd), and the <address> element is the XML document body.

You can refer to an external DTD by using either system identifiers or public identifiers.

DTD – COMPONENTS

A DTD basically contains declarations of the following XML components:

• Element: XML elements can be defined as building blocks of an XML document.
• Attributes
• Entities

The content of an element declaration in a DTD can be categorized as below:

• Empty content
• Element content
• Mixed content

Element Content Types

1. Empty Content: This is a special case of element declaration, in which the declared element does not contain any content.

Following is the syntax for an empty element declaration:

<!ELEMENT elementname EMPTY >

2. Element Content: In an element declaration with element content, the content is the list of allowable child elements within parentheses.

Following is the syntax of an element declaration with element content:

<!ELEMENT elementname (child1, child2...)>

Rules

Certain rules apply when the element content contains more than one element:

1. Sequences: Often the elements within DTD documents must appear in a distinct order.

For example:

<!ELEMENT address (name,company,phone)>

2. Choices: Suppose you need to allow one element or another, but not both. In such cases you must use the pipe | character. The pipe functions as an exclusive OR.

For example:

<!ELEMENT address (mobile | landline)>

List of Operators and Syntax Rules

DTD – ATTRIBUTES

• An attribute gives more information about an element or, more precisely, defines a property of an element.

Syntax

<!ATTLIST element-name attribute-name attribute-type attribute-value>

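Example

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE address [
  <!ELEMENT address (name)>
  <!ELEMENT name (#PCDATA)>
  <!ATTLIST name id CDATA #REQUIRED>
]>
<address>
  <name id="123">Tanmay Patil</name>
</address>

In this example, the first line is the XML declaration, <!DOCTYPE address [ opens the DOCTYPE declaration, the two <!ELEMENT> lines form the element part of the DTD body, and the <!ATTLIST> line forms its attribute part. Because id is declared #REQUIRED, the <name> element must carry an id attribute.
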
Rules of Attribute Declaration

• All attributes used in an XML document must be declared in the Document Type Definition (DTD) using an Attribute-List Declaration.
• Attributes may only appear in start or empty tags.
• The keyword ATTLIST must be in upper case.
• No duplicate attribute names are allowed within the attribute list for a given element.

• Within each attribute declaration, you must specify how the value will appear in the document. You can specify if an attribute:

I. can have a default value

<!ATTLIST element-name attribute-name attribute-type "default-value">

II. can have a fixed value

<!ATTLIST element-name attribute-name attribute-type #FIXED "value" >

III. is required

<!ATTLIST element-name attribute-name attribute-type #REQUIRED>

IV. is implied

<!ATTLIST element-name attribute-name attribute-type #IMPLIED>

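The effect of a default value can be observed with a parser that reads the internal DTD. The sketch below uses Python's standard xml.parsers.expat module, which by default reports attributes derived from ATTLIST declarations alongside those written in the document. The lang attribute and its default "en" are hypothetical additions for illustration, as is the collect_attributes helper.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE address [
  <!ELEMENT address (name)>
  <!ELEMENT name (#PCDATA)>
  <!ATTLIST name id CDATA #REQUIRED>
  <!ATTLIST name lang CDATA "en">
]>
<address><name id="123">Tanmay Patil</name></address>"""

def collect_attributes(xml_text):
    """Return a dict mapping each element name to the attributes reported for it."""
    seen = {}

    def on_start(name, attrs):
        # attrs holds written attributes plus defaults taken from the DTD
        seen[name] = dict(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return seen

attributes = collect_attributes(DOC)
print(attributes["name"])
```

The lang attribute never appears in the document body, yet the parser supplies it from the declaration. Expat does not validate, so it would not complain if the required id attribute were missing.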
Working with CSS
• Cascading Style Sheets (CSS) is a relatively simple tool that allows the developer to assign styles to HTML elements.
• CSS can reproduce the formatting built into HTML.
• It gives web developers access to a large variety of formatting properties such as margins, line height, word spacing, and much more.
• CSS is easy to learn, and style sheets can be included directly in XML documents or saved as standalone text files.

• Multiple style sheets can be written to provide different output on the same formatting device.
• A style sheet can be included in an XML document as an internal style sheet or as an external style sheet, which is saved in an external file and referenced using the link tag in the XML document.

<LINK REL="stylesheet" TYPE="text/css" HREF="MyStylesheet.css">

• CSS is easier to learn and implement than XSL.
• XSL is a special style sheet mechanism created specifically for XML documents and is noticeably more complex and extensive than CSS.

• CSS is the primary style language used to direct the display of XML documents on the web and in other media.
• It provides a solid mechanism for describing the final display of an XML document.
• CSS works as well with XML as it does with HTML documents.

Example 1

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="today.css"?>
<CATALOG>
  <CD>
    <TITLE>Picture book</TITLE>
    <ARTIST>Simply Red</ARTIST>
    <COUNTRY>EU</COUNTRY>
    <COMPANY>Elektra</COMPANY>
    <PRICE>7.20</PRICE>
    <YEAR>1985</YEAR>
  </CD>
</CATALOG>

today.css

CATALOG {
  background-color: #ffffff;
  width: 100%;
}

CD {
  display: block;
  margin-bottom: 30pt;
  margin-left: 0;
}

TITLE {
  display: block;
  color: #ff0000;
  font-size: 20pt;
}

ARTIST {
  display: block;
  color: #0000ff;
  font-size: 20pt;
}

COUNTRY, PRICE, YEAR, COMPANY {
  display: block;
  color: #000000;
  margin-left: 20pt;
}

• The style sheet processing instruction is added to the beginning of the XML document so that the application displaying the document can locate its attached style sheet.

<?xml-stylesheet type="text/css" href="emp.css" ?>

• type gives the type of the style sheet; text/css indicates a CSS file.
• href refers to the path of the CSS file.

• To use HTML tags in an XML document, all the tags have to be prefixed with the keyword HTML, as shown below:

<HTML:A>..</HTML:A>

• Here A refers to the <A> ……</A> anchor element of HTML. This tag is useful for creating links between two or more XML documents.