
UNIT-3

XML
INTRODUCTION:
What is XML

o XML (Extensible Markup Language) is a markup language.
o XML is designed to store and transport data.
o XML was released in the late 1990s. It was created to provide an easy way to use and
store self-describing data.
o XML became a W3C Recommendation on February 10, 1998.
o XML is not a replacement for HTML.
o XML is designed to be self-descriptive.
o XML is designed to carry data, not to display data.
o XML tags are not predefined. You must define your own tags.
o XML is platform independent and language independent.

What is a markup language

A markup language is a system for highlighting or underlining parts of a document.

Students often underline or highlight a passage so that they can revise it easily. A markup
language works in the same sense, except that the highlighting or underlining is replaced by tags.

HTML VS XML

Difference between HTML and XML: There are many differences between HTML and XML.
The important differences are given below:

1. HTML: It was written in 1993.
   XML: Work on it began in 1996, and it became a W3C Recommendation in 1998.
2. HTML: HTML stands for Hyper Text Markup Language.
   XML: XML stands for Extensible Markup Language.
3. HTML: HTML is static in nature.
   XML: XML is dynamic in nature.
4. HTML: The current HTML standard is maintained by the WHATWG.
   XML: It was developed by the World Wide Web Consortium (W3C).
5. HTML: It is termed a presentation language.
   XML: It is termed neither a presentation language nor a programming language.
6. HTML: HTML is a markup language.
   XML: XML provides a framework to define markup languages.
7. HTML: HTML can ignore small errors.
   XML: XML does not allow errors.
8. HTML: It has the extensions .html and .htm
   XML: It has the extension .xml
9. HTML: HTML is not case sensitive.
   XML: XML is case sensitive.
10. HTML: HTML tags are predefined tags.
    XML: XML tags are user-defined tags.
11. HTML: There is a limited number of tags in HTML.
    XML: XML tags are extensible.
12. HTML: HTML does not preserve white space.
    XML: White space can be preserved in XML.
13. HTML: HTML tags are used for displaying the data.
    XML: XML tags are used for describing the data, not for displaying it.
14. HTML: In HTML, closing tags are not always necessary.
    XML: In XML, closing tags are necessary.
15. HTML: HTML is used to display the data.
    XML: XML is used to store data.
16. HTML: HTML does not carry data; it just displays it.
    XML: XML carries the data to and from the database.
17. HTML: HTML offers native object support.
    XML: In XML, the objects are expressed by conventions using attributes.
18. HTML: HTML document size is relatively small.
    XML: XML document size is relatively large, as both the formatting approach and the code are lengthy.
19. HTML: An additional application is not required for parsing of JavaScript code into the HTML document.
    XML: DOM (Document Object Model) is required for parsing JavaScript code and mapping of text.
20. HTML: Some of the tools used for HTML are Visual Studio Code, Atom, Notepad++, Sublime Text and many more.
    XML: Some of the tools used for XML are Oxygen XML, XML Notepad, Liquid Studio and many more.

Syntax of XML Document:

In this section, we will discuss the simple syntax rules to write an XML document.
Following is a complete XML document:
<?xml version = "1.0"?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
There are two kinds of information in the above example:
o Markup, like <contact-info>
o The text, or the character data, such as TutorialsPoint and (011) 123-4567.
The following diagram depicts the syntax rules to write different types of markup
and text in an XML document.

Let us see each component of the above diagram in detail.

XML Declaration
The XML document can optionally have an XML declaration. It is written as follows −
<?xml version = "1.0" encoding = "UTF-8"?>
Where version is the XML version and encoding specifies the character encoding used
in the document.
Syntax Rules for XML Declaration
o The XML declaration is case sensitive and must begin with "<?xml", where "xml"
is written in lower case.
o If the document contains an XML declaration, then it strictly needs to be the first
statement of the XML document.
o The HTTP protocol can override the value of encoding that you put in the XML
declaration.
Tags and Elements
An XML file is structured by several XML elements, also called XML nodes or XML
tags. The names of XML elements are enclosed in angle brackets < > as shown
below:
<element>
Syntax Rules for Tags and Elements
Element Syntax: Each XML element needs to be closed, either with an end tag that matches
its start tag, as shown below:
<element>....</element>
or, for an empty element, with a single self-closing tag:
<element/>
Nesting of Elements: An XML element can contain multiple XML elements as its
children, but the child elements must not overlap, i.e., an end tag must
have the same name as the most recent unmatched start tag.
The following example shows incorrectly nested tags:
<?xml version = "1.0"?>
<contact-info>
<company>TutorialsPoint
</contact-info>
</company>
The following example shows correctly nested tags:
<?xml version = "1.0"?>
<contact-info>
<company>TutorialsPoint</company>
</contact-info>
Root Element: An XML document can have only one root element. For example, the
following is not a correct XML document, because both the x and y elements occur at
the top level without a root element:
<x>...</x>
<y>...</y>
The following example shows a correctly formed XML document:
<root>
<x>...</x>
<y>...</y>
</root>
Case Sensitivity: The names of XML elements are case-sensitive. That means the
names in the start tag and the end tag need to be in exactly the same case.
For example, <contact-info> is different from <Contact-Info>
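The nesting, root-element and case-sensitivity rules above can be tried out with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are made up for illustration, and the parser only reports whether a document is well-formed, not which rule was broken.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Sample documents (illustrative only), one per rule discussed above.
samples = {
    "overlapping tags": "<contact-info><company>TutorialsPoint</contact-info></company>",
    "proper nesting": "<contact-info><company>TutorialsPoint</company></contact-info>",
    "two top-level elements": "<x>...</x><y>...</y>",
    "single root": "<root><x>...</x><y>...</y></root>",
    "case mismatch": "<contact-info>text</Contact-Info>",
}

for label, document in samples.items():
    print(label, "->", is_well_formed(document))
```

Only the "proper nesting" and "single root" samples are accepted; the other three are rejected with a parse error.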
XML Attributes:
An attribute specifies a single property for the element, using a name/value pair. An
XML element can have one or more attributes. For example:
<a href = "http://www.tutorialspoint.com/">Tutorialspoint!</a>
Here href is the attribute name and http://www.tutorialspoint.com/ is the attribute value.
Syntax Rules for XML Attributes
o Attribute names in XML (unlike HTML) are case sensitive. That
is, HREF and href are considered two different XML attributes.
o The same attribute cannot appear twice in one start tag. The following example shows
incorrect syntax because the attribute b is specified twice:

<a b = "x" c = "y" b = "z">....</a>
o Attribute names are defined without quotation marks, whereas attribute values
must always appear in quotation marks. The following example demonstrates
incorrect XML syntax:

<a b = x>....</a>
In the above syntax, the attribute value is not enclosed in quotation marks.
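As a rough illustration of these attribute rules, the sketch below uses Python's standard xml.etree.ElementTree module. The helper parse_error is a hypothetical name introduced here; it returns the parser's error message, or None when the text parses cleanly. The exact wording of the messages depends on the underlying parser, so the sketch only checks that an error occurs.

```python
import xml.etree.ElementTree as ET


def parse_error(text):
    """Return the parser's error message for text, or None if it parses."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


# A correct attribute: the value is quoted and the name appears once.
link = ET.fromstring('<a href="http://www.tutorialspoint.com/">Tutorialspoint!</a>')
href_value = link.attrib["href"]

# Attribute names are case sensitive, so HREF is not the same attribute as href.
has_upper_case_name = "HREF" in link.attrib

# Both incorrect examples from the rules above are rejected by the parser.
duplicate_error = parse_error('<a b="x" c="y" b="z">....</a>')
unquoted_error = parse_error('<a b=x>....</a>')

print(href_value, has_upper_case_name)
print(duplicate_error)
print(unquoted_error)
```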

XML References
References usually allow you to add or include additional text or markup in an XML
document. References always begin with the symbol "&", which is a reserved character,
and end with the symbol ";". XML has two types of references:
o Entity References: An entity reference contains a name between the start and
the end delimiters. For example, in &amp; the name is amp. The name refers to
a predefined string of text and/or markup.
o Character References: A character reference, such as &#65;, contains a
hash mark ("#") followed by a number. The number always refers to the Unicode
code point of a character. In this case, 65 refers to the letter "A".
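To see both kinds of reference being resolved, here is a small sketch using Python's standard xml.etree.ElementTree module. The element name text is arbitrary and chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# &#65; is a character reference; &amp; and &lt; are predefined entity references.
element = ET.fromstring("<text>&#65; &amp; &lt;</text>")

# The parser hands the application the resolved characters, not the references.
resolved = element.text
print(resolved)
```

The parsed text is the three characters A, & and <, separated by spaces.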
XML Text
The names of XML elements and XML attributes are case-sensitive, which means the
names in the start and end tags need to be written in the same case. To avoid character
encoding problems, all XML files should be saved as Unicode UTF-8 or UTF-16 files.
Whitespace characters like blanks, tabs and line breaks between attributes inside a tag
are ignored. Whitespace between XML elements is passed on by the parser, but
applications usually treat it as insignificant when it is used only for indentation.
Some characters are reserved by the XML syntax itself. Hence, they cannot be used
directly. To use them, replacement entities are used, which are listed below:

Not allowed character    Replacement entity    Character description
<                        &lt;                  less than
>                        &gt;                  greater than
&                        &amp;                 ampersand
'                        &apos;                apostrophe
"                        &quot;                quotation mark

Entity References
Some characters have a special meaning in XML.

If you place a character like "<" inside an XML element, it will generate an error
because the parser interprets it as the start of a new element.

This will generate an XML error:

<message>salary < 1000</message>

To avoid this error, replace the "<" character with an entity reference:

<message>salary &lt; 1000</message>

There are 5 pre-defined entity references in XML:

&lt; < less than

&gt; > greater than

&amp; & ampersand

&apos; ' apostrophe

&quot; " quotation mark


Only < and & are strictly illegal in XML, but it is a good habit to replace > with
&gt; as well.
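When XML is generated by a program, the replacement is normally done by a library rather than by hand. The sketch below uses escape from Python's standard xml.sax.saxutils module, which replaces &, < and > by default; the extra dictionary passed in the last call is an optional mapping for quotation marks. The sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "salary < 1000 & bonus > 0"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw)
document = "<message>" + escaped + "</message>"

# Parsing the document gives back the original text.
round_trip = ET.fromstring(document).text

# Quotation marks are only replaced when an extra mapping is supplied.
quoted = escape('say "hi"', {'"': "&quot;"})

print(escaped)
print(round_trip)
print(quoted)
```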

USE OF ELEMENTS VS USE OF ATTRIBUTES

In XML, there are no rules about when to use attributes and when
to use child elements.

o Data can be stored in child elements or in attributes.
o Take a look at these examples:

<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>

<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>

o In the first example sex is an attribute. In the second, sex is a child
element. Both examples provide the same information.
o Since no rule decides between the two, it is a matter of judgment.
My experience is that attributes are handy in HTML,
but in XML you should try to avoid them. Use child elements if the
information feels like data.

My Favorite Way
I like to store data in child elements.
The following three XML documents contain exactly the same
information.
A date attribute is used in the first example:

<note date="12/11/2002">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

A date element is used in the second example:

<note>
<date>12/11/2002</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

An expanded date element is used in the third (this is my favorite):

<note>
<date>
<day>12</day>
<month>11</month>
<year>2002</year>
</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
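One practical reason to prefer the expanded form shows up when a program reads the date back. The sketch below uses Python's standard xml.etree.ElementTree module on shortened copies of the three notes (only the date and the to element are kept, which is a simplification made for this illustration).

```python
import xml.etree.ElementTree as ET

with_attribute = ET.fromstring('<note date="12/11/2002"><to>Tove</to></note>')
with_element = ET.fromstring("<note><date>12/11/2002</date><to>Tove</to></note>")
expanded = ET.fromstring(
    "<note><date><day>12</day><month>11</month><year>2002</year></date>"
    "<to>Tove</to></note>"
)

# In the first two forms the date arrives as one string that the program
# still has to split, and it must already know which part is the day.
date_from_attribute = with_attribute.get("date")
date_from_element = with_element.findtext("date")

# In the expanded form each part is addressed by name.
day = int(expanded.findtext("date/day"))
month = int(expanded.findtext("date/month"))
year = int(expanded.findtext("date/year"))

print(date_from_attribute, date_from_element, (day, month, year))
```

All three carry the same information; the difference is how much of the structure is visible to the parser rather than hidden inside a string.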
XML VALIDATION:

Validation is the process of checking an XML document against a set of rules. An XML
document is said to be valid if its contents match the elements, attributes and associated
document type declaration (DTD), and if the document complies with the constraints
expressed in it. The XML parser deals with validation at two levels:

o Well-formed XML document
o Valid XML document
Well-formed XML Document
An XML document is said to be well-formed if it adheres to the following rules:
o XML files without a DTD must use only the predefined character entities
amp (&), apos (single quote), gt (>), lt (<) and quot (double quote).
o It must follow the ordering of the tags, i.e., the inner tag must be closed before
the outer tag is closed.
o Each of its opening tags must have a closing tag, or it must be a self-closing tag
(<title>....</title> or <title/>).
o A start tag must not contain the same attribute more than once, and every attribute
value must be quoted.
o Entities other than amp (&), apos (single quote), gt (>), lt (<) and quot (double quote)
must be declared.
Example
Following is an example of a well-formed XML document −
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address
[
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>

<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
The above example is well-formed because:
o It declares the type of the document. Here, the document type declaration names
address as the root element type.
o It includes a root element named address.
o Each of the child elements name, company and phone is enclosed in its own
self-explanatory tag.
o The order of the tags is maintained.
Valid XML Document
If an XML document is well-formed and also conforms to its associated Document Type
Declaration (DTD), then it is said to be a valid XML document. DTDs are covered in more
detail in the XML DTD section.

WELL FORMED XML DOCUMENT:

Introduction

An XML document is called well-formed if it satisfies certain rules specified by the W3C.

These rules are:

o A well-formed XML document must have a corresponding end tag for all of its start tags.

o Nesting of elements within each other in an XML document must be proper. For
example, <tutorial><topic>XML</topic></tutorial> is a correct way of nesting but
<tutorial><topic>XML</tutorial></topic> is not.

o In each element, two attributes must not have the same name. For example, <tutorial
id="001"><topic>XML</topic></tutorial> is right, but <tutorial id="001"
id="w3r"><topic>XML</topic></tutorial> is incorrect.

o Markup characters must be properly specified. For example,
<topic>XML &amp; DTD</topic> is right, but <topic>XML & DTD</topic> is not.

o An XML document can contain only one root element. The root element is an element
which is present only once in an XML document and does not
appear as a child element within any other element.
Example of a well-formed XML document

<?xml version="1.0" ?>
<w3resource>
<design>
html
xhtml
css
svg
xml
</design>
<programming>
php
mysql
</programming>
</w3resource>

VALID XML DOCUMENT:

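Valid XML Document
If an XML document is well-formed and also conforms to its associated Document Type
Declaration (DTD), then it is said to be a valid XML document. The address document shown
earlier under Well-formed XML Document is valid as well, because its elements match the
declarations in its internal DTD. DTDs are covered in the XML DTD section.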

XML DTD
INTERNAL DTD:
DTD stands for Document Type Definition, and it is used to define the structure
and content of an XML document. An XML document can have an internal DTD
or an external DTD, depending on the needs of the user. This section discusses
the differences between these two types of DTD, including their syntax and
sample usage.
Internal DTD: An internal DTD is a Document Type Definition (DTD) that is
written within the XML document itself. It specifies the structure
and rules for the elements and attributes of the XML document. An internal DTD
is enclosed within the <!DOCTYPE> declaration of the XML document and is
defined using a set of predefined keywords and syntax. Internal DTDs are
suitable for smaller XML documents where the complexity of the structure is
not very high. An internal DTD is easy to maintain and modify because it is part of
the XML document itself.
Syntax:
<!DOCTYPE root_element [
<!ELEMENT element_name (element_content)>
<!ELEMENT another_element_name (another_element_content)>
]>
Example: In this example, we will show the internal DTD.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE customers [
<!ELEMENT customers (customer+)>
<!ELEMENT customer (name, email, phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<customers>
<customer>
<name>Satyam Nayak</name>
<email>Satyam@Nayak.com</email>
<phone>112-123-1234</phone>
</customer>
<customer>
<name>Sonu N</name>
<email>Sonu@N.com</email>
<phone>112-455-9969</phone>
</customer>
</customers>


The DTD defines the structure of an XML document that contains customer
information. The XML document contains two customer elements, and each
customer element contains a name, email, and phone element.
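Python's standard xml.etree.ElementTree parser accepts a document with an internal DTD but does not validate against it, so the following sketch hand-codes the content models declared above instead of reading them from the DTD. It is not a general DTD validator: the function name matches_customer_dtd is hypothetical, stray text between elements is not checked, and a validating parser from a third-party library would be the usual choice in practice.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<!DOCTYPE customers [
<!ELEMENT customers (customer+)>
<!ELEMENT customer (name, email, phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<customers>
<customer><name>Satyam Nayak</name><email>Satyam@Nayak.com</email><phone>112-123-1234</phone></customer>
<customer><name>Sonu N</name><email>Sonu@N.com</email><phone>112-455-9969</phone></customer>
</customers>"""


def matches_customer_dtd(root):
    """Hand-coded check of the content models declared in the DTD above."""
    if root.tag != "customers":
        return False
    customers = list(root)
    # customers (customer+): one or more customer children and nothing else.
    if not customers or any(child.tag != "customer" for child in customers):
        return False
    for customer in customers:
        # customer (name, email, phone): exactly these children, in this order.
        if [field.tag for field in customer] != ["name", "email", "phone"]:
            return False
        # name, email, phone (#PCDATA): text only, no child elements.
        if any(len(field) != 0 for field in customer):
            return False
    return True


root = ET.fromstring(DOCUMENT)
is_valid = matches_customer_dtd(root)

# Well-formed, but the phone element required by the DTD is missing.
missing_phone = ET.fromstring(
    "<customers><customer><name>A</name><email>a@example.com</email></customer></customers>"
)
print(is_valid, matches_customer_dtd(missing_phone))
```

The second document shows the difference between the two levels: it is well-formed, so it parses, yet it does not satisfy the content model for customer.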
EXTERNAL DTD:
External DTD is a type of Document Type Definition (DTD) in XML that is
located outside of the actual XML document it describes. It can be stored
in a separate file or accessed via a URL, and it defines the structure,
rules, and constraints for the elements and attributes within an XML
document. By using an external DTD, multiple XML documents can share
the same set of rules and constraints, leading to more consistency and
easier maintenance. External DTD can also be updated independently
without having to modify the XML documents themselves.
Syntax:
<!DOCTYPE root_element SYSTEM "DTD_file_name">
Example: In this example, we will show the external DTD.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE customers SYSTEM "customers.dtd">
<customers>
<customer>
<name>Satyam Nayak</name>
<email>Satyam@nayak.com</email>
<phone>122-112-1234</phone>
</customer>
<customer>
<name>Sonu N</name>
<email>Sonu@N.com</email>
<phone>112-554-9969</phone>
</customer>
</customers>

The DTD is defined in a separate file called "customers.dtd". The XML
document references the DTD using the DOCTYPE declaration. The DTD
defines the structure of an XML document that contains customer information.
Internal DTDs are useful for small XML documents with simple DTDs, while external
DTDs are useful for large XML documents with complex DTDs.
THE BUILDING BLOCKS OF XML DOCUMENT:

Seen from a DTD point of view, all XML documents are made up of the
following building blocks:

o Elements
o Attributes
o Entities
o PCDATA
o CDATA

Elements
Elements are the main building blocks of both XML and HTML documents.

Examples of HTML elements are "body" and "table". Examples of XML elements
could be "note" and "message". Elements can contain text, other elements, or
be empty. Examples of empty HTML elements are "hr", "br" and "img".

Examples:

<body>some text</body>

<message>some text</message>
Attributes
Attributes provide extra information about elements.

Attributes are always placed inside the opening tag of an element. Attributes
always come in name/value pairs. The following "img" element has additional
information about a source file:

<img src="computer.gif" />

The name of the element is "img". The name of the attribute is "src". The value
of the attribute is "computer.gif". Since the element itself is empty it is closed
by a " /".

Entities
Some characters have a special meaning in XML, like the less than sign (<) that
defines the start of an XML tag.

Most of you know the HTML entity "&nbsp;". This non-breaking space entity is
used in HTML to insert an extra space in a document. Entities are expanded
when a document is parsed by an XML parser.

The following entities are predefined in XML:

Entity reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '

PCDATA
PCDATA means parsed character data.

Think of character data as the text found between the start tag and the end tag
of an XML element.

PCDATA is text that WILL be parsed by a parser. The text will be
examined by the parser for entities and markup.

Tags inside the text will be treated as markup and entities will be expanded.

However, parsed character data should not contain any &, <, or > characters;
these need to be represented by the &amp;, &lt; and &gt; entities, respectively.
CDATA
CDATA means character data.

CDATA is text that will NOT be parsed by a parser. Tags inside the text will
NOT be treated as markup and entities will not be expanded.
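The difference can be seen by parsing the same text both ways. The sketch below uses Python's standard xml.etree.ElementTree module and, as an assumption not shown in the notes above, writes the unparsed text as a CDATA section using the <![CDATA[ ... ]]> syntax. The element names message and b are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Parsed character data: &lt; is expanded and <b> becomes a child element.
parsed = ET.fromstring("<message>salary &lt; 1000 <b>now</b></message>")
pcdata_text = parsed.text
pcdata_children = [child.tag for child in parsed]

# CDATA section: everything up to ]]> is kept as literal text.
literal = ET.fromstring("<message><![CDATA[salary < 1000 <b>now</b> &lt;]]></message>")
cdata_text = literal.text
cdata_children = [child.tag for child in literal]

print(repr(pcdata_text), pcdata_children)
print(repr(cdata_text), cdata_children)
```

In the first document the parser reports a child element b; in the second there are no child elements, and both the tag and the &lt; reference survive as plain characters.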
