
WEB TECHNOLOGIES

PRASAD B
Assoc. Prof.
Dept. of Computer and Engineering

Email: bprasad@mlritm.ac.in
UNIT - 2
Objective

• To introduce XML and processing of XML Data with Java


Scope

• XML plays an important role in many different IT systems.

• XML is often used for distributing data over the Internet.

• It is important (for all types of software developers!) to have a good understanding of XML.

Course Outcomes

• Explore the concepts of XML and how to parse XML files using Java DOM and SAX parsers.


XML (Extensible Markup Language)
What You Should Already Know

05/02/23 PRASAD B, Assoc. prof., 6


MLRITM,JNTU-H
XML - Introduction

What is XML?

 XML stands for EXtensible Markup Language

 XML is a markup language much like HTML

 XML was designed to store and transport data

 XML was designed to be self-descriptive

 XML is a W3C Recommendation

What is an XML File?

XML Does Not DO Anything:

It may be a little hard to understand, but XML does not do anything by itself.

XML is a text-based markup language derived from Standard Generalized Markup Language (SGML).

XML was designed to store and transport data.

XML was designed to be both human- and machine-readable.

The Difference Between XML and HTML

XML and HTML were designed with different goals:


 XML was designed to carry data - with focus on what data is

 HTML was designed to display data - with focus on how data looks

 XML tags are not predefined like HTML tags are

What Can XML Do?

XML Simplifies Things:


 It simplifies data sharing
 It simplifies data transport
 It simplifies platform changes
 It simplifies data availability

 XML is used in many aspects of web development.


 XML is often used to separate data from presentation.
 XML is Often a Complement to HTML

Why XML?

There are three important characteristics of XML that make it useful in a variety of systems and solutions:
XML is extensible: XML allows you to create your own self-
descriptive tags, or language, that suits your application.
XML carries the data, does not present it: XML allows you to store
the data irrespective of how it will be presented.
XML is a public standard: XML was developed by an organization
called the World Wide Web Consortium (W3C) and is available as an
open standard.
XML Usage

 XML can work behind the scenes to simplify the creation of HTML documents for large web sites.

 XML can be used to exchange information between organizations and systems.

 XML can be used for offloading and reloading of databases.

 XML can be used to store and arrange data in a way that fits your data handling needs.

 XML can easily be merged with style sheets to create almost any desired output.

 Virtually any type of data can be expressed as an XML document.

XML - Documents


 An XML document is a basic unit of XML information composed of elements and other markup in an orderly package.

 An XML document can contain a wide variety of data.

XML – Document Sections

 Document Prolog Section

 Document Elements Section

XML Document example

Document Prolog Section:

<?xml version="1.0"?>

Document Elements Section:

<contact-info>
<name> Tanmay Patil </name>
<company> TutorialsPoint </company>
<phone> (011) 123-4567 </phone>
</contact-info>

Document Prolog Section:

The document prolog comes at the top of the document, before

the root element. This section contains:

 XML declaration

 Document type declaration

Document Elements Section

 Document Elements are the building blocks of XML.

 These divide the document into a hierarchy of sections, each

serving a specific purpose.

 We can separate a document into multiple sections so that they

can be rendered differently, or used by a search engine.

 The elements can be containers, with a combination of text and

other elements.
XML- Syntax

XML - Declaration


 The XML document can optionally have an XML declaration.

It is written as below:

<?xml version="1.0" encoding="UTF-8"?>
Where version is the XML version and encoding specifies the

character encoding used in the document.
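
The values written in the declaration can be read back after parsing. The following is a minimal sketch using the standard JAXP DOM API; the class name DeclarationInfo and the sample <note/> document are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: reports what the XML declaration of a document said.
class DeclarationInfo {

    static String describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // DOM Level 3 exposes the three declaration parameters directly.
            return "version=" + doc.getXmlVersion()
                    + " encoding=" + doc.getXmlEncoding()
                    + " standalone=" + doc.getXmlStandalone();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(
                "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?><note/>"));
    }
}
```

Note that getXmlStandalone() returns a boolean, so standalone="no" is reported as false.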

Syntax Rules for XML Declaration

 Must begin with "<?xml" (where "xml" is written in lower-case) and end with "?>".

 The XML declaration has no closing tag, i.e. there is no </?xml>.

 If the document contains an XML declaration, it must be the first statement of the XML document; nothing, not even whitespace, may come before it.

 If the XML declaration is included, it must contain the version number attribute.

Syntax Rules for XML Declaration- CONT….

 The Parameter names and values are case-sensitive.

 The names are always in lower case.

 The order of placing the parameters is important. The correct

order is: version, encoding and standalone.

 Either single or double quotes may be used.
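
These rules can be tried out by handing small documents to a parser. The sketch below assumes the JAXP parser bundled with the JDK; the class name DeclarationRules and the sample strings are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: true when the parser accepts the text as well-formed XML.
class DeclarationRules {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Correct order, double quotes.
        System.out.println(isWellFormed("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?><a/>"));
        // Single quotes are allowed as well.
        System.out.println(isWellFormed("<?xml version='1.0' encoding='UTF-8'?><a/>"));
        // encoding placed before version: wrong order.
        System.out.println(isWellFormed("<?xml encoding=\"UTF-8\" version=\"1.0\"?><a/>"));
        // Whitespace before the declaration: it is no longer the first statement.
        System.out.println(isWellFormed(" <?xml version=\"1.0\"?><a/>"));
    }
}
```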

Parameter     Parameter_value                        Parameter_description

version       1.0                                    Specifies the version of the XML standard used.

encoding      UTF-8, UTF-16, ISO-10646-UCS-2,        It defines the character encoding used in the
              ISO-10646-UCS-4, ISO-8859-1 to         document. UTF-8 is the default encoding used.
              ISO-8859-9, ISO-2022-JP, Shift_JIS,
              EUC-JP

standalone    yes or no                              It informs the parser whether the document relies
                                                     on information from an external source, such as an
                                                     external document type definition (DTD), for its
                                                     content. The default value is no. Setting it to yes
                                                     tells the processor that no external declarations
                                                     are required for parsing the document.

XML Declaration Examples

 XML declaration with no parameters (not accepted by a parser, because the version parameter is mandatory):
<?xml ?>
 XML declaration with version definition:
<?xml version="1.0"?>
 XML declaration with all parameters defined:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
 XML declaration with all parameters defined in single quotes:
<?xml version='1.0' encoding='iso-8859-1' standalone='no' ?>
XML Tags and Elements


 An XML file is structured as a set of XML-elements, also called XML-nodes or XML-tags.

 XML-elements' names are enclosed in angle brackets < > as shown below:

<element>

XML Tags

 XML tags form the foundation of XML.

 They define the scope of an element in the XML.

 They can also be used to insert comments, declare settings

required for parsing the environment and to insert special

instructions.

XML - Tag Types:

 Start Tag

 End Tag

 Empty Tag

Empty Tag:

 The text that appears between start-tag and end-tag is called content.

 An element which has no content is termed as empty.

An empty element can be represented in two ways as below:

A start-tag immediately followed by an end-tag as shown below:

<hr></hr>

A complete empty-element tag is as shown below:

<hr />
Syntax Rules for Tags and Elements

 Element Syntax

 Nesting of elements

 Root element

 Case sensitivity

Element Syntax: 

Each XML-element needs to be closed, either with a start tag and a matching end tag as shown below:

<element>....</element>

or in simple-cases, just this way:

<element/>

Nesting of elements:

An XML-element can contain multiple XML-elements as its children, but the

children elements must not overlap. i.e., an end tag of an element must

have the same name as that of the most recent unmatched start tag.
The following example shows incorrect and correct nesting.

Incorrect nested tags:

<?xml version="1.0"?>
<contact-info>
<company>TutorialsPoint
</contact-info>
</company>

Correct nested tags:

<?xml version="1.0"?>
<contact-info>
<company>TutorialsPoint
</company>
</contact-info>
Root element:

 XML documents must contain one root element that is

the parent of all other elements:

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

Case sensitivity: 

 The names of XML-elements are case-sensitive.

 That means the names in the start tag and the end tag need to be exactly in the same case.

 For example

<contact-info> is different from <Contact-Info>. 

Rules for Tags and Elements Example:

<?xml version="1.0"?>

<contact-info>

<company> TutorialsPoint </company>

</contact-info>
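
The element rules (closing, nesting, single root, case sensitivity) can be checked with a parser. This is a minimal sketch using the JDK's SAX parser; the class name ElementRules and the short test documents are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: true when the text obeys the syntax rules for tags and elements.
class ElementRules {

    static boolean isWellFormed(String xml) {
        try {
            // DefaultHandler ignores all content events and rethrows fatal errors.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Properly closed and nested.
        System.out.println(isWellFormed("<contact-info><company>TutorialsPoint</company></contact-info>"));
        // Overlapping elements.
        System.out.println(isWellFormed("<contact-info><company>TutorialsPoint</contact-info></company>"));
        // Start tag and end tag differ in case.
        System.out.println(isWellFormed("<contact-info></Contact-Info>"));
        // Two root elements.
        System.out.println(isWellFormed("<a/><b/>"));
    }
}
```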

XML - Attributes


 An attribute specifies a single property for the element, using a

name/value pair.

 An XML-element can have one or more attributes.

For example:

<a href="http://www.tutorialspoint.com/">Tutorialspoint!</a>

Here href is the attribute name and http://www.tutorialspoint.com/ is the attribute value.


Syntax Rules for XML Attributes

 Attribute names in XML (unlike HTML) are case sensitive.

 The same attribute cannot appear twice on one element.

The following example shows incorrect syntax, because the attribute b is specified twice:

<a b="x" c="y" b="z">....</a>

 Attribute names are written without quotation marks, whereas attribute values must always appear in quotation marks.

The following example demonstrates incorrect XML syntax, because the attribute value is not enclosed in quotation marks:

<a b=x>....</a>
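
Attribute values can be read through the DOM, and the two incorrect examples above are rejected by the parser. A minimal sketch follows; the class name AttributeReader and the choice to return null for malformed input are assumptions of this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reads one attribute of the root element.
class AttributeReader {

    // Returns the attribute value, "" when the attribute is absent,
    // or null when the text is not well-formed XML.
    static String rootAttribute(String xml, String name) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors quietly
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute(name);
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String link = "<a href=\"http://www.tutorialspoint.com/\">Tutorialspoint!</a>";
        System.out.println(rootAttribute(link, "href"));
        // Attribute names are case-sensitive, so HREF is a different (absent) attribute.
        System.out.println("[" + rootAttribute(link, "HREF") + "]");
        // Duplicate attribute and unquoted value are both rejected.
        System.out.println(rootAttribute("<a b=\"x\" c=\"y\" b=\"z\">....</a>", "b"));
        System.out.println(rootAttribute("<a b=x>....</a>", "b"));
    }
}
```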
XML - Attribute Types

 String Type

 Tokenized Type

 Enumerated Type

String Type

 It takes any literal string as a value.

 CDATA is a StringType.

 CDATA is character data.

 This means, any string of non-markup characters is a legal part

of the attribute.

Tokenized Type

 This is a more constrained type.

 The validity constraints noted in the grammar are applied after the

attribute value is normalized.

Tokenized Type - cont.
ID : It is used to specify the element as unique.
IDREF : It is used to reference an ID that has been named for another element.
IDREFS : It is used to reference several IDs, written as a whitespace-separated list.
ENTITY : It indicates that the attribute value is the name of an external (unparsed) entity declared in the DTD.
ENTITIES : It indicates that the attribute value is a whitespace-separated list of such entity names.
NMTOKEN : It is similar to CDATA with restrictions on what data can be part of the attribute (a single name token).
NMTOKENS : It is a whitespace-separated list of NMTOKEN values.
Enumerated Type

This has a list of predefined values in its declaration, out of which the attribute must take one value.

There are two types of enumerated attribute:

NotationType : It declares that an element will be referenced to a

NOTATION declared somewhere else in the XML document.

Enumeration : Enumeration allows you to define a specific list of

values that the attribute value must match.


XML - References


 References usually allow you to add or include additional text or

markup in an XML document.

 References always begin with the symbol "&" ,which is a

reserved character and end with the symbol ";"

XML References Types

 Entity References

 Character References

Entity References:

  An entity reference contains a name between the start and the

end delimiters.

For example  &amp;

 where amp is name. The name refers to a predefined string of text

and/or markup.

Character References:

 A character reference, such as &#65;, contains a hash mark ("#") followed by a number.

 The number always refers to the Unicode code point of a character.

 In this case, 65 refers to the letter "A".

XML - Text


 The names of XML-elements and XML-attributes are case-

sensitive, which means the name of start and end elements need

to be written in the same case.

 To avoid character encoding problems, all XML files should be

saved as Unicode UTF-8 or UTF-16 files.

XML Text – cont…..

 Whitespace characters like blanks, tabs and line-breaks between

XML-elements and between the XML-attributes will be ignored.

 Some characters are reserved by the XML syntax itself.

 Hence, they cannot be used directly.

 To use them, some replacement-entities are used.

Replacement-entities

Not allowed character    Replacement-entity    Character description

<                        &lt;                  less than
>                        &gt;                  greater than
&                        &amp;                 ampersand
'                        &apos;                apostrophe
"                        &quot;                quotation mark

Predefined character entities

Entity name    Character    Decimal reference    Hexadecimal reference

quot           "            &#34;                &#x22;
amp            &            &#38;                &#x26;
apos           '            &#39;                &#x27;
lt             <            &#60;                &#x3C;
gt             >            &#62;                &#x3E;
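
The replacement entities can be applied in code before text is placed inside an element or attribute. The sketch below is one possible hand-written escaper, not a library API; the class name XmlEscaper and the wrapper element <t> are assumptions. It also parses the escaped text back to show that the parser restores the original characters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: replaces the five reserved characters with their entities.
class XmlEscaper {

    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Wraps already-escaped text in a <t> element and returns the text the parser sees.
    static String parseText(String escaped) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<t>" + escaped + "</t>")));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("escaped text was not well-formed", e);
        }
    }

    public static void main(String[] args) {
        String raw = "a < b & \"c\" > 'd'";
        System.out.println(escape(raw));
        System.out.println(parseText(escape(raw)));
        // Decimal and hexadecimal character references for the letter A.
        System.out.println(parseText("&#65;&#x41;"));
    }
}
```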

XML - Comments


 XML comments are similar to HTML comments.

 The comments are added as notes or lines for understanding

the purpose of an XML code.

 Comments can be used to include related links, information

and terms.

 They are visible only in the source of the document; they are not part of its data.

 Comments may appear almost anywhere in XML code.


Syntax:

<!-- Your comment -->

 A comment starts with <!-- and ends with -->.

 We can add textual notes as comments between these delimiters; the text itself must not contain two consecutive hyphens (--).

 We must not nest one comment inside another.

XML Comments Rules

 Comments cannot appear before the XML declaration.

 Otherwise, comments may appear anywhere in a document outside other markup (not inside a tag).

 Comments must not appear within attribute values.

 Comments cannot be nested inside other comments.
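
A DOM parser keeps comments as separate nodes, so they do not show up in the element's text. The following sketch assumes the JDK's default DOM parser (which retains comments unless told otherwise); the class name CommentDemo and the sample <note> document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Comment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: separates comment text from element text.
class CommentDemo {

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Text content of the root element; comment nodes are skipped.
    static String visibleText(String xml) {
        return parseRoot(xml).getTextContent();
    }

    // Text of the first comment directly inside the root element, or null if there is none.
    static String firstComment(String xml) {
        NodeList children = parseRoot(xml).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.COMMENT_NODE) {
                return ((Comment) child).getData();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String xml = "<note>Hi <!-- internal remark -->there</note>";
        System.out.println(visibleText(xml));
        System.out.println("[" + firstComment(xml) + "]");
    }
}
```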

XML - DTDs
Document Type Declaration


 The XML Document Type Declaration, commonly known as DTD, is a way to describe an XML language precisely.

 DTDs check the vocabulary and the validity of the structure of XML documents against the grammatical rules of the appropriate XML language.

 An XML DTD can either be specified inside the document, or be kept in a separate document and then linked to it.


Syntax

<!DOCTYPE element DTD identifier
[
declaration1
declaration2 ........
]>

In the above syntax,


The DTD starts with <!DOCTYPE delimiter.
An element tells the parser to parse the document from the
specified root element.
DTD identifier is an identifier for the document type definition,
which may be the path to a file on the system or URL to a file on the
internet. If the DTD is pointing to external path, it is called External
Subset.
The square brackets [ ] enclose an optional list of entity
declarations called Internal Subset.
XML DTD -Types

 Internal DTD

 External DTD
• System identifiers  
• Public identifiers
Internal DTD

 A DTD is referred to as an internal DTD if elements are

declared within the XML files.

 When only an internal DTD is used, the standalone attribute in the XML declaration can be set to yes.

 This means the declaration works independently of any external source.

Internal DTD - Syntax

<!DOCTYPE root-element [element-declarations]>

 where root-element is the name of root element and 

 element-declarations is where we declare the elements.

Following is a simple example of internal DTD:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!-- Start Declaration: the XML declaration on the line above -->
<!-- DTD -->
<!DOCTYPE address [
<!-- DTD Body -->
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!-- End Declaration -->
]>
<!-- XML document -->
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
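
In Java, a document with an internal DTD can be validated by switching on validation in the parser factory. The following is a minimal sketch under that assumption; the class name InternalDtdCheck is hypothetical, and the DTD is the one from the example above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: validates an <address> body against the internal DTD.
class InternalDtdCheck {

    static final String PROLOG =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\" ?>\n"
            + "<!DOCTYPE address [\n"
            + "<!ELEMENT address (name,company,phone)>\n"
            + "<!ELEMENT name (#PCDATA)>\n"
            + "<!ELEMENT company (#PCDATA)>\n"
            + "<!ELEMENT phone (#PCDATA)>\n"
            + "]>\n";

    // Returns the list of problems reported by the parser; empty means valid.
    static List<String> validate(String body) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    problems.add(e.getMessage()); // validity errors arrive here
                }
            });
            builder.parse(new InputSource(new StringReader(PROLOG + body)));
        } catch (Exception e) {
            problems.add(String.valueOf(e.getMessage())); // well-formedness errors arrive here
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate("<address><name>Tanmay Patil</name>"
                + "<company>TutorialsPoint</company><phone>(011) 123-4567</phone></address>"));
        // The phone element is missing, so the content model is not satisfied.
        System.out.println(validate("<address><name>Tanmay Patil</name>"
                + "<company>TutorialsPoint</company></address>"));
    }
}
```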

Start Declaration- Begin the XML

DTD- Immediately after the XML header, the document type

declaration follows, commonly referred to as the DOCTYPE

The DOCTYPE declaration has an exclamation mark (!) at the start of

the element name.

The DOCTYPE informs the parser that a DTD is associated with this

XML document.
DTD Body- The DOCTYPE declaration is followed by body of the

DTD, where you declare elements, attributes, entities, and notations

End Declaration - Finally, the declaration section of the DTD is

closed using a closing bracket and a closing angle bracket (]>).

This effectively ends the definition, and thereafter, the XML

document follows immediately.

Internal DTD - Rules
 The document type declaration must appear at the start of the

document (preceded only by the XML header) — it is not

permitted anywhere else within the document.

 Similar to the DOCTYPE declaration, the element declarations

must start with an exclamation mark.

 The Name in the document type declaration must match the

element type of the root element.


External DTD
 In external DTD elements are declared outside the XML file.

 They are accessed by specifying the system attributes which

may be either the legal .dtd file or a valid URL.

 When an external DTD is used, the standalone attribute in the XML declaration is set to no (the default value).

 This means the declaration includes information from the external source.
External DTD - Syntax
<!DOCTYPE root-element SYSTEM "file-name">

 where file-name is the file with .dtd extension.

Following is a simple example of external DTD:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>

address.dtd

<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
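
When validating against an external DTD in Java, the parser has to locate address.dtd. The sketch below keeps everything in memory by supplying the DTD through an EntityResolver; the class name ExternalDtdCheck and the base location file:///virtual/address.xml are assumptions made so the example is self-contained. With a real file on disk, the resolver would not be needed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: validates a document that points at address.dtd.
class ExternalDtdCheck {

    static final String ADDRESS_DTD =
            "<!ELEMENT address (name,company,phone)>\n"
            + "<!ELEMENT name (#PCDATA)>\n"
            + "<!ELEMENT company (#PCDATA)>\n"
            + "<!ELEMENT phone (#PCDATA)>\n";

    // Builds a full document around the given root element text.
    static String document(String body) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\" ?>\n"
                + "<!DOCTYPE address SYSTEM \"address.dtd\">\n" + body;
    }

    static boolean isValid(String xml) {
        boolean[] valid = {true};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Serve the DTD from memory instead of reading a file.
            builder.setEntityResolver((publicId, systemId) ->
                    systemId != null && systemId.endsWith("address.dtd")
                            ? new InputSource(new StringReader(ADDRESS_DTD))
                            : null);
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false; // a validity error was reported
                }
            });
            InputSource source = new InputSource(new StringReader(xml));
            source.setSystemId("file:///virtual/address.xml"); // hypothetical base location
            builder.parse(source);
        } catch (Exception e) {
            valid[0] = false; // not well-formed, or the DTD could not be read
        }
        return valid[0];
    }

    public static void main(String[] args) {
        System.out.println(isValid(document("<address><name>Tanmay Patil</name>"
                + "<company>TutorialsPoint</company><phone>(011) 123-4567</phone></address>")));
        // <email> is not declared in address.dtd.
        System.out.println(isValid(document("<address><name>Tanmay Patil</name>"
                + "<company>TutorialsPoint</company><phone>(011) 123-4567</phone>"
                + "<email>x</email></address>")));
    }
}
```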

External DTD - Types

we can refer to an external DTD by using either 

System identifiers or 

Public identifiers.

System Identifiers

A system identifier enables you to specify the location of an external

file containing DTD declarations.

Syntax is as follows:

<!DOCTYPE name SYSTEM "address.dtd" [...]>

 As you can see, it contains keyword SYSTEM and a URI

reference pointing to the location of the document.


Public Identifiers
 Public identifiers provide a mechanism to locate DTD resources

and are written as below:


<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN">

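 As we can see, it begins with the keyword PUBLIC, followed by a specialized identifier. In an XML document, the public identifier must in turn be followed by a system identifier that gives the location of the DTD.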
 Public identifiers are used to identify an entry in a catalog.
 Public identifiers can follow any format, however, a commonly
used format is called Formal Public Identifiers, or FPIs.

XML - Schemas

 XML Schema is commonly known as XML Schema Definition (XSD).

 It is used to describe and validate the structure and the content of

XML data.

 XML schema defines the elements, attributes and data types.

 Schema element supports Namespaces.

 It is similar to a database schema that describes the data in a

database.
XML Schemas are More Powerful than DTD

 XML Schemas are written in XML

 XML Schemas are extensible to additions

 XML Schemas support data types

 XML Schemas support namespaces

Why Use an XML Schema?

 With XML Schema, XML files can carry a description of their own format.

 With XML Schema, independent groups of people can agree

on a standard for interchanging data.

 With XML Schema, we can verify data.

XML Schemas Support Data Types

One of the greatest strengths of XML Schemas is the support for data types:

 It is easier to describe document content

 It is easier to define restrictions on data

 It is easier to validate the correctness of data

 It is easier to convert data between different data types

XML Schemas use XML Syntax

Another great strength about XML Schemas is that they are written

in XML:

 You don't have to learn a new language

 You can use your XML editor to edit your Schema files

 You can use your XML parser to parse your Schema files

 You can manipulate your Schemas with the XML DOM

 You can transform your Schemas with XSLT


XML Schemas are extensible, because they are written in XML.

With an extensible Schema definition you can:

Reuse your Schema in other Schemas

Create your own data types derived from the standard types

Reference multiple schemas in the same document

XML Schemas Secure Data Communication

 When sending data from a sender to a receiver, it is essential that both parties have the same "expectations" about the content.

 With XML Schemas, the sender can describe the data in a

way that the receiver will understand.

XML data type "date" requires the format "YYYY-MM-DD".

Well-Formed is Not Enough

A well-formed XML document is a document that conforms to the XML syntax rules, like:
the XML declaration, if present, must be the first thing in the document
it must have one unique root element
start-tags must have matching end-tags
elements are case sensitive
all elements must be closed
all elements must be properly nested
all attribute values must be quoted
entities must be used for special characters
Even if documents are well-formed they can still contain errors, and those
errors can have serious consequences.
XML – Schemas - Syntax

<?xml version="1.0"?>

<xs:schema>
...
...
</xs:schema>

The <schema> element may contain some attributes. 

 elements and data types used in the schema 


 elements defined by this schema (note, to, from, heading, body.) 
 default namespace 
 elements used by the XML must be namespace qualified.

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3schools.com"
xmlns="http://www.w3schools.com"
elementFormDefault="qualified">
...
...
</xs:schema>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"

indicates that the elements and data types used in the schema come from the "http://www.w3.org/2001/XMLSchema" namespace. It also specifies that the elements and data types that come from the "http://www.w3.org/2001/XMLSchema" namespace should be prefixed with xs:

targetNamespace="http://www.w3schools.com"

indicates that the elements defined by this schema (note, to, from, heading, body) come from the "http://www.w3schools.com" namespace.

xmlns="http://www.w3schools.com"

indicates that the default namespace is "http://www.w3schools.com".

elementFormDefault="qualified"

indicates that any elements used by the XML instance document which were declared in this schema must be namespace qualified.

The following simple example shows how to use a schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="contact">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
<xs:element name="phone" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

The basic idea behind XML Schemas is that they describe the legitimate format that an XML document can take.
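
A schema like this can be used from Java through the javax.xml.validation package. The following is a minimal sketch, with the contact schema embedded as a string; the class name ContactSchemaCheck and the sample instance documents are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

// Hypothetical helper: validates a <contact> document against the schema above.
class ContactSchemaCheck {

    static final String XSD =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">\n"
            + "<xs:element name=\"contact\">\n"
            + "<xs:complexType>\n"
            + "<xs:sequence>\n"
            + "<xs:element name=\"name\" type=\"xs:string\" />\n"
            + "<xs:element name=\"company\" type=\"xs:string\" />\n"
            + "<xs:element name=\"phone\" type=\"xs:int\" />\n"
            + "</xs:sequence>\n"
            + "</xs:complexType>\n"
            + "</xs:element>\n"
            + "</xs:schema>\n";

    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            // validate() throws an exception on the first validation error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<contact><name>Tanmay Patil</name>"
                + "<company>TutorialsPoint</company><phone>1234567</phone></contact>"));
        // Well-formed, but "(011) 123-4567" is not an xs:int.
        System.out.println(isValid("<contact><name>Tanmay Patil</name>"
                + "<company>TutorialsPoint</company><phone>(011) 123-4567</phone></contact>"));
    }
}
```

The second document is well-formed but not valid: because phone is declared as xs:int, a formatted phone number is rejected. A schema meant to accept such numbers would declare phone as xs:string.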

XML – Schemas : Elements

 As we saw in the XML elements section, elements are the building blocks of an XML document.

 An element can be defined within an XSD as follows:

<xs:element name="x" type="y"/>

Schemas Elements - Definition Types

we can define XML schema elements in following ways:

 Simple Type

 Complex Type

 Global Types

Simple Type

 A simple type element can contain only text; it cannot contain other elements or attributes.

 Some of the predefined simple types are: xs:integer, xs:boolean, xs:string, xs:date.

For example:

<xs:element name="phone_number" type="xs:int" />

Complex Type

 A complex type is a container for other element definitions.

 This allows you to specify which child elements an element can

contain and to provide some structure within your XML

documents.

In the example, the Address element consists of child elements. It is a container for other <xs:element> definitions, which allows us to build a simple hierarchy of elements in the XML document.

<xs:element name="Address">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
<xs:element name="phone" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
Global Types

 With global type, we can define a single type in your document,

which can be used by all other references.

 For example, suppose you want to generalize the person

and company for different addresses of the company. In such

case, you can define a general type as below:

<xs:complexType name="AddressType">
<xs:sequence>
<xs:element name="name" type="xs:string" />
<xs:element name="company" type="xs:string" />
</xs:sequence>
</xs:complexType>

Instead of having to define the name and the company twice (once for Address1 and once
for Address2), we now have a single definition. This makes maintenance simpler, i.e., if you
decide to add "Postcode" elements to the address, you need to add them at just one place.

<xs:element name="Address1">
<xs:complexType>
<xs:sequence>
<xs:element name="address" type="AddressType" />
<xs:element name="phone1" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Address2">
<xs:complexType>
<xs:sequence>
<xs:element name="address" type="AddressType" />
<xs:element name="phone2" type="xs:int" />
</xs:sequence>
</xs:complexType>
</xs:element>
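
The reuse of one global type by several elements can be tried out as follows. This sketch assumes AddressType is declared as a named xs:complexType (which is what type="AddressType" refers to) and wraps the definitions in a complete schema; the class name GlobalTypeCheck and the instance documents are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

// Hypothetical helper: Address1 and Address2 both reuse the global AddressType.
class GlobalTypeCheck {

    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">\n"
            + "<xs:complexType name=\"AddressType\">\n"
            + "<xs:sequence>\n"
            + "<xs:element name=\"name\" type=\"xs:string\" />\n"
            + "<xs:element name=\"company\" type=\"xs:string\" />\n"
            + "</xs:sequence>\n"
            + "</xs:complexType>\n"
            + "<xs:element name=\"Address1\">\n"
            + "<xs:complexType><xs:sequence>\n"
            + "<xs:element name=\"address\" type=\"AddressType\" />\n"
            + "<xs:element name=\"phone1\" type=\"xs:int\" />\n"
            + "</xs:sequence></xs:complexType>\n"
            + "</xs:element>\n"
            + "<xs:element name=\"Address2\">\n"
            + "<xs:complexType><xs:sequence>\n"
            + "<xs:element name=\"address\" type=\"AddressType\" />\n"
            + "<xs:element name=\"phone2\" type=\"xs:int\" />\n"
            + "</xs:sequence></xs:complexType>\n"
            + "</xs:element>\n"
            + "</xs:schema>\n";

    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false; // schema did not compile or the document is invalid
        }
    }

    public static void main(String[] args) {
        String address = "<address><name>Tanmay Patil</name><company>TutorialsPoint</company></address>";
        System.out.println(isValid("<Address1>" + address + "<phone1>1234567</phone1></Address1>"));
        System.out.println(isValid("<Address2>" + address + "<phone2>7654321</phone2></Address2>"));
        // company is required by AddressType, so this one is rejected.
        System.out.println(isValid("<Address1><address><name>Tanmay Patil</name></address>"
                + "<phone1>1234567</phone1></Address1>"));
    }
}
```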

Attributes

 Attributes in XSD provide extra information within an element.

 Attributes have name and type properties as shown below:

<xs:attribute name="x" type="y"/>

Restrictions on elements

Refer below link:


http://www.w3schools.com/xml/schema_facets.asp

XML Parsers

What is an XML Parser?

 It is a software library (or a package) that provides methods (or interfaces) for client applications to work with XML documents

 It checks the well-formedness

 It may validate the documents

 It does a lot of other detailed things so that a client is shielded from those complexities

Types of Parsers

 DOM: Document Object Model

 SAX: Simple API for XML

 A DOM parser implements the DOM API

 A SAX parser implements the SAX API

DOM Parser - Parses the document by loading the complete contents of the document and creating its complete hierarchical tree in memory.

SAX Parser - Parses the document on event-based triggers. Does not load the complete document into memory.

DOM Parser


 A DOM document is an object containing all the information of

an XML document

 It is composed of a tree (DOM tree) of nodes , and various nodes

that are somehow associated with other nodes in the tree but

are not themselves part of the DOM tree

Main features of DOM parsers

 A DOM parser creates an internal structure in memory which is a DOM

document object

 Client applications get the information of the original XML document

by invoking methods on this Document object or on other objects it

contains

 DOM parser is tree-based (or DOM obj-based)

 Client application seems to be pulling the data actively, from the data

flow point of view


 Advantage:

 It is good when random access to widely separated

parts of a document is required

 It supports both read and write operations

 Disadvantage:

 It is memory inefficient

 It seems complicated, although not really
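
Both advantages (random access and read/write support) can be seen in a few lines of Java. The sketch below is an illustration using the standard JAXP DOM and Transformer APIs; the class name DomReadWrite and the added <branch> element are assumptions, not part of the course program.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: reads from and writes to a DOM tree held in memory.
class DomReadWrite {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Random access: jump straight to the first element with the given tag name.
    static String firstText(String xml, String tag) {
        try {
            return parse(xml).getElementsByTagName(tag).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not read " + tag, e);
        }
    }

    // Write access: append a new child to the root and serialize the tree again.
    static String addChild(String xml, String childName, String text) {
        try {
            Document doc = parse(xml);
            Element child = doc.createElement(childName);
            child.setTextContent(text);
            doc.getDocumentElement().appendChild(child);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not modify document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<student><name>SHARAN</name></student>";
        System.out.println(firstText(xml, "name"));
        System.out.println(addChild(xml, "branch", "CSE"));
    }
}
```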


Java DOM Parser - Steps

Following are the steps used while parsing a document using DOM
Parser.
1. Import XML-related packages.
2. Read name of XML document using command prompt.
3. Invoke the parser
4. Call the method

For the program, check the file named "DOM and SAX Program".

File name: Parsing_DOMDemo.java

import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;                                   // step 1

public class Parsing_DOMDemo
{
    public static void main(String[] arg)
    {
        try
        {                                               // step 2
            System.out.print("Enter the name of XML document ");
            BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
            String file_name = input.readLine();
            File fp = new File(file_name);
            if (fp.exists())
            {
                try
                {                                       // step 3
                    // Build a DOM parser and parse the file; parse() throws
                    // an exception if the document is not well-formed.
                    DocumentBuilderFactory Factory_obj = DocumentBuilderFactory.newInstance();
                    DocumentBuilder builder = Factory_obj.newDocumentBuilder();
                    InputSource ip_src = new InputSource(file_name);
                    Document doc = builder.parse(ip_src);
                    System.out.println(file_name + " is well-formed!");
                }
                catch (Exception e)                     // step 4
                {
                    System.out.println(file_name + " isn't well-formed!");
                    System.exit(1);
                }
            }
            else
            {
                System.out.print("File not found!");
            }
        }
        catch (IOException ex)
        {
            ex.printStackTrace();
        }
    }
}

File name: dom.xml (not well-formed: the closing </student> tag is missing)

<?xml version="1.0"?>
<student> <name>SHARAN</name>

After Validating:
File name: dom1.xml

<?xml version="1.0"?>
<student>
<name>SHARAN</name>
</student>

SAX Parser


 It does not first create any internal structure

 Client does not specify what methods to call

 The client just overrides the methods of the API and places its own code inside them

 When the parser encounters a start-tag, end-tag, etc., it treats each of them as an event

 When such an event occurs, the parser automatically calls back the corresponding method overridden by the client and passes what it has just seen as the method's arguments

 A SAX parser is event-based; it works like an event handler in Java (e.g. MouseAdapter)

 From a data-flow point of view, the client application passively receives the data
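
The callback mechanism described above can be sketched as follows. This is a minimal illustration, assuming the XML is supplied as a string and that the parser is obtained through SAXParserFactory; the names SaxEventSketch, LoggingHandler and events are hypothetical. The client only overrides methods of DefaultHandler, and the parser decides when to call them.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class SaxEventSketch {

    // The client's handler: each overridden method is a callback that the
    // parser invokes when it meets the matching construct in the input.
    static class LoggingHandler extends DefaultHandler {
        final StringBuilder log = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            log.append("start:").append(qName).append('\n');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            log.append("end:").append(qName).append('\n');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                log.append("text:").append(text).append('\n');
            }
        }
    }

    // Runs the parser over the XML text and returns the events it reported.
    static String events(String xml) {
        try {
            LoggingHandler handler = new LoggingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.log.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML text", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(events("<student><name>SHARAN</name></student>"));
    }
}
```

Note that the client never asks for a particular element; it simply receives events in document order.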


 Advantage:
 It is simple
 It is memory efficient
 It works well in streaming applications
 Disadvantage:
 The data is broken into pieces and clients never
have all the information as a whole unless they
create their own data structure
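
The disadvantage above means that a client wanting the data as a whole has to assemble it from the events. A minimal sketch, assuming a hypothetical input with several <name> elements and the hypothetical names SaxCollectSketch, NameCollector and collectNames, could look like this:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class SaxCollectSketch {

    // Builds the client's own data structure (a list) from SAX events.
    static class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();
        private StringBuilder current; // non-null only while inside <name>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("name".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several pieces, so it is accumulated.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("name".equals(qName) && current != null) {
                names.add(current.toString().trim());
                current = null;
            }
        }
    }

    // Parses the XML text and returns the text of every <name> element.
    static List<String> collectNames(String xml) {
        try {
            NameCollector collector = new NameCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), collector);
            return collector.names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML text", e);
        }
    }

    public static void main(String[] args) {
        // The second student is made-up sample data.
        String xml = "<students><student><name>SHARAN</name></student>"
                + "<student><name>RAVI</name></student></students>";
        System.out.println(collectNames(xml)); // prints [SHARAN, RAVI]
    }
}
```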

Java SAX Parser - Steps

The following steps are used to parse a document with the SAX parser:
1. Import XML-related packages.
2. Read the name of the XML document from the command prompt.
3. Invoke the XML reader (parser).
4. Call the parser method.

For the program, check the file named "DOM and SAX Program".

File name: Parsing_SAXDemo.java

import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;                           // step 1

public class Parsing_SAXDemo
{
    public static void main(String[] args) throws IOException
    {
        try
        {
            System.out.print("Enter the name of XML document ");   // step 2
            BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
            String file_name = input.readLine();
            File fp = new File(file_name);
            if (fp.exists())
            {
                try
                {
                    // XMLReaderFactory is deprecated since Java 9 but still available.
                    XMLReader reader = XMLReaderFactory.createXMLReader();   // step 3
                    // parse() throws an exception if the document is not well-formed.
                    reader.parse(file_name);
                    System.out.println(file_name + " is well-formed.");
                }
                catch (Exception e)                     // step 4
                {
                    System.out.println(file_name + " is not well-formed.");
                    System.exit(1);
                }
            }
            else
            {
                System.out.println("File is not present: " + file_name);
            }
        }
        catch (IOException ex)
        {
            ex.printStackTrace();
        }
    }
}
File name: sax.xml (not well-formed: the closing </student> tag is missing)

<?xml version="1.0"?>
<student> <name>SHARAN</name>

After Validating:
File name: sax1.xml

<?xml version="1.0"?>
<student>
<name>SHARAN</name>
</student>
