
http://www.tnbedcsvips.in/trb-study-materials/
XML Introduction
1. XML stands for eXtensible Markup Language.
2. SGML (1986) addressed the challenges of electronic publishing.
3. HTML (1991) addressed the challenges of web presentation.
4. XML (1996) addressed the challenges of data representation.
5. W3C stands for World Wide Web Consortium.
6. CSS stands for Cascading Style Sheets.
7. XML is a universal format for structuring documents and data on the Web.
8. Strictly speaking, XML is not a single markup language but a set of rules for creating new markup languages.
9. XML is a subset of the Standard Generalized Markup Language (SGML).
10. SGML specifies the rules for creating markup languages.
11. XML 1.0 became a World Wide Web Consortium (W3C) Recommendation on February 10, 1998.
Is XML a Replacement for HTML?
1. XML is not a replacement for HTML. It is best treated as a complement to HTML.
2. XML is used to represent the data contained in its tags. HTML is used to present web pages.
3. XML is used to carry information. The comparison tables that follow give a fuller answer to this question.
4. Writing an XML file requires only a text editor.
5. We recommend Notepad++ for writing and editing XML files.
6. Save the document with the .xml extension.
Difference between XML and HTML :

HTML | XML
HTML focuses on "How does the data look?" | XML focuses on "What is the data?"
HTML is used to display information. | XML is used to describe information.
HTML tags are predefined. | XML tags can be user defined.
HTML is used to design web pages. | XML is designed to carry data or to create new markup languages.
HTML is case insensitive. | XML is case sensitive.

V.MANIKANDAN. M.Sc.,B.Ed.,M.Phil.,CCNA
Paavai Engineering College,Namakkal-18. E_Mail ID: vmaniapt@Gmail.com

HTML | XML
HTML does not preserve white space. | XML preserves white space.
HTML is not strict. | XML is strict.
HTML is a presentation language. | XML is neither a programming language nor a presentation language.

Aspect | HTML | XML
Definition | Markup language for displaying web pages in a web browser. Designed to display data, with a focus on how the data looks. | Markup language that defines a set of rules for encoding documents that can be read by both humans and machines. Designed with a focus on storing and transporting data.
Date when invented | 1990 | 1996
Extended from | SGML | SGML
Type | Static | Dynamic
Usage | Display a web page. | Transport data between the application and the database. Develop other markup languages.
Processing/Rules | No strict rules. The browser will still render the data to the best of its ability. | Strict rules must be followed, or the processor will stop processing the file.
Language type | Presentation | Neither presentation nor programming
Tags | Predefined | Custom tags can be created by the author
White space | Cannot preserve white space | Preserves white space
Limitations | Data does not know itself very well. Data cannot change in response to its environment. Data cannot be easily maintained. Cannot store or call on variables. Lacks the capability to define new structures by defining relationships between classes. Tags are not useful for exchanging the document between applications. | As a column data type in a database (these points describe the xml data type in Microsoft SQL Server rather than XML in general): cannot be used as a subtype of a sql_variant instance. Does not support casting or converting to either text or ntext. Does not support some column and table constraints. XML provides its own encoding, and collations apply to string types only. Cannot be compared or sorted. Cannot be used in Distributed Partitioned Views. Not well supported by browsers.

How to Parse an XML Document ?
1. An XML processor is more commonly called a parser.
2. An XML parser parses the XML and provides the needed information to the application.
3. An XML parser reads the document character by character and determines which characters are part of the document's markup and which are part of the document's data.
4. The XML parser does all the processing of the XML document before an application can make use of it.
5. Many parsers are available from different vendors.
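The points above can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration under assumptions: the sample document and the names SAMPLE and describe are made up here, and ElementTree is simply a convenient parser to demonstrate with, not one of the parsers named in these notes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are made up for illustration.
SAMPLE = "<Student><name>Pritesh</name><marks>90</marks></Student>"

def describe(xml_text):
    """Parse the text and return (tag, data) pairs for each child element."""
    root = ET.fromstring(xml_text)  # the parser separates markup from data
    return [(child.tag, child.text) for child in root]

pairs = describe(SAMPLE)
print(pairs)
```

The application never sees the angle brackets. It receives tag names (markup) and text (data) that the parser has already separated.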
Some Well Known XML Parsers :
1. Microsoft Internet Explorer Parser
• Microsoft's XML parser is known as MSXML.
• MSXML first shipped with Internet Explorer 4.
• The latest version of the parser is available for download from Microsoft's MSDN site.
• It comes built in with the Internet Explorer browser.
2. Apache Xerces
• The Apache Software Foundation's Xerces subproject of the Apache XML Project has resulted in XML parsers in Java and C++, plus a Perl wrapper for the C++ parser.
• These tools are free, and the code is distributed under the Apache License.
3. James Clark's Expat
• Written entirely in C.
• Free for both private and commercial use.
4. XML4J
• Created and owned by IBM.
• Written entirely in Java.
• Available for free.
Different XML Editors :
• XML Notepad
• XML Cooktop
• XML Pro
• XML Spy
• Liquid XML Studio
Features of XML Editor :
1. Easy XML syntax highlighting.
2. Drag-and-drop tags.
3. More readable XML documents.
4. Reduces the time needed to write an XML document.
5. Helps create well-formed XML documents.
XML Declaration :
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
1. The XML declaration is optional.
2. If present, the XML declaration must be the first line of the XML document.
3. It identifies the document as XML.
4. It states the XML version used to write the document.
5. It states the character encoding used for the document.
6. If the document is standalone, i.e. it does not depend on any external document, specify standalone="yes".
7. The W3C recommends including the XML declaration.
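Points 1 and 2 can be checked with a short sketch, assuming Python's standard xml.etree.ElementTree module as the parser. The helper is_well_formed and the three sample strings are hypothetical names introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

DECLARATION = '<?xml version="1.0" encoding="UTF-8" standalone="no"?>'

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

with_declaration = DECLARATION + "\n<Student/>"
without_declaration = "<Student/>"               # the declaration is optional
declaration_too_late = "\n" + with_declaration   # a blank line comes first

print(is_well_formed(with_declaration))
print(is_well_formed(without_declaration))
print(is_well_formed(declaration_too_late))
```

Only the third string is rejected, because its declaration is no longer at the very start of the document.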
Document Type Definition (DTD)
<!DOCTYPE document SYSTEM "Person.dtd">
1. A Document Type Definition defines the structure of an XML document.
2. A DTD is used to validate an XML document.
3. A DTD can be internal or external.
4. DTD rules state which elements may be nested inside other elements.
Comment
1. Comments are an optional part of an XML document.
2. Comments in XML are written as in HTML, between <!-- and -->
3. Content written inside a comment is ignored by the parser (the comment is not parsed).
4. Comments can appear almost anywhere in an XML document, but not before the XML declaration and not inside a tag.
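A small sketch of point 3, assuming Python's standard xml.etree.ElementTree module. Its default parser discards comments; other parsers may instead hand them to the application as comment nodes. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a comment placed before the first child element.
DOC = "<Student><!-- Here is a comment --><name>Pritesh</name></Student>"

root = ET.fromstring(DOC)
child_tags = [child.tag for child in root]
serialized = ET.tostring(root, encoding="unicode")

print(child_tags)
print(serialized)
```

The comment does not show up as a child of Student and is absent from the re-serialized text.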
Styling and Processing Instruction
<?xml-stylesheet type="text/css" href="Styles.css"?>
1. Processing instructions begin with <? and end with ?>
2. Processing instructions are instructions for the XML processor.
3. Processing instructions are processor dependent, so not all processors understand all processing instructions.
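As a hedged illustration, Python's standard xml.dom.minidom module exposes a processing instruction as a node with a target (the name after <?) and data (everything up to ?>). The sample document below is an assumption; what an application does with the instruction is up to that application.

```python
from xml.dom import minidom

# Hypothetical document: a stylesheet instruction followed by an empty root.
DOC = '<?xml-stylesheet type="text/css" href="Styles.css"?><Student/>'

document = minidom.parseString(DOC)
instruction = document.firstChild  # the instruction precedes the root element

print(instruction.target)
print(instruction.data)
```

The parser does not interpret the data itself. It only passes target and data along, which is why support varies from processor to processor.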

White Space
1. White space consists of space, tab, carriage return and line feed characters.
2. Extra white space between elements does not prevent the document from being parsed.
3. You are free to use white space between elements to lay out the document.
4. The XML recommendation adopts the UNIX convention for line endings: a parser normalizes every line ending to a single linefeed before passing text to the application.
5. This means the end of a line is indicated by a linefeed character only (ASCII code 10).
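The line-ending rule can be observed with a short sketch, assuming Python's standard xml.etree.ElementTree module. The sample text is hypothetical and deliberately uses a Windows-style "\r\n" line ending.

```python
import xml.etree.ElementTree as ET

# "\r\n" is a carriage return plus linefeed; the two spaces after it are data.
DOC = "<note>first line\r\n  second line</note>"

text = ET.fromstring(DOC).text
print(repr(text))
```

The carriage return is gone from the parsed text, while the spaces inside the element content are preserved.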
Note:

XML Declaration: <?xml version="1.0" encoding="UTF-8" standalone="no"?>

Document Type Definition (DTD): <!DOCTYPE document SYSTEM "Person.dtd">

Comment: <!-- Here is a comment -->

Processing Instruction: <?xml-stylesheet type="text/css" href="Styles.css"?>

Root Node: <Student>
Sub Nodes: <Boy> and <Girl>
Sub-Sub Nodes: <name> and <marks>

XML Tree Structure:

Ex:
<Student>
  <Boy name="Pritesh" marks="90"></Boy>
  <Boy name="Pooja" marks="89"></Boy>
</Student>

Root Element :
1. Each XML document must have one and only one root element.
2. All other XML elements must be nested inside the root element.
3. The opening tag of the root element is the opening tag of the document.
4. The closing tag of the root element is the closing tag of the document.
Some Facts:
1. XML is organized as a tree structure.
2. XML can have user-defined tags.
3. An XML document can consist of any number of nodes.
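The tree organisation can be seen by walking a small document, here with Python's standard xml.etree.ElementTree module. The document mirrors the attribute-based Student example from these notes; the variable names are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

DOC = """<Student>
  <Boy name="Pritesh" marks="90"></Boy>
  <Boy name="Pooja" marks="89"></Boy>
</Student>"""

root = ET.fromstring(DOC)  # the single root element of the tree

# Each child of the root is a node; its attributes are name/value pairs.
marks_by_name = {child.get("name"): int(child.get("marks")) for child in root}

print(root.tag)
print(marks_by_name)
```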
Elements & Content
Root element opening tag: <Student>

Child elements and content:

<Student>
  <Boy>
    <name>Pritesh</name>
    <marks>90</marks>
  </Boy>
  <Girl>
    <name>Pooja</name>
    <marks>89</marks>
  </Girl>
</Student>
Root element closing tag: </Student>
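A minimal sketch of reading element content (as opposed to attributes) from a document shaped like the one above, using Python's standard xml.etree.ElementTree module. The variable name rows is an assumption.

```python
import xml.etree.ElementTree as ET

DOC = """<Student>
  <Boy><name>Pritesh</name><marks>90</marks></Boy>
  <Girl><name>Pooja</name><marks>89</marks></Girl>
</Student>"""

root = ET.fromstring(DOC)

# For every child of the root, read the text content of its own children.
rows = [
    (child.tag, child.findtext("name"), int(child.findtext("marks")))
    for child in root
]
print(rows)
```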
XML Syntax Rules : Different XML Parsing Rules
1. The differences between XML and HTML were discussed earlier.
2. HTML tags are not case sensitive. XML tags are case sensitive.
3. XML has a few syntax rules that are simple and logical.
XML Rules are :
1. An XML document must have exactly one root element.
2. XML tags must be closed.
3. XML tags are case sensitive.
4. XML tags must be properly nested.
5. XML attribute values must be quoted.
XML Syntax Rules in Detail :
XML Document Must have One and Only One Root Element
• An XML document consists of elements arranged like a tree.
• The document must have one and only one root element.
XML Tags Must be Closed
• In HTML, many elements do not have closing tags.
• Every XML element must have a closing tag.
• An unclosed tag is illegal in XML.
XML Tags are Case Sensitive
• All XML tags are case sensitive.
• The closing tag must match the opening tag in both spelling and case.
XML Tags Must be Properly Nested
• HTML tolerates improperly nested elements and does not report an error.
• XML checks element nesting strictly.
• The parser reports a parsing error if it finds improperly nested elements.
What is XML Attribute ?
1. XML attributes are just like HTML attributes.
2. XML attributes provide additional information about an XML element.
3. Attributes consist of name/value pairs associated with an element.
4. Attributes are attached to the start-tag, but not to the end-tag.
Well-Formed XML Document

Rule No | Explanation
Rule No 1 | XML document must have a single root element
Rule No 2 | XML elements must have a closing tag
Rule No 3 | XML tags are case sensitive
Rule No 4 | XML elements must be properly nested
Rule No 5 | XML attribute values must be quoted
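Each of the five rules can be tried against a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper is_well_formed and the sample strings in CASES are hypothetical, written only to break one rule each.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

CASES = {
    "well formed": '<Book pages="100"><name>Learn XML</name></Book>',
    "rule 1, two root elements": "<Book></Book><Book></Book>",
    "rule 2, missing closing tag": "<Book><name>Learn XML</Book>",
    "rule 3, case mismatch": "<Book></book>",
    "rule 4, improper nesting": "<Book><name><author></name></author></Book>",
    "rule 5, unquoted attribute": "<Book pages=100></Book>",
}

results = {label: is_well_formed(text) for label, text in CASES.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the first case is accepted. Unlike an HTML browser, the parser stops at the first violation instead of guessing what was meant.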

XML DTD Introduction:

1. DTD stands for Document Type Definition.
2. A DTD defines the structure of a document encoded in XML.
3. A DTD defines the legal building blocks of an XML document.
4. A DTD provides the list of all legal elements and attributes allowed in the XML file.
5. A DTD can be declared inside the XML document or defined in an external DTD file.

Main XML DTD Building Blocks :

Building Block | Explanation
Elements | Main nodes and the main building blocks
Attributes | Extra information about an element
Entities | Special characters in XML
PCDATA | Parsed character data
CDATA | Character data

XML Elements :
Ex:
<Book pages="100">
  <name>Learn XML</name>
  <author>Pritesh</author>
  <type>Scripting</type>
</Book>

Element | Name of the Element
Root Element | Book
Child Elements | name, author, type

XML Entities :

Entity Reference | Character
&lt; | <
&gt; | >
&amp; | &
&quot; | "
&apos; | '
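The five entity references can be seen being expanded by a parser. The sketch assumes Python's standard xml.etree.ElementTree module for parsing and xml.sax.saxutils.escape for the reverse direction; the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing replaces each entity reference with the character it stands for.
DOC = "<note>&lt;b&gt; &amp; &quot;quoted&quot; &apos;text&apos;</note>"
text = ET.fromstring(DOC).text
print(text)

# Going the other way: escape() replaces &, < and > with entity references
# so that the text can be placed safely inside an element.
escaped = escape("marks < 90 & marks > 35")
print(escaped)
```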

PCDATA / CDATA :

PCDATA | CDATA
PCDATA means parsed character data. | CDATA means character data.
Generally used for elements. | Generally used for attributes.
PCDATA is text that will be parsed by a parser. | CDATA is text that will not be parsed by a parser.
Tags inside the text will be treated as markup and entities will be expanded. | Tags inside the text will NOT be treated as markup and entities will not be expanded.
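The parsed versus unparsed distinction can be illustrated with a CDATA section in element content, assuming Python's standard xml.etree.ElementTree module. Note the assumption: a CDATA section in a document is related to, but not the same thing as, the CDATA attribute type declared in a DTD. The sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET

# Ordinary (parsed) content: <b> becomes a child element, &amp; becomes "&".
parsed = ET.fromstring("<note><b>bold</b> &amp; more</note>")

# CDATA section: the same characters are kept as literal text.
literal = ET.fromstring("<note><![CDATA[<b>bold</b> &amp; more]]></note>")

print(len(parsed), parsed[0].tag, repr(parsed[0].tail))
print(len(literal), repr(literal.text))
```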

DTD Elements :
DTD elements are declared with an ELEMENT declaration.
<!ELEMENT element-name keyword> OR
<!ELEMENT element-name (element-content)>
Different ways of using DTD Element :
A DTD element can be declared in any of the following ways:

DTD Element | Usage
Empty elements | Using the EMPTY keyword
Elements with parsed character data | Using #PCDATA
Elements with any contents | Using the ANY keyword
Elements with children | Using a list wrapped inside ()
Element with only one occurrence | Using the element name only
Element with minimum one occurrence | Using the "+" operator
Element with zero or more occurrences | Using the "*" operator
Element with zero or one occurrence | Using the "?" operator
Element with either/or choice | Using the "|" operator
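The occurrence operators behave much like their regular-expression namesakes. The sketch below is not a DTD validator: it hand-translates one hypothetical content model, (name, author+, type?, note*, (ebook | print)), into a Python regular expression and checks the order of child elements against it. The model, the element names and the helper children_match are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical declaration being imitated:
#   <!ELEMENT Book (name, author+, type?, note*, (ebook | print))>
# Each child tag name is followed by one space in the string that is matched.
BOOK_MODEL = re.compile(r"name (author )+(type )?(note )*(ebook |print )")

def children_match(xml_text, model=BOOK_MODEL):
    """Return True if the root's child tags, in order, satisfy the model."""
    root = ET.fromstring(xml_text)
    sequence = "".join(child.tag + " " for child in root)
    return model.fullmatch(sequence) is not None

print(children_match("<Book><name/><author/><author/><print/></Book>"))
print(children_match("<Book><name/><print/></Book>"))
```

The first document satisfies the model. The second does not, because "+" requires at least one author element.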

Empty Elements :
The EMPTY keyword declares an empty element, using the following syntax:
<!ELEMENT element-name EMPTY>

Example :

DTD Example | XML Example
<!ELEMENT br EMPTY> | <br />
<!ELEMENT hr EMPTY> | <hr />

Internal DTD Declaration :

DTD Element | Explanation
!DOCTYPE Book | Defines that the root element of this document is Book
!ELEMENT Book | Tells that the Book element contains four elements: "name,author,type,pages"
!ELEMENT name | Tells that the name element is of type "#PCDATA"
!ELEMENT author | Tells that the author element is of type "#PCDATA"
!ELEMENT type | Tells that the type element is of type "#PCDATA"
!ELEMENT pages | Tells that the pages element is of type "#PCDATA"
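Put together, the declarations in the table could form a document like the one embedded below. This is a reconstruction under assumptions: the sample values are made up, and Python's standard xml.etree.ElementTree module reads the internal DTD but does not validate against it, so a validating parser would be needed to enforce the rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with an internal DTD matching the table above.
DOC = """<?xml version="1.0"?>
<!DOCTYPE Book [
  <!ELEMENT Book (name,author,type,pages)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT type (#PCDATA)>
  <!ELEMENT pages (#PCDATA)>
]>
<Book>
  <name>Learn XML</name>
  <author>Pritesh</author>
  <type>Scripting</type>
  <pages>100</pages>
</Book>"""

root = ET.fromstring(DOC)
child_tags = [child.tag for child in root]
print(root.tag, child_tags)
```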


External DTD Declaration :
If the DTD is written in a separate file, reference it from the XML document with a DOCTYPE declaration using the following syntax:

<!DOCTYPE root-element SYSTEM "filename">
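As a hedged illustration of the syntax above, Python's standard xml.dom.minidom module records the root element name and the file name from the DOCTYPE declaration. The root name Student and the file name Person.dtd are assumptions for this sketch, and minidom is non-validating, so the DTD file is neither required nor read here.

```python
from xml.dom import minidom

# Hypothetical document that points to an external DTD file.
DOC = '<!DOCTYPE Student SYSTEM "Person.dtd"><Student/>'

document = minidom.parseString(DOC)
doctype = document.doctype

print(doctype.name)      # the root-element part of the declaration
print(doctype.systemId)  # the "filename" part of the declaration
```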
