Introduction to XML

by Nikita Bais

Table Of Contents
◦ Markup Languages
◦ What is XML?
◦ The Difference Between XML and HTML
◦ How Can XML be Used?
◦ XML Structure
◦ XML Syntax
◦ Valid vs Well Formed XML
◦ Document Type Definition (DTD)

Markup Languages

Markup: the tagging of electronic documents. Markup is used to:
◦ Modify the look and formatting of a document (e.g. bold and italic fonts, font sizes, text indents)
◦ Set up the structure of a document and define its semantic meaning
Examples of formats that use markup: HTML, RTF, SGML, XML

Classification of Markup Languages
◦ Specific Markup Language
  Generates output that is specific to a particular application. Ex: HTML, RTF

◦ Generalized Markup Language
  Describes only the structure of a document, not its formatting; syntax is strictly enforced. Ex: SGML, XML

XML Basics

What is XML?
◦ XML stands for EXtensible Markup Language
◦ XML was designed to carry data, not to display data
◦ XML tags are not predefined. Users can define their own tags.
◦ XML is designed to be self-descriptive
◦ XML is a W3C Recommendation
◦ XML documents can be validated using a DTD
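
To make "users can define their own tags" concrete, here is a minimal Python sketch using the standard library module xml.etree.ElementTree. The tag names EMAIL, TO and SUBJECT are borrowed from examples later in this deck; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Build a small document out of tags we made up ourselves.
email = ET.Element("EMAIL")
ET.SubElement(email, "TO").text = "Ashish"
ET.SubElement(email, "SUBJECT").text = "Meeting Reminder"

# Serialize to text: the tags describe the data, not how to display it.
xml_text = ET.tostring(email, encoding="unicode")
print(xml_text)

# Any XML parser can read the data back by tag name.
parsed = ET.fromstring(xml_text)
recipient = parsed.find("TO").text
```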

Difference Between XML and HTML

XML is not a replacement for HTML; it complements HTML. The two were designed with different goals:
◦ XML was designed to transport and store data, with a focus on what the data is
◦ HTML was designed to display data, with a focus on how the data looks

HTML is about displaying information, while XML is about carrying information.

How can XML be used?

◦ XML separates data from HTML
◦ XML simplifies data sharing
◦ XML simplifies data transport
◦ XML is used to create new Internet languages

XML Structure

An XML document has both a logical and a physical structure.

Logical Structure: indicates how the document is built, as opposed to what the document contains.

Physical Structure: the content used in the document.

XML – Logical Structure


Prolog
◦ The first structural element in an XML document; it is optional.

The prolog consists of two basic components:
◦ The XML declaration (all in lower case): <?xml version="1.0"?>
◦ The Document Type Declaration: <!DOCTYPE root-element SYSTEM "filename">

Document Element
◦ Follows the prolog
◦ The heart of the XML document, where the actual content resides
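
The prolog and the document element can be inspected separately once a document is parsed. The sketch below uses Python's standard xml.dom.minidom; the sample document and the file name email.dtd are assumptions taken from the EMAIL examples later in this deck, and the parser does not fetch the external file.

```python
from xml.dom import minidom

document_text = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE EMAIL SYSTEM "email.dtd">\n'
    '<EMAIL><TO>Ashish</TO></EMAIL>'
)

doc = minidom.parseString(document_text)

# Prolog, part 1: the XML declaration.
xml_version = doc.version
# Prolog, part 2: the document type declaration.
doctype_name = doc.doctype.name
doctype_system_id = doc.doctype.systemId
# The document element follows the prolog and holds the content.
root_name = doc.documentElement.tagName
```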

XML – Physical Structure
The physical structure of an XML document is composed of all the content used in the document.
◦ The data is stored in the form of entities

Ex: Predefined entities in XML

Entity Reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
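
As a small illustration of the table above, the following Python sketch escapes the five special characters and lets a parser turn the references back into characters. It uses xml.sax.saxutils.escape and xml.etree.ElementTree from the standard library; the element name text is made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "< > & \" '"

# escape() handles &, < and > by default; the two quote characters
# are passed in explicitly as extra replacements.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# The parser replaces each entity reference with its character.
round_trip = ET.fromstring("<text>" + escaped + "</text>").text
```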

What is an Entity?
◦ Entities are storage units
◦ Each entity is identified by a unique name
◦ Entities are declared in the DTD and can be used anywhere in the XML document
◦ The processor retrieves the contents of an entity when it is referenced in the XML document

Entity declaration
Syntax: <!ENTITY entity-name "entity-value">
Example
DTD: <!ENTITY writer "Donald Duck.">
XML document: <author>&writer;</author>

(An entity reference has three parts: an ampersand (&), the entity name, and a semicolon (;).)
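
The replacement of &writer; can be observed with a parser. This is a minimal sketch using Python's xml.etree.ElementTree, which expands internal entities declared in the document's own DTD subset; wrapping the declaration in a DOCTYPE for the author element is an assumption made so the example is a complete document.

```python
import xml.etree.ElementTree as ET

document_text = """<!DOCTYPE author [
  <!ENTITY writer "Donald Duck.">
]>
<author>&writer;</author>"""

author = ET.fromstring(document_text)

# The reference &writer; has been replaced by the entity's value.
author_text = author.text
```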

Internal and External Entities

Internal entities
◦ Require no separate storage
◦ Contents are provided in the declaration itself

Syntax: <!ENTITY entity-name "entity-value">
Example: <!ENTITY writer "Donald Duck.">

External Entities

◦ Require separate storage
◦ Refer to a storage unit in their declaration by using a SYSTEM or PUBLIC identifier

<!ENTITY entity-name SYSTEM "URI/URL">


◦ In addition to a SYSTEM identifier, an entity declaration can include a PUBLIC identifier
◦ The PUBLIC identifier provides an alternative way to retrieve the content of an entity
◦ The PUBLIC identifier is useful when working with an entity that is publicly available

Ex: <!ENTITY MyImage PUBLIC "-//Images//Text Standard Images//EN" "" NDATA GIF>

Parsed Entity
◦ An entity made up of parsable text (any text data)
◦ The XML processor extracts the content of the entity
◦ The content of the entity appears at the location of the entity reference in the XML document
Example: <!ENTITY writer "Donald Duck.">
This declares an entity named "writer" that contains "Donald Duck."

A reference to the "writer" entity (&writer;) is replaced with "Donald Duck."

Unparsed Entity
◦ An entity that cannot be parsed by the XML processor
◦ Its content may or may not be text; if it is text, it is not parsable XML text
◦ Sometimes referred to as a binary entity, as its content is often a binary file (e.g. an image)
◦ Requires a notation that identifies the format, or type, of the resource the entity refers to

Entity declaration: <!ENTITY MyImage SYSTEM "sunset.gif" NDATA GIF>
Notation declaration: <!NOTATION GIF SYSTEM "//Utils/Gifview.exe">
(This indicates that Gifview.exe should be used to process entities of type GIF.)
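
A parser does not read the content of an unparsed entity; it only reports the declarations to the application. The sketch below shows this with Python's xml.parsers.expat. The surrounding gallery element and its picture attribute are hypothetical, added only so that the two declarations from the slide sit inside a complete document.

```python
from xml.parsers import expat

document_text = """<!DOCTYPE gallery [
  <!NOTATION GIF SYSTEM "//Utils/Gifview.exe">
  <!ENTITY MyImage SYSTEM "sunset.gif" NDATA GIF>
  <!ELEMENT gallery EMPTY>
  <!ATTLIST gallery picture ENTITY #IMPLIED>
]>
<gallery picture="MyImage"/>"""

notations = {}
unparsed_entities = {}

def on_notation(name, base, system_id, public_id):
    # Called once for each <!NOTATION ...> declaration.
    notations[name] = system_id

def on_entity(name, is_parameter_entity, value, base,
              system_id, public_id, notation_name):
    # Unparsed entities carry a notation name and no replacement text.
    if notation_name is not None:
        unparsed_entities[name] = (system_id, notation_name)

parser = expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.EntityDeclHandler = on_entity
parser.Parse(document_text, True)
```

The application, not the parser, decides what to do with sunset.gif and the helper named by the GIF notation.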

XML Syntax

Opening and Closing tags
XML requires a closing tag for every element. Example:
<EMAIL>
  <TO>Ashish</TO>
  …………….
</EMAIL>

◦ Shortcut for an empty element (an element containing no data)
Example:
If the "CC" element contains no data, it can be declared as: <CC></CC> or <CC/>
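
A quick check that the two spellings mean the same thing to a parser, sketched with Python's xml.etree.ElementTree (variable names are illustrative):

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<CC></CC>")
short_form = ET.fromstring("<CC/>")

# Both forms parse to an element with no text and no children,
# and both serialize to the same bytes.
same_serialization = ET.tostring(long_form) == ET.tostring(short_form)
```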

◦ Attributes provide a way of associating values with an element
◦ XML elements can have attributes in name/value pairs, just like in HTML

<EMAIL DATE="14/02/2011"> </EMAIL>
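
Reading the attribute back is straightforward. A minimal sketch with Python's xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

email = ET.fromstring('<EMAIL DATE="14/02/2011"> </EMAIL>')

# Attributes are exposed as name/value pairs.
date = email.get("DATE")
all_attributes = dict(email.attrib)
```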

Valid Vs Well Formed XML

Valid XML
◦ XML that has been validated against a DTD is "valid" XML
◦ It obeys all the validity constraints identified in the XML specification
Example: Validity constraint: Required Attribute
If the default declaration is the keyword #REQUIRED, then the attribute must be specified for all elements of the type named in the attribute-list declaration.

Syntax: <!ATTLIST element-name attribute-name attribute-type #REQUIRED>
DTD: <!ATTLIST person number CDATA #REQUIRED>
Valid XML: <person number="5677" />
Invalid XML: <person />
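
Python's standard library parsers do not validate against a DTD, so the sketch below is not a DTD validator. It is a hand-written check, under that stated limitation, that mimics the single #REQUIRED constraint from the example; the function name missing_required is hypothetical.

```python
import xml.etree.ElementTree as ET

def missing_required(document_text, element_name, attribute_name):
    """Return the elements named element_name that lack attribute_name."""
    root = ET.fromstring(document_text)
    # iter() visits the root itself as well as all descendants.
    return [el for el in root.iter(element_name)
            if attribute_name not in el.attrib]

valid_doc = '<person number="5677" />'
invalid_doc = '<person />'

valid_problems = missing_required(valid_doc, "person", "number")
invalid_problems = missing_required(invalid_doc, "person", "number")
```

A validating parser would perform this check, and every other validity constraint, directly from the ATTLIST declaration.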

Well formed XML
◦ An XML document with correct XML syntax
◦ XML syntax rules:
  XML documents must have a root element
  XML elements must have a closing tag
  XML tags are case sensitive
  XML elements must be properly nested
  XML attribute values must be quoted

Well Formed XML Example

<?xml version="1.0" ?>
<EMAIL>
  <TO>Ashish</TO>
  <CC>Rahul</CC>
  <SUBJECT>Meeting Reminder</SUBJECT>
  <BODY>Group Meeting at 4.00 PM</BODY>
</EMAIL>
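
Well-formedness can be checked without any DTD: a parser either accepts the document or reports a syntax error. The sketch below uses Python's xml.etree.ElementTree; the helper name is_well_formed and the broken samples are made up to exercise each syntax rule listed above.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document_text):
    """Return True if the text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(document_text)
    except ET.ParseError:
        return False
    return True

good = """<?xml version="1.0" ?>
<EMAIL>
  <TO>Ashish</TO>
  <CC>Rahul</CC>
  <SUBJECT>Meeting Reminder</SUBJECT>
  <BODY>Group Meeting at 4.00 PM</BODY>
</EMAIL>"""

# One deliberately broken sample per syntax rule.
broken = {
    "two root elements": "<TO>Ashish</TO><CC>Rahul</CC>",
    "missing closing tag": "<EMAIL><TO>Ashish</EMAIL>",
    "case mismatch": "<TO>Ashish</to>",
    "bad nesting": "<EMAIL><TO><CC></TO></CC></EMAIL>",
    "unquoted attribute": "<EMAIL DATE=14/02/2011></EMAIL>",
}

results = {rule: is_well_formed(text) for rule, text in broken.items()}
```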

Benefits of well-formedness
◦ The client saves the time of downloading the DTD if the server has already validated the XML document against it.
◦ Where validation is not required, the focus is only on the structure of the document.
(Note: a valid document = a well-formed document that also satisfies all validity constraints)

Document Type Definition (DTD)

Document Classes
◦ Background of the design of XML
◦ Relates to OOP
◦ Conceptual use of inheritance and polymorphism
◦ Example: base class Book with the properties NumberOfChapters and CoverLetter

Inheritance (Book and its subclasses)
[Figure: class diagram. Subclasses of Book, such as Recipe, inherit NumberOfChapters and CoverLetter and set their own values, e.g. NumberOfChapters (value 10) with CoverLetter (value Red), and NumberOfChapters (value 21) with CoverLetter (value Blue).]

ArtBook: CoverLetter(Value Blue, Pattern pt)
Class ArtBook overloads the CoverLetter property of the base class Book: it accepts color patterns in addition to color values.


A DTD:
◦ Acts as a rule book that allows an author to create new documents of the same type and with the same characteristics as a base document
◦ Defines the building blocks of an XML document
◦ Defines the document structure with a list of elements and attributes


Example: a DTD created for the medical community.
Documents created with this DTD can contain Patient Name, Medical History, Medications and so on. This information can be easily read by any medical institution that supports an XML-based document system.


DTD structure
◦ Internal DTD (subset)
  A DTD declared inside the XML document
  <!DOCTYPE root-element [element-declarations]>

◦ External DTD (subset)
  A DTD declared in an external file that is referenced from the XML document
  <!DOCTYPE root-element SYSTEM "filename">
(Note: If the document contains both types of DTD, the internal subset takes precedence over the external subset.)

Internal DTD

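In this example, the EMAIL DTD is created in the XML document itself.
<?xml version="1.0" ?>
<!DOCTYPE EMAIL [
  <!ELEMENT EMAIL (TO, FROM, CC, SUBJECT, BODY)>
  <!ELEMENT TO (#PCDATA)>
  <!ELEMENT FROM (#PCDATA)>
  <!ELEMENT CC (#PCDATA)>
  <!ELEMENT SUBJECT (#PCDATA)>
  <!ELEMENT BODY (#PCDATA)>
]>
<EMAIL> ... </EMAIL>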

Interpretation of DTD

◦ !DOCTYPE EMAIL defines that the root element of this document is EMAIL
◦ !ELEMENT EMAIL defines that the EMAIL element contains five elements: "TO, FROM, CC, SUBJECT, BODY"
◦ !ELEMENT TO defines the TO element to be of type "#PCDATA"
◦ !ELEMENT FROM defines the FROM element to be of type "#PCDATA"
◦ !ELEMENT CC defines the CC element to be of type "#PCDATA"
◦ !ELEMENT SUBJECT defines the SUBJECT element to be of type "#PCDATA"
◦ !ELEMENT BODY defines the BODY element to be of type "#PCDATA"

External DTD

In the following example, the DTD is created separately in the file "email.dtd" and referenced from the XML document:
<?xml version="1.0"?>
<!DOCTYPE EMAIL SYSTEM "email.dtd">
<EMAIL>
  <TO></TO>
  <FROM></FROM>
  <CC></CC>
  <SUBJECT>My First DTD</SUBJECT>
  <BODY>Hello World</BODY>
</EMAIL>

Here the file "email.dtd" contains the EMAIL DTD.


The Building Blocks of XML Documents
From a DTD point of view, all XML documents (and HTML documents) are made up of the following building blocks:
◦ Elements
◦ Attributes
◦ Entities
◦ PCDATA
◦ CDATA


Element Declarations
Syntax:
<!ELEMENT element-name category>
or
<!ELEMENT element-name (element-content)>

Empty Elements
Empty elements are declared with the category keyword EMPTY:
<!ELEMENT element-name EMPTY>
Example: <!ELEMENT br EMPTY>
XML example: <br />


Elements with Parsed Character Data
Elements with only parsed character data are declared with #PCDATA inside parentheses:
<!ELEMENT element-name (#PCDATA)>
Example: <!ELEMENT FROM (#PCDATA)>


Elements with any Contents
Elements declared with the category keyword ANY can contain any combination of parsable data:
<!ELEMENT element-name ANY>
Example: <!ELEMENT EMAIL ANY>


Elements with Children (sequences)
Elements with one or more children are declared with the names of the child elements inside parentheses:
<!ELEMENT element-name (child1)>
or
<!ELEMENT element-name (child1,child2,...)>
Example: <!ELEMENT EMAIL (TO, FROM, CC, SUBJECT, BODY)>
(NOTE: When children are declared in a sequence separated by commas, the children must appear in the same sequence in the document.)


Declaring Only One Occurrence of an Element
<!ELEMENT element-name (child-name)>
Example: <!ELEMENT EMAIL (BODY)>
The example above declares that the child element "BODY" must occur once, and only once, inside the "EMAIL" element.


Declaring Minimum One Occurrence of an Element
<!ELEMENT element-name (child-name+)>
Example: <!ELEMENT EMAIL (BODY+)>
The + sign in the example above declares that the child element "BODY" must occur one or more times inside the "EMAIL" element.

Declaring Zero or More Occurrences of an Element
<!ELEMENT element-name (child-name*)>
Example: <!ELEMENT EMAIL (BODY*)>
The * sign in the example above declares that the child element "BODY" can occur zero or more times inside the "EMAIL" element.


Declaring Zero or One Occurrences of an Element
<!ELEMENT element-name (child-name?)>
Example: <!ELEMENT EMAIL (BODY?)>
The ? sign in the example above declares that the child element "BODY" can occur zero or one time inside the "EMAIL" element.


Declaring either/or Content
Example: <!ELEMENT EMAIL (TO,FROM,CC,SUBJECT,(MESSAGE|BODY))>
The example above declares that the "EMAIL" element must contain a "TO" element, a "FROM" element, a "CC" element, a "SUBJECT" element, and either a "MESSAGE" or a "BODY" element.
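
The sequence, occurrence (+, *, ?) and either/or rules behave like a regular expression over the names of the child elements. The sketch below is a teaching aid built on that observation, not a DTD validator: it translates a content model into a Python regex and matches a list of child names against it. The function names model_to_regex and children_match are hypothetical, and mixed content with #PCDATA is deliberately not handled.

```python
import re

def model_to_regex(content_model):
    """Translate a DTD children content model into a regex over child names."""
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[(),|?*+]", content_model):
        if token == ",":
            continue  # a sequence is plain concatenation in a regex
        if token == "(":
            parts.append("(?:")
        elif token in (")", "|", "?", "*", "+"):
            parts.append(token)  # same meaning in DTDs and regexes
        else:
            # each child name is matched together with a trailing space
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))

def children_match(content_model, child_names):
    """True if the ordered list of child element names fits the model."""
    text = "".join(name + " " for name in child_names)
    return model_to_regex(content_model).fullmatch(text) is not None

in_order = ["TO", "FROM", "CC", "SUBJECT", "BODY"]
swapped = ["FROM", "TO", "CC", "SUBJECT", "BODY"]
sequence_ok = children_match("(TO, FROM, CC, SUBJECT, BODY)", in_order)
sequence_bad = children_match("(TO, FROM, CC, SUBJECT, BODY)", swapped)
```

For instance, children_match("(BODY+)", []) is False because + demands at least one BODY, while children_match("(BODY*)", []) is True.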


Declaring Mixed Content
Example: <!ELEMENT EMAIL (#PCDATA|TO|FROM|CC|SUBJECT|BODY)*>
The example above declares that the "EMAIL" element can contain zero or more occurrences of parsed character data, "TO", "FROM", "CC", "SUBJECT" or "BODY" elements.


Declaring Attributes
An attribute declaration has the following syntax:
<!ATTLIST element-name attribute-name attribute-type default-value>

DTD example: <!ATTLIST person number CDATA "0000">
XML example: <person number="5677" />
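
The effect of a default value can be seen when the attribute is left out. In this sketch, Python's xml.etree.ElementTree (built on the Expat parser) reads the ATTLIST declaration from an internal DTD subset and supplies "0000" for a person element without a number; placing the declaration in an internal subset is an assumption made so the example is self-contained.

```python
import xml.etree.ElementTree as ET

dtd = '<!DOCTYPE person [<!ATTLIST person number CDATA "0000">]>'

with_value = ET.fromstring(dtd + '<person number="5677" />')
without_value = ET.fromstring(dtd + '<person />')

# An explicit value wins; otherwise the declared default is reported.
explicit_number = with_value.get("number")
default_number = without_value.get("number")
```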

THANK YOU!!!!!!!