Understanding XML

Introduction to XML
What Is XML?
XML is a text-based markup language that is fast becoming the standard for data interchange on the Web. As with HTML, you identify data using tags (identifiers enclosed in angle brackets, like this: <...>). Collectively, the tags are known as "markup".

Why Is XML Important?


Data Identification: XML tells you what kind of data you have, not how to display it.
Stylability: When display is important, the stylesheet standard, XSL, lets you dictate how to portray the data.
Inline Reusability: One of the nicer aspects of XML documents is that they can be composed from separate entities.
Linkability: Thanks to HTML, the ability to define links between documents is now regarded as a necessity.

Why Is XML Important? (Contd..)


Easily Processed: As mentioned earlier, regular and consistent notation makes it easier to build a program to process XML data.
Hierarchical: Finally, XML documents benefit from their hierarchical structure. Hierarchical document structures are, in general, faster to access because you can drill down to the part you need, like stepping through a table of contents.
Plain Text: Since XML is not a binary format, you can create and edit files with anything from a standard text editor to a visual development environment.

Why Is XML Important? (Contd..)


Exchange data in a platform-independent way.
Share data between programs without prior coordination.
W3C-standardized APIs.
Based on the Universal Character Set (Unicode).
Web language specifications use XML.

Differences Between XML & HTML

Data vs. presentation
Parser implementation
Predefined tags

XML Vocabularies
Scientific vocabularies (CML)
Business vocabularies (EDI)
Legal vocabularies (Extensible Forms Description Language)
Medical vocabularies (HL7)
Computer vocabularies:
    Channel Definition Format (CDF)
    Structured Graph Format (SGF)
    Bean Markup Language (BML)
    Open Software Description (OSD)
    XML Metadata Interchange (XMI)
    Call Policy Markup Language

Here is an example of some XML data you might use for a messaging application:
<message>
    <to>you@yourAddress.com</to>
    <from>me@myAddress.com</from>
    <subject>XML Is Really Cool</subject>
    <text>
        How many ways is XML cool? Let me count the ways...
    </text>
</message>
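
A program can read such a message with any XML parser. The following is a minimal sketch, assuming Python and its standard-library xml.etree.ElementTree module; the variable names are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET

MESSAGE_XML = """
<message>
  <to>you@yourAddress.com</to>
  <from>me@myAddress.com</from>
  <subject>XML Is Really Cool</subject>
  <text>How many ways is XML cool? Let me count the ways...</text>
</message>
"""

# Parse the text into an element tree; the root element is <message>.
message = ET.fromstring(MESSAGE_XML.strip())

# Each piece of data is identified by the tag that encloses it.
recipient = message.findtext("to")
subject = message.findtext("subject")
print(recipient, "-", subject)
```

The program asks for data by tag name, which is what "data identification" buys you: no knowledge of layout or position is needed.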

Tags and Attributes


Tags can also contain attributes:
<message to="you@yourAddress.com" from="me@myAddress.com"
         subject="XML Is Really Cool">
    <text>
        How many ways is XML cool? Let me count the ways...
    </text>
</message>
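
Attributes are read differently from child elements. A small sketch, again assuming Python's standard-library xml.etree.ElementTree (the variable names are hypothetical):

```python
import xml.etree.ElementTree as ET

message = ET.fromstring(
    '<message to="you@yourAddress.com" from="me@myAddress.com" '
    'subject="XML Is Really Cool">'
    "<text>How many ways is XML cool? Let me count the ways...</text>"
    "</message>"
)

# Attributes are exposed as a dictionary on the element that carries them.
sender = message.get("from")
attribute_names = sorted(message.attrib)

# Element content is still reached through child elements.
body = message.findtext("text")
print(sender, attribute_names)
```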

Empty Tags
One really big difference between XML and HTML is that an XML document is always constrained to be well formed. In particular, every opening tag must have a matching closing tag. Sometimes, though, it makes sense to have a tag that stands by itself. For example, you might want to add a "flag" tag that marks the message as important. An empty tag ends with "/>" and needs no separate closing tag:
<message to="you@yourAddress.com" from="me@myAddress.com"
         subject="XML Is Really Cool">
    <flag/>
    <text>
        How many ways is XML cool? Let me count the ways...
    </text>
</message>
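
The well-formedness rule can be observed with any parser. The sketch below assumes Python's standard-library xml.etree.ElementTree; is_well_formed is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# <flag> is opened but never closed, so the document is not well formed.
unclosed = "<message><flag><text>Hi</text></message>"

# <flag/> is an empty tag: it opens and closes in one step.
with_empty_tag = "<message><flag/><text>Hi</text></message>"

print(is_well_formed(unclosed))        # False
print(is_well_formed(with_empty_tag))  # True
```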

Comments in XML Files


XML comments look just like HTML comments:
<message to="you@yourAddress.com" from="me@myAddress.com"
         subject="XML Is Really Cool">
    <!-- This is a comment -->
    <text>
        How many ways is XML cool? Let me count the ways...
    </text>
</message>

The XML Prolog


<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>

Processing Instructions
An XML file can also contain processing instructions that give commands or information to an application that is processing the XML data. Processing instructions have the following format:
<?target instructions?>
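
An application sees a processing instruction as a target plus a string of instruction text. A minimal sketch, assuming Python's standard-library xml.dom.minidom; the target "my-app" and its instruction text are invented for illustration.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?my-app color="red"?>'  # hypothetical target and instruction text
    "<message/>"
)

# Collect (target, instructions) pairs from the top level of the document.
instructions = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
print(instructions)
```

The XML declaration on the first line looks similar but is not reported as a processing instruction.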

How Can You Use XML?


Traditional data processing, where XML encodes the data for a program to process.
Document-driven programming, where XML documents are containers that build interfaces and applications from existing components.
Archiving: the foundation for document-driven programming, where the customized version of a component is saved (archived) so it can be used later.
Binding, where the DTD or schema that defines an XML data structure is used to automatically generate a significant portion of the application that will eventually process that data.

Basic Standards
SAX: Simple API for XML
DOM: Document Object Model
DTD: Document Type Definition
Namespaces: The namespace standard lets you write an XML document that uses two or more sets of XML tags
XSL: Extensible Stylesheet Language
XSLT (+XPath): Extensible Stylesheet Language for Transformations
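
The difference between the first two standards is easiest to see side by side: SAX reports events while it reads, whereas DOM builds a tree that is navigated afterwards. The sketch below assumes Python's standard-library xml.sax and xml.dom.minidom; TagCollector is a hypothetical handler class.

```python
import xml.sax
from xml.dom import minidom

XML_TEXT = "<message><to>you</to><from>me</from></message>"

class TagCollector(xml.sax.ContentHandler):
    """SAX style: the parser calls back once per start tag as it reads."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)

collector = TagCollector()
xml.sax.parseString(XML_TEXT.encode("utf-8"), collector)
sax_tags = collector.tags

# DOM style: the whole document becomes a tree that is walked afterwards.
dom = minidom.parseString(XML_TEXT)
dom_tags = [node.tagName for node in dom.documentElement.childNodes]

print(sax_tags, dom_tags)
```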

Schema Standards
A DTD makes it possible to validate the structure of relatively simple XML documents, but that's as far as it goes.
[Figure: DTD, Validation, XML]

Document Type Definition (DTD)


Data sent along with a DTD is known as valid XML.
In this case, an XML parser could check incoming data against the rules defined in the DTD to make sure the data was structured correctly.

Data sent without a DTD is known as well-formed XML.


Here an XML-based document instance, such as the hierarchically structured weather data shown, can be used to implicitly describe itself.

DTD for our Simple XML


A DTD consists of a left square bracket character ([) followed by a series of markup declarations, followed by a right square bracket character (]). The name in the DOCTYPE declaration must match the root element exactly, including case.
<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE Simple [
    <!ELEMENT Simple ANY>
]>
<Simple>
    This is the simplest XML document
</Simple>

DTD Declarations
Element type declarations
Attribute-list declarations
Entity declarations
Notation declarations
Processing instructions
Comments
Parameter entity references

Element Type <!ELEMENT Name Contentspec>


<!ELEMENT Title (#PCDATA)>
Title is permitted to have only character data

<!ELEMENT General ANY>


General can contain any declared elements and character data, in any order

<!ELEMENT Image EMPTY>


The Image element must be empty. It cannot have any content

<!ELEMENT Book (Title, Author, Publisher)>

Book element must have all 3 elements, in the order given

<!ELEMENT Prerequisite (BE | ME | MS)>

Prerequisite element must have exactly one of the above

<!ELEMENT Candidate (Qualification+, XMLExposure?, OtherSkills*)>

Qualification could be one or more, XMLExposure is optional and OtherSkills could be zero or more
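
The qualifiers in a content model behave like the same symbols in a regular expression. The sketch below is not a DTD validator; it assumes Python's standard library and translates the Candidate content model by hand into a regular expression over the child element names. CANDIDATE_MODEL and matches_candidate_model are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# (Qualification+, XMLExposure?, OtherSkills*) written as a regular
# expression over child names, each name followed by one space.
CANDIDATE_MODEL = re.compile(r"(Qualification )+(XMLExposure )?(OtherSkills )*")

def matches_candidate_model(xml_text):
    """Check only the order and count of the root element's children."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in root)
    return CANDIDATE_MODEL.fullmatch(child_names) is not None

good = "<Candidate><Qualification/><Qualification/><OtherSkills/></Candidate>"
missing_required = "<Candidate><XMLExposure/></Candidate>"
wrong_order = "<Candidate><OtherSkills/><Qualification/></Candidate>"

print(matches_candidate_model(good))              # True
print(matches_candidate_model(missing_required))  # False
print(matches_candidate_model(wrong_order))       # False
```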

Element Type (Contd)


Order Operators
Operator    Meaning
,           Strict sequence
|           Choice

Table 2-2 DTD Element Qualifiers
Qualifier   Name            Meaning
?           Question Mark   Optional (zero or one)
*           Asterisk        Zero or more
+           Plus Sign       One or more

A Simple Example
<!ELEMENT foo (A,(B,C))>
<!ELEMENT foo (A,B?,C)>
<!ELEMENT foo (A?,((B,C)|D),E?)>
<!ELEMENT foo ((A,B)|(C,D))>
<!ELEMENT foo (A,(B,C)*,D+)>

<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE Collection [
    <!ELEMENT Collection (CD)+>
    <!ELEMENT CD (#PCDATA)>
]>
<Collection>
    <CD>Devotional Songs by Pankaj</CD>
    <CD>Kajal by Pankaj</CD>
    <CD>Classical Songs by Pankaj</CD>
</Collection>

Attribute List
In a valid XML document you must also explicitly declare all attributes that you intend to use with the document's elements. You do this with a type of DTD markup known as an attribute-list declaration. This declaration does the following:
Defines the names of the attributes associated with an element
Specifies the data type of each attribute
Specifies for each attribute whether that attribute is required

Attribute List (Contd)


An attribute-list declaration has the following form:
<!ATTLIST Name AttDefs>
Name is the type name of the element associated with the attributes
AttDefs is one or more attribute definitions, each of which defines one attribute

Attribute List (Contd)


Attribute default: #REQUIRED, #IMPLIED, #FIXED
Attribute type: CDATA, ID, IDREF, IDREFS, NOTATION

eg.(IDREF)
<Employees>
    <employee ID="1001">
        <ename>Ramesh</ename>
        <job>Clerk</job>
        <hiredate>12-4-1999</hiredate>
        <mgr IDREF="1002"/>
        <sal>7000/-</sal>
        <comm>300/-</comm>
        <deptno>10</deptno>
    </employee>
    <employee ID="1002">
        <ename>Mohan</ename>
        <job>Manager</job>
        <hiredate>12-4-1997</hiredate>
        <mgr IDREF="1002"/>
        <sal>12000/-</sal>
    </employee>
</Employees>
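
An IDREF value points at the element whose ID carries the same value. A non-validating parser treats both as plain strings, so the application resolves the reference itself. The sketch below assumes Python's standard-library xml.etree.ElementTree and a shortened copy of the employee data; manager_name is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

EMPLOYEES_XML = """
<Employees>
  <employee ID="1001"><ename>Ramesh</ename><mgr IDREF="1002"/></employee>
  <employee ID="1002"><ename>Mohan</ename><mgr IDREF="1002"/></employee>
</Employees>
"""

root = ET.fromstring(EMPLOYEES_XML.strip())

# Index every employee by its ID attribute so references can be followed.
employees_by_id = {emp.get("ID"): emp for emp in root.iter("employee")}

def manager_name(employee):
    """Follow the mgr element's IDREF to the employee it refers to."""
    manager_id = employee.find("mgr").get("IDREF")
    return employees_by_id[manager_id].findtext("ename")

ramesh = employees_by_id["1001"]
print(manager_name(ramesh))  # Mohan
```

A validating parser would additionally check that every ID value is unique and that every IDREF matches some ID in the document.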

NOTATION Example DTD


<!NOTATION jpg SYSTEM "jpgviewer.exe">
<!NOTATION gif SYSTEM "gifviewer.exe">
<!ENTITY photo SYSTEM "file.gif" NDATA gif>
<!ELEMENT employees (emp)*>
<!ENTITY val "sata">
<!ENTITY % entname "age CDATA #IMPLIED
                    weight CDATA #IMPLIED
                    height CDATA #REQUIRED">
<!ELEMENT emp (empno,empname)>
<!ELEMENT empno (#PCDATA)>
<!ELEMENT empname EMPTY>
<!ATTLIST emp %entname; attr CDATA #REQUIRED>
<!ATTLIST empname first CDATA "default">
<!ATTLIST empname last CDATA #IMPLIED>

NOTATION Example XML


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE employees SYSTEM "cpc.dtd">
<employees>
    <emp age="25" weight="66" height="5.8">
        <empno>1&lt;2</empno>
        <empname first="1&lt;2" last='reddy'/>
    </emp>
    <emp age="25" weight="66" height="5.8">
        <empno>2</empno>
        <empname first="" last="naslay"/>
    </emp>
</employees>

Attribute List (Contd)


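<!ATTLIST Film Class CDATA #IMPLIED>
Simple form of defining an attribute. The attribute contains character data. A default declaration (#REQUIRED, #IMPLIED, #FIXED or a default value) must always follow the type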

<!ATTLIST Film Year CDATA #REQUIRED>


You must specify an attribute value

<!ATTLIST Film Color CDATA #IMPLIED>


You can either include or omit the attribute, no default value supplied

<!ATTLIST Film Language CDATA #FIXED "Hindi">

You can either include or omit the attribute. If you omit it, the processor uses the fixed default value. If you include it, its value must match the fixed value.
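
The effect of the default declarations can be observed from a program. The sketch below assumes Python's standard-library xml.etree.ElementTree, whose underlying parser does not validate but does apply attribute defaults declared in the internal DTD subset. The one-element document is a simplified stand-in written for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE Film [
  <!ELEMENT Film (#PCDATA)>
  <!ATTLIST Film Year     CDATA #REQUIRED
                 Color    CDATA #IMPLIED
                 Language CDATA #FIXED "Hindi">
]>
<Film Year="1994">Hum Aapke Hain Kaun</Film>"""

film = ET.fromstring(DOC)

year = film.get("Year")          # specified in the document
color = film.get("Color")        # #IMPLIED and omitted: no value at all
language = film.get("Language")  # #FIXED and omitted: default is supplied

print(year, color, language)
```

Because this parser does not validate, it would not complain if the #REQUIRED Year attribute were missing; only a validating parser reports that.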

<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE VideoLibrary [
    <!ELEMENT VideoLibrary (Film)+>
    <!ELEMENT Film (Title, Class, (Hero | Director | Heroine))>
    <!ATTLIST Film Color CDATA #IMPLIED
                   Language CDATA #FIXED "Hindi"
                   Year CDATA #REQUIRED>
    <!ELEMENT Title (#PCDATA)>
    <!ELEMENT Class (#PCDATA)>
    <!ELEMENT Hero (#PCDATA)>
    <!ELEMENT Heroine (#PCDATA)>
    <!ELEMENT Director (#PCDATA)>
]>
<VideoLibrary>
    <Film Year="1994">
        <Title>Hum Aapke Hain Kaun</Title>
        <Class>Love Story</Class>
        <Heroine>Madhuri Dixit</Heroine>
    </Film>
</VideoLibrary>

Using an External DTD


The XML File MyExternDTD.XML
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE VideoLibrary SYSTEM "MyExternDTD.dtd">
<VideoLibrary>
    <Film Year="1994">
        <Title>Hum Aapke Hain Kaun</Title>
        <Class>Love Story</Class>
        <Heroine>Madhuri Dixit</Heroine>
    </Film>
</VideoLibrary>

Using an External DTD (Contd..)


Document Definition file - MyExternDTD.dtd
<!ELEMENT VideoLibrary (Film)+>
<!ELEMENT Film (Title, Class, (Hero | Director | Heroine))>
<!ATTLIST Film Color CDATA #IMPLIED
               Language CDATA #FIXED "Hindi"
               Year CDATA #REQUIRED>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Class (#PCDATA)>
<!ELEMENT Hero (#PCDATA)>
<!ELEMENT Heroine (#PCDATA)>
<!ELEMENT Director (#PCDATA)>

entity
predefined entities
Character   Entity reference
<           &lt;
>           &gt;
&           &amp;
'           &apos;
"           &quot;
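
Programs that write XML normally let a library insert these references. A minimal sketch, assuming Python's standard-library xml.sax.saxutils; the sample string is invented for illustration. By default escape handles only &, < and >, so the two quote characters are passed in explicitly.

```python
from xml.sax.saxutils import escape, unescape

raw = '1 < 2 & "quotes"'

# Replace special characters with the predefined entity references.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# Reverse the substitution to recover the original text.
restored = unescape(escaped, {"&quot;": '"', "&apos;": "'"})

print(escaped)
```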

general entities
DTD Syntax
<!ENTITY ent-name "replacement text">

XML Usage
&ent-name;
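
A parser replaces each general entity reference with its replacement text before the application sees the data. The sketch below assumes Python's standard-library xml.etree.ElementTree, which expands entities declared in the internal DTD subset; the greeting document is invented for illustration and reuses the entity name val.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE greeting [
  <!ENTITY val "sata">
]>
<greeting>Hello, &val;!</greeting>"""

greeting = ET.fromstring(DOC)

# The application receives the replacement text, not the reference.
print(greeting.text)  # Hello, sata!
```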

external entities
<!ENTITY ent-name SYSTEM "URL">
<!ENTITY ent-name SYSTEM "file.gif" NDATA gif>

parameter entity
<!ENTITY % ent-name "age CDATA #IMPLIED
                     weight CDATA #IMPLIED
                     height CDATA #REQUIRED">
Using the parameter entity:
<!ATTLIST ele-name %ent-name; attr-name CDATA #REQUIRED>

parameter entity (Contd)


Example:
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT employees (emp)*>
<!ENTITY val "sata">
<!ENTITY % entname "age CDATA #IMPLIED
                    weight CDATA #IMPLIED
                    height CDATA #REQUIRED">
<!ELEMENT emp (empno,empname)>
<!ELEMENT empno (#PCDATA)>
<!ELEMENT empname EMPTY>
<!ATTLIST emp %entname; attr CDATA #REQUIRED>
<!ATTLIST empname first CDATA "default">
<!ATTLIST empname last CDATA #IMPLIED>

parameter entity (Contd)


XML document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE employees SYSTEM "cpc.dtd">
<employees>
    <emp age="25" weight="66" height="5.8">
        <empno>1&lt;2</empno>
        <empname first="1&lt;2" last='reddy'/>
    </emp>
    <emp age="25" weight="66" height="5.8">
        <empno>2</empno>
        <empname first="" last="naslay"/>
    </emp>
</employees>

conditional section
<![INCLUDE [
    <!ELEMENT ele-name (#PCDATA)>
    <!ATTLIST ele-name attr-name CDATA #REQUIRED>
]]>
<![IGNORE [
    <!ELEMENT ele-name (#PCDATA)>
    <!ATTLIST ele-name attr-name CDATA #REQUIRED>
]]>

conditional section (Contd)


<?xml version="1.0" encoding="UTF-8"?>
<![INCLUDE [
    <!ELEMENT employees (emp)*>
    <!ENTITY val "sata">
    <!ENTITY % entname "age CDATA #IMPLIED
                        weight CDATA #IMPLIED
                        height CDATA #REQUIRED">
    <!ELEMENT emp (empno,empname)>
    <!ELEMENT empno (#PCDATA)>
    <!ELEMENT empname EMPTY>
    <!ATTLIST emp %entname; attr CDATA #REQUIRED>
    <!ATTLIST empname first CDATA "default">
    <!ATTLIST empname last CDATA #IMPLIED>
]]>
<![IGNORE [
    <!ELEMENT emps (#PCDATA)>
    <!ATTLIST emps eno CDATA #REQUIRED>
]]>

Exercises - ????
DTD for Visiting Card - Is it OK?
<!DOCTYPE VisitngCard [
    <!ELEMENT VisitingCard (Name+, Designation+, Group+, Address) >
    <!ELEMENT Name (#PCDATA) >
    <!ELEMENT Designation (#PCDATA) >
    <!ELEMENT Gruoup(#PCDATA) >
    <!ELEMENT Address(#PCDATA) >
]>