
Overview

[Diagram: XML data and its Schema / DTD feed a VALIDATING XML PARSER; the XML and an XSLT stylesheet feed an XSLT PROCESSOR, which produces formatted XML.]
Name            Address                          Phone Number      Email
Rajiv Agarwal   C-56/1, Anusandhan Bhavan        91-120-2444711    rajiv@gmail.com
                Institutional Area, Sector-62
                Noida - 201307
Varun Mehta     No. 12 Sudder Street             91-33-22521031    varun@gmail.com
                Kolkata
                West Bengal 700016
Rupali Iyer     N. K. Palayam Road,              91-422-6547769    rupali@gmail.com
                Coimbatore,
                Tamil Nadu 641005
Meera Kakde     167, P. D'Mello Road             91-22-28738824    meera@gmail.com
                Near C.S.T. Railway Station
                Mumbai - 400038

<contacts>
<contact>
<name>Rajiv Agarwal</name>
<address>
C-56/1, Anusandhan Bhavan
Institutional Area, Sector-62
Noida - 201307
</address>
<phoneNumber>91-120-2444711</phoneNumber>
<email>rajiv@gmail.com</email>
</contact>
<contact>
<name>Varun Mehta</name>
<address>
No. 12 Sudder Street
Kolkata
West Bengal 700016
</address>
<phoneNumber>91-33-22521031</phoneNumber>
<email>varun@gmail.com</email>
</contact>



</contacts>
Hierarchical Representation of Data

[Tree diagram: contacts -> contact -> name ("Rajeev"), address ("C-56"), phoneNumber ("91"), email ("rajiv"); the node values are cut short in the figure.]
XML

• XML looks like HTML, but that is where the similarity ends.

• Differences between XML and HTML

            HTML                               XML
Purpose     Used for formatting and layout     Used to structure, store and
            of web pages.                      exchange data.
Tags        Fixed.                             Make your own (eXtensible).
Rules       Relaxed.                           Strict.

Advantages of XML
• Both machine and human readable.
• Supports Unicode. <नाम></नाम>
• Represents data hierarchically.
• Strict syntax rules make parsers easy to write.
Disadvantages of XML
• Verbose
XML Syntax Rules

• All tags must be closed

This is legal in HTML but not in XML:

<p>First paragraph
<p>Second paragraph

In XML:

<p>First paragraph</p>
<p>Second paragraph</p>

If a tag has no content, it can be closed in the opening tag itself:

<br />

• Tags are case-sensitive

This tag is incorrectly closed:

<phoneNumber>123456</phonenumber>

• Tags must be properly nested

Properly Nested

<contact>
<name>Rajiv Agarwal</name>
</contact>
Improperly Nested
<contact>
<name>
Rajiv Agarwal
</contact>
</name>

• XML documents must have a single root element.

<contacts>
<contact>...</contact>
<contact>...</contact>
</contacts>

• Attributes must be quoted.

<contacts>
<contact sex="F">...</contact>
</contacts>

• Comments

<!-- This is how comments are written -->

• CDATA Sections

<script language="javascript">
<![CDATA[
function min(a,b)
{
if (a < b) return a;
return b;
}
]]>
</script>
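
The syntax rules above can be seen together in one small document. This is only an illustrative sketch: the verified and note elements are made up for the example and are not part of the contacts format used elsewhere in these slides.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Exactly one root element: contacts -->
<contacts>
  <!-- Attribute values are quoted; every tag is closed and properly nested -->
  <contact sex="M">
    <name>Rajiv Agarwal</name>
    <phoneNumber>91-120-2444711</phoneNumber>
    <!-- An element with no content can use the self-closing form (hypothetical element) -->
    <verified />
    <!-- CDATA lets the text contain < and & without escaping (hypothetical element) -->
    <note><![CDATA[Call if balance < 100 & account is active]]></note>
  </contact>
</contacts>
```

Changing </phoneNumber> to </phonenumber>, or removing the quotes around M, should make a conforming parser reject the whole document.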

• Viewing XML Files


DTD - Document Type Definition
• Parties that exchange data in the form of XML documents need to agree on the structure of the document.
• These parties create their own markup languages based on XML.
• Real-world examples of XML-based formats:

o NewsML - Allows news agencies to share news items.
o MathML - Used to describe the structure of mathematical equations.
o SVG - Scalable Vector Graphics: Flash-like graphics using an XML format.

• We need to describe our new XML-based language to the world (i.e. the grammar of our language).
• More importantly, when the world communicates with us using our XML-based language, we need to validate their communication.
• Two main validation technologies: DTDs and XML Schemas
Anatomy of an XML File

<contacts>
<contact sex="M">
<name></name>
<address></address>
<phoneNumber></phoneNumber>
<email>abcd@efg.com</email>
</contact>
</contacts>

Element: for example <contact> ... </contact>
Attribute: sex="M"
Character Data: abcd@efg.com

Modeling Contact Information

• Contacts
  o Contact (zero or more)
    - Sex (attribute): must be M or F
    - Name
      · First Name
      · Middle Name (optional)
      · Last Name
    - Address
      · Street 1
      · Street 2
      · City
      · State
      · Pin
    - Phone Numbers
      · Phone Number (one or more)
    - Emails
      · Email (one or more)
Example XML

<contacts>
<contact sex="M">
<name>
<firstName>Rajiv</firstName>
<lastName>Agarwal</lastName>
</name>
<address>
<street1>C-56/1, Anusandhan Bhavan</street1>
<street2>Institutional Area, Sector-62</street2>
<city>Noida</city>
<state>Uttar Pradesh</state>
<pin>201307</pin>
</address>
<phoneNumbers>
<phoneNumber>91-120-2444711</phoneNumber>
</phoneNumbers>
<emails>
<email>rajiv@gmail.com</email>
</emails>
</contact>
</contacts>
Making the DTD

• Declaring Elements

Syntax:
<!ELEMENT element-name (element-contents)>

contacts Element (root element):
<!ELEMENT contacts (contact*)>

contact Element:
<!ELEMENT contact (name, address, phoneNumbers, emails)>

name Element:
<!ELEMENT name (firstName, middleName?, lastName)>

address Element:
<!ELEMENT address (street1, street2, city, state, pin)>

firstName Element:
<!ELEMENT firstName (#PCDATA)>

phoneNumbers Element:
<!ELEMENT phoneNumbers (phoneNumber+)>
• Elements with any content (text or elements)

<!ELEMENT abcd ANY>

a. No brackets around ANY.
b. Element abcd may contain both text and any tag defined in the DTD.

• Elements with no content

<!ELEMENT abcd EMPTY>

a. No brackets around EMPTY.
b. All abcd tags in the document must contain no other tags or text.

<abcd></abcd>
<abcd />
• Element that contains one or more of the following elements, in any order:

<!ELEMENT phoneNumbers ((mobileNumber | landline)+)>

<!ELEMENT mobileNumber (#PCDATA)>
<!ELEMENT landline (#PCDATA)>

<phoneNumbers>

<mobileNumber>
9087654321
</mobileNumber>

<landline>
1234567890
</landline>
<landline>
1234567891
</landline>

<landline>
1234567891
</landline>

</phoneNumbers>
• Declaring Attributes

<!ATTLIST element-name
attribute-name
attribute-type
default-value >

• Attribute that can contain any character data:

<contact
yahooID="rajiv_agarwal@yahoo.co.in">

<!ATTLIST contact
yahooID
CDATA
"not specified">

• Enumerations: sex can be either M or F

<!ATTLIST contact sex (M|F) "M">


• Possible values for default-value:

Value          Description                                   Example
value          The default value reported by the parser      <!ATTLIST contact sex (M|F) "M">
               if the attribute is not present.
#REQUIRED      The attribute is required to be specified     <!ATTLIST contact sex (M|F) #REQUIRED>
               every time.
#IMPLIED       The attribute is optional. No default         <!ATTLIST contact yahooID CDATA #IMPLIED>
               value specified.
#FIXED value   If the attribute is present, it must be       <!ATTLIST contact sex (M|F) #FIXED "F">
               equal to value. If it is absent, the
               parser reports value.
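
To see the default-value keywords side by side, here is a small self-contained sketch. The contact element is simplified to plain text, and the country attribute is a made-up name used only to illustrate #FIXED.

```xml
<?xml version="1.0"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts (contact*)>
  <!-- Simplified for this sketch: contact holds only text -->
  <!ELEMENT contact (#PCDATA)>
  <!ATTLIST contact
    sex     (M|F) #REQUIRED
    yahooID CDATA #IMPLIED
    country CDATA #FIXED "IN">
]>
<contacts>
  <!-- sex is given as required; the optional yahooID is present -->
  <contact sex="M" yahooID="rajiv_agarwal@yahoo.co.in">Rajiv Agarwal</contact>
  <!-- yahooID is omitted, which #IMPLIED allows; no value is reported for it -->
  <contact sex="F">Meera Kakde</contact>
</contacts>
```

A validating parser should report country="IN" for both contacts even though neither writes it. Leaving out sex, or writing country="US", should be flagged as a validity error.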
Validating an XML document against a DTD

Inline DTDs

<!DOCTYPE contacts [

<!ELEMENT contacts (contact*)>


<!ELEMENT contact (name,
address,
phoneNumbers,
emails)>

]>
<contacts>
<contact>
<name>

External DTDs

<!DOCTYPE contacts SYSTEM "contacts.dtd">


<contacts>
<contact>
<name>
<firstName>Rajiv</firstName>
<lastName>Agarwal</lastName>
</name>
<address>
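
For reference, a complete document with an inline DTD for the contact model could look like the sketch below. The declarations for the leaf elements (middleName, lastName, street1 and so on) are not shown on the slides; they are assumed here to follow the same (#PCDATA) pattern as firstName.

```xml
<?xml version="1.0"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts (contact*)>
  <!ELEMENT contact (name, address, phoneNumbers, emails)>
  <!ATTLIST contact sex (M|F) "M">
  <!ELEMENT name (firstName, middleName?, lastName)>
  <!ELEMENT firstName (#PCDATA)>
  <!ELEMENT middleName (#PCDATA)>
  <!ELEMENT lastName (#PCDATA)>
  <!ELEMENT address (street1, street2, city, state, pin)>
  <!ELEMENT street1 (#PCDATA)>
  <!ELEMENT street2 (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ELEMENT pin (#PCDATA)>
  <!ELEMENT phoneNumbers (phoneNumber+)>
  <!ELEMENT phoneNumber (#PCDATA)>
  <!ELEMENT emails (email+)>
  <!ELEMENT email (#PCDATA)>
]>
<contacts>
  <contact sex="M">
    <name>
      <firstName>Rajiv</firstName>
      <lastName>Agarwal</lastName>
    </name>
    <address>
      <street1>C-56/1, Anusandhan Bhavan</street1>
      <street2>Institutional Area, Sector-62</street2>
      <city>Noida</city>
      <state>Uttar Pradesh</state>
      <pin>201307</pin>
    </address>
    <phoneNumbers>
      <phoneNumber>91-120-2444711</phoneNumber>
    </phoneNumbers>
    <emails>
      <email>rajiv@gmail.com</email>
    </emails>
  </contact>
</contacts>
```

Moving the declarations between [ and ] into a separate contacts.dtd file and switching the DOCTYPE to the SYSTEM form gives the external variant of the same document.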
Validating an XML document against a DTD

Using DHTML & JavaScript


function validateDocument(sPath)
{
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");

xmlDoc.validateOnParse="true";

if(!xmlDoc.load(sPath))
{
var msg = 'An error has occurred: \n';
msg += "After: " + xmlDoc.parseError.filepos + " bytes\n";
msg += "Line Number: " + xmlDoc.parseError.line + "\n";
msg += "Column Number: " + xmlDoc.parseError.linepos + "\n";
msg += "Source Text: " + xmlDoc.parseError.srcText + "\n";
msg += "Reason: " + xmlDoc.parseError.reason + "\n";

alert(msg);
}
else
{
alert('Document validated successfully');
}
}
Usage:

<input type="button"
onclick="validateDocument('file://c:/contacts.xml')"
value="Validate"/>
Using Visual Basic

Public Sub ValidateXML(ByVal strURL As String)

Dim xmlDoc As New DOMDocument

xmlDoc.validateOnParse = True

If Not xmlDoc.Load(strURL) Then

MsgBox(xmlDoc.parseError.reason)

End If

End Sub
Using Java

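import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidateContacts {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

        // We want a validating document builder (validates against the DTD)
        dbf.setValidating(true);

        // Create a new document builder
        DocumentBuilder db = dbf.newDocumentBuilder();

        db.setErrorHandler(new ErrorHandler() {
            // Recoverable errors, such as validity violations
            public void error(SAXParseException e) throws SAXException {
                System.out.println("An error has occurred:"
                    + "\nLine Number: " + e.getLineNumber()
                    + "\nColumn Number: " + e.getColumnNumber()
                    + "\nReason: " + e.getMessage());
            }

            // Non-recoverable errors, such as a document that is not well-formed
            public void fatalError(SAXParseException e) throws SAXException {
                throw e;
            }

            public void warning(SAXParseException e) throws SAXException {
                System.out.println("Warning: " + e.getMessage());
            }
        });

        Document doc = db.parse(new File("c:\\contacts.xml"));
    }
}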

XML Namespaces
Need for namespaces:

[Diagram: two overlapping vocabularies. FINANCIAL APPLICATION: ACCOUNT, CUSTOMER, LOAN. CIRCUIT DESIGN: LED, SWITCH, FLIPFLOP. BANK appears in both.]
Original XML file – without namespaces

<?xml version="1.0"?>
<circuit>
<seriesConnection>
<resistor>15</resistor>
<resistor>20</resistor>
<bank type="LED" number="3"></bank>
</seriesConnection>
</circuit>

XML file – with namespaces

<?xml version="1.0"?>
<ckt:circuit xmlns:ckt="http://www.mec.com/ckt">
<ckt:seriesConnection>
<ckt:resistor>15</ckt:resistor>
<ckt:resistor>20</ckt:resistor>
<ckt:bank type="LED" number="3"></ckt:bank>
</ckt:seriesConnection>
</ckt:circuit>

XML file – with default namespace

<?xml version="1.0"?>
<circuit xmlns="http://www.mec.com/ckt">
<seriesConnection>
<resistor>15</resistor>
<resistor>20</resistor>
<bank type="LED" number="3"></bank>
</seriesConnection>
</circuit>

• We need to use xmlns:xyz when we are going to mix elements from two namespaces; otherwise just use the default namespace attribute.
• Examples where we mix elements from separate namespaces: XML Schemas and XSL.
• The URL is only a unique name provided to the parser; it is not really accessed. Often, the URL points to a web page where the namespace is described.
• Logically, Schemas and DTDs define namespaces.
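
As a sketch of mixing two namespaces in one file, the circuit document below also carries a bank element from a financial vocabulary. The fin prefix, its URL and the fin:customer element are assumptions made up for this illustration.

```xml
<?xml version="1.0"?>
<ckt:circuit xmlns:ckt="http://www.mec.com/ckt"
             xmlns:fin="http://www.example.com/finance">
  <ckt:seriesConnection>
    <ckt:resistor>15</ckt:resistor>
    <!-- A bank of LEDs, from the circuit vocabulary -->
    <ckt:bank type="LED" number="3"></ckt:bank>
  </ckt:seriesConnection>
  <!-- A financial bank, from a different (hypothetical) vocabulary -->
  <fin:bank>
    <fin:customer>Rajiv Agarwal</fin:customer>
  </fin:bank>
</ckt:circuit>
```

Both elements have the local name bank, but a namespace-aware parser treats ckt:bank and fin:bank as different elements because their namespace URLs differ.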
XML Schemas

A better method to validate XML documents – use XML schemas.

Why are they better than DTDs?

1. XML Schemas use XML as the underlying language (unlike DTDs, which have their own language).
o The editor used to make a schema can be a normal XML editor (not a big deal, since even Notepad can be used to edit XML files).
o The parser used to parse a schema is an XML parser (a big deal, because we don't need two types of parsers).
2. XML Schemas have better support for data types.
3. XML Schemas are object-oriented in their approach: they allow restricting and extending existing data types to define our own.

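[Diagram: two vocabularies appear together in a schema file. XML Schema vocabulary: schema, element, restriction, pattern. Our vocabulary: contacts, contact, name, firstName.]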

Schema file - *.xsd (XML Schema Definition)

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.sunero.com"
xmlns="http://www.sunero.com"
elementFormDefault="qualified" >


</xsd:schema>

XML File that uses the XSD file

<contacts xmlns="http://www.sunero.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sunero.com contacts.xsd">


</contacts>

Things that go inside the schema tag

• firstName element that contains a string with the first name of the contact

<xsd:element name="firstName"
type="xsd:string"
default="default value" />

fixed="fixed value" may be used in place of default; a declaration cannot carry both.
Different data types supported

Name          Values
string        "this is a string"
boolean       { true, false, 0, 1 }
decimal       3.14
float         314e-3, 314, 0, -0, INF, -INF, NaN
double        Same as float
dateTime      CCYY-MM-DDThh:mm:ss
                CC ---- 00 to 99
                YY ---- 00 to 99
                MM ---- 01 to 12
                DD ---- Depends on MM and YY
date          CCYY-MM-DD
time          hh:mm:ss
gYear, gYearMonth, gMonthDay, gDay, gMonth
              Part of dateTime, but may need extra hyphens
integer       and its derived types:
              • long, int, short, byte
              • unsignedLong, unsignedInt, unsignedShort, unsignedByte
              • positiveInteger, negativeInteger, nonPositiveInteger, nonNegativeInteger
hexBinary, base64Binary, anyURI

Facets

Allow you to create new data types from existing ones, by placing
restrictions.

• sex element which can contain only the text 'M' or 'F'

<xsd:element name="sex">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="M" />
<xsd:enumeration value="F" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
• price element should contain a maximum of 2 decimal places

<xsd:fractionDigits value="2" />
Valid: 3, 3.1, 3.14. Invalid: 3.145.

• Length of exam seat number should be exactly nine characters

<xsd:length value="9" />

• Length of username should be AT LEAST six characters

<xsd:minLength value="6" />

• Marks between 0 and 100 (inclusive)

<xsd:minInclusive value="0" />
<xsd:maxInclusive value="100" />

Range facets available: minInclusive, maxInclusive, minExclusive, maxExclusive

• Username that begins with a letter, followed by at least 5 alphanumeric or underscore characters (thus the minimum length of a username is 6)

<xsd:pattern value="[a-zA-Z][a-zA-Z0-9_]{5,}" />
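
The facets above are shown on their own; in a schema each one sits inside an xsd:restriction of some base type. A minimal sketch follows, assuming element names marks, username and price taken from the bullet text, and a schema with no target namespace to keep it short.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- marks: an integer from 0 to 100, both ends included -->
  <xsd:element name="marks">
    <xsd:simpleType>
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="0" />
        <xsd:maxInclusive value="100" />
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>

  <!-- username: a letter, then at least 5 letters, digits or underscores -->
  <xsd:element name="username">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="[a-zA-Z][a-zA-Z0-9_]{5,}" />
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>

  <!-- price: a decimal with at most 2 digits after the decimal point -->
  <xsd:element name="price">
    <xsd:simpleType>
      <xsd:restriction base="xsd:decimal">
        <xsd:fractionDigits value="2" />
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>

</xsd:schema>
```

Against this sketch, <marks>100</marks> and <username>rajiv_62</username> should validate, while <marks>101</marks> and <username>raj</username> should be rejected.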

• List containing atoms which are either single digits or single letters: an example of named types (vs. anonymous types), lists and unions.

<xsd:simpleType name="singleDigit">
<xsd:restriction base="xsd:string">
<xsd:pattern value="[0-9]" />
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="singleAlphabet">
<xsd:restriction base="xsd:string">
<xsd:pattern value="[a-z]" />
</xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="singleDigitOrAlphabet">
<xsd:union memberTypes="singleDigit singleAlphabet" />
</xsd:simpleType>

<xsd:element name="listOfItems">
<xsd:simpleType>
<xsd:list itemType="singleDigitOrAlphabet" />
</xsd:simpleType>
</xsd:element>

<listOfItems>1 2 a b 5 6</listOfItems> <!-- valid item -->

• name element that contains firstName, optional middleName and lastName

<xsd:element name="name">
<xsd:complexType>
<xsd:sequence> <!-- indicator -->

<xsd:element name="firstName" type="xsd:string" />

<xsd:element name="middleName" type="xsd:string"
minOccurs="0" maxOccurs="1" />

<xsd:element name="lastName" type="xsd:string" />

</xsd:sequence>
</xsd:complexType>
</xsd:element>
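
Placed inside a schema file, the name declaration and a matching instance could look like the sketch below. The file names name.xsd and name.xml are hypothetical; the namespace URL is the one used in the instance example earlier.

```xml
<!-- name.xsd (hypothetical file name) -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.sunero.com"
            xmlns="http://www.sunero.com"
            elementFormDefault="qualified">
  <xsd:element name="name">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="firstName" type="xsd:string" />
        <xsd:element name="middleName" type="xsd:string" minOccurs="0" />
        <xsd:element name="lastName" type="xsd:string" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

```xml
<!-- name.xml (hypothetical file name): valid, middleName is omitted -->
<name xmlns="http://www.sunero.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.sunero.com name.xsd">
  <firstName>Rajiv</firstName>
  <lastName>Agarwal</lastName>
</name>
```

The default namespace of the instance has to equal the schema's targetNamespace. Swapping firstName and lastName should fail validation, because xsd:sequence fixes the order.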

• User list with a type attribute. Type may be "admin" or "guest".

<users type="admin">
<user>A</user>
<user>B</user>
</users>

<users type="guest">
<user>C</user>
<user>D</user>
</users>

<xsd:element name="users">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="user" type="xsd:string"
minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="type" use="required">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="admin" />
<xsd:enumeration value="guest" />
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
</xsd:complexType>
</xsd:element>

1. Any number of attribute tags may follow the sequence, but they must be inside the complexType tag.
2. The use attribute can take the values required, optional or prohibited.
3. The xsd:attribute tag can also have 'fixed', 'default' and 'type' attributes, like xsd:element.

• length element that contains a length with up to 2 decimal places, with an attribute specifying the units: cm, in or ft.

<length units="cm">3.14</length>

1. The length element has simple data because it will contain values like 3.14.
2. But elements with attributes are complex data.
3. It is actually simple data 'extended' with an attribute, so it is declared as a complexType with simpleContent.
<xsd:element name="length">
<xsd:complexType>
<xsd:simpleContent>
<xsd:extension base="xsd:decimal">
<xsd:attribute name="units">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="cm" />
<xsd:enumeration value="in" />
<xsd:enumeration value="ft" />
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
</xsd:element>
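
The declaration above accepts any xsd:decimal; it does not by itself limit the value to 2 decimal places. One possible way to add that limit is to restrict first and extend afterwards, as sketched here. The type names twoPlaceDecimal and lengthUnit are made up for the example, and the schema has no target namespace so the unprefixed type references resolve.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Named simple type: a decimal limited to 2 fraction digits -->
  <xsd:simpleType name="twoPlaceDecimal">
    <xsd:restriction base="xsd:decimal">
      <xsd:fractionDigits value="2" />
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Named simple type: the allowed units -->
  <xsd:simpleType name="lengthUnit">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="cm" />
      <xsd:enumeration value="in" />
      <xsd:enumeration value="ft" />
    </xsd:restriction>
  </xsd:simpleType>

  <!-- The restricted simple type is extended with the units attribute -->
  <xsd:element name="length">
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base="twoPlaceDecimal">
          <xsd:attribute name="units" type="lengthUnit" use="required" />
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

With this sketch, <length units="cm">3.14</length> should validate, while 3.145 or units="m" should not.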

• Font tag may contain bold text

<font>This is <b>bold</b> text.</font>

<xsd:element name="font">
<xsd:complexType mixed="true">
<xsd:sequence>
<xsd:element name="b" type="xsd:string"
minOccurs="0" maxOccurs="unbounded"
/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>

Indicators

sequence   All of the specified child elements must occur, in the same order as their declaration.
all        All of the specified child elements must occur, in any order, but at most once each.
choice     One of the child elements must occur.
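
A short sketch of the other two indicators, reusing names from earlier slides: choice reproduces the DTD content model (mobileNumber | landline)+, and all lets firstName and lastName appear in either order. It is an illustration only, written without a target namespace.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- choice: one or more entries, each a mobileNumber or a landline -->
  <xsd:element name="phoneNumbers">
    <xsd:complexType>
      <xsd:choice minOccurs="1" maxOccurs="unbounded">
        <xsd:element name="mobileNumber" type="xsd:string" />
        <xsd:element name="landline" type="xsd:string" />
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>

  <!-- all: both children must occur once, in any order -->
  <xsd:element name="name">
    <xsd:complexType>
      <xsd:all>
        <xsd:element name="firstName" type="xsd:string" />
        <xsd:element name="lastName" type="xsd:string" />
      </xsd:all>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```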
XSLT (eXtensible StyleSheet Language Transformations)

• Allows us to convert one XML document (input) to another (output).
• Usually the output is an HTML file that is displayed to the user.

An input XML file

<?xml-stylesheet type="text/xsl" href="contacts.xsl"?>


<contacts>
<contact>
<name>
<firstName>Rajeev</firstName>
<lastName>Agarwal</lastName>
</name>
</contact>
<contact>
<name>
<firstName>Varun</firstName>
<lastName>Mehta</lastName>
</name>
</contact>
<contact>
<name>
<firstName>Rupali</firstName>
<lastName>Iyer</lastName>
</name>
</contact>
<contact>
<name>
<firstName>Meera</firstName>
<lastName>Kakde</lastName>
</name>
</contact>
</contacts>
Desired Output

First Name    Last Name
Rajeev        Agarwal
Varun         Mehta
Rupali        Iyer
Meera         Kakde

Basic XSL File (contacts.xsl)

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">

<html>
<head>
<style type="text/css">
.rowHeader
{
font-weight: bold;
color: white;
background-color: black;
}
</style>

</head>

<body>
<table border="1">
<tr class="rowHeader">
<th>First Name</th>
<th>Last Name</th>
</tr>
<tr>
<td>-</td>
<td>-</td>
</tr>
</table>
</body>
</html>

</xsl:template>

</xsl:stylesheet>
For-each & value-of

<xsl:for-each select="/contacts/contact">
<tr>
<td>
<xsl:value-of select="./name/firstName"/>
</td>
<td>
<xsl:value-of select="./name/lastName"/>
</td>
</tr>
</xsl:for-each>
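
The for-each block takes the place of the placeholder row in the basic XSL file. A trimmed sketch of the full stylesheet, with the CSS left out for brevity, might look like this:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <table border="1">
          <tr>
            <th>First Name</th>
            <th>Last Name</th>
          </tr>
          <!-- One table row is produced for every contact element -->
          <xsl:for-each select="/contacts/contact">
            <tr>
              <!-- Paths are relative to the current contact -->
              <td><xsl:value-of select="name/firstName"/></td>
              <td><xsl:value-of select="name/lastName"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the input file shown earlier, this should give a header row followed by four data rows, one per contact.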
If

<xsl:for-each select="/contacts/contact">
<xsl:if test="(position() mod 2) = 0">
<tr>
<td>
<xsl:value-of select="./name/firstName"/>
</td>
<td>
<xsl:value-of select="./name/lastName"/>
</td>
</tr>
</xsl:if>
<xsl:if test="position() mod 2 = 1">
<tr bgcolor="#EEEEEE">
<td>
<xsl:value-of select="./name/firstName"/>
</td>
<td>
<xsl:value-of select="./name/lastName"/>
</td>
</tr>
</xsl:if>
</xsl:for-each>

Choose

<xsl:choose>
<xsl:when test="expression">
... some output ...
</xsl:when>
<xsl:otherwise>
... some output ....
</xsl:otherwise>
</xsl:choose>
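
As an illustration, the alternating-row example from the If slide can be written with choose so that the row body appears only once. This is a sketch, not the only way to do it; the #FFFFFF colour for even rows is an assumption.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <table border="1">
      <xsl:for-each select="/contacts/contact">
        <tr>
          <!-- Attributes must be added before any child content of tr -->
          <xsl:choose>
            <xsl:when test="position() mod 2 = 1">
              <xsl:attribute name="bgcolor">#EEEEEE</xsl:attribute>
            </xsl:when>
            <xsl:otherwise>
              <xsl:attribute name="bgcolor">#FFFFFF</xsl:attribute>
            </xsl:otherwise>
          </xsl:choose>
          <td><xsl:value-of select="name/firstName"/></td>
          <td><xsl:value-of select="name/lastName"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

Unlike two separate if tests, choose evaluates its branches in order and uses the first when that matches, falling back to otherwise when none does.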
