
eXtensible Markup Language (XML)

U. K. Roy

An HTML system
[Figure: an HTML document is served by a web server over the Internet to a web client (parser, formatter, interface)]

U.K.R., 2008

Role of HTML

HTML

Designed to display data
Focuses on appearance
Has a fixed set of predefined tags
Ambiguity


Role of XML

eXtensible Markup Language
W3C recommendation, 1998
Designed to structure, transport and store data
Transformation and dynamic data customization
Interoperable way to represent and process documents (not necessarily on the web)
Self-descriptive


Example
<note>
  <to>John</to>
  <from>Ani</from>
  <heading>Reminder</heading>
  <body>Return my book on Monday</body>
</note>


Another Example
<song>
  <title>Requiem</title>
  <composer>Mozart</composer>
</song>

Equivalent HTML code:


<p>Requiem is a song composed by Mozart</p>

Role of XML

Not a replacement of HTML

XML focuses on what data are
HTML focuses on how data look
Functional meaning depends on application

Tags are custom defined (not predefined)

Everything must be marked up correctly


XML and Databases

XML brings benefits of DBs to documents

  Schema to model information directly
  Formal validation, locking, versioning, rollback...

But

  Not all traditional database concepts map cleanly, because documents are fundamentally different in some ways

XML Building blocks


Element
  Delimited by angle brackets
  Identifies the nature of the content it surrounds
  General format: <element> </element>
  Empty element: <empty-element/>
Attribute
  Name-value pairs that occur inside start-tags after the element name, like: <element attribute="value">
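
As a rough illustration of these building blocks, the sketch below parses a small document with Python's standard xml.etree.ElementTree module and reads back element names, attributes and content. The sample document and its names (note, priority, to) are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one element with an attribute, one child with
# text content, and one empty element.
doc = '<note priority="high"><to>John</to><empty-element/></note>'
root = ET.fromstring(doc)

print(root.tag)      # element type of the root
print(root.attrib)   # attributes as a name-to-value mapping
for child in root:
    # An empty element has no text, so child.text is None for it.
    print(child.tag, repr(child.text))
```

Attributes come back as a plain dictionary of strings, while child elements are reached by iterating over their parent.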

XML Building blocks--Prolog

The part of an XML document that precedes the XML data
Includes

  A declaration: version [, encoding, standalone]
    <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>

  An optional DTD (Document Type Definition)
    <!DOCTYPE greeting SYSTEM "hello.dtd">

  Processing instructions (optional)
    <?xml-stylesheet href="simple.xsl" type="text/xsl"?>


XML Elements

XML Elements are Extensible
  More and more elements may be added to carry more information
XML Elements have Relationships
  Elements are related as parents and children
Elements have Content
  Elements can have different types of content:
    empty content
    simple content
    element content
    mixed content
    attributes
XML elements must follow the naming rules

XML Elements naming rules

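Names can only contain letters, digits and some other special characters (hyphens, underscores, periods and colons).
Names cannot start with a digit or with a punctuation mark other than an underscore or colon.
Names must not start with the string xml (in any case: xml, XML, Xml, ...), which is reserved.
Names cannot contain white space.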
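
The naming rules can be approximated with a short check. This is only a sketch: the regular expression is a simplified, ASCII-only stand-in for the full XML 1.0 name grammar (which also allows many non-ASCII letters), and check_element_name is a hypothetical helper, not part of any library.

```python
import re

# Simplified ASCII-only approximation of an XML name:
# first character is a letter, underscore or colon; the rest may also
# contain digits, hyphens and periods.
_NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")

def check_element_name(name: str) -> str:
    """Return 'invalid', 'reserved' or 'ok' for a candidate element name."""
    if not _NAME_RE.fullmatch(name):
        return "invalid"
    if name.lower().startswith("xml"):
        return "reserved"   # names beginning with xml are reserved
    return "ok"

for candidate in ["note", "first-name", "1note", "my note", "xmlData"]:
    print(candidate, "->", check_element_name(candidate))
```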


Anatomy of an element
<p type="rule">Use a hyphen: &#173;.</p>

  Element type: p
  Attribute name: type
  Attribute value: "rule"
  Start-tag: <p type="rule">
  Element content: Use a hyphen: &#173;.
  (Character) entity reference: &#173;
  End-tag: </p>

The Basic Rules

XML is case sensitive


<Message>This is incorrect</message>
<message>This is correct</message>

The Basic Rules

All start tags must have end tags


Incorrect: <composer>Mozart
Correct:   <composer>Mozart</composer>

Empty Element
<BR></BR>
<BR/>
<img align="center" src="logo.gif"/>
<composer name="Mozart"></composer>
<composer name="Mozart"/>

The Basic Rules

Elements must be properly nested


<b><i>This is incorrect nesting</b></i>
<b><i>This is correct nesting</i></b>

The Basic Rules

The XML declaration, if present, must be the first statement

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>

The Basic Rules

Every document must contain a root element

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

The Basic Rules

Attribute values must be quoted (single or double quotation marks)

<note date="12/11/2007">
  <to>Ani</to>
  <from>John</from>
</note>

The Basic Rules

Certain characters are reserved for parsing

Incorrect: <message>if salary < 1000 then</message>
Correct:   <message>if salary &lt; 1000 then</message>

Predefined entities
Entity    Character   Meaning
&lt;      <           less than
&gt;      >           greater than
&amp;     &           ampersand
&apos;    '           apostrophe
&quot;    "           quotation mark
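
A minimal sketch of how reserved characters are handled in practice, using Python's standard library: xml.sax.saxutils.escape replaces &, < and > with the predefined entities before the text is placed inside an element, and the parser expands them again when reading. The parses helper is a hypothetical convenience function written for this example.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "if salary < 1000 then"

# Escape the reserved characters before embedding the text in markup.
markup = "<message>" + escape(raw) + "</message>"
print(markup)

# The parser expands &lt; back into < when the document is read.
parsed_text = ET.fromstring(markup).text
print(parsed_text)

def parses(xml_text: str) -> bool:
    """Return True if xml_text can be parsed, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# The unescaped version is rejected by the parser.
print(parses("<message>" + raw + "</message>"))
```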


The Basic Rules


With XML, white space is preserved
With XML, a new line is always stored as LF
Comments in XML: <!-- This is a comment -->
  Can go almost anywhere (not inside tags)
  Schemas can contain comments, too

Common Errors for Element Naming

Do not use white space when creating names for elements
Element names cannot begin with a digit, although names can contain digits
Only certain punctuation is allowed: periods, colons, hyphens and underscores

XML Attributes

Located in the start tag of elements
Provide additional information about elements
Often provide information that is not a part of data
Values must be enclosed in quotes
Should I use an element or an attribute?
  Metadata (data about data) should be stored as attributes, and the data itself should be stored as elements

Types of XML Documents

XML document

  Well-formed XML
    Syntax is correct

  Valid XML
    Well formed
    Validated against a DTD/Schema

Well-Formed XML

Properties

  Documents must have a root element
  Elements must have a closing tag
  Elements must be properly nested
  Attribute values must be quoted

Advantage

  Avoids the fixed nature of HTML
  Flexible
  Expandable
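
A well-formedness check can be sketched with Python's standard parser, which raises an error on any syntax violation. Note that this tests well-formedness only; it does not validate against a DTD or schema. The function name is_well_formed is chosen for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

samples = [
    "<b><i>This is correct nesting</i></b>",       # properly nested
    "<b><i>This is incorrect nesting</b></i>",     # improperly nested
    "<Message>case mismatch</message>",            # tags differ in case
    "<composer>Mozart",                            # missing end tag
    "<note date=12/11/2007><to>Ani</to></note>",   # unquoted attribute value
    "<a/><b/>",                                    # more than one root element
]
for sample in samples:
    print(is_well_formed(sample), sample)
```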



Valid XML

Properties

  Well formed
  Complies with the rules defined in a DTD/Schema

Advantage

  Clear understanding
  Data verification
  Interoperability
  Better document processing

XML Validation
[Figure: an XML document and an XML schema are fed to an XML parser, which outputs an optimized XML document or error messages]

Example: xmllint --valid sample.xml

Displaying XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="books.xsl"?>
<bookstore>
  <book category="literature">
    <title lang="beng">Sanchoita</title>
    <author>Rabindranath Tagore</author>
    <year>2009</year>
    <price>200.00</price>
  </book>
</bookstore>

Document Type Definition

Allows developers to create a set of rules to specify legal content and place restrictions on an XML file
Parser generates an error if the XML document does not follow the rules contained within the DTD
Including a DTD
  Using internal declaration
  Using external file
  Both

Internal (standalone) DTD


For custom documents
Uses DOCTYPE declaration

  <!DOCTYPE greeting [
    <!ELEMENT greeting (#PCDATA)>
  ]>
  <greeting>Hello, world!</greeting>

Specify in XML declaration

  <?xml version="1.0" standalone="yes"?>

External DTD

Most common
Use DOCTYPE declaration before root element

  <!DOCTYPE greeting SYSTEM "hello.dtd">
  <greeting>Hello, world!</greeting>

External plus Internal DTD


Usually to declare entities
Use DOCTYPE declaration before root element

  <!DOCTYPE greeting SYSTEM "hello.dtd" [
    <!ENTITY excl "&#x21;">
  ]>
  <greeting>Hello, world&excl;</greeting>

DTD XML Building Blocks

XML documents consist of the following blocks

  Elements
  Attributes
  Entities
    &lt;  &gt;  &amp;  &quot;  &apos;
  PCDATA
    Parsed Character DATA
    Entities will be expanded
  CDATA
    Character DATA
    Entities will not be expanded

Declaring Elements
An empty element
  <!ELEMENT elementName EMPTY>
Example
  <!ELEMENT br EMPTY>
  <!ELEMENT Bool EMPTY>
Usage:
  <br/>
  <Bool Value="True"></Bool>
Note: the keyword EMPTY is written without parentheses; (EMPTY) would instead declare a child element named EMPTY.

Declaring Elements
Element with data
  <!ELEMENT elementName (#PCDATA)>
Example
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT question (#PCDATA)>
  <!ELEMENT email ANY>
  <!ELEMENT tutorial (#PCDATA)>

Declaring Elements
Usage:

<from>U. K. Roy</from>
<question> What is the full form of DTD? </question>
<email>u_roy@it.jusl.ac.in</email>
<tutorial> This is an XML document </tutorial>

DTD Declarations
Example : Elements with Data
<!ELEMENT Month (#PCDATA)>
Valid usage:
  <Month>April</Month>
  <Month>This is a month</Month>
Invalid usage:
  <Month>
    <!-- Invalid usage within an XML file: cannot have children -->
    <January>Jan</January>
    <March>March</March>
  </Month>

Declaring Elements
Element with children (sequential)
  <!ELEMENT elementName (child1, child2, ...)>

Example
  <!ELEMENT message (from, to, body)>
  <!ELEMENT address (street, city, zip)>

Declaring Elements
Inner elements must also be declared
<!ELEMENT message (from, to, body)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ELEMENT address (street, city, zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT zip (#PCDATA)>

Declaring Elements
Usage:
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>
  <from>tom@it.jusl.ac.in</from>
  <to>jerry@rediffmail.com</to>
  <body>Learn DTD from www.w3schools.com</body>
</message>

Declaring Elements
Usage:
<?xml version="1.0"?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
  <street>S. C. Mallick Road</street>
  <city>Kolkata</city>
  <zip>700032</zip>
</address>
Note: the name in the DOCTYPE declaration must match the root element (here, address).

Using internal DTD


<?xml version="1.0"?>
<!DOCTYPE email [
  <!ELEMENT email (from, to, body)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<email>
  <from>tom@it.jusl.ac.in</from>
  <to>jerry@rediffmail.com</to>
  <body>Learn DTD from w3schools.com</body>
</email>

DTD Declarations
<!ELEMENT House (address)>
<!ELEMENT address (person, street, city, zip)>
<!ELEMENT person (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT zip (#PCDATA)>

<!-- Valid usage within an XML file -->
<House>
  <address>
    <person>John Doe</person>
    <street>1234 Preston Ave.</street>
    <city>Charlottesville, Va</city>
    <zip>22903</zip>
  </address>
</House>

Declaring Elements
Occurrence Indicators
Term   Meaning              Example
,      Sequence operator    a, b, c
|      Choice operator      a | b | c
+      One or more          a+
*      Zero or more         a*
?      Single optional      a?
()     Grouping             (a)

Examples
<!ELEMENT a EMPTY>
<!ELEMENT b ANY>
<!ELEMENT either (one | theother)>
<!ELEMENT ordered (first, second)>
<!ELEMENT list (item+)>
<!ELEMENT dl ((dt?, dd?)*)>
<!ELEMENT text (#PCDATA)>
<!ELEMENT mixed (#PCDATA | b | i | em)*>
Note: a mixed content model that names child elements must end with *.

Declaring Elements
Example
<!ELEMENT Book (Front, Chapter+, Back?)>
<!ELEMENT Front (Title, Author+, Publisher?)>
<!ELEMENT Chapter (Name, Content)>
<!ELEMENT Back (ISBN)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Content (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
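
One way to see what the occurrence indicators mean is to treat a content model as a regular expression over the sequence of child element names. The sketch below does this for element-only models such as (Front, Chapter+, Back?). It is an illustration only, not how a real validating parser works: it ignores #PCDATA, EMPTY and ANY, and the helper names model_to_regex and children_match are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model: str):
    """Translate a DTD element-content model into a compiled regex that
    matches a string of child tag names, each followed by one space."""
    tokens = re.findall(r"[A-Za-z_][\w.\-]*|[(),|+*?]", model)
    parts = []
    for tok in tokens:
        if tok == ",":
            continue                      # a sequence is plain concatenation
        if tok == "(":
            parts.append("(?:")           # grouping
        elif tok in ")|+*?":
            parts.append(tok)             # same meaning as in a regex
        else:
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))

def children_match(element, model: str) -> bool:
    """Check the element's direct children against the content model."""
    sequence = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(sequence) is not None

book_model = "(Front, Chapter+, Back?)"
good = ET.fromstring("<Book><Front/><Chapter/><Chapter/></Book>")
bad = ET.fromstring("<Book><Front/><Back/></Book>")   # no Chapter
print(children_match(good, book_model))
print(children_match(bad, book_model))
```

The operators , | + * ? and () carry over almost unchanged, which is why content models read much like regular expressions.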

Cautions concerning DTDs

All element declarations begin with <!ELEMENT and end with >
The ELEMENT declaration is case sensitive
Elements declared with the #PCDATA content model cannot have children
When describing sequences, the XML document must contain exactly those elements in exactly that order

Declaring Attributes
General Syntax
  <!ATTLIST elementName attributeName attributeType defaultType>
Example
  <!ATTLIST HDD speed CDATA "7200">
  <!ATTLIST HDD unit CDATA #IMPLIED>
  <!ATTLIST price currency CDATA "INR">
  <!ATTLIST question number ID #REQUIRED>
Usage:
  <HDD speed="6000"> </HDD>
  <price currency="USD">10</price>
  <question number="q1"> </question>
Note: default values are quoted, and an ID value must be a valid XML name, so it cannot start with a digit.

Declaring Attributes
The attribute-type can be one of the following:
Type           Description
CDATA          The value is character data
(en1|en2|..)   The value must be one from an enumerated list
ID             The value is a unique id
IDREF          The value is the id of another element
IDREFS         The value is a list of other ids
NMTOKEN        The value is a valid XML name
NMTOKENS       The value is a list of valid XML names
ENTITY         The value is an entity
ENTITIES       The value is a list of entities
NOTATION       The value is a name of a notation
xml:           The value is a predefined xml value

Declaring Attributes
The default-value can be one of the following:
Value          Explanation
value          The default value of the attribute
#REQUIRED      The attribute is required
#IMPLIED       The attribute is not required
#FIXED value   The attribute value is fixed

Examples
<!ATTLIST termdef id ID #REQUIRED name CDATA #IMPLIED>

<!ATTLIST list type (bullets|ordered|glossary) "ordered">


<!ATTLIST form method CDATA #FIXED "POST">


Examples
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
Valid XML:
<square width="100" />

Examples--#REQUIRED
DTD:
<!ATTLIST person number CDATA #REQUIRED>
Valid XML:
<person number="5677" />
Invalid XML:
<person />

Examples--#IMPLIED
DTD:
<!ATTLIST contact fax CDATA #IMPLIED>
Valid XML:
<contact fax="555-667788" />
Also valid XML:
<contact />

Examples--#FIXED
DTD:
<!ATTLIST sender company CDATA #FIXED "Microsoft">

Valid XML:
<sender company="Microsoft" />
Invalid XML:
<sender company="W3Schools" />
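
The behaviour of a plain default, #REQUIRED, #IMPLIED and #FIXED can be mimicked in a few lines. This is a sketch of the rules only: Python's standard xml.etree.ElementTree does not validate against a DTD, so the hypothetical effective_value helper below re-implements the checks by hand, taking the default declaration as a string.

```python
import xml.etree.ElementTree as ET

def effective_value(element, name, default):
    """Return the attribute value a validating parser would report,
    or raise ValueError if the declaration is violated."""
    value = element.get(name)
    if default == "#REQUIRED":
        if value is None:
            raise ValueError(name + " is required")
        return value
    if default == "#IMPLIED":
        return value                       # may be None: attribute is optional
    if default.startswith("#FIXED "):
        fixed = default[len("#FIXED "):].strip('"')
        if value is not None and value != fixed:
            raise ValueError(name + " must be " + fixed)
        return fixed
    # Plain default value: used only when the attribute is absent.
    return value if value is not None else default.strip('"')

print(effective_value(ET.fromstring('<square/>'), "width", '"0"'))
print(effective_value(ET.fromstring('<square width="100"/>'), "width", '"0"'))
print(effective_value(ET.fromstring('<contact/>'), "fax", "#IMPLIED"))
print(effective_value(ET.fromstring('<sender/>'), "company", '#FIXED "Microsoft"'))
```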


Examples--Enumerated
DTD:
<!ATTLIST payment type (check|cash) "cash">
XML example:
<payment type="check" />
or
<payment type="cash" />

Elements or Attributes?

Attributes cannot contain multiple values (child elements can)
Attributes are not easily expandable (for future changes)
Attributes cannot describe structures (child elements can)
Attributes are more difficult to manipulate by program code
Attribute values are not easy to test against a DTD

Declaring Entities
General Syntax
  <!ENTITY entityName "entityValue">
Example
  <!ENTITY euro "&#x20AC;">        <!-- € -->
  <!ENTITY language "XML">
  <!ENTITY W3C "World Wide Web Consortium">
  <!ENTITY copyright "&#x00A9;">   <!-- © -->
  <!ENTITY USD SYSTEM "currency.dtd">
Usage:
  <tutorial> &language; is standardized by &W3C; &copyright; UKR </tutorial>
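
Entity expansion can be observed with Python's standard parser, which expands entities declared in the internal DTD subset while parsing. This sketch moves the internal entities from the slide into an internal DOCTYPE so the document is self-contained; the external USD entity is left out because the standard parser does not fetch external files.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE tutorial [
  <!ENTITY language "XML">
  <!ENTITY W3C "World Wide Web Consortium">
  <!ENTITY copyright "&#x00A9;">
]>
<tutorial>&language; is standardized by &W3C; &copyright; UKR</tutorial>"""

# Each &name; reference is replaced by its declared value during parsing.
text = ET.fromstring(doc).text
print(text)
```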

Displaying XML

XHTML
XML namespace
XML DOM
XPath
XSL (XSLT+XPath)
  Client side
    By browser
    Explicitly by author (using JavaScript)
  Server side
Schema

Displaying XML

XML documents do not carry information about how to display the data
We can add display information to XML with
  CSS (Cascading Style Sheets)
  XSL (eXtensible Stylesheet Language) --- preferred

XML into HTML

XSLT can transform into (called "output method"):
  XML
  HTML
  text

Server-side XSLT engine
  content in XML, served as HTML
  browser never knows

Client-side XSL

[Figure: client-side XSL; labels: XML, XSLT, FO]

Server-side XSL

[Figure: server-side XSL; XML and XSLT are fed to an XSLT engine, which outputs HTML]

XML DOM
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
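
A short sketch of walking such a document through the DOM interface, using Python's standard xml.dom.minidom module. The document is shortened to two of the books above to keep the example compact; the variable names are chosen for this illustration.

```python
from xml.dom.minidom import parseString

doc = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <price>30.00</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

dom = parseString(doc)
root = dom.documentElement            # the <bookstore> element node

# Element nodes are found by tag name; their text lives in a child text node.
titles = [t.firstChild.data for t in dom.getElementsByTagName("title")]
categories = [b.getAttribute("category") for b in root.getElementsByTagName("book")]

print(root.tagName)
print(titles)
print(categories)
```

In the DOM, the text inside an element is a separate child node of that element, which is why the title strings are read from firstChild rather than from the element itself.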

XML DOM Tree


Questions?
