Lecture 2 Week 2
Creating a Valid XML Document Using Document Type Definitions (DTDs)
Decide on a way to structure the information, in this case information about a book:
– Title, Author, Chapters
<?xml version="1.0"?>
<book>
  <title>Web Applications</title>
  <author>John Doe</author>
  <chapters>
    <chapter>Introduction</chapter>
    <chapter>ASP</chapter>
    <chapter>XML</chapter>
  </chapters>
</book>
DWAX 2010.1
Parsers, Well-formed and Valid XML Documents
Non-validating parser
• Checks if document is well formed (contains no syntax errors and conforms to the XML specification)
Validating parser
• Checks if document is well formed
• Checks if XML document is valid (conforms to the rules set out in its DTD or Schema)
  – By definition, a valid document is also well formed
DTDs use Extended Backus-Naur Form (EBNF) grammar and are used to define the structure of an XML document:
– Ensure all required elements are present in the document
– Prevent undefined elements from being used
– Enforce a specific data structure
– Specify the use of attributes and define their possible values
– Define default values for attributes
– Describe how the parser should access non-XML or non-textual content
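The difference between well formed and valid can be sketched with a small document. The note, to and body element names are hypothetical, chosen only for illustration. A non-validating parser would accept the document below because every tag is closed and properly nested; a validating parser would reject it because the DTD requires a body element after to.

```xml
<?xml version="1.0"?>
<!-- Hypothetical example: element names are illustrative only -->
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<!-- Well formed, but not valid: the required body element is missing -->
<note>
  <to>Students</to>
</note>
```

Adding a body element after to would make the document valid as well as well formed.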
Declaring a DTD
A document type definition is a collection of rules or declarations that define the content and structure of the document. A document type declaration attaches those rules to the document’s content.
Document Type Declaration
– Introduce DTDs into XML documents
– Placed in XML document’s prolog
– Begins with ‘<!DOCTYPE’ and ends with ‘>’
– Often referred to as the DOCTYPE declaration
– Can point to
  • External subsets
    – Declarations outside document
    – Exist in different file, typically ending with .dtd extension
  • Internal subsets
    – Declarations inside document
    – Visible only within document in which it resides
Declaring an internal DTD
The DOCTYPE declaration for an internal subset is:
<!DOCTYPE root [ declarations ]>
Where root is the name of the document’s root element, and declarations are the statements that comprise the DTD.
Adding a Document Type Declaration
(Note: no definitions have been added yet, so this document is not valid)
<?xml version="1.0"?>
<!DOCTYPE book [
]>
<book>
  <title>Web Applications</title>
  <author>John Doe</author>
  <chapters>
    <chapter>Introduction</chapter>
    <chapter>ASP</chapter>
    <chapter>XML</chapter>
  </chapters>
</book>
Starting the Document Type Definition: Declaring Document Elements
Every element used in the document must be declared in the DTD for the document to be valid. An element type declaration specifies the name of the element and indicates what kind of content the element can contain. The element declaration syntax is:
<!ELEMENT element content-model>
Where element is the element name and content-model specifies what type of content the element contains. The element name is case sensitive.
Types of Element Content
DTDs define five different types of element content:
– Any elements. No restrictions on the element’s content.
  • <!ELEMENT element ANY>
– Empty elements. The element cannot store any content.
  • <!ELEMENT element EMPTY>
– Character data. The element can only contain a text string.
  • <!ELEMENT element (#PCDATA)>
  • The keyword #PCDATA stands for “parsed-character data” and is any well-formed text string.
– Elements. The element can only contain child elements.
  • <!ELEMENT element (child elements)>
– Mixed. The element can contain both character data and child elements.
  • <!ELEMENT element (#PCDATA|child1|child2|…)*>
  • The parent element can contain character data or any number of the specified child elements, or it can contain no content at all.
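As a rough sketch, the five content types can be seen side by side in one small document. All element names here (catalog, item, divider, notes, em, extra) are hypothetical and exist only to give each content type one declaration and one matching use.

```xml
<?xml version="1.0"?>
<!-- Hypothetical element names, one declaration per content type -->
<!DOCTYPE catalog [
  <!ELEMENT catalog (item, divider, notes, extra)>  <!-- child elements only -->
  <!ELEMENT item (#PCDATA)>                         <!-- character data -->
  <!ELEMENT divider EMPTY>                          <!-- no content allowed -->
  <!ELEMENT notes (#PCDATA | em)*>                  <!-- mixed content -->
  <!ELEMENT em (#PCDATA)>
  <!ELEMENT extra ANY>                              <!-- text or any declared element -->
]>
<catalog>
  <item>Web Applications</item>
  <divider/>
  <notes>Read <em>before</em> week 3.</notes>
  <extra>Free text or any declared <em>elements</em></extra>
</catalog>
```

Note that in a mixed-content declaration #PCDATA must come first and the group must end with an asterisk.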
The declaration
<!ELEMENT customer (phone)>
<!ELEMENT phone (#PCDATA)>
indicates the customer element can only have one child, named phone. You cannot repeat the same child element more than once with this declaration. The phone element can only contain character data. Valid XML markup could be:
<customer>
  <phone>02 3333 4444</phone>
</customer>
Sequences
– Specify order in which elements occur
– Comma (,) is used as delimiter
<!ELEMENT element (child1, child2, …)>
<classroom>
  <teacher>1</teacher>
  <student>20</student>
</classroom>
Defining the classroom element in a DTD:
<!ELEMENT classroom (teacher, student)>
The order of the child elements in the XML file must match the order defined in the element declaration.
<!ELEMENT chapters (chapter, chapter, chapter)>
The above element declaration indicates that the element chapters must contain exactly three child elements named chapter.
Occurrence indicators
– Specify element’s frequency
– Plus sign (+) indicates one or more occurrences of the element
<!ELEMENT album ( song+ )>
– Asterisk (*) indicates zero or more occurrences of the element (optional)
<!ELEMENT library ( book* )>
– Question mark (?) indicates zero or one occurrence of the element
<!ELEMENT seat ( person? )>
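The three indicators can be combined with a sequence, as in the sketch below. The library and book names come from the declarations above; the title, author and edition children and the second author name are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the child elements of book are hypothetical -->
<!DOCTYPE library [
  <!ELEMENT library (book*)>                  <!-- zero or more books -->
  <!ELEMENT book (title, author+, edition?)>  <!-- one title, one or more authors, optional edition -->
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT edition (#PCDATA)>
]>
<library>
  <book>
    <title>Web Applications</title>
    <author>John Doe</author>
    <author>Jane Doe</author>
  </book>
</library>
```

An empty library element would also be valid here, while a book without any author would not.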
DTD – Internal subset
<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title, author, chapters)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT chapters (chapter+)>
  <!ELEMENT chapter (#PCDATA)>
]>
<book>
  <title>Web Applications</title>
  <author>John Doe</author>
  <chapters>
    <chapter>Introduction</chapter>
    <chapter>ASP</chapter>
    <chapter>XML</chapter>
  </chapters>
</book>
Pipe Characters (Choice)
Pipe characters (|)
– Specify choices
– Presents a set of possible child elements
– Syntax:
<!ELEMENT element ( child|child )>
For example:
<!ELEMENT dessert ( iceCream|pastry )>
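A small sketch of the dessert choice in use. The menu wrapper element and the text values are hypothetical; only the dessert declaration comes from the slide. Each dessert must contain exactly one of the two alternatives.

```xml
<?xml version="1.0"?>
<!-- Sketch only: menu and the text content are illustrative -->
<!DOCTYPE menu [
  <!ELEMENT menu (dessert+)>
  <!ELEMENT dessert (iceCream | pastry)>
  <!ELEMENT iceCream (#PCDATA)>
  <!ELEMENT pastry (#PCDATA)>
]>
<menu>
  <dessert><iceCream>vanilla</iceCream></dessert>
  <dessert><pastry>croissant</pastry></dessert>
</menu>
```

A dessert containing both iceCream and pastry, or neither, would fail validation.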
Declaring an external DTD
The real power of XML comes from an external DTD that can be shared among many documents written by different authors. Each XML document can only be linked to one external DTD.
The DOCTYPE declaration for an external subset is:
<!DOCTYPE root SYSTEM "URL">
or, when the document also has an internal subset:
<!DOCTYPE root SYSTEM "URL" [ declarations ]>
Where root is the name of the document’s root element, URL is the location and name of the external DTD file, and declarations are the statements that comprise the DTD.
DTD – External Subset
book.xml:
<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book>
  <title>Web Applications</title>
  <author>John Doe</author>
  <chapters>
    <chapter>Introduction</chapter>
    <chapter>ASP</chapter>
    <chapter>XML</chapter>
  </chapters>
</book>
book.dtd:
<!ELEMENT book (title, author, chapters)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT chapters (chapter+)>
<!ELEMENT chapter (#PCDATA)>
Internal/External subset precedence
If a document contains both an internal and an external subset, the internal subset takes precedence over the external subset if there is a conflict between the two. This precedence applies to attribute-list and entity declarations; declaring the same element type in both subsets makes the document invalid. This way, the external subset would define basic rules for all the documents, and the internal subset would define those rules specific to each document.
Combining an External and Internal DTD Subset
Figure: combining an external and an internal DTD subset
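A minimal sketch of such a combination follows. It assumes that book.dtd declares the book, title, author, chapters and chapter elements; the publisher entity and its value are hypothetical and stand in for any declaration that is specific to one document.

```xml
<?xml version="1.0"?>
<!-- Assumption: book.dtd declares book, title, author, chapters and chapter -->
<!DOCTYPE book SYSTEM "book.dtd" [
  <!-- Internal subset: a declaration specific to this document only -->
  <!ENTITY publisher "Example Press">
]>
<book>
  <title>Web Applications</title>
  <author>John Doe</author>
  <chapters>
    <chapter>Introduction</chapter>
    <chapter>Published by &publisher;</chapter>
  </chapters>
</book>
```

If book.dtd also declared an entity named publisher, the value in the internal subset would be the one used.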
Declaring Element Attributes
For a document to be valid, all the attributes associated with elements must also be declared.
– You must add an attribute-list declaration to the document’s DTD.
An attribute-list declaration:
– Specifies all the attributes an element has
– Uses ATTLIST attribute list declaration
  • Lists the names of all attributes associated with a specific element
  • Specifies the data type of the attribute
  • Indicates whether the attribute is required or optional
  • Provides a default value for the attribute, if necessary
The syntax to declare a list of attributes is:
<!ATTLIST element attribute1 type1 default1 attribute2 type2 default2 attribute3 type3 default3…>
Where
– element is the name of the element associated with the attributes,
– attribute is the name of an attribute,
– type is the attribute’s data type, and
– default indicates whether the attribute is required or implied, and whether it has a fixed or default value.
book.xml:
<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book>
  <title isbn="0-22-4444">Web Applications</title>
  <author>John Doe</author>
  <chapters>
    <chapter>Introduction</chapter>
    <chapter>ASP</chapter>
    <chapter>XML</chapter>
  </chapters>
</book>
book.dtd:
<!ELEMENT book (title, author, chapters)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title isbn CDATA #REQUIRED>
<!ELEMENT author (#PCDATA)>
<!ELEMENT chapters (chapter+)>
<!ELEMENT chapter (#PCDATA)>
Attribute types
– Strings (CDATA)
  • No constraints on attribute values
    – Except that < and & cannot appear literally, nor can the quote character (' or ") that delimits the value
– Tokenized attributes
  • ID, IDREF, ENTITY and NMTOKEN
  • Figure: the seven attribute tokens (ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS)
– Enumerated attributes
  • Most restrictive, limited to a set of possible values
  • The general form of an enumerated type is: attribute (value1 | value2 | value3 | …)
  • For example, the following declaration: <!ATTLIST Customer CustType (home | business) #REQUIRED>
  • restricts CustType to either “home” or “business”
Attribute defaults
– #REQUIRED
  • The attribute must appear in the element
  • Document is not valid if the attribute is missing
– #IMPLIED
  • The attribute is optional.
– #FIXED
  • The attribute is optional, but if one is specified, it must match the default.
<?xml version = "1.0"?>

<!-- Fig. 6.8: IDExample.xml -->
<!-- Example for ID and IDREF values of attributes -->

<!DOCTYPE bookstore [
   <!ELEMENT bookstore ( shipping+, book+ )>
   <!ELEMENT shipping ( duration )>
   <!ATTLIST shipping shipID ID #REQUIRED>
   <!ELEMENT book ( #PCDATA )>
   <!ATTLIST book shippedBy IDREF #IMPLIED>
   <!ELEMENT duration ( #PCDATA )>
]>

<bookstore>
   <shipping shipID = "s1">
      <duration>2 to 4 days</duration>
   </shipping>

   <shipping shipID = "s2">
      <duration>1 day</duration>
   </shipping>

   <book shippedBy = "s2">
      Java How to Program 3rd edition.
   </book>

   <book shippedBy = "s2">
      C How to Program 3rd edition.
   </book>

   <book shippedBy = "s1">
      C++ How to Program 3rd edition.
   </book>
</bookstore>
– Each shipping element has a unique identifier (shipID)
– The book elements are declared with attribute shippedBy
– Attribute shippedBy points to a shipping element by matching its shipID attribute
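The attribute defaults can be sketched together in one attribute-list declaration. The customers document, the attribute names and the values below are all hypothetical, chosen only to show one attribute per kind of default.

```xml
<?xml version="1.0"?>
<!-- Sketch only: element names, attribute names and values are illustrative -->
<!DOCTYPE customers [
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (#PCDATA)>
  <!ATTLIST customer
    custID   ID                #REQUIRED
    custType (home | business) "home"
    region   CDATA             #IMPLIED
    country  CDATA             #FIXED "AU">
]>
<customers>
  <customer custID="c1">Pixal Digital Products</customer>
  <customer custID="c2" custType="business" country="AU">John Doe</customer>
</customers>
```

For the first customer, a validating parser would report custType as "home" and country as "AU" even though neither is written in the markup; region is simply absent. Omitting custID, or writing country="NZ", would make the document invalid.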
Working with Entities
Entities are storage units for a document’s content. The most fundamental entity is the XML document itself and is known as the document entity. Entities can also refer to:
– a text string
– a DTD
– an element or attribute declaration
– an external file containing character or binary data
Entities can be declared in a DTD. How to declare an entity depends on how it is classified. There are three factors involved in classifying entities:
– The content of the entity
– How the entity is constructed
– Where the definition of the entity is located
Figure: the three entity classifications
General Parsed Entities
General entities are declared in the DTD of a document. The syntax is:
<!ENTITY entity "value">
Where entity is the name assigned to the entity and value is the general entity’s value. For example, an entity named “Pixal” can be created to store a company's official name:
<!ENTITY Pixal "Pixal Digital Products">
After an entity is declared, it can be referenced anywhere within the document.
<Title>This is the home page of &Pixal;</Title>
This is interpreted as
<Title>This is the home page of Pixal Digital Products</Title>
General External Entities
General entities can refer to values located in external files. The syntax is:
<!ENTITY entity SYSTEM "URL">
For example, in the declaration:
<!ENTITY headlines SYSTEM "http://www.newsflash.com/stories.xml">
an entity named “headlines” gets its value from the document stories.xml, located at http://www.newsflash.com/stories.xml
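Putting the Pixal declaration and its reference into one complete document gives the sketch below. The page root element is an assumption added so that the example is a complete, valid document; the entity and the Title element come from the text above.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the page root element is hypothetical -->
<!DOCTYPE page [
  <!ELEMENT page (Title)>
  <!ELEMENT Title (#PCDATA)>
  <!ENTITY Pixal "Pixal Digital Products">
]>
<page>
  <Title>This is the home page of &Pixal;</Title>
</page>
```

The reference is written as an ampersand, the entity name and a semicolon. Referencing an entity that has not been declared is an error.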
Parameter Entities
Parameter entities are used to store the content of a DTD. For internal parameter entities, the syntax is:
<!ENTITY % entity "value">
where entity is the name of the parameter entity and value is a text string of the entity’s value.
For external parameter entities, the syntax is:
<!ENTITY % entity SYSTEM "URL">
where entity is the name assigned to the parameter entity and URL is the location of the external file.
Parameter entity references can only be placed where a declaration would normally occur, such as an internal or external DTD. Parameter entities used with an internal DTD do not offer any time or effort savings. However, an external parameter entity can allow XML to use more than one DTD per document by combining declarations from multiple DTDs.
Using Parameter Entities to Combine Multiple DTDs
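A sketch of this technique, under stated assumptions: books.dtd and shipping.dtd are hypothetical files, with books.dtd assumed to declare the book element and its shippedBy attribute, and shipping.dtd assumed to declare shipping, duration and the shipID attribute. Each external parameter entity is declared and then referenced with a percent sign, the entity name and a semicolon, which pulls that file's declarations into the DTD.

```xml
<?xml version="1.0"?>
<!DOCTYPE bookstore [
  <!-- Hypothetical files: each is assumed to hold part of the declarations -->
  <!ENTITY % books SYSTEM "books.dtd">
  <!ENTITY % shipping SYSTEM "shipping.dtd">
  <!-- Parameter entity references insert the external declarations here -->
  %books;
  %shipping;
  <!ELEMENT bookstore (shipping+, book+)>
]>
<bookstore>
  <shipping shipID="s1">
    <duration>1 day</duration>
  </shipping>
  <book shippedBy="s1">C How to Program 3rd edition.</book>
</bookstore>
```

A non-validating parser is not required to read external entities, so this approach is only reliable with a validating parser.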
Go to the DTD section of w3schools for some examples of DTDs:
Carey: Tutorial 3 – Creating a valid XML Document