CHAPTER 3

Introduction

In the previous chapter we learned about DTDs, the traditional way of validating an XML document, which XML inherited from SGML. Over time, many people complained to the W3C about the complexity of DTDs and asked for something simpler. In response, the W3C assigned a committee to work on the problem. The committee came up with XML Schemas, a solution that is actually more complex than DTDs. On the other hand, XML Schemas are also far more powerful than DTDs ever were.

What can DTDs not provide? Specific data types for attributes, for one. Schemas, by contrast, support data types for attributes.

A schema is a set of rules for constraining the structure and articulating the information set of XML documents.
Advantages of Schemas over DTDs

- An XML Schema is based on XML, not on some specialized syntax.
- An XML Schema can be parsed and manipulated just like any other XML document.
- XML Schemas support a variety of data types (integers, floats, Booleans, dates, strings, and so on).
- XML Schemas present an open-ended data model, which allows you to extend vocabularies and establish inheritance relationships between elements without invalidating documents.
- XML Schemas support namespace integration, which allows you to associate individual nodes of a document with type declarations in a schema.
- XML Schemas support attribute groups, which allow you to logically combine attributes.

One of the original proponents of XML Schemas was Microsoft. Microsoft documentation on XML frequently decried DTDs as being too complex and said that schemas would fix the problem. In fact, the Microsoft implementation of XML Schemas in IE became outdated not long after it was introduced.
XML Schemas in Internet Explorer

Like many other developers, Microsoft got caught basing its software on a relatively early XML specification, which promptly changed. As implemented in IE, Microsoft's schemas are based on the XML-Data note.
Writing XML Schema

The DTD for XML Schema is very straightforward, primarily because XML Schema is a pretty simple vocabulary by most standards. The root element of all XML Schema documents is Schema, which is declared in the DTD as potentially containing three child elements: AttributeType, ElementType, and description. In addition to these elements, the XML Schema vocabulary declares several other elements that are used to describe document schemas. The following are the elements that make up the XML Schema vocabulary:

Element        Description
Schema         Serves as the root element for XML Schema documents
datatype       Describes data types for elements and attributes
ElementType    Describes a type of element
element        Identifies an element that can occur within another element type
group          Organizes elements into groups for ordering purposes
AttributeType  Describes a type of attribute
attribute      Identifies an attribute that can occur within an element type
description    Provides documentation for an element or attribute
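To see how these elements fit together, the following is a minimal sketch of a schema document. The schema name catalogSchema and the item, title, and code names are hypothetical and chosen only for illustration; the two namespace URIs are the ones Microsoft's implementation expects. The group and datatype elements are left out to keep the sketch short.

```xml
<?xml version="1.0"?>
<!-- Schema: root element and container for everything else -->
<Schema name="catalogSchema"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">

  <!-- AttributeType: defines a type of attribute (hypothetical name) -->
  <AttributeType name="code" dt:type="string"/>

  <!-- ElementType: defines a type of element (hypothetical name) -->
  <ElementType name="title" content="textOnly" dt:type="string"/>

  <ElementType name="item" content="eltOnly" order="seq">
    <!-- description: documentation only -->
    <description>One catalog entry.</description>
    <!-- element: an instance of the title element type inside item -->
    <element type="title"/>
    <!-- attribute: an instance of the code attribute type on item -->
    <attribute type="code"/>
  </ElementType>
</Schema>
```

Note the split between the capitalized "type" elements (ElementType, AttributeType), which define, and the lowercase elements (element, attribute), which refer to those definitions by name.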
The Schema Element

The Schema element serves as the root (document) element for XML Schema documents and acts as a container for all other schema content. The Schema element includes two attributes:

Attribute  Description
name       The name of the schema
xmlns      The namespace for the schema

The name attribute establishes the name of the schema. The xmlns attribute is very important in that it establishes the namespace for the schema. This attribute must be set to urn:schemas-microsoft-com:xml-data in order to use Microsoft's XML Schema implementation.
    <Schema name="myschema" xmlns="urn:schemas-microsoft-com:xml-data">
      <!-- Schema content goes here -->
    </Schema>

NOTE Namespaces are used in XML documents to guarantee uniqueness among element and attribute names associated with a given XML vocabulary. Namespaces take the form of URIs, which are often the familiar URLs used to identify resources on the Web.

In addition to specifying the namespace for the schema, it is usually also necessary to specify the namespace for XML Schema data types. The data type namespace is typically assigned to the xmlns:dt attribute and is set to urn:schemas-microsoft-com:datatypes. You must set this namespace in order to use any of the XML Schema data types, such as date, time, int, and float.

    <Schema name="myschema"
            xmlns="urn:schemas-microsoft-com:xml-data"
            xmlns:dt="urn:schemas-microsoft-com:datatypes">
      <!-- Schema content goes here -->
    </Schema>

The Schema element can contain child elements of type AttributeType, ElementType, and description. The AttributeType and ElementType elements define attribute types and element types.

The ElementType Element

The ElementType element is used to define element types that establish the schema of documents. The ElementType element can contain datatype, element, group, AttributeType, attribute, and description child elements. The element element identifies an instance of a child element within the element type; you use the element element to establish the content model for an element type. Attributes for an element type are established using the AttributeType and attribute elements. The AttributeType element defines a type of attribute, while the attribute element identifies an actual attribute of the element type. Any attribute types defined within an ElementType element are considered local to that element type.

The ElementType element includes several attributes for defining the specific parameters of the element type:
Attribute  Description
name       The name of the element
model      Whether the content model is open or closed
content    The type of content contained within the element
order      The order of the child elements and groups contained within the element
dt:type    The type of the element
The following are examples of element types defined using the ElementType element:

    <ElementType name="name" content="textOnly" dt:type="string"/>
    <ElementType name="type" content="textOnly" dt:type="string"/>
    <ElementType name="product" content="eltOnly" model="closed" order="seq">
      <element type="name"/>
      <element type="type"/>
    </ElementType>
    <ElementType name="products" content="eltOnly" model="closed" order="seq">
      <element type="product"/>
    </ElementType>

Notice that the name and type elements are first declared using the ElementType element and are then identified within the content model of the product element type using the element element.

The name and model Attributes

The name attribute is used to specify the name of the element type and is a required attribute. This value must be unique for element types within the scope in which it is defined. The model attribute specifies whether the element type adheres to an open or closed content model. An open model allows the element type to contain additional elements that aren't declared in the schema, which makes for a very extensible schema. Element types assume an open model by default.

The content Attribute

The content attribute of ElementType is used to establish the type of content contained within the element type. The following are acceptable values for this attribute:

Value     Description
empty     The element type doesn't contain any content.
textOnly  The element type can contain only text (if the content model is open, the element type may also contain other unspecified elements).
eltOnly   The element type can contain only the specified child elements.
mixed     The element type can contain a mixture of text and specified child elements (if the content model is open, the element type may also contain other unspecified elements).

The order Attribute

The order attribute is used to establish the order and frequency of the group of child elements contained within the element type. The following are acceptable values for this attribute:

Value  Description
one    Only one of a set of elements is allowed.
seq    The elements must occur in the specified sequence.
many   The elements can occur any number of times in any order.
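The three order values are easiest to compare side by side. The following sketch declares the same two child elements under each value; the schema name orderDemo and all element type names are hypothetical.

```xml
<?xml version="1.0"?>
<Schema name="orderDemo" xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="phone" content="textOnly"/>
  <ElementType name="email" content="textOnly"/>

  <!-- order="one": a contact holds either a phone or an email, not both -->
  <ElementType name="contact" content="eltOnly" order="one">
    <element type="phone"/>
    <element type="email"/>
  </ElementType>

  <!-- order="seq": a card holds a phone followed by an email -->
  <ElementType name="card" content="eltOnly" order="seq">
    <element type="phone"/>
    <element type="email"/>
  </ElementType>

  <!-- order="many": a directory holds phones and emails
       in any order, any number of times -->
  <ElementType name="directory" content="eltOnly" order="many">
    <element type="phone"/>
    <element type="email"/>
  </ElementType>
</Schema>
```

Under these declarations, a card element with its email first would be rejected, while a directory element with the same children in the same order would be accepted.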
The dt:type Attribute

The dt:type attribute is used to establish the data type of the content contained within the element type. The types allowed in the dt:type attribute match those that are allowed in the datatype element. XML Schema data types are covered later in this chapter.

The element Element

The element element is used to declare an instance of an element within a group or element type. The element element includes three attributes for describing additional information about an element instance:

Attribute  Description
type       The type of element
minOccurs  The minimum number of times the element must occur
maxOccurs  The maximum number of times the element must occur
The type attribute is used to specify the type of the element. The value assigned to the type attribute must be the name of an element type already declared in the schema. The minOccurs and maxOccurs attributes are used to establish the number of times an element can occur within a group or element type. Both attributes have default values of 1 in the XML-Data note, which means that an element must occur exactly once by default.

The relationship between the minOccurs and maxOccurs attributes and the number of times an element or group can occur:

minOccurs    maxOccurs    Number of times element/group can occur
0            1            0 or 1
1            1            1
0            *            Any number of times
1            *            At least once
>0           *            At least minOccurs times
>maxOccurs   >0           0
Any value    <minOccurs   0

Note The table also applies to the group element, because groups also have minOccurs and maxOccurs attributes that serve the same purpose.

The following is an example of the element element used to declare element instances within an element type:
    <ElementType name="location" content="textOnly"/>
    <ElementType name="comments" content="textOnly"/>
    <ElementType name="session" model="closed" content="eltOnly" order="seq">
      <element type="location" minOccurs="1" maxOccurs="1"/>
      <element type="comments" minOccurs="0" maxOccurs="1"/>
    </ElementType>

The group Element

The group element is used to group elements for organizational purposes and for establishing complex content models. A complex content model consists of more than one group of elements. The group element includes three attributes for fine-tuning groups:

Attribute  Description
order      The order of the child elements contained within the group
minOccurs  The minimum number of times the group must occur
maxOccurs  The maximum number of times the group must occur
The order attribute works exactly like its counterpart in the ElementType element. The following are acceptable values for this attribute:

Value  Description
one    Only one of a set of elements is allowed within the group.
seq    The elements must occur in the specified sequence in the group.
many   The elements can occur any number of times and in any order in the group.
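As an illustration of a group inside a content model, the following sketch requires a name followed by exactly one of two alternatives. The schema name groupDemo and the person, name, phone, and email element types are hypothetical.

```xml
<?xml version="1.0"?>
<Schema name="groupDemo" xmlns="urn:schemas-microsoft-com:xml-data">
  <ElementType name="name" content="textOnly"/>
  <ElementType name="phone" content="textOnly"/>
  <ElementType name="email" content="textOnly"/>

  <!-- The outer order="seq" fixes the sequence: name, then the group -->
  <ElementType name="person" content="eltOnly" order="seq">
    <element type="name"/>
    <!-- The group itself must occur once; its order="one"
         allows either a phone or an email inside it -->
    <group order="one" minOccurs="1" maxOccurs="1">
      <element type="phone"/>
      <element type="email"/>
    </group>
  </ElementType>
</Schema>
```

The element type and the group each carry their own order value, which is what lets a single content model mix a fixed sequence with a choice.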
The minOccurs and maxOccurs attributes play exactly the same role in the group element as they do in the element element, which is to constrain the number of times the group can occur.

The AttributeType Element

The AttributeType element is used to define attribute types for use in elements. Similar to the ElementType element, the AttributeType element simply defines an attribute type. To actually declare an attribute as part of an element, you must use the attribute element, which references an AttributeType element.

Attribute types may be defined at the top level of a schema document or within individual element types. This allows you to create either global attributes or local attributes within a given scope. Global attributes are handy because they can be used in multiple elements. On the other hand, local attributes can be used within a given scope to supersede another attribute of the same name.
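The difference between global and local attribute types can be sketched as follows. The schema name scopeDemo, the run and swim element types, and the level attribute type are hypothetical names used only for illustration.

```xml
<?xml version="1.0"?>
<Schema name="scopeDemo"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">

  <!-- Global attribute type: defined at the top level of the schema -->
  <AttributeType name="level" dt:type="string"/>

  <!-- run uses the global definition of level -->
  <ElementType name="run" content="textOnly">
    <attribute type="level"/>
  </ElementType>

  <ElementType name="swim" content="textOnly">
    <!-- Local attribute type: within swim, this definition
         takes the place of the global level -->
    <AttributeType name="level" dt:type="enumeration"
                   dt:values="beginner advanced"/>
    <attribute type="level"/>
  </ElementType>
</Schema>
```

With this arrangement, the level attribute accepts any string on run but only beginner or advanced on swim.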
The AttributeType element includes the following attributes to allow you to fully describe an attribute type:

Attribute  Description
name       The name of the attribute type
dt:type    The data type of the attribute type
dt:values  The list of possible values for an enumerated attribute; only applicable when dt:type is set to enumeration
default    The default value for the attribute
required   Flag indicating whether the attribute must be provided in the element
The name attribute specifies the name of the attribute type and is a required attribute. This name must be unique among attributes within a given scope. The dt:type attribute specifies the data type of the attribute. The dt:values attribute is used to specify a list of possible values for enumerated attributes. This attribute is applicable only when dt:type is set to enumeration. The list of enumerated attribute values is specified as a single string with spaces between the possible values. The following is an example of an enumerated attribute definition:

    <AttributeType name="type" dt:type="enumeration"
                   dt:values="running cycling swimming"/>

In this example, the available values that can be assigned to the type attribute are running, cycling, and swimming. Any value other than one of these three is considered an error during validation.

The default attribute of the AttributeType element is used to establish the default value for the attribute type. The following is an example of establishing the default value of an attribute:

    <AttributeType name="type" dt:type="enumeration"
                   dt:values="running cycling swimming"
                   default="running"/>

The required attribute is basically a flag that specifies whether the attribute type is required of the element in which it is used. Acceptable values for the required attribute are yes and no.
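The following sketch combines a required attribute with an optional, defaulted one. The schema name requiredDemo is hypothetical, and the session element type is declared empty here only to keep the example short.

```xml
<?xml version="1.0"?>
<Schema name="requiredDemo"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">

  <!-- Must be supplied on every element that uses it -->
  <AttributeType name="date" dt:type="date" required="yes"/>

  <!-- May be omitted; the default value then applies -->
  <AttributeType name="type" dt:type="enumeration"
                 dt:values="running cycling swimming"
                 default="running" required="no"/>

  <ElementType name="session" content="empty">
    <attribute type="date"/>
    <attribute type="type"/>
  </ElementType>
</Schema>
```

Against this schema, a session element that carries only a date attribute should validate and take running as its type, whereas a session element without a date attribute should be reported as an error.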
The attribute Element

The attribute element is used to declare an instance of an attribute for an element type. The attribute element includes three attributes for describing additional information about an attribute instance:

Attribute  Description
type       The type of the attribute
default    The default value for the attribute
required   Flag indicating whether the attribute must be provided in the element
The type attribute is used to specify the type of the attribute. The value assigned to the type attribute must be the name of an attribute type already declared in the schema. The type attribute is what ties attribute instances to their associated attribute types. The default and required attributes serve the same purposes as their equivalents in the AttributeType element, and they supersede the equivalent attributes if those are set in the attribute type. The following is an example of the attribute element used to declare attribute instances within an element type:

    <AttributeType name="type" dt:type="enumeration"
                   dt:values="running cycling swimming"/>
    <AttributeType name="date" dt:type="date"/>
    <ElementType name="session" content="eltOnly" order="seq">
      <element type="duration" minOccurs="1" maxOccurs="1"/>
      <element type="distance" minOccurs="1" maxOccurs="1"/>
      <element type="location" minOccurs="1" maxOccurs="1"/>
      <element type="comments" minOccurs="0" maxOccurs="1"/>
      <attribute type="type" default="running"/>
      <attribute type="date"/>
    </ElementType>

In this example, the type and date attributes are first declared using the AttributeType element and then associated with an element type using the attribute element. Notice that the default value of the type attribute is set in the attribute element instead of the AttributeType element.

Note There is no constraint on the order of attributes within an element, but there can be no more than one attribute of a given name per element.
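For comparison, here is a sketch of an instance document that the session element type is meant to accept. The schema file name trainlog-schema.xml and all of the element content are assumptions; in particular, the duration and distance element types are not shown in this chapter, so the values given for them are placeholders. The x-schema: prefix is the Internet Explorer convention for attaching a schema.

```xml
<?xml version="1.0"?>
<!-- trainlog-schema.xml is a hypothetical file holding the schema -->
<session xmlns="x-schema:trainlog-schema.xml"
         type="cycling" date="2001-05-14">
  <duration>75</duration>
  <distance>30</distance>
  <location>City park</location>
  <!-- comments is optional (minOccurs="0") and could be left out -->
  <comments>Easy ride.</comments>
</session>
```

If the type attribute were left off, the default value running declared on the attribute element would apply.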
The description Element

The last element used in XML Schema documents is the description element, which simply provides a means of placing a text description within a schema. The description element is a text-only element that is designed for documentation purposes. You can use the description element in any way you choose to provide documentation about an XML Schema construct. The following is an example of how you might add documentation to an element type:
    <ElementType name="trainlog" content="eltOnly">
      <description>
        This element type represents a training log consisting of
        one or more training sessions.
      </description>
      <element type="session" minOccurs="1" maxOccurs="*"/>
    </ElementType>

XML Schema Data Types

As you know, XML DTDs offer a limited number of data types, and they are rather primitive. For all practical purposes, XML really supports only a string data type, which is extremely limiting if you're creating structured document schemas. The XML-Data note defines a number of rich data types that can be used to specify familiar data types, such as integers, floating-point numbers, dates, and times, to name a few. As of Internet Explorer 5.0, XML Schema supports all of these data types in elements and hopefully will support them for attributes at some point in the future.

XML Schema data types are referenced from the urn:schemas-microsoft-com:datatypes data types namespace. To reference the data types, you must declare this namespace at the document level of your schema documents. The data type namespace is typically assigned to the xmlns:dt attribute, which means that you reference XML Schema data types by preceding them with dt:.

Example

    <Schema name="Myschema"
            xmlns="urn:schemas-microsoft-com:xml-data"
            xmlns:dt="urn:schemas-microsoft-com:datatypes">
    </Schema>

The whole point of declaring the XML Schema data type namespace is so you can use the data types it supports. The following is a list of these data types, which go far beyond the limited data types supported in XML 1.0:

Data type    Description
char         Character (text string with a length of one)
boolean      Boolean (0 or 1)
int          Whole number (integer)
float        Real (floating-point) number with fractional part and optional exponent
number       Real number (same as float)
fixed.14.4   Real number with 14 whole digits and 4 fractional digits
i1           One-byte integer
i2           Two-byte integer
i4           Four-byte integer
r4           Four-byte real number
r8           Eight-byte real number (same as float)
ui1          One-byte unsigned integer
ui2          Two-byte unsigned integer
ui4          Four-byte unsigned integer
bin.hex      Hexadecimal (base 16) number
bin.base64   Base 64 number
date         Date (without time or zone)
dateTime     Date with optional time (without time zone)
dateTime.tz  Date with optional time and time zone
time         Time (without date and time zone)
time.tz      Time with time zone (without date)
uri          Uniform Resource Identifier (URI)
uuid         Global identifier
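As a brief illustration of how a few of these data types are applied, the following sketch assigns them to element types through the dt:type attribute. The schema name typesDemo and all element type names are hypothetical.

```xml
<?xml version="1.0"?>
<Schema name="typesDemo"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">

  <!-- Each text-only element type is given a data type with dt:type -->
  <ElementType name="quantity" content="textOnly" dt:type="int"/>
  <ElementType name="price"    content="textOnly" dt:type="fixed.14.4"/>
  <ElementType name="inStock"  content="textOnly" dt:type="boolean"/>
  <ElementType name="shipDate" content="textOnly" dt:type="date"/>
  <ElementType name="homepage" content="textOnly" dt:type="uri"/>

  <ElementType name="order" content="eltOnly" order="seq">
    <element type="quantity"/>
    <element type="price"/>
    <element type="inStock"/>
    <element type="shipDate"/>
    <element type="homepage"/>
  </ElementType>
</Schema>
```

With these declarations, a quantity of 3 or an inStock value of 1 would be acceptable, whereas text such as "three" in quantity should fail validation because it is not an integer.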
The following are the primitive data types available for use in XML Schema:

Data type    Description
string       A string type
enumeration  An enumerated type (attributes only)
notation     A NOTATION type
entity       The ENTITY type
entities     The ENTITIES type
id           The ID type
idref        The IDREF type
idrefs       The IDREFS type
nmtoken      The NMTOKEN type
nmtokens     The NMTOKENS type

Employees.xml
    <?xml version="1.0"?>
    <employees xmlns="x-schema:empSchema.xml">
      <employee>
        <eid id="A100">A100</eid>
        <ename>Surya</ename>
        <sal>50000.00</sal>
        <desig>CEO</desig>
        <phno>3751135</phno>
        <email>email@example.com</email>
      </employee>
      <employee>
        <eid id="A101">A101</eid>
        <ename>Rajesh</ename>
        <sal>30000.00</sal>
        <desig>Director</desig>
        <phno>3751238</phno>
        <email>firstname.lastname@example.org</email>
      </employee>
    </employees>

empSchema.xml

Note Microsoft schema files use the .xml extension, whereas W3C schema files use the .xsd extension.

    <?xml version="1.0"?>
    <Schema xmlns="urn:schemas-microsoft-com:xml-data"
            xmlns:dt="urn:schemas-microsoft-com:datatypes">
      <!--
        Above are the Microsoft namespaces for schema data and data types.
        xmlns => XML namespace
        urn   => Uniform Resource Name
        dt    => data type
      -->
    </Schema>

How do you associate a schema with a document as far as Internet Explorer is concerned? You do so by specifying a default namespace attribute in the root element and prefacing the name of the schema file with x-schema:, like this:

    <?xml version="1.0"?>
    <programming_team xmlns="x-schema:schema1.xml">
      <programmer>Fred Samson</programmer>
      <programmer>Edward</programmer>
    </programming_team>

Here, I'm naming the schema file schema1.xml (IE does not insist on any special extension for schema files). When creating the schema file, you can name the schema using the name attribute of the Schema element:

    <Schema name="schema1"
            xmlns="urn:schemas-microsoft-com:xml-data"
            xmlns:dt="urn:schemas-microsoft-com:datatypes">
      <ElementType name="programmer" content="textOnly" model="closed"/>
      <ElementType name="programming_team" content="eltOnly" model="closed">
        <element type="programmer" minOccurs="1" maxOccurs="*"/>
      </ElementType>
    </Schema>

One of the advantages of using schemas is that they allow you to specify the actual data types that you want to use, but those data types weren't fully fleshed out at the time Microsoft decided to implement schemas, so Microsoft implemented its own. To create a schema for Internet Explorer, you set up a default namespace