XML has two main advantages: first, it offers a standard way of structuring data, and, second, we can specify the vocabulary the data uses. We can define the vocabulary (what elements and attributes an XML document can use) using either a document type definition (DTD) or the XML Schema language.

DTDs were inherited from XML's origins as SGML (Standard Generalized Markup Language) and, as such, are limited in their expressiveness. DTDs are for expressing a text document's structure, so all entities are assumed to be text. The XML Schema language more closely resembles the way a database describes data.

Schemas provide the ability to define an element's type (string, integer, etc.) and much finer constraints (a positive integer, a string starting with an uppercase letter, etc.). DTDs enforce a strict ordering of elements; schemas have a more flexible range of options (elements can be optional as a group, in any order, in strict sequence, etc.). Finally, schemas are written in XML, whereas DTDs have their own syntax.

As you'll see in this article, schemas themselves are quite straightforward; I find them easier than DTDs, as there is no extra syntax to remember. The difficulties arise in using XML Namespaces and in getting the Java parsers to validate XML against a schema.

In this article, I first cover the basics of XML Schema, then validate XML against some schemas using several popular APIs, and finally cover some of the more powerful elements of the XML Schema language. But first, a short detour.

A detour via the W3C

XML, the XML Schema language, XML Namespaces, and a whole range of other standards (such as Cascading Style Sheets (CSS), HTML and XHTML, SOAP, and pretty much any standard that starts with an X) are defined by the World Wide Web Consortium, otherwise known as the W3C. A document is only
XML if it conforms to the XML Recommendation issued by the W3C.

Various experts and interested parties gather under the umbrella of the W3C and, after much deliberation, issue a recommendation. Companies, individuals, or foundations such as Apache will then write implementations of those recommendations.

This article's documents are a combination of these three recommendations:
XML 1.0
XML Namespaces
XML Schema

XML 1.0 or 1.1

XML exists in two versions: 1.0, defined in 1998, and 1.1, defined in 2004. XML 1.1 adds very little to 1.0: support for defining elements and attributes in languages such as Mongolian or Burmese, support for IBM mainframe end-of-line characters, and almost nothing else. For the vast majority of applications, these changes are not needed. Plus, a document declared as XML 1.1 will be rejected by a 1.0 parser. So stick with 1.0.

Well-formed and valid XML

For an application to accept an XML document, it must be both well-formed and valid. These terms are defined in the XML 1.0 Recommendation, with XML Schema extending the meaning of valid. To be well-formed, an XML document must follow these rules:
The document must have exactly one root element.