
What is XML?

XML is a way of adding intelligence to your documents. It lets you identify each
element using meaningful tags, and it lets you add information ("metadata") about
each element.
XML is very much a part of the future of the Web, and part of the future for all
electronic information.
XML is a syntax for marking up data, and it works with many other technologies to
display and process information. It looks and feels very much like HTML.
XML isn't going to replace everything else you've already learned; it complements
and extends it.
XML isn't going to change the way your Web pages look. You'll still need to use CSS
(Cascading Style Sheets) with XML to define font colors, or JavaScript (again, with
XML) to make your images fly around. Yet XML will change the way you and others
read documents, and it will change the way documents are filed and stored. It's a new
technology, and you certainly don't need to use it in order to build a great Web site --
but you will want to be aware of it as you look at the Web of the future.
What's the Fuss About?
XML lets you make documents smarter, more portable, and more powerful -- that's
the promise of XML and that's what all the fuss is about.
XML allows you to use your own tags to define parts of a document. You can do this
because XML is a descriptive, not a procedural, language. That is, XML describes
what something is rather than performing an action.
For example, take a look at the front page of a newspaper. You'll see different font
sizes, different sections, and columns.
If you were to create a Web page for that newspaper -- using the same formatting and
styles -- you would use tags such as <H1> and <font color="red"> to define the size
and color of a large headline, or <i> to italicize a word such as a byline, in order to
distinguish it from the rest of the text.
But just try to write tags that actually explain that you've got a headline and that the
words "John Smith" make up a byline. HTML won't know what you're talking about if
you create tags such as <Headline> or <byline> or <advertisement>.
XML lets you name those elements yourself, and other technologies such as CSS then
tell the browser how to display them.
That means, in the future, when you're searching on the web for, say, a Barbie doll for
your niece's birthday, you'll get Barbie the DOLL instead of some other type of
Barbie, because the Barbie doll page might be marked up like this:
<DOLL>Barbie</DOLL>.
Pretty cool, huh?
XML documents can be moved to any format on any platform -- without the elements
losing their meaning. That means you can publish the same information to a web
browser, a PDA, or a network-enabled bread machine, and each device would use the
information appropriately.
The most important thing to remember about XML, though, is that it doesn't stand
alone. It needs other technologies, like CSS, in order for you to see its results.
If all of this seems like a pain, and you don't want to mess with XML, it's OK. You
don't need it to make a great web page. But you never know when organization will
come in handy.
Where Did XML Come From?
XML is a simplified version of SGML and a cousin of HTML. It was developed by
members of the W3C and released as a recommendation by the W3C in February
1998.
SGML, the parent of XML, has been in use as a markup language, primarily for
technical documentation and government applications, since the early 1980s, and it
became an international standard (ISO 8879) in 1986. It was developed to standardize
the production process for large document sets. Think: Medical records. Company
databases. Aircraft parts catalogs. Other really huge documents.
Marking up documents in SGML allows information to be passed from one system to
the next without losing information. With databases marked up in SGML you can see
what Widget A is all about and go check to see if Widget A is in stock.
Early on, people thought that SGML would be useful for the Web. In fact, HTML is
really a very basic application of SGML! But HTML quickly became used for visual
layout, so a group of people returned to the basics, determined to create something
that had the strengths of SGML without being so difficult to implement -- and had the
ease of use of HTML, but with more structural power. The result was XML.
The design goals of XML, taken from the XML Specification, are:
XML shall be straightforwardly usable over the Internet.
XML shall support a wide variety of applications.
XML shall be compatible with SGML.
It shall be easy to write programs which process XML documents.
The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
XML documents should be human-legible and reasonably clear.
The XML design should be prepared quickly.
The design of XML shall be formal and concise.
XML documents shall be easy to create.
Terseness in XML markup is of minimal importance.
In other words, XML is easy to create, easy to read, and designed for use over the
Internet. What more could a Web designer ask for?
What Does XML Look Like?
If you've ever used HTML, XML is going to look very familiar!
When you view the source of a document written in XML, the first thing you'll see is
the XML declaration, which looks like this:
<?xml version="1.0"?>
Then, in the body of the document, you'll see a lot of tags. The tags look familiar at
first -- they start with the usual less-than sign and end with the usual greater-than
sign, like this:
<name>
But then you'll notice that the tags might not be quite the names you've come to
expect! You'll see tags that seem to be made-up tag names. Tags like <dogchow> and
<badcars> and <species>. In fact, if you view the source of an XML document, you'll
see tags surrounding lots of words, maybe every word in the document. These tags
define exactly what the content is. And the creator of the document had the power to
create his or her own specific set of tags.
Suppose you're looking at a Web page marked up in XML on The Canterbury Tales by
Chaucer. You're looking specifically at lines 282-286 of "The Physician's Tale." The
document source for that section might look like this:
<?xml version="1.0"?>
<CANTERBURY-TALES>
<SECTION name="physician">
The Physician's Tale
<LINE number="282">
That no man woot therof but God and he.
</LINE>
<LINE number="283">
For be he lewed man, or ellis lered,
</LINE>
<LINE number="284">
He noot how soone that he shal been afered.
</LINE>
<LINE number="285">
Therfore I rede yow this conseil take --
</LINE>
<LINE number="286">
Forsaketh synne, er synne yow forsake.
</LINE>
</SECTION>
</CANTERBURY-TALES>
The tags simply define that:
1) This document is The Canterbury Tales.
2) This section is The Physician's Tale.
3) Each line of The Physician's Tale is defined.
4) Each line ends, and The Physician's Tale and The Canterbury Tales end.
If the entire document were marked up like this, you could easily jump to a
certain line or section. The entire document is annotated for easy reference and
searching, and instead of viewing the entire document, users could request only
specific sections of a document -- simply by calling the specific tags they want. Oh,
and we don't recommend that you manually type out each line in The Canterbury
Tales. Get a computer to count the lines for you.
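
Here is one way that "jump to a certain line" idea might look in practice. This is only a sketch using Python's standard xml.etree.ElementTree module, which is our choice and not something the Canterbury Tales page itself uses. The SOURCE string is a shortened copy of the markup above, and find_line is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the Canterbury Tales markup shown above.
SOURCE = """<?xml version="1.0"?>
<CANTERBURY-TALES>
  <SECTION name="physician">
    <LINE number="282">That no man woot therof but God and he.</LINE>
    <LINE number="283">For be he lewed man, or ellis lered,</LINE>
    <LINE number="284">He noot how soone that he shal been afered.</LINE>
  </SECTION>
</CANTERBURY-TALES>"""


def find_line(xml_text, section, number):
    """Return the text of one LINE in one SECTION, or None if it is absent."""
    root = ET.fromstring(xml_text)
    # Select by attribute value: the tags and attributes do the searching.
    path = f".//SECTION[@name='{section}']/LINE[@number='{number}']"
    line = root.find(path)
    return None if line is None else line.text.strip()


print(find_line(SOURCE, "physician", 284))
```

Because every line carries its own number attribute, the program never has to count lines or scan the text of the poem.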
XML Versus HTML
HTML and XML are cousins. They draw on the same inspiration, SGML. They both
identify elements in your page. They both use a very similar syntax. If you are
familiar with HTML, XML will also feel familiar.
The big difference between HTML and XML is that HTML has evolved into a markup
language that describes the look, feel, and action of a Web page. An <H1> is a
headline that is displayed in a certain size, for example.
In contrast, XML doesn't describe how a page looks, how it acts, or what it does. XML
describes what the words in a document ARE. This is a critical distinction! While
HTML combines structure and display, XML separates them. This means that XML
documents are more portable and can be used in many different types of
applications.
In the near future, we'll see both XML and HTML documents. Eventually, XML will
probably replace HTML, or HTML will become an application of XML. But that doesn't
mean you should toss out everything you know! In many ways, XML builds on HTML,
and if you know HTML, XML will be easier to work with.
Valid and Well-Formed XML
You'll sometimes hear an XML document referred to as a "valid" XML document or a
"well-formed" XML document. This distinction touches on one of the nice things
about XML.
When you used SGML, you had to create something called a Document Type Definition
(DTD, for short) in order to make the SGML document useful. DTDs were fairly complex
and required a lot of work to create. They were one of the roadblocks to widespread
use of SGML.
With XML you have an option. You can make a well-formed XML document by simply
following the XML syntax rules. You don't have to create a separate DTD if you don't
want to.
If you do create a set of rules -- a DTD -- and make your document conform to those
rules, it is considered a valid XML document.
DTDs describe the structure of your document. We'll be discussing DTDs in detail
later on. Right now, all you need to know is that every XML document must be
Well-Formed, and that a Valid XML document is a Well-Formed document that also
refers to and conforms to a DTD.
Structure
XML applies structure to documents. Documents are sets of related information.
The term structure seems to bring some unpleasant imagery with it, especially for creative souls
who want to make this medium work in new and innovative ways. But when one is dealing with
publishing, the term structure is quite positive. It is the way we put a skeleton behind information,
so that the pieces of information work together and make sense as a whole.
There are two key principles behind a structured model:
1. Each part -- or element -- has a relationship with other elements. This series of relationships
defines the structure.
2. The meaning of the element is separate from its visual appearance.
Documents
We can't really talk about structure without first talking a bit about documents. Document is
another of those terms that conjures up somewhat negative images; one tends to picture "dusty
stacks of documents" or "attorney's documents" or "document processing." But in this case, a
document is simply a collection of related information.
For example, this page is a document. Your favorite 'zine is a set of documents. Your intranet is
probably made up of hundreds if not thousands of documents.
Sometimes documents are created as a single unit. Sometimes they are built on demand, pulling
pieces from a database and assembling them into a document as the reader requests. In both cases,
structure makes the document easier to create, maintain, and display.
Document Structure
The document structure defines the elements which make up a document, the information you
want to collect about those elements, and the relationship those elements have to each other.
You use XML to mark up the document, following the structure you have decided upon.
By treating a document as a collection of elements, you free it from the constraints of time, place,
and presentation format. You can move the structured document from a word processor to a PDA
to a web browser. The structure is intact on each; you just alter the display characteristics for
each device.
The document structure is called the document tree. The main trunk of the tree is the parent. All
the branches and leaves are children. Document trees are usually visually represented as a
hierarchical chart.
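
To make the document tree a little more concrete, here is a small sketch that prints a tree as an indented outline, parents first and children below. It assumes Python's standard xml.etree.ElementTree module; the booklist markup and the outline helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical document: booklist is the parent, everything else a child.
BOOKLIST = """<booklist>
  <book>
    <title>King Lear</title>
    <author>William Shakespeare</author>
  </book>
</booklist>"""


def outline(element, depth=0):
    """Return one indented line per element: parents first, children below."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(BOOKLIST))
print("\n".join(tree_lines))
```

The indentation in the output is the hierarchical chart in miniature: each level of indent is one step down the tree.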
Structure vs. Format
The most important thing to remember about a structured document is that it is defined by the
elements it contains, not by how it looks.
Structure says that an element is a paragraph. Format says to display the paragraph in 12 point
Times.
Structure says the element is a book title. Format says to display the book title in green bold body
text.
Structure says the element is a social security number. Format says to hide and not display the
social security number.
Learning to separate structure from format is critical in making good use of XML.
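
The split between structure and format can be sketched in a few lines. In this hedged example the structure lives in one XML string and the format lives in two separate lookup tables. The table names, the bracketed "style" labels, and the render helper are all made up for illustration; a real page would use CSS or XSL instead.

```python
import xml.etree.ElementTree as ET

# Structure only: what each piece of data is.
DOCUMENT = """<book>
  <title>King Lear</title>
  <para>A king divides his realm.</para>
  <ssn>000-00-0000</ssn>
</book>"""

# Two hypothetical format tables. None means "hide this element".
BROWSER_FORMAT = {"title": "[green bold] {}", "para": "[12pt Times] {}", "ssn": None}
PDA_FORMAT = {"title": "{}", "para": "{}", "ssn": None}


def render(xml_text, format_table):
    """Apply a format table to the same structure and return display lines."""
    shown = []
    for element in ET.fromstring(xml_text):
        template = format_table.get(element.tag)
        if template is not None:  # hidden or unknown elements are skipped
            shown.append(template.format(element.text))
    return shown


print(render(DOCUMENT, BROWSER_FORMAT))
print(render(DOCUMENT, PDA_FORMAT))
```

The document never changes; only the table does. That is the whole point of keeping format out of the markup.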
Metadata
Metadata is data about data. A key use of XML is to collect and work with metadata.
At its most basic level, XML is a metadata language. That is, it is a way of assigning information
to pieces of data. The most obvious use of this is to identify a piece of data as a certain structural
element. But this is just the beginning.
XML is about much more than marking up documents for use in a web browser. XML is really
about adding layers of information to your data, so that the data can be processed, used, and
transferred between applications.
Metadata in HTML
If you've built a website, you've almost certainly worked with metadata. The keyword and
description meta tags are simple uses of metadata. With these meta tags you can assign the
document as a whole information about the general type of content it contains. This information
doesn't display in a web browser, but it may display in search engine results.
Another use of meta tags is to store information such as creator name and creation date. Some
servers are structured to work with these meta tags, allowing you to sort by creation date or
display based on creator name.
Going Further with XML
XML takes this basic idea much further. With XML, you can describe where you found your data,
you can quantify, qualify, and further define it. You can then use this metadata to validate
information, perform searches, set display constraints, or process other data.
Here are just a few examples:
XML initiatives are under way which will allow for digital signature verification and validated
form submission. This could make it possible for forms, with signatures, to be submitted online
and be legally binding.
XML initiatives are under way to help catalog web content. Using metadata, the web can be
indexed better and searched more effectively.
XML is being used to transfer data, based on factors such as date entered, between unlike
databases. The metadata is both a means to find the correct data bits and a common
language of transfer between databases which do not speak each other's specific language.
The RDF Proposal
One W3C-blessed use of metadata which you may have heard about is a proposal called the
Resource Description Framework, or RDF. RDF is an application of XML for making metadata
machine-processable. It allows applications to exchange information about data automatically.
This has implications in indexing, content rating, intellectual ownership, e-commerce, and privacy,
among other things. The W3C says:
RDF with digital signature will be the key to building the "Web of
Trust" for electronic commerce, collaboration, and other
applications.
Display Issues
XML alone will not display a page. You must use a formatting technology, such as CSS or XSL, to display
XML-tagged documents in a Web browser.
XML is about separating structure and format. An XML document doesn't know anything about
how to display itself. It relies on other technologies for this.
Although XML does not deal with form, it contains a great deal of information about the document
and its elements. This, when combined with style tools, gives you a whole new strength and
flexibility in displaying your documents without having to maintain multiple copies of the
document.
XSL
Extensible Stylesheet Language, XSL, is the future of XML display. It is an XML-based language
for expressing stylesheets.
With XSL, you can make context-sensitive display decisions. For example, you could
automatically display the document one way in a Web browser and another on a PDA.
XSL can also transform XML into HTML, so that older browsers can view XML documents.
CSS
Cascading Style Sheets, CSS-1 and CSS-2, are the current way to display XML documents in a
Web browser. CSS is a means of assigning display values to page elements.
If you are going to be working with XML and you will be concerned with displaying pages, learn
CSS. The CSS Reference Guide contains a guide to the CSS-1 properties.
Behaviors
Behaviors are a non-standard, IE5 technique that lets you do some interesting display actions
with XML tags. They combine scripting and CSS in a component file. This component can be
attached to a particular tag and used in many different documents. The Behaviors Library
shows some of the things you can do with this technique.
The DOM
The Document Object Model lets you address, change, and manipulate any individual portion of the Web
page.
The phrase "document object model" means that you treat your document as a collection of
individual objects, rather than a single solid unit. The W3C DOM is the set of rules for doing this
in a standard way in a Web browser, with HTML and XML files.
O is for Object
In an object-oriented approach, the program or the document is made up of many smaller
components called objects. The smaller components can be re-arranged, added to, or removed
dynamically.
The idea of objects has become quite popular in both software and documents. The
programming language Java and the scripting language JavaScript each have an object-oriented
philosophy at their core. The adoption of the standard DOM enables Web pages to share that
object approach too.
With an object model, you manage the small pieces, combining them and reusing them as it
makes sense -- instead of writing one huge application program or one huge document. You
might think of an object approach as being a little like a collection of Lego blocks ... different
pieces do different things, but you can combine and recombine them into many different finished
projects.
Each object type acts as a template. You can use an instance of the same object over and over
again. For example, you might have multiple instances of the <canine> element in a document.
All the objects share the same name, canine, and work the same way, but each one represents
its own set of data and can be addressed individually.
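
Here is a rough sketch of that idea: several instances of a canine element, each with its own data, each addressable on its own. It uses Python's standard xml.dom.minidom module, which follows the W3C DOM naming, rather than a browser. The kennel wrapper, the name attribute, and the dog names are invented for the example.

```python
from xml.dom import minidom

# Two instances of the same element type, each carrying its own data.
KENNEL = """<kennel>
  <canine name="Rex">terrier</canine>
  <canine name="Fido">beagle</canine>
</kennel>"""

doc = minidom.parseString(KENNEL)
canines = doc.getElementsByTagName("canine")

# Every instance shares the element name but holds different values.
names = [c.getAttribute("name") for c in canines]
breeds = [c.firstChild.data for c in canines]

# Address one instance on its own and change only that one.
canines[1].setAttribute("name", "Fido II")

print(names, breeds)
print(canines[0].getAttribute("name"), canines[1].getAttribute("name"))
```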
The API
It isn't enough to merely know that an object is an object. You also need to know how to talk to
that object and give it commands. That's where the API comes in.
API stands for Application Programming Interface. An API is a set of rules that describes how
you can access and manipulate an object. The DOM specification describes the API for HTML
and XML documents.
The DOM, by providing a standard API, defines the naming conventions, programming models,
and other rules for communicating with an object in an HTML or XML page.
Getting from XML to Objects
In an XML document, each element is actually an object -- it has a name and it has attributes that
describe it.
The browser, combined with a stylesheet, displays each of the XML elements/objects in a web
page. Because they are objects, you can address and change them individually.
Ah, but just knowing that every piece is an object isn't enough. You need to have a set of rules,
an API, to describe how to address those objects when they are placed in a web page. That's
where the DOM comes in.
The DOM does three things -- you might think of it as explaining the "who, what, and how" of the
web page.
1. First, it describes who -- which objects are in a web page and how XML objects are represented
there?
2. Second, it defines what -- what can these objects do and how do they work with others?
3. Third, it defines how -- how can these objects be addressed?
The DOM is the translator, the interface that lets all the pieces be represented properly, talk to
each other, and communicate with scripts and other action tools.
It is XML that lets you add and identify data, but it is the DOM that lets the script manipulate and
display that data on command in the web browser window.
Pulling It All Together
You'll typically be working with four technologies that combine to create an interactive Web page: XML (or
HTML), a scripting language, CSS, and the DOM. This illustration shows their relationship.
XML identifies data. For example: "King Lear" is a title element.
CSS stores information about display values for elements and delivers the information to
the browser. For example: Titles are displayed in 18 point black courier type.
The script "talks" to the objects and sends messages to and from the browser about the
objects. Typically these are "change your display" or "do this" messages based on user
actions or other variables. For example: If a particular title is out of stock, display it in red.
The DOM provides the common interface through which various scripts and objects talk to
one another and display in the Web browser.
The browser displays the results to the end user.
If any of these pieces is missing, you can't create a dynamically-changing presentation of your
document.
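
The "out of stock, display it in red" example can be sketched outside a browser too. The snippet below is only an approximation: it uses Python's xml.dom.minidom in place of a browser DOM and a Python function in place of a page script. The stock attribute, the style attribute, and the mark_out_of_stock helper are assumptions made for the example.

```python
from xml.dom import minidom

# XML identifies the data; the stock attribute is hypothetical metadata.
CATALOG = """<catalog>
  <title stock="12">King Lear</title>
  <title stock="0">Hamlet</title>
</catalog>"""


def mark_out_of_stock(xml_text):
    """Add a style attribute to every title whose stock count is zero."""
    doc = minidom.parseString(xml_text)
    for title in doc.getElementsByTagName("title"):
        # The "script" talks to each object through the DOM interface.
        if int(title.getAttribute("stock")) == 0:
            title.setAttribute("style", "color: red")
    return doc


doc = mark_out_of_stock(CATALOG)
flagged = [t.firstChild.data for t in doc.getElementsByTagName("title")
           if t.getAttribute("style") == "color: red"]
print(flagged)
```

In a real page the last step, actually showing Hamlet in red, would be the browser's job.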
Element
An element is the basic building block of HTML and XML documents.
Elements are identified by a tag. The tag consists of angle brackets around the element name, and
a pair of tags surrounds the content, like this:
<AUTHOR>Thadius J. Frog</AUTHOR>
In HTML, you use a pre-defined set of elements. In XML you create your own set of elements.
Attribute
Attributes are like adjectives, in that they further describe elements. Each attribute has a name
and a value.
Attributes are entered as part of the tag, like this:
<AUTHOR dob="1874">Thadius J. Frog</AUTHOR>
Tag
You use a tag to identify a piece of data by element name.
Tags usually appear in pairs, surrounding the data. The opening tag contains the element name.
The closing tag contains a slash and the element's name, like this:
<AUTHOR>Thadius J. Frog</AUTHOR>
Attribute Value
Attributes contain an attribute value. The value might be a number, a word, or a URL.
Attribute values follow the attribute name and an equal sign, like this:
<AUTHOR dob="1874">Thadius J. Frog</AUTHOR>
In XML, attribute values are always surrounded by quotation marks.
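
If it helps to see these four terms side by side, here is a short sketch that pulls the element name, the content, and the attribute out of the AUTHOR example. It assumes Python's standard xml.etree.ElementTree module; the variable names are ours.

```python
import xml.etree.ElementTree as ET

author = ET.fromstring('<AUTHOR dob="1874">Thadius J. Frog</AUTHOR>')

element_name = author.tag          # the element name from the tag
content = author.text              # the data between the opening and closing tags
attributes = dict(author.attrib)   # attribute names mapped to their values
dob = author.get("dob")            # one attribute value, always a string

print(element_name, content, attributes, dob)
```

Note that the value comes back as the string "1874", not a number. XML itself treats every attribute value as text.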
Declaration
You begin an XML file with an XML declaration. The declaration states that this is an XML file.
The XML declaration looks like this:
<?xml version="1.0"?>
DTD
Document Type Definition. The DTD defines the elements, attributes, and relationships between
elements for an XML document.
A DTD is a way to check that the document is structured correctly, but you do not need to use
one in order to use XML.
The XML Document
An XML file is a plain text file with XML markup tags. (Plain ASCII works, since it is a subset of
the default UTF-8 encoding.) It usually has an .xml extension, like this:
booklist.xml
Inside an XML File
An XML file contains three basic parts:
1. A declaration that announces that this is an XML file;
2. An optional definition about the type of document it is and what DTD it follows;
3. Content marked up with XML tags.
Types of XML Documents
There are two types of XML documents: well-formed or valid. The difference between the
two is that a valid document is also checked against a DTD, and a document that is only
well-formed is not.
Well-formed
Well-formed documents conform to XML syntax. They contain text and XML tags. Everything
is entered correctly. They are not, however, checked against a DTD.
Valid
Valid documents not only conform to XML syntax but they also are error checked against a
Document Type Definition (DTD). A DTD is a set of rules outlining which tags are allowed, what
values those tags may contain, and how the tags relate to each other.
Typically, you'll use a valid document when you have documents that require error checking, that
use an enforced structure, or are part of a company- or industry-wide environment in which many
documents need to follow the same guidelines.
DTDs
A Document Type Definition (DTD) is a set of rules that defines the elements, element attributes
and attribute values, and the relationship between elements in a document.
When your XML document is processed, it is compared to its associated DTD to be sure it is
structured correctly and all tags are used in the proper manner. This comparison process is
called validation and is performed by a tool called a validating parser.
Remember, you don't need to have a DTD to create an XML document; you only need a DTD for
a valid XML document.
Here are a few reasons you'd want to use a DTD:
Your document is part of a larger document set and you want to ensure that the whole set
follows the same rules.
Your document must contain a specific set of data and you want to ensure that all required
data has been included.
Your document is used across your industry and needs to match other industry-specific
documents.
You want to be able to error check your document for accuracy of tag use.
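
To give a feel for what validation does, here is a deliberately tiny stand-in. It is not a DTD and not a real validating parser (Python's standard library doesn't include one). It simply checks a document against a hand-written table of allowed children and required attributes. The RULES table, the isbn attribute, and the check helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: for each element, the child elements it may contain
# and the attributes it must carry. A real DTD expresses far more than this.
RULES = {
    "booklist": {"children": {"book"}, "required": set()},
    "book": {"children": {"title", "author"}, "required": {"isbn"}},
    "title": {"children": set(), "required": set()},
    "author": {"children": set(), "required": set()},
}


def check(element, rules=RULES):
    """Return a list of human-readable rule violations at or below element."""
    if element.tag not in rules:
        return [f"unknown element <{element.tag}>"]
    errors = []
    rule = rules[element.tag]
    for name in sorted(rule["required"] - set(element.attrib)):
        errors.append(f"<{element.tag}> is missing attribute {name}")
    for child in element:
        if child.tag not in rule["children"]:
            errors.append(f"<{child.tag}> is not allowed inside <{element.tag}>")
        else:
            errors.extend(check(child, rules))
    return errors


good = ET.fromstring('<booklist><book isbn="0"><title>King Lear</title></book></booklist>')
bad = ET.fromstring('<booklist><book><price>5</price></book></booklist>')
print(check(good))
print(check(bad))
```

Both documents are well-formed; only the first one follows the rules. That gap is exactly what a DTD and a validating parser are for.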
Deciding on a DTD
Using a DTD doesn't necessarily mean you have to create one from scratch. There are a number
of existing DTDs, with more being added every day.
Shared DTDs
As XML becomes widespread, your industry association or company is likely to have one or
more published DTDs that you can use and link to. These DTDs define tags for elements that are
commonly used in your applications. You don't need to recreate these DTDs -- you just point to
them in the DOCTYPE definition in your XML file, and follow their rules when you create your XML
document.
Some of these DTDs may be public DTDs, like the HTML DTD. Others may belong to your
company. If you are interested in using a DTD, ask around and see if there is a good match that
already exists.
Create Your Own DTD
Another option is to create your own DTD. The DTD can be very simple and basic or it can be
large and complex. The DTD will be a reflection of the needs of your document.
It is perfectly acceptable to have a DTD with just four or five basic elements if that is what your
document needs. Don't feel that creating a DTD necessarily needs to be a huge undertaking.
However, if your documents are complex, do plan on setting aside time -- several days or several
weeks -- to understand the document and the document elements and create a solid DTD that
will really work for you over time.
Make an Internal DTD
You can insert DTD data within your DOCTYPE definition. If you've worked with CSS styles, you
can think of this as being a little like putting style data into your file header. DTDs inserted this
way are used in that specific XML document. This might be the approach to take if you want to
validate the use of a small number of tags in a single document or to make elements that will be
used only for one document.
Remember, the primary use for a DTD is to validate that the tags you enter in your XML
document are entered as specified in the DTD. It is an error-checking process that ensures your
data conforms to a set of rules.
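
Here is roughly what an internal DTD looks like, along with a quick way to read it back. This is a sketch that assumes Python's standard xml.dom.minidom module. The booklist and book element names are invented, and note that minidom only parses the document; it does not validate it against the rules.

```python
from xml.dom import minidom

# The DTD rules sit between [ and ] inside the DOCTYPE definition.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE booklist [
  <!ELEMENT booklist (book+)>
  <!ELEMENT book (#PCDATA)>
]>
<booklist>
  <book>King Lear</book>
</booklist>"""

doc = minidom.parseString(DOCUMENT)
doctype_name = doc.doctype.name             # the name given after !DOCTYPE
internal_rules = doc.doctype.internalSubset  # the text between [ and ]

print(doctype_name)
print(internal_rules)
```

Because the rules travel inside the file, they apply to this one document only, which is the trade-off described above.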
XML Syntax
Tagging an XML document is, in many ways, similar to tagging an HTML document. Here are
some of the most important guidelines to follow.
Rule #1: Remember the XML declaration
This declaration goes at the very beginning of the file and alerts the browser or other processing
tools that this document contains XML tags. The declaration looks like this, with standalone set to
either "yes" or "no":
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
The three attributes must appear in this order: version, then encoding, then standalone.
You can leave out the encoding attribute and the processor will use the UTF-8 default.
Rule #2: Do what the DTD instructs
If you are creating a valid XML file, one that is checked against a DTD, make sure you know
what tags are part of the DTD and use them appropriately in your document. Understand what
each does and when to use it. Know what the allowable values are for each. Follow those rules.
The XML document will then validate against the specified DTD.
Rule #3: Watch your capitalization
XML is case-sensitive. <P> is not the same as <p>. Be consistent in how you define element
names. For example, use ALL CAPS, or use Initial Caps, or use all lowercase. It is very easy to
create mis-matching case errors.
Also, make sure starting and ending tags use matching capitalization, too. If you start a
paragraph with the <P> tag, you must end it with the </P> tag, not a </p>.
Rule #4: Quote attribute values
In HTML there is some confusion over when to enclose attribute values in quotes. In XML the rule
is simple: enclose all attribute values in quotes, like this:
<NAME dob="1960">Ben Johnson</NAME>
Rule #5: Close all tags
In XML you must close all tags. This means that paragraphs must have corresponding end
paragraph tags. Anchor names must have corresponding anchor end tags. A strict interpretation
of HTML says we should have been doing this all along, but in reality, most of us haven't.
Rule #6: Close Empty tags, too
In HTML, empty tags, such as <br> or <img>, do not close. In XML, empty tags do close. You
can close them either by adding a separate close tag (</tagname>) or by combining the open
and close tags into one tag. You create the open/close tag by adding a slash, /, to the end of the
tag, like this:
<br/>
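
One quick way to see Rules #3 through #6 in action is to hand a parser a correct sample and a broken sample for each rule. The sketch below assumes Python's standard xml.etree.ElementTree module, which rejects anything that isn't well-formed. The sample snippets and the is_well_formed helper are our own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "matching case": "<P>text</P>",
    "mismatched case": "<P>text</p>",
    "quoted attribute": '<AUTHOR dob="1874">text</AUTHOR>',
    "unquoted attribute": "<AUTHOR dob=1874>text</AUTHOR>",
    "unclosed tag": "<doc><P>text</doc>",
    "closed empty tag": "<doc>line<br/>break</doc>",
    "unclosed empty tag": "<doc>line<br>break</doc>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'ok' if ok else 'error'}")
```

An HTML browser would quietly accept most of the broken samples. An XML parser stops at the first one.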
Examples
This table shows some common HTML tags and how they would be treated in XML.
Tag                      Comment                                       End-Tag
<P>                      Technically, in HTML, you're supposed to      </P>
                         close this tag. In XML, it's essential to
                         close it.
<ELEMENT>                All elements in XML must have a start-tag     </ELEMENT>
                         and an end-tag.
<LI>                     This tag must be closed in XML in order to    </LI>
                         ensure a Well-Formed XML document.
<META                    META tags are considered empty elements in    <META
name="keywords"          XML, and they must close.                     name="keywords"
content="XML, SGML,                                                    content="XML, SGML,
HTML">                                                                 HTML"/>
<BR>                     Break tags are considered empty elements.     <BR/>
<IMG src=                This is an empty element tag.                 <IMG src=
"coolpictures.htm">                                                    "coolpictures.htm"/>
Copyright © 1998-99
Well-formed XML
A document that conforms to the XML syntax rules is called "well-formed." If all your tags are
correctly formed and follow XML guidelines, then your document is considered a well-formed
XML document. That's one of the nice things about XML -- you don't need to have a DTD in order
to use it.
Begin the Well-formed Document
To begin a well-formed document, type the XML declaration:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
If you are embedding XML in an HTML page, it will go after the <HTML> and <HEAD> tags, and
before any JavaScript.
If you are creating an XML-only document, it will be the first thing in the file.
Version
You must include the version attribute for the XML declaration. The version is currently "1.0."
Defining the version lets the browser know that the document that follows is an XML document,
using XML 1.0 structure and syntax.
Standalone
The next step is to declare that the document "stands alone." The application that is processing
this document then knows that it doesn't need to look for an external DTD in order to read the
document.
Encoding
Finally, declare the encoding of the document. In this case, the encoding is UTF-8, which is the
default encoding for XML. You can leave off this attribute and the processor will default to UTF-8.
In the declaration itself, encoding is written before standalone.
Remember the Root Element
After the declaration, enter the tag for the root element of your document. This is the top-most
element, under which all other elements are grouped.
Follow XML Syntax
Now, enter the rest of your content. Remember to follow XML syntax:
Remember that capitalization matters;
Quote all attribute values;
Close all tags;
Remember to close empty tags too, like this:
<br/>
Pretty easy, isn't it? That's all there is to it!
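
If you'd rather let a program do the typing, the same steps (declaration, root element, content, closed empty tags) might look like the sketch below. It assumes Python 3.8 or later and the standard xml.etree.ElementTree module. The booklist and author names are made up for the example, and the library writes its own declaration without a standalone attribute.

```python
import xml.etree.ElementTree as ET

# Root element first, then the content grouped under it.
root = ET.Element("booklist")
author = ET.SubElement(root, "author", {"dob": "1874"})
author.text = "Thadius J. Frog"
ET.SubElement(root, "br")  # an empty element; the writer closes it for us

# xml_declaration=True (Python 3.8+) puts the XML declaration first.
data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
text = data.decode("utf-8")
print(text)
```

A generated document is well-formed by construction: the library quotes every attribute value and closes every tag, so the rules above are followed whether or not you remember them.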
Valid XML
A valid document conforms to the XML syntax rules and follows the guidelines of a Document
Type Definition (DTD).
The process of comparing the XML document to the DTD is called validation. This process is
performed by a tool called a validating parser.
Begin the Valid XML Document
To begin a valid document, type the XML declaration:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
If you are embedding XML in an HTML page, it will go after the <HTML> and <HEAD> tags, and
before any JavaScript.
If you are creating an XML-only document, it will be the first thing in the file.
Version
You must include the version attribute for the XML declaration. The version is currently "1.0."
Defining the version lets the browser know that the document that follows is an XML document,
using XML 1.0 structure and syntax.
Standalone
The standalone="no" attribute tells the processor that the document may rely on an external DTD,
so a validating parser must look for that DTD and validate the XML tags against it.
Encoding
Finally, declare the encoding of the document. You can leave off this attribute and the processor
will default to UTF-8.
Create a DOCTYPE Definition
The second part of a valid XML document is the DOCTYPE definition. This identifies the type
of document and DTD in use.
If you look at HTML source files, you'll often see a !DOCTYPE definition, especially if the file was
created by a WYSIWYG tool. The DOCTYPE definition points to an HTML DTD.
In a valid XML file, !DOCTYPE tells the program that is processing your XML file two things: the
name of the type of document and the name and location of the DTD against which to validate
the file's contents.
The DOCTYPE definition looks like this:
<!DOCTYPE type-of-doc SYSTEM/PUBLIC "dtd-name">
!DOCTYPE
This says that you are defining the DOCTYPE.
type-of-doc
This is the name of the type of document contained in this file. Typically, this is the same name as
the DTD.
SYSTEM/PUBLIC
SYSTEM tells the processor to look for the private DTD at the following location. PUBLIC tells the
processor to look for a public DTD at the following location.
"dtd-name"
The URL after SYSTEM or PUBLIC is the name of the DTD file. All DTDs end with the extension
.dtd.
If you want, instead of pointing to an external DTD, you could place the DTD information within
the DOCTYPE definition, making it local to your XML document. You should do this only if you
want to define a few simple elements and you want them permanently attached to a particular
document.
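To see the parts of a DOCTYPE definition from a program's point of view, here is a minimal Python sketch that uses the standard library's xml.dom.minidom. The names "booklist" and "booklist.dtd" are made up for illustration, and minidom only reads the DOCTYPE; it does not fetch the DTD or validate against it.

```python
from xml.dom import minidom

# "booklist" and "booklist.dtd" are hypothetical names used only for this sketch.
doc_text = '<!DOCTYPE booklist SYSTEM "booklist.dtd"><booklist/>'

document = minidom.parseString(doc_text)
doctype = document.doctype

print(doctype.name)      # the type-of-doc part of the definition
print(doctype.systemId)  # the "dtd-name" part that follows SYSTEM
```

A validating parser would go one step further and load the file named after SYSTEM before checking the document.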
Remember the Root Element
After the declaration, enter the tag for the root element of your document. This is the top-most
element, under which all elements are grouped.
Follow XML Syntax
Now, enter the rest of your content. Remember to follow XML syntax:
Remember that capitalization matters;
Quote all attribute values;
Close all tags;
Remember to close empty tags too, like this:
<br/>
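The syntax rules above can be checked mechanically. The following is a small Python sketch, assuming the standard library's xml.etree.ElementTree as the parser; the sample documents and the helper name is_well_formed are invented for illustration. It checks syntax only and does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML (no DTD validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Each hypothetical sample obeys or breaks one of the rules listed above.
samples = {
    "all rules followed": '<note type="memo"><br/>Hello</note>',
    "mismatched capitalization": "<Note>Hello</note>",
    "unquoted attribute value": "<note type=memo>Hello</note>",
    "unclosed tag": "<note><b>Hello</note>",
    "unclosed empty tag": "<note><br>Hello</note>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample passes; each of the others trips over exactly one rule.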
Elements
Elements are the basic building blocks of XML (and HTML, for that matter). Each element is a
piece of data, identified by a tag. The tag contains the name of the element and any of its
attributes, like this:
<AUTHOR dob="1864">Thadius J. Frog</AUTHOR>
Thadius J. Frog is now identified as an author element. This particular author element has a date
of birth (dob) attribute value of 1864.
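As a rough illustration of how a program sees these parts, this Python sketch (using the standard library's xml.etree.ElementTree, which is an assumption of the example and not something the tutorial prescribes) pulls the element name, its data, and its attribute value out of the tag shown above.

```python
import xml.etree.ElementTree as ET

author = ET.fromstring('<AUTHOR dob="1864">Thadius J. Frog</AUTHOR>')

print(author.tag)         # the element name
print(author.text)        # the data the element identifies
print(author.get("dob"))  # the value of the dob attribute
```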
Choose Your Own
XML is an extensible markup language. This means you create a set of elements that work for
your content -- and that you'll be able to use consistently within the document.
Whether you use a DTD or not, you'll still want to sit down and write a list of the element names
that you will be using in your document. XML is case-sensitive, so as you're thinking about the
element names, be sure to think about how you capitalize them also.
Select names that are both easy to remember and easy to type. Ideally, your tags should have
some inherent meaning too. This makes them easier to use. For example, if you want to identify
"last name" as an element, consider naming the element something like "last-name" or
"surname."
Be consistent in your use of names. It is easier to apply one set of general rules to 20 different
tags than it is to remember eight discrete tags that follow no particular pattern. For example, if
your document is a listing of classes, you could use these elements:
<list-of-classes>
<name-class>
<instructor-name>
<Sec>
<TIME>
<descprt>
But you're just asking for confusion!
There's a mix of capitalization. There's a mix of abbreviation and full words. In one case the
word "name" is the first part of the tag; in the other it is the second part of the tag. There is no
logic to help you remember this set of names.
Wouldn't names like this be easier to use?
<classlist>
<class>
<section>
<time>
<instructor>
<description>
These names are all lowercase, full words, no plurals -- an easy set of criteria to remember.
Focus on Structure, Not Format
One of the goals of using XML is to separate structure ("this is an author") from format ("display
this in 10 point Helvetica"). Elements remain identified as elements, no matter what platform you
move the data to. An XML document can be interpreted on any platform.
When you think about elements, think about the role they play and the data they contain. Don't
think about how the elements will look on the page. Appearance is handled separately.
You are using elements to identify data within your document as playing a certain role or
belonging to a certain category of data.
Displaying Elements
You can use any tag name you want, as long as you follow proper XML syntax. Of course, those
tags alone won't do anything. They will just sit there quietly, marking up your data.
After your data is marked up, you'll use style sheets or other processing tools to display the XML
document. You can control the display based on information contained in the elements.
Using Elements
In a well-formed XML document, you can insert any element tag you want, as long as you follow
proper syntax.
In a valid XML document, only the elements which are specified in the DTD will pass muster. If
you randomly add other elements, their use will be flagged as an error.
When you use elements in an XML document, you must follow standard XML syntax:
The element name surrounds the data which it defines. For example:
<chapter-head>Tying Knots</chapter-head>.
All elements, including empty elements, must end. This means having an open and close tag
for regular elements and a tag that closes with a slash for empty elements.
The element name is case sensitive: <AUTHOR> is not the same as <author>.
DTDs and Elements
One of the ways to define and codify all your elements is to create a DTD. A DTD defines the
allowable elements, their attributes (if any), and their relationships to other elements.
By validating your XML document against a DTD, you can test to be sure that elements in the
document are being used correctly.
Attributes
Attributes provide additional information about elements.
You use elements and attributes all the time in HTML. For example, in HTML, a tag such as
<H1 align="center"> includes an element: H1, an attribute: align, and an attribute value:
center.
In HTML, attributes allow you to specify additional information about your elements. Often this
information is formatting-related, such as align or size. In XML, attributes allow you to specify
additional data about an element, but it is never formatting-related. It is, instead, additional data
about that particular element.
Let's say, for example, you're creating documents about late 20th century popular music. In your
DTD you've created an element called <SONG> which identifies each musical title. You have
music that falls into different decade categories -- the 60's, the 70's and the 80's. You can give
the song element an attribute called era. Now, you'll be able to know from what era each song
dates.
By using an attribute, you can identify different versions of the same song -- "I've Got You Babe"
from the 1960s and "I've Got You Babe" from the 1980s. Later on, you can use this data to
display all 70s songs in green, or to sort the displayed titles by era.
You would use the attribute like this:
<SONG era="60s">I've Got You Babe</SONG>
<SONG era="70s">Billy Don't Be a Hero</SONG>
<SONG era="80s">I've Got You Babe</SONG>
"I've Got You Babe" is identified as a "song" element with an "era" attribute value of "60s". "Billy
Don't Be a Hero" is identified as a "song" element with an "era" attribute value of "70s". "I've Got
You Babe" is identified as a "song" element with an "era" attribute value of "80s".
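Here is a short Python sketch of the "sort the displayed titles by era" idea, assuming the standard library's xml.etree.ElementTree. The SONGLIST wrapper element and the function name titles_by_era are hypothetical; they are there only because an XML document needs a single root element.

```python
import xml.etree.ElementTree as ET

# SONGLIST is a hypothetical root element added for this sketch.
doc = """<SONGLIST>
  <SONG era="60s">I've Got You Babe</SONG>
  <SONG era="70s">Billy Don't Be a Hero</SONG>
  <SONG era="80s">I've Got You Babe</SONG>
</SONGLIST>"""


def titles_by_era(xml_text):
    """Group song titles under the value of each SONG element's era attribute."""
    root = ET.fromstring(xml_text)
    grouped = {}
    for song in root.iter("SONG"):
        grouped.setdefault(song.get("era"), []).append(song.text)
    return grouped


for era, titles in sorted(titles_by_era(doc).items()):
    print(era, titles)
```

Because the era lives in an attribute, the two versions of the same title stay distinguishable even though their text is identical.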
Attributes and their allowable values are created in your DTD, when you specify elements. They
are specified through an attribute list. Like element names, attribute names are case-sensitive,
so be aware of your use of capitalization when you select and use attribute names.
One other important thing to remember about attributes in XML tags is that the attribute values
must always be contained inside quotes. In HTML it's a mixed bag, but in XML the rule is easy to
remember: quote all attribute values.
Comments
Comments are a way to add your own notes to an XML document. The browser and the XML
processors will ignore anything inside comments.
You aren't going to remember what you were thinking three months later when you return to edit
the document, so don't be afraid to add comments as reminders or as markers of work that you
have done.
To create a comment:
1. Type a less than sign, followed by an exclamation point and two dashes like this:
<!--
2. Type the text you want inside the comment. Be sure the text DOES NOT contain two dashes!
<!--This defines a listing of books
3. Now, close the comment, with two dashes and a closing greater than sign:
<!--This defines a listing of books-->
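Both points, that processors ignore comments and that a comment must not contain two dashes, can be observed with a parser. This is a minimal Python sketch using the standard library's xml.etree.ElementTree; the booklist and title elements are sample names chosen for the illustration.

```python
import xml.etree.ElementTree as ET

with_comment = "<booklist><!--This defines a listing of books--><title>XML</title></booklist>"
root = ET.fromstring(with_comment)
child_tags = [child.tag for child in root]
print(child_tags)  # the comment does not show up among the children


def parses(text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Two dashes inside the comment body break the document.
double_dash = "<booklist><!--books -- all of them--><title>XML</title></booklist>"
print(parses(double_dash))
```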
CDATA
CDATA stands for "character data." Character data are letters, numbers and other symbols that
are used exactly as they are typed. They are not parsed or processed, or treated as if they have
any special meaning.
You can create a CDATA section within your XML document. A CDATA section is a handy way to
show code examples or to use characters, such as <, that would otherwise take on a special
meaning. You can use CDATA instead of using a series of &lt;, for example.
To create a CDATA section:
1. At the place in the document where you want the CDATA section to appear, begin a CDATA
definition with the less than sign and an exclamation point.
<!
2. Type an open square bracket and the letters CDATA.
<![CDATA
3. Type another open square bracket.
<![CDATA[
4. Now type the CDATA itself. In this example, we are typing some sample code.
<![CDATA[<NAME common="freddy" breed="springer-spaniel">Sir Fredrick of
Ledyard's End</NAME>
5. End the section with two closing square brackets and a greater than symbol.
<![CDATA[<NAME common="freddy" breed="springer-spaniel">Sir Fredrick of
Ledyard's End</NAME>]]>
Here is how a CDATA section might be used in a document. To be displayed in a browser, the
document would need to be linked to a stylesheet:
<HEAD1>
Entering a Kennel Club Member
</HEAD1>
<DESCRIPTION>
Enter the member by the name on his or her papers. Use the NAME tag. The NAME
tag has two attributes. Common (all in lowercase, please!) is the dog's call
name. Breed (also in all lowercase) is the dog's breed. Please see the breed
reference guide for acceptable breeds. Your entry should look something like
this:
</DESCRIPTION>
<EXAMPLE>
<![CDATA[<NAME common="freddy" breed="springer-spaniel">Sir Fredrick of
Ledyard's End</NAME>]]>
</EXAMPLE>
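To confirm that the markup inside a CDATA section really is treated as plain characters, here is a small Python sketch, assuming the standard library's xml.etree.ElementTree. It parses the EXAMPLE element from above and shows that the NAME tag arrives as text, not as a child element.

```python
import xml.etree.ElementTree as ET

doc = """<EXAMPLE><![CDATA[<NAME common="freddy" breed="springer-spaniel">Sir Fredrick of Ledyard's End</NAME>]]></EXAMPLE>"""

example = ET.fromstring(doc)

print(len(example))   # number of child elements inside EXAMPLE
print(example.text)   # the CDATA content, angle brackets and all
```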
Namespaces
Namespaces are a way of using elements from more than one DTD within the same XML
document.
Sometimes you may be working with material that draws on several sets of element tags. For
example, you might have an online store selling tropical fish and you'd like to use the
<SOURCE> tag to identify both the geographic location from which each species comes and the
wholesaler from whom you buy it. Namespaces are a way to do this.
An XML namespace is a collection of names, identified by a URI reference, which are used in
XML documents as element types and attribute names. In practice, namespaces let you match a
tag you are using with a particular set of tags.
In the beginning of your document (or at the start of a particular element of your document), you
identify the namespaces you'll be using and where the tag information is located. Then, when you
use the tag to identify an element in your document, you precede it with the appropriate
namespace name.
Declaring Namespaces
At the beginning of your document, you'll want to identify the namespaces you are using in your
document. This process is called declaring the namespace. In this example, you are declaring a
namespace with the prefix SALES. The URI for SALES is the mythical fishworld.org/schema:
<document xmlns:SALES='http://fishworld.org/schema'>
Using Namespaces
When you use the tag to create the element that is defined in one of the namespaces, the
namespace is the first part of the tag, like this:
<SALES:SOURCE>Fish-o-Rama Wholesalers and Suppliers to the Trade</SALES:SOURCE>
When you use your own tag you just use the tag name, like this:
<SOURCE>Mexico, Central America</SOURCE>
In January 1999, Namespaces became a W3C Recommendation.
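The two SOURCE tags can be told apart by a namespace-aware parser. The Python sketch below assumes the standard library's xml.etree.ElementTree, which reports a namespaced tag as the URI in curly braces followed by the local name. The URI is the mythical one from the example above.

```python
import xml.etree.ElementTree as ET

doc = """<document xmlns:SALES="http://fishworld.org/schema">
  <SALES:SOURCE>Fish-o-Rama Wholesalers and Suppliers to the Trade</SALES:SOURCE>
  <SOURCE>Mexico, Central America</SOURCE>
</document>"""

root = ET.fromstring(doc)

# The prefix is replaced by the namespace URI in the reported tag names.
tags = [child.tag for child in root]
print(tags)

# Look each element up separately: one in the SALES namespace, one in none.
prefixes = {"SALES": "http://fishworld.org/schema"}
wholesaler = root.find("SALES:SOURCE", prefixes).text
origin = root.find("SOURCE").text
print(wholesaler)
print(origin)
```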
XML Entities
An entity is a shortcut to a set of information.
When you use an entity, it "expands" to its full meaning, but you need only type the shorter entity
name during data entry. You might think of an entity as being a bit like a macro -- it is a set of
information that can be used by calling one name.
XML defines two types of entities. The general entity, which we'll talk about here, is used in XML
documents. The parameter entity is used in DTDs. General entities are easy to spot: they begin
with the ampersand and end with the semicolon, like this:
&entity-name;
Uses for Entities
Entities are a way to make entering and managing data easier.
You've probably already used entities without calling them that. If you've ever entered the
characters &lt; to create the < symbol, you've used an entity. This keystroke combination is a
standard predefined entity in both HTML and XML that lets you access a particular ASCII character
without having to memorize the character set number.
Here are a few reasons you might want to define and use entities:
Entities save typing: Suppose you have a paragraph, like a copyright notice, that you use in
every single document. You could type that notice over and over again. Or, you could use an
entity to call it forth in place.
Entities can reduce errors: By the 101st time you type that copyright notice, it is likely your
poor fingers will be so tired you'll make an error and set your copyright for 1989 instead of
1999. Using an entity can reduce the potential for these types of errors.
Entities are easy to update: It is time to update that copyright notice -- with an entity you can
make the change in one place and be done with it. Without an entity you'd be searching and
replacing throughout your document set.
Entities can act as placeholders for TBD information: Maybe legal hasn't quite finalized
what they want that copyright notice to say. That doesn't have to stop production -- you can
use an entity and when the final wording comes down, the entity will automatically display the
new, corrected version in all your documents.
You can get quite creative with the use of entities, and even have documents that are constructed
entirely from entities. Here's an example:
You want to create different documents, each containing a set of bios for members of your staff.
You'll have an executive set, a set for each product line, a set for six different regions around the
world ... subsets of the same content appear in each.
One approach you could take is creating 10 or 12 separate flat files, with the appropriate
biography information in each. But an easier way is to create a small file for each bio, then call
each into the executive page, the European page, the Flying Toys Division page and so on via an
entity.
Here's how the content code for your Flying Toys Division page might look. Upon display, the
entities would expand and you'd see the full bios of each person. If you needed to change the
bios, you could do it in one place. If the product manager changed, all your pages would be
automatically updated with the new person.
<HEAD>The Faces Behind Flying Toys!</HEAD>
<BIO>&bio-ft-div-head;</BIO>
<BIO>&bio-ft-prod-mgr;</BIO>
<BIO>&bio-ft-designer;</BIO>
<BIO>&bio-ft-lead-engineer;</BIO>
Defining Entities
You can define entities in your local document as part of the DOCTYPE definition. You can also
link to external files that contain the entity data. This, too, is done through the DOCTYPE
definition. A third option is to define the entities in your external DTD.
Use a local definition when the entity is being used only in this one particular file. Use a linked,
external file when the entity is being used in many document sets.
To define an entity:
1. Start your DOCTYPE definition as usual, like this:
<!DOCTYPE type-of-doc
2. Now mark that you are defining some data by entering a square bracket:
<!DOCTYPE type-of-doc [
3. Start the entity definition, with a less than sign, an exclamation mark, and the word ENTITY,
all in caps:
<!DOCTYPE type-of-doc [
<!ENTITY
4. Type the name of the entity. Type it using the capitalization that you will use when calling it
later on.
<!DOCTYPE type-of-doc [
<!ENTITY copyright
5. If you are defining the entity locally, type the value of the entity, surrounded by quotes, and
then close the entity definition with a greater than sign.
<!DOCTYPE type-of-doc [
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All
rights reserved. Please do not copy or use without authorization. For
authorization contact legal@worldspins.com.">
6. If you are defining an entity in an external, ASCII text file, put in a pointer to the external file,
then close the entity definition with a greater than sign.
<!DOCTYPE type-of-doc [
<!ENTITY copyright SYSTEM
"http://www.worldspins.com/legal/copyright.xml">
7. Create all your entity definitions. When you are done, close the DOCTYPE definition with a
square bracket and a greater than sign.
<!DOCTYPE type-of-doc [
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All
rights reserved. Please do not copy or use without authorization. For
authorization contact legal@worldspins.com.">
<!ENTITY trademark SYSTEM
"http://www.worldspins.com/legal/trademark.xml">
]
>
Using Entities
To use an entity in your document, just call it by name. The name begins with an & and ends with
a semi-colon.
Here is a complete example. To be displayed in a browser, it would need to be linked to a
style sheet.
<?xml version="1.0"?>
<!DOCTYPE PRESSRELEASE [
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights
reserved. Please do not copy or use without authorization. For authorization
contact legal@worldspins.com.">
<!ENTITY trademark SYSTEM "http://www.worldspins.com/legal/trademark.xml">
]
>
<PRESSRELEASE>
<HEAD>Mini-globe revolutionizes keychain industry</HEAD>
<LEAD>
Today As The World Spins introduces a new approach to key chains. With the new
MINI-GLOB, keys can be kept inside a chain, called for upon demand, and stored
safely. Never more will consumers lose a key or stand at a door flipping
through a stack of keys seeking the right one.
</LEAD>
<LEGAL>
&trademark;
&copyright;
</LEGAL>
</PRESSRELEASE>
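To watch a locally defined entity expand, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is cut down to a single LEGAL element and one shortened copyright entity, both adapted from the example above. The external trademark entity is left out because this sketch assumes a parser that does not fetch external files.

```python
import xml.etree.ElementTree as ET

# A reduced version of the press release: one local entity, one element.
doc = """<?xml version="1.0"?>
<!DOCTYPE LEGAL [
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights reserved.">
]>
<LEGAL>&copyright;</LEGAL>"""

legal = ET.fromstring(doc)

# The parser has already replaced &copyright; with the entity's value.
print(legal.text)
```

Changing the wording inside the ENTITY definition changes every place the entity is called, which is the "easy to update" benefit described earlier.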
XML DTDs: Introduction
Valid XML documents follow a set of rules defined in an associated DTD. This Document Type Definition
defines elements, attributes, and relationships between elements.
DTDs are saved in an ASCII text file with the extension .dtd, like this:
mypage.dtd
When your XML document is processed, it is compared to its associated DTD to be sure it is
structured correctly and all tags are used in the proper manner. This comparison process is
called validation and is performed by a tool called a parser.
Remember, you don't need to have a DTD to create an XML document; you only need a DTD for
a valid XML document.
Before You Begin
There are a handful of terms you'll be hearing as you work with an XML DTD. Take a couple of
minutes to become familiar with them before you begin.
Schema
A schema is a description of the rules for data.
A schema does two things:
1. It defines the elements in a data set and their relationship to each other.
2. It defines the content that can be contained in each element.
DTDs are a schema for XML documents.
DTD
Document Type Definition. The DTD defines the elements, attributes, and relationships between
elements for an XML document.
A DTD is a way to check that the document is structured correctly, but you do not need to use
one in order to use XML.
Document Tree
A document tree is the representation of the hierarchy of elements in a document.
A document tree has one root element. All other elements are part of this top-level element. The
first tag in your XML document is always the root element.
Root Element
The root element is the top-most element in the hierarchy. All other elements in a document are
children of this element.
In an XML file, the first tag is the root element's tag.
In the DTD, the root element is the first element you should define.
Parent Element
A parent element is an element which contains other elements. The other elements are called
children.
For example, a list is a parent. The list items are children.
A parent element is sometimes referred to as a branch element. Each branch sprouts off the
tree; from the branch hang other branches and individual leaves. The branches and leaves
"belong" to the parent branch.
Child Element
The child element is a sub-set of the parent element.
An element may be both a parent and a child at the same time. For example, the list element
may be a child of the root element. At the same time it is the parent of the list item element.
If a child element is the outer-most element in the hierarchy and does not contain any other
elements it is sometimes called a leaf element.
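The root, branch, and leaf vocabulary can be made concrete with a short Python sketch, assuming the standard library's xml.etree.ElementTree. The document, list, and item element names and the describe function are hypothetical, chosen to mirror the list example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a root that holds a list, which holds two items.
doc = "<document><list><item>one</item><item>two</item></list></document>"


def describe(element, depth=0):
    """Return one indented line per element, labeled root, branch, or leaf."""
    if depth == 0:
        role = "root"
    elif len(element) == 0:
        role = "leaf"
    else:
        role = "branch"
    lines = ["  " * depth + f"{element.tag} ({role})"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


tree_lines = describe(ET.fromstring(doc))
print("\n".join(tree_lines))
```

Here the list element is both a child (of document) and a parent (of the two item elements).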
Parser
A parser is a software tool that checks to be sure a document follows a particular syntax.
XML parsers come in two varieties:
A non-validating parser checks a document to be sure XML syntax rules are followed and
builds a document tree from the element tags.
A validating parser checks the syntax, builds the tree, and compares the use of element tags
to be sure they conform with the rules specified in the document's associated DTD.
Parsers can be either external programs or part of the editing tool or browsing tool.
The XML Reference section includes a list of some of the XML parsers.
DTD Contents
A DTD is a way to ensure that an XML document uses elements correctly. It contains a set of
rules. When your XML document is processed, it is compared to its associated DTD to be sure it
is structured correctly and all tags are used in the proper manner.
A DTD:
Always contains rules that define elements.
Always contains rules that define the relationship between elements.
May contain rules that define attributes for elements, although not all elements have
attributes.
May contain rules that define entities.
May contain rules that define notations.
Finding a DTD
Using a DTD doesn't necessarily mean you have to create one from scratch. There are a number
of existing DTDs, with more being added every day.
Shared DTDs
As XML becomes widespread, your industry association or company is likely to have one or
more published DTDs that you can use and link to. These DTDs define tags for elements that are
commonly used in your applications. You don't need to recreate these DTDs -- you just point to
them in your doctype tag in your XML file, and follow their rules when you create your XML
document.
Some of these DTDs may be public DTDs, like the HTML DTD. Others may belong to your
company. If you are interested in using a DTD, ask around and see if there is a good match that
already exists.
Create Your Own External DTD
Another option is to create your own DTD. The DTD can be very simple and basic or it can be
large and complex. The DTD will be a reflection of the needs of your document.
It is perfectly acceptable to have a DTD with just four or five basic elements if that is what your
document needs. Don't feel that creating a DTD necessarily needs to be a huge undertaking.
However, if your documents are complex, do plan on setting aside time -- several days or several
weeks -- to understand the document and the document elements and create a solid DTD that
will really work for you over time. Remember, you'll be able to use this DTD with many individual
documents, so it is worth the time to think it through and craft it well.
Create Your Own Internal DTDs
You can insert DTD data within your DOCTYPE definition in an individual XML document. If
you've worked with CSS styles, you can think of this as being a little like putting style data into
your file header.
DTDs inserted this way are used in that specific XML document only. This might be the approach
to take if you want to validate the use of a small number of tags in a single document or to make
elements that will be used only for one document.
Internal DTDs
You can insert DTD data within your doctype declaration. This type of DTD is used only by the
one specific XML document that contains it.
This is a very simple example of DTD data within the doctype declaration:
<!DOCTYPE books [
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ENTITY copyright "Copyright 1999, Flying Toys Inc., all rights reserved.">
]>
External DTDs
DTDs are stored as ASCII text files with the extension .dtd. Each file includes a series of element
definitions, attribute lists, entity definitions and notation definitions; the DOCTYPE definition
that points to the file stays in the XML document. Here's an example; this might be the DTD for
a set of documents about books:
<!--This defines a listing of books-->
<!ELEMENT booklist (title, author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ATTLIST title binding (paper|cloth|hard) "paper">
<!ENTITY copyright "Copyright 1999, Flying Toys Inc., all rights reserved.">
DTDs can be much more complex than this example -- and they typically are -- but this gives you
a sense of what they can do. It's just a matter of structuring your data and figuring out the "parts"
of your content.
Reading a DTD
Even if you don't plan to build a DTD from scratch, it is helpful to know how to read one and to
understand the document it is describing.
From reading a DTD you should be able to compile a list of elements and their attributes, and how
and when to use them. You should also be able to compile a list of entities that you can use
within the document.
Some people find it helpful to actually sketch out a document tree as they go through the DTD, to
visualize the structure of the document.
Check List
Here's a list of things to look for as you go through a DTD:
Read the Comments
Note the Basic Elements
Read the Element Declaration
Look for Parent/Child Relationships
Read Attribute Lists
Find Attribute Names for Each Element
Determine Attribute Value Types
See the Attribute's Default
Read Entity Declarations
Read the Comments
Read the comments! Comments can tell you a lot about the DTD, how to use it, and what to be
aware of when using it.
Most DTD authors will include information that you should know before using the DTD. This might
range from use restrictions to how-to information.
Comments look like this:
<!-- Here's a comment -->
Note the Basic Elements
Look through the DTD and identify the element names that comprise the document. Note how
they are capitalized. You might want to develop a reference sheet of elements that you can make
notes on as you work your way through the DTD.
Elements begin like this:
<!ELEMENT
The text immediately after the element declaration is the element's name.
Read the Element Declaration
Each element declaration provides the name of the element and the content which it contains.
Sometimes the content is text. Other times it is other elements, arranged in a certain order or used
a certain number of times.
For example:
<!ELEMENT EMPLOYEE (FIRST, MI, LAST)>
<!ELEMENT FIRST (#PCDATA)>
<!ELEMENT MI (#PCDATA)>
<!ELEMENT LAST (#PCDATA)>
Look for Parent/Child Relationships
The element rules build a hierarchy of elements, describing how one element is related to another.
An element that is contained within another is considered a child of the element in which it is
contained. Use these relationships to sketch out your document tree.
The parent/child relationship is defined in the content type portion of the element definition. If the
content type is another element, then those elements are children of the element whose definition
you are reading. For example: FIRST, MI, and LAST are children of EMPLOYEE:
<!ELEMENT EMPLOYEE (FIRST, MI, LAST)>
The DTD can require that the child elements be used in a certain order or that they be used
once, not at all, or many times. It can also group elements to create more detailed rules.
Read Attribute Lists
After element definitions, you may see attribute lists. An attribute list begins like this:
<!ATTLIST
Each attribute list defines the attributes for an element. Many attributes may be defined in one
ATTLIST.
The ATTLIST is structured like this:
<!ATTLIST element-name attribute-name attribute-type default-data>
See Which Element the Attribute Defines
Right after the ATTLIST declaration is the name of an element. This is the element that the
attribute list defines. For example, this ATTLIST defines the COMMENT element:
<!ATTLIST COMMENT attribute-name attribute-type default-data>
Find Attribute Names for Each Element
Following the element name is the name of the first attribute declared in this list. This name is the
attribute name you type into the element tag in the XML file. For example, this ATTLIST defines
the attribute "category" for the element COMMENT.
<!ATTLIST COMMENT category attribute-type default-data>
Add the attribute information to the element reference list you are building.
Determine Attribute Value Types
Attributes can be one of several different types. The attribute-type describes the type of value
that the attribute may contain. For example, this ATTLIST says that the "category" attribute for the
element COMMENT contains one of four values: red, green, blue, or other.
<!ATTLIST COMMENT category (red | green | blue | other) default-data>
See the Attribute's Default
The final part of the ATTLIST is the default value of the attribute. The default value has a strong
effect on how the attribute is used and what values it might have if you don't use it in the XML
tag. You can make the value required (#REQUIRED) or optional (#IMPLIED). Or, you can provide
a default value that will be used automatically if the attribute is not entered.
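The effect of a default value can be seen with a parser that reads the internal DTD. This Python sketch assumes the standard library's xml.etree.ElementTree, whose underlying parser fills in declared defaults even though it does not validate. The default of "other" is an assumption made for the illustration, since the ATTLIST above leaves default-data unspecified.

```python
import xml.etree.ElementTree as ET

# The default value "other" is hypothetical; the tutorial leaves it open.
doc = """<!DOCTYPE COMMENT [
<!ELEMENT COMMENT (#PCDATA)>
<!ATTLIST COMMENT category (red | green | blue | other) "other">
]>
<COMMENT>No category was typed into this tag</COMMENT>"""

comment = ET.fromstring(doc)

# The attribute was not entered in the tag, so the declared default is used.
print(comment.get("category"))
```

With #IMPLIED in place of the quoted default, the same lookup would find no category attribute at all.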
Read Entity Declarations
Along with element and attribute definitions, you may also see entity definitions. Typically, these
will appear in a group, often at the beginning of the DTD, and usually with explanatory comments.
An entity definition begins like this:
<!ENTITY
After the declaration is the entity's name and the contents of the entity. The contents may be text
or a pointer to another external file. For example, this defines two entities, one called
"copyright" and one called "trademark." Copyright is defined within the definition, while trademark
points to another file.
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights
reserved. Please do not copy or use without authorization. For authorization
contact legal@worldspins.com.">
<!ENTITY trademark SYSTEM "http://www.worldspins.com/legal/trademark.xml">
Making Elements
Elements are the basic building blocks of XML. You define elements in a DTD; you use them in a
document. A basic element definition looks like this:
<!ELEMENT DESCRIPTION (#PCDATA | DEFINITION)*>
Element Declaration
Each element begins with an element declaration, <!ELEMENT. This announces that you are
defining an element.
Element Name
After the declaration is the element's name. The way the name appears in the element definition is
exactly the way it must be used in the XML document. Capitalization counts!
Element Rule
After the name comes a rule that describes what the element can contain. Through this
description, the elements take on hierarchical relationships with each other.
Although the basic bits of the rules are simple, they can be grouped and combined to create quite
complex definitions.
The element rule definitions are summarized below.
Contents
Elements can contain text, other elements, a combination of text and other elements, or they may
be empty.
Text: Elements can contain textual data.
Other Elements: Elements can contain only other specified elements and no text. The contained
elements are called children of the containing element. The containing element is the parent of the
child elements.
Combination: Elements can contain a mix of textual data and other specific elements.
Empty: Empty elements get their value from their attributes. An empty element will typically have
at least one attribute. In HTML, the IMG tag is a good example of an empty element. It gets its
value from the src attribute.
Number of Occurrences
You can specify the number of times a child element is used within its parent.
One and only one: The element listed by itself indicates that it can be used once and only
once:
DTD definition:
<!ELEMENT EVENTLIST (EVENT)>
Used in document:
<EVENTLIST>
<EVENT>Balsa Wood Flyer Days</EVENT>
</EVENTLIST>
At least one, or many times: The element followed by a plus sign indicates that this element
must be used at least once and can be used many times within the parent:
DTD definition:
<!ELEMENT EVENTLIST (EVENT+)>
Used in document:
<EVENTLIST>
<EVENT>Balsa Wood Flyer Days</EVENT>
<EVENT>Sundays in the Park</EVENT>
<EVENT>Teach Your Child to Fly</EVENT>
</EVENTLIST>
One or not at all: The element followed by a question mark indicates that this element can be
used either one time or not at all:
DTD definition:
<!ELEMENT EVENT (LOCATION, SPONSOR?)>
Used in document:
<EVENT>
<LOCATION>West Bay Ballpark</LOCATION>
</EVENT>
or
<EVENT>
<LOCATION>West Bay Ballpark</LOCATION>
<SPONSOR>Flying Toys</SPONSOR>
</EVENT>
One, not at all, or as many times as you want: The element followed by an asterisk indicates
that this element can be used as many times as needed, or left out entirely.
DTD definition:
<!ELEMENT EVENT (LOCATION*, EVENT-NAME)>
Used in document:
<EVENT>
<LOCATION>West Bay Ballpark</LOCATION>
<LOCATION>North Side Park</LOCATION>
<EVENT-NAME>Sundays in the Park</EVENT-NAME>
</EVENT>
or
<EVENT>
<EVENT-NAME>Sundays in the Park</EVENT-NAME>
</EVENT>
Order
You can specify the order in which child elements appear.
Specific order: Child elements can be defined to be used in a specific order. The comma (,)
separates elements that are listed in a specific order. For example, you could set a rule that
creates an EVENTLIST. In the list, you must always use the EVENT element, followed by the
SPONSOR element.
DTD definition:
<!ELEMENT EVENTLIST (EVENT, SPONSOR)>
Used in document:
<EVENTLIST>
<EVENT>Balsa Wood Plane Days</EVENT>
<SPONSOR>Flying Toys</SPONSOR>
</EVENTLIST>
Either Or: You can define child elements so that one or another can be used. The bar (|)
separates either-or choices.
DTD definition:
<!ELEMENT EVENT (EVENT-NAME | SPONSOR)>
Used in document:
<EVENT>
<EVENT-NAME>Balsa Wood Plane Days</EVENT-NAME>
</EVENT>
or
<EVENT>
<SPONSOR>Flying Toys</SPONSOR>
</EVENT>
Groups
Groups can be used to create complex rules that combine elements and different usage options.
For example, when groups are combined with a "use many times" symbol, you can create a rule
that allows multiple uses of elements -- either in any order or as repeated sets. For example,
here the element EVENTLIST can contain multiple sets of EVENT and SPONSOR groups:
DTD definition:
<!ELEMENT EVENTLIST (EVENT, SPONSOR)*>
Used in document:
<EVENTLIST>
<EVENT>Balsa Wood Plane Days</EVENT>
<SPONSOR>Flying Toys</SPONSOR>
<EVENT>Sundays in the Park</EVENT>
<SPONSOR>Deer Island Recreation Department</SPONSOR>
</EVENTLIST>
Here, the EVENTLIST can contain either the EVENT element or the SPONSOR element, but this
either-or group can be used many times.
DTD definition:
<!ELEMENT EVENTLIST (EVENT | SPONSOR)*>
Used in document:
<EVENTLIST>
<EVENT>Balsa Wood Plane Days</EVENT>
<SPONSOR>Flying Toys</SPONSOR>
<SPONSOR>Deer Island Recreation Department</SPONSOR>
</EVENTLIST>
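The occurrence, order, and group symbols above behave much like a regular expression over the list of child element names. The sketch below is only an illustration of that idea, not how a real validating parser is built: the helper names model_to_regex and children_allowed are hypothetical, and the sketch handles element names, commas, bars, parentheses, and the ?, + and * suffixes only (no #PCDATA, EMPTY, or ANY).

```python
import re


def model_to_regex(model):
    """Translate a DTD content model of child elements into a regular expression.

    Hypothetical helper for illustration. Only element names, commas, bars,
    parentheses and the ?, + and * suffixes are handled.
    """
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group that matches "NAME;" in the child string.
    pattern = re.sub(
        r"[A-Za-z_][A-Za-z0-9_.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        compact,
    )
    # A comma means "followed by", which is plain concatenation in a regex.
    return pattern.replace(",", "")


def children_allowed(model, children):
    """Return True if the sequence of child element names satisfies the model."""
    sequence = "".join(name + ";" for name in children)
    return re.fullmatch(model_to_regex(model), sequence) is not None


# The rules from the examples above, checked against lists of child names.
optional_sponsor = children_allowed("(LOCATION, SPONSOR?)", ["LOCATION"])
repeated_sets = children_allowed(
    "(EVENT, SPONSOR)*", ["EVENT", "SPONSOR", "EVENT", "SPONSOR"]
)
broken_set = children_allowed("(EVENT, SPONSOR)*", ["EVENT", "EVENT"])
```

With the repeated-set rule, two EVENT elements in a row are rejected because each EVENT must be followed by a SPONSOR; with the either-or group, (EVENT | SPONSOR)*, the same children would be accepted.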
Hints for Element Names
Select names that are both easy to remember and easy to type.
Your tags should have some inherent meaning. For example, if you want to identify "last
name" as an element, consider naming the element something like "last-name" or "surname."
Use names that are consistent with current processes. If people call "social security number"
SSN, create an element called SSN. Don't create an unfamiliar "socsecnum" element.
Be consistent in your use of names. It is easier to apply one set of general rules to 20 different
tags than it is to remember eight discrete tags that follow no particular pattern.
Attribute Lists
Elements can have attributes, which describe the element in more detail. When you create an
element in your DTD, you can also create an attribute list for the element.
Attribute lists define the name, data type, and default value (if any) of each attribute associated
with an element.
In this very simple example, we're adding some attributes to the title element from our book list.
We want to be able to specify the edition date and whether the book is paperback or hardcover.
<!--This defines a listing of books-->
<!DOCTYPE booklist [
<!ELEMENT booklist (title, author)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title
edition CDATA #REQUIRED
type (paper|cloth|hard) "paper">
<!ELEMENT author (#PCDATA)>
]>
Here's how you'd use these attributes in an XML file. Notice the use of the edition attribute in
each title tag. Notice how one title tag also uses the type attribute to indicate that this book is a
hardcover title.
Attribute Types
Attributes can have one of seven different types of data, but the two most common are:
CDATA. Character data. This allows the attribute value to be textual data. You use it like this:
<!ATTLIST edition date CDATA #IMPLIED>
Pre-defined values. You can list a set of specific values that the attribute can have. The value
set is enclosed in parentheses and each value is separated with a vertical bar, like this:
<!ATTLIST edition type (paper|hard|cloth) #IMPLIED>
Default Values
You can specify a default value for the attribute, or make the attribute required or optional. Every
attribute definition ends with one of these settings. The default value has a strong effect on how
the attribute is used and what values it might have if you don't use it in the XML tag.
#REQUIRED: the attribute must have a value every time the element is listed. You specify that an
attribute is required like this:
<!ATTLIST edition date CDATA #REQUIRED>
#IMPLIED: the processor ignores this attribute unless it is used as part of the element. It does not
assume any default value.
#FIXED value: the attribute is not required in the element, but if it occurs, it must have the
specified value. If it is left out, the processor supplies the fixed value. For example, if the new
attribute is used, it must have the value of "yes":
<!ATTLIST edition new CDATA #FIXED "yes">
"value": a value in quotes provides a default value for that attribute. If the attribute is not included in
the element, the processing program assumes that this is the attribute's value. For example, this
gives the type attribute a default value of "hard":
<!ATTLIST edition type (paper|cloth|hard) "hard">
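To see a default value being filled in, here is a minimal sketch using the expat parser that ships with Python. It assumes a small booklist document with the DTD placed in the internal subset; the book title and author are made-up sample data, and attributes_seen is a hypothetical helper. Expat is non-validating, so it does not enforce #REQUIRED, but by default it does report attributes that come from a default declared in the attribute list.

```python
import xml.parsers.expat

# Sample document (illustrative data) with its DTD in the internal subset.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE booklist [
<!ELEMENT booklist (title, author)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title
  edition CDATA #REQUIRED
  type (paper|cloth|hard) "paper">
<!ELEMENT author (#PCDATA)>
]>
<booklist>
<title edition="1999">Flying Toys</title>
<author>Ben Johnson</author>
</booklist>"""


def attributes_seen(document):
    """Parse the document and return {element name: attributes reported}."""
    seen = {}

    def start_element(name, attrs):
        # attrs holds the attributes typed in the tag plus any DTD defaults.
        seen[name] = dict(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(document, True)
    return seen


ATTRIBUTES = attributes_seen(DOCUMENT)
```

The title tag in the sample only sets edition, yet the parser reports a type attribute as well, taken from the "paper" default in the attribute list.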
Entities
An entity is a shortcut to a set of information.
When you use an entity, it "expands" to its full meaning, but you need only type the shorter entity
name during data entry. You might think of an entity as being a bit like a macro -- it is a set of
information that can be used by calling one name.
XML defines two types of entities.
The general entity is one that you define in a DTD and use in a document. General entities
are easy to spot. They are defined with the entity declaration, <!ENTITY, and when they are used
they begin with the ampersand and end with the semicolon, like this:
&entity-name;
The parameter entity is one that you define and use within a DTD. The content of a
parameter entity may be either included in the DTD or stored in an external file. In addition,
parameter entities must be parsed; they cannot be unparsed. That is, they must contain textual
data that is processed rather than a GIF or other non-textual data type.
It too is defined with an entity declaration, but it is called with a percent sign, like this:
%info;
Defining a General Entity
To define an entity:
1. Start the entity definition with a less than sign, an exclamation mark, and the word ENTITY,
all in caps:
<!ENTITY
2. Type the name of the entity. Type it using the capitalization that you will use when calling it
later on.
<!ENTITY copyright
3. If you are defining the entity locally, type the value of the entity, surrounded by quotes, and
then close the entity definition with a greater than sign.
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All
rights reserved. Please do not copy or use without authorization. For
authorization contact legal@worldspins.com.">
4. If you are defining an entity in an external, ASCII text file, put in a pointer to the external file,
then close the entity definition with a greater than sign.
<!ENTITY copyright SYSTEM
"http://www.worldspins.com/legal/copyright.xml">
Using a General Entity
You won't be using a general entity in a DTD. You will only be defining it there. You will be using it
in an XML file, where it is called by typing an ampersand, the entity name, and a semicolon:
&entity-name;
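As a rough illustration of an entity "expanding," the sketch below defines a general entity in a document's internal DTD subset and lets Python's standard ElementTree parser replace the reference with its value. The notice element and the shortened copyright text are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# The entity is defined once in the DTD and called by name in the document.
DOCUMENT = """<!DOCTYPE notice [
<!ENTITY copyright "Copyright 2000. All rights reserved.">
]>
<notice>&copyright;</notice>"""

root = ET.fromstring(DOCUMENT)

# By the time the parser hands over the text, the entity has been expanded.
EXPANDED = root.text
```

If the entity is called without being defined, the same parser reports an error instead of guessing at a value.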
Defining a Parameter Entity
To declare a parameter entity:
1. Type the entity declaration:
<!ENTITY
2. Type a space, followed by a percent sign. It is important to remember the space!
<!ENTITY %
3. Type another space, followed by the name of the entity:
<!ENTITY % info
4. Type the value of the entity, surrounded by quotation marks:
<!ENTITY % info "name CDATA #REQUIRED gender (m | f) #REQUIRED color
(red | fawn | merle | black | other) #REQUIRED"
5. End the declaration with a greater than sign.
<!ENTITY % info "name CDATA #REQUIRED gender (m | f) #REQUIRED color
(red | fawn | merle | black | other) #REQUIRED">
One thing to notice about entities in a DTD is that when they are defined there is a space
between the percent sign and the entity name -- but when the entity is used there is no space
between the percent sign and the entity name.
Using a Parameter Entity
It is quite simple to use a parameter entity. Simply enter the entity name, preceded by a percent
sign and followed by a semicolon, like this:
<!ELEMENT HOUND (NAME)>
<!ATTLIST HOUND %info;>
<!ELEMENT WORKING (NAME)>
<!ATTLIST WORKING %info;>
<!ELEMENT COMPANION (NAME)>
<!ATTLIST COMPANION %info;>
When the DTD is processed, the entity will be expanded. In this example, %info; will be replaced
with a set of attribute data, which was defined in the info entity declaration.
Again, remember that when a parameter entity is defined, there is a space between the percent
sign and the entity name -- but when the entity is used there is no space between the percent sign
and the entity name.
XML Parsers
Parsing is the process of checking the syntax of your document and creating the "tree structure."
If you are using a validating parser, the process will also compare the XML file to its DTD.
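To make the "tree structure" concrete, here is a small sketch using Python's standard ElementTree parser, which checks syntax but does not validate against a DTD. The outline helper is hypothetical and the event list is sample data in the style of the earlier examples.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return one indented line per element, showing the parsed tree."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


DOCUMENT = """<EVENTLIST>
<EVENT><LOCATION>West Bay Ballpark</LOCATION><SPONSOR>Flying Toys</SPONSOR></EVENT>
<EVENT><LOCATION>North Side Park</LOCATION></EVENT>
</EVENTLIST>"""

# Parsing checks the syntax and builds the tree in one step.
TREE = outline(ET.fromstring(DOCUMENT))
```

Each level of indentation in the result is one level of parent and child in the document.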
On-line Parsers
There are a number of online parsers. To use these, you typically type in the URI of your file and
tell the process to begin.
Online validating parser, from the W3C
The W3C offers an online parser. Type the URL of the file into the form and the XML file is both
parsed and validated.
Validating Parser from Brown University Scholarly Technology Group
This is the most easily accessible and understandable presentation of the online parsers.
Downloadable Parsers
There are many parsers that you can download and run on your local machine. Most of these
require you to have either a Windows or UNIX machine. They are written in a variety of
languages; this is a cross section of some of the many which are available.
James Clark's expat parser
James Clark is almost a brand in the SGML/XML world. His rendition of an XML parser is
widely used.
Java-based Validating XML Parser
From IBM's AlphaWorks group, this parser claims to be 100% pure Java.
Microsoft XML Parser in C++
A parser from Microsoft.
XML Parser written in Python
This is a validating parser.
XML Parser written in JavaScript.
This parser is non-validating and checks XML syntax only.
SiRPAC, Simple RDF Parser and Compiler
From the W3C.
XML Syntax
Tagging an XML document is, in many ways, similar to tagging an HTML document. Here are
some of the most important guidelines to follow.
Rule #1: Remember the XML declaration
This declaration goes at the beginning of the file and alerts the browser or other processing tools
that this document contains XML tags. The declaration looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
The standalone value is either yes or no, and when encoding is present it comes before standalone.
You can leave out the encoding attribute and the processor will use the UTF-8 default.
Rule #2: Do what the DTD instructs
If you are creating a valid XML file, one that is checked against a DTD, make sure you know
what tags are part of the DTD and use them appropriately in your document. Understand what
each does and when to use it. Know what the allowable values are for each. Follow those rules,
and the XML document will validate against the specified DTD.
Rule #3: Watch your capitalization
XML is case-sensitive. <P> is not the same as <p>. Be consistent in how you define element
names. For example, use ALL CAPS, or use Initial caps, or use all lowercase. It is very easy to
create mismatched case errors.
Also, make sure starting and ending tags use matching capitalization. If you start a
paragraph with the <P> tag, you must end it with the </P> tag, not a </p>.
Rule #4: Quote attribute values
In HTML there is some confusion over when to enclose attribute values in quotes. In XML the rule
is simple: enclose all attribute values in quotes, like this:
<NAME dob="1950">Ben Johnson</NAME>
Rule #5: Close all tags
In XML you must close all tags. This means that paragraphs must have corresponding end
paragraph tags. Anchor names must have corresponding anchor end tags. A strict interpretation
of HTML says we should have been doing this all along, but in reality, most of us haven't.
Rule #6: Close Empty tags, too
In HTML, empty tags, such as <br> or <img>, do not close. In XML, empty tags do close. You
can close them either by adding a separate close tag (</tagname>) or by combining the open
and close tags into one tag. You create the open/close tag by adding a slash, /, to the end of the
tag, like this:
<br/>
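A quick way to get a feel for the capitalization, quoting, and closing rules is to hand small fragments to a parser and see which ones it accepts. The sketch below uses Python's standard ElementTree parser; is_well_formed is a hypothetical helper, and it checks syntax only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the fragment as XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


CHECKS = {
    # Capitalization: start and end tags must match exactly.
    "matching case": is_well_formed("<P>text</P>"),
    "mismatched case": is_well_formed("<P>text</p>"),
    # Attribute values must be quoted.
    "quoted attribute": is_well_formed('<NAME dob="1950">Ben Johnson</NAME>'),
    "unquoted attribute": is_well_formed("<NAME dob=1950>Ben Johnson</NAME>"),
    # Empty tags must be closed, too.
    "closed empty tag": is_well_formed("<P>one<br/>two</P>"),
    "unclosed empty tag": is_well_formed("<P>one<br>two</P>"),
}
```

In each pair, only the first fragment follows the rules above, so only the first is accepted.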
Examples
This table shows some common HTML tags and how they would be treated in XML.
Tag: <P>
Comment: Technically, in HTML, you're supposed to close this tag. In XML, it's essential to close it.
End-Tag: </P>
Tag: <ELEMENT>
Comment: All elements in XML must have a start-tag and an end-tag.
End-Tag: </ELEMENT>
Tag: <LI>
Comment: This tag must be closed in XML in order to ensure a well-formed XML document.
End-Tag: </LI>
Tag: <META name="keywords" content="XML, SGML, HTML">
Comment: META tags are considered empty elements in XML, and they must close.
End-Tag: <META name="keywords" content="XML, SGML, HTML"/>
Tag: <BR>
Comment: Break tags are considered empty elements.
End-Tag: <BR/>
Tag: <IMG src="coolpictures.htm">
Comment: This is an empty element tag.
End-Tag: <IMG src="coolpictures.htm"/>
Element and Attribute Rules
The first table contains the basic guidelines for creating element rules in an XML DTD.
The second contains attribute value types.
The third contains attribute default options.
Element Rules:
Symbol: #PCDATA
Meaning: Contains parsed character data, or text.
Example: <!ELEMENT POW (#PCDATA)>
The POW element contains textual data.
Symbol: (#PCDATA | element-name)*
Meaning: Contains text and another element. #PCDATA is always listed first in a rule.
Example: <!ELEMENT POW (#PCDATA | NAME)*>
The POW element can contain text mixed with NAME elements.
Symbol: , (comma)
Meaning: Use in this order
Example: <!ELEMENT POW (NAME, RANK, SERIAL)>
The POW element must contain the NAME element, followed by the RANK element, followed by the
SERIAL element.
Symbol: | (bar)
Meaning: Use either or
Example: <!ELEMENT POW (NAME | RANK | SERIAL)>
The POW element must contain either the NAME element, or the RANK element, or the SERIAL
element.
Symbol: name (by itself)
Meaning: Use one time only
Example: <!ELEMENT POW (NAME)>
The POW element must contain the NAME element, used exactly one time.
Symbol: name?
Meaning: Use either once or not at all
Example: <!ELEMENT POW (NAME, RANK?, SERIAL?)>
The POW element must contain the NAME element used exactly once, followed by one or no RANK
element, and one or no SERIAL element.
Symbol: name+
Meaning: Use either once or many times
Example: <!ELEMENT POW (NAME+, RANK?, SERIAL)>
The POW element must contain at least one but maybe more NAME elements, followed by one or
no RANK element, and exactly one SERIAL element.
Symbol: name*
Meaning: Use once, use many times, or don't use it at all.
Example: <!ELEMENT POW (NAME*, RANK?, SERIAL)>
The POW element must contain one, many, or no NAME elements, followed by one or no RANK
element, and exactly one SERIAL element.
Symbol: ( )
Meaning: Indicates groups, may be nested.
Example: <!ELEMENT POW (#PCDATA | NAME)*>
The POW element contains any number of uses of either or both text and the NAME element.
Example: <!ELEMENT POW ((NAME*, RANK?, SERIAL)* | COMMENT)>
The POW element can contain many instances of the group that contains one, many, or no NAME
elements, followed by one or no RANK element, and exactly one SERIAL element. OR, it may
contain one COMMENT element.
Example: <!ELEMENT POW (NAME | RANK)+>
The POW element must contain a NAME or RANK element. The NAME or RANK option may appear
once or may be repeated many times.
Attribute Values:
Type: CDATA
Meaning: Character data, text.
Example: <!ATTLIST COMMENT category CDATA #REQUIRED>
The COMMENT element has an attribute named category. This attribute contains letters, numbers,
or punctuation symbols.
Type: NMTOKEN
Meaning: Name token, text with some restrictions. The value contains numbers and letters; the
only symbols it can contain are _, -, ., and :.
Example: <!ATTLIST COMMENT category NMTOKEN #REQUIRED>
The COMMENT element has an attribute named category. This attribute contains a name token.
Type: (value-1 | value-2 | value-3), a value list
Meaning: A value list provides a set of acceptable options for the attribute to contain. In
general, you should always include "other" as one of the options.
Example: <!ATTLIST COMMENT category (red | green | blue | other) "other">
The COMMENT element has an attribute named category. The category can be "red," "green,"
"blue," or "other." The default value is "other."
Type: ID
Meaning: The keyword ID means that this attribute has an ID value that identifies this particular
element.
Example: <!ATTLIST COMMENT category ID #IMPLIED>
The COMMENT element has an attribute named category. The category will contain an ID value. ID
and IDREF work together to create cross-references.
Type: IDREF
Meaning: The keyword IDREF means that this attribute has an ID reference value that points to
another instance's ID value.
Example: <!ATTLIST COMMENT category IDREF #IMPLIED>
The COMMENT element has an attribute named category. The category will contain an IDREF
value. ID and IDREF work together to let you cross-reference elements.
Type: ENTITY
Meaning: The keyword ENTITY means that this attribute's value is an entity. An entity is a
value that has been defined elsewhere in the DTD to have a particular meaning.
Example: <!ATTLIST COMMENT category ENTITY #IMPLIED>
The COMMENT element has an attribute named category. The category will contain an entity name
rather than text.
Type: NOTATION
Meaning: The keyword NOTATION means that this attribute's value is a notation. A notation is
a description of how information should be processed. You could set up a notation that allows
only numbers to be used for the value, for example.
Example: <!ATTLIST COMMENT category NOTATION #IMPLIED>
The COMMENT element has an attribute named category. The category attribute will contain a
notation name.
Attribute Default Options:
Type: #REQUIRED
Meaning: The attribute must always be included when the element is used.
Example: <!ATTLIST COMMENT category CDATA #REQUIRED>
The COMMENT element has an attribute named category. This attribute contains letters, numbers,
or punctuation symbols. The attribute must always be used with the element. If you omit the
attribute, a validating parser will give you an error message.
Type: #IMPLIED
Meaning: The attribute is optional. If you see the keyword #IMPLIED, you know that this attribute
will be ignored unless it is included in the element tag. It won't take on any default values.
Example: <!ATTLIST COMMENT category CDATA #IMPLIED>
The COMMENT element has an attribute named category. You may use the attribute or omit the
attribute, as the instance requires.
Type: #FIXED
Meaning: The attribute is optional, but if it is used, it must always have a certain value. If you
see the keyword #FIXED, you know that this attribute will always have the specified value.
Example: <!ATTLIST COMMENT confirm CDATA #FIXED "yes">
The COMMENT element has an attribute named confirm. If it is used, its value must be "yes." If it is
not used, the processor supplies the value "yes."
Type: "value"
Meaning: A value in quotes is the default value of this attribute. If you don't enter the attribute
in the element tag, the processor will assume the attribute has this default value.
Example: <!ATTLIST COMMENT category (red|green|blue|other) "other">
The COMMENT element has an attribute named category. If you don't use the attribute in the
element tag, the attribute will automatically receive the value "other."
Interaction Between Components
XML, CSS, script, the DOM, and the browser work together to let you create interactive presentations of
your content.
Copyright © 1998-99
DevX.com, Inc.
Introduction to Behaviors
Behaviors are an enhancement to Internet Explorer 5 that allow designers to add scripting elements
without having to do the scripting needed to make them work. Behaviors are also a way in which scripters
can write a script once and turn it over to designers for use whenever needed.
So what can behaviors do? By using XML we can link behaviors to any element in a Web page and
manipulate that element. We can, for example, copy that element's text into a pullquote area on the page.
We could offer a way to magnify small type on a page. Many of the everyday things we do with scripting
can be transferred to behaviors, and by combining them with XML we can have greatly enhanced Web
pages that will work down the browser foodchain with no ill effects.
At the left you will find links to several behaviors created here at Project Cool. Each link will take you to a
page that not only demonstrates the behavior but also shows you just how simple they are to implement.
We've divided our behaviors into two categories:
fx - Special Effects behaviors don't necessarily add value, but do add eye-catching special effects that can
make your page stand out if used appropriately.
publishing - These behaviors can add value and utility to pages of text content. They make your pages much
more usable for the viewer or add new ways to get them involved in the text.
So what are you waiting for? Click one of the links to the left and start exploring what you can do with
behaviors and XML.
Copyright © 1998
Earthquake!
This behavior falls into the realm of special effects. It's really not useful but it could help provoke mood on a
website. To see it in action just run your mouse over the headline.
While it would probably be easy to implement this in the document directly we've chosen to use it as a
behavior. Part of the beauty of behaviors is that they allow a designer to take pre-written code and effects
and insert them into a webpage without having to be a programmer. By having effects like Earthquake
available as behaviors a designer can build up an astonishing repertoire of web display tools without
needing to learn JavaScript.
Earthquake is set up via XML so you'll need to create an appropriate namespace before you can use it. We're doing it
as XML so that older browsers aren't affected adversely. It also lets us define a brand new tag. The namespace is set
up in the <html> tag on your webpage. Here's the one we're using on this page:
<html xmlns:fx>
The next step is to define the XML tag we'll be using. This is done in the specific media type. In this case the
behavior will apply to the screen, so we place its CSS properties there and we associate it with our
namespace by prefixing the namespace to the declaration. Our declaration looks like this:
<style>
<!--
@media screen{
fx\:EARTHQUAKE { behavior:url(earthquake.htc) }
}
-->
</style>
As you can see, the only part that is needed is the behavior property. It must point to the behavior file,
earthquake.htc. You can download the earthquake.htc file here. Once you have it just make sure it's on
your server and that the url is specified properly in your CSS.
All that's left then is to place the XML tags around the item you wish to trigger the earthquake behavior.
Earthquake will be triggered when someone runs their mouse over the item. The tagging is very simple and
looks like this:
<fx:EARTHQUAKE>Shake it, baby!</fx:EARTHQUAKE>
Now you've got it, everything you need to know to create your own earthquakes. So...uh....Shake it, baby!
Typewriter Behavior
Sure it owes its heritage to movies and computer gaming, but a typewriter effect can be quite eye-catching if used
properly. We'd bet you're reading this as it types. It's not for every Web site though, so use it sparingly.
This behavior can be set to type at whatever speed you need. The above example types at a speed of one
character every 100 milliseconds.
Typewriter is set up via XML so you'll need to create an appropriate namespace before you can use it.
We're doing it as XML so that older browsers aren't affected adversely. It also lets us define a brand new
tag. The namespace is set up in the <html> tag on your Web page. Here's the one we're using on this
page:
<html xmlns:fx>
The next step is to define the XML tag we'll be using. This is done in the specific media type. In this case the
behavior will apply to the screen, so we place its CSS properties there and we associate it with our
namespace by prefixing the namespace to the declaration. Our declaration looks like this:
<style>
<!--
@media screen{
fx\:TYPEWRITER { behavior:url(typewriter.htc);
height: 4em;
font-family: "ocr a extended", courier;
}
}
-->
</style>
The most important part of that is the behavior property. It's the only part really needed and it must point to
the behavior file, typewriter.htc. You can download the typewriter.htc file here. Once you have it just make
sure it's on your server and that the url is specified properly in your CSS.
All that's left then is to place the XML tags around the text you wish to have typed onto the page. That's
simple too:
<fx:TYPEWRITER speed="120">Type this text.</fx:TYPEWRITER>
Notice that we've set the speed to 120. If you don't set a speed the typing will appear with the default
setting of 100.
That's really about all you need to know to use it. Be aware that this behavior only runs once and only when the page
is first loaded. So if you use this, make sure it's someplace that your users will be able to see it.
Now start typing!
Footnote Behavior
If you've ever seen a Web document with footnotes you know what a problem it is to read a relevant
footnote and then scroll back up the document to find where you had stopped reading. This behavior
changes that. It will bring the footnotes to the user(1) without the need for them to scroll away from their
place in the page.
Let's face it too, footnotes can be ugly things tacked to the bottom of a page. By implementing a footnote
tag via a behavior and XML we can give a designer complete control over what the footnote is going to look
like when it appears for the user. Everything about the way the footnote looks can be adjusted via CSS.
Since FOOTNOTE is set up via XML, you'll need to create an appropriate namespace before you can
use it. We're doing it as XML so that older browsers aren't affected adversely. It also lets us define a brand
new tag. The namespace is set up in the <html> tag on your webpage. Here's the one we're using on this
page:
<html xmlns:pub>
The next step is to define the XML tag we'll be using. This is done in the specific media type. In this case the
behavior will apply to the screen, so we place its CSS properties there and we associate it with our
namespace by prefixing the namespace to the declaration. Our declaration looks like this:
<style>
<!--
@media screen {
pub\:FOOTNOTE {behavior:url(footnote.htc)}
.footstyle {width: 250;
position: absolute;
left:-1000;
color: black;
background-color: #9999cc;
text-align: justify;
border-color: #404040;
border-width: thin;
border-style: solid;
padding: 1em;
font-family: arial;
font-size: 10pt;
}
.closer {cursor: hand;
color: #ffff00;
text-align: right;
margin-top: 1em;
}
.fhilite {cursor: hand;
color: chocolate;
font-family: "Arial";
text-decoration: none;
}
}
-->
</style>
As you can see, FOOTNOTE only needs the behavior property. It must point to the behavior file,
footnote.htc. You can download the footnote.htc file here. Once you have it just make sure it's on your
server and that the url is specified properly in your CSS.
There are three CSS classes defined in the same declaration as well. These are all used by the footnote
behavior. The first is footstyle. This defines how the footnote will look when the user calls it. It should be
applied to the division holding the footnote and it's important that it have at least three properties:
width sets the display width in pixels of all footnotes.
left is used to hide the footnote until it is called.
position: absolute frees the footnote so that it can be positioned anywhere on the page.
The closer class describes how the word "close" will look in the displayed footnote block. This word is
added to the bottom right corner of footnotes so that there is an option to remove them from the page
display.
Lastly, the class fhilite describes how the footnote link will appear and adds a hand cursor for user
feedback.
You'll need to create individual divisions for each footnote to be displayed. Here's what one from this page
looks like:
<div id=foot1 class=footstyle>
<a name="footnote1"></a>
(1) A user used to be someone who was heavy into drugs.
Here a user simply refers to the person using a Web page.
In this case, you.
</div>
The id of the division is extremely important. It is via this id that the behavior manipulates the footnote. The
name can be anything you like as long as it is unique. You'll be using it in the FOOTNOTE tag to link the
action to the division. In this case we used the id of foot1. This would be referenced in the FOOTNOTE tag
as footName="foot1"(2).
Let's take a look now at how that last footnote was called:
<pub:FOOTNOTE footName="foot2">
<a href="#footnote2">(2)</a></pub:FOOTNOTE>
It's that simple. Notice we've placed it around working HTML which would scroll down to the footnote in
older browsers. The footnote behavior will erase that for IE5 and replace it with appropriate HTML to call
our enhanced footnotes, leaving just the text that is present within the tag.
You should note that footName is a required property. If you forget to include it you won't get an error
message. The enhanced footnote behavior will simply do nothing.
Ok, so consider yourself armed, er, footed. You should now know everything you need to apply footnotes to
your pages.
Ma$ni-) .eha!ior
!t(s become commonplace today to see websites that have lots of te-t crammed into a small area.
5ftentimes some of that te-t is in the tiniest possible font. ! can(t speak for everyone, but in the wee hours
of the morning it can be hard to read that te-t. 5ften !(ve wished for a way to magnify it without resi/ing the
fonts in my browser.
!t seems a natural that having an easy way to magnify )ust a portion a page would be ideal. By creating a
behavior for this and linking it to the page via XML it makes it possible for a magnify effect to be used
nearly anywhere yet have the pages still work seamlessly for older browsers.
0ee how easy it is to read magnified te-t by clicking the icon> 2fter you(ve opened this you can close it by
clicking the close icon on the bottom right.0ee how easy it is to read magnified te-t by clicking the icon> 2fter
you(ve opened this you can close it by clicking the close icon on the bottom right.
If you look to your right you'll see an area of small text. If you are using IE5 beta 2+ you'll also see an icon
of a magnifying glass. Older browsers won't show this icon since it was inserted into the page via the
magnify behavior. If you click the icon a text window will display a magnified version of the exact text that is
contained in the block along with an icon that will allow you to close it. Also, if there is any HTML formatting
in that text, such as a link, it will be applied to the magnified version as well.
This behavior was designed so that nearly all the control is in the hands of the designer. The only exception
is the names of the icons used to indicate magnify and close magnify. These must be set in the .HTC
file controlling this behavior. Everything else is done in the Web page itself using CSS and XML.
Since MAGNIFY is set up via XML, you'll need to create an appropriate namespace before you can use
it. We're doing it as XML so that older browsers aren't affected adversely. It also lets us define a brand new
tag. The namespace is set up in the <html> tag on your webpage. Here's the one we're using on this page:
<html xmlns:pub>
The next step is to define the XML and the tag properties for MAGNIFY. In doing this we also create a class
called "magstyle" that defines what the magnified text will look like. This is done in the specific media type.
In this case the behavior will apply to the screen, so we place its CSS properties there, and we associate it
with our namespace by prefixing the namespace to the declaration. Our declaration looks like this:
<style>
<!--
@media screen {
pub\:MAGNIFY {behavior:url(magnify.htc)}
.magstyle {color: black;
background-color: goldenrod;
border-color: black;
border-width: thin;
border-style: solid;
padding: 1em;
font-family: arial;
font-size: 16pt;
position: absolute;
left:-1000;
}
}
-->
</style>
The most important part of that is the behavior property. It must point to the behavior file, magnify.htc. You
can download the magnify.htc file here. Once you have it just make sure it's on your server and that the url
is specified properly in your CSS.
One thing to notice about the magstyle class is that it specifies a left position of -1000. This is so that the
HTML that the behavior creates will be hidden from the user by appearing far off to the left of the display
window. We're doing this in part because of a small display glitch in the version of IE used to create this
and also because it's always been my preferred way to hide content. It's just as easy to specify a new
position as it is to specify hidden/visible.
You'll also need to download the two icons used by this behavior. Right click on each one and then
select "save picture" to save magnify.gif and unmag.gif. This behavior looks for these icons in a
directory called images. You can change these icons to others by editing the magnify.htc file to point to
other images. You need one icon to represent the magnify option and one to indicate close magnify.
All that's left then is to place the XML tags around the text for which you wish to offer a magnified view. It's this simple:
<pub:MAGNIFY newId="1" width="400" align="left">The text that
you wish to be magnifiable.</pub:MAGNIFY>
I'm sure you noticed the properties we are passing to the magnify behavior. The first one, newId, is
required. While we could have added a complex random identification generation routine to the behavior,
we chose to keep it simple and ask the designer to assign a unique name to each magnifiable
section. Always be sure to assign a value to newId. This is needed to link the icon to the newly generated
HTML of the magnified text.
The other two properties are optional. width specifies how wide the
magnified area should be on the display. It defaults to 450 pixels if no width is specified. By making this a
specifiable property the designer is given control of how the text will fit the screen with each magnified
area.
The other property, align, specifies the alignment of the magnify icon. Only "left" and "right" are correct
values here. Any other value, or no specification at all, will cause left alignment to be used.
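As a rough sketch of how the pieces described above fit together, the page below combines the namespace, the style declaration, and two MAGNIFY tags. The newId values ("intro" and "details"), the width of 300, the title, and the sample text are made up for illustration. The sketch assumes magnify.htc sits in the same directory as the page and that the two icons are in the images directory, and it only has an effect in a browser that supports behaviors.

```html
<html xmlns:pub>
<head>
<title>Magnify example (hypothetical page)</title>
<style>
<!--
@media screen {
  /* Attach the behavior to the custom tag; the url must match where magnify.htc lives */
  pub\:MAGNIFY {behavior:url(magnify.htc)}
  /* Appearance of the magnified text; parked off-screen until the icon is clicked */
  .magstyle {color: black;
             background-color: goldenrod;
             border-color: black;
             border-width: thin;
             border-style: solid;
             padding: 1em;
             font-family: arial;
             font-size: 16pt;
             position: absolute;
             left: -1000;
  }
}
-->
</style>
</head>
<body>
<!-- All three properties given; each section needs its own unique newId (values are hypothetical) -->
<pub:MAGNIFY newId="intro" width="300" align="right">A first block of
small text.</pub:MAGNIFY>

<!-- Only the required newId given; width and align fall back to their defaults -->
<pub:MAGNIFY newId="details">A second block of small text.</pub:MAGNIFY>
</body>
</html>
```

Browsers that don't recognize the pub:MAGNIFY tag should simply show the enclosed text, which is the fallback the XML approach is meant to provide.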
By now you should be ready to apply magnification to your own Web pages. If you still feel a bit
uncomfortable trying this then view the source of this page to see how we've done it.
Now go forth and magnify!
PullQuote Behavior
If you've ever picked up a magazine then the odds are good that you've seen a pullquote. A pullquote is
where a bit of text from the body of an article or story is pulled from the text and highlighted in some way to
catch your eye. It's hoped that the quote will tease you enough to get you to read the story.
Up until now it's been a pain to do pullquotes in a Web
page. It always required working them into the HTML code and hand copying the text to be quoted. For
those reasons pullquotes have been a bit scarce on the Web.
By using a pullquote behavior it's now possible for anyone to put a pullquote into a Web page without
having to do complex layout tricks. It's as simple as putting a tag and some basic CSS into a Web page.
You set up PULLQUOTE via XML, so you'll need to create an appropriate namespace before you can use it. We're
doing it as XML so that older browsers aren't affected adversely. It also lets us define a brand new tag. The
namespace is set up in the <html> tag on your Web page. Here's the one we're using on this page:
<html xmlns:pub>
The next step is to define the XML tag we'll be using. This is done in the specific media type. In this case the
behavior will apply to the screen, so we place its CSS properties there, and we associate it with our
namespace by prefixing the namespace to the declaration. Our declaration looks like this:
<style>
<!--
@media screen {
pub\:PULLQUOTE {behavior:url(pullquote.htc)}
.pullstyle {width: 200;
color:black;
text-align: left;
border-color:#9966cc;
border-width:thin;
border-style:solid;
border-right: none;
border-left: none;
padding: 1em;
margin: 6pt;
font-family: arial;
font-style: italic;
font-size: 14pt;
}
}
-->
</style>
As you can see, PULLQUOTE only needs the behavior property. It must point to the behavior file,
pullquote.htc. You can download the pullquote.htc file here. Once you have it just make sure it's on your
server and that the url is specified properly in your CSS.
We also define a class called pullstyle. This is the CSS description of how a pullquote will look when
rendered on a Web page. The behavior will apply this style to the pullquote that it creates. You have
complete control over the appearance by changing the properties and values in pullstyle.
All that's left then is to place the XML tags around the text you wish to turn into a pullquote.
Here's how we marked the pullquote near the top of this page:
<pub:PULLQUOTE align="right" ps="pre">it's always
been a pain to do pullquotes in a Web page.</pub:PULLQUOTE>
The align property specifies whether the pullquote will align on the left or the right of the page. Its valid
values are, surprisingly enough, left or right. If you don't specify an alignment then the pullquote will align
on the left.
The second property is ps. This is our abbreviation for ellipsis. An
ellipsis is a series of three dots that can be used at the beginning of a quote, at the end, or on both ends.
You can see an ellipsis in the pullquote to your left. The acceptable values for ps are pre, post, or both.
All other values will be ignored. This is an optional property but it is very useful if you are only quoting part
of a sentence.
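To make the property combinations concrete, here is a minimal sketch of a page that uses the three accepted ps values. The quoted sentences and the title are invented for illustration, the placement of the tags inside ordinary paragraphs is an assumption about typical use, and the sketch assumes pullquote.htc sits in the same directory as the page.

```html
<html xmlns:pub>
<head>
<title>Pullquote example (hypothetical page)</title>
<style>
<!--
@media screen {
  /* Attach the behavior to the custom tag */
  pub\:PULLQUOTE {behavior:url(pullquote.htc)}
  /* Appearance of the generated pullquote */
  .pullstyle {width: 200;
              color: black;
              border-color: #9966cc;
              border-width: thin;
              border-style: solid;
              border-right: none;
              border-left: none;
              padding: 1em;
              font-style: italic;
              font-size: 14pt;
  }
}
-->
</style>
</head>
<body>
<!-- Quote starts mid-sentence: ellipsis requested before the text -->
<p>For years <pub:PULLQUOTE align="right" ps="pre">nobody bothered with
pullquotes.</pub:PULLQUOTE></p>

<!-- Quote stops mid-sentence: ellipsis requested after the text -->
<p><pub:PULLQUOTE align="left" ps="post">A behavior changes that</pub:PULLQUOTE>
for anyone willing to add one tag.</p>

<!-- Quote taken from the middle: ellipsis on both ends; align omitted, so it defaults to left -->
<p>In short, <pub:PULLQUOTE ps="both">the tag does the copying</pub:PULLQUOTE>
so you don't have to.</p>
</body>
</html>
```

Leaving ps off entirely, or giving it any other value, should produce a pullquote with no ellipsis at all.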
Finally, just a few thoughts on proper use.
A pullquote should contain text that will draw the reader in. It should only contain a small amount of
relevant text and not several sentences. You'll probably want to consider using it near the top of a page so
that it will be seen immediately by a prospective viewer. You also shouldn't make the style too different from
the rest of the page. It should fit in yet be immediately visible.
With those thoughts in mind, as well as your newfound knowledge of how to apply this behavior, it's time
for you to go out there and pull one over on someone. A quote, that is.