XML Namespaces

Namespaces are a simple and straightforward way to distinguish names used in XML documents, no matter
where they come from. However, the concepts are a bit abstract, and this specification has been causing some
mental indigestion among those who read it. The best way to understand namespaces, as with many other
things on the Web, is by example.

So let's set up a scenario: suppose XML.com wanted to start publishing reviews of XML books. We'd want to
mark the info up with XML, of course, but we'd also like to use HTML to help beautify the display. Here's a
tiny sample of what we might do:

<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
  <h:head><h:title>Book Review</h:title></h:head>
  <h:body>
    <xdc:bookreview>
      <xdc:title>XML: A Primer</xdc:title>
      <h:table>
        <h:tr align="center">
          <h:td>Author</h:td><h:td>Price</h:td>
          <h:td>Pages</h:td><h:td>Date</h:td></h:tr>
        <h:tr align="left">
          <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
          <h:td><xdc:price>31.98</xdc:price></h:td>
          <h:td><xdc:pages>352</xdc:pages></h:td>
          <h:td><xdc:date>1998/01</xdc:date></h:td>
        </h:tr>
      </h:table>
    </xdc:bookreview>
  </h:body>
</h:html>

In this example, the elements prefixed with xdc are associated with a namespace whose name is
http://www.xml.com/books, while those prefixed with h are associated with a namespace whose name is
http://www.w3.org/HTML/1998/html4.

The prefixes are linked to the full names using the attributes on the top element whose names begin with xmlns:.
The prefixes don't mean anything at all; they are just shorthand placeholders for the full names. Those full
names, you will have noticed, are URLs, i.e. Web addresses. We'll get back a bit further on to why that is and
what those addresses point to.

Why Namespaces?

But first, an obvious question: why do we need these things? They are there to help computer software do its
job. For example, suppose you're a programmer working for XML.com and you want to write a program to
look up the books at Amazon.com and make sure the prices are correct. Such lookups are quite easy, once you
know the author and the title. The problem, of course, is that this document has XML.com's book-review tags
and HTML tags all mixed up together, and you need to be sure that you're finding the book titles, not the
HTML page titles.

The way you do this is to write your software to process the contents of <title> tags, but only when they're
in the http://www.xml.com/books namespace. This is safe, because programmers who are not working for
XML.com are not likely to be using that namespace.
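
As a rough illustration of that idea, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It embeds a shortened copy of the sample document and collects only the title elements that belong to the http://www.xml.com/books namespace. The names DOC, BOOKS_NS, and book_titles are made up for this example and are not part of any XML.com software.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the book-review sample (table markup omitted).
DOC = """<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
  <h:head><h:title>Book Review</h:title></h:head>
  <h:body>
    <xdc:bookreview>
      <xdc:title>XML: A Primer</xdc:title>
    </xdc:bookreview>
  </h:body>
</h:html>"""

BOOKS_NS = "http://www.xml.com/books"


def book_titles(xml_text):
    """Return the text of every title element in the books namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree replaces each prefix with its full namespace name,
    # written as "{namespace-name}localname", so the prefix itself
    # (xdc here) never has to appear in the code.
    return [el.text for el in root.iter("{%s}title" % BOOKS_NS)]


titles = book_titles(DOC)
print(titles)  # the HTML page title "Book Review" is not included
```

Because the match is made on the full namespace name, the same function should keep working if the document author picks a different prefix than xdc.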

XPath

Once you have data in XML format, you will want to be able to navigate and search its nodes. You don't
want to have to walk through the whole XML document by hand just to find that there are two Employee
nodes. That would be tedious and error-prone. You want to apply an XPath query, which then gives you all
the matching nodes. To find all Employee nodes, you would run the following XPath query:

//Employee

If you use the table above, this query will return two nodes, each representing an employee. This makes
it very easy to find matching nodes and walk through the result set.
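
To make this concrete, here is a small Python sketch that runs the query with the standard library's xml.etree.ElementTree. The SAMPLE document below is a hypothetical stand-in for the sample data (the element values are invented for illustration). ElementTree implements only a subset of XPath and does not accept absolute paths on an element, so the query is written as ".//Employee", starting from the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, assumed for this example only.
SAMPLE = """<Employees>
  <Employee ID="1">
    <FirstName>Klaus</FirstName>
    <LastName>Example</LastName>
  </Employee>
  <Employee ID="2">
    <FirstName>Peter</FirstName>
    <LastName>Sample</LastName>
  </Employee>
</Employees>"""

root = ET.fromstring(SAMPLE)  # root is the Employees element

# ".//Employee" finds Employee elements at any depth below the root.
employees = root.findall(".//Employee")

# Walk through the result set.
for employee in employees:
    print(employee.get("ID"), employee.findtext("FirstName"))
```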

| Operator | Description |
| --- | --- |
| / (child operator) | Refers to the root of the XML document when used at the beginning of the XPath expression. The child operator is used to specify the next child to select. The expression "/Employees/Employee", for example, says: start at the root of the XML document, select the Employees node, and then select all the Employee child nodes within the Employees node. This will return the two Employee nodes in the sample XML document. |
| // (recursive descendant operator) | The recursive descendant operator includes all descendant nodes in the search. Using the operator at the beginning of the XPath expression means you start from the root of the XML document. The expression "//LastName" starts at the root and finds any LastName node. The expression "/Employees//LastName" selects the Employees node and then, within that node, finds any LastName node. It yields the same result, but searches in a different way. |
| * (wildcard operator) | The wildcard operator finds any node. The expression "/*" finds any node under the root, which in our case is Employees. The expression "/Employees/*" means find any node under the Employees node, which in our case results in the two Employee nodes. Now what is the difference between the "/Employees" and "/Employees/*" expressions? The first expression returns the Employees node, but the second expression finds any node under the Employees node, meaning it returns the two Employee nodes. The expression "//*" means to select any node including descendant nodes, so it will effectively list every single element in the complete XML document. |
| . (current context operator) | The current context operator refers to the current context. For example, you have written some code that selected the Employees node and then from there you run the expression "./Employee", which means it starts out from the currently selected Employees node and then selects the two Employee nodes. The expression "Employee" would yield the same result because it also starts out from the current context. Similarly, the expression ".//LastName" means start from the current context, the Employees node, and find any LastName node including any descendant nodes. |
| .. (parent operator) | The parent operator refers to the parent. For example, the expression "/Employees/Employee/.." returns the Employees node because you navigate down to the Employee nodes and then tell it to return their parent, which is the Employees node. |
| @ (attribute operator) | The attribute operator refers to an attribute instead of an element. The expression "/Employees//@ID" selects any ID attribute it finds under the Employees node. Now, keep in mind that the XPath query always returns the selected node. In the case of an attribute, the node below it is its value. So, the expression really returns two nodes, each with the value of each selected attribute. Furthermore, you can use the wildcard operator with attributes, so "/Employees//@*" means any attribute underneath the Employees node. |
| [ ] (filter operator) | You can apply a filter operator to filter the selected nodes. This works with attributes and with elements. The expression "/Employees/Employee[@ID=1]" returns any Employee node under the Employees node that has an ID attribute with the value one. You can also apply filters that just say that an attribute or element with that name needs to be present. For example, the expression "/Employees/Employee[WebAddress]" returns Employee nodes that have a WebAddress node as a child. The expression "/Employees/Employee[FirstName='Klaus']" returns the Employee node that has a FirstName node with the value Klaus. |
| text() function | The "text()" function refers to the text of the selected node or attribute. The expression "//Employee//text()" does not list all the descendant nodes of all Employee nodes but rather the value of each descendant node. The expression "//Employee/FirstName[text()='Klaus']" lists all FirstName nodes which have a value of Klaus. |
| [ ] (collection operator) | When your expression returns more than one node with the same name, you have a collection returned. The expression "//Employee" returns two Employee nodes, which is nothing more than a collection of Employee nodes. You can apply a collection operator and specify which item from the collection you want to select. Keep in mind that the index starts at one. The expression "//Employee[2]" returns the second Employee node. The order of the selected nodes is the same order as in the XML document. You can use the collection operator in any combination, such as "//Employee[1]/LastName", which selects the first Employee node and then from there the LastName node. |
| ( ) (group operator) | The collection operator can sometimes have some odd side effects. Assume you have two Employee nodes and each has two Job nodes. What does the expression "//Employee/Job[1]" return? It returns the first Job node for each selected Employee node. Using the group operator, however, allows you to apply explicit precedence to selections. The expression "(//Employee/Job)[4]" first selects all Job nodes for all Employee nodes and from that collection it returns the fourth node. The group operator can only be applied to the top-level expression; for example, "//Employees/(Employee/FirstName)" is invalid. |
| comment() function | Returns a comment node. The expression "//comment()" returns any comment node in the XML. The expression "/Employees/comment()" returns the comment nodes under the Employees node. |
| node() function | XML documents consist of elements, attributes, and their values, each being a node. So, in XPath expressions you can use a node() function instead of a node name or the text() function. It is a generic way to address a node. The expressions "//Employee/JobTitle/node()" and "//Employee/JobTitle/text()" return the same result, the value of both JobTitle nodes. But "//Employee//node()" will not just return the elements but also the values of each element, because both are nodes. |
| \| (union or set operator) | Returns the union of two or more location paths. The expression "//LastName \| //FirstName" returns all the LastName and FirstName nodes. It preserves the order of the elements as in the XML and does not return any duplicates. The two location paths "//Employee[@ID=1] \| //Employee[FirstName='Klaus']" return the same node, but the union of these two returns just the one unique node. |
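
Several of the operators in the table can be tried out with Python's standard xml.etree.ElementTree module, as in the sketch below. Two assumptions apply: the SAMPLE document is hypothetical data invented for this example, and ElementTree supports only part of XPath. In particular it needs relative paths (hence the leading "."), wants attribute values in quotes, and does not support the text(), comment(), and node() functions, the | operator, or selecting attribute nodes directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, assumed for this example only.
SAMPLE = """<Employees>
  <Employee ID="1">
    <FirstName>Klaus</FirstName>
    <LastName>Example</LastName>
    <WebAddress>http://www.example.com</WebAddress>
  </Employee>
  <Employee ID="2">
    <FirstName>Peter</FirstName>
    <LastName>Sample</LastName>
  </Employee>
</Employees>"""

root = ET.fromstring(SAMPLE)  # current context: the Employees element

# Child operator: Employee children of the current context.
children = root.findall("./Employee")

# Recursive descendant operator: LastName elements at any depth.
last_names = [node.text for node in root.findall(".//LastName")]

# Wildcard operator: every element directly under Employees.
any_child = root.findall("./*")

# Parent operator: step down to Employee, then back up to Employees.
parents = root.findall("./Employee/..")

# Filter operator: by attribute value, by child presence, by child text.
by_id = root.findall("./Employee[@ID='1']")
with_web = root.findall("./Employee[WebAddress]")
klaus = root.findall("./Employee[FirstName='Klaus']")

# Collection operator: the index starts at one.
second = root.find("./Employee[2]")

print(len(children), last_names, second.get("ID"))
```

For the parts of XPath that ElementTree leaves out, a full XPath implementation would be needed; which one to use depends on your platform.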

The table does not represent a complete list, but it lists the most basic operators and functions. As you
can see from the samples, this already enables you to build fairly complex XPath queries. Keep in mind
the order of precedence: the group operator comes first, followed by the filter operator, then the child
and recursive descendant operators, and then the rest.
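
The effect of precedence on "//Employee/Job[1]" versus "(//Employee/Job)[4]" can be sketched in Python. The standard xml.etree.ElementTree module does not support the group operator, so the grouped form is emulated here by selecting all Job elements first and then indexing the resulting list. The SAMPLE document and its job names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: two Employee nodes, each with two Job nodes.
SAMPLE = """<Employees>
  <Employee ID="1">
    <Job>Developer</Job>
    <Job>Reviewer</Job>
  </Employee>
  <Employee ID="2">
    <Job>Analyst</Job>
    <Job>Trainer</Job>
  </Employee>
</Employees>"""

root = ET.fromstring(SAMPLE)

# The filter binds to the Job step, so [1] is evaluated per Employee:
# the result is the first Job of each Employee.
first_per_employee = root.findall("./Employee/Job[1]")

# Emulation of "(//Employee/Job)[4]": collect every Job in document
# order, then take the fourth item (Python lists are zero-based).
all_jobs = root.findall("./Employee/Job")
fourth = all_jobs[3]

print([job.text for job in first_per_employee])
print(fourth.text)
```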
