XML Document Object Model

XML Parser
• All major browsers have a built-in XML parser to access and
manipulate XML.
• Before an XML document can be accessed, it must be loaded
into an XML DOM object.
• All modern browsers have a built-in XML parser that can
convert text into an XML DOM object.
• JavaScript: an object-oriented programming language
commonly used to create interactive effects within
web browsers.
The Document Object Model
• A programming interface for HTML and XML documents.
It defines the way a document can be accessed and
manipulated.
• Using a DOM, a programmer can create a document,
navigate its structure, and add, modify, or delete its
elements.
• The W3C DOM has been designed to be used with any
programming language.
• It provides a standard programming interface that can be
used in a wide variety of environments and applications.
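The create/navigate/add/modify/delete cycle can be sketched with Python's standard-library `xml.dom.minidom`, which implements the W3C DOM interfaces. Python is used here only because the browser's `DOMParser` is not available outside a browser. This is an illustrative sketch, not a reference implementation; the element names are borrowed from the bookstore example later in these slides.

```python
from xml.dom.minidom import getDOMImplementation

# Create: an empty document whose root element is <bookstore>.
impl = getDOMImplementation()
doc = impl.createDocument(None, "bookstore", None)
root = doc.documentElement

# Add: build <book><title>Everyday Italian</title></book> under the root.
book = doc.createElement("book")
title = doc.createElement("title")
title.appendChild(doc.createTextNode("Everyday Italian"))
book.appendChild(title)
root.appendChild(book)

# Navigate and modify: walk root -> book -> title -> text node, then change the text.
text_node = root.firstChild.firstChild.firstChild
text_node.nodeValue = "Learning XML"
snapshot = root.toxml()  # serialize the tree as it is now

# Delete: detach the <book> element from the tree.
root.removeChild(book)
```

After `removeChild`, the detached `book` node still exists in memory and could be re-inserted elsewhere.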
XML DOM and HTML DOM
• The XML DOM is a standard for how to
get, change, add, or delete XML elements.
It presents an XML document as a tree
structure.
• The HTML DOM defines a standard way
for accessing and manipulating HTML
documents. It presents an HTML document
as a tree structure.
DOM specification levels
• DOM Level 1 and Level 2 specifications
are W3C Recommendations.
• Level 1 allows navigation around an HTML
or XML document, and manipulation of the
content in that document.
• Level 2 extends Level 1 with a number of
features: XML Namespace support, filtered
views, ranges, events, etc.
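Here is a minimal sketch of Level 2 namespace support. It uses Python's `xml.dom.minidom`, which parses with namespace processing turned on. The namespace URI `http://example.com/books` and the `catalog` element are hypothetical and exist only for illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical namespace URI, bound to the prefix "b" in the document below.
BOOKS_NS = "http://example.com/books"

xml_text = (
    '<catalog xmlns:b="http://example.com/books">'
    '<b:book><b:title>Learning XML</b:title></b:book>'
    '</catalog>'
)
doc = parseString(xml_text)

# Level 2 lookup: match on (namespace URI, local name) instead of the raw tag name.
titles = doc.getElementsByTagNameNS(BOOKS_NS, "title")
first = titles[0]

# The plain Level 1 lookup compares the full qualified name "b:title",
# so searching for "title" alone finds nothing.
plain_matches = doc.getElementsByTagName("title")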
DOM implementation and DOM application
• A DOM implementation (also called a host
implementation) is the software that takes
the parsed XML or HTML document and makes it
available for processing through the DOM interfaces. A
browser contains a host implementation.
• A DOM application (also called a client application) is
the software that takes the document made
available by the implementation and does something
with it. A script that runs in a browser is an example
of an application.
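A Python sketch makes the implementation/application split concrete. `xml.dom.minidom` plays the host implementation. The hypothetical `count_elements` function plays the client application and uses only standard DOM calls. `hasFeature` is the DOM's own way to ask an implementation what it supports; minidom implements the XML Core but not the HTML DOM.

```python
from xml.dom.minidom import getDOMImplementation

# Host implementation: minidom exposes a DOMImplementation object.
impl = getDOMImplementation()

# The application can query the implementation before relying on it.
supports_core2 = impl.hasFeature("Core", "2.0")
supports_html = impl.hasFeature("HTML", "1.0")

def count_elements(doc):
    """Hypothetical client code: uses only standard DOM calls, no minidom internals."""
    return len(doc.getElementsByTagName("*"))

# The implementation creates the document; the application works on it.
doc = impl.createDocument(None, "bookstore", None)
doc.documentElement.appendChild(doc.createElement("book"))
element_count = count_elements(doc)  # <bookstore> and <book>
```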
What programming languages can I use with the DOM?
• It depends on which host implementation you
want to use.
• A browser might implement a JavaScript or VBScript
interface, so you can use those scripting languages
within the page itself to manipulate the page or change
the CSS style sheet.
• An editor might implement a Scheme or Java interface, so
you can write an executable in those languages that talks
to the editor to manipulate the page.
• The DOM is a set of interfaces, and different companies can
implement these interfaces in different ways. It is
unlikely that any one company will give you a choice of
C++ and Java and Scheme and Perl and Python and ...,
but interfaces in all these languages are possible,
since the DOM itself is language-neutral.
JavaScript (DOMParser, modern browsers)
var parser = new DOMParser();
var xmlDoc = parser.parseFromString(text, "text/xml"); // text is a string of XML markup

VBScript
set xmlDoc = CreateObject("Microsoft.XMLDOM")

ASP
set xmlDoc = Server.CreateObject("Microsoft.XMLDOM")
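For comparison, here is a Python counterpart of the loading snippets above, using the standard library's `xml.dom.minidom`. One call parses a string and another parses a file-like object. The file name `books.xml` in the comment is hypothetical; an in-memory buffer stands in for it so the sketch runs on its own.

```python
import io
from xml.dom import minidom

text = "<bookstore><book><title>Everyday Italian</title></book></bookstore>"

# Like DOMParser.parseFromString(text, "text/xml"): parse XML held in a string.
doc_from_string = minidom.parseString(text)

# Like loading a file: minidom.parse() accepts a filename or a binary file object.
# A real program might pass "books.xml" (hypothetical); BytesIO keeps this self-contained.
doc_from_file = minidom.parse(io.BytesIO(text.encode("utf-8")))
```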
W3C DOM specification
• It is divided into three parts:
– DOM Core: defines the basic set of interfaces and
objects for any structured document.
– XML DOM: specifies the standard set of objects and
interfaces for XML documents only.
– HTML DOM: specifies the standard set of objects and
interfaces for HTML documents only.
Objective of DOM
The DOM identifies:
• The interfaces and objects used to
represent, access, and manipulate
documents.
• The semantics of these objects and interfaces,
including both attributes and behavior.
• The collaborations and relationships among
these objects and interfaces.
Advantages of XML DOM
• XML DOM is language and platform independent.
• XML DOM is traversable: information in the XML DOM
is organized in a hierarchy, which allows the developer to
navigate around the hierarchy looking for specific
information.
• XML DOM is modifiable: it is dynamic, so the
developer can add, edit, move, or remove nodes at any
point in the tree.
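Here is a short sketch of both properties in Python's `xml.dom.minidom`. A recursive depth-first walk over `childNodes` shows that the tree is traversable. Moving a node with `insertBefore` shows that it is modifiable. The `walk` helper is illustrative and not part of any DOM API.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

doc = parseString("<bookstore><book category='WEB'><title>Learning XML</title>"
                  "<year>2003</year></book></bookstore>")

def walk(node, depth=0, out=None):
    """Illustrative helper: depth-first traversal recording element names, indented by depth."""
    if out is None:
        out = []
    if node.nodeType == Node.ELEMENT_NODE:
        out.append("  " * depth + node.nodeName)
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out

outline = walk(doc.documentElement)

# Modifiable: move <year> in front of <title>. insertBefore first detaches
# the node from its current position, then inserts it at the new one.
book = doc.getElementsByTagName("book")[0]
year = book.getElementsByTagName("year")[0]
book.insertBefore(year, book.firstChild)
order_after_move = [child.nodeName for child in book.childNodes]
```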
Disadvantages of XML DOM
• It consumes more memory, especially for large XML
documents, because the whole document tree is loaded into
memory and stays there for as long as the program keeps it.
• Because of this heavy memory use, it is generally slower than
SAX (Simple API for XML), which reads the document as a
stream instead of building a tree.
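Python's standard library ships both a DOM parser (`xml.dom.minidom`) and a SAX parser (`xml.sax`), so the two approaches can be compared directly. Both versions below count the `<book>` elements. The DOM version builds the whole tree first. The SAX handler only reacts to start-tag events and keeps no tree. `BookCounter` is a hypothetical class name for this example.

```python
import xml.sax
from xml.dom.minidom import parseString

text = b"<bookstore><book><title>A</title></book><book><title>B</title></book></bookstore>"

# DOM: the entire tree is built in memory, then queried.
dom_count = len(parseString(text).getElementsByTagName("book"))

# SAX: the parser calls the handler as it reads; only a counter is kept.
class BookCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "book":
            self.count += 1

handler = BookCounter()
xml.sax.parseString(text, handler)
sax_count = handler.count
```

The SAX version cannot move back to an earlier node or modify the document. That loss of flexibility is the trade-off for its lower memory use.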
The parser evaluates an XML document as a DOM structure by
traversing each node, as in this example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
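The following Python sketch uses `xml.dom.minidom` to load a trimmed copy of this document, keeping only `title` and `price` for each book. It then reads each book's category attribute, title, and price.

```python
from xml.dom.minidom import parseString

# Trimmed copy of the bookstore document: author and year omitted.
BOOKSTORE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book category="COOKING"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="CHILDREN"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="WEB"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

doc = parseString(BOOKSTORE)

books = []
for book in doc.getElementsByTagName("book"):
    title = book.getElementsByTagName("title")[0]
    price = book.getElementsByTagName("price")[0]
    books.append((
        book.getAttribute("category"),      # value of the category attribute
        title.firstChild.nodeValue,         # text node under <title>
        float(price.firstChild.nodeValue),  # text node under <price>
    ))

for category, name, cost in books:
    print(f"{category:<9} {name:<17} {cost:6.2f}")
```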
Node Tree
• From the above flowchart, we can infer:
• Each node object has at most one parent node object. The root
node occupies the position above all other nodes. Here it is bookstore.
• A parent node can have multiple nodes, called its child nodes.
Element nodes can also have associated nodes called
attribute nodes. In the above example, there are two attribute
nodes: category and lang. An attribute node is not actually a
child of its element node, but it is still associated with it.
• These child nodes can in turn have multiple child nodes. The
text within an element is held in a text node.
• Nodes that share the same parent are called siblings.
• The DOM identifies:
– the objects used to represent and manipulate the
document;
– the relationships among the objects and interfaces.
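These relationships can be checked directly on a parsed tree. The sketch below uses Python's `xml.dom.minidom` and a one-book excerpt of the bookstore document.

```python
from xml.dom.minidom import parseString

doc = parseString('<bookstore><book category="COOKING">'
                  '<title lang="en">Everyday Italian</title>'
                  '<author>Giada De Laurentiis</author></book></bookstore>')

root = doc.documentElement        # <bookstore>: the single root element
book = root.firstChild            # the only child of <bookstore>
title, author = book.childNodes   # two siblings sharing the parent <book>

# The root element's own parent is the Document node.
root_parent_is_document = root.parentNode is doc

# Attribute nodes are reached through the element, not through childNodes.
category = book.getAttributeNode("category")
attr_in_children = any(child is category for child in book.childNodes)

# The text inside <title> is a separate Text node, one level further down.
title_text = title.firstChild
```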
DOM Nodes
• All components of an XML document are represented by
different kinds of nodes.
• Document
• DocumentFragment
• DocumentType
• EntityReference
• Element
• Attr
• ProcessingInstruction
• Comment
• Text
• CDATASection
• Entity
• Notation
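At run time, each node kind above is identified by its numeric `nodeType`. The sketch below uses Python's `xml.dom.minidom`. The `NODE_TYPE_NAMES` mapping is a hypothetical helper that translates the standard constants on `xml.dom.Node` into the names listed here.

```python
from xml.dom import Node
from xml.dom.minidom import getDOMImplementation

# Hypothetical helper: standard nodeType codes -> the node kind names above.
NODE_TYPE_NAMES = {
    Node.ELEMENT_NODE: "Element",
    Node.ATTRIBUTE_NODE: "Attr",
    Node.TEXT_NODE: "Text",
    Node.CDATA_SECTION_NODE: "CDATASection",
    Node.ENTITY_REFERENCE_NODE: "EntityReference",
    Node.ENTITY_NODE: "Entity",
    Node.PROCESSING_INSTRUCTION_NODE: "ProcessingInstruction",
    Node.COMMENT_NODE: "Comment",
    Node.DOCUMENT_NODE: "Document",
    Node.DOCUMENT_TYPE_NODE: "DocumentType",
    Node.DOCUMENT_FRAGMENT_NODE: "DocumentFragment",
    Node.NOTATION_NODE: "Notation",
}

# Build a small document containing several node kinds.
doc = getDOMImplementation().createDocument(None, "note", None)
note = doc.documentElement
doc.insertBefore(doc.createComment(" sample "), note)  # comment before the root
note.setAttribute("id", "n1")
note.appendChild(doc.createTextNode("text"))
note.appendChild(doc.createCDATASection("x < y"))

top_level = [NODE_TYPE_NAMES[n.nodeType] for n in doc.childNodes]
inside_note = [NODE_TYPE_NAMES[n.nodeType] for n in note.childNodes]
```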
Node Types
• Document
– Represents entire XML document
– Only one Document node exists for each XML document.
– Possible children: Element (maximum of one),
ProcessingInstruction, Comment, DocumentType (maximum
of one)
• Element
– Represents the element in the XML document
– Possible children: Element, Text, Comment,
ProcessingInstruction, CDATASection, EntityReference
• Attr
– Represents an attribute of Element node
– Possible children: Text, EntityReference
• Text
– Represents the textual content of an Element type node
– Possible children: No children
• CDATASection
– Represents a CDATA section in the XML document. The content of a
CDATA section is not parsed for markup.
– Possible children: No children
• DocumentFragment
– The Document object can be heavyweight, as a large number of
methods and properties are defined for it.
– DocumentFragment is a lightweight, minimal Document
object that represents a portion of a document. It is useful for
assembling or moving a group of nodes without the overhead of a full Document.
– Possible children: Element, ProcessingInstruction, Comment,
Text, CDATASection, EntityReference
• DocumentType
– Provides interfaces to get information about the document,
including the list of entities defined for this document.

• ProcessingInstruction
– Represents a processing instruction, which is used in XML
to provide specific information about the document to the
processor.
– Possible children: No children
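Two of the rules above can be demonstrated with Python's `xml.dom.minidom`, used here as a stand-in for a browser DOM. First, a `DocumentFragment` collects nodes off-tree and hands them all over in a single `appendChild` call. Second, a Document refuses a second Element child. The book titles come from the bookstore example; everything else is illustrative.

```python
import xml.dom
from xml.dom.minidom import getDOMImplementation

doc = getDOMImplementation().createDocument(None, "bookstore", None)
root = doc.documentElement

# Build several <book> elements off-tree in a lightweight DocumentFragment.
frag = doc.createDocumentFragment()
for title in ("Everyday Italian", "Harry Potter", "Learning XML"):
    book = doc.createElement("book")
    book.appendChild(doc.createTextNode(title))
    frag.appendChild(book)

# Appending the fragment moves its children into the tree in one step;
# the fragment itself is left empty.
root.appendChild(frag)

# A Document may have at most one Element child.
try:
    doc.appendChild(doc.createElement("second-root"))
    second_root_allowed = True
except xml.dom.HierarchyRequestErr:
    second_root_allowed = False
```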
The Node Interface
• An XML parser can be used to load an XML
document into your computer's memory.
• The information can then be retrieved and manipulated by
accessing the Document Object Model (DOM).
• The DOM represents a tree view of the XML
document.
• The documentElement is the top level of the tree.
• This element has one or more childNodes that
represent the branches of the tree.
• The Node interface is used to read and write
(that is, access) the individual
elements in the XML node tree.
• The childNodes property of the
documentElement can be accessed with a
for/each construct to enumerate each
individual node.
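A for/each over `documentElement.childNodes` can be sketched in Python with `xml.dom.minidom`. When the XML is indented, the whitespace between elements becomes Text nodes in `childNodes`. The loop therefore checks `nodeType` before treating a child as an element.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

text = b"""<bookstore>
  <book category="COOKING"><title>Everyday Italian</title></book>
  <book category="WEB"><title>Learning XML</title></book>
</bookstore>"""

doc = parseString(text)
root = doc.documentElement

# childNodes also holds the whitespace-only Text nodes ("#text") between elements.
all_children = [child.nodeName for child in root.childNodes]

# for/each over childNodes, keeping only element nodes.
categories = []
for child in root.childNodes:
    if child.nodeType == Node.ELEMENT_NODE:
        categories.append(child.getAttribute("category"))
```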
XML DOM Parser: language-neutral programming model
• Supports JavaScript, VBScript, Perl, VB,
Java, C++ and more
• Supports W3C XML 1.0 and XML DOM
• Supports DTD and validation
Loading pure XML text into the parser
<html>
<body>
<p id="demo"></p>
<script >
var text, parser, xmlDoc;

text = "<bookstore><book>" +
"<title>Everyday Italian</title>" +
"<author>Giada De Laurentiis</author>" +
"<year>2005</year>" +
"</book></bookstore>";
parser = new DOMParser();
xmlDoc = parser.parseFromString(text,"text/xml");

document.getElementById("demo").innerHTML =
xmlDoc.getElementsByTagName("title")[0].childNodes[0].nodeValue;
</script>
</body>
</html>
Example explained
• xmlDoc = parser.parseFromString(text,"text/xml");
– xmlDoc: the XML DOM object created by the parser.
• xmlDoc.getElementsByTagName("title")[0].childNodes[0].nodeValue;
– getElementsByTagName("title")[0]: get the first
<title> element
– childNodes[0]: the first child of the <title> element
(the text node)
– nodeValue: the value of the node (the text itself)
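A Python counterpart using `xml.dom.minidom` shows why the `childNodes[0]` step is needed. The `<title>` element itself has no value (`None` in Python, `null` in JavaScript). The text lives in its child Text node.

```python
from xml.dom.minidom import parseString

doc = parseString("<bookstore><book><title>Everyday Italian</title></book></bookstore>")

title = doc.getElementsByTagName("title")[0]  # first <title> element
text_node = title.childNodes[0]               # its first child: the Text node

element_value = title.nodeValue   # elements carry no value of their own
text_value = text_node.nodeValue  # the text itself
```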
XML DOM Properties
These are some typical DOM properties:
• x.nodeName - the name of x
• x.nodeValue - the value of x
• x.parentNode - the parent node of x
• x.childNodes - the child nodes of x
• x.attributes - the attribute nodes of x
• x.firstChild - the first child of x
• x.lastChild - the last child of x
• x.nextSibling - the next sibling of x
• x.previousSibling - the previous sibling of x
• Note: In the list above, x is a node object.
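The sketch below reads each property listed above from one node. It is written in Python with `xml.dom.minidom`, whose node objects use the same property names. The `props` dictionary is just a convenient way to collect the values.

```python
from xml.dom.minidom import parseString

doc = parseString("<book><title>Everyday Italian</title>"
                  "<author>Giada De Laurentiis</author><year>2005</year></book>")
x = doc.getElementsByTagName("author")[0]  # x is a node object, as in the list above

props = {
    "nodeName": x.nodeName,                         # "author"
    "nodeValue": x.nodeValue,                       # None for an element
    "parentNode": x.parentNode.nodeName,            # "book"
    "childNodes": len(x.childNodes),                # 1 (the text node)
    "attributes": len(x.attributes),                # 0: <author> has no attributes
    "firstChild": x.firstChild.nodeValue,           # "Giada De Laurentiis"
    "lastChild": x.lastChild is x.firstChild,       # True: only one child
    "nextSibling": x.nextSibling.nodeName,          # "year"
    "previousSibling": x.previousSibling.nodeName,  # "title"
}
```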
XML DOM Methods
• x.getElementsByTagName(name) - get all descendant elements
of x with the specified tag name
• x.appendChild(node) - append a child node to x
• x.removeChild(node) - remove a child node from x
• Note: In the list above, x is a node object.
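Here are the three methods in action, sketched in Python with `xml.dom.minidom`, whose nodes provide methods with the same names. The new `Harry Potter` book mirrors the bookstore example; the rest is illustrative.

```python
from xml.dom.minidom import parseString

doc = parseString("<bookstore>"
                  "<book><title>Everyday Italian</title></book>"
                  "<book><title>Learning XML</title></book>"
                  "</bookstore>")
x = doc.documentElement  # x is a node object: the <bookstore> element

# getElementsByTagName(name): all descendant elements with that tag, in document order.
titles_before = [t.firstChild.nodeValue for t in x.getElementsByTagName("title")]

# appendChild(node): attach a new <book> as the last child of x.
new_book = doc.createElement("book")
new_title = doc.createElement("title")
new_title.appendChild(doc.createTextNode("Harry Potter"))
new_book.appendChild(new_title)
x.appendChild(new_book)

# removeChild(node): detach the first <book>; the removed node is returned.
removed = x.removeChild(x.getElementsByTagName("book")[0])

titles_after = [t.firstChild.nodeValue for t in x.getElementsByTagName("title")]
```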
Nodes structure
Usage of documentElement.childNodes and length
<!DOCTYPE html>
<html>
<body>
<p id="demo"></p>
<script>
var parser, xmlDoc, x, i;
var txt = "";
var text = "<book>" +
"<title>Everyday Italian</title>" +
"<author>Giada De Laurentiis</author>" +
"<year>2005</year>" +
"</book>";

parser = new DOMParser();
xmlDoc = parser.parseFromString(text, "text/xml");

// documentElement always represents the root node (<book> here)
x = xmlDoc.documentElement.childNodes;
for (i = 0; i < x.length; i++) {
    // Each child's name, then the value of its first child (the text node)
    txt += x[i].nodeName + ": " + x[i].childNodes[0].nodeValue + "<br>";
}
document.getElementById("demo").innerHTML = txt;
</script>
</body>
</html>
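Here is a Python port of the loop above using `xml.dom.minidom`, offered as a sketch. It adds one assumption: a hypothetical empty `<isbn/>` element. The empty element shows that `childNodes[0]` (here `firstChild`) is missing for empty elements. The JavaScript loop above would throw a TypeError on such an element, so this version guards against it.

```python
from xml.dom.minidom import parseString

text = ("<book>"
        "<title>Everyday Italian</title>"
        "<author>Giada De Laurentiis</author>"
        "<year>2005</year>"
        "<isbn/>"  # hypothetical empty element, added to exercise the guard below
        "</book>")

doc = parseString(text)
x = doc.documentElement.childNodes  # documentElement is the root: <book>

lines = []
for node in x:
    # An empty element has no first child, so read its text only if one exists.
    value = node.firstChild.nodeValue if node.firstChild is not None else ""
    lines.append(node.nodeName + ": " + value)

txt = "".join(line + "<br>" for line in lines)  # same "<br>"-separated output as the script
```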
