
DOCUMENT OBJECT MODEL (DOM)

DOM is a language-neutral and platform-independent object model used to
represent XML documents.
It helps scripts and programs to access, add, delete and edit content, structure
and style of XML documents dynamically.
It is standardized by the W3C.
The primary objective of this standard was to model an HTML document in an
object-oriented way so that it could be exposed to scripts and scripts could
access and manipulate HTML documents through this object model dynamically.
To provide a precise specification of the DOM interfaces independent of
languages, W3C chose to use the Interface Definition Language (IDL) defined by
the Object Management Group (OMG) in the CORBA 2.2 specification.
OMG IDL is widely used to specify interfaces in a language-independent and
implementation-neutral way.
The Document Object Model (DOM) is an application programming interface
(API) for HTML and XML documents. It defines the logical structure of
documents and the way a document is accessed and manipulated.
W3C DOM specification is divided into 3 major parts:
1. Core DOM defines the basic set of interfaces and objects for any
structured documents
2. HTML DOMdefines the interfaces and objects for HTML documents
3. XML DOM specifies the standard set of objects and interfaces for XML
documents only
DOM Model:
DOM models a document as a hierarchical (tree) structure consisting of different
kinds of nodes such as nodes with children and leaf nodes.
Each of these nodes represents a specific portion of the document.
DOM is an object-oriented model that encompasses not only the document
structure but also the behavior of the document.
That is, each node of the document is not merely a data structure; it is an
object that has identity (properties) and activity (methods).
The primary objective of DOM is to identify:
• Interfaces and objects to be used to represent, access and manipulate
documents.
• Semantics of these objects and interfaces, including both attributes and
behavior.
• Collaboration and relationships among these objects and interfaces.
A DOM tree consists of many types of nodes.
Each of these nodes represents a particular component of the XML document.
Nodes in a DOM tree have structural relationships among them.
The topmost node is the root node.
<?xml version="1.0"?>
<Company>
<Employee category="Technical">
<FirstName>Tanmay</FirstName>
<LastName>Patil</LastName>
<ContactNo>1234567890</ContactNo>
</Employee>
<Employee category="Non-Technical">
<FirstName>Taniya</FirstName>
<LastName>Mishra</LastName>
<ContactNo>1234667898</ContactNo>
</Employee>
</Company>

[Figure: DOM tree of the Company document; each Employee element has a category attribute]

Every DOM tree has exactly one root node.
Except for the root node, every node has exactly one parent node.
Except for leaf nodes, a node may have any number of children.
Nodes having common parent are called siblings.
The fundamental datatype in the DOM is the Node interface.
All kinds of nodes implement Node interface and inherit all the properties and
methods of the Node interface.
A specific set of methods and properties are available on each node type.
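The structural rules above (one root, one parent per node, siblings sharing a parent) can be seen by walking a parsed tree. The following is a minimal sketch, assuming the JAXP parser bundled with the JDK. The class name DomTreeOutline and the shortened, whitespace-free copy of the Company document are illustrative choices, not part of the original notes. Whitespace between tags is left out so that no whitespace-only Text nodes appear in the outline.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DomTreeOutline {
    // Shortened copy of the Company example, written without whitespace between tags.
    static final String COMPANY_XML =
        "<Company>"
        + "<Employee category=\"Technical\"><FirstName>Tanmay</FirstName></Employee>"
        + "<Employee category=\"Non-Technical\"><FirstName>Taniya</FirstName></Employee>"
        + "</Company>";

    // Parses XML text into a DOM Document; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Depth-first walk: one line per node, indented by its depth in the tree.
    static void outline(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName());
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(" \"").append(node.getNodeValue()).append('"');
        }
        out.append('\n');
        // Children are visited through the firstChild / nextSibling relationships.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            outline(child, depth + 1, out);
        }
    }

    static String outline(String xml) {
        StringBuilder out = new StringBuilder();
        outline(parse(xml).getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(outline(COMPANY_XML));
    }
}
```

Attributes do not show up in the outline because Attr nodes are not children of their element; they are reached through the attributes property instead.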
DOM NODES: The most common types of nodes in XML are:
Document Node: It represents an entire XML document structure. Only one
Document node exists for each XML document. It is a container for all
components of an XML document, such as the XML declaration, elements,
attributes, comments, entities and so on.
Element Node: It represents an element in the XML document. This is also the
only type of node that can have attributes; they are obtained from the
attributes property inherited from the Node interface.
Attr Node: It represents an attribute of an Element node. Attr nodes contain
information about an element node, but are not actually considered to be
children of the element.
Text Node: It represents the textual content of an Element node. The text of
the document is held in Text nodes. A Text node may contain meaningful text
or just white space.
DocumentFragment Node: It represents the root of any sub-tree in the
document structure.
Document object can be heavy weight as a large number of methods and
properties have been defined for it.
DocumentFragment is a “lightweight” or “minimal” Document object that
represents a portion of a document.
DocumentFragment behaves like a context-free container of zero or more DOM
nodes.
If a DocumentFragment node is inserted or appended to a DOM tree, the
DocumentFragment object itself disappears and its content is inserted or
appended to the context position.
DocumentType Node: It provides interfaces to get information about the
document, including the list of entities defined for this document.
All the properties on this node are read-only.
EntityReference Node: A node of this type represents an entity reference in the
document.
ProcessingInstruction Node: This interface represents a "processing instruction",
which is used in XML to provide specific information about the document to the
processor.
Comment Node: It represents a comment in an XML document.
<!-- comment content -->
CDATASection Node: It represents a CDATA section in the XML document.
Entity Node: It represents either an unparsed or a parsed entity.
W3C DOM Level 3 does not allow the editing of Entity nodes. If users want to
change the content of an Entity, every related EntityReference node has to be
replaced by a clone of the Entity's contents, and the desired changes made to
those clones.
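The DocumentFragment behavior described above (the fragment itself disappears and only its content is inserted) can be checked with a short experiment. This is a sketch under the assumption that the JDK's default DOM implementation is used; the class name FragmentDemo and the element names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;

final class FragmentDemo {
    // Appends two <Employee> elements to <Company> through a fragment.
    // Returns { number of children of the root, number of children left in the fragment }.
    static int[] appendViaFragment() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("Company");
        doc.appendChild(root);

        // The fragment is a temporary, context-free container for the new nodes.
        DocumentFragment fragment = doc.createDocumentFragment();
        fragment.appendChild(doc.createElement("Employee"));
        fragment.appendChild(doc.createElement("Employee"));

        // Appending the fragment moves its children into the tree, not the fragment itself.
        root.appendChild(fragment);

        return new int[] {
            root.getChildNodes().getLength(),
            fragment.getChildNodes().getLength()
        };
    }

    public static void main(String[] args) {
        int[] counts = appendViaFragment();
        System.out.println("root children: " + counts[0] + ", fragment children: " + counts[1]);
    }
}
```

After the append, the root holds both Employee elements and the fragment is empty, so the same fragment object can be refilled and reused.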

NODE Interface:
It is the primary datatype for the entire DOM. All the properties and methods of
the Node interface are inherited by every other kind of node.
NODE Properties: some properties are read-only and some are read-write.
1. nodeType: read-only. It holds a positive integer that indicates the type
of the context node.

Node                    nodeType  Constant defined              Meaning
Element                 1         ELEMENT_NODE                  Element type node
Attr                    2         ATTRIBUTE_NODE                Attribute type node
Text                    3         TEXT_NODE                     Text type node
CDATASection            4         CDATA_SECTION_NODE            CDATASection type node
EntityReference         5         ENTITY_REFERENCE_NODE         Entity reference type node
Entity                  6         ENTITY_NODE                   Entity type node
ProcessingInstruction   7         PROCESSING_INSTRUCTION_NODE   Processing instruction type node
Comment                 8         COMMENT_NODE                  Comment type node
Document                9         DOCUMENT_NODE                 Document type node
DocumentType            10        DOCUMENT_TYPE_NODE            DocumentType type node
DocumentFragment        11        DOCUMENT_FRAGMENT_NODE        DocumentFragment type node
Notation                12        NOTATION_NODE                 Notation type node

Java provides getNodeType() method on the Node object to inspect this
property.

2. nodeName: read-only. It holds the name of the node.

Node                    Value of nodeName
Element                 Name of the element
Attr                    Name of the attribute
Text                    "#text"
CDATASection            "#cdata-section"
EntityReference         Name of the entity referenced
Entity                  Name of the entity
ProcessingInstruction   Target of the PI
Comment                 "#comment"
Document                "#document"
DocumentType            Name of the DTD
DocumentFragment        "#document-fragment"
Notation                Name of the notation

Java's Node interface provides the getNodeName() method to inspect this
property.
3. nodeValue: read-write. It holds the value of the node.

Node                    Value of nodeValue
Element                 null
Attr                    Value of the attribute
Text                    Content of the text node
CDATASection            Content of the CDATA section
EntityReference         null
Entity                  null
ProcessingInstruction   Content of the PI
Comment                 Content of the comment
Document                null
DocumentType            null
DocumentFragment        null
Notation                null

Java's Node interface provides the getNodeValue() method to inspect this
property.

4. childNodes: read-only. It contains all child nodes of the context node. For
a node that has no children, the list is empty.
5. firstChild: refers to the first child of the context node. Its value is null if
there is no such node.
6. lastChild: refers to the last child of the context node. Its value is null if no
such node exists.
7. nextSibling: returns the node immediately following this node. It returns
null if no such node exists.
8. previousSibling: returns the node immediately preceding this node. It
returns null if no such node exists.
9. attributes: an unordered collection (NamedNodeMap) containing all attributes
of the context node if it is an Element, or null otherwise. Individual
attribute nodes may be accessed by name.
10. parentNode: returns the parent node of the context node. Document,
DocumentFragment, Attr, Notation, and Entity nodes do not have a parent, so
the value of this property is null for them.
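The Node properties listed above map onto getter methods in Java (getNodeType(), getNodeName(), getNodeValue(), getFirstChild(), getNextSibling() and so on). The sketch below prints the three basic properties for several kinds of node and uses the sibling links to walk the children of an element. It assumes the JDK's built-in parser; the class name NodePropertiesDemo and the small Employee document are made up for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NodePropertiesDemo {
    static final String XML =
        "<Employee category=\"Technical\"><!--staff-->"
        + "<FirstName>Tanmay</FirstName><LastName>Patil</LastName></Employee>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Formats a node as "nodeType nodeName nodeValue".
    static String describe(Node n) {
        return n.getNodeType() + " " + n.getNodeName() + " " + n.getNodeValue();
    }

    static List<String> describeAll() {
        Document doc = parse(XML);
        Element employee = doc.getDocumentElement();
        List<String> lines = new ArrayList<>();
        lines.add(describe(doc));                                                // Document node
        lines.add(describe(employee));                                           // Element node
        lines.add(describe(employee.getAttributes().getNamedItem("category"))); // Attr node
        // Walk the children of <Employee> using firstChild and nextSibling.
        for (Node c = employee.getFirstChild(); c != null; c = c.getNextSibling()) {
            lines.add(describe(c));
        }
        // The Text node inside the last child, <LastName>.
        lines.add(describe(employee.getLastChild().getFirstChild()));
        return lines;
    }

    public static void main(String[] args) {
        describeAll().forEach(System.out::println);
    }
}
```

The printed lines follow the three tables: element and document nodes report null as their value, while the attribute, comment and text nodes carry their content in nodeValue.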
Document Node:
Properties:
1. documentElement: this property of the Document node refers to the
root node (document element) of the document.
Example (Java): Element root = doc.getDocumentElement();
2. doctype: represents the Document Type Declaration of this
document. Its value is null if there is no DTD.
3. documentURI: represents the location of the document. It is null if
undefined or if the document was created dynamically.
4. domConfig: represents the document configuration and maintains a
table of recognized parameters.
5. inputEncoding: returns a string indicating the encoding scheme that
was used during the parsing of this document.
6. strictErrorChecking: indicates whether error checking is enforced or
not.
7. xmlEncoding: the encoding specified in the XML declaration of the XML
document.
8. xmlStandalone: indicates whether the document can exist
independently or requires other resources.
9. xmlVersion: the version specified in the XML declaration of the XML
document.
Methods: The Document interface provides methods to create nodes. Each
node has the attribute ownerDocument, which refers to the context
document within which it was created.
1. createAttribute: creates and returns an Attr node with the
specified name.
The setAttribute method, available on Element nodes, sets an attribute
on an element.
2. createAttributeNS: creates and returns an Attr node with the
specified qualified name and namespace URI.
The setAttributeNS method, available on Element nodes, sets such an
attribute on an element.
3. createCDATASection: creates and returns a CDATASection node
containing the specified string.
4. createComment: creates and returns a Comment node with the
specified comment string.
5. createDocumentFragment: creates and returns an empty
DocumentFragment node.
6. createElement: creates and returns an Element node with the
specified element name.
7. createElementNS: creates and returns an Element node with the
specified qualified name and namespace URI.
8. createEntityReference: creates an EntityReference node with the
specified name.
9. createProcessingInstruction: creates a ProcessingInstruction node
with the specified target and data string.
10. createTextNode: creates and returns a Text node with the
specified text content.
11. getElementById: returns the Element node that has the specified ID
attribute value.
12. getElementsByTagName: returns a list of Element nodes with the
specified element name.
13. getElementsByTagNameNS: returns a list of Element nodes with
the specified local name and namespace URI.
14. importNode: imports a copy of a node from another document into this
document. Document and DocumentType nodes cannot be imported.
15. renameNode: renames an existing Element or Attr node. It takes 3
arguments:
the node to rename, the new namespace URI, and the new qualified name.
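Several of the Document methods listed above can be combined to build a small document from scratch and then copy it into another document. This is a minimal sketch, assuming the DOM Level 3 implementation bundled with the JDK; the class name DocumentFactoryDemo and the new element name "Organisation" are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class DocumentFactoryDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <Company><!--generated--><Employee category="Technical">
    //          <FirstName>Tanmay</FirstName></Employee></Company>
    static Document build() {
        Document doc = newDocument();
        Element company = doc.createElement("Company");
        doc.appendChild(company);
        company.appendChild(doc.createComment("generated"));

        Element employee = doc.createElement("Employee");
        Attr category = doc.createAttribute("category");
        category.setValue("Technical");
        employee.setAttributeNode(category);

        Element first = doc.createElement("FirstName");
        first.appendChild(doc.createTextNode("Tanmay"));
        employee.appendChild(first);
        company.appendChild(employee);
        return doc;
    }

    // Copies the root of 'source' into a new document and renames the copy.
    static Document importAndRename(Document source) {
        Document target = newDocument();
        // importNode returns a copy owned by 'target'; 'source' is left unchanged.
        Node copy = target.importNode(source.getDocumentElement(), true);
        target.appendChild(copy);
        // Arguments: the node, the new namespace URI (none here), the new qualified name.
        target.renameNode(copy, null, "Organisation");
        return target;
    }

    public static void main(String[] args) {
        Document renamed = importAndRename(build());
        System.out.println(renamed.getDocumentElement().getNodeName());
    }
}
```

Every node created by a createXxx method belongs to the document that created it, which is why a node from another document must go through importNode before it can be appended.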

Element Node: provides properties and methods used to get information
about the elements of an XML document.
Properties:
tagName: a read-only property that is valid only for Element nodes. Its
value is the tag name of the element.
Methods:
1. getAttribute: returns the value of the attribute with the specified
name
2. getAttributeNS: returns the value of the attribute with the specified
local name and namespace URI
3. getAttributeNode: returns the attribute node with the specified
name
4. getAttributeNodeNS: returns the attribute node with the specified
local name and namespace URI
5. getElementsByTagName: returns a list of all descendant elements
with the specified tag name
6. getElementsByTagNameNS: returns a list of all descendant elements
with the specified local name and namespace URI
7. hasAttribute: returns a Boolean value that specifies whether the
context element has an attribute with the given name
8. hasAttributeNS: returns a Boolean value that specifies whether the
context element has an attribute with the given local name and
namespace URI
9. removeAttribute: removes the attribute with the specified name
10. removeAttributeNS: removes the attribute with the specified local name
and namespace URI
11. removeAttributeNode: removes the specified attribute node
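12. setAttribute: adds an attribute with the specified name and value, or
changes the value if the attribute already exists
13. setAttributeNS: same as setAttribute, using a namespace URI and a
qualified name
14. setAttributeNode: adds the specified attribute node to the element
15. setIdAttribute: declares whether the attribute with the specified name
is an ID attribute
16. setIdAttributeNS: same as setIdAttribute, using a namespace URI and a
local name
17. setIdAttributeNode: declares whether the specified attribute node is an
ID attribute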
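The attribute methods of Element listed above can be exercised on a single element. The following sketch assumes the JDK's default DOM implementation; the class name ElementAttributesDemo and the attribute values are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class ElementAttributesDemo {
    // Runs a few attribute operations on <Employee> and records each observation.
    static List<String> run() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element employee = doc.createElement("Employee");
        List<String> log = new ArrayList<>();

        // Before the attribute exists: hasAttribute is false, getAttribute gives "".
        log.add("has=" + employee.hasAttribute("category"));
        log.add("value='" + employee.getAttribute("category") + "'");

        // setAttribute creates the attribute, or overwrites an existing one.
        employee.setAttribute("category", "Technical");
        employee.setAttribute("category", "Non-Technical");
        log.add("value='" + employee.getAttribute("category") + "'");

        // getAttributeNode returns the Attr object rather than just its value.
        log.add("node=" + employee.getAttributeNode("category").getName());

        employee.removeAttribute("category");
        log.add("has=" + employee.hasAttribute("category"));
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Because getAttribute returns an empty string for a missing attribute, hasAttribute is the reliable way to tell "absent" apart from "present but empty".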

Text Node:
Properties:
1. isElementContentWhitespace: returns a Boolean value indicating
whether this text node contains only whitespace in element content
(often called "ignorable whitespace").
2. wholeText: returns the text of this node and the text of all other
logically-adjacent text nodes, concatenated in document order.
Methods:
1. replaceWholeText: replaces the text of this node and the text of all
logically-adjacent text nodes with the specified text.
2. splitText: splits this node into two nodes at the specified offset, and
returns the new node that contains the text after the offset.
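The Text methods above are easiest to follow on a concrete string. The sketch below splits one text node in two, reads the combined text back through wholeText, and then replaces both pieces at once. It assumes the DOM Level 3 implementation shipped with the JDK; the class name TextNodeDemo, the element name and the strings are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

final class TextNodeDemo {
    // Returns { head after split, tail after split, wholeText, element text after replace }.
    static String[] run() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element title = doc.createElement("title");
        doc.appendChild(title);
        Text text = doc.createTextNode("Web Technologies");
        title.appendChild(text);

        // Split at offset 3: 'text' keeps "Web", the returned node holds the rest.
        Text tail = text.splitText(3);
        String head = text.getData();
        String rest = tail.getData();

        // wholeText joins this node with its logically-adjacent text siblings.
        String whole = text.getWholeText();

        // replaceWholeText overwrites this node and removes the adjacent text nodes.
        text.replaceWholeText("XML Basics");
        return new String[] { head, rest, whole, title.getTextContent() };
    }

    public static void main(String[] args) {
        for (String s : run()) {
            System.out.println("[" + s + "]");
        }
    }
}
```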
Attr Node:
Properties:
1. isId: indicates whether the attribute is an ID attribute
2. name: returns the name of the attribute
3. ownerElement: the element node to which this attribute is attached
4. schemaTypeInfo: type information for this attribute
5. specified: indicates whether a value for this attribute was specified
explicitly
6. value: returns the value of the attribute. If the value contains any entity
references, they are first substituted with their values.
XML Processor (or) XML Parser:
XML file -> XML parser -> data used by the application
An XML parser reads the XML file and hands its content to the application.
A DOM parser converts an XML document into an XML DOM object.
There are two approaches to parse an XML document.
1. DOM approach, called the DOM Parser
• All elements are accessed through the DOM tree, i.e., it is a tree-based
API.
• It parses (reads) the entire XML document into memory, where it can be
accessed in a number of ways, including tree traversals as well as
random access.
• The XML DOM contains methods to traverse XML trees and to access, insert and
delete nodes.
• Before an XML document can be accessed and manipulated, it must
be loaded into an XML DOM object.
2. SAX approach, called the SAX Parser
• It was developed by the XML-DEV users group and is widely supported by
XML processors.
• It uses "event processing" to process the XML document, i.e., it is an
event-based API.
• The SAX parser scans the XML document from beginning to end.
• Every time a syntactic structure of the document is recognized, the
processor signals an event to the application by calling an event handler
for the particular structure that was found.
• Syntactic structures include opening tags, attributes, text and closing tags.
JAVA and DOM:
Java implements the W3C DOM specification as a separate package,
org.w3c.dom.
Java provides interfaces and objects together with methods and properties
according to the DOM specification that can be used to navigate and manipulate
the DOM tree.
The following steps are used to parse a document with the DOM parser.
Creating a Document:
• Import the XML-related packages.
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.*;

• Create a DocumentBuilder.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
The parser parses an XML document, checks that it is well-formed, and creates
and returns the Document object that represents the entire XML document.

• Create a Document from a file or stream by using one of the following
overloaded methods:
Document parse(InputStream in)
Document parse(InputStream in, String systemId)
Document parse(String uri)
Document parse(File xmlFile)
Example:
Document doc = parser.parse("books.xml");
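The well-formedness check mentioned above surfaces in Java as a SAXException thrown by parse(). The sketch below uses the parse(InputStream) overload on in-memory text so that no file is needed. The class name ParseDemo and the method rootNameOrNull are hypothetical helpers, and returning null for malformed input is a choice made for this illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class ParseDemo {
    // Returns the name of the root element, or null if the text is not well-formed XML.
    static String rootNameOrNull(String xml) {
        try {
            DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr first.
            parser.setErrorHandler(new DefaultHandler());
            Document doc = parser.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getNodeName();
        } catch (SAXException e) {
            // Raised when the document is not well-formed, e.g. an unclosed tag.
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootNameOrNull("<books><book/></books>"));
        System.out.println(rootNameOrNull("<books><book></books>"));
    }
}
```

No Document is produced for the second input: a DOM parser either returns a complete tree or fails, it never returns a partial one.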

Navigating DOM Tree:
• Extract the root element, i.e., start from the root node and use structural
relationships to reach the other nodes.
Element root = doc.getDocumentElement();
root.getNodeName();   // returns the name of the root node
root.getNodeValue();  // returns the value of the root node; null for an element
• Get and examine the content of an element, i.e., its text or child elements.
o Use the methods of the root node to get the first child, the last child and
all child nodes.
Node n = root.getFirstChild();

If the content of the root element is text, then this node is a Text
node:
Text tn = (Text) root.getFirstChild();

Similarly for the last child, using getLastChild().

To get all child nodes:

NodeList children = root.getChildNodes();  // all child nodes of root
for (int i = 0; i < children.getLength(); i++)
{
    Node node = children.item(i);
    // Handle element children only; text and comment nodes are skipped.
    if (node.getNodeType() == Node.ELEMENT_NODE)
    {
        // Assumes the first child of the element is its text content.
        System.out.println(node.getFirstChild().getNodeValue());
    }
}

o getElementById() to access a particular node. An attribute is treated as an
ID only if the DTD or schema declares it as one, or setIdAttribute() has been
called; otherwise this method returns null.
Element e = doc.getElementById("b1");
System.out.println(e.getFirstChild().getNodeValue());
System.out.println(e.getAttribute("id"));

o getElementsByTagName() to access all element nodes with the specified
tag name.
NodeList children = doc.getElementsByTagName("book");
for (int i = 0; i < children.getLength(); i++)
{
    Node node = children.item(i);
    System.out.println(node.getFirstChild().getNodeValue());
}

• Get the attributes of an element
NamedNodeMap getAttributes()

NamedNodeMap attributes = node.getAttributes();
for (int j = 0; j < attributes.getLength(); j++)
{
    Node attribute = attributes.item(j);  // the j-th attribute node
    String attName = attribute.getNodeName();
    String attValue = attribute.getNodeValue();
    System.out.println(attName + " = " + attValue);
}

Note: Java also provides getAttribute() on Element to read one attribute by name:
String value = ((Element) node).getAttribute("attributename");

Viewing DOM Tree:


A DOM tree may be transformed back into an XML document, which can be
displayed on the screen or stored in a file.
This helps to visualize and verify the DOM tree after adding and deleting nodes.
• Create a Transformer object and a DOMSource object.
• The purpose of the Transformer object is to transform the specified XML
document to the specified stream.
• To do this, import the following packages:

import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
DOMSource source = new DOMSource(root);               // the sub-tree to write
StreamResult result = new StreamResult(System.out);   // where to write it
transformer.transform(source, result);

Refer to the program readXML_viewDOM.java; compile and run it from the command
prompt.
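The same Transformer steps can write to a String instead of System.out, which is convenient when the result has to be inspected or compared. This is a sketch assuming the JDK's default TransformerFactory; the class name SerializeDemo and the sample book element are illustrative, and it is not the readXML_viewDOM.java program referred to above.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class SerializeDemo {
    // Writes the sub-tree rooted at 'node' as XML text.
    static String serialize(Node node) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Leave out the <?xml ...?> declaration so only the content is returned.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize DOM tree", e);
        }
    }

    // Builds a one-element document and returns its textual form.
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element book = doc.createElement("book");
            book.setAttribute("id", "b1");
            book.appendChild(doc.createTextNode("Web Technologies"));
            doc.appendChild(book);
            return serialize(doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A Transformer created without a stylesheet performs an identity transformation, so the output is simply the tree written back out as markup.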
Manipulating DOM Tree:
• Creating a node
o Create a text node
createTextNode() on the Document object creates a Text node.
Text createTextNode(String text)
Ex:
Text txt = doc.createTextNode("Web Technologies");
o Create an element node
createElement() on the Document object creates an Element node.
Element createElement(String elementName)
Ex:
Element e = doc.createElement("book");
o Set an attribute name and value if necessary, then add the text node to the
element.
• Setting an attribute
void setAttribute(String attributeName, String attributeValue)
Ex:
e.setAttribute("Category", "computers");
• Adding a node
appendChild() is used to add a node at the end of the list of child nodes.
Ex:
e.appendChild(txt);     // adds a text node to an element
root.appendChild(e);    // adds an element node to the root element
Refer to the program appendBook.java.

• Inserting a node
o To insert a node at any other position, use
Node insertBefore(Node newNode, Node referenceNode)
Refer to the program insertBook.java.
• Deleting a node
o To delete a node from the DOM tree, use
Node removeChild(Node node)
Refer to the program removeBook.java.
• Cloning a node
o A copy of a node is created using the cloneNode() method of
the Node interface. Passing true also copies the whole sub-tree below the
node; passing false copies only the node itself.
Node cloneNode(boolean deep)
Refer to the program cloneBook.java.
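The four operations above (append, insert, delete, clone) can be followed step by step on a small list of books. The sketch below is a self-contained illustration and is not the appendBook.java, insertBook.java, removeBook.java or cloneBook.java programs referred to in the notes; the class name ManipulateDemo and the book titles are made up.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class ManipulateDemo {
    // Creates <book>title</book> owned by 'doc' (not yet attached to the tree).
    static Element book(Document doc, String title) {
        Element e = doc.createElement("book");
        e.appendChild(doc.createTextNode(title));
        return e;
    }

    // Lists the text of each child of 'root', in document order.
    static String titles(Element root) {
        StringBuilder sb = new StringBuilder();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(n.getTextContent());
        }
        return sb.toString();
    }

    // Applies each operation in turn and records the child list after every step.
    static List<String> run() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("books");
        doc.appendChild(root);
        List<String> steps = new ArrayList<>();

        Element xml = book(doc, "XML");
        root.appendChild(xml);                        // append at the end
        root.appendChild(book(doc, "Java"));
        steps.add(titles(root));

        root.insertBefore(book(doc, "HTML"), xml);    // insert in front of "XML"
        steps.add(titles(root));

        root.removeChild(xml);                        // delete "XML"
        steps.add(titles(root));

        Node copy = root.getFirstChild().cloneNode(true);  // deep copy of "HTML"
        root.appendChild(copy);                       // the clone starts out detached
        steps.add(titles(root));
        return steps;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

A cloned node has no parent until it is explicitly attached, which is why the copy is passed to appendChild() afterwards.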

Advantages
• XML DOM is language and platform independent.

• XML DOM is traversable: information in the XML DOM is organized in a hierarchy, which allows
the developer to navigate around the hierarchy looking for specific information.

• XML DOM is modifiable: it is dynamic in nature, allowing the developer to add, edit,
move or remove nodes at any point in the tree.

Disadvantages
• It consumes more memory (if the XML structure is large), because the whole tree, once built,
remains in memory until it is explicitly released.

• Because of its larger memory use, it is generally slower than SAX.
SAX Parser: Simple API for XML

A SAX parser is an event-based parser for XML documents. Unlike a DOM parser, a SAX parser
creates no parse tree. SAX is a streaming interface for XML, which means that applications
using SAX receive event notifications about the XML document being processed, one element
or attribute at a time, in sequential order starting at the top of the document and ending with
the closing of the root element.

• It reads an XML document from top to bottom, recognizing the tokens that make up a
well-formed XML document.
• Tokens are processed in the same order in which they appear in the document.
• It reports to the application program the nature of the tokens that the parser has
encountered, as they occur.
• The application program provides an "event" handler that must be registered with the
parser.
• As the tokens are identified, callback methods in the handler are invoked with the relevant
information.
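The sequence of callbacks described above can be made visible by recording every event the parser reports. The following is a minimal sketch, assuming the SAX parser bundled with the JDK; the class name SaxEventLog and the log format are hypothetical. Note that a parser is allowed to deliver the text of one element in several characters() calls, although short text usually arrives in one piece.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEventLog {
    // Parses 'xml' and returns the events in the order the parser reported them.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                log.add("startDocument");
            }

            @Override
            public void endDocument() {
                log.add("endDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes atts) {
                log.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Only the slice [start, start + length) belongs to this event.
                log.add("text " + new String(ch, start, length));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return log;
    }

    public static void main(String[] args) {
        events("<Employee><FirstName>Tanmay</FirstName></Employee>")
                .forEach(System.out::println);
    }
}
```

Nothing is kept by the parser itself: once a callback returns, the only record of that part of the document is whatever the handler chose to store.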

When to use?
You should use a SAX parser when:

• You can process the XML document in a linear fashion from the top down.
• The document is not deeply nested.
• You are processing a very large XML document whose DOM tree would consume too
much memory. As a rough rule of thumb, a typical DOM implementation uses about ten
bytes of memory to represent one byte of XML.
• The problem to be solved involves only part of the XML document.
• Data is available as soon as it is seen by the parser, so SAX works well for an XML
document that arrives over a stream.

Disadvantages of SAX
• There is no random access to the XML document, since it is processed in a forward-only
manner.
• If you need to keep track of data the parser has seen, or change the order of items, you
must write the code and store the data on your own.

ContentHandler Interface
This interface specifies the callback methods that the SAX parser uses to notify an application
program of the components of the XML document that it has seen.

• void startDocument() - Called at the beginning of a document.

• void endDocument() - Called at the end of a document.

• void startElement(String uri, String localName, String qName, Attributes atts) - Called
at the beginning of an element.

• void endElement(String uri, String localName, String qName) - Called at the end of an
element.

• void characters(char[] ch, int start, int length) - Called when character data is
encountered.

• void ignorableWhitespace(char[] ch, int start, int length) - Called when a DTD is present
and ignorable whitespace is encountered.

• void processingInstruction(String target, String data) - Called when a processing
instruction is recognized.

• void setDocumentLocator(Locator locator) - Provides a Locator that can be used to
identify positions in the document.

• void skippedEntity(String name) - Called when an unresolved entity is encountered.

• void startPrefixMapping(String prefix, String uri) - Called when a new namespace
mapping is defined.

• void endPrefixMapping(String prefix) - Called when a namespace definition ends its
scope.

Attributes Interface

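This interface specifies methods for processing the attributes connected to an element.
• int getLength() - Returns the number of attributes.

• String getQName(int index) - Returns the qualified name of the attribute at the given index.

• String getValue(int index) - Returns the value of the attribute at the given index.

• String getValue(String qname) - Returns the value of the attribute with the given qualified
name, or null if there is no such attribute.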

The following steps are used to parse a document with the SAX parser.
Creating an instance of the SAX parser:
• Import the XML-related packages.
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;
import java.io.*;
• Create a SAXParser.
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();

• Define a handler class that extends DefaultHandler, and create an
instance of the handler class.

class UserHandler extends DefaultHandler
{
    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException
    { … }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException
    { … }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException
    { … }
}

UserHandler userhandler = new UserHandler();

• Pass the document and the handler to the parser.
saxParser.parse("test.xml", userhandler);
Refer to the programs SAXDemo.java and SAXQueryDemo.java.
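The steps above can be put together in a small query that keeps only the data it needs while the document streams past. The sketch below collects the first names of employees in a given category from the Company document used earlier. It is not the SAXDemo.java or SAXQueryDemo.java program mentioned in the notes; the class names SaxFirstNameQuery and FirstNameHandler are hypothetical, and the JDK's built-in SAX parser is assumed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxFirstNameQuery {
    // Remembers just enough state to pick out <FirstName> inside matching <Employee> elements.
    static final class FirstNameHandler extends DefaultHandler {
        private final String wantedCategory;
        private final List<String> names = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inWantedEmployee;
        private boolean inFirstName;

        FirstNameHandler(String wantedCategory) {
            this.wantedCategory = wantedCategory;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) {
            if (qName.equals("Employee")) {
                // Attributes are only available here, in the start-element event.
                inWantedEmployee = wantedCategory.equals(attributes.getValue("category"));
            } else if (qName.equals("FirstName") && inWantedEmployee) {
                inFirstName = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so it is accumulated.
            if (inFirstName) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("FirstName") && inFirstName) {
                names.add(text.toString());
                inFirstName = false;
            } else if (qName.equals("Employee")) {
                inWantedEmployee = false;
            }
        }
    }

    static List<String> firstNames(String xml, String category) {
        FirstNameHandler handler = new FirstNameHandler(category);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return handler.names;
    }

    public static void main(String[] args) {
        String xml = "<Company>"
            + "<Employee category=\"Technical\"><FirstName>Tanmay</FirstName></Employee>"
            + "<Employee category=\"Non-Technical\"><FirstName>Taniya</FirstName></Employee>"
            + "</Company>";
        System.out.println(firstNames(xml, "Technical"));
    }
}
```

The boolean flags play the role that the tree plays in DOM: since SAX gives no way to look back at a parent element, the handler has to remember for itself where it currently is in the document.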
