The primary objective of the DOM is to identify:
The interfaces and objects used to represent, access and manipulate
documents.
The semantics of these objects and interfaces, including both attributes and
behavior.
The collaboration and relationships among these objects and interfaces.
DOM tree consists of many types of nodes.
Each of these nodes represents a particular component of the XML document.
Nodes in a DOM tree have structural relationships among them.
The topmost node is the root node.
<?xml version="1.0"?>
<Company>
<Employee category="Technical">
<FirstName>Tanmay</FirstName>
<LastName>Patil</LastName>
<ContactNo>1234567890</ContactNo>
</Employee>
<Employee category="Non-Technical">
<FirstName>Taniya</FirstName>
<LastName>Mishra</LastName>
<ContactNo>1234667898</ContactNo>
</Employee>
</Company>
NODE Interface:
It is the primary datatype for the entire DOM. All the properties and methods of
the Node interface are inherited by the other kinds of node.
NODE Properties: some properties are read-only and some are read-write.
1. nodeType is read-only. It holds a positive integer that indicates the type
of the context node.
Node type              nodeType value  Constant defined             Meaning
Element                1               ELEMENT_NODE                 Element type node
Attr                   2               ATTRIBUTE_NODE               Attribute type node
Text                   3               TEXT_NODE                    Text type node
CDATASection           4               CDATA_SECTION_NODE           CDATASection type node
EntityReference        5               ENTITY_REFERENCE_NODE        EntityReference type node
Entity                 6               ENTITY_NODE                  Entity type node
ProcessingInstruction  7               PROCESSING_INSTRUCTION_NODE  ProcessingInstruction type node
Comment                8               COMMENT_NODE                 Comment type node
Document               9               DOCUMENT_NODE                Document type node
DocumentType           10              DOCUMENT_TYPE_NODE           DocumentType type node
DocumentFragment       11              DOCUMENT_FRAGMENT_NODE       DocumentFragment type node
Notation               12              NOTATION_NODE                Notation type node
Java provides getNodeType() method on the Node object to inspect this
property.
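As a rough illustration of getNodeType(), the sketch below parses a one-line XML string and maps a few nodeType constants to readable names. The class name NodeTypeDemo, the helper methods and the sample string are assumptions made for this example; they are not part of the original notes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class: names are illustrative only.
class NodeTypeDemo {
    // Maps a few common nodeType values to readable names.
    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:   return "element";
            case Node.ATTRIBUTE_NODE: return "attribute";
            case Node.TEXT_NODE:      return "text";
            case Node.COMMENT_NODE:   return "comment";
            case Node.DOCUMENT_NODE:  return "document";
            default:                  return "other (" + node.getNodeType() + ")";
        }
    }

    // Parses an XML string; checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<Employee category=\"Technical\">Tanmay</Employee>");
        Node root = doc.getDocumentElement();
        System.out.println(describe(doc));                           // the Document node
        System.out.println(describe(root));                          // the Employee element
        System.out.println(describe(root.getFirstChild()));          // its text child
        System.out.println(describe(root.getAttributes().item(0)));  // the category attribute
    }
}
```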
Text Node:
Properties:
1. isElementContentWhitespace returns a Boolean value indicating
whether this text node contains only element content whitespace
(often called ignorable whitespace).
2. wholeText returns the text of this node and the text of all other
logically-adjacent text nodes, concatenated in document order.
Methods:
1. replaceWholeText replaces the text of this node and the text of all logically-
adjacent text nodes with the specified text.
2. splitText splits this node into two nodes at the specified offset, and
returns the new node that contains the text after the offset.
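The Text members listed above can be tried out with a small sketch like the following. It assumes the JDK's built-in DOM implementation; the class name TextNodeDemo, its helper methods and the sample element are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical demo class: names are illustrative only.
class TextNodeDemo {
    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Returns: the two halves after splitText, the wholeText, and the
    // element's text content after replaceWholeText.
    static String[] run() {
        Element root = parseRoot("<Name>TanmayPatil</Name>");
        Text first = (Text) root.getFirstChild();
        Text second = first.splitText(6);          // first keeps the text before offset 6
        String afterSplit = first.getData() + "|" + second.getData();
        String whole = first.getWholeText();       // text of both adjacent Text nodes
        first.replaceWholeText("Taniya");          // replaces both adjacent Text nodes
        return new String[] { afterSplit, whole, root.getTextContent() };
    }

    public static void main(String[] args) {
        for (String s : run()) {
            System.out.println(s);
        }
    }
}
```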
Attr Node:
Properties:
1. isId indicates whether the attribute is an ID attribute
2. name returns the name of the attribute
3. ownerElement element node to which this attribute is attached
4. schemaTypeInfo type information for this attribute
5. specified indicates whether a value for this attribute was specified
explicitly
6. value returns the value of the attribute. If the value contains any entity
references, they are first substituted with their values.
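In Java these Attr properties are exposed through getter methods. The sketch below reads several of them from a one-element document; the class name AttrDemo and the sample XML are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: names are illustrative only.
class AttrDemo {
    // Builds a one-line summary of the "category" attribute.
    static String describe() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<Employee category=\"Technical\"/>")));
            Element employee = doc.getDocumentElement();
            Attr category = employee.getAttributeNode("category");
            return category.getName() + "=" + category.getValue()
                    + " owner=" + category.getOwnerElement().getTagName()
                    + " specified=" + category.getSpecified()   // true: written in the document
                    + " isId=" + category.isId();                // false: no DTD or schema declares it an ID
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```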
XML Processor (or) XML Parser:
XML file -> XML parser -> data used by the application
An XML parser reads the XML file into an object.
An XML parser converts an XML document into an XML DOM object.
There are two approaches to parsing an XML document.
1. The DOM approach, called a DOM Parser
All elements are accessed through the DOM tree, i.e., it is a tree-based
API.
It parses (reads) the entire XML document into memory, where it can be
accessed in a number of ways, including tree traversals as well as
random access.
The XML DOM contains methods to traverse XML trees and to access, insert and
delete nodes.
Before an XML document can be accessed and manipulated, it must
be loaded into an XML DOM object.
2. The SAX approach, called a SAX Parser
It was developed by members of the XML-DEV mailing list and is widely
supported by XML processors.
It uses "event processing" to process the XML document, i.e., it is an
event-based API.
A SAX parser scans the XML document from beginning to end.
Every time a syntactic structure of the document is recognized, the
processor signals an event to the application by calling an event handler
for the particular structure that was found.
Syntactic structures include opening tags, attributes, text and closing tags.
JAVA and DOM:
Java implements the W3C DOM specification as a separate package,
org.w3c.dom.
Java provides interfaces and objects together with methods and properties
according to the DOM specification that can be used to navigate and manipulate
the DOM tree.
Following are the steps used while parsing a document using DOM Parser.
Creating a Document:
Import XML-related packages.
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.*;
Create a DocumentBuilder
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
The parser parses an XML document, checks that it is well-formed, and creates and
returns the Document object that represents the entire XML document.
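Put together, the steps above might look like the following sketch. It parses the Company document from an in-memory string through an InputSource, which is an assumption made to keep the example self-contained; a file name or File object can be passed to parse() in the same way. The class name DomParseDemo and the constant XML are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: names are illustrative only.
class DomParseDemo {
    // The sample document from the notes, written without whitespace between elements.
    static final String XML =
            "<?xml version=\"1.0\"?>"
            + "<Company>"
            + "<Employee category=\"Technical\"><FirstName>Tanmay</FirstName></Employee>"
            + "<Employee category=\"Non-Technical\"><FirstName>Taniya</FirstName></Employee>"
            + "</Company>";

    // Creates a DocumentBuilder and parses the given XML text into a Document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder parser = factory.newDocumentBuilder();
            return parser.parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            // A document that is not well-formed ends up here as a SAXException.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Document document = parse(XML);
        Element root = document.getDocumentElement();
        System.out.println(root.getTagName());
        System.out.println(root.getElementsByTagName("Employee").getLength());
    }
}
```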
If the content of the root element is text, then its first child is a Text
type node.
Text tn = (Text) root.getFirstChild();
// Print the text content and the "id" attribute of an element e.
System.out.println(e.getFirstChild().getNodeValue());
System.out.println(e.getAttribute("id"));
// Print the text content of every node in the NodeList children.
for (int i = 0; i < children.getLength(); i++)
{
    Node node = children.item(i);
    System.out.println(node.getFirstChild().getNodeValue());
}
// List every attribute of a node as name = value.
NamedNodeMap attributes = node.getAttributes();
for (int j = 0; j < attributes.getLength(); j++)
{
    Node attribute = attributes.item(j);
    String attName = attribute.getNodeName();
    String attValue = attribute.getNodeValue();
    System.out.println(attName + " = " + attValue);
}
// Read a single attribute by name.
String value = ((Element) node).getAttribute("attributename");
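The fragments above can be combined into one runnable sketch. It walks the Employee elements of a Company document and collects each first name together with the element's attributes. The class name EmployeeLister and the method list are hypothetical, and the sketch assumes every Employee has a FirstName child containing text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: names are illustrative only.
class EmployeeLister {
    // Returns one line per Employee: first name followed by its attributes.
    static List<String> list(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList employees = doc.getElementsByTagName("Employee");
            for (int i = 0; i < employees.getLength(); i++) {
                Element e = (Element) employees.item(i);
                // Text of the first FirstName child (assumed to exist).
                Node firstName = e.getElementsByTagName("FirstName").item(0);
                StringBuilder line = new StringBuilder(firstName.getFirstChild().getNodeValue());
                NamedNodeMap attributes = e.getAttributes();
                for (int j = 0; j < attributes.getLength(); j++) {
                    Node attribute = attributes.item(j);
                    line.append(' ').append(attribute.getNodeName())
                        .append('=').append(attribute.getNodeValue());
                }
                lines.add(line.toString());
            }
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<Company>"
                + "<Employee category=\"Technical\"><FirstName>Tanmay</FirstName></Employee>"
                + "<Employee category=\"Non-Technical\"><FirstName>Taniya</FirstName></Employee>"
                + "</Company>";
        list(xml).forEach(System.out::println);
    }
}
```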
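import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
// Write a DOM tree back out as XML text.
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
// Assumes document is the Document object returned by the parser.
DOMSource source = new DOMSource(document);
// System.out is used here as one possible output target.
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);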
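A self-contained version of this serialization step could look like the sketch below. It builds a tiny document in memory and writes it to a string rather than to a file or the console; the class name DomSerializeDemo, the helper methods and the choice to omit the XML declaration are assumptions made for this example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: names are illustrative only.
class DomSerializeDemo {
    // Builds <Company><Employee>Tanmay</Employee></Company> in memory.
    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element company = doc.createElement("Company");
            doc.appendChild(company);
            Element employee = doc.createElement("Employee");
            employee.setTextContent("Tanmay");
            company.appendChild(employee);
            return doc;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Serializes a DOM tree to XML text using an identity Transformer.
    static String toXml(Document document) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Leave out the <?xml ...?> declaration so only the elements are written.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(build()));
    }
}
```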
Inserting a node
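o To insert a node at any other position, use
Node insertBefore(Node newNode, Node referenceNode)
which places newNode immediately before referenceNode among the children of
the node on which it is called.
Refer to the program insertBook.java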
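The program insertBook.java is not reproduced in these notes. As a stand-in, here is a minimal sketch of insertBefore() on a small in-memory tree; the class name InsertNodeDemo, the Employee elements and the id attribute are assumptions made for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class: names are illustrative only.
class InsertNodeDemo {
    // Inserts employee 1 in front of employee 2 and returns the ids in child order.
    static String childOrder() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element company = doc.createElement("Company");
            doc.appendChild(company);

            Element second = doc.createElement("Employee");
            second.setAttribute("id", "2");
            company.appendChild(second);              // appended at the end

            Element first = doc.createElement("Employee");
            first.setAttribute("id", "1");
            company.insertBefore(first, second);      // placed before the reference node

            StringBuilder ids = new StringBuilder();
            for (Node n = company.getFirstChild(); n != null; n = n.getNextSibling()) {
                ids.append(((Element) n).getAttribute("id"));
            }
            return ids.toString();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(childOrder());
    }
}
```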
Deleting a node
o To delete a node from the DOM tree, call the following method on its parent;
it returns the node that was removed.
Node removeChild(Node node)
Refer to the program removeBook.java
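The program removeBook.java is not reproduced in these notes. A minimal sketch of removeChild() on an in-memory tree might look like this; the class name RemoveNodeDemo and the element names are assumptions made for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class: names are illustrative only.
class RemoveNodeDemo {
    // Removes the first of two Employee children and reports what is left.
    static String run() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element company = doc.createElement("Company");
            doc.appendChild(company);
            for (String id : new String[] { "1", "2" }) {
                Element employee = doc.createElement("Employee");
                employee.setAttribute("id", id);
                company.appendChild(employee);
            }
            // removeChild is called on the parent and returns the detached node.
            Node removed = company.removeChild(company.getFirstChild());
            return "removed=" + ((Element) removed).getAttribute("id")
                    + " remaining=" + company.getChildNodes().getLength()
                    + " parentOfRemoved=" + removed.getParentNode();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```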
Cloning a node
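o A copy of a node is created using the cloneNode() method on
the Node interface.
Node cloneNode(boolean deep)
If deep is true, the whole subtree under the node is copied as well; if it is
false, only the node itself (with its attributes, for an element) is copied.
The copy has no parent until it is inserted into the tree.
Refer to the program cloneBook.java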
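The program cloneBook.java is not reproduced in these notes. The sketch below contrasts a shallow and a deep clone of one element; the class name CloneNodeDemo and the sample data are assumptions made for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: names are illustrative only.
class CloneNodeDemo {
    // Clones an Employee element both ways and summarizes the differences.
    static String run() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element employee = doc.createElement("Employee");
            employee.setAttribute("category", "Technical");
            Element firstName = doc.createElement("FirstName");
            firstName.setTextContent("Tanmay");
            employee.appendChild(firstName);
            doc.appendChild(employee);

            Element shallow = (Element) employee.cloneNode(false); // element and attributes only
            Element deep = (Element) employee.cloneNode(true);     // element plus whole subtree

            return "shallowChildren=" + shallow.hasChildNodes()
                    + " shallowCategory=" + shallow.getAttribute("category")
                    + " deepText=" + deep.getTextContent()
                    + " deepHasParent=" + (deep.getParentNode() != null);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```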
Advantages
XML DOM is language and platform independent.
XML DOM is traversable - Information in XML DOM is organized in a hierarchy, which allows
the developer to navigate around the hierarchy looking for specific information.
XML DOM is modifiable - It is dynamic in nature providing developer a scope to add, edit,
move or remove nodes at any point on the tree.
Disadvantages
It consumes more memory (especially if the XML structure is large), because the whole
document tree, once loaded, remains in memory until it is explicitly released.
Because of this larger memory usage, it is generally slower than SAX.
SAX Parser: Simple API for XML
A SAX parser is an event-based parser for XML documents. Unlike a DOM parser, a SAX parser
creates no parse tree. SAX is a streaming interface for XML, which means that applications
using SAX receive event notifications about the XML document being processed, one element
and one attribute at a time, in sequential order, starting at the top of the document and
ending with the closing of the root element.
Reads an XML document from top to bottom, recognizing the tokens that make up a well-
formed XML document
Tokens are processed in the same order that they appear in the document
Reports to the application program the nature of the tokens that the parser has
encountered, as they occur
The application program provides an "event" handler that must be registered with the
parser
As the tokens are identified, callback methods in the handler are invoked with the relevant
information
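The callback flow described above can be observed with a handler that simply records each event in the order it arrives. This is only a sketch: the class name SaxEventTrace and the event labels are invented for illustration, and a parser is free to deliver the text of one element in more than one characters() call.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: names are illustrative only.
class SaxEventTrace {
    // Parses the XML text and returns the events in the order they were reported.
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { events.add("startDocument"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                events.add("start:" + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                events.add("text:" + new String(ch, start, length));
            }
            @Override public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
            @Override public void endDocument() { events.add("endDocument"); }
        };
        try {
            // The handler is registered by passing it to parse().
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
        return events;
    }

    public static void main(String[] args) {
        trace("<Company><FirstName>Tanmay</FirstName></Company>").forEach(System.out::println);
    }
}
```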
When to use?
You should use a SAX parser when:
You can process the XML document in a linear fashion from the top down
The document is not deeply nested
You are processing a very large XML document whose DOM tree would consume too
much memory. As a rough rule of thumb, a typical DOM implementation may need
around ten bytes of memory to represent one byte of XML
The problem to be solved involves only part of the XML document
Data is available as soon as it is seen by the parser, so SAX works well for an XML
document that arrives over a stream
Disadvantages of SAX
We have no random access to an XML document since it is processed in a forward-only
manner
If you need to keep track of data the parser has seen or change the order of items, you
must write the code and store the data on your own
ContentHandler Interface
This interface specifies the callback methods that the SAX parser uses to notify an application
program of the components of the XML document that it has seen.
void characters(char[] ch, int start, int length) - Called when character data is
encountered.
void ignorableWhitespace(char[] ch, int start, int length) - Called when a DTD is present
and ignorable whitespace is encountered.
Attributes Interface
This interface specifies methods for processing the attributes connected to an element.
int getLength() - Returns the number of attributes.
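A handler usually receives an Attributes object in its startElement() callback. The sketch below loops over it with getLength() and also uses getQName(int) and getValue(int), two other methods of the same interface that the notes do not list. The class name SaxAttributesDemo and the output format are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: names are illustrative only.
class SaxAttributesDemo {
    // Returns one "element@attribute=value" entry per attribute seen.
    static List<String> attributesOf(String xml) {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // getLength() gives the number of attributes on this element.
                for (int i = 0; i < atts.getLength(); i++) {
                    found.add(qName + "@" + atts.getQName(i) + "=" + atts.getValue(i));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<Company><Employee category=\"Technical\"/>"
                + "<Employee category=\"Non-Technical\"/></Company>";
        attributesOf(xml).forEach(System.out::println);
    }
}
```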
Following are the steps used while parsing a document using SAX Parser.
Creating an instance of the SAX parser:
Import XML-related packages.
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;
import java.io.*;
Create a SAXParser
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
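Once the SAXParser exists, the next step is typically to pass it the input and a handler. The sketch below collects the text of every FirstName element, buffering characters() calls because the parser may report one piece of text in several chunks. The class name SaxFirstNames is hypothetical, and parsing from an in-memory string is an assumption made to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: names are illustrative only.
class SaxFirstNames {
    // Returns the text content of each FirstName element, in document order.
    static List<String> firstNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inFirstName;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("FirstName".equals(qName)) {
                    inFirstName = true;
                    text.setLength(0);            // start a fresh buffer
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inFirstName) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("FirstName".equals(qName)) {
                    names.add(text.toString());
                    inFirstName = false;
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<Company>"
                + "<Employee><FirstName>Tanmay</FirstName></Employee>"
                + "<Employee><FirstName>Taniya</FirstName></Employee>"
                + "</Company>";
        System.out.println(firstNames(xml));
    }
}
```

Because nothing is kept except the values the handler chooses to store, memory use stays small even for a large input, which is the trade-off against DOM described earlier.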