
Java and XML

Introduction
 Any application that uses an XML document must parse its
information in order to access the contents of the document.
 Parsers designed for XML documents are available and are
accessed through an Application Program Interface (API).
 The parser API consists of Java classes that provide easy
access to the elements of an XML document.
 An XML document can be created, read, and manipulated
using Java Application Program Interfaces that are especially
designed for use with an XML document.
Overview of JAXP
 JAXP : Java API for XML Processing
 Provides a common interface for creating and using the
standard SAX, DOM, and XSLT APIs in Java.
 All JAXP packages have been included as standard since JDK 1.4.
The key packages are:
 javax.xml.parsers: The main JAXP APIs, which provide a
common interface for various SAX and DOM parsers.
 org.w3c.dom: Defines the Document interface (a DOM), as well as
interfaces for all of the components of a DOM.
Contd..
 org.xml.sax: Defines the basic SAX APIs.
 javax.xml.transform: Defines the XSLT APIs that let
you transform XML into other forms.
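To show how the four packages fit together, here is a minimal sketch that parses XML text into a DOM and writes it back out with an identity transform. The class name PackagesTour and the method roundTrip are hypothetical names chosen for illustration; the XML text is made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PackagesTour {
    // Hypothetical helper: parse XML text into a DOM, then serialize it again.
    static String roundTrip(String xml) {
        try {
            // javax.xml.parsers: obtain a DOM parser
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // org.xml.sax supplies InputSource; org.w3c.dom supplies Document
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // javax.xml.transform: a Transformer created without a stylesheet copies its input
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("XML round trip failed", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<book><chapter>One</chapter></book>"));
    }
}
```

Passing a stylesheet Source to newTransformer would apply an XSLT transformation instead of a plain copy.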
JAXP XML Parsers
 javax.xml.parsers: It defines abstract classes
 DocumentBuilder (for DOM)
 SAXParser (for SAX)
 It also defines factory classes
 DocumentBuilderFactory
 SAXParserFactory.
 By default, these give you the “reference implementation” of
DocumentBuilder and SAXParser, but they are intended to be
vendor-neutral factory classes, so that you could swap in a
different implementation if you preferred.
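The factory pattern described above can be sketched as follows. FactoryDemo and its two helper methods are hypothetical names; the printed class names depend on which parser implementation the JDK or the classpath provides, so no particular output is assumed.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

public class FactoryDemo {
    // Ask the vendor-neutral factory for a DOM parser.
    static DocumentBuilder newDomParser() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException("No DOM parser available", ex);
        }
    }

    // Ask the vendor-neutral factory for a SAX parser.
    static SAXParser newSaxParser() {
        try {
            return SAXParserFactory.newInstance().newSAXParser();
        } catch (ParserConfigurationException | SAXException ex) {
            throw new IllegalStateException("No SAX parser available", ex);
        }
    }

    public static void main(String[] args) {
        // The code only names the abstract types; the concrete classes come from the factory.
        System.out.println("DOM parser: " + newDomParser().getClass().getName());
        System.out.println("SAX parser: " + newSaxParser().getClass().getName());
    }
}
```

Because the program refers only to the abstract classes, a different implementation can be selected without changing the code, for example through the javax.xml.parsers.DocumentBuilderFactory system property.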
SAX vs. DOM
SAX = Simple API for XML
 Java-specific
 interprets XML as a stream of events
 doesn't build data model in memory
 serial access
 very fast, lightweight
 good choice when
a) no data model is needed, or
b) natural structure for data model is list, matrix, etc.

DOM = Document Object Model
 platform and language neutral (not Java-specific!)
 interprets XML as a tree of nodes
 builds data model in memory
 enables random access to data
 more CPU- and memory-intensive
 good choice when data model has natural tree structure
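The event-stream style of SAX can be illustrated with a short sketch that counts elements without building a tree. SaxCount and countElements are hypothetical names, and the XML text in main is made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCount {
    // Counts start tags named `tag` as the parser streams through the document.
    static int countElements(String xml, String tag) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Called once per start tag, in document order; nothing is stored.
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception ex) {
            throw new IllegalStateException("SAX parsing failed", ex);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<book><chapter/><chapter/></book>";
        System.out.println(countElements(xml, "chapter")); // prints 2
    }
}
```

The handler sees each element exactly once and cannot go back, which is the serial access listed above; a DOM version would instead load the whole tree and query it.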
DOM Architecture
Document Structure
XML Input
<book>
<chapter>
<!--This is comment-->
<chapNum>One</chapNum>
<chapTitle>Introduction to XML</chapTitle>
</chapter>
</book>

DOM tree (some whitespace text nodes omitted)
Document
  Element <book>
    Text " "
    Element <chapter>
      Text " "
      Comment "This is comment"
      Element <chapNum>
        Text "One"
      Element <chapTitle>
        Text "Introduction to XML"

 Whitespace between element nodes is kept as a text node, even if
the text contains nothing else.
 XML comments appear in special comment nodes.
 Element attributes do not appear in the tree; they are available
through the Element object.
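The structure above can be checked with a small sketch that walks a parsed document and prints one line per node. TreeDump and its methods are hypothetical names; the input is a one-line variant of the slide's document with an id attribute added to show that attributes are not child nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TreeDump {
    // One-line description of a node, based on its type.
    static String label(Node n) {
        switch (n.getNodeType()) {
            case Node.DOCUMENT_NODE: return "Document";
            case Node.ELEMENT_NODE:  return "Element <" + n.getNodeName() + ">";
            case Node.TEXT_NODE:     return "Text \"" + n.getNodeValue() + "\"";
            case Node.COMMENT_NODE:  return "Comment \"" + n.getNodeValue() + "\"";
            default:                 return n.getNodeName();
        }
    }

    // Depth-first walk over child nodes; attributes are never visited here.
    static void dump(Node n, String indent, StringBuilder out) {
        out.append(indent).append(label(n)).append('\n');
        for (Node child = n.getFirstChild(); child != null; child = child.getNextSibling()) {
            dump(child, indent + "  ", out);
        }
    }

    static String dumpTree(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            dump(doc, "", out);
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        System.out.print(dumpTree(
            "<book id=\"b1\"> <chapter><!--This is comment--><chapNum>One</chapNum></chapter></book>"));
    }
}
```

With a default factory configuration, the single space after the book start tag shows up as its own text node, while the id attribute does not appear in the walk at all.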
DOM Parser
 The DOM parser uses the following interfaces to navigate and parse
the elements of an XML document:
 Node
 Document
 Element
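Node interface
 The root of the inheritance structure for DOM objects is Node
 It helps find information about a node, such as its value,
name and type.
 String getNodeName( ) : returns the name of the node, e.g. chapNum
for the <chapNum> element
 String getNodeValue( ) : returns the value of the node, e.g. One for
the text node inside <chapNum>; null for an element node
 short getNodeType( ) : returns a constant for the node type, e.g.
CDATA_SECTION_NODE if the node is a CDATA section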
 boolean hasChildNodes( ) : returns true or false depending on
whether this node has children
 Node getFirstChild( ) : returns first child of the node or null
 Node getLastChild( ) : returns last child of the node or null
 NodeList getChildNodes( ) : returns a NodeList of the children of
the current node.
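A minimal sketch of these Node methods on the chapNum element used in the slides. NodeInfo and its parse helper are hypothetical names introduced for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeInfo {
    // Hypothetical helper: parse XML text into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        Node chapNum = parse("<chapNum>One</chapNum>").getDocumentElement();
        Node text = chapNum.getFirstChild();

        System.out.println(chapNum.getNodeName());                       // chapNum
        System.out.println(chapNum.getNodeValue());                      // null (elements have no value)
        System.out.println(chapNum.getNodeType() == Node.ELEMENT_NODE);  // true
        System.out.println(chapNum.hasChildNodes());                     // true

        System.out.println(text.getNodeName());                          // #text
        System.out.println(text.getNodeValue());                         // One
        System.out.println(text.getNodeType() == Node.TEXT_NODE);        // true
        System.out.println(chapNum.getChildNodes().getLength());         // 1
    }
}
```

The text "One" belongs to the child text node, not to the element itself, which is why getNodeValue on the element returns null.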
Document
 The Document interface is derived from the Node interface
 We can create and retrieve components of an XML
document using the Document interface
 Two methods give access to the elements of the document:
 Element getDocumentElement( ) : Returns the root element of
the Document
 NodeList getElementsByTagName(String name ) : Returns a
NodeList of all the elements with the given tag name, in document order
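Creating and retrieving components through Document might look like the sketch below. BuildDoc and buildBook are hypothetical names; the element names follow the book example used in the slides.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BuildDoc {
    // Builds <book> with two <chapter> children in memory, without parsing a file.
    static Document buildBook() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            Element book = doc.createElement("book");
            doc.appendChild(book); // the first element appended becomes the root
            for (String number : new String[] {"One", "Two"}) {
                Element chapter = doc.createElement("chapter");
                Element chapNum = doc.createElement("chapNum");
                chapNum.appendChild(doc.createTextNode(number));
                chapter.appendChild(chapNum);
                book.appendChild(chapter);
            }
            return doc;
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException("No DOM parser available", ex);
        }
    }

    public static void main(String[] args) {
        Document doc = buildBook();
        System.out.println(doc.getDocumentElement().getNodeName());           // book
        System.out.println(doc.getElementsByTagName("chapter").getLength());  // 2
    }
}
```

All nodes are created by the Document that will own them; createElement alone does not attach a node, so appendChild is needed to place it in the tree.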
Element
 The Element interface is also derived from Node and inherits all
of its public methods
 The primary purpose of the Element interface is to support the
manipulation of attributes.
 There are a number of methods for manipulating element
attributes:
 String getAttribute(String name) : returns the value of the given
attribute
 void removeAttribute(String name) : Removes the given attribute
from the element.
 void setAttribute(String name, String value) : Sets or replaces the
attribute name with the given value
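The three attribute methods can be tried on a single element, as in this sketch. AttrDemo and parseRoot are hypothetical names, and the id and title attributes are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttrDemo {
    // Hypothetical helper: parse XML text and return its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        Element chapter = parseRoot("<chapter id=\"c1\"/>");
        System.out.println(chapter.getAttribute("id"));      // c1

        chapter.setAttribute("id", "c2");                     // replaces the existing value
        chapter.setAttribute("title", "Intro");               // adds a new attribute
        System.out.println(chapter.getAttribute("id"));      // c2

        chapter.removeAttribute("title");
        System.out.println(chapter.hasAttribute("title"));   // false
        // A missing attribute reads as an empty string, not null.
        System.out.println(chapter.getAttribute("title").isEmpty()); // true
    }
}
```

Since getAttribute returns an empty string for a missing attribute, hasAttribute is the reliable way to tell "absent" from "present but empty".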
NodeList
 It is used to process a list of nodes
 It returns the nodes from the XML document in the order in
which they appear in the document
 There are two methods in this interface
 int getLength( ) : returns the count of nodes within the list
 Node item(int index) : returns the node at index
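A NodeList is typically processed with an index loop over getLength and item, as in this sketch. NodeListDemo and textsOf are hypothetical names; getTextContent is used here as a shortcut for reading the text inside each element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeListDemo {
    // Collects the text of every element named `tag`, in document order.
    static List<String> textsOf(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tag);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<book><chapter><chapNum>One</chapNum></chapter>"
                   + "<chapter><chapNum>Two</chapNum></chapter></book>";
        System.out.println(textsOf(xml, "chapNum")); // prints [One, Two]
    }
}
```

NodeList does not implement Iterable, so a for-each loop is not available; item returns null when the index is out of range.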
Example : DOM Parsing
import java.io.File;
// used to locate the XML document on disk
import javax.xml.parsers.*;
// contains the DocumentBuilder and DocumentBuilderFactory classes
import org.w3c.dom.*;
// contains Node, NodeList, Document, Text, Element and DOMException
• Get an instance of the DocumentBuilderFactory class. This class defines
a factory API that enables applications to obtain a parser that
produces DOM object trees from XML documents.
DocumentBuilderFactory db=DocumentBuilderFactory.newInstance();
DocumentBuilder
• Defines the API to obtain DOM Document instances from an XML
document. An instance of this class can be obtained as:
DocumentBuilder d=db.newDocumentBuilder();
• Once an instance of this class is obtained, XML can be parsed from
a variety of input sources like InputStreams, Files, and URLs
• The parse method parses the content of the given file as an XML
document and returns a new Document object.
public Document parse(File f)
String book1="book.xml";
Document doc=d.parse(new File(book1));
Contd..
 Get the root element
Element e=doc.getDocumentElement();
 Get a NodeList for chapter tag
NodeList book=e.getElementsByTagName("chapter");
 Loop through the node list and get the items from node list
for (int i = 0; i < book.getLength(); i++) {
    Element el = (Element) book.item(i);
    System.out.println("Element : " + el.getNodeName());
Contd..
 Get each tag containing text
NodeList nl=el.getElementsByTagName("chapNum");
 Get the first child which is the information associated with the
tag
Text t1=(Text) nl.item(0).getFirstChild();
 Print the data associated with the text object
System.out.println("Value of chapNum: "+t1.getData());
NodeList nl1=el.getElementsByTagName("chapTitle");
Text t2=(Text) nl1.item(0).getFirstChild();
System.out.println("Value of chapTitle: "+t2.getData());
}
Book.java

import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class Book {
    public static void main(String[] args) {
        try {
            DocumentBuilderFactory db = DocumentBuilderFactory.newInstance();
            DocumentBuilder d = db.newDocumentBuilder(); // parser
            String book1 = "book.xml";
            Document doc = d.parse(new File(book1));
            Element e = doc.getDocumentElement(); // returns the root element
            NodeList book = e.getElementsByTagName("chapter");
            for (int i = 0; i < book.getLength(); i++) {
                Element el = (Element) book.item(i);
                System.out.println("Element : " + el.getNodeName());
                // the first child of each tag is the text node holding its data
                NodeList nl = el.getElementsByTagName("chapNum");
                Text t1 = (Text) nl.item(0).getFirstChild();
                System.out.println("Value of chapNum: " + t1.getData());
                NodeList nl1 = el.getElementsByTagName("chapTitle");
                Text t2 = (Text) nl1.item(0).getFirstChild();
                System.out.println("Value of chapTitle: " + t2.getData());
            }
        } catch (Exception ex) {
            System.out.println("Error Parsing " + ex.getMessage());
        }
    }
}
Input (book.xml)
<book>
<chapter>
<chapNum>One</chapNum>
<chapTitle>XML</chapTitle>
</chapter>
<chapter>
<chapNum>Two</chapNum>
<chapTitle>DOM</chapTitle>
</chapter>
</book>

Output
Element : chapter
Value of chapNum: One
Value of chapTitle: XML
Element : chapter
Value of chapNum: Two
Value of chapTitle: DOM
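Book.java assumes that every chapter contains a chapNum and a chapTitle with non-empty text; if one is missing, item(0) or getFirstChild( ) returns null and the program fails. The following is a hedged variant, not part of the original slides, that tolerates missing tags. SafeBook, childText and summarize are hypothetical names, and the XML is passed as a string rather than read from book.xml to keep the sketch self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SafeBook {
    // Text of the first descendant named `tag`, or "" if there is none.
    static String childText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        if (matches.getLength() == 0) {
            return "";
        }
        return matches.item(0).getTextContent().trim();
    }

    // One "chapNum: chapTitle" line per chapter, in document order.
    static List<String> summarize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList chapters = doc.getDocumentElement().getElementsByTagName("chapter");
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < chapters.getLength(); i++) {
                Element chapter = (Element) chapters.item(i);
                lines.add(childText(chapter, "chapNum") + ": " + childText(chapter, "chapTitle"));
            }
            return lines;
        } catch (Exception ex) {
            throw new IllegalStateException("Error parsing XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<book><chapter><chapNum>One</chapNum><chapTitle>XML</chapTitle></chapter>"
                   + "<chapter><chapNum>Two</chapNum></chapter></book>";
        for (String line : summarize(xml)) {
            System.out.println(line);
        }
    }
}
```

Checking getLength( ) before calling item(0), and reading text with getTextContent( ) instead of casting the first child to Text, avoids the null cases while keeping the same traversal as Book.java.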
